How Can a Robot Recognize You? The Family Member Recognition Feature of OriginBot
When building the home assistant robot OriginBot, I wanted it to recognize and greet family members. To do this, I added a “family member recognition” feature, which consists of two core parts: face detection and face recognition.
Face detection determines whether there are any faces in the camera image. I used the classic Haar cascade algorithm and optimized the code so that it runs efficiently in the ROS environment. After converting ROS images to OpenCV format, the program marks the position of each detected face on the image and labels it.
Face recognition goes a step further and determines whose face appears in the image. For this I chose the Alibaba Cloud Vision Intelligence Open Platform, because it is the most convenient option for someone who is not an algorithms specialist.
Face Detection
The face detection part is based on https://www.guyuehome.com/45655. I optimized the code from that article and added detailed comments. The optimized code is as follows:
```python
# Import required libraries
```
The detector used here is based on Haar cascades, a relatively old algorithm that can miss faces or produce false detections in some cases, for example when a face is not turned toward the camera. For future improvements, consider more advanced detectors such as MTCNN, Dlib HOG, or Dlib CNN.
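For illustration, here is a minimal sketch of Haar-cascade face detection on a ROS image; it is not the author's optimized code. It assumes `opencv-python` (which ships the cascade XML files under `cv2.data.haarcascades`) and `cv_bridge` are installed; the helper names `label_origin` and `detect_and_annotate`, the label text, and the parameter values are illustrative choices. In `detectMultiScale`, raising `minNeighbors` suppresses false detections at the cost of missing more faces, and `minSize` skips very small candidate windows.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def _load_cascade():
    """Load OpenCV's bundled frontal-face Haar cascade once and reuse it."""
    import cv2  # imported lazily so label_origin() works without OpenCV
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    if cascade.empty():
        raise RuntimeError("could not load the Haar cascade XML file")
    return cascade


def label_origin(x, y, text_height=12, margin=5):
    """Hypothetical helper: bottom-left corner for a text label, placed above the
    box, or just inside it when the box touches the top edge of the image.
    text_height is an approximate pixel height for the font used below."""
    if y - margin - text_height >= 0:
        return (x, y - margin)
    return (x, y + text_height + margin)


def detect_and_annotate(image_msg, label="face"):
    """Convert a sensor_msgs/Image to an OpenCV BGR array, run the Haar cascade,
    and draw a labelled box around every detected face."""
    import cv2
    from cv_bridge import CvBridge

    frame = CvBridge().imgmsg_to_cv2(image_msg, desired_encoding="bgr8")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # even out lighting before detection
    faces = _load_cascade().detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    boxes = []
    for (x, y, w, h) in faces:
        x, y, w, h = int(x), int(y), int(w), int(h)  # plain ints for drawing calls
        boxes.append((x, y, w, h))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, label_origin(x, y), cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 255, 0), 2)
    return frame, boxes
```

Inside a ROS 2 node, `detect_and_annotate` would be called from the image-subscription callback, and the annotated frame could be published back with `CvBridge().cv2_to_imgmsg(frame, encoding="bgr8")`. The OpenCV and cv_bridge imports are deferred into the functions so the pure label-placement helper can be used and tested without them.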
Package the code above into a ROS 2 package; once it is built, it is ready to use.
Face Recognition
One widely used face recognition algorithm is FaceNet.
The following quoted content was generated by ChatGPT-4:
FaceNet is a deep-learning-based face recognition system released by Google in 2015. Its goal is to map face images into a Euclidean space in which the distance between different images of the same person is as small as possible, while the distance between images of different people is as large as possible. The mapping is learned by a deep convolutional neural network, whose architecture can be an Inception model or another network.
Advantages:
- High accuracy: At the time of its publication, FaceNet achieved state-of-the-art results on public benchmarks such as LFW (Labeled Faces in the Wild) and YouTube Faces DB.
- End-to-end learning: FaceNet is trained end to end, so the whole system (feature extraction and metric learning) is optimized jointly.
- Real-time performance: Because the network directly outputs an embedding vector, it can be used in real-time face recognition applications.
Disadvantages:
- Difficult to train: The triplet loss used by FaceNet requires careful selection of positive and negative examples, which makes training relatively complex.
- Requires a large amount of labeled data: Although FaceNet only needs identity labels, it still needs a large amount of training data to perform well.
- Sensitive to data quality: Incorrect labels in the training data can degrade the training results.
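The embedding idea and the triplet loss described above can be illustrated in a few lines of plain Python. This toy sketch uses hand-made 2-D vectors in place of real FaceNet embeddings (which would come from the trained network). The margin of 0.2 is a common choice for FaceNet-style training, and the verification threshold of 1.1 is only an illustrative value that would need tuning on real data.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; FaceNet embeddings live on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triple: zero once the
    positive (same person) is closer to the anchor than the negative (different
    person) by at least `margin`, positive otherwise."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


def same_person(a, b, threshold=1.1):
    """Face verification: same identity if the squared distance is below a
    threshold (illustrative value; tune it on real data)."""
    return squared_distance(a, b) < threshold


# Toy 2-D "embeddings": two photos of person A and one photo of person B.
a1 = l2_normalize([1.0, 0.0])
a2 = l2_normalize([0.9, 0.1])
b1 = l2_normalize([0.0, 1.0])
print(triplet_loss(a1, a2, b1))                  # 0.0: triple already well separated
print(same_person(a1, a2), same_person(a1, b1))  # True False
```

Training pushes the loss toward zero for as many triples as possible; this is why the choice of hard negatives matters so much in practice.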
Running such a relatively large model directly on OriginBot is unlikely to work well, because it requires a lot of computing power. In addition, since I’m not an algorithms specialist, implementing FaceNet from scratch would be difficult for me. So I chose the Alibaba Cloud Vision Intelligence Open Platform instead.
The Alibaba Cloud Vision Intelligence Open Platform provides a set of efficient, easy-to-use visual intelligence APIs that let users implement image recognition, video analysis, image search, and similar functions to improve business efficiency and user experience. That is exactly what I needed.
The main features of the Alibaba Cloud Vision Intelligence Open Platform include:
- Rich API interfaces: The platform offers APIs covering image recognition, video analysis, image search, and other fields, so users can pick the ones that fit their needs. The function I need is among them.
- Highly customizable: Users can customize models for their business scenarios, for example by training their own image recognition models to identify specific objects or scenes.
- Powerful image recognition: The platform can recognize many kinds of image content, including objects, scenes, faces, and text. It also offers advanced functions such as image style transfer and emotion analysis.
- Real-time video analysis: The platform can process video streams in real time to identify specific objects, scenes, or behaviors.
- Image search: Users can quickly find similar images in a large image library by uploading a picture or providing its URL (search by image).
- Ease of use and flexibility: The platform provides complete developer documentation and SDKs for multiple programming languages, which makes integration quick. It also offers online testing and debugging tools for verifying and tuning API calls.
- Security and reliability: The platform is built on Alibaba Cloud’s security system to protect the security and privacy of user data.
- Elastic scaling: The platform automatically adjusts resources to users’ business needs, keeping services stable and performant under high concurrency.
A quick plug for Alibaba Cloud: these services are genuinely practical for people who are not algorithm specialists, and after the recent price reduction they are affordable for personal use.
The function I use is searchFace. For details, please refer to the official documentation.
In short, you first create a face database and then upload photos of each family member. Name each photo with the person’s name in pinyin, so that during recognition a match can be mapped back to the right person.
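Because each photo is stored under the family member’s pinyin name, the final step reduces to picking the best match above a confidence threshold and mapping its ID back to a display name. The sketch below assumes the search results have already been parsed into a list of dicts with hypothetical `entity_id` and `score` keys; it is not the real response format, so check the official searchFace documentation for the actual fields and score range. The `FAMILY` mapping, the threshold, and the greeting text are placeholders.

```python
# Hypothetical mapping from the pinyin IDs used when uploading photos to display names.
FAMILY = {"zhangsan": "Zhang San", "lisi": "Li Si"}


def identify(matches, min_score=0.6):
    """Return the display name of the best-scoring match, or None when the list is
    empty or the best score is below `min_score` (threshold is a placeholder)."""
    best = max(matches, key=lambda m: m["score"], default=None)
    if best is None or best["score"] < min_score:
        return None
    # Fall back to the raw pinyin ID if it is missing from the mapping.
    return FAMILY.get(best["entity_id"], best["entity_id"])


def greeting(matches):
    """Text the robot could say after a search (wording is illustrative)."""
    name = identify(matches)
    return f"Welcome home, {name}!" if name else "Hello! I don't think we have met yet."


# Parsed search results (hypothetical structure, not the raw API response).
results = [{"entity_id": "lisi", "score": 0.91}, {"entity_id": "zhangsan", "score": 0.42}]
print(greeting(results))  # Welcome home, Li Si!
```

Keeping the threshold check separate from the API call makes it easy to tune how cautious the robot is before greeting someone by name.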
Alibaba Cloud provides an official API debugging console where you can test calls directly in the browser and have sample code generated automatically. The final code is as follows:
…
The full text is published on Guyuehome. Please go there to read it.