How to Make the Robot Recognize You? The Family Recognition Function of OriginBot

When creating the home assistant robot OriginBot, I hoped it could recognize and welcome family members. To achieve this, I introduced the “family member recognition” function, which consists of two core parts: face detection and face recognition.

Face detection determines whether, and where, faces appear in the camera image. I adopted the classic Haar cascade algorithm and adapted it to run efficiently in the ROS environment. The node converts each ROS image into OpenCV format, runs the detector, and marks the position of every detected face on the image.

Face recognition goes one step further and determines whose face is in the image. I chose the Alibaba Cloud Vision Intelligence Open Platform because it was the most convenient option I found for someone who is not an algorithm professional.

Face Detection

The face detection part draws on the content from https://www.guyuehome.com/45655. I optimized the code therein and added detailed comments. The optimized code is as follows:

# Import required libraries
import cv2
import cv_bridge
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image


# Define the face detection node
class FaceDetection(Node):
    def __init__(self, cascade_path, image_topic, output_topic):
        super().__init__('face_detection')  # Initialize the node with the name 'face_detection'
        self.classifier_path = cascade_path  # Path to the haarcascade model

        # Instantiate the cv_bridge object to convert between ROS and OpenCV images
        self.bridge = cv_bridge.CvBridge()
        # Load the pre-trained face detection model
        self.face_cascade = cv2.CascadeClassifier(self.classifier_path)
        # Subscribe to the image topic and register the callback function image_callback
        self.image_sub = self.create_subscription(Image, image_topic, self.image_callback, 10)
        # Create a Publisher with the topic name output_topic and a queue length of 10
        self.pub = self.create_publisher(Image, output_topic, 10)

    # Define the image callback function
    def image_callback(self, msg):
        # Convert the received ROS image message to an OpenCV image
        image = self.bridge.imgmsg_to_cv2(msg, 'bgr8')
        # Convert the image to grayscale as face detection requires a grayscale image
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Perform face detection
        faces = self.face_cascade.detectMultiScale(
            gray,
            scaleFactor=1.2,  # Represents the proportion by which the image size is reduced each time
            minNeighbors=3,  # Means that each target must be detected at least 3 times to be considered a real target
            minSize=(20, 20)  # Set the minimum size of the face
        )

        # If faces are detected, draw a rectangle on the image to represent the face
        if len(faces) > 0:
            for (x, y, w, h) in faces:
                cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)

        # Convert the OpenCV image back to a ROS image message and publish it
        self.pub.publish(self.bridge.cv2_to_imgmsg(image, 'bgr8'))


# Define the main function
def main(args=None):
    rclpy.init(args=args)  # Initialize ROS
    face_detection = FaceDetection("haarcascade_frontalface_default.xml", "/image_raw", "/camera/process_image")  # Instantiate the FaceDetection node
    rclpy.spin(face_detection)  # Start the loop and continuously call the callback function
    face_detection.destroy_node()  # Destroy the node
    rclpy.shutdown()  # Shut down ROS


# If this file is run directly, execute the main function
if __name__ == '__main__':
    main()

The face detection algorithm used here, Haar cascades, is relatively old. It can miss faces or produce false detections, for example when a face is tilted, partly covered, or poorly lit. More advanced detectors such as MTCNN, Dlib HOG, or Dlib CNN are worth considering as a future improvement.
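How often the detector misses faces or reports false ones also depends on the detectMultiScale parameters used in the node above. As a rough illustration of what scaleFactor and minSize control, the sketch below lists the square search-window sizes a cascade would try on an image. It is a simplified model, not OpenCV's actual implementation: the function window_sizes is hypothetical, and the 24-pixel base window is an assumption about the size the frontal-face cascade was trained on.

```python
def window_sizes(image_size, base_window=24, scale_factor=1.2, min_size=20):
    """Approximate the square window sizes (in pixels) a cascade detector scans.

    Simplified model: the window starts at base_window (assumed 24 px) and
    grows by scale_factor at each step until it no longer fits the image.
    Windows smaller than min_size are skipped.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    while True:
        window = round(base_window * factor)
        if window > image_size:
            break  # The window no longer fits inside the image
        if window >= min_size:
            sizes.append(window)
        factor *= scale_factor
    return sizes


if __name__ == "__main__":
    # Compare a coarse and a fine search on a 480-pixel-high image
    print(len(window_sizes(480, scale_factor=1.2)), "window sizes with scaleFactor=1.2")
    print(len(window_sizes(480, scale_factor=1.1)), "window sizes with scaleFactor=1.1")
```

A smaller scaleFactor tries more window sizes, which can find more faces but costs more time per frame. Raising minNeighbors tends to reduce false detections at the risk of missing real faces.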

Package the code above into a ROS 2 package. Once the package is built (for example with colcon build), the node is ready to use.

Face Recognition

Currently, a commonly used face recognition algorithm is FaceNet.

The following quoted content was generated by ChatGPT (GPT-4):

FaceNet is a deep-learning-based face recognition system released by Google in 2015. The goal of FaceNet is to map face images into a Euclidean space so that the distance between different images of the same person is as small as possible, while the distance between images of different people is as large as possible. This mapping is learned by a deep convolutional neural network, whose architecture can be an Inception model or another model.

Advantages:

  1. High accuracy: At the time of its release, FaceNet achieved the best reported performance on public datasets such as LFW (Labeled Faces in the Wild) and YouTube Faces DB.
  2. End-to-end learning: FaceNet is an end-to-end system, and the entire system (including feature extraction and metric learning) can be optimized together.
  3. Real-time performance: Since the network directly outputs embedding vectors, it can be used for real-time face recognition applications.

Disadvantages:

  1. Difficult to train: The triplet loss used by FaceNet requires careful selection of positive and negative examples, and the training process is relatively complex.
  2. Requires a large amount of labeled data: Although FaceNet only needs identity labels, a large amount of training data is still required to obtain good performance.
  3. Sensitive to data quality: If there are incorrect labels in the training data, it may affect the training results.
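
To make the embedding idea described above concrete, here is a minimal sketch of how recognition could work once embeddings exist: compare a new embedding with one stored embedding per family member and accept the nearest one only if it is close enough. Everything in it is illustrative. The three-dimensional vectors, the names, the helper identify, and the threshold of 1.0 are assumptions; real FaceNet embeddings have far more dimensions and come from the trained network.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, known, threshold=1.0):
    """Return the name of the closest known embedding.

    Returns None when even the closest one is farther away than threshold
    (an assumed, illustrative value), meaning the face is treated as unknown.
    """
    best_name, best_dist = None, float("inf")
    for name, reference in known.items():
        dist = euclidean_distance(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


if __name__ == "__main__":
    # Toy embeddings, one per family member (hypothetical values)
    family = {"dad": [0.0, 1.0, 0.0], "mom": [1.0, 0.0, 0.0]}
    print(identify([0.9, 0.1, 0.0], family))  # close to the "mom" embedding
    print(identify([0.0, 0.0, 5.0], family))  # far from everyone
```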

Deploying such a relatively large algorithm directly on the OriginBot may not yield good results, as it requires a lot of computing power. Additionally, since I’m not an algorithm professional, implementing FaceNet from scratch is a bit difficult for me. So, I finally chose to use the Alibaba Cloud Vision Intelligence Open Platform.

The Alibaba Cloud Vision Intelligence Open Platform provides a series of efficient, easy-to-use visual intelligence APIs for tasks such as image recognition, video analysis, and image search. That is exactly what I need.

The following are some of the main features and functions of the Alibaba Cloud Vision Intelligence Open Platform:

  1. Rich API interfaces: The platform offers a wealth of API interfaces covering multiple fields such as image recognition, video analysis, and image search. Users can select appropriate interfaces for invocation according to their needs. The function I need is included.

  2. Highly customizable: Users can customize models according to their business scenarios, for example, by training their own image recognition models to identify specific objects or scenes.

  3. Powerful image recognition capabilities: The platform supports the recognition of various types of image content, including objects, scenes, faces, text, etc. In addition, it can perform advanced functions such as image style transfer and emotion analysis.

  4. Real-time video analysis: The platform provides real-time video analysis functions, which can process video streams in real time to identify specific objects, scenes, or behaviors in the video.

  5. Image search service: Users can quickly find similar images in a large-scale image library by uploading a picture or providing a picture URL (search by image).

  6. Ease of use and flexibility: The platform provides complete developer documentation and SDKs for multiple programming languages, which makes integration quick. It also provides online testing and debugging tools to help users verify and tune their API calls.

  7. Safe and reliable: The Alibaba Cloud Vision Intelligence Open Platform is based on Alibaba Cloud’s security system to ensure the security and privacy of user data.

  8. Elastic scaling: The platform supports elastic scaling and can automatically adjust resources according to users’ business needs to ensure stability and performance in high-concurrency scenarios.

Let me put in a small plug for Alibaba Cloud. These services are really practical for people who are not algorithm professionals, and after the price reduction they are not expensive. They are completely affordable for personal use.

The function I want to use is searchFace. For detailed instructions, please refer to the official documentation.

In simple terms, you first create a face database and then upload face photos of the family members. Name each photo with the pinyin of the person's name, so that a match during recognition tells us who it is.
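
The naming convention above can be sketched in a few lines: the file name (without extension) becomes the person's label, and at recognition time the best-scoring label is accepted only if its score is high enough. This is a sketch under stated assumptions, not the Alibaba Cloud API. The helpers label_from_photo, best_match, and greeting, the (label, score) pair format, the 0 to 1 score range, and the 0.6 cut-off are all hypothetical.

```python
from pathlib import Path


def label_from_photo(photo_path):
    """Use the file name without extension (the pinyin name) as the label."""
    return Path(photo_path).stem


def best_match(candidates, min_score=0.6):
    """Pick the label with the highest score from (label, score) pairs.

    Returns None if there are no candidates or the best score is below
    min_score (an assumed cut-off on an assumed 0-to-1 similarity scale).
    """
    if not candidates:
        return None
    label, score = max(candidates, key=lambda pair: pair[1])
    return label if score >= min_score else None


def greeting(label):
    """Build the welcome message for a recognized (or unknown) person."""
    if label is None:
        return "Hello, I don't think we have met."
    return f"Welcome home, {label}!"


if __name__ == "__main__":
    print(label_from_photo("faces/zhang_san.jpg"))
    print(greeting(best_match([("zhang_san", 0.91), ("li_si", 0.40)])))
```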

Alibaba Cloud provides an official API debugging console where you can try calls directly on the page and have the corresponding code generated automatically. The final code is as follows:

The full text is published on Guyuehome. Please go there to read it.