FaceNet is a face recognition system based on a deep neural network, introduced in 2015 by researchers at Google. The main idea of the system is to train a CNN to extract a 128-element vector, called an Embedding, from a face image. Vectors extracted from images of the same person should be very close to each other, while the distance between vectors extracted from two different persons' face images should be noticeably larger. To reach this condition, the CNN is trained on a triple of images at each step, named Anchor, Positive and Negative. The Anchor and Positive images belong to the same person, and the Negative is from a different individual.
The triplet loss function helps the CNN to reduce the distance between the Anchor's Embedding and the Positive's Embedding, while at the same time increasing the distance between the Anchor's Embedding and the Negative's Embedding:

L = \sum_{i} \Big[ \, \| f(a_i) - f(p_i) \|_2^2 \;-\; \| f(a_i) - f(n_i) \|_2^2 + \alpha \, \Big]_+

Here a is the anchor, p is the positive, n is the negative, f(\cdot) is the Embedding function, [\cdot]_+ denotes \max(\cdot, 0), and \alpha is a margin we specify to enforce a gap of at least \alpha between the anchor-positive distance and the anchor-negative distance. In the original paper, the authors of FaceNet described the goal of the triplet loss as:
"Here we want to ensure that an image A (anchor) of a specific person is closer to all other images P (positive) of the same person than it is to any image N (negative) of any other person."
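To make this concrete, here is a minimal NumPy sketch of the triplet loss for a single triple of Embeddings (the function name is our own; the default margin α = 0.2 is the value used in the FaceNet paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triple of Embeddings."""
    pos_dist = np.sum(np.square(anchor - positive))  # squared distance Anchor <-> Positive
    neg_dist = np.sum(np.square(anchor - negative))  # squared distance Anchor <-> Negative
    # Hinge at zero: the loss vanishes once the negative is at least
    # alpha farther from the anchor than the positive is.
    return max(pos_dist - neg_dist + alpha, 0.0)
```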
Once the CNN is trained well, it can produce a unified Embedding for each given face, which represents its features.
Now we are able to obtain a numerical representation of any person's face image, and no two different faces should have Embeddings closer than \alpha. So we can apply any comparison method to find the nearest face in the dataset to a given face, based on their Embeddings. The process of finding the best match (nearest face) to a given face is called inference.
We can use well-known distance measures, such as the Minkowski or Euclidean distance, to calculate the distance between two Embeddings. Then it is easy to find the minimum among the calculated distances. So, what happens if a company hires a new employee and wants to identify them at the entrance gate? They simply take a photo of the employee's face, feed it to the same FaceNet CNN model to produce its Embedding, and add it to the existing dataset.
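As an illustration, a minimal nearest-match lookup over a dictionary of Embeddings might look like this (the function and variable names are our own; the 0.5 threshold anticipates the cutoff used later in this post):

```python
import numpy as np

def find_match(new_embedding, database, threshold=0.5):
    """Return the name of the closest Embedding in the database, or None."""
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        dist = np.linalg.norm(new_embedding - embedding)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject the match if even the nearest face is too far away.
    return best_name if best_dist < threshold else None
```

Enrolling the new employee then amounts to adding one more name-to-Embedding entry to that dictionary.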
How to Work with FaceNet in Python
Here we will write a code snippet that detects and identifies a face using a FaceNet model. First, we use a pretrained FaceNet model to build a database of Embeddings corresponding to an existing dataset of face images. Next, we test the system on a single image containing a face that belongs to one of the persons in the database, and try to identify it. An important part of this code is that we must detect the face area before we can perform face identification. So, in this article we use the MTCNN library to detect face areas in images; it is a simple tool that detects as many faces as are present in a given image and returns their bounding-box coordinates. Once we have the bounding box of a face in an image, we can easily crop the face part of the image and use it in our system. Let's dive into the code and see what is going on:
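Since the original gist is not reproduced in this text, the snippet below is a sketch of the same flow. It assumes the mtcnn package, OpenCV, and a pretrained Keras FaceNet model saved as facenet_keras.h5 that takes 160x160 RGB inputs and returns 128-element Embeddings; the file names and the faces folder layout are assumptions:

```python
import os
import cv2
import numpy as np
from mtcnn import MTCNN
from tensorflow.keras.models import load_model

detector = MTCNN()
model = load_model('facenet_keras.h5')  # assumed pretrained Keras FaceNet model

def extract_face_and_preprocessing(image, required_size=(160, 160)):
    """Detect the first face in an RGB image, then crop, resize and normalize it."""
    results = detector.detect_faces(image)          # all faces found in the image
    x, y, w, h = results[0]['box']                  # bounding box of the first face
    x, y = max(0, x), max(0, y)                     # MTCNN can return negative coords
    face = image[y:y + h, x:x + w]                  # crop the face area
    face = cv2.resize(face, required_size)          # resize to the model input size
    face = face.astype('float32')
    face = (face - face.mean()) / face.std()        # standardize pixel values
    return face, (x, y, w, h)

def get_encode(faces_dir='faces'):
    """Build a name -> Embedding dictionary from the images in faces_dir."""
    persons = {}
    for filename in os.listdir(faces_dir):
        image = cv2.imread(os.path.join(faces_dir, filename))
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # MTCNN expects RGB
        face, _ = extract_face_and_preprocessing(image)
        embedding = model.predict(np.expand_dims(face, axis=0))[0]
        persons[os.path.splitext(filename)[0]] = embedding
    return persons

persons = get_encode()

# Identify the person in a test image.
test_image = cv2.cvtColor(cv2.imread('test.jpg'), cv2.COLOR_BGR2RGB)
face, box = extract_face_and_preprocessing(test_image)
new_embedding = model.predict(np.expand_dims(face, axis=0))[0]

distances = {name: np.linalg.norm(new_embedding - emb)
             for name, emb in persons.items()}
best_name = min(distances, key=distances.get)
if distances[best_name] < 0.5:                      # reject weak matches
    print('Identified:', best_name)
else:
    print('Unknown person')
```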
extract_face_and_preprocessing: This function detects the faces in the given image and crops the first detected face; after resizing and normalizing the extracted image, it returns it together with its coordinates in the original image.
get_encode: Retrieves all images belonging to the persons in the faces folder, reads them, and extracts the face areas. It then produces the Embedding of each image and adds it to a dictionary, which we use as a database of each person's Embedding.
We then call the get_encode function to produce the persons dictionary. It should look something like this (the names and values below are illustrative):
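```python
{
    'alice': array([ 0.0173, -0.0421, ...,  0.0094], dtype=float32),  # 128 values
    'bob':   array([-0.0212,  0.0387, ..., -0.0135], dtype=float32),
}
```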
Next, we read the test image in which we are going to identify the person, and extract the face area with the same extract_face_and_preprocessing function. Before we can identify the new face, we must produce its Embedding with the FaceNet model. Once we have the new image's Embedding, we calculate the distance between it and each person's face Embedding already stored in the database. Finally, it is easy to find the minimum among the calculated distances; note that in this code we do not accept a distance higher than 0.5 as an identification result.
See the WYNA post, which is an implementation of FaceNet in a tiny bot application.
Hope you find this post helpful!