Hunting Faces

Detecting Masked Faces in the Wild with LLE-CNNs

Shiming Ge, Jia Li, Qiting Ye, Zhao Luo

In IEEE CVPR, 2017

Left: Examples of annotated faces in the MAFA dataset. Right: Sample images from MAFA.


Detecting faces with occlusions is a challenging task for two main reasons: 1) the absence of large datasets of masked faces, and 2) the absence of facial cues from the masked regions. To address these two issues, this paper first introduces a dataset, denoted as MAFA, with 30,811 Internet images and 35,806 masked faces. Faces in the dataset have various orientations and occlusion degrees, and at least one part of each face is occluded by a mask. Based on this dataset, we further propose LLE-CNNs for masked face detection, which consist of three major modules. The Proposal module first combines two pretrained CNNs to extract candidate facial regions from the input image and represent them with high-dimensional descriptors. After that, the Embedding module turns these descriptors into similarity-based descriptors by using the locally linear embedding (LLE) algorithm and dictionaries trained on a large pool of synthesized normal faces, masked faces and non-faces. In this manner, many missing facial cues can be largely recovered and the influence of noisy cues introduced by diverse masks can be greatly alleviated. Finally, the Verification module identifies candidate facial regions and refines their positions by jointly performing the classification and regression tasks within a unified CNN. Experimental results on the MAFA dataset show that the proposed approach outperforms six state-of-the-art methods by at least 15.6%.
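To make the Embedding step more concrete, the following is a minimal sketch of the standard LLE reconstruction weights, assuming a descriptor is approximated from its nearest dictionary atoms. It is not the paper's exact formulation: the function name `lle_weights`, the neighbour count `k`, the regulariser `reg` and the toy random dictionary are all assumptions made for illustration.

```python
import numpy as np


def lle_weights(x, dictionary, k=5, reg=1e-3):
    """Approximate descriptor x from its k nearest dictionary atoms.

    Returns (indices, weights); the weights sum to 1, as in standard LLE.
    """
    # Euclidean distance from x to every atom (one atom per row).
    dists = np.linalg.norm(dictionary - x, axis=1)
    idx = np.argsort(dists)[:k]
    # Centre the neighbours on x and build the local Gram matrix.
    z = dictionary[idx] - x
    gram = z @ z.T
    # Regularise so the system stays solvable for near-collinear neighbours.
    gram += reg * (np.trace(gram) + 1e-12) * np.eye(k)
    w = np.linalg.solve(gram, np.ones(k))
    return idx, w / w.sum()


# Toy data (hypothetical): 50 atoms of dimension 16, query is a noisy atom 3.
rng = np.random.default_rng(0)
atoms = rng.normal(size=(50, 16))
query = atoms[3] + 0.01 * rng.normal(size=16)

idx, w = lle_weights(query, atoms)
reconstruction = w @ atoms[idx]  # descriptor rebuilt from dictionary atoms
```

In this sketch the weight vector `w` plays the role of a similarity-based descriptor: it describes the query by how it relates to known dictionary entries rather than by its raw, possibly mask-corrupted, features.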


Shiming Ge, Jia Li, Qiting Ye, and Zhao Luo. Detecting masked faces in the wild with LLE-CNNs. In: IEEE CVPR 2017.



MAFA (MAsked FAces) is a benchmark dataset for masked face detection whose images are collected from the Internet. MAFA contains 30,811 images and 35,806 masked faces. Faces in the dataset have various orientations and occlusion degrees, and at least one part of each face is occluded by a mask. Each image contains at least one face occluded by some type of mask, and each masked face is annotated with six main attributes: the locations of the face, eyes and mask, the face orientation, the occlusion degree, and the mask type.
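The six attributes can be pictured as one record per masked face. The sketch below is only an illustration of that idea: the class name, field names, the `(x, y, width, height)` box layout and the string labels are assumptions, not the file format actually distributed with MAFA.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed layout
Point = Tuple[int, int]          # (x, y)


@dataclass(frozen=True)
class MaskedFaceAnnotation:
    """Hypothetical container for the six attributes of one masked face."""
    face_box: Box
    left_eye: Point
    right_eye: Point
    mask_box: Box
    orientation: str        # placeholder label, e.g. "front"
    occlusion_degree: str   # placeholder label, e.g. "medium"
    mask_type: str          # placeholder label, e.g. "simple"

    def mask_area_ratio(self) -> float:
        """Fraction of the face box covered by the mask box."""
        fx, fy, fw, fh = self.face_box
        mx, my, mw, mh = self.mask_box
        # Width and height of the intersection, clipped at zero.
        iw = max(0, min(fx + fw, mx + mw) - max(fx, mx))
        ih = max(0, min(fy + fh, my + mh) - max(fy, my))
        return (iw * ih) / (fw * fh)


example = MaskedFaceAnnotation(
    face_box=(10, 10, 100, 100),
    left_eye=(40, 40),
    right_eye=(80, 40),
    mask_box=(10, 60, 100, 50),   # covers the lower half of the face
    orientation="front",
    occlusion_degree="medium",
    mask_type="simple",
)
ratio = example.mask_area_ratio()
```

The helper `mask_area_ratio` is not part of the dataset; it only shows how the face and mask locations could be combined into a rough occlusion measure.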


Related Datasets

Below we list other face detection datasets.

  • WIDER FACE: WIDER FACE is proposed for face detection in "more" wild environments. It contains 32,203 images and 393,703 faces.

  • IJB-A dataset: IJB-A is proposed for face detection and face recognition. It contains 24,327 images and 49,759 faces.

  • MALF dataset: MALF is the first face detection dataset that supports fine-grained evaluation. It consists of 5,250 images and 11,931 faces.

  • FDDB dataset: FDDB contains the annotations for 5,171 faces in a set of 2,845 images.

  • AFW dataset: AFW dataset is built using Flickr images. It has 205 images with 473 labeled faces. For each face, annotations include a rectangular bounding box, 6 landmarks and the pose angles.
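For a quick side-by-side view, the image and face counts quoted on this page can be turned into an average number of faces per image. This is a small sketch using only the figures stated above; the variable names are illustrative, and the MAFA count covers masked faces only.

```python
# (images, faces) as quoted on this page; MAFA counts masked faces only.
datasets = {
    "MAFA": (30811, 35806),
    "WIDER FACE": (32203, 393703),
    "IJB-A": (24327, 49759),
    "MALF": (5250, 11931),
    "FDDB": (2845, 5171),
    "AFW": (205, 473),
}

# Average number of annotated faces per image for each dataset.
density = {name: faces / images for name, (images, faces) in datasets.items()}

print(f"{'dataset':<12}{'images':>8}{'faces':>9}{'faces/img':>11}")
for name, (images, faces) in datasets.items():
    print(f"{name:<12}{images:>8,}{faces:>9,}{density[name]:>11.2f}")
```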


       author = {Ge, Shiming and Li, Jia and Ye, Qiting and Luo, Zhao},
       title = {Detecting Masked Faces in the Wild With LLE-CNNs},
       booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
       month = {July},
       year = {2017},
       pages = {2682--2690}



This work was partially supported by grants from the National Key Research and Development Plan (2016YFC0801005) and the National Natural Science Foundation of China (61672072 and 61402463).

Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation

Shiming Ge, Shengwei Zhao, Chenyu Li, Jia Li

IEEE TIP, 2019

Figure 1
Figure 2


Typically, the deployment of face recognition models in the wild needs to identify low-resolution faces with extremely low computational cost. To address this problem, a feasible solution is compressing a complex face model to achieve higher speed and lower memory at the cost of minimal performance drop. Inspired by that, this paper proposes a learning approach to recognize low-resolution faces via selective knowledge distillation. In this approach, a two-stream convolutional neural network (CNN) is first initialized to recognize high-resolution faces and resolution-degraded faces with a teacher stream and a student stream, respectively. The teacher stream is represented by a complex CNN for high-accuracy recognition, and the student stream is represented by a much simpler CNN for low-complexity recognition. To avoid significant performance drop at the student stream, we then selectively distil the most informative facial features from the teacher stream by solving a sparse graph optimization problem, which are then used to regularize the finetuning process of the student stream. In this way, the student stream is actually trained by simultaneously handling two tasks with limited computational resources: approximating the most informative facial cues via feature regression, and recovering the missing facial cues via low-resolution face classification. Experimental results show that the student stream performs impressively in recognizing low-resolution faces and costs only 0.15MB memory and runs at 418 faces per second on CPU and 9,433 faces per second on GPU.
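The two tasks handled by the student stream can be sketched as a single training objective: a classification loss on low-resolution faces plus a feature-regression term towards the teacher. The code below is a minimal NumPy sketch under stated assumptions, not the paper's implementation. In particular, the boolean `selected` mask merely stands in for the outcome of the sparse graph optimization, and the function names and the trade-off `weight` are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def student_loss(student_logits, labels, student_feats, teacher_feats,
                 selected, weight=1.0):
    """Classification loss plus feature regression on selected samples.

    `selected` is a boolean mask marking the samples whose teacher
    features are treated as informative (assumed to be given here).
    """
    cls = softmax_cross_entropy(student_logits, labels)
    if selected.any():
        diff = student_feats[selected] - teacher_feats[selected]
        reg = np.mean(np.sum(diff ** 2, axis=1))  # mean squared L2 distance
    else:
        reg = 0.0
    return cls + weight * reg


# Toy batch (hypothetical): 4 low-resolution faces, 3 identities, 8-D features.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
teacher = rng.normal(size=(4, 8))
student = teacher + 0.1 * rng.normal(size=(4, 8))
selected = np.array([True, False, True, True])

loss = student_loss(logits, labels, student, teacher, selected)
```

Samples left out of `selected` contribute only through the classification term, which reflects the idea of regularizing the student with the most informative teacher features rather than all of them.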



author={S. Ge and S. Zhao and C. Li and J. Li},
journal={IEEE Transactions on Image Processing},
title={Low-Resolution Face Recognition in the Wild via Selective Knowledge Distillation},
keywords={Face;Face recognition;Feature extraction;Image resolution;Computational modeling;Image coding;Facial features;Face recognition in the wild;two-stream architecture;knowledge distillation;CNNs},



For questions, please contact Shiming Ge at

Related Resources