A person’s face discloses important information about their affective state. Although there has been extensive research on recognition of facial expressions, the performance of existing approaches is challenged by facial occlusions. Facial occlusions are often treated as noise and discarded in recognition of affective states. However, they can provide additional information for recognition of some affective states such as curiosity, frustration and boredom. One of the reasons this problem has not gained attention is the lack of naturalistic occluded faces that contain hand over face occlusions as well as other types of occlusions. Traditional approaches for obtaining affective data are time demanding and expensive, which limits researchers in affective computing to working with small datasets. This limitation affects the generalizability of models and deprives researchers of recent advances in deep learning, which have shown great success in many fields but require large volumes of data. We first introduce a novel framework for synthesizing naturalistic facial occlusions from an initial dataset of non-occluded faces and separate images of hands, reducing the costly process of data collection and annotation. We then propose a model for facial occlusion type recognition to differentiate between hand over face occlusions and other types of occlusions such as scarves, hair, glasses and objects. Finally, we present a model to localize hand over face occlusions and identify the occluded regions of the face.

Fig. 1. Example synthetic images generated by our framework

Our synthesis pipeline contains several steps towards generating the synthetic image. Figure 2 shows an overview of our pipeline. We first segment the hand (or other occluding object) from a real occluded face to obtain the occluder. After the segmented occluders are obtained, we retrieve a face image from the database on which we want to generate the occlusion. We then perform color correction, quality matching and re-scaling of the occluder according to the retrieved face. Finally, we place the occluder at the desired region of the face and obtain the corresponding synthetic occluded face.
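The compositing stage of the pipeline can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names (`color_correct`, `composite`), the halfway-blend color correction, and the synthetic test images are all assumptions for the sketch; the paper's segmentation and quality-matching steps are omitted.

```python
import numpy as np

def color_correct(occluder, face, mask):
    # Simple channel-wise correction (an assumption, not the paper's method):
    # shift the occluder's masked pixels halfway toward the face's mean color.
    face_mean = face.reshape(-1, 3).mean(axis=0)
    shift = face_mean - occluder[mask].mean(axis=0)
    corrected = occluder.astype(np.float64)
    corrected[mask] += 0.5 * shift
    return np.clip(corrected, 0, 255).astype(np.uint8)

def composite(face, occluder, mask, top_left):
    # Paste the masked occluder pixels onto the face at the given position.
    out = face.copy()
    h, w = occluder.shape[:2]
    y, x = top_left
    out[y:y + h, x:x + w][mask] = occluder[mask]
    return out

# Toy usage with synthetic images (uniform gray face, bright square occluder).
face = np.full((64, 64, 3), 100, np.uint8)
occluder = np.full((16, 16, 3), 200, np.uint8)
mask = np.ones((16, 16), bool)
result = composite(face, color_correct(occluder, face, mask), mask, (10, 10))
```

A real pipeline would add re-scaling of the occluder to the target face size and blur/noise matching so the occluder's image quality agrees with the retrieved face.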

Fig. 2. Our synthesis pipeline.

Differentiating types of occlusions can also be informative for affect recognition; however, there is no known technique to differentiate between facial occlusions. Given our large corpus of data with various combinations of hand over face and other types of occlusions across different individuals and conditions, our goal is to learn an effective representation and build computational models to classify the type of facial occlusion. In our initial experiments for occlusion type classification (hand over face occlusion vs. other types of occlusion), we used several baseline models, described in this section. Table 1 shows the results of our baseline models for facial occlusion type recognition. Our input features to the models are CNN features, which, given an input image, represent information regarding color, texture, shape and parts of the faces and their occlusions. Ideally the classifier should be able to draw conclusions about facial occlusion type using a combination of factors from both the face and its occluder.
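As a minimal sketch of the binary classification setup, the snippet below trains a logistic-regression classifier on feature vectors. The features here are random stand-in Gaussian clusters, since the actual CNN feature extraction is outside the scope of the sketch; the function name `train_logreg` and all hyperparameters are assumptions, and the paper's baselines may use different models entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN features: two separable clusters, one per occlusion type.
# Label 0 = other occlusion, label 1 = hand over face (assumed encoding).
n, d = 200, 64
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

def train_logreg(X, y, lr=0.1, steps=500):
    # Plain gradient descent on the logistic loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = train_logreg(X, y)
pred = (X @ w + b > 0).astype(float)
acc = (pred == y).mean()
```

With real CNN features the two classes would be far less cleanly separated than these synthetic clusters, which is what makes the representation-learning question interesting.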

Table 1. Facial occlusion type recognition

In our experiment for hand over face occlusion localization, we first divided the face into 8 different regions (see Figure 3). We extracted CNN features from each region and used a deep neural network classifier to classify each region as occluded or non-occluded. Our results suggest that recognizing hand over face occlusions is more successful when they occur on the eye and mouth regions compared to other areas. This is consistent with previous works' observation that these regions have greater discriminative power. When these areas are occluded, it is easier for the classifier to notice their absence. On the other hand, for areas on the borders of the face and the nose, it is more challenging to recognize occlusion, as the hand and the face have similar color and texture, so hand occlusions can be mistaken for part of these areas.
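The per-region setup above can be illustrated with a toy sketch. The paper's 8 regions are semantic (eyes, mouth, etc., as in Figure 3); here we approximate them with a simple 4x2 grid, and we replace the CNN-based per-region classifier with a crude skin-tone deviation check. The function names, the grid split, and the threshold are all assumptions made for illustration only.

```python
import numpy as np

def face_regions(face, rows=4, cols=2):
    # Split a face crop into rows*cols rectangular regions -- a stand-in for
    # the 8 semantic regions used in the paper.
    h, w = face.shape[:2]
    return [face[r * h // rows:(r + 1) * h // rows,
                 c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def occlusion_score(region, skin_tone, thresh=30.0):
    # Toy per-region detector (an assumption): flag a region as occluded when
    # its mean color deviates from the expected skin tone. The real system
    # runs a deep classifier on CNN features from each region.
    deviation = np.abs(region.mean(axis=(0, 1)) - skin_tone).mean()
    return float(deviation > thresh)

# Toy usage: a uniform "skin" face with one dark occluded region.
face = np.full((80, 80, 3), 180, np.uint8)
face[0:20, 0:40] = 50  # simulate an occluder over the top-left region
scores = [occlusion_score(r, np.array([180.0, 180.0, 180.0]))
          for r in face_regions(face)]
```

Note that this toy detector mirrors the failure mode described above: when the occluder's color is close to the skin tone, the deviation falls below threshold and the occlusion is missed, just as hand occlusions are harder to detect on skin-colored regions.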

Fig. 3. Hand over face occlusion localization results. On the left, we show how we divide the face into 8 regions; on the right, the accuracy of occlusion detection in each region.


The dataset is available here.

Related Publications
- Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions

B. Nojavanasghari, T. Baltrusaitis, C. E. Hughes, and L.-P. Morency, accepted to Affective Computing and Intelligent Interaction (ACII). [PDF]