Figure 1. The problem we address in this paper: given the affective state of one person in a dyadic interaction, we generate the facial behaviors of the other person. By facial behaviors, we refer to facial expressions, movements of facial landmarks, and head pose.
A social interaction is an exchange between two or more individuals, in which each individual modifies and adjusts their behavior in response to their interaction partners. Social interactions are one of the most fundamental aspects of our lives and can profoundly affect our mood, both positively and negatively. With growing interest in virtual reality and avatar-mediated interactions, it is desirable to make these interactions natural and human-like, in order to promote positive affect both in the interactions themselves and in applications such as intelligent tutoring systems, automated interview systems, and e-learning. In this paper, we propose a method for generating facial behaviors for an agent. These behaviors include facial expressions and head pose, and they are generated by taking the user's affective state into account. Our models learn semantically meaningful representations of the face and generate appropriate, temporally smooth facial behaviors in dyadic interactions.
To generate facial behaviors for the agent, we use a two-stage architecture in which both stages are based on conditional generative adversarial networks. Figure 2 shows an overview of our first network. In the first stage, a conditional generative adversarial network takes as input a vector z derived from face shape parameters, together with a conditioning vector encoding the eight affective states of one partner in the interaction, and generates sketches of the other partner's face. Note that the z vector is sampled using one of our proposed strategies, which are explained in the following sections. In the second stage, we feed the sketches generated in the first stage to a second GAN, which produces the final facial expressions. Figure 3 shows an overview of our second network.
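As a minimal illustration of the first-stage conditioning mechanism (not the authors' implementation: the layer sizes, the toy MLP generator, and the Gaussian sampling of z are all assumptions), the generator input can be formed by concatenating a z vector sampled from a distribution over face shape parameters with the 8-dimensional affect conditioning vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: z from face shape parameters, 8-dim affect vector,
# and a landmark "sketch" of 68 points x 2 coordinates (assumed).
Z_DIM, COND_DIM, SKETCH_DIM, H = 32, 8, 136, 64

# Randomly initialized weights stand in for trained generator parameters.
W1 = rng.normal(0, 0.1, (H, Z_DIM + COND_DIM))
b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (SKETCH_DIM, H))
b2 = np.zeros(SKETCH_DIM)

def sample_z(mean, std):
    """Sample z from a Gaussian fit to face shape parameters
    (a stand-in for one of the paper's proposed sampling strategies)."""
    return rng.normal(mean, std)

def generator(z, affect):
    """Toy MLP generator: maps the concatenated [z; affect] vector
    to landmark coordinates forming the face sketch."""
    x = np.concatenate([z, affect])   # conditioning by concatenation
    h = np.tanh(W1 @ x + b1)          # hidden layer
    return np.tanh(W2 @ h + b2)       # landmark coords squashed to [-1, 1]

affect = np.zeros(COND_DIM)
affect[3] = 1.0                       # one of the 8 affective states (one-hot)
z = sample_z(np.zeros(Z_DIM), np.ones(Z_DIM))
sketch = generator(z, affect)
print(sketch.shape)  # (136,)
```

In this sketch, conditioning is done by simple input concatenation; the second stage would analogously take the generated landmark sketch (rendered as an image) as the conditioning input to an image-to-image GAN.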
Figure 2. Overview of our affect-sketch network. In the first stage of our two-stage facial expression generation, we generate face sketches conditioned on the affective state of the interviewee, using a z vector that carries semantically meaningful information about facial behaviors. This vector is sampled from generated distributions of rigid and non-rigid face shape parameters using our proposed methods.
Figure 3. Overview of our sketch-image network. In the second stage of our two-stage facial expression generation, we generate face images conditioned on the interviewer sketch produced in the first stage of our network.