Outline

  • Abstract
  • 1. Introduction
  • 2. Preliminaries
    • 2.1 FACS Intensity Scoring
    • 2.2 Expression Database
  • 3. Feature Extraction
  • 4. Regression Based Intensity Estimation
    • 4.1 Regression on SVM Margins
    • 4.2 Regression on Image Features
  • 5. Experimental Results and Discussions
  • 6. Conclusion and Future Work
  • 7. Acknowledgment

Abstract

The paradigm of the Facial Action Coding System (FACS) offers a comprehensive solution for facial expression measurement. FACS defines atomic expression components called Action Units (AUs) and describes their strength on a five-point scale. Despite considerable progress in AU detection, AU intensity estimation has received little attention. We propose SVM-based regression on the AU feature space, and investigate person-independent estimation of 25 AUs that appear singly or in various combinations. Our method is novel in that we use regression for estimating intensities and comparatively evaluate the performance of the 2D and 3D modalities. The proposed technique improves on the state of the art in person-independent estimation and shows that the 3D modality in particular offers significant advantages for intensity coding. We have also found that fusion of 2D and 3D can boost estimation performance, especially when the modalities compensate for each other’s shortcomings.


6. CONCLUSION AND FUTURE WORK

In this paper we investigated person-independent intensity estimation of 25 AUs from still images, comparing the 2D and 3D modalities. Our intensity estimator operates in a data-driven manner and thus does not require the aid of landmarks. The only other person-independent study in the literature on estimation of AU intensities applies SVM margins with Gabor features and addresses eight AUs [1]. Our proposed intensity estimator, based on regression of appearance features, proves superior to the one based on SVM margins for both the 2D and 3D data modalities. To the best of our knowledge, we are the first to employ regression for intensity estimation, whether subject-independent or subject-dependent.
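The person-independent, regression-based scheme described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the appearance features, intensity labels, and subject assignments below are synthetic stand-ins, and the SVR hyperparameters are assumptions; only the overall structure (an SVM-based regressor evaluated with leave-one-subject-out cross-validation) follows the text.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)

# Illustrative stand-ins: 60 samples of 32-dim appearance features,
# AU intensity labels on a 0-5 scale, and a subject id per sample.
X = rng.normal(size=(60, 32))
y = rng.integers(0, 6, size=60).astype(float)
subjects = np.repeat(np.arange(6), 10)

# Leave-one-subject-out: train on all subjects but one, test on the
# held-out subject, so the estimator never sees the test identity.
logo = LeaveOneGroupOut()
errors = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    model = SVR(kernel="rbf", C=1.0)  # SVM-based regression on feature space
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    errors.append(np.mean(np.abs(pred - y[test_idx])))

print(f"mean absolute intensity error: {np.mean(errors):.2f}")
```

With random labels the reported error is meaningless; the point is the evaluation protocol, where each fold's test subject is disjoint from the training subjects.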

Our 3D experiments show improvements on some AUs but performance drops on others, in both the detection and the intensity estimation problems. However, when 3D is fused with 2D luminance images, the overall performance increases significantly. We have observed that whenever a modality is better for detection of an AU, its intensity estimation in that modality is also superior. However, the performance drop in intensity estimation for certain AUs with 3D data is more pronounced than the corresponding performance differential for detection. As discussed in Section 5, we conjecture that this may be due to 3D acquisition noise in the eye regions, where texture is missing, and also because the FACS ground truths were scored on 2D appearance data, which could have created a bias toward the 2D modality.
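One simple way the 2D/3D fusion mentioned above can be realized is at the decision level, by averaging per-modality intensity estimates. This is a hedged sketch under assumed conventions, not the paper's fusion rule: the equal weighting, the 0-5 clipping range, and the example predictions are all illustrative.

```python
import numpy as np

def fuse_intensities(pred_2d, pred_3d, w_2d=0.5):
    """Decision-level fusion: weighted average of the 2D and 3D
    intensity estimates, clipped to an assumed 0-5 intensity range."""
    pred_2d = np.asarray(pred_2d, dtype=float)
    pred_3d = np.asarray(pred_3d, dtype=float)
    fused = w_2d * pred_2d + (1.0 - w_2d) * pred_3d
    return np.clip(fused, 0.0, 5.0)

# Illustrative: one modality over-estimates an AU while the other
# under-estimates it; averaging lets them compensate for each other.
fused = fuse_intensities([4.4, 1.0], [3.0, 1.8])
print(fused)  # -> [3.7 1.4]
```

Feature-level fusion (concatenating the 2D and 3D feature vectors before regression) is the natural alternative; the paper does not prescribe which level is used here, so the decision-level form is shown only because it is the shortest to state.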

There are several directions for future work on this problem. First of all, while we have used features optimally selected for AU detection, it is possible to redesign features specifically for intensity estimation.

Person-independent AU intensity estimation must deal with the confounding factor of subject variability. To reduce the portion of the variability due to individual attributes, one can subtract the subject’s neutral face, whenever it is available. A future study could reveal whether intensity estimation benefits from this subtraction.
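The neutral-subtraction idea above amounts to a preprocessing step on the feature vectors; a minimal sketch, where the function name and the toy feature values are assumptions for illustration:

```python
import numpy as np

def subtract_neutral(expr_features, neutral_features):
    """Suppress identity-related appearance by subtracting the subject's
    neutral-face feature vector from the expressive-face feature vector."""
    return np.asarray(expr_features, dtype=float) - np.asarray(neutral_features, dtype=float)

# Illustrative: two subjects with different neutral baselines but the
# same expression-induced change yield identical normalized features.
a = subtract_neutral([2.0, 5.0], [1.0, 4.0])
b = subtract_neutral([7.0, 9.0], [6.0, 8.0])
print(a, b)  # both [1. 1.]
```

The residual vector then carries the change induced by the expression rather than the subject's resting appearance, which is exactly the variability the text proposes to remove.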

Assessment of AU detection and intensity estimation on spontaneous expressions is important for the development of real-life systems. This is a more challenging problem for several reasons: spontaneous expressions are accompanied by uncontrolled head movements, and they typically occur at relatively lower intensities, i.e., they are more subtle than posed ones. Although 3D spontaneous databases are not currently available and 3D acquisition devices have drawbacks, such as light projection onto the subject’s face and the higher cost of real-time 3D video, recent progress [6, 4] points to the possibility of such databases. Therefore, our work will progress toward 3D spontaneous expressions.
