D. McNeill, Hand and Mind, 1992.
DOI : 10.1515/9783110874259.351

P. Feyereisen and J.-D. de Lannoy, Gestures and Speech: Psychological Investigations, 1991.

M. P. Michalowski, S. Sabanovic, and H. Kozima, A dancing robot for rhythmic social interaction, Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, HRI '07, pp.89-96, 2007.
DOI : 10.1145/1228716.1228729

K. Munhall, J. A. Jones, D. E. Callan, T. Kuratate, and E. Vatikiotis-Bateson, Visual Prosody and Speech Intelligibility: Head Movement Improves Auditory Speech Perception, Psychological Science, vol.15, issue.2, pp.133-137, 2004.
DOI : 10.1016/S0167-6393(98)00048-X

T. Kuratate, K. G. Munhall, P. E. Rubin, E. Vatikiotis-Bateson, and H. Yehia, Audio-visual synthesis of talking faces from speech production correlates, Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH), pp.1279-1282, 1999.

L. Valbonesi, R. Ansari, D. McNeill, F. Quek, S. Duncan et al., Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures, Proceedings of the European Signal Processing Conference (EUSIPCO), pp.75-78, 2005.

F. Quek, D. McNeill, R. Ansari, X. Ma, R. Bryll et al., Gesture cues for conversational interaction in monocular video, Proceedings International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. In Conjunction with ICCV'99 (Cat. No.PR00378), pp.64-69, 1999.
DOI : 10.1109/RATFG.1999.799234

H. P. Graf, E. Cosatto, V. Strom, and F. J. Huang, Visual prosody: facial movements accompanying speech, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition, pp.381-386, 2002.
DOI : 10.1109/AFGR.2002.1004186

M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp, Analysis of Head Gesture and Prosody Patterns for Prosody-Driven Head-Gesture Animation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, issue.8, pp.1330-1345, 2008.
DOI : 10.1109/TPAMI.2007.70797

D. Talkin, A robust algorithm for pitch tracking, in Speech Coding and Synthesis, pp.497-518, 1995.

E. Murphy-Chutorian and M. M. Trivedi, Head Pose Estimation in Computer Vision: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.4, pp.607-626, 2009.
DOI : 10.1109/TPAMI.2008.106

P. Viola and M. J. Jones, Robust Real-Time Face Detection, International Journal of Computer Vision, vol.57, issue.2, pp.137-154, 2004.
DOI : 10.1023/B:VISI.0000013087.49260.fb
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.9805

K. Wong, K. Lam, and W. Siu, A robust scheme for live detection of human faces in color images, Signal Processing: Image Communication, pp.103-114, 2003.
DOI : 10.1016/S0923-5965(02)00088-7

K. W. Wong, K. I. Lam, and W. Siu, An efficient algorithm for human face detection and facial feature extraction under different conditions, Pattern Recognition, vol.34, issue.10, pp.1993-2004, 2001.
DOI : 10.1016/S0031-3203(00)00134-5

B. Yip, W. Y. Siu, and S. Jin, Pose determination of human head using one feature point based on head movement, Proceedings of IEEE Int. Conf. on Multimedia and Expo (ICME), pp.1183-1186, 2004.

F. Ringeval, J. Demouy, G. Szaszák, M. Chetouani, L. Robel, J. Xavier et al., Automatic Intonation Recognition for the Prosodic Assessment of Language-Impaired Children, IEEE Transactions on Audio, Speech, and Language Processing, vol.19, issue.5, pp.1-15, 2010.
DOI : 10.1109/TASL.2010.2090147

T. Arai and S. Greenberg, The temporal properties of spoken Japanese are similar to those of English, Proceedings of Eurospeech, pp.1011-1014, 1997.

K. Nickel and R. Stiefelhagen, Real-Time Recognition of 3D-Pointing Gestures for Human-Machine-Interaction, Proceedings of DAGM-Symposium, pp.557-565, 2003.
DOI : 10.1007/978-3-540-45243-0_71

S. Al Moubayed and J. Beskow, Effects of visual prominence cues on speech intelligibility, Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP), 2009.

L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, vol.77, issue.2, pp.257-286, 1989.

I. Rezek, P. Sykacek, and S. Roberts, Coupled hidden Markov models for biosignal interaction modelling, First International Conference on Advances in Medical Signal and Information Processing, 2000.
DOI : 10.1049/cp:20000317

I. Rezek and S. J. Roberts, Estimation of coupled hidden Markov models with application to biosignal interaction modelling, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501), 2000.
DOI : 10.1109/NNSP.2000.890160

A. V. Nefian, L. Liang, X. Pi, X. Liu, and C. Mao, A coupled hidden Markov model for audio-visual speech recognition, Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.2013-2016, 2002.

L. Liang, X. Liu, X. Pi, Y. Zhao, and A. V. Nefian, Speaker independent audio-visual continuous speech recognition, Proceedings. IEEE International Conference on Multimedia and Expo, pp.25-28, 2002.
DOI : 10.1109/ICME.2002.1035365

W. Penny and S. Roberts, Gaussian observation hidden Markov models for EEG analysis, 1998.