Understanding the Lombard effect in Mandarin - Comparison of speech recognition across speakers and vocal effort in tonal languages with human data and models
In search of a generalized description of speech recognition across languages and speakers, comparisons between tonal languages (such as Mandarin) and non-tonal languages, and between low and high vocal effort (Lombard effect), are of considerable interest. For this purpose, speech recognition thresholds (SRTs) of normal-hearing listeners were obtained for Mandarin matrix sentence tests recorded with 5 female and 6 male talkers in plain and Lombard speech. Matrix sentence tests are suitable because their information content and syntactic structure are well controlled and comparable across languages. To interpret the observed effects, the human data are compared with two model approaches: the standardized Speech Intelligibility Index (SII) and the automatic speech recognition-based Framework for Auditory Discrimination Experiments (FADE). The models are validated in terms of how accurately they predict the SRTs and the Lombard effect. While the Lombard effect appears to be driven primarily by spectral differences in the produced speech, which both models predict correctly, the difference between tonal and non-tonal languages appears to be better described by FADE, which uses the information content of the respective time-frequency representations.
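To illustrate why a spectral change alone can shift an SII-type prediction, the following is a minimal sketch of the core idea behind the SII: an importance-weighted sum of band audibilities, where the audibility of each band is its signal-to-noise ratio mapped linearly from -15 dB to +15 dB onto the range 0 to 1. This is a simplification and not the procedure used in the study: the standardized SII additionally accounts for spread of masking, level distortion, and hearing thresholds. The function name `simplified_sii`, the four bands, the band levels, and the importance weights are all hypothetical values chosen for illustration, not data from the experiments.

```python
def simplified_sii(speech_db, noise_db, importance):
    """Importance-weighted band audibility (a simplified SII-style index).

    speech_db, noise_db: per-band levels in dB (same band layout).
    importance: per-band importance weights (normalized here to sum to 1).
    Omits spread of masking, level distortion, and hearing thresholds.
    """
    if not (len(speech_db) == len(noise_db) == len(importance)):
        raise ValueError("all inputs must have one value per band")
    total = sum(importance)
    index = 0.0
    for s, n, w in zip(speech_db, noise_db, importance):
        snr = s - n
        # Map SNR from [-15 dB, +15 dB] linearly onto [0, 1] and clip.
        audibility = min(1.0, max(0.0, (snr + 15.0) / 30.0))
        index += (w / total) * audibility
    return index


# Hypothetical four-band example (low to high frequency); not study data.
noise = [60.0, 60.0, 60.0, 60.0]
plain = [62.0, 58.0, 50.0, 44.0]      # energy concentrated at low frequencies
lombard = [59.0, 59.0, 54.0, 50.0]    # flatter spectral tilt
importance = [0.15, 0.25, 0.35, 0.25]

sii_plain = simplified_sii(plain, noise, importance)
sii_lombard = simplified_sii(lombard, noise, importance)
print(f"plain: {sii_plain:.3f}, Lombard: {sii_lombard:.3f}")
```

In this invented example, shifting energy toward the more heavily weighted mid and high bands raises the index even though the low band becomes weaker, which is the kind of purely spectral mechanism that an SII-type model can capture.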