Using a Blind EC Mechanism for Modelling the Interaction between Binaural and Temporal Speech Processing
Hauth et al. (2020) introduced a blind equalization-cancellation (EC) mechanism as a front-end for the binaural speech intelligibility model (BSIM). Blind means that the mechanism requires no knowledge of the clean speech signal, the interferer signal, or the binaural room impulse response (BRIR). In this study, we combined this front-end with four back-ends for predicting speech recognition thresholds (SRTs): the speech intelligibility index (SII), a speech-based speech transmission index (STI) version based on the magnitude cross-power spectrum (STI-MCP), a speech-based STI version based on the normalized covariance (STI-NCV), and the non-intrusive short-time objective intelligibility measure (niSTOI). The predictions of these models were evaluated against SRT data measured in stationary noise by Rennies et al. (2019), in which the direct sound of the target speech was manipulated by adding different numbers of reflections with varying amplitude and delay time. Furthermore, the interaural phase differences of the direct sound, the reflections, and/or the noise were manipulated. The combination of the blind binaural front-end and niSTOI achieved the highest coefficient of determination (R² = 0.88) but also the highest root-mean-square error (RMSE = 4.1), the latter due to a systematic underestimation of the SRT. The combinations with the other back-ends achieved smaller R² values but also smaller RMSEs.
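The following minimal sketch illustrates why a back-end can reach a high R² and a high RMSE at the same time. It assumes that R² is computed as the squared Pearson correlation between measured and predicted SRTs, which the text above does not specify. All SRT values are hypothetical and are not data from this study, and the function names are illustrative only.

```python
import math


def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_m * var_p)


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


# Hypothetical measured SRTs (illustration only, not data from the study).
measured = [-20.0, -16.0, -12.0, -8.0, -4.0]

# Model A: follows the trend perfectly but underestimates every SRT by 4.
biased = [m - 4.0 for m in measured]

# Model B: no systematic offset, but the errors vary from condition to condition.
scattered = [-18.0, -17.0, -13.0, -6.0, -5.0]

for name, predicted in (("biased", biased), ("scattered", scattered)):
    print(f"{name}: R^2 = {r_squared(measured, predicted):.2f}, "
          f"RMSE = {rmse(measured, predicted):.2f}")
```

Under this assumed definition, a constant offset leaves R² unchanged while adding directly to the RMSE, so the biased model ranks best on R² and worst on RMSE. This is the same pattern as the one reported for the niSTOI back-end.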