Improving face-to-face communication in loud environments by means of blind source separation
In this study, a Deep Neural Network-based blind source separation approach, which had previously been used successfully for speech enhancement in TV broadcasting, was employed in a hearing protection device with the aim of improving speech intelligibility in noisy environments. The long short-term memory (LSTM) network predicts frequency-domain masks for both speech and background noise. To preserve situational awareness, the speech was not isolated completely but remixed with the separated noise at an improved SNR. Offline evaluations with speech quality and listening effort measures, using noises common in industrial work environments, yielded promising results. The algorithm was then implemented in "real time" in an electronic earpiece functioning as a hearing protection device. Tests with speech in diffuse-field noise, using dummy head recordings, were conducted to compare the device to other hearing protection devices. These evaluations also examined a possible negative impact of state-of-the-art level-adaptive attenuation on binaural cues, which may degrade localization and orientation abilities. Results indicate that, while the LSTM approach can improve speech communication in noisy environments, care must be taken regarding the processing delay relative to the direct sound in order to avoid disturbing echoes.
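The mask-and-remix step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the LSTM has already produced real-valued speech and noise masks for a complex STFT of the mixture, and the function name and the `snr_gain_db` parameter are hypothetical.

```python
import numpy as np

def remix_with_snr_gain(mix_stft, speech_mask, noise_mask, snr_gain_db=9.0):
    """Apply predicted masks, then mix the separated noise back at a lower level.

    mix_stft:    complex STFT of the noisy mixture, shape (frames, bins)
    speech_mask: real-valued mask in [0, 1] for the speech component
    noise_mask:  real-valued mask in [0, 1] for the noise component
    snr_gain_db: hypothetical parameter controlling how much the SNR is
                 improved relative to the input mixture
    """
    speech_est = speech_mask * mix_stft
    noise_est = noise_mask * mix_stft
    # Attenuate the separated noise instead of discarding it entirely,
    # so ambient sounds remain audible (situational awareness).
    noise_scale = 10.0 ** (-snr_gain_db / 20.0)
    return speech_est + noise_scale * noise_est
```

With masks that sum to one and a gain of 0 dB, the output reduces to the original mixture, which makes the remix behavior easy to sanity-check.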