Deep Learning-based Speech Source Localization using Binaural and Monaural Microphone Arrays
This study proposes a deep learning-based method for speech source localization (SSL) using binaural and monaural microphone arrays in hearing aids (HAs). In recent years, several deep models based on binaural or non-coplanar microphone channels (i.e., the microphone arrays used in environment-robot interaction) have been proposed for SSL, using the short-time Fourier transform as a pre-processing stage. In our study, in addition to a binaural method, a monaural model that is more applicable to HAs is proposed for SSL. Two behind-the-ear HAs with a total of eight microphone channels are used to capture the signals of sound sources in a noisy and reverberant environment. Six microphone channels behind the two ears, and four microphone channels behind and in the right ear, are employed in the binaural and monaural methods, respectively. Motivated by the human auditory system, an auditory peripheral pre-processing stage is used to prepare the interaural signals for input to the deep models. Although the trained monaural model shows acceptable performance at higher signal-to-noise ratios in the test stage, the trained binaural model accurately estimates the azimuth angles of speech sources under much noisier conditions.
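As context for the pre-processing stage mentioned above, the following is a minimal, hypothetical sketch of a plain STFT front end applied to an eight-channel HA recording. The frame length, hop size, sampling rate, and function name are all assumptions for illustration; the study's actual pipeline uses an auditory peripheral model rather than a raw STFT.

```python
import numpy as np

def stft_features(x, frame_len=512, hop=256):
    """Per-channel STFT magnitudes for a (channels, samples) array.

    Hypothetical sketch of an SSL front end; the paper's own
    pre-processing is an auditory peripheral model, not a plain STFT.
    """
    win = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    # Slice overlapping windowed frames for every channel at once.
    frames = np.stack(
        [x[:, i * hop:i * hop + frame_len] * win for i in range(n_frames)],
        axis=1,
    )
    # Real FFT over the last axis -> (channels, frames, freq_bins).
    return np.abs(np.fft.rfft(frames, axis=-1))

# Example: 8 channels (two 4-mic behind-the-ear arrays), 1 s at 16 kHz.
rng = np.random.default_rng(0)
signals = rng.standard_normal((8, 16000))
feats = stft_features(signals)
print(feats.shape)  # (8, 61, 257)
```

A feature tensor of this shape (channels x frames x frequency bins) is a common input layout for deep SSL models that regress or classify azimuth angles.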