Deep Learning for Sound Source Localization based on Data from the DCASE Challenge 2020
The development of methods to acoustically decode environmental sound scenes with microphone arrays holds enormous potential for a wide range of applications. This contribution shows how Deep Learning can improve the localization results for non-artificial sound sources based on acoustic source maps compared to classical beamforming. The training and test data sets, taken from the DCASE2020 ("Detection and Classification of Acoustic Scenes and Events") challenge, contain spatially distributed sound events of different categories, recorded in different acoustic environments. The acoustic source maps were created with beamforming using eigenvalue and eigenvector techniques. These source maps were used as features to train a residual neural network that predicts the direction of arrival of a single source. A similar approach, based on artificial and stationary sound sources, was developed by Kujawski et al. (2019). With the present method, a localization error of 12.8 degrees was achieved on the test data set, an improvement of almost 60 percent compared to the pure beamforming localization results. In summary, the results confirm that the method developed by Kujawski et al. (2019) is also applicable to non-artificial sound sources. Learned algorithms can provide improved spatial resolution compared to model-based methods such as beamforming.
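To make the reported localization error concrete, the following Python sketch shows one way a direction-of-arrival error in degrees could be computed for a beamforming baseline. It is an illustration under stated assumptions, not the evaluation code used in this work: it assumes that the baseline estimate is the direction of the strongest cell in the source map, that directions are given as azimuth and elevation in degrees, and that the error is the great-circle angle between estimate and ground truth. The function names, the grid, and the toy map are hypothetical.

```python
import numpy as np


def doa_to_unit_vector(azimuth_deg, elevation_deg):
    """Convert azimuth/elevation in degrees to a Cartesian unit vector."""
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    return np.array([
        np.cos(el) * np.cos(az),
        np.cos(el) * np.sin(az),
        np.sin(el),
    ])


def angular_error_deg(doa_a, doa_b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    u = doa_to_unit_vector(*doa_a)
    v = doa_to_unit_vector(*doa_b)
    # Clip to guard against rounding slightly outside [-1, 1].
    cos_angle = np.clip(np.dot(u, v), -1.0, 1.0)
    return float(np.rad2deg(np.arccos(cos_angle)))


def doa_from_map_maximum(source_map, azimuths_deg, elevations_deg):
    """Assumed baseline: take the direction of the strongest map cell.

    source_map has shape (len(elevations_deg), len(azimuths_deg)).
    """
    i_el, i_az = np.unravel_index(np.argmax(source_map), source_map.shape)
    return float(azimuths_deg[i_az]), float(elevations_deg[i_el])


# Hypothetical 10-degree grid and a toy map with a single peak.
azimuths = np.arange(-180, 180, 10)
elevations = np.arange(-40, 50, 10)
toy_map = np.zeros((len(elevations), len(azimuths)))
toy_map[5, 20] = 1.0  # peak at elevation 10 deg, azimuth 20 deg

estimate = doa_from_map_maximum(toy_map, azimuths, elevations)
error = angular_error_deg(estimate, (30.0, 10.0))  # made-up ground truth
print(f"estimate (az, el) = {estimate}, angular error = {error:.2f} deg")
```

In this sketch the grid spacing limits how precisely a map maximum can point at a source. A network that regresses the direction from the whole map is not bound to the grid cells, which is one plausible reason a learned estimator can reduce the error.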