Research Projects
In most cases the results were not made public. Below is a list of some of these projects, each with a brief description.
Source Separation: A Generative Approach
Master's research internship at Deezer Research, Paris. Supervision: Romain Hennequin and Gabriel Meseguer Brocal
Keywords: Source separation, music generation, enhancement, neural codecs, token-based generation
Music source separation is the decomposition of an audio recording into the recordings of its individual sources. It plays an integral role in applications ranging from musicological tasks such as transcription to practical industrial applications such as karaoke.
In this dissertation, we explore the complex task of music source separation, examining the nuances of current challenges in the field and presenting novel methods to overcome them.
Although state-of-the-art deep learning approaches have made significant progress, they often introduce distortions, artefacts and other perceptual inaccuracies.
We propose a paradigm shift towards prioritising auditory experience over high precision in the reconstruction of a reference target.
Our research presents a generative approach to the task, using models currently employed for automatic music generation to tackle source separation and separation enhancement in a less conditioned way.
The primary goal is to improve sound quality, even if this means deviating from the original target, thus examining the essence of what makes audio "good" or "authentic" without being tied to direct waveform or spectrogram comparisons.
To this end, we use token-based audio generation and exploit neural codec architectures to achieve a balance between fidelity of reconstruction of the original source and perceived audio quality.
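As an illustration of the token-based representation this relies on, neural codecs typically discretise each audio frame with residual vector quantization (RVQ): each stage quantizes the residual left by the previous stage, so a frame becomes a short sequence of codebook indices. The following is a minimal NumPy sketch of the idea only; the codebooks here are random placeholders, not those of any actual codec:

```python
import numpy as np

def rvq_encode(frame, codebooks):
    """Residual vector quantization: each stage quantizes the residual
    left by the previous one, yielding one token (index) per stage."""
    residual = frame.astype(float)
    tokens = []
    for cb in codebooks:  # cb has shape (codebook_size, dim)
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens, residual

def rvq_decode(tokens, codebooks):
    """Reconstruct the frame by summing the selected codeword of each stage."""
    return sum(cb[t] for t, cb in zip(tokens, codebooks))

# Toy demo with random (placeholder) codebooks
rng = np.random.default_rng(0)
dim = 8
codebooks = [rng.normal(size=(16, dim)) for _ in range(4)]
frame = rng.normal(size=dim)
tokens, residual = rvq_encode(frame, codebooks)
approx = rvq_decode(tokens, codebooks)
# approx + residual recovers the frame exactly, by construction
```

The token sequences produced this way are what a generative model can then predict, conditioned (more or less strongly) on the mixture.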
While our research faced challenges, partly due to the rapidly evolving field of automatic music generation and the early-stage nature of some of the proposed methods, our work serves as a fundamental step towards understanding the potential of token-based music generation in highly conditioned tasks.
The work concludes by highlighting the need for a holistic approach to music source separation, one that aligns with the human auditory experience and expands the horizons of musicians and listeners alike.
Timbre and Choral Blending Analysis of Uruguayan Murga Singing
Independent research project with a short paper accepted at TIMBRE2023 and a conference paper accepted at CAICU2023. A first approach to computationally aided musicological analysis of Murga singing through a comparative study.
Chamber Music Recording and Informed Music Source Separation
Group project at the ATIAM Masters, Ircam. Supervision: Benoit Fabre and Mathieu Fontaine
Keywords: Music, Source separation, Non-Negative Matrix Factorization, Live Recording, Acoustics
Music Source Separation (MSS) is the process of separating individual audio signals from a mixed recording containing multiple sound sources, such as different musical instruments, vocals and ambient noise. Its various applications include remixing, transcription and music recommendation.
In the context of real acoustic recordings, the separation task is particularly challenging due to the complexity and variability of acoustic instruments and of recording conditions such as room acoustics and microphone directivity. We propose the use of Non-negative Matrix Factorization (NMF) algorithms for this task and, in our multi-channel setting, aim to implement efficient, conditioned versions of these algorithms, applied to musical recordings performed in a known and controlled context. We investigate methods of informing the algorithm by conditioning on temporal and spectral information from the instruments, recorded specifically for this purpose at the time of the session.
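To make the informed-NMF idea concrete, here is a minimal single-channel sketch (illustrative only, not the project's actual multi-channel implementation): spectral templates W, which could for instance be learned from solo excerpts of each instrument, are held fixed, and only the activations H are estimated via the standard multiplicative update for the Euclidean cost.

```python
import numpy as np

def informed_nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Given a magnitude spectrogram V (freq x time) and fixed spectral
    templates W (freq x components, e.g. learned from solo recordings),
    estimate activations H with Lee-Seung multiplicative updates
    minimizing the Euclidean distance ||V - WH||."""
    n_components = W.shape[1]
    H = np.abs(np.random.default_rng(0).normal(size=(n_components, V.shape[1])))
    for _ in range(n_iter):
        # multiplicative update keeps H non-negative throughout
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Toy example: V is built from known templates, so the fit should be close
rng = np.random.default_rng(1)
W = np.abs(rng.normal(size=(64, 4)))       # fixed, "informed" templates
H_true = np.abs(rng.normal(size=(4, 100)))
V = W @ H_true
H = informed_nmf_activations(V, W)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Fixing W this way is what makes the factorization "informed": the temporal activations are then the only unknowns, and each component can be attributed to a known instrument.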
To this end, we conducted a professional-level recording of a chamber music quintet.
We compared our results with other state-of-the-art algorithms, obtaining comparable performance on benchmark datasets, and carried out a subjective evaluation following the MUSHRA protocol, in which our algorithm performed well. We also observe a strong effect of the processing applied to the recording, which helps or hinders the separation depending on the instrument.
Our approach confirms the versatility of the FastMNMF algorithm and the possibility of extending such algorithms further. Audio results can be heard on our website.
Predicting Wind Turbine Power with Deep Learning
Project undertaken as an Assistant Researcher at the Institute of Electrical Engineering, University of the Republic. Supervision: Pablo Massaferro
Research into Deep Learning applications for modelling of wind flux in wind farms.
Music Style Translation with Supervised DL Methods
Year-long "research and innovation project" as an M1 student with the ADASP group, Telecom Paris. Supervision: Ondrej Cífka, Gaël Richard and Umut Simsekli
Style transfer and domain translation have recently experienced a surge in popularity as a research topic, and applications in text and image have obtained excellent results for language translation and artistic effects respectively.
In the case of music, success has been more limited, partly because of the vagueness of what might be defined as style in a musical piece, and partly because of the scarcity of large datasets on which to train the Deep Learning models that have provided state-of-the-art results in similar areas of study.
In this project we worked on adapting the work by Cífka et al. (2019) on symbolic music, which was the first to present a fully supervised algorithm for the task of style translation and transfer, to work on audio inputs. We then performed a parallel comparison of the performance of the two models.