Research Projects
In most cases the results were not made public. Below is a list of some of these projects, each with a brief description.
Automatic Music Metadata Extraction
Research Assistant at Centre for Digital Music, Queen Mary University of London
Collaboration: InnovateUK-funded project with Stage
Keywords: Sample identification, music information retrieval, graph neural networks, automatic lyrics transcription, source separation
This project focuses on developing deep learning approaches for automatic music metadata extraction. My focus was on automatic sample identification, and I also collaborated on automatic lyrics transcription.
Sample Identification: Automatic sample identification (ASID), the detection and identification of portions of audio recordings that have been reused in new musical works, is an essential but challenging task in the field of audio query-based retrieval. While a related task, audio fingerprinting, has made significant progress in accurately retrieving musical content under "real world" (noisy, reverberant) conditions, ASID systems struggle to identify samples that have undergone musical modifications. Thus, a system robust to common music production transformations such as time-stretching, pitch-shifting, effects processing, and underlying or overlaying music is an important open challenge.
In this work, we propose a lightweight and scalable encoding architecture employing a Graph Neural Network within a contrastive learning framework. Our model uses only 9% of the trainable parameters compared to the current state-of-the-art system while achieving comparable performance, reaching a mean average precision (mAP) of 44.2%.
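As a rough illustration of the approach (not the published architecture: the graph construction, layer types, and dimensions below are placeholder assumptions), the core idea is to encode each audio segment with a small graph neural network over spectrogram-patch nodes and train it contrastively, so that a track and a musically transformed sample of it map to nearby embeddings:

```python
# Minimal sketch: spectrogram patches as graph nodes, a lightweight GNN encoder,
# and an InfoNCE loss pulling together a segment and its transformed version.
# All shapes, layer choices and hyperparameters here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphEncoder(nn.Module):
    def __init__(self, in_dim=128, hid_dim=256, emb_dim=128, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.gnn = nn.ModuleList(nn.Linear(hid_dim, hid_dim) for _ in range(n_layers))
        self.out = nn.Linear(hid_dim, emb_dim)

    def forward(self, x, adj):
        # x:   (batch, nodes, in_dim)   node features, e.g. spectrogram patches
        # adj: (batch, nodes, nodes)    row-normalised adjacency between patches
        h = self.proj(x)
        for layer in self.gnn:
            h = F.relu(layer(adj @ h))              # aggregate neighbours, then transform
        return F.normalize(self.out(h.mean(1)), dim=-1)  # one embedding per segment

def info_nce(z_a, z_b, temperature=0.1):
    # z_a[i] and z_b[i] come from the same musical material (positive pair)
    logits = z_a @ z_b.t() / temperature
    return F.cross_entropy(logits, torch.arange(z_a.size(0), device=z_a.device))

# Toy training step: the second view stands in for pitch-shifted / time-stretched audio.
encoder = GraphEncoder()
x = torch.randn(8, 32, 128)
adj = torch.softmax(torch.randn(8, 32, 32), dim=-1)
loss = info_nce(encoder(x, adj), encoder(x + 0.1 * torch.randn_like(x), adj))
loss.backward()
```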
To enhance retrieval quality, we introduce a two-stage approach consisting of an initial coarse similarity search for candidate selection, followed by a cross-attention classifier that rejects irrelevant matches and refines the ranking of retrieved candidates - an essential capability absent in prior models. In addition, because queries in real-world applications are often short in duration, we benchmark our system for short queries using new fine-grained annotations for the Sample100 dataset, which we publish as part of this work.
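The retrieval logic can be summarised with the sketch below; the cross-attention reranker appears as an opaque callable (in the actual system it scores pairs of frame-level representations rather than pooled embeddings), and the candidate count and rejection threshold are illustrative:

```python
import torch

def retrieve(query_emb, db_embs, rerank_model, top_k=50, reject_below=0.5):
    # Stage 1: coarse cosine-similarity search over the reference database
    # (embeddings are assumed L2-normalised).
    sims = db_embs @ query_emb                      # (num_references,)
    candidates = torch.topk(sims, top_k).indices

    # Stage 2: a cross-attention classifier scores each (query, candidate) pair,
    # rejecting candidates that do not actually reuse material from the query
    # and refining the ranking of those that remain.
    scores = torch.stack([rerank_model(query_emb, db_embs[i]) for i in candidates])
    keep = scores > reject_below
    order = torch.argsort(scores[keep], descending=True)
    return candidates[keep][order], scores[keep][order]
```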
Lyric Transcription: We investigated the impact of music source separation on automatic lyrics transcription using state-of-the-art ASR models, systematically evaluating performance on original audio, separated vocals, and vocal stems across short-form and long-form transcription tasks.
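The evaluation pipeline boils down to separating (or not) the vocals and running the same ASR model on each condition. A minimal sketch using the open-source demucs and openai-whisper packages (the study itself compared several state-of-the-art separation and ASR configurations) might look like:

```python
# Sketch: transcribe the original mix vs. the separated vocals and compare.
# Output paths follow Demucs' default layout; model sizes are illustrative.
import subprocess
import whisper

# Stage 1: source separation - keep only the vocal stem.
subprocess.run(["demucs", "--two-stems", "vocals", "song.mp3"], check=True)

# Stage 2: lyrics transcription with Whisper on both conditions.
model = whisper.load_model("medium")
mix_text = model.transcribe("song.mp3")["text"]
vocals_text = model.transcribe("separated/htdemucs/song/vocals.wav")["text"]
# Word error rate against reference lyrics is then computed for each condition.
```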
Custom Speech Recognition for Speakers with Aphasia
Freelance Project Lead Engineer at Interamerican Institute for Disability and Inclusive Development (iiDi)
Keywords: Automatic speech recognition, aphasia, accessibility, disability inclusion, interdisciplinary collaboration
This project focused on developing a custom Automatic Speech Recognition (ASR) system specifically designed for speakers with aphasia and other speech disorders. Aphasia affects language production and comprehension, making conventional ASR systems inadequate for this population.
The work involved leading the technical research and development while coordinating closely with an interdisciplinary team including psychologists, social workers, and disability and inclusion experts. This collaboration was essential to ensure the system addressed the real needs of users and their support networks.
The project also established partnerships with regional and international NGOs to facilitate future deployment and continued development of accessible speech technologies.
Source Separation: A Generative Approach
Masters research internship at Deezer Research, Paris.
Supervision: Romain Hennequin and Gabriel Meseguer Brocal
Keywords: Source separation, music generation, enhancement, neural codecs, token-based generation
Music source separation is the decomposition of an audio recording into the recordings of its individual sources. It plays an integral role in applications ranging from musicological tasks such as transcription to practical industrial applications such as karaoke.
In this dissertation, we explore the complex task of music source separation, examining the nuances of current challenges in the field and presenting novel methods to overcome them.
Although state-of-the-art deep learning approaches have made significant progress, they often introduce distortions, artefacts and other perceptual inaccuracies.
We propose a paradigm shift towards prioritising auditory experience over high precision in the reconstruction of a reference target.
Our research presents a generative approach to the task, using models currently employed for automatic music generation to approach source separation and separation enhancement in a less conditioned way.
The primary goal is to improve sound quality, even if this means deviating from the original target, thus examining the essence of what makes audio "good" or "authentic" without being tied to direct waveform or spectrogram comparisons.
To this end, we use token-based audio generation and exploit neural codec architectures to achieve a balance between fidelity of reconstruction of the original source and perceived audio quality.
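As a concrete illustration of what "token-based" means here, a pretrained neural codec such as EnCodec turns a waveform into a short sequence of discrete codes that generative models can operate on, and back again. The sketch below uses the public encodec package; the dissertation's models and codecs are not necessarily these, and the input file is a placeholder:

```python
# Encode a stem into discrete codec tokens and decode it back.
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)                       # trades token rate for fidelity

wav, sr = torchaudio.load("vocals_estimate.wav")      # placeholder input file
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))           # list of (codes, scale) frames
    codes = torch.cat([c for c, _ in frames], dim=-1) # (batch, n_codebooks, time) integer tokens
    reconstruction = model.decode(frames)             # waveform re-synthesised from tokens
```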
While our research has had its challenges, partly due to the rapidly evolving field of automatic music generation and the early-stage nature of some of the proposed methods, our work serves as a fundamental step in understanding the potential of token-based music generation in highly conditioned tasks.
The work concludes by highlighting the need for a holistic approach to music source separation, one that aligns with the human auditory experience and expands the horizons of musicians and listeners alike.
Timbre and Choral Blending Analysis of Uruguayan Murga Singing
Independent research project, with a short paper accepted at Timbre 2023 and a conference paper accepted at CAICU 2023.
A first approach to computationally aided musicological analysis of Murga singing through a comparative study.
Chamber Music Recording and Informed Music Source Separation
Group project at the ATIAM Masters, Ircam.
Supervision: Benoit Fabre and Mathieu Fontaine
Keywords: Music, source separation, non-negative matrix factorization, live recording, acoustics
Music Source Separation (MSS) is the process of separating individual audio signals from a mixed recording containing multiple sound sources, such as different musical instruments, vocals and ambient noise. Its various applications include remixing, transcription and music recommendation.
In the context of real acoustic recordings, the separation task is particularly challenging due to the complexity and variability of acoustic instruments and of recording conditions such as room acoustics and microphone directivity. We propose the use of Non-negative Matrix Factorization (NMF) algorithms for this task and, in our multi-channel setting, aim to implement efficient, conditioned versions of the algorithm for musical recordings made in a known and controlled context. We investigate methods of informing the algorithm by conditioning it on temporal and spectral information from the individual instruments, captured specifically for this purpose at the time of the recording.
To this end, we conducted a professional-level recording of a chamber music quintet.
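To give an idea of the "informed" part in its simplest form (a single-channel toy version, not our multichannel FastMNMF implementation): spectral templates are learned from each instrument's isolated recording and kept fixed, so that only the activations need to be estimated on the mixture:

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10):
    # V: (freq, time) magnitude spectrogram of the mixture
    # W: (freq, components) spectral templates fixed from the isolated recordings
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        # multiplicative update for H with W held fixed (Euclidean cost)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Each instrument's spectrogram is then approximated by its templates times its
# activations (W_i @ H_i), and refined with Wiener-style filtering of the mixture.
```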
We compared our results with other state-of-the-art algorithms, obtaining comparable performance on benchmark datasets, and carried out a subjective evaluation following the MUSHRA protocol, in which our algorithm performed well. We also observed a strong influence of how the recording is processed, which helps or hinders the separation depending on the instrument.
Our approach confirms the versatility of the FastMNMF algorithm and the potential for extending it with further conditioning. Audio results can be heard on our website.
Predicting Wind Turbine Power with Deep Learning
Project undertaken as an Assistant Researcher at the Institute of Electrical Engineering, University of the Republic
Supervision: Pablo Massaferro
Research into Deep Learning applications for modelling wind flux in wind farms.
Music Style Translation with Supervised DL Methods
Year-long "research and innovation project" as an M1 student with the ADASP group, Telecom Paris.Supervision: Ondrej Cífka, Gaël Richard and Umut Simsekli
Style transfer and domain translation have recently experienced a surge in popularity as research topics, and applications in text and image have obtained excellent results for language translation and artistic effects, respectively.
In the case of music, it has not been as successful, partly because of the vagueness of what might be defined as style in a musical piece, and partly because of the scarcity of large datasets on which to train the Deep Learning models that have provided state-of-the-art results in similar areas of study.
In this project we adapted the work by Cífka et al. (2019) on symbolic music, which was the first to present a fully supervised algorithm for the task of style translation and transfer, to operate on audio inputs, and performed a parallel comparison of the two models' performance.
Publications
- Bhattacharjee, A., Meresman Higgs, I., Sandler, M., & Benetos, E. (2025). "Refining music sample identification with a self-supervised graph neural network", ISMIR 2025
- Syed, J., Meresman Higgs, I., Cífka, O., & Sandler, M. (2025). "Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper", ICME 2025
- Meresman Higgs, I. (2023). "Timbre and Choral Blending Analysis of Uruguayan Murga Singing", Timbre 2023
- Meresman Higgs, I. (2023). "El Timbre y la Interpretación Coral de la Murga Uruguaya: Una primera aproximación desde la Musicología Computacional", CAICU 2023
Conference Organization
CAICU - Congreso Académico Interdisciplinario sobre Carnaval Uruguayo (2023, 2025)
Co-organizer and proceedings editor for the interdisciplinary academic conference on Uruguayan carnival, bringing together researchers from multiple disciplines to study the cultural, artistic, and social dimensions of carnival.