Deezer Master's Thesis | ATIAM

If audio playback fails, all of the audio files are available here: Google Drive Link



Example 1

Name: Diffusion-based separation output from Plaja-Roglans et al.

Trained for 99k steps.

Separated Vocals

Separated Accompaniment


Example 2

Name: Make-it-Sound-Good (MSG) bass enhancement

Trained for 50 epochs (120k steps) on the MUSDB18 train set.

Input (Spleeter's output)

Target (Ground Truth)

Output (MSG System)


Example 3

Name: Make-it-Sound-Good (MSG) accompaniment enhancement

Trained for 50 epochs (120k steps) on the MUSDB18 train set.

Input (Spleeter's output)

Target (Ground Truth)

Output (MSG System)


Example 4

Name: Output of lucidrains' SoundStream implementation

Sample after 10k training steps on MUSDB samples

Sample after 29k training steps on Bean Drums


Example 5

Name: RAVE Codec Reconstruction

Comparison between the output after the representation learning phase and the output after the adversarial fine-tuning phase (a training-schedule sketch follows the clips below).

Output after the Representation Learning Phase

Each of the following examples plays the input first, followed by its reconstruction; all excerpts are taken from the validation set.

Output after complete training (Adversarial Fine-tuning Phase)

Each of the following examples plays the input first, followed by its reconstruction; all excerpts are taken from the validation set.
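
As a reminder of what the two phases refer to, here is a minimal, illustrative training-schedule sketch in PyTorch, in the spirit of RAVE but with toy stand-in modules and placeholder hyperparameters (it is not the actual RAVE implementation): phase 1 trains the encoder and decoder with a reconstruction objective only, while phase 2 freezes the encoder and additionally trains the decoder against a discriminator.

import torch
import torch.nn as nn

# Toy stand-ins for RAVE's encoder, decoder and discriminator.
enc = nn.Conv1d(1, 16, 4, stride=2, padding=1)
dec = nn.ConvTranspose1d(16, 1, 4, stride=2, padding=1)
disc = nn.Sequential(nn.Conv1d(1, 8, 4, stride=2, padding=1),
                     nn.LeakyReLU(),
                     nn.Conv1d(8, 1, 4, stride=2, padding=1))

opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

PHASE_1_STEPS = 1000  # switch point between the two phases (placeholder value)

def spectral_distance(a, b):
    # Placeholder for RAVE's multiscale spectral distance (single STFT scale here).
    sa = torch.stft(a.squeeze(1), n_fft=256, return_complex=True).abs()
    sb = torch.stft(b.squeeze(1), n_fft=256, return_complex=True).abs()
    return (sa - sb).pow(2).mean()

for step in range(2000):
    x = torch.randn(4, 1, 4096)  # dummy audio batch
    adversarial = step >= PHASE_1_STEPS
    if adversarial:
        # Phase 2: adversarial fine-tuning, the encoder is frozen.
        for p in enc.parameters():
            p.requires_grad_(False)
    x_hat = dec(enc(x))
    if not adversarial:
        # Phase 1: representation learning, reconstruction-only objective
        # (the real model also regularises the latent space).
        loss = spectral_distance(x_hat, x)
        opt_ae.zero_grad(); loss.backward(); opt_ae.step()
    else:
        # Train the discriminator on real vs. reconstructed audio...
        d_loss = (torch.relu(1 - disc(x)).mean()
                  + torch.relu(1 + disc(x_hat.detach())).mean())
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # ...then train the decoder with reconstruction + adversarial terms.
        g_loss = spectral_distance(x_hat, x) - disc(x_hat).mean()
        opt_ae.zero_grad(); g_loss.backward(); opt_ae.step()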


Example 6

Name: Fine-tuning EnCodec using embedding distance (an illustrative loss sketch follows the clips below)

Input (Spleeter output)

Target

Quantized Target

Output
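
As a rough illustration of what an embedding-distance objective can look like (not the exact training code behind these clips), the sketch below compares the model output and the target in EnCodec's encoder embedding space rather than in the waveform domain. The choice of the 24 kHz model, the plain MSE, and calling the encoder directly (bypassing EnCodec's own segmentation and normalisation logic) are all simplifying assumptions.

import torch
import torch.nn.functional as F
from encodec import EncodecModel

codec = EncodecModel.encodec_model_24khz()

def embedding_distance(output_wav, target_wav):
    # MSE between EnCodec encoder embeddings of the model output and the
    # target. Inputs are (batch, 1, samples) tensors at 24 kHz. The target
    # path is detached so gradients only flow through the output path.
    with torch.no_grad():
        target_emb = codec.encoder(target_wav)
    output_emb = codec.encoder(output_wav)
    return F.mse_loss(output_emb, target_emb)

# Dummy usage; in practice `output` would come from the network being
# fine-tuned and `target` from the ground-truth stem.
output = torch.randn(1, 1, 24000, requires_grad=True)
target = torch.randn(1, 1, 24000)
loss = embedding_distance(output, target)
loss.backward()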


Example 7

Name: EnCodec Super Overfit using embedding distance

Input (Spleeter output)

Output

Quantized Target


Example 8

Name: EnCodec fine-tuning for denoising using embedding distance

Input

Output

Quantized Input

Quantized Target


Example 9

Name: Residual Loss experiments

Input

Output

Quantized Input

Quantized Target


Example 10

Name: Embedding distance loss on Drums

Input

Output

Quantized Input

Quantized Target


Example 11

Name: Adding a transformer layer, trained on Bass

Input

Output

Quantized Input

Quantized Target


Example 12

Name: Mix to Bass using embedding distance

Input

Output

Quantized Input

Quantized Target


Example 13

Name: Mix to Drums using embedding distance

Input

Output

Quantized Input

Quantized Target


Example 14

Name: Mix to Bass with a transformer layer

Input

Output

Quantized Input

Quantized Target


Example 15

Name: Mix to Drums with a transformer layer

Input

Output

Quantized Input

Quantized Target


Example 16

Name: Mix to Other using a transformer layer

Input

Output

Quantized Input

Quantized Target


Example 17

Name: Mix to Bass using the RAVE model

Input

Output

Quantized Input

Quantized Target


Example 18

Name: VampNet reconstruction with Demucs' output as input

Input

Target

Quantized Input

Reconstruction of fine tokens from coarse tokens (using the coarse2fine model)

Reconstruction of coarse and fine tokens from a single coarse token (using the coarse and coarse2fine models)


Example 19

Name: VampNet reconstruction with the mixture as input (filtering with the encoder)

Input

Target

Reconstructed Input

Reconstruction of fine tokens from coarse tokens (using the coarse2fine model)

Reconstruction of coarse and fine tokens from a single coarse token (using the coarse and coarse2fine models; see the sketch below)
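
The sketch below spells out how these two reconstructions can be read, using hypothetical stand-in callables (coarse_model, coarse2fine_model) rather than the actual VampNet API: the codec tokens form a (batch, codebooks, time) tensor, the fine codebooks are regenerated from the coarse ones by a coarse2fine stage, and in the second case only the first coarse codebook is kept and the remaining coarse levels are regenerated first. The number of coarse codebooks and the "single coarse token = first coarse codebook" reading are assumptions, and decoding the resulting tokens back to audio with the codec is not shown.

import torch

N_COARSE = 4  # number of codebooks treated as "coarse" (assumption)

def fine_from_coarse(tokens, coarse2fine_model):
    # tokens: (batch, n_codebooks, time) integer codes from the codec.
    # Keep the coarse codebooks and regenerate the fine ones.
    coarse = tokens[:, :N_COARSE, :]
    fine = coarse2fine_model(coarse)            # hypothetical generative stage
    return torch.cat([coarse, fine], dim=1)

def all_from_one_coarse(tokens, coarse_model, coarse2fine_model):
    # Keep only the first coarse codebook, regenerate the remaining coarse
    # levels, then the fine levels.
    coarse = coarse_model(tokens[:, :1, :])     # hypothetical generative stage
    return torch.cat([coarse, coarse2fine_model(coarse)], dim=1)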



Extra examples of VampNet predictions are here: