DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation

The paper will be published at ISMIR'22. The audio files in Parts I and II are the ones used in the figures and the subjective evaluation of our paper.

Note: Please make sure that your web browser supports the WAV audio format. The numbers in the following table specify the first browser version that fully supports the audio element. We also suggest listening to the audio files with a headset.

[Table: web browser support for the audio element]

The audio files may take some time to load; please wait if playback has not finished buffering.




Part I. Audio for the Figures in the Paper


These are the audio files for the spectrograms displayed in Figure 1.






These are the audio files for the examples discussed in Figure 4.



DDSP-Add (harmonic-plus-noise)           SawSing (harmonic-plus-noise)


Part II. Audio from the Subjective Evaluation

Note: The post-processed version of SawSing (de-buzzed) is not included in our user study (MOS).

Regular: 3h data, well-trained

female singer (f1) male singer (m1)
Recording

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Parallel WaveGAN (PWG)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

HiFi-GAN

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

FastDiff

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Neural Source Filter (NSF)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Differentiable Wavetable Synthesizer (DWTS)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

DDSP-Add

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

SawSing

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

SawSing (de-buzzed; not included in MOS)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3



Resource-limited: 3min data, 3h training

female singer (f1) male singer (m1)
Recording

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Parallel WaveGAN (PWG)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

HiFi-GAN

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

FastDiff

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Neural Source Filter (NSF)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

Differentiable Wavetable Synthesizer (DWTS)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

DDSP-Add

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

SawSing

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3

SawSing (de-buzzed; not included in MOS)

f1-test-song1

f1-test-song2

f1-test-song3


m1-test-song1

m1-test-song2

m1-test-song3