MeloDISinger Demo

TL;DR

MeloDISinger edits sung lyrics while preserving the original melody, total duration, and non-edited regions. It combines melody-aware duration prediction with flow-matching-based audio infilling to regenerate only the edited region.
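To make "regenerate only the edited region" concrete, here is a minimal sketch of masked flow-matching sampling. It assumes a linear path from Gaussian noise at t = 0 to data at t = 1 and plain Euler integration, and it is not MeloDISinger's implementation. `euler_infill`, `make_oracle_velocity`, the list-of-floats frame type, and the step count are all hypothetical. The toy oracle velocity stands in for a trained network that would be conditioned on the lyrics and the unedited context. Frames outside the edit mask are copied from the original, which is what leaves non-edited regions untouched.

```python
import random
from typing import Callable, List

Frame = List[float]  # hypothetical frame type, e.g. one mel-spectrogram column
VelocityFn = Callable[[List[Frame], float], List[Frame]]


def euler_infill(
    original: List[Frame],
    edit_mask: List[bool],
    velocity_fn: VelocityFn,
    num_steps: int = 32,
    seed: int = 0,
) -> List[Frame]:
    """Hypothetical sketch: Euler-integrate dx/dt = v(x, t) from Gaussian
    noise (t = 0) to data (t = 1), then keep the original frames wherever
    edit_mask is False so only the edited region is regenerated."""
    if len(original) != len(edit_mask):
        raise ValueError("original and edit_mask must have the same length")
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in original]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so the toy velocity never divides by zero
        v = velocity_fn(x, t)
        x = [[xi + dt * vi for xi, vi in zip(xf, vf)] for xf, vf in zip(x, v)]
    # Edited frames come from the sample; all other frames are copied unchanged.
    return [gen if edited else list(orig)
            for gen, orig, edited in zip(x, original, edit_mask)]


def make_oracle_velocity(target: List[Frame]) -> VelocityFn:
    """Toy stand-in for a trained, context-conditioned network: the exact
    velocity (y - x) / (1 - t) of the linear path that ends at `target`."""
    def velocity(x: List[Frame], t: float) -> List[Frame]:
        return [[(yi - xi) / (1.0 - t) for xi, yi in zip(xf, yf)]
                for xf, yf in zip(x, target)]
    return velocity


original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
target = [[9.0, 9.0], [5.0, 5.0], [9.0, 9.0]]
out = euler_infill(original, [False, True, False], make_oracle_velocity(target), num_steps=16)
print(out)  # first and last frames unchanged; middle frame numerically close to [5.0, 5.0]
```

Even though the oracle would pull every frame toward `target`, only the masked middle frame changes in the output; the first and last frames keep their original values.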


Method Overview

Figure: Overview of MeloDISinger. (a) Overall text-based singing voice editing (SVE) pipeline; (b) MeloDRP architecture for melody-aware duration-ratio prediction.
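One way to read duration-ratio prediction together with the duration-preserving goal is the following. If a predictor outputs a non-negative ratio per phoneme, those ratios can be normalized and scaled to the fixed frame budget of the edited region, so the total length cannot change. The sketch below assumes exactly that. `ratios_to_frames` is a hypothetical helper, not MeloDRP's actual output head, and largest-remainder rounding is just one reasonable way to obtain integer frame counts that sum to the budget.

```python
from typing import List, Sequence


def ratios_to_frames(ratios: Sequence[float], total_frames: int) -> List[int]:
    """Hypothetical helper: convert non-negative per-phoneme duration ratios
    into integer frame counts that sum exactly to total_frames (the fixed
    length of the edited region), using largest-remainder rounding."""
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("ratios must be non-negative with a positive sum")
    s = sum(ratios)
    exact = [r / s * total_frames for r in ratios]
    frames = [int(e) for e in exact]  # int() floors here because e >= 0
    leftover = total_frames - sum(frames)
    # Hand the remaining frames to the largest fractional parts; sorted() is
    # stable, so ties go to the earlier phoneme.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - frames[i], reverse=True)
    for i in order[:leftover]:
        frames[i] += 1
    return frames


print(ratios_to_frames([1, 2, 1], 10))  # [3, 5, 2]
```

In the usage line the exact shares are 2.5, 5.0, and 2.5 frames. Flooring leaves one frame unassigned, and it goes to the first of the two tied phonemes, so the region still spans exactly 10 frames.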

Guess which part is edited

Comparison with Related Work

Replacement-P (Rep-P): Phoneme-matched replacement, where the phoneme count is unchanged.

Replacement-S (Rep-S): Syllable-matched replacement, where the syllable count is unchanged but the phoneme count differs.

Replacement-SM (Rep-SM): Syllable-mismatched replacement, where both phoneme and syllable structures differ.

Mixed: A combined edit containing insertion, replacement, and deletion operations.

No examples in this category yet.
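The three replacement categories differ only in whether the phoneme and syllable counts survive the edit, so they can be written as a small rule. This is an illustrative sketch. `replacement_category` is hypothetical, the ARPAbet-style phoneme lists are made up for the example, and checking the phoneme count before the syllable count is an assumption about how overlapping cases would be labeled. Mixed edits are outside its scope.

```python
from typing import Sequence

# Hypothetical representation: a lyric span as a list of syllables, each a
# list of phoneme symbols (ARPAbet-like symbols are used only for illustration).
Syllables = Sequence[Sequence[str]]


def replacement_category(original: Syllables, edited: Syllables) -> str:
    """Label a replacement edit as Rep-P, Rep-S, or Rep-SM.

    Assumption: the phoneme-count check runs first, so an edit that keeps the
    phoneme count is labeled Rep-P regardless of its syllable count."""
    same_phonemes = sum(len(s) for s in original) == sum(len(s) for s in edited)
    same_syllables = len(original) == len(edited)
    if same_phonemes:
        return "Rep-P"   # phoneme count unchanged
    if same_syllables:
        return "Rep-S"   # syllable count unchanged, phoneme count differs
    return "Rep-SM"      # phoneme and syllable counts both differ


print(replacement_category([["l", "ah", "v"]], [["h", "ah", "t"]]))          # Rep-P
print(replacement_category([["l", "ah", "v"]], [["s", "t", "aa", "r"]]))     # Rep-S
print(replacement_category([["l", "ah", "v"]], [["h", "eh"], ["l", "ow"]]))  # Rep-SM
```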

BibTeX

MeloDISinger

@misc{park2026melodisingermelodyawaredurationpreserving,
  title = {{MeloDISinger}: Melody-Aware \& Duration-Preserving Singing Voice Editing with Audio Infilling},
  author = {Yoonjeong Park and Jaekwon Im and Juhan Nam},
  year = {2026},
  eprint = {2606.30580},
  archivePrefix = {arXiv},
  primaryClass = {eess.AS},
  url = {https://arxiv.org/abs/2606.30580}
}

Ethics Statement

Text-based singing voice editing can improve accessibility and creative workflows, but it can also be misused for impersonation or unauthorized modification. MeloDISinger should be used only with proper consent from rights holders, singers, and data providers. Demo samples are provided for academic research and qualitative inspection. Any release of edited singing audio should clearly disclose that the audio was generated or modified.