Interspeech 2026 Demo

MeloDISinger

Melody-aware, duration-preserving singing voice editing with audio infilling.

¹Graduate School of Artificial Intelligence, KAIST, South Korea
²Graduate School of Culture Technology, KAIST, South Korea

TL;DR

MeloDISinger edits sung lyrics while preserving the original melody, total duration, and non-edited regions. It combines melody-aware duration prediction with flow-matching-based audio infilling to regenerate only the edited region.
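To make "regenerate only the edited region" concrete, here is a minimal sketch of generic mask-based flow-matching infilling in Python with NumPy. It is not MeloDISinger's model. The velocity field `make_toy_velocity` is a hypothetical stand-in for a trained network that would be conditioned on the surrounding audio and the new lyrics, and the feature layout (100 frames by 80 mel bins) is an assumption. Sampling integrates dx/dt = v(x, t) from Gaussian noise at t = 0 to t = 1 with Euler steps. The result is then composited so that frames outside the edit mask are copied verbatim from the original.

```python
import numpy as np


def euler_flow_sample(velocity_fn, shape, num_steps=32, seed=0):
    """Integrate dx/dt = v(x, t) from Gaussian noise (t=0) to t=1 with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


def infill(original, edit_mask, velocity_fn, num_steps=32, seed=0):
    """Regenerate only the frames where edit_mask is True and copy the rest.

    original:  (frames, mel_bins) features of the source recording (assumed layout).
    edit_mask: (frames,) boolean array marking the edited region.
    """
    generated = euler_flow_sample(velocity_fn, original.shape, num_steps, seed)
    # Non-edited frames come straight from the original, and the frame count
    # (hence the total duration) is unchanged.
    return np.where(edit_mask[:, None], generated, original)


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained, context-conditioned velocity network.

    Uses the velocity of the straight-line path x_t = (1 - t) x_0 + t x_1,
    i.e. v(x, t) = (target - x) / (1 - t), so sampling lands on `target`.
    """
    def velocity(x, t):
        return (target - x) / (1.0 - t)
    return velocity


# Usage: 100 frames x 80 mel bins, with frames 40-59 marked as edited.
frames, bins = 100, 80
original = np.random.default_rng(1).standard_normal((frames, bins))
mask = np.zeros(frames, dtype=bool)
mask[40:60] = True
edited = infill(original, mask, make_toy_velocity(np.zeros((frames, bins))))
```

In a real system the velocity network would also receive the unmasked context as conditioning, which is what lets the regenerated segment join smoothly at the boundaries; the hard composite at the end only guarantees that non-edited frames are left exactly as they were.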

Method Overview

Overview of MeloDISinger: (a) the overall text-based singing voice editing (SVE) pipeline and (b) the MeloDRP architecture for melody-aware duration-ratio prediction.
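The caption says MeloDRP predicts duration ratios rather than absolute durations. One plausible way to turn ratios into a duration-preserving alignment, sketched below under our own assumptions, is to normalize them and allocate the edited region's original frame count with largest-remainder rounding, so the integer frame counts always sum to exactly that length. The helper `ratios_to_frames`, the rounding scheme, and the example ratios are hypothetical and not taken from the paper.

```python
import numpy as np


def ratios_to_frames(ratios, total_frames):
    """Turn non-negative duration ratios into integer frame counts summing to total_frames.

    Largest-remainder rounding keeps the edited region at exactly its
    original length (hypothetical duration-preservation step).
    """
    r = np.asarray(ratios, dtype=float)
    if r.ndim != 1 or r.size == 0 or np.any(r < 0) or r.sum() <= 0:
        raise ValueError("ratios must be a non-empty 1-D array of non-negative values with a positive sum")
    exact = r / r.sum() * total_frames       # ideal fractional frame counts
    counts = np.floor(exact).astype(int)     # round everything down first
    leftover = total_frames - counts.sum()   # frames still to hand out
    # Give the remaining frames to the entries with the largest fractional parts.
    order = np.argsort(-(exact - counts), kind="stable")
    counts[order[:leftover]] += 1
    return counts


# Usage: a 37-frame edited region re-split across three new phonemes
# (illustrative ratios, not model output).
alloc = ratios_to_frames([0.2, 0.5, 0.3], total_frames=37)
print(alloc, alloc.sum())
```

Rounding each phoneme independently could make the region one or two frames too long or too short; distributing the remainder explicitly is a simple way to keep the total fixed.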

Guess which part is edited

Comparison with Related Works

Replacement-P (Rep-P): Phoneme-matched replacement, where the phoneme count is unchanged.
Replacement-S (Rep-S): Syllable-matched replacement, where the syllable count is unchanged but the phoneme count differs.
Replacement-SM (Rep-SM): Syllable-mismatched replacement, where both the phoneme and syllable structures differ.
Mixed: A combined edit containing insertion, replacement, and deletion operations.
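The three replacement categories can be told apart mechanically from phoneme and syllable counts. A minimal sketch, assuming lyrics are already split into syllables of ARPAbet-style phoneme symbols (our assumption; the page does not name a phoneme set), classifies a single replacement. Mixed edits combine several operation types and are left out. `classify_replacement` is a hypothetical helper, and using counts as a proxy for "structure" is our simplification.

```python
from typing import Sequence

# A syllable is a sequence of phoneme symbols, e.g. ("l", "ah", "v").
Syllables = Sequence[Sequence[str]]


def classify_replacement(old: Syllables, new: Syllables) -> str:
    """Label a single lyric replacement as Rep-P, Rep-S, or Rep-SM.

    Checks the phoneme count first, following the definitions literally;
    giving Rep-P precedence is an assumption.
    """
    old_phonemes = sum(len(s) for s in old)
    new_phonemes = sum(len(s) for s in new)
    if old_phonemes == new_phonemes:
        return "Rep-P"   # phoneme count unchanged
    if len(old) == len(new):
        return "Rep-S"   # same syllable count, different phoneme count
    return "Rep-SM"      # phoneme and syllable counts both differ


love = [("l", "ah", "v")]
print(classify_replacement(love, [("hh", "ow", "p")]))                      # Rep-P
print(classify_replacement(love, [("s", "ow")]))                            # Rep-S
print(classify_replacement(love, [("m", "eh"), ("l", "ah"), ("d", "iy")]))  # Rep-SM
```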


BibTeX

MeloDISinger

@misc{park2026melodisingermelodyawaredurationpreserving,
  title = {{MeloDISinger}: Melody-Aware \& Duration-Preserving Singing Voice Editing with Audio Infilling},
  author = {Yoonjeong Park and Jaekwon Im and Juhan Nam},
  year = {2026},
  eprint = {2606.30580},
  archivePrefix = {arXiv},
  primaryClass = {eess.AS},
  url = {https://arxiv.org/abs/2606.30580}
}

Ethics Statement

Text-based singing voice editing can improve accessibility and creative workflows, but it can also be misused for impersonation or unauthorized modification. MeloDISinger should be used only with proper consent from rights holders, singers, and data providers. Demo samples are provided for academic research and qualitative inspection. Any release of edited singing audio should clearly disclose that the audio was generated or modified.