A creator economy platform wanted to localise English-language hero creators into 9 languages - without losing the creators' vocal identity. We built a Seed-VC + Whisper + Qwen3-TTS pipeline that keeps the original voice across every language.
Conventional dubbing replaces the original voice with a voice actor - which destroys the creator's brand. Existing AI dubbing tools produce generic synthetic voices that sound nothing like the original. The platform wanted: same creator voice, native pronunciation in 9 target languages, lip-sync-ready output, at a cost that scales to thousands of videos a month.
A high-quality voice embedding was extracted for each enrolled creator from 2–4 min of clean reference audio. Consent and identity were verified before any cloning ran.
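The enrolment gate described above can be sketched as a simple pre-flight check. This is an illustration only: the `Enrollment` record, its field names, and the `enrollment_blockers` helper are hypothetical, and the 120-second floor is just the lower end of the 2–4 min range mentioned above, not a documented threshold.

```python
from dataclasses import dataclass

# Assumption: lower end of the "2-4 min" reference-audio range, in seconds.
MIN_REFERENCE_SECONDS = 120.0


@dataclass(frozen=True)
class Enrollment:
    """Hypothetical record describing one creator's enrolment state."""
    creator_id: str
    consent_signed: bool
    identity_verified: bool
    reference_seconds: float


def enrollment_blockers(e: Enrollment) -> list:
    """Return the reasons cloning must not run yet (empty list means go)."""
    blockers = []
    if not e.consent_signed:
        blockers.append("consent not recorded")
    if not e.identity_verified:
        blockers.append("identity not verified")
    if e.reference_seconds < MIN_REFERENCE_SECONDS:
        blockers.append("reference audio too short")
    return blockers


if __name__ == "__main__":
    ready = Enrollment("creator-001", True, True, 180.0)
    print(enrollment_blockers(ready))
```

Returning a list of reasons rather than a single boolean makes it easier to tell a creator exactly what is still missing before voice extraction can start.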
Whisper large-v3 transcribed the source audio with word-level timestamps. A domain-tuned LLM then translated the transcript into the target language, keeping the phrasing timing-friendly.
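To show why word-level timestamps matter for translation, here is a minimal sketch that groups timed words into phrases and records how long each phrase lasts, so a translator can be given a time budget per phrase. It assumes word entries shaped like the dictionaries the open-source Whisper package returns when word timestamps are enabled (`word`, `start`, `end`). The `Segment` type, the `group_words` helper, and the 0.6-second pause threshold are hypothetical choices, not details of the pipeline described here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def _close(words) -> Segment:
    # Whisper-style word strings carry their own leading spaces.
    text = "".join(w["word"] for w in words).strip()
    return Segment(text, words[0]["start"], words[-1]["end"])


def group_words(words, max_gap: float = 0.6):
    """Split a word list into segments wherever the pause exceeds max_gap."""
    segments, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] > max_gap:
            segments.append(_close(current))
            current = []
        current.append(w)
    if current:
        segments.append(_close(current))
    return segments


if __name__ == "__main__":
    words = [
        {"word": " Hello", "start": 0.0, "end": 0.4},
        {"word": " everyone", "start": 0.45, "end": 0.9},
        {"word": " Welcome", "start": 1.8, "end": 2.3},
        {"word": " back", "start": 2.35, "end": 2.7},
    ]
    for seg in group_words(words):
        print(f"{seg.start:.2f}-{seg.end:.2f}s  {seg.text!r}")
```

Each segment's duration can then be passed to the translation step as a constraint, which is one way to encourage timing-friendly phrasing.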
Qwen3-TTS in CustomVoice mode synthesised the translated text with native phoneme accuracy in the target language - but in a generic voice.
Seed-VC Standard converted the generic-voice synthesis to the original creator's vocal identity, preserving pitch contour, prosody, and tone. The output sounds like the creator speaking the target language fluently.
A final pass aligns segment boundaries to the source video for lip-sync compatibility and exports WAV/MP4 stems for the editing pipeline.
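One way to think about the alignment pass is as a per-segment tempo decision: how much a dubbed segment must be sped up or slowed down to fit the slot its source segment occupied. The sketch below is an assumption about how such a step could work, not a description of the actual implementation. The `plan_fit` helper and the tempo limits (0.9x to 1.15x) are hypothetical.

```python
from typing import NamedTuple


class FitPlan(NamedTuple):
    tempo: float     # playback-rate multiplier; > 1.0 means speed up
    fitted: float    # segment duration after applying tempo, in seconds
    overflow: float  # seconds still spilling past the slot (0.0 if it fits)


def plan_fit(dub_seconds: float, slot_seconds: float,
             min_tempo: float = 0.9, max_tempo: float = 1.15) -> FitPlan:
    """Choose a clamped tempo so the dubbed segment approaches the slot length."""
    if dub_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    ideal = dub_seconds / slot_seconds            # tempo for an exact fit
    tempo = min(max(ideal, min_tempo), max_tempo)  # keep speech natural
    fitted = dub_seconds / tempo
    return FitPlan(tempo, fitted, max(0.0, fitted - slot_seconds))


if __name__ == "__main__":
    # A 2.2 s dubbed line that must fit a 2.0 s slot in the source video.
    print(plan_fit(2.2, 2.0))
    # A 3.0 s line is too long to fix with tempo alone; overflow is reported.
    print(plan_fit(3.0, 2.0))
```

Segments that still overflow after clamping are candidates for a shorter re-translation rather than a more aggressive speed-up.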
“Our creators' voices are their brand. This is the first dubbing pipeline that doesn't erase them.”
- VP Content, creator-economy platform (name withheld)
A 30-minute call. We'll tell you whether we can help - and if not, who can.