Media · Voice localisation · Case Study

Voice-preserving localisation with Seed-VC - same actor, every language

A creator-economy platform wanted to localise videos from its English-language hero creators into 9 languages - without losing the creators' vocal identity. We built a Seed-VC + Whisper + Qwen3-TTS pipeline that keeps the original voice in every language.

9
target languages shipped (EN → ES, FR, DE, IT, PT, JA, KO, ZH, HI)
~12 min
to localise a 5-min video end-to-end on DGX Spark
$0
per-token cloud cost · everything runs locally
100%
creators consented · voice embeddings auditable + revocable
Problem

What they were stuck on

Conventional dubbing replaces the original voice with a voice actor - which destroys the creator's brand. Existing AI dubbing tools produce generic synthetic voices that sound nothing like the original. The platform wanted: same creator voice, native pronunciation in 9 target languages, lip-sync-ready output, at a cost that scales to thousands of videos a month.

Approach

How we built it

STEP 01

Voice fingerprint

We extracted a high-quality voice embedding for each enrolled creator from 2–4 min of clean reference audio. Each creator's consent and identity were verified before any cloning ran.
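
As an illustration of what "consent-gated, auditable and revocable" can mean in practice, here is a minimal sketch of an embedding registry. It is not the production code: `VoiceRegistry`, `VoiceRecord` and every method name are hypothetical, and the embedding is a plain list of floats standing in for whatever the real extractor produces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VoiceRecord:
    creator_id: str
    embedding: list[float]  # stand-in for the real voice embedding (assumption)
    revoked: bool = False


@dataclass
class VoiceRegistry:
    """Hypothetical registry: refuses enrolment without consent, logs every access."""
    records: dict[str, VoiceRecord] = field(default_factory=dict)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def _log(self, creator_id: str, event: str) -> None:
        # Each entry is (UTC timestamp, creator id, event name).
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, creator_id, event))

    def enroll(self, creator_id: str, embedding: list[float], consent_verified: bool) -> None:
        if not consent_verified:
            self._log(creator_id, "enroll_rejected")
            raise PermissionError(f"no verified consent for {creator_id}")
        self.records[creator_id] = VoiceRecord(creator_id, list(embedding))
        self._log(creator_id, "enrolled")

    def revoke(self, creator_id: str) -> None:
        record = self.records[creator_id]
        record.revoked = True
        record.embedding = []  # drop the embedding itself, keep the audit trail
        self._log(creator_id, "revoked")

    def embedding_for(self, creator_id: str) -> list[float]:
        record = self.records.get(creator_id)
        if record is None or record.revoked:
            self._log(creator_id, "lookup_denied")
            raise PermissionError(f"no usable embedding for {creator_id}")
        self._log(creator_id, "lookup")
        return record.embedding
```

The point of the design is that the cloning step can only obtain an embedding through `embedding_for`, so a revocation takes effect on the next render and every use leaves a log entry.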

STEP 02

Source transcription + translation

Whisper large-v3 transcribed the source audio with word-level timestamps. A domain-tuned LLM translated to the target language while preserving timing-friendly phrasing.
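
Word-level timestamps are what make timing-aware translation possible. The sketch below shows one way to turn them into timed phrases that a translator can treat as duration budgets. It assumes the result layout of the open-source `whisper` package when `transcribe` is called with `word_timestamps=True` (segments that each carry a `words` list with `word`, `start` and `end`). The pause threshold and the function names are our own illustrative choices, not details from the project.

```python
def words_from_result(result):
    """Flatten a Whisper-style result dict into one list of word dicts."""
    return [w for seg in result["segments"] for w in seg.get("words", [])]


def group_words(words, max_gap=0.6):
    """Split words into phrases wherever the pause exceeds max_gap seconds.

    max_gap=0.6 is an assumed value; tune it per content type.
    """
    phrases, current = [], []
    for w in words:
        if current and w["start"] - current[-1]["end"] > max_gap:
            phrases.append(current)
            current = []
        current.append(w)
    if current:
        phrases.append(current)
    return [
        {
            # Whisper words carry their own leading space, so join without a separator.
            "text": "".join(w["word"] for w in p).strip(),
            "start": p[0]["start"],
            "end": p[-1]["end"],
        }
        for p in phrases
    ]


# With the real model the input would come from something like:
#   result = whisper.load_model("large-v3").transcribe(path, word_timestamps=True)
# Here a hand-written result stands in for it.
sample = {
    "segments": [
        {"words": [
            {"word": " Hello", "start": 0.0, "end": 0.4},
            {"word": " everyone", "start": 0.45, "end": 0.9},
            {"word": " Welcome", "start": 1.8, "end": 2.2},
            {"word": " back", "start": 2.25, "end": 2.5},
        ]}
    ]
}
phrases = group_words(words_from_result(sample))
```

Each phrase's `end - start` is the time the translated line has to fit into, which is the constraint a timing-friendly translation has to respect.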

STEP 03

Native-pronunciation TTS

Qwen3-TTS in CustomVoice mode synthesised the translated text with native phoneme accuracy in the target language - but in a generic voice.

STEP 04

Seed-VC voice transfer

Seed-VC Standard converted the generic-voice synthesis to the original creator's vocal identity, preserving pitch contour, prosody, and tone. The output sounds like the creator speaking the target language fluently.

STEP 05

Timing align + delivery

A final pass aligned segment boundaries to the source video for lip-sync compatibility and exported WAV/MP4 stems for the editing pipeline.
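
To make the alignment step concrete, here is a minimal sketch of one common approach: compute a playback-rate change that fits each synthesised clip into its source slot, and clamp it so the speech never sounds rushed or dragged. The clamp range (0.85 to 1.25) and the names `Fit` and `fit_to_slot` are assumptions for illustration; the case study does not state which alignment method or limits were used.

```python
from typing import NamedTuple


class Fit(NamedTuple):
    rate: float      # playback-rate multiplier; values above 1 speed the clip up
    padding: float   # seconds of silence needed after the clip to fill the slot
    overflow: float  # seconds by which the clip still overruns the slot


def fit_to_slot(synth_duration, slot_start, slot_end, min_rate=0.85, max_rate=1.25):
    """Fit a synthesised clip into the [slot_start, slot_end] window of the source."""
    slot = slot_end - slot_start
    if slot <= 0 or synth_duration <= 0:
        raise ValueError("durations must be positive")
    ideal_rate = synth_duration / slot              # rate that would fit exactly
    rate = min(max(ideal_rate, min_rate), max_rate)  # clamp to the natural-sounding range
    fitted = synth_duration / rate                   # clip length after the rate change
    return Fit(rate, max(0.0, slot - fitted), max(0.0, fitted - slot))
```

A non-zero `overflow` flags a segment whose translation is too long even at the fastest allowed rate; in a pipeline like this one, that would be a natural trigger for asking the translator for a shorter phrasing.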

Stack

What we used

Seed-VC Standard · Whisper large-v3 (OpenAI) · Qwen3-TTS (CustomVoice) · Custom translator (Qwen 3 fine-tune) · Nvidia DGX Spark · Async render queue
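
For readers unfamiliar with the last item, an async render queue on a single local machine can be sketched with the standard library alone. This is an assumption-laden illustration, not the production queue: `run_jobs`, `render_worker` and `fake_render` are hypothetical names, and `fake_render` stands in for the full transcribe, translate, synthesise and convert chain.

```python
import asyncio


async def render_worker(queue, render, results):
    """Pull (video_id, lang) jobs off the queue until cancelled."""
    while True:
        job = await queue.get()
        try:
            results[job] = await render(*job)
        except Exception as exc:
            # Record the failure and keep the worker alive for later jobs.
            results[job] = exc
        finally:
            queue.task_done()


async def run_jobs(jobs, render, workers=1):
    """Render every job; workers=1 models a single GPU processing jobs in turn."""
    queue = asyncio.Queue()
    results = {}
    for job in jobs:
        queue.put_nowait(job)
    tasks = [asyncio.create_task(render_worker(queue, render, results))
             for _ in range(workers)]
    await queue.join()  # wait until every job has been marked done
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


async def fake_render(video_id, lang):
    """Placeholder for the real pipeline; returns the name of the output stem."""
    await asyncio.sleep(0)
    return f"{video_id}.{lang}.wav"


jobs = [("vid42", lang) for lang in ["es", "fr", "de"]]
outputs = asyncio.run(run_jobs(jobs, fake_render))
```

Keeping the worker count at one is the simplest way to avoid oversubscribing a single accelerator; uploads can keep enqueueing jobs while renders proceed in order.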
Outcomes

What changed

9
target languages shipped (EN → ES, FR, DE, IT, PT, JA, KO, ZH, HI)
~12 min
to localise a 5-min video end-to-end on DGX Spark
$0
per-token cloud cost · everything runs locally
100%
creators consented · voice embeddings auditable + revocable
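
To put the ~12 min figure in context, the arithmetic below gives a rough capacity estimate for one machine. It is a back-of-envelope sketch under loudly stated assumptions: the device runs continuously for a 30-day month, every video is about 5 minutes long, and, since the figure does not say whether ~12 min covers one target language or all nine, both readings are shown.

```python
def realtime_factor(processing_min: float, source_min: float) -> float:
    """Minutes of processing per minute of source video."""
    return processing_min / source_min


def jobs_per_month(minutes_per_job: int, days: int = 30, hours_per_day: int = 24) -> int:
    """Whole jobs that fit in a month on one device running hours_per_day."""
    return (days * hours_per_day * 60) // minutes_per_job


factor = realtime_factor(12, 5)   # processing time relative to video length
jobs = jobs_per_month(12)         # 12-minute jobs in a 30-day, 24-hour month

# Reading A (assumption): one 12-minute job covers all nine languages.
videos_if_all_languages = jobs
# Reading B (assumption): one 12-minute job covers a single language.
videos_if_per_language = jobs // 9

print(factor, jobs, videos_if_all_languages, videos_if_per_language)
```

Under these idealised assumptions a single device handles 3,600 twelve-minute jobs a month, which is 3,600 fully localised videos under reading A and 400 under reading B; real throughput would be lower once idle time and retries are counted.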

Our creators' voices are their brand. This is the first dubbing pipeline that doesn't erase them.

- VP Content, creator-economy platform (name withheld)
