A hospital network needed real-time transcription of physician dictation without the audio ever leaving its network. We fine-tuned Whisper large-v3 on 800 hours of de-identified clinical audio and deployed it on their on-prem GPU cluster.
The client's physicians were spending 90+ minutes a day on documentation. Off-the-shelf medical transcription vendors required sending audio to a third-party cloud, a non-starter for the compliance team. Generic Whisper missed too many drug names, dosages, and ICD-10 codes to be usable. They needed clinical-grade accuracy without the audio ever leaving the hospital network.
800 hours of de-identified physician dictation (Safe Harbor de-identification), aligned with corrected transcripts. PHI scrubbed at the token and acoustic level before any model touched it.
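As a rough illustration of what scrubbing at both the token and acoustic level can look like, the sketch below replaces PHI-like words in a time-aligned transcript and returns the matching audio intervals to silence. The `Word` type, the `scrub` function, and the three patterns are hypothetical and far from a complete Safe Harbor implementation, which covers 18 identifier categories; it is not the pipeline used on this project.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


# Illustrative patterns only (assumption): real de-identification needs far
# broader coverage than these three examples.
PHI_PATTERNS = [
    re.compile(r"^\d{3}-\d{3}-\d{4}$"),        # phone-like number
    re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),  # date-like string
    re.compile(r"^MRN\d+$", re.IGNORECASE),    # hypothetical record-number format
]


def scrub(words):
    """Return (scrubbed words, merged audio intervals to silence).

    Assumes `words` is sorted by start time.
    """
    scrubbed, spans = [], []
    for w in words:
        if any(p.match(w.text) for p in PHI_PATTERNS):
            # Token level: replace the word with a placeholder tag.
            scrubbed.append(Word("[PHI]", w.start, w.end))
            # Acoustic level: record the interval, merging touching spans.
            if spans and w.start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], max(spans[-1][1], w.end))
            else:
                spans.append((w.start, w.end))
        else:
            scrubbed.append(w)
    return scrubbed, spans
```

The returned intervals would then be muted or replaced with a tone in the audio, so that the transcript and the waveform stay consistent.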
LoRA fine-tune of Whisper large-v3 with a medical-vocabulary-augmented BPE tokenizer. Added 14,000 clinical terms (drugs, conditions, procedures, abbreviations) to the decoder vocabulary.
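To give a sense of scale, the back-of-envelope sketch below compares the parameters a LoRA adapter adds to one projection matrix with the embedding rows needed for the new vocabulary. The hidden width (1280) and base vocabulary size (51,866) are the published Whisper large-v3 configuration values; the LoRA rank of 8 and the assumption that each of the 14,000 terms becomes one new token are illustrative, not details from this project.

```python
D_MODEL = 1280       # hidden width of Whisper large-v3
BASE_VOCAB = 51866   # Whisper large-v3 vocabulary size
NEW_TERMS = 14_000   # clinical terms added (assumed: one new token per term)


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def new_embedding_params(n_tokens, d_model):
    """Parameters in the embedding rows for newly added tokens."""
    return n_tokens * d_model


if __name__ == "__main__":
    rank = 8  # assumed rank, not stated in the case study
    print("LoRA params per attention projection:", lora_params(D_MODEL, D_MODEL, rank))
    print("New embedding params:", new_embedding_params(NEW_TERMS, D_MODEL))
    print("Vocabulary size after extension:", BASE_VOCAB + NEW_TERMS)
```

Under these assumptions the new embedding rows are far larger than any single adapter, which is why vocabulary extension usually means training those rows directly alongside the low-rank adapters.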
Pyannote-based diarisation to separate physician from patient, plus a small classifier head for SOAP-note section detection (subjective / objective / assessment / plan).
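One common way to combine diarisation with a transcript is to give each recognised word the speaker whose turn overlaps it most in time. The sketch below shows that step in plain Python. The tuple layouts and the `assign_speakers` name are assumptions for illustration, and mapping anonymous diarisation labels to "physician" and "patient" is assumed to have happened upstream.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker turn that overlaps it most.

    words: list of (text, start, end)
    turns: list of (speaker, start, end) from the diarisation step
    """
    labelled = []
    for text, w_start, w_end in words:
        best = max(
            turns,
            key=lambda t: overlap(w_start, w_end, t[1], t[2]),
            default=None,
        )
        if best is not None and overlap(w_start, w_end, best[1], best[2]) > 0:
            labelled.append((best[0], text))
        else:
            # No turn covers this word (e.g. speech outside any detected turn).
            labelled.append(("unknown", text))
    return labelled
```

The speaker-labelled words can then be grouped into utterances and passed to the section classifier, so that patient speech is not mistaken for the physician's assessment or plan.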
Compiled with TensorRT, served via a custom WebSocket gateway. 200 ms chunked streaming, partial-result correction on commit.
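The sketch below illustrates one plausible reading of chunked streaming with partial-result correction: audio arrives in 200 ms chunks, each new hypothesis may revise the tentative tail, and a token is committed once two consecutive hypotheses agree on it. This agreement policy is a common streaming heuristic swapped in for illustration, not necessarily what the gateway does; the `PartialTranscript` class and function names are hypothetical. The 16 kHz sample rate is Whisper's standard input rate.

```python
SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
CHUNK_MS = 200        # chunk length from the text above


def chunk_samples(sample_rate=SAMPLE_RATE, chunk_ms=CHUNK_MS):
    """Number of audio samples in one streaming chunk."""
    return sample_rate * chunk_ms // 1000


def iter_chunks(samples, size):
    """Yield consecutive slices of `samples`; the last one may be shorter."""
    for i in range(0, len(samples), size):
        yield samples[i:i + size]


class PartialTranscript:
    """Tracks committed tokens and a tentative tail that may still change."""

    def __init__(self):
        self.committed = []
        self._previous = []

    def update(self, hypothesis):
        """Take the latest full hypothesis; return (committed, tentative)."""
        # Count how many leading tokens match the previous hypothesis.
        agreed = 0
        for old, new in zip(self._previous, hypothesis):
            if old != new:
                break
            agreed += 1
        # The committed prefix only grows; it is never rolled back.
        if agreed > len(self.committed):
            self.committed = hypothesis[:agreed]
        self._previous = hypothesis
        return self.committed, hypothesis[len(self.committed):]
```

A client would render committed tokens as final text and the tentative tail in a lighter style, replacing the tail on every update.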
Deployed inside the hospital VPC on 4× A100 nodes. Zero outbound network access for the inference path. SOC 2 + HIPAA evidence pack produced for their audit.
“It hears 'metformin XR 500' the way our doctors say it. Generic models guessed. Ours actually knows.”
- CIO, regional hospital network (name withheld)
A 30-minute call. We'll tell you whether we can help and, if not, who can.