Healthcare · HIPAA · Case Study

On-prem clinical transcription with a Whisper fine-tune - 99.1% accuracy on medical vocab

A hospital network needed real-time transcription of physician dictation that never left their network. We fine-tuned Whisper large-v3 on 800 hours of de-identified clinical audio and deployed it on their on-prem GPU cluster.

99.1%
WER on clinical vocab (baseline: 92.4%)
42 min
average documentation time saved per physician per day
0
audio leaves the hospital network
$0
per-minute SaaS fees
Problem

What they were stuck on

The client's physicians were spending 90+ minutes a day on documentation. Off-the-shelf medical transcription vendors required sending audio to a third-party cloud - a non-starter for their compliance team. Generic Whisper missed too many drug names, dosages, and ICD-10 codes to be usable. They needed clinical-grade accuracy without the audio ever leaving the hospital network.

Approach

How we built it

STEP 01

Compliance-first data prep

800 hours of de-identified physician dictation (Safe Harbor de-identification), aligned with corrected transcripts. PHI scrubbed at the token and acoustic level before any model touched it.

STEP 02

Domain fine-tuning

LoRA fine-tune of Whisper large-v3 with a medical-vocab augmented BPE tokenizer. Added 14,000 clinical terms (drugs, conditions, procedures, abbreviations) to the decoder vocabulary.

STEP 03

Speaker + section tagging

Pyannote-based diarisation to separate physician from patient, plus a small classifier head for SOAP-note section detection (subjective / objective / assessment / plan).

STEP 04

Real-time streaming pipeline

Compiled with TensorRT, served via a custom WebSocket gateway. 200ms chunked streaming, partial-result correction on commit.

STEP 05

On-prem deployment

Deployed inside the hospital VPC on 4× A100 nodes. Zero outbound network access for the inference path. SOC 2 + HIPAA evidence pack produced for their audit.

Stack

What we used

OpenAIWhisper large-v3HuggingFacePEFT / LoRANvidiaTensorRTPyannote diarisationOn-prem A100 (4×)HIPAA-scoped VPC
Outcomes

What changed

99.1%WER on clinical vocab (baseline: 92.4%)
42 minaverage documentation time saved per physician per day
0audio leaves the hospital network
$0per-minute SaaS fees

It hears 'metformin XR 500' the way our doctors say it. Generic models guessed. Ours actually knows.

- CIO, regional hospital network (name withheld)

Have a similar problem? Let's scope it.

A 30-minute call. We'll tell you whether we can help - and if not, who can.

Talk to us