Healthcare · HIPAA · Case Study

On-prem clinical transcription with a Whisper fine-tune - 99.1% accuracy on medical vocab

A hospital network needed real-time transcription of physician dictation that never left their network. We fine-tuned Whisper large-v3 on 800 hours of de-identified clinical audio and deployed it on their on-prem GPU cluster.

99.1%

WER on clinical vocab (baseline: 92.4%)

42 min

average documentation time saved per physician per day

audio leaves the hospital network

per-minute SaaS fees

Problem

What they were stuck on

The client's physicians were spending 90+ minutes a day on documentation. Off-the-shelf medical transcription vendors required sending audio to a third-party cloud - a non-starter for their compliance team. Generic Whisper missed too many drug names, dosages, and ICD-10 codes to be usable. They needed clinical-grade accuracy without the audio ever leaving the hospital network.

Approach

How we built it

STEP 01

Compliance-first data prep

800 hours of de-identified physician dictation (Safe Harbor de-identification), aligned with corrected transcripts. PHI scrubbed at the token and acoustic level before any model touched it.

STEP 02

Domain fine-tuning

LoRA fine-tune of Whisper large-v3 with a medical-vocab augmented BPE tokenizer. Added 14,000 clinical terms (drugs, conditions, procedures, abbreviations) to the decoder vocabulary.

STEP 03

Speaker + section tagging

Pyannote-based diarisation to separate physician from patient, plus a small classifier head for SOAP-note section detection (subjective / objective / assessment / plan).

STEP 04

Real-time streaming pipeline

Compiled with TensorRT, served via a custom WebSocket gateway. 200ms chunked streaming, partial-result correction on commit.

STEP 05

On-prem deployment

Deployed inside the hospital VPC on 4× A100 nodes. Zero outbound network access for the inference path. SOC 2 + HIPAA evidence pack produced for their audit.

Stack

What we used

Whisper large-v3PEFT / LoRATensorRTPyannote diarisationOn-prem A100 (4×)HIPAA-scoped VPC

Outcomes

What changed

99.1%WER on clinical vocab (baseline: 92.4%)

42 minaverage documentation time saved per physician per day

0audio leaves the hospital network

$0per-minute SaaS fees

“It hears 'metformin XR 500' the way our doctors say it. Generic models guessed. Ours actually knows.”

- CIO, regional hospital network (name withheld)

Have a similar problem? Let's scope it.

A 30-minute call. We'll tell you whether we can help - and if not, who can.

Talk to us

More work

Legal Tech & Compliance

On-prem clinical transcription with a Whisper fine-tune - 99.1% accuracy on medical vocab

What they were stuck on

How we built it

Compliance-first data prep

Domain fine-tuning

Speaker + section tagging

Real-time streaming pipeline

On-prem deployment

What we used

What changed

Have a similar problem? Let's scope it.

Domain-adapted 7B LLM cuts inference costs by 60%

Sub-300ms voice assistant on edge hardware

Brand-styled video b-roll using LTX-Video

Flux + brand LoRA at SKU scale

Edge VLM defect detection on Jetson

Air-gapped policy RAG on DGX Spark

Whisper fine-tune for Uzbek STT

TRELLIS image-to-3D for AR catalogues

Seed-VC voice conversion for localisation

PersonaPlex full-duplex voice concierge