Hospitality · Real-time voice · Case Study

Full-duplex S2S concierge with PersonaPlex - sub-second turn-taking on edge hardware

A hotel chain wanted an in-room AI concierge that felt as conversational as a human front desk - not the stop-and-wait, walkie-talkie experience most voice agents ship. We deployed Nvidia PersonaPlex, a full-duplex speech-to-speech (S2S) model, over WebSocket on local hardware in each property.

780ms

median end-to-end turn-taking

11

languages supported on the same edge deployment

94%

guest interactions handled without escalation

100%

on-premise per property - no guest audio leaves the building

Problem

What they were stuck on

Most voice agents chain STT → LLM → TTS as separate cloud calls, each waiting for the one before it. Those round-trips make turn-taking feel laggy - guests interrupt themselves, the agent talks over them, and the exchange never feels like a real conversation. The hotel wanted full-duplex audio (the agent keeps listening while it speaks, so it can react mid-utterance and guests can cut in), sub-second turn boundaries, and zero cloud dependency for privacy.
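To make the round-trip argument concrete, here is a rough latency model of one cascaded turn. It assumes the pipeline only starts once a voice-activity detector declares end-of-utterance, and that each stage runs to completion before the next begins. The stage names, the `endpoint_silence_ms` parameter and every number below are illustrative placeholders, not measurements from this deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    compute_ms: float      # time spent inside the stage (illustrative)
    network_rtt_ms: float  # round-trip to wherever the stage is hosted; 0 if local


def cascaded_turn_latency_ms(endpoint_silence_ms: float, stages: list[Stage]) -> float:
    """Estimate the gap between the guest going quiet and the agent's first audio.

    Assumes a strictly sequential pipeline: the endpoint detector waits for
    `endpoint_silence_ms` of silence, then STT, LLM and TTS each finish before
    the next one starts, so every compute time and round-trip adds up.
    """
    return endpoint_silence_ms + sum(s.compute_ms + s.network_rtt_ms for s in stages)


# Placeholder numbers, chosen only to show how the terms accumulate.
cloud_pipeline = [
    Stage("stt", compute_ms=100, network_rtt_ms=50),
    Stage("llm", compute_ms=300, network_rtt_ms=50),
    Stage("tts", compute_ms=150, network_rtt_ms=50),
]
print(cascaded_turn_latency_ms(500, cloud_pipeline))  # 1200
```

Moving the stages on-premise zeroes the `network_rtt_ms` terms, but the endpoint wait and the sequential hand-offs remain. An integrated full-duplex model avoids the hand-offs altogether, which is the gap this case study is about.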

Approach

How we built it

STEP 01

Model selection

Benchmarked PersonaPlex-7B (Nvidia) against pipelined STT+LLM+TTS approaches. PersonaPlex's integrated full-duplex S2S architecture beat pipelines on naturalness scores and turn-taking latency by a wide margin.
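The headline 780ms figure is a median over turns. Below is a minimal sketch of how such a number can be computed from session logs. It assumes each turn is logged as a pair of millisecond timestamps (guest stops speaking, agent starts speaking). The log format and function names are hypothetical, and the case study does not say how its benchmark was instrumented.

```python
from statistics import median, quantiles


def turn_gaps_ms(turns: list[tuple[int, int]]) -> list[int]:
    """Gap per turn, in ms, between the guest stopping and the agent starting.

    Each turn is (guest_speech_end_ms, agent_speech_start_ms). Negative gaps mean
    the agent began while the guest was still talking; a full-duplex model can do
    that, so we keep those values instead of clipping them to zero.
    """
    return [agent_start - guest_end for guest_end, agent_start in turns]


def summarize(gaps: list[int]) -> dict[str, float]:
    if len(gaps) < 2:
        raise ValueError("need at least two turns to estimate percentiles")
    return {
        "median_ms": median(gaps),
        # 'inclusive' keeps the estimate within the observed range of gaps.
        "p90_ms": quantiles(gaps, n=10, method="inclusive")[-1],
    }


gaps = turn_gaps_ms([(1000, 1500), (2000, 2800), (3000, 3900), (5000, 4900)])
stats = summarize(gaps)  # {'median_ms': 650.0, 'p90_ms': 870.0}
```

Reporting a tail percentile next to the median is a suggestion, not something the case study states; a low median can hide occasional long pauses that guests do notice.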

STEP 02

Hospitality domain pack

Built a lightweight RAG layer over the hotel's in-room services menu, local recommendations, and operational FAQs. Retrieved chunks injected into PersonaPlex's context window per turn.
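A minimal sketch of the per-turn retrieve-then-inject loop. It uses bag-of-words cosine similarity as a stand-in for whatever retriever the production index uses (not stated in the case study). It also assumes the retrieved facts reach PersonaPlex as text alongside a persona prompt. `ChunkIndex`, `build_turn_context` and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny stopword list so that words like "at" or "the" do not dominate the match.
STOPWORDS = frozenset({"a", "an", "and", "at", "be", "can", "i", "is", "of", "the", "to"})


def tokenize(text: str) -> list[str]:
    return [t for t in re.findall(r"[a-z0-9']+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ChunkIndex:
    """Hypothetical in-memory index over short text chunks (menu items, FAQs, tips)."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks
        self.vectors = [Counter(tokenize(c)) for c in chunks]

    def search(self, query: str, k: int = 3) -> list[str]:
        q = Counter(tokenize(query))
        scored = [(cosine(q, v), i) for i, v in enumerate(self.vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Drop chunks with no overlap at all rather than padding the context.
        return [self.chunks[i] for score, i in scored[:k] if score > 0]


def build_turn_context(persona: str, facts: list[str], max_chars: int = 1200) -> str:
    """Prepend the persona and as many retrieved facts as fit in a character budget."""
    lines = [persona, "Relevant hotel information:"]
    used = sum(len(line) for line in lines)
    for fact in facts:
        if used + len(fact) > max_chars:
            break
        lines.append(f"- {fact}")
        used += len(fact) + 2
    return "\n".join(lines)


# Made-up sample chunks for illustration only.
index = ChunkIndex([
    "Room service runs 24 hours a day.",
    "The spa opens at 9am and closes at 8pm.",
    "Checkout is at 11am; late checkout can be requested at the front desk.",
])
context = build_turn_context(
    "You are the hotel's in-room concierge.",
    index.search("room service hours", k=1),
)
```

Keeping the injected context small matters in a real-time setting: every extra token the model has to attend to on each turn competes with the latency budget.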

STEP 03

Edge deployment

PersonaPlex deployed on a small local node per property (10 GB VRAM footprint). WebSocket-based transport keeps the audio stream continuous in both directions. No cloud round-trip in the speech path.
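The point of the WebSocket transport is that microphone audio and agent audio flow at the same time over one connection. The sketch below shows that shape: an uplink task and a downlink task run concurrently, so neither waits for the other. To keep it self-contained, `LoopbackSocket` is a hypothetical in-memory stand-in that echoes frames back. With a real client such as the `websockets` library you would `await ws.send(frame)` in one task and `async for message in ws` in the other. Frame pacing and the echo behaviour are assumptions.

```python
import asyncio


class LoopbackSocket:
    """Hypothetical stand-in for a WebSocket connection.

    Every frame sent is echoed back, prefixed with b"agent:", to simulate the
    model streaming audio back while the guest is still talking.
    """

    def __init__(self) -> None:
        self._inbox: asyncio.Queue = asyncio.Queue()

    async def send(self, frame: bytes) -> None:
        await self._inbox.put(b"agent:" + frame)

    async def recv(self):
        # Returns None once the connection has been closed.
        return await self._inbox.get()

    async def close(self) -> None:
        await self._inbox.put(None)


async def uplink(ws, mic_frames, events, frame_interval_s=0.001):
    # Push microphone frames at a steady pace, as a capture loop would.
    for frame in mic_frames:
        await ws.send(frame)
        events.append(("sent", frame))
        await asyncio.sleep(frame_interval_s)
    await ws.close()


async def downlink(ws, played, events):
    # Play agent audio as soon as it arrives, independently of the uplink.
    while (frame := await ws.recv()) is not None:
        played.append(frame)
        events.append(("recv", frame))


async def run_session(mic_frames):
    ws = LoopbackSocket()
    played, events = [], []
    await asyncio.gather(uplink(ws, mic_frames, events), downlink(ws, played, events))
    return played, events


played, events = asyncio.run(run_session([b"f0", b"f1", b"f2"]))
```

In the event log, the first agent frame is received before the last microphone frame is sent. That interleaving is what a request/response transport cannot give you.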

STEP 04

Guardrails + escalation

Intent classifier on every turn routes anything sensitive (billing, lost items, complaints) directly to a human agent over the same channel, so the model is never left to handle requests outside its remit.
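A minimal sketch of the per-turn routing rule. The production classifier is a Qwen 3 fine-tune. Here, `keyword_classifier` is a hypothetical placeholder with the same shape (text in, intent and confidence out). The sketch also assumes the classifier sees a text rendering of the guest's turn, which the case study does not describe. The low-confidence fallback to a human is a design suggestion, not something stated in the source.

```python
from dataclasses import dataclass
from typing import Callable

# Intents the case study says must always go to a human.
SENSITIVE_INTENTS = frozenset({"billing", "lost_item", "complaint"})


@dataclass(frozen=True)
class Route:
    target: str  # "agent" (PersonaPlex keeps the turn) or "human" (hand off)
    intent: str
    reason: str


def route_turn(
    turn_text: str,
    classify: Callable[[str], tuple[str, float]],
    min_confidence: float = 0.6,  # assumed threshold, not from the case study
) -> Route:
    intent, confidence = classify(turn_text)
    if intent in SENSITIVE_INTENTS:
        return Route("human", intent, "sensitive intent")
    if confidence < min_confidence:
        return Route("human", intent, "low classifier confidence")
    return Route("agent", intent, "handled by concierge")


def keyword_classifier(text: str) -> tuple[str, float]:
    """Hypothetical placeholder for the fine-tuned Qwen 3 intent classifier."""
    lowered = text.lower()
    rules = {
        "billing": ("bill", "charge", "invoice", "refund"),
        "lost_item": ("lost", "left my", "missing"),
        "complaint": ("complain", "unacceptable", "speak to a manager"),
    }
    for intent, cues in rules.items():
        if any(cue in lowered for cue in cues):
            return intent, 0.9
    return "general", 0.8
```

Checking sensitive intents before confidence means a billing question is handed off even when the classifier is very sure of itself, which matches the "never tries to handle it" intent of the step.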

STEP 05

Multi-language

Property-level config selects the voice and language pack. 11 languages supported in the launch wave, all from the same on-device deployment.
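A sketch of property-level voice selection. It assumes each property ships a small JSON config and the edge node checks it against the language packs installed locally. The case study does not list the 11 launch languages or the voice names, so `INSTALLED_PACKS`, the language codes and the voice IDs below are made-up placeholders.

```python
import json
from dataclasses import dataclass

# Hypothetical packs installed on one edge node: language code -> available voices.
INSTALLED_PACKS: dict[str, frozenset[str]] = {
    "en-US": frozenset({"voice_a", "voice_b"}),
    "fr-FR": frozenset({"voice_c"}),
    "ja-JP": frozenset({"voice_d"}),
}


@dataclass(frozen=True)
class PropertyVoiceConfig:
    property_id: str
    language: str
    voice: str


def load_property_config(raw_json: str) -> PropertyVoiceConfig:
    """Parse a property's config and check it against what this node has installed."""
    data = json.loads(raw_json)
    cfg = PropertyVoiceConfig(data["property_id"], data["language"], data["voice"])
    voices = INSTALLED_PACKS.get(cfg.language)
    if voices is None:
        raise ValueError(f"language pack {cfg.language!r} is not installed on this node")
    if cfg.voice not in voices:
        raise ValueError(f"voice {cfg.voice!r} is not available for {cfg.language!r}")
    return cfg
```

Validating at load time means a misconfigured property fails when the node starts, not in the middle of a conversation with a guest.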

Stack

What we used

Nvidia PersonaPlex 7B
WebSocket transport
Local RAG index
Intent classifier (Qwen 3 fine-tune)
On-property edge node
Failover to human agent

Outcomes

What changed

780ms

median end-to-end turn-taking

11

languages supported on the same edge deployment

94%

guest interactions handled without escalation

100%

on-premise per property - no guest audio leaves the building

“Guests stopped commenting on the agent because they stopped noticing it's an agent. That's the whole goal.”

- Chief Digital Officer, hotel chain (name withheld)

Have a similar problem? Let's scope it.

A 30-minute call. We'll tell you whether we can help - and if not, who can.


More work


Domain-adapted 7B LLM cuts inference costs by 60%

Sub-300ms voice assistant on edge hardware

Brand-styled video b-roll using LTX-Video

On-prem clinical transcription, 99.1% accuracy

Flux + brand LoRA at SKU scale

Edge VLM defect detection on Jetson

Air-gapped policy RAG on DGX Spark

Whisper fine-tune for Uzbek STT

TRELLIS image-to-3D for AR catalogues

Seed-VC voice conversion for localisation