Hospitality · Real-time voice · Case Study

Full-duplex speech-to-speech (S2S) concierge with PersonaPlex: sub-second turn-taking on edge hardware

A hotel chain wanted an in-room AI concierge that felt as conversational as a human front desk, not the laggy, walkie-talkie-style exchange most voice agents ship. We deployed Nvidia PersonaPlex over WebSocket on local hardware at each property.

780 ms
median end-to-end turn-taking latency
11
languages supported on the same edge deployment
94%
guest interactions handled without escalation
100%
on-premise processing at every property: no guest audio leaves the building

Problem

What they were stuck on

Most voice agents chain speech-to-text (STT) → LLM → text-to-speech (TTS) as separate cloud calls. Each round-trip adds latency, so turn-taking feels laggy: guests interrupt themselves, the agent talks over them, and the exchange never feels like a real conversation. The hotel wanted three things: full-duplex audio (the agent keeps listening while it speaks, so it can react or yield mid-utterance), sub-second turn boundaries, and zero cloud dependency, for guest privacy.

Approach

How we built it

STEP 01

Model selection

Benchmarked PersonaPlex-7B (Nvidia) against pipelined STT+LLM+TTS approaches. PersonaPlex's integrated full-duplex S2S architecture beat pipelines on naturalness scores and turn-taking latency by a wide margin.
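
One way to make the latency side of that comparison concrete is to compute the median gap between the end of the guest's speech and the start of the agent's reply from per-turn timestamps. This assumes that definition of "turn-taking latency" (the case study does not spell it out); the function names and timestamp format are hypothetical, and the sample numbers are made up for illustration, not benchmark data.

```python
from statistics import median

def turn_gaps_ms(turns):
    """Gap from end of guest speech to start of agent speech, in ms.

    `turns` holds (guest_speech_end_s, agent_speech_start_s) pairs.
    Negative gaps mean the agent started before the guest finished,
    which a full-duplex system can do; they are kept, not dropped.
    """
    return [round((agent_start - guest_end) * 1000) for guest_end, agent_start in turns]

def median_turn_latency_ms(turns):
    """Median turn-taking latency across all logged turns."""
    return median(turn_gaps_ms(turns))

# Made-up timestamps (seconds) for three turns, for illustration only.
example_turns = [(1.0, 1.7), (5.2, 6.1), (9.0, 9.8)]
print(median_turn_latency_ms(example_turns))  # 800
```

The median is used rather than the mean so a handful of slow turns does not dominate the headline number.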

STEP 02

Hospitality domain pack

Built a lightweight RAG layer over the hotel's in-room services menu, local recommendations, and operational FAQs. Retrieved chunks injected into PersonaPlex's context window per turn.
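
A minimal sketch of per-turn retrieval and context assembly. The case study does not say how chunks are scored or how they are handed to PersonaPlex, so this assumes a text rendering of the guest's turn is available as the query and substitutes simple keyword overlap for a real embedding index. `Chunk`, `retrieve`, `build_turn_context`, the stopword list, and the sample knowledge base are all hypothetical.

```python
import re
from dataclasses import dataclass

# Tiny stopword list so filler words don't count as matches (illustrative only).
STOPWORDS = frozenset({"a", "an", "the", "is", "are", "to", "on", "of", "in",
                       "when", "what", "where", "i", "my", "do", "you"})

@dataclass(frozen=True)
class Chunk:
    source: str  # hypothetical labels: "room_service_menu", "local_recs", "ops_faq"
    text: str

def tokenize(text: str) -> set[str]:
    """Lowercased word tokens minus stopwords; stands in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return up to k chunks ranked by keyword overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c.text)), c) for c in chunks]
    ranked = sorted((sc for sc in scored if sc[0] > 0), key=lambda sc: sc[0], reverse=True)
    return [c for _, c in ranked[:k]]

def build_turn_context(persona: str, query: str, chunks: list[Chunk], k: int = 2) -> str:
    """Persona prompt plus the retrieved facts for this turn only."""
    hits = retrieve(query, chunks, k)
    facts = "\n".join(f"- [{c.source}] {c.text}" for c in hits) or "- (no matching hotel information)"
    return f"{persona}\n\nRelevant hotel information:\n{facts}"

# Hypothetical knowledge base for one property.
KB = [
    Chunk("room_service_menu", "Breakfast is served in-room from 6am to 11am."),
    Chunk("ops_faq", "Checkout time is 12pm; late checkout is available on request."),
    Chunk("local_recs", "The nearest pharmacy is two blocks north on Main Street."),
]

print(build_turn_context("You are the hotel concierge.", "When is breakfast served?", KB))
```

Swapping `retrieve` for a vector search would leave `build_turn_context` unchanged; the point of the sketch is that retrieval runs once per turn and only the top few chunks enter the model's context.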

STEP 03

Edge deployment

PersonaPlex deployed on a small local node per property (10 GB VRAM footprint). WebSocket-based transport keeps the audio stream continuous in both directions. No cloud round-trip in the speech path.
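
What a persistent bidirectional transport buys is that sending and receiving run concurrently, so guest audio keeps flowing upstream while agent audio plays. The sketch below shows that pattern with asyncio tasks; `asyncio.Queue` objects stand in for the WebSocket connection and `fake_model` stands in for the PersonaPlex server. The 24 kHz sample rate, 16-bit mono PCM, and 20 ms frame size are assumptions, not values from this deployment.

```python
import asyncio

SAMPLE_RATE_HZ = 24_000  # assumption: check the model's actual audio format
FRAME_MS = 20            # assumption: audio per WebSocket message
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM

def frame_bytes(rate: int = SAMPLE_RATE_HZ, ms: int = FRAME_MS, width: int = BYTES_PER_SAMPLE) -> int:
    """Bytes in one audio frame: samples per frame times bytes per sample."""
    return rate * ms // 1000 * width

async def uplink(mic_frames, to_model: asyncio.Queue) -> None:
    """Stream guest audio upstream without waiting for the agent to finish."""
    for frame in mic_frames:
        await to_model.put(frame)
        await asyncio.sleep(0)  # yield so the other direction can run
    await to_model.put(None)    # end-of-stream marker

async def fake_model(to_model: asyncio.Queue, from_model: asyncio.Queue) -> None:
    """Stand-in server: emits one output frame per input frame it receives."""
    while (frame := await to_model.get()) is not None:
        await from_model.put(bytes(len(frame)))  # silence of the same length
    await from_model.put(None)

async def downlink(from_model: asyncio.Queue, played: list) -> None:
    """Play agent audio as soon as each frame arrives."""
    while (frame := await from_model.get()) is not None:
        played.append(frame)

async def session(n_frames: int = 5) -> list:
    to_model, from_model = asyncio.Queue(), asyncio.Queue()
    mic = [b"\x01" * frame_bytes() for _ in range(n_frames)]
    played: list = []
    # Both directions run at the same time; this is the full-duplex part.
    await asyncio.gather(uplink(mic, to_model),
                         fake_model(to_model, from_model),
                         downlink(from_model, played))
    return played

played = asyncio.run(session())
print(len(played), frame_bytes())
```

In a real client, `uplink` and `downlink` would wrap the WebSocket library's send and receive calls; because they are separate tasks, neither direction blocks the other.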

STEP 04

Guardrails + escalation

Intent classifier on every turn routes anything sensitive (billing, lost items, complaints) directly to a human agent over the same channel. The model never tries to handle what it shouldn't.
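
The routing rule reduces to: classify the turn, then escalate if the intent is on the sensitive list. In this sketch a keyword matcher stands in for the fine-tuned Qwen 3 classifier; the intent labels, cue words, and function names are hypothetical.

```python
import re
from enum import Enum

class Route(Enum):
    AGENT = "agent"  # PersonaPlex keeps the conversation
    HUMAN = "human"  # hand off to staff on the same channel

SENSITIVE_INTENTS = frozenset({"billing", "lost_item", "complaint"})

# Hypothetical keyword cues standing in for the fine-tuned classifier.
CUES = {
    "billing": ("bill", "billing", "charged", "invoice", "refund"),
    "lost_item": ("lost", "missing", "left behind"),
    "complaint": ("complain", "complaint", "unacceptable", "manager"),
}

def classify(utterance: str) -> str:
    """Return the first intent whose cue appears as a whole word or phrase."""
    text = utterance.lower()
    for intent, cues in CUES.items():
        if any(re.search(rf"\b{re.escape(cue)}\b", text) for cue in cues):
            return intent
    return "general"

def route(utterance: str, classifier=classify) -> tuple[str, Route]:
    """Classify every turn; sensitive intents always go to a human."""
    intent = classifier(utterance)
    return intent, (Route.HUMAN if intent in SENSITIVE_INTENTS else Route.AGENT)

print(route("Can you recommend a restaurant nearby?"))
print(route("Why was I charged twice?"))
```

Because `route` takes the classifier as a parameter, the keyword stub can be replaced by a call to the real model without touching the escalation logic.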

STEP 05

Multi-language

Property-level config selects the voice and language pack. 11 languages supported in the launch wave, all from the same on-device deployment.
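
A property-level config can be as small as a property ID, a language, and a voice, checked against the language packs the shared deployment actually ships. The case study does not publish its config format, so the field names, language tags, voice name, and `load_config` below are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertyConfig:
    property_id: str
    language: str  # language tag, e.g. "es-ES" (hypothetical value)
    voice: str     # voice preset name (hypothetical)

def load_config(raw_json: str, supported_languages: frozenset) -> PropertyConfig:
    """Parse one property's config and reject languages the deployment lacks."""
    raw = json.loads(raw_json)
    cfg = PropertyConfig(str(raw["property_id"]), str(raw["language"]), str(raw["voice"]))
    if cfg.language not in supported_languages:
        raise ValueError(f"{cfg.property_id}: no language pack for {cfg.language!r}")
    return cfg

# Hypothetical subset of the deployment's language packs.
SUPPORTED = frozenset({"en-US", "es-ES", "fr-FR"})
cfg = load_config('{"property_id": "prop-001", "language": "es-ES", "voice": "concierge_warm"}', SUPPORTED)
print(cfg)
```

Validating at load time means a typo in one property's config fails when the node starts rather than in the middle of a guest call.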

Stack

What we used

Nvidia PersonaPlex 7B · WebSocket transport · Local RAG index · Intent classifier (Qwen 3 fine-tune) · On-property edge node · Failover to human agent

Outcomes

What changed

Guests stopped commenting on the agent because they stopped noticing it's an agent. That's the whole goal.

- Chief Digital Officer, hotel chain (name withheld)

Have a similar problem? Let's scope it.

A 30-minute call. We'll tell you whether we can help - and if not, who can.
