A hotel chain wanted an in-room AI concierge that felt as conversational as a human front desk - not the half-second-delayed walkie-talkie experience most voice agents ship. We deployed Nvidia PersonaPlex over WebSocket on local hardware in each property.
Most voice agents stitch together speech-to-text (STT) → LLM → text-to-speech (TTS) as separate cloud calls. The round-trips make turn-taking feel laggy - guests interrupt themselves, the agent talks over them, and the experience never feels like a real conversation. The hotel wanted full-duplex operation (the agent can listen and respond mid-utterance), sub-second turn boundaries, and zero cloud dependency for privacy.
Benchmarked PersonaPlex-7B (Nvidia) against pipelined STT+LLM+TTS approaches. PersonaPlex's integrated full-duplex speech-to-speech (S2S) architecture beat the pipelines on both naturalness scores and turn-taking latency by a wide margin.
Built a lightweight RAG layer over the hotel's in-room services menu, local recommendations, and operational FAQs. Retrieved chunks injected into PersonaPlex's context window per turn.
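As a rough illustration of per-turn retrieval and injection, here is a minimal sketch. It assumes simple keyword-overlap scoring in place of whatever retriever was actually used, and the snippets, the function names (`retrieve`, `build_turn_context`), and the context format are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical knowledge snippets; the real corpus is the hotel's own
# services menu, local recommendations, and operational FAQs.
CHUNKS = [
    "Room service is available from 6 am to 11 pm; dial 1 to order.",
    "The spa is on level 2 and opens at 9 am.",
    "Late checkout until 2 pm can be requested, subject to availability.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, chunks, k=2):
    """Return up to k chunks ranked by token overlap with the query.

    Keyword overlap is an assumed stand-in for a real retriever
    (for example, embedding similarity).
    """
    query_counts = Counter(tokenize(query))
    scored = []
    for chunk in chunks:
        overlap = sum((query_counts & Counter(tokenize(chunk))).values())
        if overlap > 0:
            scored.append((overlap, chunk))
    # Stable sort: ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_turn_context(query, chunks, k=2):
    """Format retrieved chunks as a text block to prepend for this turn."""
    hits = retrieve(query, chunks, k)
    if not hits:
        return ""  # nothing relevant: inject nothing rather than noise
    return "Relevant hotel information:\n" + "\n".join(f"- {h}" for h in hits)
```

Calling `build_turn_context("When does the spa open?", CHUNKS, k=1)` returns a block containing only the spa snippet. Returning an empty string when nothing matches keeps irrelevant text out of the context window.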
PersonaPlex deployed on a small local node per property (10 GB VRAM footprint). WebSocket-based transport keeps the audio stream continuous in both directions. No cloud round-trip in the speech path.
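The shape of a continuous two-way stream can be sketched with concurrent send and receive loops. This is an illustrative sketch only: two in-memory `asyncio.Queue` objects stand in for the WebSocket connection, `fake_model` is a hypothetical placeholder for the local node, and none of the names reflect PersonaPlex's actual interface.

```python
import asyncio

END = None  # sentinel marking end of stream (an assumption of this sketch)


async def send_mic(frames, uplink):
    """Push microphone frames upstream without waiting for any reply."""
    for frame in frames:
        await uplink.put(frame)
        await asyncio.sleep(0)  # yield so the other loops run between frames
    await uplink.put(END)


async def receive_speaker(downlink, played):
    """Consume synthesized audio frames as they arrive."""
    while True:
        frame = await downlink.get()
        if frame is END:
            break
        played.append(frame)


async def fake_model(uplink, downlink):
    """Hypothetical stand-in for the local node: answers each frame at once."""
    while True:
        frame = await uplink.get()
        if frame is END:
            await downlink.put(END)
            break
        await downlink.put(b"reply:" + frame)


async def run_session(frames):
    """Run the send and receive loops concurrently, as a duplex channel would."""
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    played = []
    await asyncio.gather(
        send_mic(frames, uplink),
        fake_model(uplink, downlink),
        receive_speaker(downlink, played),
    )
    return played
```

Neither direction blocks the other: the sender never waits for a response before pushing the next frame. That is the property a request/response pipeline lacks.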
Intent classifier on every turn routes anything sensitive (billing, lost-item, complaint) directly to a human agent over the same channel. The model never tries to handle what it shouldn't.
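The routing decision can be sketched as follows. This is a minimal sketch that assumes a keyword lookup in place of the real intent classifier. The three sensitive categories come from the text above, while the keyword lists, the `Route` type, and `route_turn` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger words per sensitive intent; a production system
# would use a trained classifier rather than a word list.
SENSITIVE_WORDS = {
    "billing": {"bill", "billing", "charge", "charged", "refund", "invoice"},
    "lost_item": {"lost", "missing"},
    "complaint": {"complaint", "complain", "unacceptable", "manager"},
}


@dataclass(frozen=True)
class Route:
    target: str  # "human" or "agent"
    intent: str  # matched sensitive intent, or "general"


def route_turn(transcript: str) -> Route:
    """Decide who handles this turn, based on whole-word matches."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for intent, triggers in SENSITIVE_WORDS.items():
        if words & triggers:
            # Sensitive topic: hand off before the model responds.
            return Route("human", intent)
    return Route("agent", "general")
```

For example, `route_turn("I think I was charged twice on my bill")` yields `Route(target="human", intent="billing")`, while a question about pool hours stays with the agent. Running the check before the model responds is what guarantees the model never attempts a sensitive request.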
Property-level config selects voice + language pack. 11 languages supported in the launch wave, all from the same on-device deployment.
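A property-level config of this kind might look like the sketch below. The field names, the voice identifier, and the three-entry language set are placeholders for illustration; they are not the actual schema or the launch language list.

```python
from dataclasses import dataclass

# Placeholder set of installed language packs (NOT the real launch list).
AVAILABLE_PACKS = frozenset({"en", "es", "fr"})


@dataclass(frozen=True)
class PropertyConfig:
    property_id: str
    voice: str             # hypothetical voice identifier
    default_language: str
    languages: tuple       # language packs enabled at this property

    def __post_init__(self):
        # Reject packs that are not present on the device.
        unknown = set(self.languages) - AVAILABLE_PACKS
        if unknown:
            raise ValueError(f"unsupported language packs: {sorted(unknown)}")
        if self.default_language not in self.languages:
            raise ValueError("default_language must be one of languages")


def load_property_config(raw: dict) -> PropertyConfig:
    """Build a validated config from a plain dict (e.g. parsed JSON)."""
    return PropertyConfig(
        property_id=raw["property_id"],
        voice=raw["voice"],
        default_language=raw["default_language"],
        languages=tuple(raw["languages"]),
    )
```

Validating at load time means a misconfigured property fails at startup rather than in the middle of a guest conversation.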
“Guests stopped commenting on the agent because they stopped noticing it's an agent. That's the whole goal.”
- Chief Digital Officer, hotel chain (name withheld)