Building an AI-powered healthcare IVR taught me that most voice bots fail before they even speak.
Not because of bad models.
Because of bad product decisions.
Context
I built an AI-driven IVR system designed to call patients for appointment reminders, use RAG to fetch contextual patient data, handle dynamic responses (reschedule, confirm, decline), and reduce human telecaller workload.
The tech stack included FastAPI, Twilio voice calls, RAG for contextual retrieval, and structured fallback handling.
The goal wasn't just automation. It was reducing no-shows while maintaining patient trust.
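To make the call flow concrete, here is a minimal sketch of the TwiML a reminder call might return to Twilio: speak the reminder, then gather either speech or a keypress. The function name `build_reminder_twiml` and the `/ivr/reply` callback path are hypothetical, and the XML is built with the standard library rather than the Twilio helper library, so this is an illustration and not the system's actual code.

```python
import xml.etree.ElementTree as ET


def build_reminder_twiml(message: str, action_url: str) -> str:
    """Return TwiML that speaks a reminder and gathers speech or keypad input."""
    response = ET.Element("Response")
    # <Gather> collects the caller's reply and posts it to action_url.
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech dtmf",
            "action": action_url,  # hypothetical webhook path
            "method": "POST",
            "numDigits": "1",
            "timeout": "5",
        },
    )
    say = ET.SubElement(gather, "Say")
    say.text = message
    # If nothing is gathered, Twilio continues to the next verb.
    closing = ET.SubElement(response, "Say")
    closing.text = "We did not catch that. We will try again later."
    return ET.tostring(response, encoding="unicode")


twiml = build_reminder_twiml(
    "Press 1 or say confirm to keep your appointment.", "/ivr/reply"
)
```

In a FastAPI app, a string like this would typically be returned from the voice webhook with an `application/xml` media type.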
Key Product Lessons
Trust > Intelligence
Users judge the system in the first 5 seconds.
If the tone feels robotic or overly scripted, they disengage. Conversational warmth matters more than model sophistication.
Voice personality design is not cosmetic. It's a retention strategy.
Latency Kills Engagement
Even a 1.5–2 second delay in response feels broken in voice interactions. Callers assume one of three things:
- The system froze
- The call glitched
- It's spam
In real-time AI systems, latency matters more than model quality.
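One common way to protect a latency budget is to race the slow step against a timer and speak a short filler line if the timer wins. The sketch below assumes a 1-second default budget, a hypothetical `generate` coroutine standing in for the retrieval and model call, and a made-up filler phrase; none of these come from the original system.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

FILLER = "One moment while I check that for you."  # hypothetical filler line


async def respond_within_budget(
    generate: Callable[[], Awaitable[str]], budget_s: float = 1.0
) -> Tuple[str, Optional[asyncio.Task]]:
    """Return (text, pending_task).

    If generate() finishes within budget_s, text is its result and
    pending_task is None. Otherwise text is FILLER and pending_task is
    the still-running job, to be awaited after the filler is spoken.
    """
    task = asyncio.ensure_future(generate())
    # asyncio.wait does not cancel the task when the timeout expires.
    done, _ = await asyncio.wait({task}, timeout=budget_s)
    if task in done:
        return task.result(), None
    return FILLER, task


async def slow_model() -> str:
    await asyncio.sleep(0.2)  # stand-in for a slow RAG + model call
    return "Your appointment is confirmed."


async def demo() -> Tuple[str, str]:
    first, pending = await respond_within_budget(slow_model, budget_s=0.05)
    final = await pending if pending else first
    return first, final
```

The caller hears something immediately, and the real answer follows once the pending task resolves.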
Personalization > General Intelligence
Fetching patient-specific context using RAG dramatically increased engagement.
“Hi, this is a reminder for your cardiology appointment on Tuesday” works better than “You have an upcoming appointment.”
Specificity builds credibility.
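A minimal sketch of the idea: build the greeting from whatever context retrieval returned, and degrade to the generic line when fields are missing. The `AppointmentContext` type and its fields are assumptions made for illustration; the real retrieval output would likely look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppointmentContext:
    """Hypothetical shape of the retrieved patient context."""
    department: Optional[str] = None
    day: Optional[str] = None


def build_greeting(ctx: Optional[AppointmentContext]) -> str:
    """Use the most specific greeting the available context supports."""
    if ctx and ctx.department and ctx.day:
        return (
            f"Hi, this is a reminder for your {ctx.department} "
            f"appointment on {ctx.day}."
        )
    if ctx and ctx.day:
        return f"Hi, this is a reminder for your appointment on {ctx.day}."
    # Retrieval failed or returned nothing usable: fall back to generic.
    return "Hi, this is a reminder that you have an upcoming appointment."


print(build_greeting(AppointmentContext("cardiology", "Tuesday")))
```

Degrading gracefully matters here because a failed retrieval should produce a vaguer greeting, not a broken sentence.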
Fallback Design Is the Real Product
Most conversations don't follow the happy path. Users:
- Speak unclearly
- Switch languages
- Ask unrelated questions
- Interrupt
The fallback logic determined success more than the primary flow.
Design for confusion, not perfection.
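Fallback handling of this kind can be sketched as a small decision function: try to recognize an intent, reprompt a limited number of times, then hand off. The keyword lists, the two-reprompt limit, and the human handoff step are all assumptions for illustration; a production system would use a proper intent classifier instead of keyword matching.

```python
import re
from enum import Enum
from typing import Tuple


class Intent(Enum):
    RESCHEDULE = "reschedule"
    CONFIRM = "confirm"
    DECLINE = "decline"
    UNKNOWN = "unknown"


# Checked in order, so "no, can we move it?" resolves to RESCHEDULE.
KEYWORDS = {
    Intent.RESCHEDULE: {"reschedule", "change", "move"},
    Intent.CONFIRM: {"yes", "yeah", "confirm"},
    Intent.DECLINE: {"no", "cancel", "decline"},
}

MAX_REPROMPTS = 2  # assumed limit before giving up on the bot


def classify(transcript: str) -> Intent:
    """Match whole words only, so 'know' does not trigger 'no'."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    for intent, keywords in KEYWORDS.items():
        if words & keywords:
            return intent
    return Intent.UNKNOWN


def next_step(transcript: str, failed_attempts: int) -> Tuple[str, int]:
    """Return (action, updated failed_attempts) for one caller turn."""
    intent = classify(transcript)
    if intent is not Intent.UNKNOWN:
        return intent.value, 0
    if failed_attempts < MAX_REPROMPTS:
        return "reprompt", failed_attempts + 1
    return "handoff_to_human", failed_attempts
```

Note that most of the branches deal with the unrecognized case, which mirrors the point above: the unhappy path is where most of the design effort goes.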
What I'd Improve in v2
- Multi-lingual adaptive routing
- Sentiment detection for escalation
- Smart retry scheduling based on engagement probability
- A/B testing conversational tone
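As a rough illustration of the retry-scheduling idea, one simple approach is to pick the hour with the best historical answer rate, smoothed so that a single lucky call does not dominate. The `history` format and the smoothing priors are hypothetical; this is a sketch of one possible heuristic, not a description of anything already built.

```python
from typing import Dict, Tuple


def best_retry_hour(
    history: Dict[int, Tuple[int, int]],
    prior_answered: int = 1,
    prior_calls: int = 2,
) -> int:
    """Pick the hour with the highest smoothed answer rate.

    history maps hour of day -> (calls answered, calls attempted).
    The priors pull sparse hours toward a 50% rate. Ties go to the
    earliest hour. Raises ValueError if history is empty.
    """
    def score(hour: int) -> float:
        answered, attempted = history[hour]
        return (answered + prior_answered) / (attempted + prior_calls)

    return max(sorted(history), key=score)


# 9:00 has a perfect record but only one call, so 18:00 still wins.
hour = best_retry_hour({9: (1, 1), 13: (6, 10), 18: (8, 10)})
```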
Practical Takeaway
If you're building AI voice products, design for:
- Trust
- Speed
- Context
- Failure handling
Not just model accuracy.