01 - Real-time voice agent architecture

Front office in crisis.

A LiveKit voice agent for an orthopedic practice drowning in calls. The product call: build a constrained scheduling agent, not a broad receptionist, and prove it through a deterministic policy gate, a medical-grade voice stack, and a staff-facing review queue.

Public-source implementation simulation. This is a serious proposal for how the system should be designed, not a claim of live production deployment, partnership, or access to private company systems.

Runtime

LiveKit Agents

Low-latency voice orchestration with inspectable traces.

Voice stack

Deepgram / Cartesia / GPT-4o

Chosen for streaming, speed, and controllability.

System boundary

eClinicalWorks

Enough integration to schedule, not enough to verify payer truth.

Delivery scope

Scheduling and routing MVP

Narrower than a receptionist, safer than a broad automation claim.

02 - Operational State

900 calls. 200 callers. Patients redialing.

Summit Orthopedics fired its outsourced call center after sustained patient complaints, poor insurance knowledge, and incorrect scheduling. The product problem was not a generic phone bot. It was an access workflow under pressure.

Inbound calls / day

~900

Peak daily call load observed in the scenario used for this concept.

Unique callers

~200

The gap points to repeat dialing because callers were not getting through cleanly.

Scheduling intent

50%

Scheduling is the highest-volume candidate for bounded automation.

Billing calls

~33%

Billing routes outside the agent to an external revenue-cycle queue.
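As a rough check on the figures above, the arithmetic can be written out. This is a sketch only: it assumes the percentages are shares of daily calls (not of unique callers) and reads "~33%" as about one third.

```python
# Figures quoted above; the tildes in the source mark them as approximate.
calls_per_day = 900
unique_callers = 200
scheduling_share = 0.50  # assumption: share of calls, not of callers
billing_share = 1 / 3    # assumption: "~33%" read as roughly one third

calls_per_caller = calls_per_day / unique_callers
scheduling_calls = round(calls_per_day * scheduling_share)
billing_calls = round(calls_per_day * billing_share)

print(f"{calls_per_caller:.1f} calls per unique caller")
print(f"~{scheduling_calls} scheduling calls, ~{billing_calls} billing calls per day")
```

An average of 4.5 calls per unique caller is what makes repeat dialing the most plausible reading of the gap.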

Operating environment

A practice growing faster than its front-office process.

With the outsourced call center gone, the full call volume was hitting a front office that had no infrastructure to absorb it.

eClinicalWorks support

+
Patient lookup by name and date of birth
+
Provider schedule and availability reads
+
Appointment creation
+
Basic patient demographics

Not available

-
Insurance eligibility checking
-
Complex insurance mapping
-
Referral management

Risk signal

Repeat dialing suggests callers were not just waiting on hold. They were failing to complete the access workflow at all.

03 - Decision Point

The decision is not whether voice AI can answer phones. The decision is whether Summit should trust an agent to perform part of its access workflow while the organization is still restructuring.

The product ruling is narrow: launch a constrained scheduling and routing agent with hard exclusions, staff review, and staged rollout. Earn scope through performance, not ambition.

04 - Product Strategy

A bounded scheduling agent, not a fantasy receptionist.

A broad AI receptionist would be impressive in a demo and dangerous in production. A narrow scheduling agent can be tested, launched, reviewed, and improved.

Why workers comp is excluded

Workers compensation scheduling requires authorization data, claim numbers, and insurance coordination that no current integration supports. One wrong booking creates downstream legal exposure.

Why insurance logic waits

Insurance knowledge exists in staff heads more than in structured systems. Until that knowledge is documented and tested, the agent should not present itself as an eligibility authority.

Included in MVP

+
Inbound call answer and intent classification
+
New and existing patient scheduling
+
Name and DOB lookup against eClinicalWorks
+
Provider availability lookup
+
Appointment creation after confirmation
+
Referred-provider and body-part routing
+
Billing transfer to RCM queue
+
Workers compensation hard transfer
+
Medical request capture and escalation
+
Staff review dashboard

Excluded from MVP

-
Medical advice of any kind
-
Complex insurance interpretation
-
Insurance eligibility verification
-
Referral validation
-
Workers compensation scheduling
-
Surgery scheduling
-
Post-operative triage
-
Prior authorization handling
-
Payment collection

05 - Architecture

LiveKit is the runtime. Twilio is the carrier.

This build is designed as a LiveKit showcase, not a Twilio bot. Telephony becomes an ingress channel into LiveKit, not the center of the system. Each inbound call creates a LiveKit room: a bounded context for the entire patient interaction.

The room model matters because callers interrupt, pause, search for insurance cards, ask side questions, and become frustrated. A request-response chatbot architecture would feel brittle.

LiveKit room per call

Participants

+
patient_sip_participant - caller from PSTN via SIP ingress
+
summit_ai_agent - greeting, classification, scheduling, escalation
+
human_staff_participant - optional warm transfer or supervisor override

Room metadata

+
Call ID
+
Phone number
+
Agent version
+
Location
+
Patient identity status
+
Appointment workflow state
+
Exclusion flags
+
Tool-call eligibility
+
Escalation status
+
Transcript confidence
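The metadata fields above could be carried as one typed record per call. The sketch below is illustrative only: the field names, default values, and the choice of JSON serialization are assumptions, not part of the proposal or of any LiveKit schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RoomMetadata:
    """Hypothetical per-call record mirroring the metadata list above."""
    call_id: str
    phone_number: str
    agent_version: str
    location: str
    patient_identity_status: str = "unverified"  # assumed default
    workflow_state: str = "greeting"             # assumed default
    exclusion_flags: list = field(default_factory=list)
    tool_call_eligible: bool = False             # tools stay locked until policy allows
    escalation_status: str = "none"
    transcript_confidence: Optional[float] = None

    def to_json(self) -> str:
        # A single JSON string is easy to attach to a room and to log.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "RoomMetadata":
        return cls(**json.loads(raw))


meta = RoomMetadata(
    call_id="call-001",
    phone_number="+15550100",
    agent_version="0.1.0",
    location="main-office",
)
restored = RoomMetadata.from_json(meta.to_json())
```

Defaulting tool_call_eligible to False keeps the safe state as the starting state: a session has to earn tool access rather than lose it.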

SIP ingress flow

+
Patient dials Summit phone number
+
Twilio or Telnyx receives the PSTN call
+
SIP trunk routes the call to LiveKit SIP
+
LiveKit creates a SIP participant in a room
+
The LiveKit Agent joins and begins the workflow
+
Human transfer bridges back out through SIP when needed

06 - Technical Design

Every layer earns its place.

The product challenge is not generating answers. It is managing real-time human conversation over audio in a medical office context while keeping irreversible actions policy-gated.

+
Runtime. LiveKit Agents for real-time audio sessions, programmable agent participation, tool use, and observability.
+
Telephony. LiveKit SIP plus Twilio SIP trunk. Twilio is the phone carrier, not the voice-agent runtime.
+
STT. Deepgram Nova-3 Medical for orthopedic terms, provider names, and phone audio quality.
+
Turn handling. Silero VAD plus LiveKit turn detector for both barge-in and slow callers.
+
Live LLM. GPT-4o for constrained tool use and intent classification, with policy carrying safety-critical logic.
+
Offline review. Stronger batch reasoning for transcript review, policy checks, and QA categorization.
+
TTS. Cartesia Sonic 3 streaming with calm pacing and immediate interruption.
+
Workflow. Deterministic state machine outside free-form model behavior.
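One way to keep the workflow outside free-form model behavior is an explicit transition table: the model may propose a next step, but only listed transitions are accepted. The state names and allowed transitions below are assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    GREETING = "greeting"
    CLASSIFY_INTENT = "classify_intent"
    IDENTIFY_PATIENT = "identify_patient"
    OFFER_SLOTS = "offer_slots"
    CONFIRM_SLOT = "confirm_slot"
    BOOKED = "booked"
    TRANSFER = "transfer"
    ESCALATE = "escalate"


# Hypothetical transition table. Terminal states allow no further moves.
ALLOWED = {
    State.GREETING: {State.CLASSIFY_INTENT},
    State.CLASSIFY_INTENT: {State.IDENTIFY_PATIENT, State.TRANSFER, State.ESCALATE},
    State.IDENTIFY_PATIENT: {State.OFFER_SLOTS, State.TRANSFER, State.ESCALATE},
    State.OFFER_SLOTS: {State.CONFIRM_SLOT, State.TRANSFER, State.ESCALATE},
    State.CONFIRM_SLOT: {State.BOOKED, State.OFFER_SLOTS, State.TRANSFER, State.ESCALATE},
    State.BOOKED: set(),
    State.TRANSFER: set(),
    State.ESCALATE: set(),
}


class IllegalTransition(Exception):
    """Raised when a proposed step is not in the transition table."""


def advance(current: State, proposed: State) -> State:
    # The LLM's proposal is only a candidate; the table decides.
    if proposed not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {proposed.value}")
    return proposed
```

In this sketch there is no path from OFFER_SLOTS to BOOKED that skips CONFIRM_SLOT, so a booking without a confirmation step is unrepresentable rather than merely discouraged.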

STT evaluation set

+
Provider names must clear confirmation-loop testing.
+
Body-part vocabulary must clear the common orthopedic set.
+
DOB capture must be confirmed through readback before patient lookup.
+
Clinical terms are evaluated separately from LLM behavior.

Voice style parameters

+
Slightly slower than default speaking speed
+
One question at a time
+
Calm and clear, not clinical
+
No sales tone or exaggerated warmth
+
Pause after confirmations

Body-part routing

The LLM normalizes caller language. The routing table chooses the provider group. No freeform routing decision gets to book an appointment.
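A minimal sketch of that split, assuming the LLM has already reduced the caller's wording to a single body-part term. The table entries and provider-group names are placeholders; the real mapping would come from the Phase 0 rule capture.

```python
from typing import Optional

# Hypothetical routing table. Keys are normalized body-part terms.
ROUTING_TABLE = {
    "knee": "sports_medicine",
    "shoulder": "sports_medicine",
    "hand": "hand_and_wrist",
    "wrist": "hand_and_wrist",
    "spine": "spine",
}


def route(normalized_body_part: Optional[str]) -> Optional[str]:
    """Return a provider group, or None when the table has no entry.

    None means "do not book": the caller goes to staff instead of the
    agent guessing a provider group.
    """
    if not normalized_body_part:
        return None
    return ROUTING_TABLE.get(normalized_body_part.strip().lower())
```

The lookup is exact on purpose. Fuzzy matching belongs in the normalization step, where a mistake costs a clarifying question rather than a wrong booking.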

07 - Safety Layer

The LLM proposes. The policy gate decides.

Appointment creation is an irreversible action in eClinicalWorks. It writes to a schedule, creates downstream work, and cannot be treated as a loose model suggestion.

Core tool signatures

+
lookup_patient(name, dob)
+
get_provider_availability(provider, appointment_type, date_range)
+
create_appointment(payload) - policy-gated
+
transfer_call(destination, reason)
+
create_staff_task(payload)
+
flag_for_review(call_id, reason, payload)
+
log_patient_statement(payload)

Allowed

Appointment with explicit confirmation

The scheduling tool is valid only after the caller confirms the slot and the path remains inside approved policy.

Excluded

Workers compensation booking

Workers compensation flips the session into transfer-only mode because the integration cannot validate authorization and claim data.

Excluded

Medical advice response

Clinical questions route to capture-and-escalate. The system prompt and tool gate both block generated medical guidance.

Human approval mode

During pilot, create_appointment prepares the booking record and sends it to staff review for one-click confirmation instead of writing directly to eClinicalWorks.
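The gate in front of create_appointment can be sketched as a pure function over session state and the proposed booking. All type names, field names, and reason strings here are hypothetical; only the three rules (workers compensation blocks booking, confirmation is required, pilot mode diverts to staff review) come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Decision(Enum):
    REJECT = "reject"
    SEND_TO_STAFF_REVIEW = "send_to_staff_review"
    WRITE_TO_EHR = "write_to_ehr"


@dataclass(frozen=True)
class Session:
    workers_comp_flag: bool = False
    pilot_mode: bool = True  # human approval mode is the assumed default


@dataclass(frozen=True)
class BookingRequest:
    patient_id: str
    provider_group: str
    slot_id: str
    caller_confirmed: bool = False


def gate_create_appointment(session: Session, request: BookingRequest) -> Tuple[Decision, str]:
    """Decide what happens to a booking the LLM has proposed."""
    if session.workers_comp_flag:
        return Decision.REJECT, "workers_comp_transfer_only"
    if not request.caller_confirmed:
        return Decision.REJECT, "missing_caller_confirmation"
    if session.pilot_mode:
        # Prepare the record for one-click staff confirmation, no direct write.
        return Decision.SEND_TO_STAFF_REVIEW, "pilot_human_approval"
    return Decision.WRITE_TO_EHR, "approved_path"


confirmed = BookingRequest("p-1", "sports_medicine", "slot-9", caller_confirmed=True)
decision, reason = gate_create_appointment(Session(), confirmed)
```

Returning a reason string with every decision gives the staff dashboard something to display for each accept or reject event.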

08 - Voice Engineering

Barge-in and end-of-turn are two different problems.

Fast barge-in protects the caller from being talked over. Slower end-of-turn detection protects elderly or uncertain callers from being interrupted mid-thought. One wants the agent to react quickly and the other wants it to wait, so they are tuned separately.

Patience prompts

+
"No problem, take your time."
+
"I heard part of that, but I want to make sure I get it right."
+
"I may have misheard the provider name. Did you say Dr. Chen or Dr. Cohen?"

No-action-on-uncertainty rule

If STT confidence falls below threshold on a provider name, DOB, or date, the agent must request confirmation before proceeding. No irreversible action is taken on low-confidence capture.
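The rule can be expressed as a small check that runs before any tool call. The threshold value, field names, and the Capture type below are assumptions for illustration; the source does not state a threshold.

```python
from dataclasses import dataclass
from typing import Iterable

# Fields named in the rule above.
CRITICAL_FIELDS = {"provider_name", "dob", "date"}
CONFIDENCE_THRESHOLD = 0.85  # assumed value, to be tuned on the evaluation set


@dataclass
class Capture:
    field_name: str
    value: str
    confidence: float
    confirmed_by_caller: bool = False


def needs_confirmation(capture: Capture) -> bool:
    """True when a critical field was heard with low confidence and not yet confirmed."""
    return (
        capture.field_name in CRITICAL_FIELDS
        and capture.confidence < CONFIDENCE_THRESHOLD
        and not capture.confirmed_by_caller
    )


def safe_to_act(captures: Iterable[Capture]) -> bool:
    """Irreversible actions are allowed only when nothing is pending confirmation."""
    return not any(needs_confirmation(c) for c in captures)


heard = [
    Capture("provider_name", "Dr. Chen", confidence=0.62),
    Capture("dob", "1950-03-14", confidence=0.97),
]
```

Here safe_to_act(heard) is False until the caller confirms the provider name, at which point the capture is marked confirmed_by_caller and the check passes.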

End-of-turn wait by call context

Normal scheduling conversation

700-900ms

Elderly or slow caller

1000-1400ms

Yes / no confirmation

500-700ms

Caller searching for insurance card

1500-2500ms

Noisy audio

Conservative plus clarification
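The values above can be held as configuration rather than scattered through prompt text. This sketch assumes the ranges are end-of-turn silence windows in milliseconds, and it interprets "conservative" for noisy audio as pinning to the upper bound; both readings are assumptions, as are the context keys.

```python
from typing import Tuple

# Context keys are hypothetical; the ranges are the ones listed above.
ENDPOINTING_MS = {
    "normal_scheduling": (700, 900),
    "elderly_or_slow": (1000, 1400),
    "yes_no_confirmation": (500, 700),
    "searching_for_card": (1500, 2500),
}


def endpointing_window(context: str, noisy_audio: bool = False) -> Tuple[int, int]:
    """Return the (min, max) wait in ms before treating silence as end of turn."""
    low, high = ENDPOINTING_MS.get(context, ENDPOINTING_MS["normal_scheduling"])
    if noisy_audio:
        # Assumed meaning of "conservative": always wait the full upper bound.
        return (high, high)
    return (low, high)
```

Unknown contexts fall back to the normal scheduling window, so a classification miss changes pacing slightly instead of raising an error mid-call.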

09 - Operational Control

The dashboard is the trust layer.

Without the staff review queue, the system becomes an invisible automation layer and staff will not trust it. The dashboard turns the agent from a black box into an auditable work surface.

Most important early metric

Not automation rate. Corrected appointment rate.

Control surfaces

+
Room inspector
+
Live transcript stream
+
Turn detection timeline
+
Tool-call panel
+
Latency metrics
+
Replay mode
+
Failure injection
+
Staff review queue
+
Transfer simulation
+
Correction reason ledger

Each surface is visible to staff or reviewers so the agent can be judged from trace evidence, not vibes.

10 - Staged Rollout

Earn scope through performance.

The agent does not go from zero to direct booking on day one. Each phase has a clear gate and the system must prove reliability at the current scope before expanding.

Rollback triggers

!
Wrong-provider correction rate exceeds threshold
!
Workers compensation booked by mistake
!
Medical advice policy violation detected
!
Duplicate or phantom appointments created
!
Transfer failure rate spikes
!
Patient complaints exceed threshold
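The triggers above can be evaluated as a daily check over review-queue data. Every threshold here is a placeholder, since the source leaves them unspecified, and the transfer-failure "spike" is simplified to a fixed ceiling. Field and reason names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Placeholder thresholds; real values would be agreed with the practice.
MAX_WRONG_PROVIDER_RATE = 0.02
MAX_TRANSFER_FAILURE_RATE = 0.05
MAX_COMPLAINTS_PER_DAY = 3


@dataclass
class DailyMetrics:
    bookings: int
    wrong_provider_corrections: int
    workers_comp_bookings: int
    medical_advice_violations: int
    duplicate_or_phantom_appointments: int
    transfers_attempted: int
    transfers_failed: int
    patient_complaints: int


def rate(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def rollback_reasons(m: DailyMetrics) -> List[str]:
    """Return every trigger that fired; an empty list means keep the current phase."""
    reasons = []
    if rate(m.wrong_provider_corrections, m.bookings) > MAX_WRONG_PROVIDER_RATE:
        reasons.append("wrong_provider_correction_rate")
    if m.workers_comp_bookings > 0:  # zero tolerance
        reasons.append("workers_comp_booked")
    if m.medical_advice_violations > 0:  # zero tolerance
        reasons.append("medical_advice_violation")
    if m.duplicate_or_phantom_appointments > 0:
        reasons.append("duplicate_or_phantom_appointment")
    if rate(m.transfers_failed, m.transfers_attempted) > MAX_TRANSFER_FAILURE_RATE:
        reasons.append("transfer_failure_rate")
    if m.patient_complaints > MAX_COMPLAINTS_PER_DAY:
        reasons.append("patient_complaints")
    return reasons
```

The sketch treats the workers compensation, medical advice, and duplicate-appointment triggers as zero-tolerance because the list above states them as events, not rates.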

Phase 0

Discovery and rule capture

Document provider routing, appointment types, transfer numbers, workers compensation policy, billing transfer policy, and medical escalation triggers.

Phase 1

Internal shadow mode

Agent listens to test calls or historical replays and produces proposed actions, but does not book.

Phase 2

Human approval mode

Agent answers a limited call group, prepares bookings, and sends them to staff for one-click confirmation.

Phase 3

Limited direct booking

Allow direct booking only for approved paths with same-day review and rollback triggers armed.

Phase 4

Production V1

Expand call coverage after measured quality, correction rate, and staff rescue rate prove the agent reduces work.

11 - Acceptance Gates

Targets are design gates, not fake production bragging.

Every number is a target measured against a call simulation set, not a public claim about live production performance.

Intent classification

>=92%

Target on a simulation set after one clarification pass.

DOB capture

>=96%

Required before any patient lookup or appointment logic.

Median turn latency

<700ms

End-of-turn detection to first agent audio frame.

P95 turn latency

<1.5s

Tail latency budget for harder tool-bound replies.
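The two latency gates can be checked against per-turn samples from the simulation set. This sketch uses the nearest-rank method for the 95th percentile, which is one common convention and an assumption here; the function names are illustrative.

```python
import statistics
from typing import Sequence

MEDIAN_GATE_MS = 700  # median turn latency must be below this
P95_GATE_MS = 1500    # P95 turn latency must be below this


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile by nearest rank, using integer math to avoid float rounding."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def latency_gates_pass(samples_ms: Sequence[float]) -> bool:
    """Each sample is end-of-turn detection to first agent audio frame, in ms."""
    return (
        statistics.median(samples_ms) < MEDIAN_GATE_MS
        and p95(samples_ms) < P95_GATE_MS
    )


mostly_fast = [400] * 18 + [900, 1400]
slow_tail = [400] * 18 + [1600, 1700]
```

With these samples, mostly_fast passes both gates, while slow_tail passes the median gate and fails the P95 gate, which is the case the tail budget exists to catch.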

Invariant 01

Appointment without confirmation

The create_appointment tool requires an explicit caller-confirmation flag. The gate rejects the tool call otherwise.

Invariant 02

Workers comp booking

A workers-comp signal makes scheduling tools unavailable. The only valid action is transfer.

Invariant 03

Medical advice response

Medical-question intent forces capture-and-escalate mode and blocks clinical content generation.
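Invariants 02 and 03 can be enforced by narrowing the tool set the model is offered, instead of trusting it to decline. The tool names come from the core tool signatures; which tools count as "scheduling tools" and which remain in capture-and-escalate mode are assumptions in this sketch.

```python
from typing import FrozenSet

ALL_TOOLS = frozenset({
    "lookup_patient",
    "get_provider_availability",
    "create_appointment",
    "transfer_call",
    "create_staff_task",
    "flag_for_review",
    "log_patient_statement",
})

# Assumed grouping: everything on the path to a booking.
SCHEDULING_TOOLS = frozenset({
    "lookup_patient",
    "get_provider_availability",
    "create_appointment",
})

# Assumed capture-and-escalate set: record the request and hand it to staff.
CAPTURE_AND_ESCALATE_TOOLS = ALL_TOOLS - SCHEDULING_TOOLS


def available_tools(workers_comp: bool, medical_question: bool) -> FrozenSet[str]:
    """Return the tools the agent may call in the current session state."""
    if workers_comp:
        # Invariant 02: the only valid action is transfer.
        return frozenset({"transfer_call"})
    if medical_question:
        # Invariant 03: capture and escalate, no scheduling path.
        return CAPTURE_AND_ESCALATE_TOOLS
    return ALL_TOOLS
```

Workers compensation is checked first, so a call that raises both signals still ends in transfer-only mode.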

12 - Live Build

The agent on the line.

This section describes the instrumented build target without pretending a live production phone path is active.

What the demo will do

Initiate a browser or SIP session, classify intent, then schedule, transfer, or escalate based on the policy path.

What is instrumented

Transcript, tool-call panel, per-stage latency, policy accept/reject events, and replay of the full call message graph.

Failure injection

Noisy caller, slow elderly caller, ambiguous provider name, eCW timeout, unavailable slots, and workers-comp utterance.

13 - PM Recommendation

Build the voice agent, but do not build the fantasy version.

The product should be judged by whether it makes Summit calmer, not by whether it maximizes automation. The agent earns more scope only after it proves that it can schedule safely, transfer honestly, and create less work than it removes.

14 - Source Note

Serious proposal. Not a deployment claim.

The value of this page is not that it pretends to be shipped work. The value is that it treats Summit like a real product problem with operational pressure, a bounded MVP, explicit safeguards, and a concrete path to a production-ready voice system.

What this case study is

A serious implementation proposal for how AI voice infrastructure, scheduling policy, review tooling, and rollout discipline could work in this environment.

What it is not

A claim that this exact system was deployed for Summit Health, or that private systems, private data, or live production metrics were available.
