One deployment, seven phases.
A single AI voice agent traced end to end through the INTENT Framework: first notice of loss (FNOL) intake at a regional insurance carrier. Live, customer-facing, emotionally loaded, regulated. Every phase below shows the same three-way split between the model, code, and humans.
Diamonds (◇) mark gates: evidence is required to pass.
The running example
A customer calls after a car accident. The agent verifies identity, collects incident details, screens for injury, opens the claim, and schedules an adjuster callback. The Intent Contract target: complete FNOL data in under 6 minutes for 85% of calls. The business metric: after-hours abandonment, currently 41%.
A composite deployment for illustration, not a client case study.
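The contract target above is a threshold that code can compute directly. A minimal sketch, assuming a hypothetical `CallRecord` with a capture duration and a completeness flag; these names are illustrative and not part of the INTENT Framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical per-call record; field names are assumptions."""
    capture_seconds: float  # time taken to collect the FNOL data
    data_complete: bool     # whether all required FNOL fields were captured


# Values taken from the running example's Intent Contract target.
MAX_CAPTURE_SECONDS = 6 * 60
REQUIRED_SHARE = 0.85


def meets_target(calls: list[CallRecord]) -> bool:
    """True if at least 85% of calls captured complete data in under 6 minutes."""
    if not calls:
        return False  # no evidence, no pass
    on_target = sum(
        1 for c in calls
        if c.data_complete and c.capture_seconds < MAX_CAPTURE_SECONDS
    )
    return on_target / len(calls) >= REQUIRED_SHARE
```

Treating an empty call list as a failure is a design choice in this sketch: a target with no supporting data is not counted as met.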
00 / 06 · optional phase
DISCOVER
Is this problem real, and is it worth a contract?
The carrier suspects FNOL intake is a problem but has not quantified it. An LLM reads six months of call center transcripts and abandonment logs, work no human will ever do at this scale. The pipeline that feeds it and the aggregation that follows are deterministic code, so the business case is built from math, not from the model's impressions.
The split: the model classifies each transcript. Code aggregates. A human looks at the numbers and decides whether this becomes a FRAME. The model never owns the go/no-go.
FNOL moment
The output that triggers everything downstream: 34% of FNOL calls arrive after hours. 41% of those abandon before reaching a human.
A claims ops lead reads that and opens a contract.
◇ Gate
DISCOVER is optional and has no formal gate. The exit is a human judgment: the quantified problem justifies writing an Intent Contract.
transcript_mining.py

```python
# CODE: deterministic pipeline. Owns iteration, sampling, storage.
metrics = []
for transcript in call_archive.query(line="FNOL", period="6mo"):
    # MODEL: semantic classification. One cheap call per transcript.
    tags = llm.classify(transcript, schema={
        "outcome": ["completed", "abandoned", "transferred"],
        "abandonment_reason": str | None,
        "after_hours": bool,
    })
    metrics.append(tags)

# CODE: aggregation. No model judgment in the numbers.
report = aggregate(metrics)

# HUMAN: reads the report. Owns the decision to open a FRAME.
```
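One way the `aggregate` step might look, as a sketch only: it assumes each entry in `metrics` is a dict matching the classification schema above, rejects anything that does not match, and computes the two headline figures with plain arithmetic. The `validate` helper and the report keys are assumptions for illustration, not the carrier's actual pipeline.

```python
VALID_OUTCOMES = {"completed", "abandoned", "transferred"}


def validate(tags: dict) -> dict:
    """Reject model output that does not match the classification schema."""
    if tags.get("outcome") not in VALID_OUTCOMES:
        raise ValueError(f"unexpected outcome: {tags.get('outcome')!r}")
    if not isinstance(tags.get("after_hours"), bool):
        raise ValueError("after_hours must be a bool")
    reason = tags.get("abandonment_reason")
    if reason is not None and not isinstance(reason, str):
        raise ValueError("abandonment_reason must be a string or None")
    return tags


def aggregate(metrics: list[dict]) -> dict:
    """Compute the after-hours share and the after-hours abandonment rate."""
    rows = [validate(m) for m in metrics]
    after_hours = [r for r in rows if r["after_hours"]]
    abandoned = [r for r in after_hours if r["outcome"] == "abandoned"]
    return {
        "total_calls": len(rows),
        "after_hours_share": len(after_hours) / len(rows) if rows else 0.0,
        "after_hours_abandonment": (
            len(abandoned) / len(after_hours) if after_hours else 0.0
        ),
    }
```

Applied to the figures in the FNOL moment, the two rates multiply: 0.34 × 0.41 ≈ 0.14, so roughly 14% of all FNOL calls are after-hours abandonments.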
What the project actually ships
Not a suite of agents. The Intent Contract scopes one thing: a single, narrowly bounded voice agent that handles FNOL intake and hits the outcome written in FRAME. Keeping the deliverable that small is part of the discipline.
One provable agent
A governed FNOL voice agent with a versioned Trust Envelope, tested escalation paths, and a Proof Report showing it meets its thresholds. Measurable against the contract: capture time, completeness, abandonment.
The governance substrate
The enforcement layer, scenario replay harness, runtime telemetry, and Constitution. Agent number two reuses roughly 80% of the Trust Envelope structure and all of the rails. The org buys the capability to ship governed agents repeatedly.
Teams that scope "agent platform" on day one end up in the cancellation statistics. Teams that scope one provable agent plus the rails get the platform anyway, as a byproduct of evidence.
The pattern across all seven phases
The model's job changes every phase. Code's job never changes: validate schemas, own state, fire triggers, enforce timeouts, compute thresholds, block gates. Humans sit exactly where judgment cannot be reduced to either.
| Phase | The model | Code | Humans |
|---|---|---|---|
| DISCOVER | Mines transcripts at scale | Aggregates the numbers | Own the go/no-go |
| FRAME | Drafts the contract | Validates schema in CI | Sign the risk tier |
| EXPLORE | Generates the plan | Checks constitution compliance | Run the Direction Check |
| BUILD | Implements at A2 | Is the enforcement layer | Review the rails line by line |
| VALIDATE | Plays attacker and judge | Asserts paths and thresholds | Resolve judge disagreements |
| SHIP | Mostly idle | Gates, canaries, rollback | Approve the Proof Report |
| LEARN | Finds novel situations | Measures spec drift | Approve new scenarios |
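The division of labor in the table can be sketched as a gate check: code computes the threshold result and blocks by default, while a named human approval is a separate required input. `GateEvidence`, its fields, and `gate_passes` are hypothetical names for illustration; the framework's actual gate mechanics are not described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateEvidence:
    """Hypothetical evidence bundle presented at a phase gate."""
    measured: float             # value computed by code, e.g. share of calls on target
    threshold: float            # minimum value written in the Intent Contract
    approved_by: Optional[str]  # human who signed off, or None if nobody has


def gate_passes(evidence: GateEvidence) -> bool:
    """Block the gate unless the threshold is met and a human has approved."""
    threshold_met = evidence.measured >= evidence.threshold
    human_signed = bool(evidence.approved_by and evidence.approved_by.strip())
    return threshold_met and human_signed
```

Neither input can substitute for the other in this sketch: a passing metric without a signature stays blocked, and a signature cannot override a failing metric.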