For Payers: Shadow Evaluation Guide

Establish a behavioral baseline for AI voice agent transparency — zero vendor changes, zero production risk. Run in parallel with live traffic; observe, collect evidence, decide what to require later.

Figure: Impersonation latency vs. verified disclosure pathway. The diagram contrasts unverified impersonation latency with the verified NHID-Clinical trust pathway: early disclosure, pre-data checkpoint, escalation, and sealed audit.

Start here — a pilot in three steps

The Tier 0 Shadow Pilot Kit runs on your own call logs. Observe-only, no vendor changes, usable numbers in 2–4 weeks.

1 Get the kit

Download the Tier 0 Shadow Pilot Kit — a minimal event schema, a measurement script, and a report template.

2 Run it on your logs

Map a sample of your call records to the minimal event schema, then run measure_pilot.py. Nothing touches production.
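As a rough illustration of the mapping step, the sketch below flattens one call record into timestamped events. The kit's actual event schema is not reproduced on this page, so the input layout and the output field names (`call_id`, `event`, `speaker`, `t_offset_s`) are assumptions for illustration only; check them against the schema shipped in the kit.

```python
import json
from datetime import datetime

# Hypothetical input: one row from a call-log export. These field names are
# illustrative and are NOT the kit's real schema.
RAW_CALL = {
    "id": "call-0001",
    "started_at": "2025-01-15T14:00:00+00:00",
    "turns": [
        {"at": "2025-01-15T14:00:02+00:00", "speaker": "agent", "tag": "greeting"},
        {"at": "2025-01-15T14:00:04+00:00", "speaker": "agent", "tag": "ai_disclosure"},
        {"at": "2025-01-15T14:00:21+00:00", "speaker": "agent", "tag": "member_id_shared"},
    ],
}


def to_events(call: dict) -> list[dict]:
    """Flatten a call record into events with offsets from call start."""
    start = datetime.fromisoformat(call["started_at"])
    events = []
    for turn in call["turns"]:
        offset = (datetime.fromisoformat(turn["at"]) - start).total_seconds()
        events.append(
            {
                "call_id": call["id"],
                "event": turn["tag"],
                "speaker": turn["speaker"],
                "t_offset_s": offset,  # seconds since the call started
            }
        )
    return events


if __name__ == "__main__":
    # One JSON object per line, a common shape for event logs.
    for ev in to_events(RAW_CALL):
        print(json.dumps(ev))
```

Storing offsets from call start rather than wall-clock times keeps the events comparable across calls and avoids carrying time-zone details into the analysis.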

3 Read your numbers

You get impersonation-latency and disclosure metrics on your own traffic in a ready-to-share report. Use it to decide what to require of vendors.
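To make the metrics concrete, here is a minimal sketch of how such numbers could be computed. It assumes that impersonation latency means the seconds from call start until the agent first discloses that it is an AI; that definition, the event names, and the sample data are all hypothetical, and `measure_pilot.py` may compute things differently.

```python
from statistics import median
from typing import Optional

# Hypothetical events per call: (event_name, seconds_from_call_start).
CALLS = {
    "call-0001": [("greeting", 2.0), ("ai_disclosure", 4.0), ("member_id_shared", 21.0)],
    "call-0002": [("greeting", 1.5), ("member_id_shared", 12.0), ("ai_disclosure", 95.0)],
    "call-0003": [("greeting", 2.5), ("member_id_shared", 15.0)],  # never disclosed
}


def first_offset(events: list, name: str) -> Optional[float]:
    """Earliest offset of the named event, or None if it never occurred."""
    offsets = [t for ev, t in events if ev == name]
    return min(offsets) if offsets else None


def summarize(calls: dict) -> dict:
    """Compute disclosure rate, median latency, and disclosure-before-ID rate."""
    latencies = []
    disclosed_first = 0
    for events in calls.values():
        disclosed = first_offset(events, "ai_disclosure")
        shared = first_offset(events, "member_id_shared")
        if disclosed is None:
            continue  # counts against the disclosure rate
        latencies.append(disclosed)
        if shared is None or disclosed < shared:
            disclosed_first += 1
    return {
        "calls": len(calls),
        "disclosure_rate": len(latencies) / len(calls),
        "median_latency_s": median(latencies) if latencies else None,
        "disclosed_before_id_rate": disclosed_first / len(calls),
    }


if __name__ == "__main__":
    print(summarize(CALLS))
```

Calls with no disclosure are excluded from the median but still count in the denominators, so a vendor cannot improve its latency figure simply by never disclosing.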

See the full shadow-evaluation guide →

Important: No payer has adopted NHID-Clinical in production yet. It is a voluntary reference model, not an accredited standard, certification, or regulatory requirement.

What You Gain

Metric	Today (typical)	Target with NHID-Clinical
Verification latency	3–5 min or call terminated	< 5 seconds
Audit effort per vendor	Manual call review (hours)	~2 minutes (test suite)
RFP disclosure language	Custom per vendor	One standard clause
Escalation response time	Untracked	≤ 2 seconds, logged

How the 90-Day Shadow Evaluation Works

1 Weeks 1–2 — Add RFP Language

Insert this clause into your next voice AI vendor RFP or BAA amendment:

"The vendor's AI agent SHALL produce NHID-Clinical v1.3 JSON trace logs for all B2B administrative calls, including disclosure timestamps and opt-out handling. The payer may run the open-source conformance test suite against vendor output at any time."

Existing contract? Send as a formal amendment request.

2 Weeks 3–6 — Vendor Sandbox Testing

Ask your vendor to run the open-source suite — results in under 5 minutes.

git clone https://github.com/NHID-Clinical/NHID-Clinical.git
cd NHID-Clinical
pip install -r requirements.txt
python -m pytest tests/ -v

Then have the vendor send you the full terminal output, plus sample traces if they are willing to share them.
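If the vendor would rather hand over a file than paste terminal text, a small wrapper can capture the run. This is a sketch only: `run_and_save` and the output file name `pytest_output.txt` are hypothetical and not part of the NHID-Clinical repository.

```python
import subprocess
import sys
from pathlib import Path


def run_and_save(cmd: list[str], out_path: str) -> int:
    """Run cmd, write combined stdout and stderr to out_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like a terminal
        text=True,
    )
    Path(out_path).write_text(result.stdout, encoding="utf-8")
    return result.returncode


if __name__ == "__main__":
    # Same pytest invocation as above, run with the current interpreter.
    code = run_and_save(
        [sys.executable, "-m", "pytest", "tests/", "-v"], "pytest_output.txt"
    )
    print(f"pytest exited with code {code}; output saved to pytest_output.txt")
```

An exit code of 0 from pytest means every collected test passed; any other value is worth asking the vendor about.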

3 Weeks 7–10 — Validate Logs Yourself

Place vendor JSON traces in traces/
Run python -m pytest tests/ -v
Verify three things: disclosure occurs before any NPI or member ID is exchanged; no deceptive artifacts are present; escalation happens within 2 seconds of a request
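For a spot check outside the test suite, the ordering and timing checks can be sketched as below. The trace layout and event names (`ai_disclosure`, `identifier_shared`, `escalation_requested`, `escalation_started`) are assumptions, since the NHID-Clinical v1.3 trace format is not shown on this page. The "no deceptive artifacts" check is not sketched here because it depends on how the specification defines such artifacts.

```python
# Hypothetical trace: events with seconds from call start. The real
# NHID-Clinical v1.3 trace format may differ.
TRACE = [
    {"type": "ai_disclosure", "t": 3.0},
    {"type": "identifier_shared", "field": "npi", "t": 18.0},
    {"type": "escalation_requested", "t": 40.0},
    {"type": "escalation_started", "t": 41.2},
]

MAX_ESCALATION_DELAY_S = 2.0  # from the "escalation ≤ 2s" check above


def first_time(trace, event_type):
    """Earliest timestamp of the given event type, or None if absent."""
    times = [e["t"] for e in trace if e["type"] == event_type]
    return min(times) if times else None


def check_trace(trace) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures = []

    disclosed = first_time(trace, "ai_disclosure")
    shared = first_time(trace, "identifier_shared")
    if disclosed is None:
        failures.append("no AI disclosure recorded")
    elif shared is not None and disclosed >= shared:
        failures.append("identifier exchanged before disclosure")

    requested = first_time(trace, "escalation_requested")
    if requested is not None:
        started = [
            e["t"]
            for e in trace
            if e["type"] == "escalation_started" and e["t"] >= requested
        ]
        if not started:
            failures.append("escalation requested but never started")
        elif min(started) - requested > MAX_ESCALATION_DELAY_S:
            failures.append("escalation took longer than 2 seconds")

    return failures


if __name__ == "__main__":
    print(check_trace(TRACE) or "all checks passed")
```

Returning a list of failures rather than a single boolean makes it easier to tell a vendor exactly which behavior needs remediation.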

4 Weeks 11–12 — Measure Impact

Verification latency (target < 5s)
Escalation volume from identity uncertainty (target > 30% reduction)
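The two targets reduce to simple arithmetic. The sketch below uses made-up numbers (400 baseline escalations, 260 during the pilot, 4.2 seconds verification latency) purely to show the calculation; substitute your own measurements.

```python
LATENCY_TARGET_S = 5.0      # verification latency target: under 5 seconds
REDUCTION_TARGET_PCT = 30.0  # escalation reduction target: over 30%


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from before to after; negative if volume went up."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


def meets_targets(latency_s: float, reduction_pct: float) -> bool:
    """True only if both targets from the list above are met."""
    return latency_s < LATENCY_TARGET_S and reduction_pct > REDUCTION_TARGET_PCT


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    baseline_escalations = 400
    pilot_escalations = 260
    pilot_latency_s = 4.2

    reduction = percent_reduction(baseline_escalations, pilot_escalations)
    print(f"escalation reduction: {reduction:.1f}%")
    print(f"targets met: {meets_targets(pilot_latency_s, reduction)}")
```

For the comparison to be meaningful, the baseline and pilot periods should be of equal length and cover similar call volumes.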

5 Decide Next Steps

Tests pass and metrics improve → consider requiring conformance in future contracts
Tests fail or metrics stay flat → request remediation or disqualify the vendor from future bids

v1.3 vs. v2: v1.3 covers disclosure behavior and audit trails — suitable for shadow evaluations today. v2 adds cryptographic agent identity (NHID-Auth v2) for when you must verify the agent was authorized by the provider it claims to represent. NPIs are public; v1.3 alone does not prove authorization. Read the v2 roadmap →

Evaluation Resources

Executive Brief: one-page overview for leadership and procurement
Shadow Evaluation Guide: full 90-day process
Evidence Pack: guarantees, failure trace, audit model
Regulatory Alignment: CMS-0057-F, MACPAC, DOJ FCA, state AI laws
Reference demonstration line: hear the disclosure controls in a real call

Get involved

Read the specification. Share what you think.

Whether it is right, wrong, incomplete, or misses the real problem — that feedback shapes the next version.

Start a pilot →
