Shadow Evaluation · 90 Days · Observe Only
For Payers: Shadow Evaluation Guide
A practical starting point for payer operations teams establishing a behavioral baseline for AI voice agent transparency — no vendor changes, no production risk.
Regulatory Alignment
How NHID-Clinical v1.3 maps to CMS-0057-F, MACPAC 2026, DOJ FCA enforcement, and state AI laws.
What You Gain From a Pilot
| Metric | Today (typical) | Target with NHID-Clinical |
|---|---|---|
| Verification latency ("are you human?") | 3–5 minutes or call terminated | <5 seconds |
| Audit effort per vendor | Manual call review (hours) | ~2 minutes (run test suite) |
| RFP disclosure language | Custom per vendor | One standard clause |
Step-by-Step Pilot Instructions
Step 1 · Weeks 1–2: Add RFP Language
Copy and paste this exact clause into your next voice AI vendor RFP or BAA amendment:
"The vendor's AI agent SHALL produce NHID-Clinical v1.3 JSON trace logs for all B2B calls, including disclosure timestamps and opt-out handling. The payer may run the open-source conformance test suite against vendor output."
If you already have a vendor contract in place, send this clause as a formal amendment request.
Step 2 · Weeks 3–6: Request Sandbox Testing
Ask your vendor to:
- Clone the NHID-Clinical repository: `git clone https://github.com/NHID-Clinical/NHID-Clinical.git`
- Install dependencies from the repository root: `cd NHID-Clinical && pip install -r requirements.txt`
- Run the conformance test suite in their sandbox: `python -m pytest tests/ -v`
- Send you the resulting JSON trace logs (100 calls recommended)
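Once the logs arrive, a quick intake check can catch a short or malformed bundle before anyone runs the full suite. The Python sketch below assumes, hypothetically, that the vendor delivers one `.json` file per call in a `traces/` folder. `check_trace_texts` and `check_trace_folder` are illustrative names, not part of the NHID-Clinical repository, and the conformance suite remains the authoritative test.

```python
import json
from pathlib import Path


def check_trace_texts(traces: dict[str, str], expected_calls: int = 100) -> list[str]:
    """Check raw trace texts keyed by file name; return a list of problems.

    Hypothetical intake check: it only confirms the file count and that each
    file parses as JSON. It does not validate the NHID-Clinical schema.
    """
    problems = []
    if len(traces) < expected_calls:
        problems.append(f"expected {expected_calls} trace files, found {len(traces)}")
    for name, text in sorted(traces.items()):
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: not valid JSON ({exc.msg})")
    return problems


def check_trace_folder(folder: str = "traces", expected_calls: int = 100) -> list[str]:
    """Read every *.json file in `folder` (assumed: one file per call) and check it."""
    root = Path(folder)
    if not root.is_dir():
        return [f"{folder}: folder not found"]
    traces = {p.name: p.read_text(encoding="utf-8") for p in root.glob("*.json")}
    return check_trace_texts(traces, expected_calls)
```

`check_trace_folder("traces")` returns an empty list when at least 100 files are present and every one parses as JSON; anything else is a list of items to send back to the vendor.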
📎 Template email to vendor: See below ↓
Step 3 · Weeks 7–10: Validate the Logs Yourself
Run the same test suite against the vendor's logs:
- Clone the repository (if you haven't already)
- Place the vendor's JSON logs in the `traces/` folder
- Run `python -m pytest tests/ -v` and confirm that all tests pass
- Manually verify three things:
  - ✅ Disclosure timestamp appears before any NPI or member ID request
  - ✅ No deceptive artifacts (fake breathing, human-only names)
  - ✅ Opt-out requests trigger a human transfer within 2 seconds
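The first and third manual checks are timing rules, so they can also be expressed as code. The sketch below assumes a hypothetical event list in which each entry carries a time in seconds since call start and an event kind such as `disclosure`, `npi_request`, `member_id_request`, `opt_out`, or `human_transfer`. These names are not taken from the v1.3 schema, so map them to the repository's actual trace fields before relying on the result. The deceptive-artifact check is left to human review.

```python
from dataclasses import dataclass

# Hypothetical event kinds; map these to the real v1.3 trace fields.
IDENTIFIER_REQUESTS = {"npi_request", "member_id_request"}


@dataclass
class Event:
    t: float   # seconds since call start (hypothetical field)
    kind: str  # e.g. "disclosure", "npi_request", "opt_out", "human_transfer"


def disclosure_precedes_identifier_requests(events: list[Event]) -> bool:
    """Check 1: a disclosure occurs before the first NPI or member ID request."""
    disclosures = [e.t for e in events if e.kind == "disclosure"]
    requests = [e.t for e in events if e.kind in IDENTIFIER_REQUESTS]
    if not requests:
        return True  # no identifier was requested on this call
    return bool(disclosures) and min(disclosures) < min(requests)


def opt_outs_transferred(events: list[Event], max_delay_s: float = 2.0) -> bool:
    """Check 3: every opt-out is followed by a human transfer within max_delay_s."""
    transfers = [e.t for e in events if e.kind == "human_transfer"]
    return all(
        any(0 <= t - e.t <= max_delay_s for t in transfers)
        for e in events
        if e.kind == "opt_out"
    )
```

Comparing only the earliest disclosure with the earliest identifier request is enough for the first rule: if a disclosure precedes the first request, it precedes all of them.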
Step 4 · Weeks 11–12: Measure the Impact
Compare two metrics before and after vendor implementation:
- Average verification latency — time from call start to disclosure (target: <5 seconds)
- Escalation call volume — calls requiring human intervention due to identity uncertainty (target: >30% reduction)
If you don't have these metrics today, start collecting them now.
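Both metrics reduce to simple arithmetic over call records. A minimal sketch, assuming you already have per-call disclosure latencies in seconds and escalation counts for comparable periods before and after implementation; the function names are illustrative, and the threshold defaults are the targets stated above.

```python
from statistics import mean


def average_latency_s(latencies_s: list[float]) -> float:
    """Mean time from call start to disclosure, in seconds."""
    return mean(latencies_s)  # raises StatisticsError on an empty list


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop in escalation call volume; negative means an increase."""
    if before <= 0:
        raise ValueError("baseline escalation volume must be positive")
    return 100.0 * (before - after) / before


def meets_targets(
    latencies_s: list[float],
    escalations_before: int,
    escalations_after: int,
    latency_target_s: float = 5.0,       # "<5 seconds" target
    reduction_target_pct: float = 30.0,  # ">30% reduction" target
) -> bool:
    """True only if both pilot targets are met."""
    return (
        average_latency_s(latencies_s) < latency_target_s
        and percent_reduction(escalations_before, escalations_after) > reduction_target_pct
    )
```

As an illustrative example, a drop from 200 to 130 escalated calls is a 35% reduction and clears the target, while 200 to 150 (25%) does not. If total call volume differs between the two periods, compare escalation rates rather than raw counts.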
Step 5 · Decide on Next Steps
Based on results:
- All tests passed + metrics improved: Consider requiring NHID-Clinical conformance in all vendor contracts.
- Tests failed or metrics unchanged: Work with vendor on remediation, or disqualify them from future bids.
Why this matters: NPIs are public. Any AI can look up a provider's NPI in the NPPES registry and use it on a call. v1.3 tells you the caller was automated; it does not tell you the caller was actually authorized by that provider. That gap is real, it's easy to exploit, and it's what NHID-Auth (v2) is designed to close.
Evaluation resources
- Shadow Evaluation Guide: step-by-step 90-day process for establishing a behavioral baseline.
- Evidence Pack: system behavior guarantees, an anonymized failure trace example, and an audit readiness model.
- Regulatory alignment matrix: how v1.3 maps to CMS-0057-F, MACPAC, and DOJ FCA enforcement.
Get involved
Read the proposal and share your reaction.
Whether you think it is right, wrong, incomplete, or misses the real problem — that feedback is what shapes the next version.