Note: NHID-Clinical is an early-stage open proposal by Brianna Baynard. It is not an accredited standard or regulatory requirement.
Home/For Payers

Shadow Evaluation · 90 Days · Observe Only

For Payers: Shadow Evaluation Guide

A practical starting point for payer operations teams establishing a behavioral baseline for AI voice agent transparency — no vendor changes, no production risk.

⚠️ Important: No payer has piloted or adopted NHID-Clinical yet. You would be among the first. This is a voluntary reference model from a solo founder.

Regulatory Alignment

How NHID-Clinical v1.3 maps to CMS-0057-F, MACPAC 2026, DOJ FCA enforcement, and state AI laws.

See the regulatory alignment matrix →

What You Gain From a Pilot

Metric Today (typical) Target with NHID-Clinical
Verification latency ("are you human?") 3–5 minutes or call terminated <5 seconds
Audit effort per vendor Manual call review (hours) ~2 minutes (run test suite)
RFP disclosure language Custom per vendor One standard clause

Step-by-Step Pilot Instructions

1 Weeks 1–2: Add RFP Language

Copy and paste this exact clause into your next voice AI vendor RFP or BAA amendment:

"The vendor's AI agent SHALL produce NHID-Clinical v1.3 JSON trace logs for all B2B calls, including disclosure timestamps and opt-out handling. The payer may run the open-source conformance test suite against vendor output."

If you already have a vendor contract in place, send this clause as a formal amendment request.

2 Weeks 3–6: Request Sandbox Testing

Ask your vendor to:

  1. Clone the NHID-Clinical repository: git clone https://github.com/NHID-Clinical/NHID-Clinical.git
  2. Install dependencies: pip install -r requirements.txt
  3. Run the conformance test suite in their sandbox: python -m pytest tests/ -v
  4. Send you the resulting JSON trace logs (100 calls recommended)

📎 Template email to vendor: See below ↓

3 Weeks 7–10: Validate the Logs Yourself

Run the same test suite against the vendor's logs:

  1. Clone the repository (if you haven't already)
  2. Place the vendor's JSON logs in the traces/ folder
  3. Run python -m pytest tests/ -v — look for all tests passing
  4. Manually verify three things:
    • ✅ Disclosure timestamp appears before any NPI/member ID request
    • ✅ No deceptive artifacts (fake breathing, human-only names)
    • ✅ Opt-out requests trigger a human transfer within 2 seconds

4 Weeks 11–12: Measure the Impact

Compare two metrics before and after vendor implementation:

  • Average verification latency — time from call start to disclosure (target: <5 seconds)
  • Escalation call volume — calls requiring human intervention due to identity uncertainty (target: >30% reduction)

If you don't have these metrics today, start collecting them now.

5 Decide on Next Steps

Based on results:

  • All tests passed + metrics improved: Consider requiring NHID-Clinical conformance in all vendor contracts.
  • Tests failed or metrics unchanged: Work with vendor on remediation, or disqualify them from future bids.
Production readiness note: v1.3 (current) covers disclosure and audit — suitable for shadow pilots. v2 (August 2026) adds cryptographic agent identity and delegation — required for production deployments where you need to verify the agent is actually authorized by the provider it claims to represent. Read the v2 roadmap →

Why this matters: NPIs are public. Any AI can look up a provider's NPI in the NPPES registry and use it on a call. v1.3 tells you the caller was automated; it does not tell you the caller was actually authorized by that provider. That gap is real, it's easy to exploit, and it's what NHID-Auth (v2) is designed to close.

Evaluation resources

Shadow Evaluation Guide → — Step-by-step 90-day process for establishing a behavioral baseline.

Evidence Pack → — System behavior guarantees, anonymized failure trace example, and audit readiness model.

Regulatory alignment matrix → — How v1.3 maps to CMS-0057-F, MACPAC, and DOJ FCA enforcement.

Get involved

Read the proposal and share your reaction.

Whether you think it is right, wrong, incomplete, or misses the real problem — that feedback is what shapes the next version.

Join the Discussion →