Note: NHID-Clinical is an early-stage open proposal by Brianna Baynard. It is not an accredited standard or regulatory requirement.

For Enterprise & Procurement Teams · v1.3 · June 2026

Evidence Pack

This pack summarizes the system behavior guarantees, walks through a worked failure trace, and describes the audit readiness model of the NHID-Clinical reference implementation. It is written for teams evaluating adoption.

Scope of this document: This describes the reference implementation's technical properties. NHID-Clinical does not issue certifications, conduct audits, or validate vendor implementations. Everything here is self-attested and open for community review.

1. System Behavior Guarantees

Deterministic output

Identical input event + identical policy version → identical trace output. The policy engine is a pure function with no side effects. Output does not depend on wall-clock time, process state, or external calls.
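
As an illustration of the pure-function property, here is a minimal sketch of an IDG-01 check. The `Event` type, the `evaluate` signature, and the `CONTINUE` action name are assumptions made for this example; they are not taken from the reference implementation.

```python
from dataclasses import dataclass

POLICY_VERSION = "nhid-clinical-v1.3"


@dataclass(frozen=True)
class Event:
    """Hypothetical, immutable view of the fields IDG-01 needs."""
    turn_count: int
    disclosure_confirmed: bool


def evaluate(event: Event, policy_version: str) -> dict:
    """Return a policy decision that depends only on the arguments.

    No clock reads, no I/O, no module-level mutable state, so the same
    (event, policy_version) pair always yields the same dict.
    """
    if event.turn_count == 0 and not event.disclosure_confirmed:
        return {
            "policy_version": policy_version,
            "rule": "IDG-01",
            "trigger": "FIRST_TURN_NO_DISCLOSURE",
            "action": "DISCLOSE_IDENTITY",
        }
    # "CONTINUE" is a placeholder action name for the no-violation case.
    return {
        "policy_version": policy_version,
        "rule": "IDG-01",
        "trigger": None,
        "action": "CONTINUE",
    }
```

Calling `evaluate(Event(0, False), POLICY_VERSION)` twice returns equal dictionaries, which is the property the determinism guarantee relies on.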

Replay guarantee

Any stored trace can be replayed. The policy version is embedded in each event header. Replay with a different policy version is detected and flagged as a mismatch.
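
A replay check along these lines could look as follows. This is a sketch under stated assumptions: the stored record layout (`header`, `input`, `trace`), the toy `evaluate` function, and the `PASS` / `MISMATCH` status names are illustrative, not the reference implementation's actual schema.

```python
ENGINE_VERSION = "nhid-clinical-v1.3"


def evaluate(payload: dict) -> dict:
    """Toy stand-in for the policy engine (IDG-01 only)."""
    undisclosed = payload["turn_count"] == 0 and not payload["disclosure_confirmed"]
    return {"action": "DISCLOSE_IDENTITY" if undisclosed else "CONTINUE"}


def replay(stored: dict, engine_version: str = ENGINE_VERSION) -> dict:
    """Re-run a stored event and compare against the recorded trace."""
    recorded = stored["header"]["policy_version"]
    if recorded != engine_version:
        # Never re-evaluate under a different policy: flag and stop.
        return {"status": "MISMATCH", "recorded": recorded, "engine": engine_version}
    fresh = evaluate(stored["input"])
    status = "PASS" if fresh == stored["trace"] else "FAIL"
    return {"status": status, "trace": fresh}
```

Checking the version before re-evaluating means a policy upgrade can never silently rewrite the outcome of an old session.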

Failure invariants

The engine never raises an unhandled exception. Invalid or malformed input returns a deterministic error trace (action: LOG_ONLY, recoverable: true). The caller always gets a response.
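
One way to obtain this invariant is a thin wrapper around the engine. The sketch below assumes a toy `evaluate` function and a hypothetical `error` field; only `action: LOG_ONLY` and `recoverable: true` come from the description above.

```python
def evaluate(event: dict) -> dict:
    """Toy engine: raises KeyError or TypeError on malformed input."""
    undisclosed = event["turn_count"] == 0 and not event["disclosure_confirmed"]
    return {"action": "DISCLOSE_IDENTITY" if undisclosed else "CONTINUE"}


def safe_evaluate(event) -> dict:
    """Always return a trace; never let an exception reach the caller."""
    try:
        return evaluate(event)
    except Exception as exc:
        # Record the exception class only. Messages can carry
        # input-dependent or environment-dependent text.
        return {
            "action": "LOG_ONLY",
            "recoverable": True,
            "error": type(exc).__name__,
        }
```

With this wrapper, `safe_evaluate({})` and `safe_evaluate(None)` both return a `LOG_ONLY` trace instead of raising.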

Idempotency

Submitting the same request_id twice produces the same policy decision. The event store deduplicates on request_id at the PERSIST stage.
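
With a SQLite event store, deduplication at PERSIST can be sketched with a primary key on request_id and `INSERT OR IGNORE`. The table name, column names, and function names below are assumptions for illustration.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the event store; request_id is the deduplication key."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "request_id TEXT PRIMARY KEY, trace TEXT NOT NULL)"
    )
    return conn


def persist(conn: sqlite3.Connection, request_id: str, trace: dict) -> dict:
    """Write the trace once and return whatever is stored for request_id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO events (request_id, trace) VALUES (?, ?)",
            (request_id, json.dumps(trace, sort_keys=True)),
        )
    row = conn.execute(
        "SELECT trace FROM events WHERE request_id = ?", (request_id,)
    ).fetchone()
    return json.loads(row[0])
```

A second call with the same `request_id` leaves the table unchanged and returns the first stored trace, so the caller sees the same decision both times.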

2. Anonymized Failure Trace Example

The example below is synthetic — constructed from observed behavior patterns, with all identifying information removed. It shows what an IDG-01 (late disclosure) violation looks like in the NHID-Clinical audit trace, and what a payer auditor would see when reviewing it.

Anonymized Failure Trace — IDG-01 Violation
Source: Synthetic example based on observed behavior patterns. No real PHI, no real provider data.
Generated: 2026-06-07 | Policy version: nhid-clinical-v1.3 | Correlation ID: [REDACTED]

t=00:00.000  INGEST      POST /voice/process received
             session_id: [REDACTED]
             call_sid:   [REDACTED]
             caller_type: ai_agent

t=00:00.084  VALIDATE    SpeechResult normalized
             turn_count: 0
             content_hash: [REDACTED]

t=00:00.091  STATE       Session reconstructed
             turn_count: 0
             disclosure_timestamp: null
             disclosure_confirmed: false

t=00:00.098  POLICY      IDG-01 evaluated
             rule: "Disclose AI identity before any data exchange"
             turn_count: 0
             disclosure_confirmed: false
             trigger: FIRST_TURN_NO_DISCLOSURE
             action: DISCLOSE_IDENTITY
── Violation recorded ──────────────────────────────────────────────
t=00:00.103  VIOLATION   IDG-01
             severity: critical
             message: "AI identity not disclosed at call start"
             action_taken: DISCLOSE_IDENTITY (forced)
             data_exchanged_before_disclosure: false
             recoverable: true
────────────────────────────────────────────────────────────────────
t=00:00.109  EXEC        TwiML rendered — forced disclosure statement
             text: "This call is being handled by an automated system on behalf of [Provider Name Redacted]."
             disclosure_forced: true

t=00:00.114  PERSIST     Event written
             disclosure_timestamp: 00:00.109
             boundary_violations: ["IDG-01"]
             partial_failure: true
             deterministic_hash: [REDACTED]

── What this means ─────────────────────────────────────────────────
The AI agent did not disclose its automated nature at call start.
The policy engine detected a turn_count=0 exchange with no prior
disclosure and forced a disclosure statement before any data could
be shared. The violation is logged as critical but recoverable.
A payer auditing this session would see:
  - disclosure_timestamp set 109ms into the call (forced, not voluntary)
  - partial_failure: true
  - boundary_violations: ["IDG-01"]
────────────────────────────────────────────────────────────────────
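
The auditor view above can be derived mechanically from the persisted fields. The sketch below assumes that offsets use the `mm:ss.mmm` format shown in the trace and that `disclosure_forced` is available on the same record; the function names are hypothetical.

```python
def offset_ms(stamp: str) -> int:
    """Convert a trace offset such as '00:00.109' to milliseconds."""
    minutes, seconds = stamp.split(":")
    whole, millis = seconds.split(".")
    return (int(minutes) * 60 + int(whole)) * 1000 + int(millis)


def auditor_summary(event: dict) -> dict:
    """Reduce a persisted event to the fields a payer auditor reviews."""
    return {
        "disclosure_offset_ms": offset_ms(event["disclosure_timestamp"]),
        "disclosure_forced": event.get("disclosure_forced", False),
        "partial_failure": event["partial_failure"],
        "boundary_violations": list(event["boundary_violations"]),
    }
```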

3. Failure & Attack Simulation Coverage

The failure injection harness covers the following scenarios:

Scenario                                  | Expected behavior
Empty SpeechResult                        | Policy evaluated, event written, no 500
Null bytes in input                       | Sanitized before engine, sanitized text stored
Missing CallSid (session binding failure) | 400 returned, no event written, structured error body
Late disclosure (IDG-01 + PDX-01)         | DENY_DATA action, 2 critical violations logged
Escalation path unavailable (EIT-01)      | ESCALATE_HUMAN with TwiML fallback, violation logged
Deceptive artifact (DBC-01)               | LOG_ONLY, partial_failure=true, session continues
Missing audit fields (ATR-01)             | Violation logged, pipeline continues, gap recorded
Bot-to-bot, undisclosed agent             | DENY_DATA, stricter gate applied for ai_agent counterparty
Replay with external_calls_cached=false   | Divergence detected, ATR-01 violation, replay flagged FAIL
Duplicate request_id (idempotency)        | Identical trace returned, no duplicate event written
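
A failure injection harness for the first three scenarios might be sketched as below. The `handle` function is a toy stand-in for the request handler, and its response fields (`error`, `event_written`, `sanitized_text`) are assumptions; the real harness and endpoint may differ.

```python
def handle(form: dict) -> tuple[int, dict]:
    """Toy handler: returns (HTTP status, structured body)."""
    if not form.get("CallSid"):
        # Session binding failure: reject without writing an event.
        return 400, {"error": "missing_call_sid", "event_written": False}
    # Strip null bytes before the text reaches the policy engine.
    text = (form.get("SpeechResult") or "").replace("\x00", "")
    return 200, {"sanitized_text": text, "event_written": True}


# (scenario name, injected input, expected status)
SCENARIOS = [
    ("empty SpeechResult", {"CallSid": "CA-test", "SpeechResult": ""}, 200),
    ("null bytes in input", {"CallSid": "CA-test", "SpeechResult": "he\x00llo"}, 200),
    ("missing CallSid", {"SpeechResult": "hello"}, 400),
]


def run_harness() -> dict:
    """Run every scenario and report whether the status matched."""
    results = {}
    for name, form, expected_status in SCENARIOS:
        status, _body = handle(form)
        results[name] = status == expected_status
    return results
```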

4. Audit Readiness Model

An external auditor reconstructing a session from the event store can determine:

  • When the call started and when the first disclosure statement was made
  • Whether disclosure preceded any PHI or credential exchange
  • Whether opt-out or escalation was requested and how it was handled
  • Which policy engine version processed each event
  • Whether any partial failures or boundary violations were recorded

Example correlation ID lifecycle:

correlation_id: "auth-2026-05-26-001"

t=00:00.000  INGEST     POST /voice/process received
t=00:00.123  VALIDATE   SpeechResult normalized
t=00:00.131  STATE      Session reconstructed: turn_count=0, disclosure=null
t=00:00.140  POLICY     IDG-01: DISCLOSE_IDENTITY triggered (turn_count=0)
t=00:00.145  EXEC       TwiML disclosure message rendered
t=00:00.152  PERSIST    Event written — disclosure_timestamp set
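
The auditor questions listed in this section can be answered by a single pass over a session's events. The sketch below assumes each event is a dictionary with hypothetical `t_ms`, `disclosure`, `data_exchanged`, `policy_version`, and `boundary_violations` keys; the actual event schema may differ.

```python
def audit_session(events: list[dict]) -> dict:
    """Summarize one session for an external auditor."""
    ordered = sorted(events, key=lambda e: e["t_ms"])
    disclosure = next((e["t_ms"] for e in ordered if e.get("disclosure")), None)
    first_data = next((e["t_ms"] for e in ordered if e.get("data_exchanged")), None)
    return {
        "call_started_ms": ordered[0]["t_ms"] if ordered else None,
        "first_disclosure_ms": disclosure,
        # True only if a disclosure exists and no data exchange came before it.
        "disclosure_preceded_data": disclosure is not None
        and (first_data is None or disclosure < first_data),
        "policy_versions": sorted({e["policy_version"] for e in ordered}),
        "violations": sorted(
            {v for e in ordered for v in e.get("boundary_violations", [])}
        ),
    }
```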

5. Architecture & Scale Notes

Current reference implementation

FastAPI + SQLite event store. Stateless policy engine. Suitable for development and self-validation. Not load-tested for production at scale.

Path to distributed event store

Replace SQLite with Kafka or S3-backed event log. Policy engine is stateless and horizontally scalable — no changes required to the core engine.
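
The swap is easiest when callers depend on a narrow store interface. The sketch below is one possible shape: the `EventStore` protocol, the class names, and the method names are assumptions, and `InMemoryLog` only stands in for a Kafka- or S3-backed log without using either client library.

```python
import json
import sqlite3
from typing import Protocol


class EventStore(Protocol):
    """Minimal append-only interface the pipeline depends on."""
    def append(self, event: dict) -> None: ...
    def read_all(self) -> list[dict]: ...


class SQLiteStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )

    def append(self, event: dict) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT INTO events (body) VALUES (?)",
                (json.dumps(event, sort_keys=True),),
            )

    def read_all(self) -> list[dict]:
        rows = self._conn.execute("SELECT body FROM events ORDER BY id")
        return [json.loads(row[0]) for row in rows]


class InMemoryLog:
    """Placeholder for a distributed log; same interface, different backend."""
    def __init__(self) -> None:
        self._events: list[dict] = []

    def append(self, event: dict) -> None:
        self._events.append(dict(event))

    def read_all(self) -> list[dict]:
        return list(self._events)


def record_decision(store: EventStore, event: dict) -> None:
    """Pipeline code sees only the interface, never the backend."""
    store.append(event)
```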

Replay preservation

Store input payload + policy version with each event. Any node can replay from the event store. Policy version change detection prevents silent audit corruption.

6. Risk Register

Risk                                      | Mitigation
Timestamps break exact replay             | Hash computed over non-timestamp fields only; deterministic_hash excludes wall-clock values
Policy engine version change between runs | Policy version embedded in every event; replay rejects version mismatches
JSON key ordering variance                | Canonical JSON (sorted keys) enforced before hashing
LLM re-invocation during replay           | JSON Schema if/then enforces external_calls_cached=true when replay_mode=cached
partial_failure accumulation undetected   | boundary_violations[] written per event; partial_failure rate trackable across sessions
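
The first and third mitigations can be combined in one hashing helper. This is a sketch only: SHA-256, the `TIMESTAMP_FIELDS` set, and the `received_at` field name are assumptions, and the filter covers top-level keys only.

```python
import hashlib
import json

# Hypothetical set of wall-clock fields excluded from the hash.
TIMESTAMP_FIELDS = {"disclosure_timestamp", "received_at"}


def deterministic_hash(event: dict) -> str:
    """Hash an event over its non-timestamp fields in canonical JSON form."""
    stable = {k: v for k, v in event.items() if k not in TIMESTAMP_FIELDS}
    # sort_keys plus fixed separators removes key-order and whitespace variance.
    canonical = json.dumps(
        stable, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two events that differ only in key order or in excluded timestamp values hash identically, while any change to a policy-relevant field changes the digest.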

7. One-Page Architecture Summary

What it is: A lightweight service with a stateless policy engine that evaluates and logs AI voice agent disclosure behavior. Input: call events from Twilio or equivalent. Output: a tamper-evident, deterministically reproducible trace with the policy decision and any boundary violations.

What it is not: A caller identity verifier, a certification body, or a compliance guarantor. Adoption does not confer HIPAA or TCPA compliance.

Event flow:

[AI Voice Agent] → INGEST → VALIDATE → STATE → POLICY → EXEC → PERSIST
                                                                  ↓
                                                            [Event Store]
                                                                  ↓
                                                      [Auditor / Payer System]
