Confidence & HITL

Calibration, rationale, source pointers, thresholds, feedback loop, audit trail.

Confidence score semantics

Confidence scores are floats in [0, 1] and are calibrated, meaning the score corresponds to an empirical correct-rate. A 0.85 score means: across all outputs scored 0.85, approximately 85% are correct. Calibration is tenant-specific and refreshed on a regular cadence using the feedback loop.
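One way to sanity-check the calibration claim is to bin historical outputs by score and compare each bin's observed correct-rate with its mean score. The sketch below assumes you already hold `(score, was_correct)` pairs, for example derived from rep feedback. The function name and the bin width are illustrative, not part of Faction's API.

```python
from collections import defaultdict


def empirical_rate_by_bin(scored_outcomes, bin_width=0.1):
    """Group (score, was_correct) pairs into score bins.

    Returns {bin_index: (mean_score, observed_correct_rate, count)}.
    For a well-calibrated scorer, mean_score and observed_correct_rate
    should be close in every bin that holds enough samples.
    """
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for score, was_correct in scored_outcomes:
        # Clamp so that a score of exactly 1.0 lands in the top bin.
        index = min(int(score / bin_width), n_bins - 1)
        bins[index].append((score, was_correct))

    report = {}
    for index, items in sorted(bins.items()):
        mean_score = sum(score for score, _ in items) / len(items)
        correct_rate = sum(1 for _, ok in items if ok) / len(items)
        report[index] = (mean_score, correct_rate, len(items))
    return report
```

Bins with few samples give noisy rates, so the gap between the two numbers is only meaningful where the count is reasonably large.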

Calibration is performed jointly across the four modules, so a 0.85 from the Customer Matcher is comparable to a 0.85 from the Product Matcher. This comparability is what makes cross-module routing rules possible.

Score band	Default treatment
0.95 – 1.00	Auto-accept.
0.80 – 0.95	Pre-populated, rep confirms.
0.60 – 0.80	Flagged for review.
0.00 – 0.60	Routed to manual handling, rep starts from scratch.

Bands are starting points

The default bands are not final. They are tuned with the customer during UAT, based on observed accuracy and operational preferences.
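The default band table can be sketched as a simple lookup. The table does not say which band owns a boundary value such as 0.95, so this sketch assumes each lower bound is inclusive. The short treatment labels are shorthand chosen for the example.

```python
# Default bands as (inclusive lower bound, treatment) pairs, highest first.
# Assumption: a score exactly on a boundary belongs to the higher band.
DEFAULT_BANDS = [
    (0.95, "auto_accept"),  # Auto-accept.
    (0.80, "confirm"),      # Pre-populated, rep confirms.
    (0.60, "review"),       # Flagged for review.
    (0.00, "manual"),       # Routed to manual handling.
]


def default_treatment(score, bands=DEFAULT_BANDS):
    """Return the treatment label for a calibrated score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower_bound, treatment in bands:
        if score >= lower_bound:
            return treatment
    # Unreachable while the last band starts at 0.0; kept for custom bands.
    return bands[-1][1]
```

Passing a tuned `bands` list in place of `DEFAULT_BANDS` is how the bands agreed during UAT would plug into the same lookup.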

Rationale schema

Every scored output is accompanied by a rationale object:

{
  "rationale_text": "Manufacturer + part number exact match (SKF 6205-2RS); customer ordered same SKU 11 times in last 18 months.",
  "rationale_signals": [
    { "signal": "manufacturer_part_exact_match", "weight": 0.6 },
    { "signal": "customer_history_repeat_order", "weight": 0.3, "evidence": { "prior_order_count": 11, "window_months": 18 } },
    { "signal": "semantic_description_similarity", "weight": 0.1, "evidence": { "cosine": 0.93 } }
  ]
}

rationale_text is for human display. rationale_signals is machine-readable and supports auditability.
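As an illustration of consuming the machine-readable half, a caller could rank the signals by weight before showing them in a side panel. This is a minimal sketch that relies only on the field names shown in the example above. `rank_signals` is a hypothetical helper.

```python
def rank_signals(rationale, limit=None):
    """Return (signal, weight, evidence) tuples, heaviest signal first.

    `evidence` is optional in the schema example, so a missing key
    is returned as an empty dict.
    """
    ranked = sorted(
        rationale["rationale_signals"],
        key=lambda entry: entry["weight"],
        reverse=True,
    )
    if limit is not None:
        ranked = ranked[:limit]
    return [(e["signal"], e["weight"], e.get("evidence", {})) for e in ranked]
```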

Source pointer schema

For information extraction, every extracted value carries a source pointer:

{
  "source": {
    "document": "RFQ_April22.pdf",
    "page": 1,
    "bounding_box": { "x": 102, "y": 540, "w": 320, "h": 22 },
    "line_text": "50 x SKF 6205-2RS bearings, 2 weeks"
  }
}

For email-body extractions, document is email_body and bounding_box is omitted.
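A caller rendering source pointers has to handle both shapes. The sketch below builds a short display label and treats `page`, `bounding_box`, and `line_text` as optional. Only the omission of `bounding_box` for email bodies is stated above, so the handling of the other fields is a defensive assumption.

```python
def describe_source(pointer):
    """Build a human-readable label for a source pointer."""
    source = pointer["source"]
    document = source["document"]
    line_text = source.get("line_text", "")

    parts = ["email body" if document == "email_body" else document]
    if "page" in source:
        parts.append(f"page {source['page']}")
    box = source.get("bounding_box")
    if box is not None:
        # Only present for document extractions, not for email bodies.
        parts.append(f"box x={box['x']} y={box['y']} w={box['w']} h={box['h']}")

    label = ", ".join(parts)
    return f'{label}: "{line_text}"' if line_text else label
```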

Threshold configuration

Thresholds are configured per tenant via a JSON document. Example:

{
  "thresholds": {
    "intent_classifier": {
      "default": 0.85,
      "by_intent": { "quote": 0.85, "order": 0.90 }
    },
    "customer_matcher": {
      "default": 0.92,
      "by_customer_type": { "named_account": 0.95, "spot": 0.85 }
    },
    "product_matcher": {
      "default": 0.85,
      "by_product_category": { "high_value": 0.95, "consumables": 0.80 },
      "by_branch": { "BR-UK-014": 0.88 }
    }
  }
}

Threshold changes are versioned and applied via API; no deployment required.
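To make the override structure concrete, the sketch below resolves an effective threshold for one module from a configuration shaped like the example. The precedence between overlapping overrides (for instance `by_product_category` versus `by_branch`) is not specified here, so the sketch assumes the strictest matching override wins. Treat that rule, and the `resolve_threshold` helper itself, as hypothetical.

```python
def resolve_threshold(config, module, context):
    """Pick the threshold for `module` given request attributes.

    `context` maps attribute names to values, e.g.
    {"product_category": "consumables", "branch": "BR-UK-014"}.
    Assumption: when several by_* overrides match, the highest
    (strictest) value applies; otherwise the module default is used.
    """
    module_config = config["thresholds"][module]
    matches = []
    for key, overrides in module_config.items():
        if not key.startswith("by_"):
            continue  # skips "default"
        attribute = key[len("by_"):]  # "by_branch" -> "branch"
        value = context.get(attribute)
        if value in overrides:
            matches.append(overrides[value])
    return max(matches) if matches else module_config["default"]
```

Choosing the strictest match errs toward more human review. A tenant that prefers the most specific override to win would need a different rule.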

Routing decisions

Routing is the orchestrator's responsibility, but Faction's response carries a recommended action:

{
  "confidence_score": 0.74,
  "recommended_action": "review",
  "review_reasons": ["below_threshold", "ambiguous_alternatives_present"]
}

`recommended_action`	Meaning
`auto_accept`	Above auto-accept band.
`confirm`	In confirm band; rep should glance and accept.
`review`	Below threshold; rep should review carefully.
`manual`	Below floor; rep should treat as if Faction did not respond.
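On the orchestrator side, acting on the recommendation can be as small as a dispatch table. The queue names below are hypothetical placeholders for whatever work queues the orchestrator actually owns. Falling back to manual handling for an unrecognised action is an assumption made for safety.

```python
# Hypothetical queue names; the orchestrator defines the real targets.
ACTION_TO_QUEUE = {
    "auto_accept": "straight_through",
    "confirm": "rep_confirm",
    "review": "rep_review",
    "manual": "manual_handling",
}


def route(response):
    """Map a Faction response to a work queue plus the review reasons."""
    action = response.get("recommended_action")
    # Unknown or missing actions fall back to manual handling.
    queue = ACTION_TO_QUEUE.get(action, "manual_handling")
    return {"queue": queue, "reasons": response.get("review_reasons", [])}
```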

Feedback loop

When a rep edits or rejects a Faction output, the orchestrator can call the feedback endpoint:

POST /v1/feedback
{
  "case_id": "CRM-2026-04-29-00417",
  "module": "product_matcher",
  "original_output": { "...": "..." },
  "corrected_output": { "...": "..." },
  "actor": { "type": "sales_rep", "id_hash": "..." },
  "timestamp": "2026-04-29T14:22:01Z"
}
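A minimal sketch of assembling that request with the Python standard library follows. The base URL is a placeholder, authentication is left out because it is not described here, and `build_feedback_request` is a hypothetical helper. The actor's `id_hash` is passed in already hashed, since the hashing scheme is not specified.

```python
import json
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://faction.example.invalid"  # placeholder, not a real host


def build_feedback_request(case_id, module, original_output, corrected_output,
                           actor_id_hash, base_url=BASE_URL, now=None):
    """Build (but do not send) a POST /v1/feedback request.

    `now` should be a timezone-aware datetime; it defaults to the
    current UTC time and is serialised like the example timestamp.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    payload = {
        "case_id": case_id,
        "module": module,
        "original_output": original_output,
        "corrected_output": corrected_output,
        "actor": {"type": "sales_rep", "id_hash": actor_id_hash},
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the returned object to `urllib.request.urlopen`, once the real host and credentials are configured.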

Mode	Behavior	Default
Off	Feedback is discarded.
Audit only	Stored for analytics. Never used for training.	Default
Tenant-scoped tuning	Used to retrain tenant-specific layers. Stays tenant-scoped.	Opt-in.
Shared improvement	Aggregated, anonymized signals contribute to base-model improvement.	Opt-in. Never default.

Training policy

No training on customer data without explicit written customer permission. The default mode is audit only.

Audit trail

Every Faction call, threshold decision, and rep edit is captured in an audit record:

{
  "audit_id": "aud_01J...",
  "timestamp": "2026-04-29T14:22:01Z",
  "tenant_id": "rubix",
  "correlation_id": "8f3c-b21",
  "case_id": "CRM-2026-04-29-00417",
  "actor": { "type": "system | user", "id_hash": "..." },
  "event_type": "module_call | threshold_decision | rep_edit | feedback_submitted",
  "module": "product_matcher",
  "input_hash": "sha256:...",
  "output_hash": "sha256:...",
  "decision": { "recommended_action": "review", "applied_action": "rep_corrected" }
}

Audit records are exportable to a SIEM via scheduled export or on-demand API.
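Once exported, audit records can be regrouped by `case_id` to reconstruct what happened to a single case. The sketch below assumes the export has already been parsed into a list of dictionaries with the fields shown above, and that all timestamps use the same UTC format as the example so that they sort correctly as strings. `case_timeline` is a hypothetical helper.

```python
def case_timeline(records, case_id):
    """Return one case's audit records in chronological order.

    Assumption: timestamps share the fixed-width UTC format
    "YYYY-MM-DDTHH:MM:SSZ", which orders correctly as plain strings.
    """
    matching = [r for r in records if r.get("case_id") == case_id]
    return sorted(matching, key=lambda r: r["timestamp"])


def summarise(timeline):
    """Reduce a timeline to (timestamp, event_type, actor type) tuples."""
    return [(r["timestamp"], r["event_type"], r["actor"]["type"]) for r in timeline]
```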

Suggested UX patterns

Faction provides outputs. The caller decides how they're surfaced:

Pre-populated D365 case fields with confidence indicators (green / amber / red).
Rationale visible on hover or click.
Alternatives accessible via dropdown when confidence is in the confirm or review band.
"Why this match?" link that opens the rationale signals in a side panel.
