Confidence & HITL
Calibration, rationale, source pointers, thresholds, feedback loop, audit trail.
Confidence score semantics
Confidence scores are floats in [0, 1] and are calibrated, meaning the score corresponds to an empirical correct-rate. A 0.85 score means: across all outputs scored 0.85, approximately 85% are correct. Calibration is tenant-specific and refreshed on a regular cadence using the feedback loop.
Calibration is performed jointly across the four modules, so a 0.85 from the Customer Matcher is comparable to a 0.85 from the Product Matcher. This makes cross-module routing rules possible.
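The calibration claim can be checked against reviewed outputs. The following is a minimal sketch, assuming the caller holds a list of scores and a matching list of correct/incorrect labels; the function name, the equal-width binning, and the toy data are illustrative assumptions, not part of Faction's API or its calibration procedure.

```python
from collections import defaultdict


def empirical_accuracy_by_bin(scores, correct, n_bins=10):
    """Bucket scores into equal-width bins and compare each bin's mean
    score with the fraction of outputs in that bin that were correct.

    Returns {bin_index: (mean_score, observed_correct_rate, count)}.
    """
    buckets = defaultdict(list)
    for score, ok in zip(scores, correct):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        # A score of exactly 1.0 joins the top bin.
        index = min(int(score * n_bins), n_bins - 1)
        buckets[index].append((score, bool(ok)))

    report = {}
    for index in sorted(buckets):
        items = buckets[index]
        mean_score = sum(s for s, _ in items) / len(items)
        correct_rate = sum(ok for _, ok in items) / len(items)
        report[index] = (mean_score, correct_rate, len(items))
    return report


# Toy data invented for illustration: 20 outputs scored 0.85, 17 of them correct.
toy_scores = [0.85] * 20
toy_correct = [True] * 17 + [False] * 3
report = empirical_accuracy_by_bin(toy_scores, toy_correct)
```

For a well-calibrated scorer, the mean score and the observed correct rate in each bin should be close; a large gap in a bin with many samples suggests the calibration needs refreshing.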
| Score band | Default treatment |
|---|---|
| 0.95 – 1.00 | Auto-accept. |
| 0.80 – 0.95 | Pre-populated, rep confirms. |
| 0.60 – 0.80 | Flagged for review. |
| 0.00 – 0.60 | Routed to manual handling, rep starts from scratch. |
Bands are starting points
The default bands are initial values only. Final bands are tuned with the customer during UAT based on observed accuracy and operational preferences.
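The default band table can be expressed as a small lookup. This is a sketch under two assumptions: each band includes its lower bound (the table does not say which band owns a boundary value such as 0.95), and the treatments are labelled with the recommended_action names used elsewhere in this document. `DEFAULT_BANDS` and `treatment_for` are hypothetical names.

```python
# (lower_bound, treatment), ordered from the highest band to the lowest.
# Assumption: a score equal to a lower bound belongs to that band.
DEFAULT_BANDS = [
    (0.95, "auto_accept"),  # Auto-accept.
    (0.80, "confirm"),      # Pre-populated, rep confirms.
    (0.60, "review"),       # Flagged for review.
    (0.00, "manual"),       # Routed to manual handling.
]


def treatment_for(score, bands=DEFAULT_BANDS):
    """Return the treatment of the first band whose lower bound the score meets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for lower_bound, treatment in bands:
        if score >= lower_bound:
            return treatment
    raise ValueError("bands must include a 0.0 lower bound")
```

Passing a different `bands` list is one way to represent the tuned bands agreed during UAT without changing the lookup itself.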
Rationale schema
Every scored output is accompanied by a rationale object:
{
  "rationale_text": "Manufacturer + part number exact match (SKF 6205-2RS); customer ordered same SKU 11 times in last 18 months.",
  "rationale_signals": [
    { "signal": "manufacturer_part_exact_match", "weight": 0.6 },
    { "signal": "customer_history_repeat_order", "weight": 0.3, "evidence": { "prior_order_count": 11, "window_months": 18 } },
    { "signal": "semantic_description_similarity", "weight": 0.1, "evidence": { "cosine": 0.93 } }
  ]
}
rationale_text is for human display. rationale_signals is machine-readable and supports auditability.
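Because rationale_signals is machine-readable, a caller can rank the signals before showing them. The sketch below parses the example object and orders signals by weight; `top_signals` is a hypothetical helper, and treating a missing evidence field as an empty object is an assumption drawn from the example, where the first signal carries no evidence.

```python
import json

RATIONALE_JSON = """
{
  "rationale_text": "Manufacturer + part number exact match (SKF 6205-2RS); customer ordered same SKU 11 times in last 18 months.",
  "rationale_signals": [
    { "signal": "manufacturer_part_exact_match", "weight": 0.6 },
    { "signal": "customer_history_repeat_order", "weight": 0.3,
      "evidence": { "prior_order_count": 11, "window_months": 18 } },
    { "signal": "semantic_description_similarity", "weight": 0.1,
      "evidence": { "cosine": 0.93 } }
  ]
}
"""


def top_signals(rationale, limit=3):
    """Return up to `limit` (signal, weight, evidence) tuples, heaviest first."""
    ranked = sorted(
        rationale["rationale_signals"],
        key=lambda entry: entry["weight"],
        reverse=True,
    )
    # Assumption: "evidence" is optional, so default to an empty dict.
    return [(e["signal"], e["weight"], e.get("evidence", {})) for e in ranked[:limit]]


rationale = json.loads(RATIONALE_JSON)
ranked_signals = top_signals(rationale)
```

In this example the weights happen to sum to 1.0; the schema description does not state that this always holds, so the sketch does not rely on it.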
Source pointer schema
For information extraction, every extracted value carries a source pointer:
{
  "source": {
    "document": "RFQ_April22.pdf",
    "page": 1,
    "bounding_box": { "x": 102, "y": 540, "w": 320, "h": 22 },
    "line_text": "50 x SKF 6205-2RS bearings, 2 weeks"
  }
}
For email-body extractions, document is email_body and bounding_box is omitted.
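A caller that highlights the source region needs to handle both the document case and the email-body case. The sketch below models the pointer with dataclasses; the class and function names are hypothetical, and treating page and line_text as optional is an assumption, since the text above only states that bounding_box is omitted for email bodies. The coordinate units of the bounding box are not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class SourcePointer:
    document: str
    page: Optional[int] = None
    bounding_box: Optional[BoundingBox] = None
    line_text: Optional[str] = None

    @property
    def is_email_body(self) -> bool:
        return self.document == "email_body"


def parse_source_pointer(payload: dict) -> SourcePointer:
    """Build a SourcePointer from a {"source": {...}} object."""
    source = payload["source"]
    box = source.get("bounding_box")  # absent for email-body extractions
    return SourcePointer(
        document=source["document"],
        page=source.get("page"),
        bounding_box=BoundingBox(**box) if box is not None else None,
        line_text=source.get("line_text"),
    )


pdf_pointer = parse_source_pointer({
    "source": {
        "document": "RFQ_April22.pdf",
        "page": 1,
        "bounding_box": {"x": 102, "y": 540, "w": 320, "h": 22},
        "line_text": "50 x SKF 6205-2RS bearings, 2 weeks",
    }
})
email_pointer = parse_source_pointer({"source": {"document": "email_body"}})
```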
Threshold configuration
Thresholds are configured per tenant via a JSON document. Example:
{
  "thresholds": {
    "intent_classifier": {
      "default": 0.85,
      "by_intent": { "quote": 0.85, "order": 0.90 }
    },
    "customer_matcher": {
      "default": 0.92,
      "by_customer_type": { "named_account": 0.95, "spot": 0.85 }
    },
    "product_matcher": {
      "default": 0.85,
      "by_product_category": { "high_value": 0.95, "consumables": 0.80 },
      "by_branch": { "BR-UK-014": 0.88 }
    }
  }
}
Threshold changes are versioned and applied via API; no deployment required.
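One way to read this configuration is as a default per module plus optional overrides keyed by an attribute of the case. The sketch below resolves the effective threshold for a call. `resolve_threshold` is a hypothetical helper, and the rule that the strictest (highest) matching override wins when several apply is an assumption; the text above does not define precedence between, for example, by_product_category and by_branch.

```python
CONFIG = {
    "thresholds": {
        "intent_classifier": {
            "default": 0.85,
            "by_intent": {"quote": 0.85, "order": 0.90},
        },
        "customer_matcher": {
            "default": 0.92,
            "by_customer_type": {"named_account": 0.95, "spot": 0.85},
        },
        "product_matcher": {
            "default": 0.85,
            "by_product_category": {"high_value": 0.95, "consumables": 0.80},
            "by_branch": {"BR-UK-014": 0.88},
        },
    }
}


def resolve_threshold(config, module, **attributes):
    """Return the threshold for `module`, honouring any matching by_* override.

    Each keyword maps to a "by_<keyword>" table, e.g. branch="BR-UK-014"
    is looked up in "by_branch".
    """
    module_config = config["thresholds"][module]
    matches = []
    for name, value in attributes.items():
        override_table = module_config.get(f"by_{name}", {})
        if value in override_table:
            matches.append(override_table[value])
    # Assumption: with several matching overrides, the strictest one applies.
    return max(matches) if matches else module_config["default"]
```

For example, `resolve_threshold(CONFIG, "product_matcher", product_category="consumables", branch="BR-UK-014")` matches both override tables and, under the strictest-wins assumption, returns the branch value of 0.88.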
Routing decisions
Routing is the orchestrator's responsibility, but Faction's response carries a recommended action:
{
  "confidence_score": 0.74,
  "recommended_action": "review",
  "review_reasons": ["below_threshold", "ambiguous_alternatives_present"]
}
| recommended_action | Meaning |
|---|---|
| auto_accept | Above auto-accept band. |
| confirm | In confirm band; rep should glance and accept. |
| review | Below threshold; rep should review carefully. |
| manual | Below floor; rep should treat as if Faction did not respond. |
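On the orchestrator side, the recommended action can drive a simple dispatch. The sketch below is illustrative only: the handler functions and their return strings are hypothetical, and falling back to manual handling for a missing or unrecognised action is an assumption chosen to be conservative.

```python
def route(response, handlers, fallback="manual"):
    """Call the handler registered for the response's recommended_action."""
    action = response.get("recommended_action")
    # Assumption: unknown or missing actions are treated as manual handling.
    handler = handlers.get(action, handlers[fallback])
    return handler(response)


# Hypothetical handlers; a real orchestrator would update the case instead.
HANDLERS = {
    "auto_accept": lambda r: "accepted",
    "confirm": lambda r: "awaiting rep confirmation",
    "review": lambda r: "review queue: " + ", ".join(r.get("review_reasons", [])),
    "manual": lambda r: "manual handling",
}

example_response = {
    "confidence_score": 0.74,
    "recommended_action": "review",
    "review_reasons": ["below_threshold", "ambiguous_alternatives_present"],
}
outcome = route(example_response, HANDLERS)
```

Keeping the dispatch in the orchestrator matches the split described above: Faction recommends, and the caller decides what is actually applied.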
Feedback loop
When a rep edits or rejects a Faction output, the orchestrator can call the feedback endpoint:
POST /v1/feedback
{
  "case_id": "CRM-2026-04-29-00417",
  "module": "product_matcher",
  "original_output": { "...": "..." },
  "corrected_output": { "...": "..." },
  "actor": { "type": "sales_rep", "id_hash": "..." },
  "timestamp": "2026-04-29T14:22:01Z"
}
How submitted feedback is used depends on the feedback mode in effect:
| Mode | Behavior | Status |
|---|---|---|
| Off | Feedback is discarded. | |
| Audit only | Stored for analytics. Never used for training. | Default. |
| Tenant-scoped tuning | Used to retrain tenant-specific layers. Stays tenant-scoped. | Opt-in. |
| Shared improvement | Aggregated, anonymized signals contribute to base-model improvement. | Opt-in. Never default. |
Training policy
No training on customer data without explicit written customer permission. The default mode is audit only.
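The feedback request body can be assembled as follows. This is a sketch, not a client: it only builds the JSON body for POST /v1/feedback and leaves out the host, authentication, and transport, none of which are specified here. `build_feedback` is a hypothetical helper, and deriving id_hash as a salted SHA-256 of the rep identifier is an assumption; the actual hashing scheme is not documented in this section.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_feedback(case_id, module, original_output, corrected_output,
                   rep_id, salt, now=None):
    """Return the feedback body as a dict ready for JSON serialisation.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Assumption: the rep identifier is pseudonymised with a salted SHA-256.
    id_hash = hashlib.sha256((salt + rep_id).encode("utf-8")).hexdigest()
    return {
        "case_id": case_id,
        "module": module,
        "original_output": original_output,
        "corrected_output": corrected_output,
        "actor": {"type": "sales_rep", "id_hash": id_hash},
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def to_request_body(feedback):
    """Serialise the feedback dict to a JSON string."""
    return json.dumps(feedback)
```

Hashing the rep identifier before it leaves the orchestrator keeps the raw identifier out of the feedback record, which is consistent with the id_hash field in the example above.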
Audit trail
Every Faction call, threshold decision, and rep edit is captured in an audit record:
{
  "audit_id": "aud_01J...",
  "timestamp": "2026-04-29T14:22:01Z",
  "tenant_id": "rubix",
  "correlation_id": "8f3c-b21",
  "case_id": "CRM-2026-04-29-00417",
  "actor": { "type": "system | user", "id_hash": "..." },
  "event_type": "module_call | threshold_decision | rep_edit | feedback_submitted",
  "module": "product_matcher",
  "input_hash": "sha256:...",
  "output_hash": "sha256:...",
  "decision": { "recommended_action": "review", "applied_action": "rep_corrected" }
}
Audit records are exportable to a SIEM via scheduled export or on-demand API.
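The input_hash and output_hash fields use a sha256: prefix. The sketch below shows one way a caller might compute a hash in that format over a JSON payload. The canonicalisation used here (sorted keys, compact separators, UTF-8) is an assumption; this section does not state how Faction serialises payloads before hashing, so hashes produced this way will only match the audit record if the same rules are used on both sides.

```python
import hashlib
import json


def content_hash(payload):
    """Return "sha256:<hex digest>" over a deterministic JSON serialisation."""
    # Assumption: sorted keys and compact separators as the canonical form.
    canonical = json.dumps(
        payload,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest
```

Sorting keys makes the hash independent of the order in which fields were assembled, so two payloads with the same content produce the same value.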
Suggested UX patterns
Faction provides outputs. The caller decides how they're surfaced:
- Pre-populated D365 case fields with confidence indicators (green / amber / red).
- Rationale visible on hover or click.
- Alternatives accessible via dropdown when confidence is in the confirm or review band.
- "Why this match?" link that opens the rationale signals in a side panel.
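The patterns above can be reduced to a few display hints derived from the recommended action. This is a sketch under stated assumptions: the mapping of actions to green, amber, and red is not defined in this document and is chosen here for illustration, and `ui_hints` is a hypothetical helper. The rule for showing alternatives follows the list above (confirm or review band).

```python
# Assumption: illustrative colour mapping; the document lists the three
# colours but does not assign them to specific actions.
INDICATOR_BY_ACTION = {
    "auto_accept": "green",
    "confirm": "amber",
    "review": "amber",
    "manual": "red",
}


def ui_hints(response):
    """Derive display hints from a response's recommended_action."""
    action = response.get("recommended_action")
    return {
        # Unknown actions get the most cautious indicator.
        "indicator": INDICATOR_BY_ACTION.get(action, "red"),
        # Alternatives are offered in the confirm and review bands.
        "show_alternatives": action in {"confirm", "review"},
    }
```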