Technical Reference
Modules
Per-module deep dive across the four Faction services.
The four modules are independently callable. Each has well-defined inputs, outputs, edge-case behavior, performance targets, and configuration knobs.
Purpose. Determine whether an inbound case is a quote request, an order, a status update, an RFQ, or something else, so the orchestrator can route it correctly.
Endpoint. POST /v1/intent/classify
Required inputs:
case_id (string): Caller-owned identifier; opaque to Faction.
sender (string, email): Used as a signal, not as authentication.
subject (string): Empty string allowed.
body (string): Plain text or HTML; HTML is sanitized server-side.
Optional inputs (why each helps):
attachments[] (array): PDFs, images, spreadsheets. Improves classification when subject and body are sparse.
country (ISO-3166 alpha-2): Enables country-scoped taxonomy and rules.
branch_id (string): Allows branch-tuned routing.
thread_history[] (array): Prior messages in the same email thread.
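This is a minimal sketch of how a caller might assemble the classify request body. It assumes the JSON keys match the field names above, with the [] suffix dropped (for example, attachments). The IntentRequest class and its to_payload method are hypothetical client-side helpers, not part of any Faction SDK. The attachment and thread-history item shapes are left open because this reference does not specify them.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentRequest:
    """Hypothetical client-side model of a POST /v1/intent/classify body."""
    case_id: str   # caller-owned, opaque to Faction
    sender: str    # email address; a routing signal, not authentication
    subject: str   # empty string allowed
    body: str      # plain text or HTML (sanitized server-side)
    attachments: List[dict] = field(default_factory=list)
    country: Optional[str] = None   # ISO-3166 alpha-2, e.g. "FR"
    branch_id: Optional[str] = None
    thread_history: List[dict] = field(default_factory=list)

    def to_payload(self) -> dict:
        # Cheap local check that country looks like an alpha-2 code.
        if self.country is not None and not (
            len(self.country) == 2 and self.country.isalpha() and self.country.isupper()
        ):
            raise ValueError(f"country must be ISO-3166 alpha-2, got {self.country!r}")
        # Required fields are always sent, even when empty.
        payload = {"case_id": self.case_id, "sender": self.sender,
                   "subject": self.subject, "body": self.body}
        # Optional fields are sent only when they carry information.
        optional = {"attachments": self.attachments, "country": self.country,
                    "branch_id": self.branch_id, "thread_history": self.thread_history}
        payload.update({k: v for k, v in optional.items() if v})
        return payload


req = IntentRequest(case_id="c-123", sender="buyer@example.com",
                    subject="", body="Please quote 10 bearings", country="FR")
print(json.dumps(req.to_payload()))
```

Omitting empty optional fields keeps the request minimal. Sending them empty is probably harmless too, but that is not stated here.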
Outputs:
intent (enum): quote, order, update, status, rfq, other. Configurable per tenant.
confidence_score (float, 0–1): Calibrated; comparable across modules.
rationale (string): Natural-language justification with cited tokens.
secondary_intents[] (array): Ranked alternatives with confidence scores.
Edge cases:
Empty subject and body: Returns intent: "other" with low confidence; the rationale flags the missing content.
Attachment-only request: The engine reads attachments through the same pipeline the Info Extractor uses. The result is still a single intent.
Multi-intent message: The dominant intent is returned in intent and the others in secondary_intents.
Foreign language: Auto-detected. Supported: English, French, German, Spanish, Italian, Dutch. Other languages return language_unsupported: true.
Auto-reply / out-of-office (OOO): Detected; returns auto_reply: true to suppress downstream processing.
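The edge-case flags fit naturally into a small triage step in the orchestrator. This sketch assumes that auto_reply and language_unsupported appear as top-level response keys, and that each secondary_intents entry carries an intent and a confidence_score. The action labels are hypothetical.

```python
from typing import List, Tuple


def triage_intent_response(resp: dict) -> Tuple[str, List[str]]:
    """Map a classify response to a hypothetical orchestrator action.

    Returns (action, secondary_intents). Action labels are illustrative only.
    """
    # Auto-reply / out-of-office: suppress all downstream processing.
    if resp.get("auto_reply"):
        return "suppress", []
    # Unsupported language: the intent is not trustworthy, so hand off to a person.
    if resp.get("language_unsupported"):
        return "manual_review", []
    # Multi-intent message: the dominant intent drives routing; keep the rest for context.
    # Assumes each secondary entry looks like {"intent": ..., "confidence_score": ...}.
    secondary = [s["intent"] for s in resp.get("secondary_intents", [])]
    return "route:" + resp.get("intent", "other"), secondary


print(triage_intent_response(
    {"intent": "order", "secondary_intents": [{"intent": "status", "confidence_score": 0.31}]}
))  # ('route:order', ['status'])
```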
Performance targets:
p50 latency: under 600 ms
p95 latency: under 1.5 s
Max payload: 10 MB total. Larger payloads use the async pattern.
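One way to choose between the synchronous call and the async pattern is to measure the serialized request before sending it. This sketch rests on two assumptions. First, that the 10 MB limit applies to the encoded request; how attachments are transported is not specified here. Second, that MB means 1024 × 1024 bytes. choose_call_mode is a hypothetical helper, and its limit parameter lets the same function serve the Info Extractor's 50 MB limit.

```python
import json

# "10 MB total" per the targets above; treating MB as 1024 * 1024 bytes is an assumption.
INTENT_MAX_SYNC_BYTES = 10 * 1024 * 1024


def choose_call_mode(payload: dict, limit: int = INTENT_MAX_SYNC_BYTES) -> str:
    """Return "sync" if the serialized request fits under the limit, else "async"."""
    size = len(json.dumps(payload).encode("utf-8"))
    return "sync" if size <= limit else "async"


print(choose_call_mode({"case_id": "c-1", "body": "short message"}))  # sync
```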
Configuration knobs:
Intent taxonomy (the enum values).
Confidence threshold per intent class.
Routing rules attached to thresholds.
Country-scoped overrides.
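The threshold and routing knobs above work like a lookup. A per-intent confidence floor, optionally tightened by a country override, gates a route. Every threshold value, country override and queue name below is hypothetical, and this reference does not describe the actual configuration format.

```python
from typing import Dict, Optional

# Hypothetical tenant configuration: all thresholds and queue names are illustrative.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "quote": 0.80, "order": 0.85, "rfq": 0.80,
    "status": 0.70, "update": 0.70, "other": 0.0,
}
COUNTRY_OVERRIDES: Dict[str, Dict[str, float]] = {"DE": {"order": 0.90}}
ROUTES: Dict[str, str] = {
    "quote": "quote_pipeline", "rfq": "quote_pipeline", "order": "order_desk",
    "status": "status_bot", "update": "rep_inbox", "other": "rep_inbox",
}


def route(intent: str, confidence: float, country: Optional[str] = None) -> str:
    """Apply per-intent thresholds (with country overrides), then the routing rule."""
    thresholds = dict(DEFAULT_THRESHOLDS)
    if country:
        thresholds.update(COUNTRY_OVERRIDES.get(country, {}))
    # Unknown intents get an unreachable threshold, so they always fall back.
    if confidence < thresholds.get(intent, 1.01):
        return "manual_review"
    return ROUTES.get(intent, "manual_review")


print(route("order", 0.87))        # order_desk
print(route("order", 0.87, "DE"))  # manual_review (stricter override for DE)
```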
Purpose. Extract structured quote fields (line items, delivery, urgency, special conditions) from the body and attachments of a quote-related case.
Endpoint. POST /v1/extract/quote
Required inputs:
case_id (string): Caller-owned identifier.
body (string): Email body.
quote_schema (object): Target schema. Configured at tenant level; can be overridden per call.
Optional inputs (why each helps):
attachments[] (array): PDFs, spreadsheets, images, scanned documents.
customer_id (string): If already resolved, allows customer-specific format conventions.
country, branch_id (strings): Locale defaults (date, number, currency).
Outputs:
line_items[] (array): Description, quantity, unit, requested delivery, each with per-field confidence and source span.
delivery_requirements (object): Ship-to hint, requested-by date, urgency.
urgency_signals (object): Detected signals: explicit deadline, "ASAP" language, escalation tone.
payment_terms (object): If detectable.
special_conditions[] (array): Free-form (export controls, hazmat, certifications required).
attachments_processed[] (array): Which attachments were read, and which were skipped and why.
Schema-aware extraction
Extraction is schema-aware. If the tenant's quote_schema defines a field, the extractor will look for it. If it isn't in the schema, it isn't returned. This keeps outputs aligned with the caller's data model.
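The sketch below illustrates the schema-aware rule. It filters a set of candidate fields down to those the schema defines. As an extra client-side check, it also lists required fields that were not found. The schema shape (field name mapped to a type and a required flag) is an assumption. The real quote_schema format is tenant configuration not shown here, and this is not Faction's implementation.

```python
from typing import Any, Dict, List, Tuple


def apply_schema(candidates: Dict[str, Any],
                 schema: Dict[str, Dict[str, Any]]) -> Tuple[Dict[str, Any], List[str]]:
    """Keep only fields the schema defines; list required fields that were not found."""
    kept = {name: value for name, value in candidates.items() if name in schema}
    missing = [name for name, spec in schema.items()
               if spec.get("required") and name not in kept]
    return kept, missing


schema = {  # hypothetical schema shape: field name -> {"type", "required"}
    "line_items": {"type": "array", "required": True},
    "delivery_requirements": {"type": "object", "required": True},
    "payment_terms": {"type": "object", "required": False},
}
found = {"line_items": [{"description": "6204 bearing"}], "special_conditions": ["hazmat"]}
print(apply_schema(found, schema))
# ({'line_items': [{'description': '6204 bearing'}]}, ['delivery_requirements'])
```

special_conditions is dropped here because this particular schema does not define it, even though the extractor can produce that field.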
Edge cases:
Handwritten note (photo / scan): The OCR pipeline runs first. Per-field confidence reflects OCR quality.
Multi-language document: Language is detected per attachment; mixed-language documents are handled.
Spreadsheet attachment: Tabular extraction with header detection. In multi-sheet workbooks, relevant sheets are retained and the others are skipped with a reason.
Ambiguous quantity / unit: The field is returned with low confidence and an ambiguous: true flag.
Encrypted / password-protected attachment: Skipped, with the reason recorded in attachments_processed[]. No exception is thrown.
Email signature containing line-item-like text: Filtered out using signature-block detection.
Performance targets:
p50 (body only): under 1.5 s
p50 (one PDF, under 5 pages): under 4 s
p50 (multi-page scan, OCR required): under 12 s
Max payload: 50 MB. Larger payloads use the async pattern with a callback.
Configuration knobs:
Quote schema (field definitions, types, required vs. optional).
Per-field confidence thresholds.
Locale defaults.
Customer-specific format hints.
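Per-field confidence thresholds suggest a review step on the caller's side: flag any extracted field that falls below its threshold or carries ambiguous: true. The per-field shape ({"value", "confidence", "ambiguous"}), the threshold values and the default are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical per-field thresholds; in practice these are tenant configuration.
FIELD_THRESHOLDS: Dict[str, float] = {
    "description": 0.60, "quantity": 0.80, "unit": 0.80, "requested_delivery": 0.70,
}


def fields_needing_review(line_items: List[dict],
                          thresholds: Dict[str, float] = FIELD_THRESHOLDS,
                          default: float = 0.75) -> List[str]:
    """Return "index.field" paths that are below threshold or flagged ambiguous."""
    flagged = []
    for i, item in enumerate(line_items):
        for name, extracted in item.items():
            # Assumed per-field shape: {"value": ..., "confidence": float, "ambiguous": bool}
            conf = extracted.get("confidence", 0.0)
            if extracted.get("ambiguous") or conf < thresholds.get(name, default):
                flagged.append(f"{i}.{name}")
    return flagged


items = [{
    "description": {"value": "deep groove ball bearing 6204", "confidence": 0.92},
    "quantity": {"value": 10, "confidence": 0.95, "ambiguous": True},
    "unit": {"value": "box", "confidence": 0.55},
}]
print(fields_needing_review(items))  # ['0.quantity', '0.unit']
```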
Purpose. Resolve the inbound case to a customer ID, ship-to ID, and contact ID in master data.
Endpoint. POST /v1/match/customer
Required inputs:
case_id (string): Caller-owned identifier.
sender (string, email): Primary signal.
Optional inputs (why each helps):
phone (string, E.164): For WhatsApp / phone-originated cases.
body (string): Allows extraction of customer references.
signature_block (string): If extracted separately.
country, branch_id (strings): Scope the match space.
Outputs:
customer_id (string): Empty if no match clears the threshold.
ship_to_id (string): Best ship-to, inferred from the request or the customer default.
contact_id (string): Best contact match for the sender.
confidence_score (float): Joint score across customer, ship-to, and contact.
match_rationale (string): Which signals matched.
alternatives[] (array): Ranked candidates above the floor threshold.
branch_hint (string): Inferred from the customer's typical branch.
The matcher combines three signal sources, weighted by tenant configuration:
Deterministic (highest weight): exact email domain → account, phone → contact, prior-thread linkage.
Structured: address normalization, name normalization with company-suffix handling.
Behavioral: prior order patterns, typical branch, typical ship-to.
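The weighting of the three signal families can be sketched as a weighted sum, with the email-domain signal scaled down for generic domains. The weights, the 0.05 scaling factor, the domain list and the choice to take the strongest deterministic signal (max) are all assumptions. The matcher's real scoring model is not documented here.

```python
from typing import Dict

# Illustrative list; a real deployment would maintain a much longer one.
GENERIC_DOMAINS = {"gmail.com", "hotmail.com", "outlook.com", "yahoo.com"}
GENERIC_DOMAIN_FACTOR = 0.05  # "weighted near zero"; the exact factor is an assumption

# Hypothetical tenant weights, ordered deterministic > structured > behavioral.
WEIGHTS: Dict[str, float] = {"deterministic": 0.60, "structured": 0.25, "behavioral": 0.15}


def score_candidate(signals: Dict[str, float], sender: str,
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-signal scores (each 0..1) for one candidate into a 0..1 score."""
    domain = sender.rsplit("@", 1)[-1].lower()
    email = signals.get("email_domain", 0.0)
    if domain in GENERIC_DOMAINS:
        email *= GENERIC_DOMAIN_FACTOR  # a free-mail domain says little about the account
    # Deterministic family: the strongest of email domain, phone, prior-thread linkage.
    deterministic = max(email, signals.get("phone", 0.0), signals.get("thread", 0.0))
    weighted = (weights["deterministic"] * deterministic
                + weights["structured"] * signals.get("structured", 0.0)
                + weights["behavioral"] * signals.get("behavioral", 0.0))
    return weighted / sum(weights.values())


print(round(score_candidate({"email_domain": 1.0}, "jane@acme-industrial.com"), 3))  # 0.6
print(round(score_candidate({"email_domain": 1.0}, "jane@gmail.com"), 3))            # 0.03
```

With a generic domain, a phone or prior-thread signal still carries full deterministic weight. This mirrors the documented fallback to phone, signature, prior thread and body extraction.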
Disambiguation policy
For customers with multiple subsidiaries on shared domains (e.g., a holding company), the matcher returns the most likely entity in customer_id and the rest in alternatives[] with reason codes. The orchestrator can choose to surface the disambiguation to the rep.
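Whether to surface alternatives to the rep is the orchestrator's call. The hypothetical rule below surfaces the choice when the runner-up's confidence is within a margin of the winner's. It assumes each alternatives[] entry carries its own confidence_score. The 0.15 margin and the reason code are illustrative.

```python
from typing import Any, Dict


def should_surface_disambiguation(result: Dict[str, Any], min_margin: float = 0.15) -> bool:
    """Hypothetical orchestrator rule: ask the rep when a runner-up is close to the winner.

    Assumes each entry in alternatives[] carries its own confidence_score.
    """
    alternatives = result.get("alternatives", [])
    if not result.get("customer_id") or not alternatives:
        return False  # no match (onboarding path) or nothing to disambiguate
    runner_up = max(a["confidence_score"] for a in alternatives)
    return result["confidence_score"] - runner_up < min_margin


holding = {"customer_id": "C-100", "confidence_score": 0.72,
           "alternatives": [{"customer_id": "C-101", "confidence_score": 0.66,
                             "reason": "shared_domain"}]}
print(should_surface_disambiguation(holding))  # True
```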
Edge cases:
Generic email domain (gmail.com, hotmail.com): The email-domain signal is weighted near zero. Matching falls back to phone, signature, prior thread, and body extraction.
New customer (no master-data match): Returns an empty customer_id, confidence_score: 0, and an unmatched_reason. The orchestrator can route the case to onboarding.
Multiple ship-tos on one customer: The best ship-to is inferred from the request body; otherwise the customer default is used with reduced confidence.
Stale contact (left the company): The contact match drops; the customer match still resolves via the domain.
Performance targets:
p50 latency: under 400 ms
p95 latency: under 1.0 s
Configuration knobs:
Match thresholds per customer type.
Branch-aware matching rules.
Disambiguation policy (single best vs. surface alternatives).
Refresh cadence for master data.
Purpose. Map extracted line-item descriptions to product IDs, with substitutes and rationale.
Endpoint. POST /v1/match/product
Required inputs:
case_id (string): Caller-owned identifier.
line_items[] (array): From the Info Extractor, or constructed by the orchestrator.
Optional inputs (why each helps):
customer_id (string): Enables a customer-specific product-history boost.
branch_id (string): Enables branch-level catalogue scoping.
country (string): Enables country-specific catalogue scoping.
match_strategy (enum): strict, balanced (default), permissive.
Outputs:
matches[] (array): One entry per input line item.
matches[].rubix_product_id (string): Empty if unmatched_flag is true.
matches[].confidence_score (float): Calibrated.
matches[].match_rationale (string): Which match path won.
matches[].alternatives[] (array): Substitutes / equivalents with reason codes.
matches[].unmatched_flag (bool): True if no candidate cleared the threshold.
unmatched[] (array): Convenience list for orchestrator routing.
The matcher tries multiple paths and returns the strongest:
Manufacturer + part number exact match (highest confidence path).
Manufacturer cross-reference (competitor part to stocked equivalent).
Semantic match (description embeddings against catalogue).
Historical-pattern match (this customer ordered this SKU before).
Branch-local override (a branch-supplied spreadsheet).
The path that won is reported in match_rationale.
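The multi-path strategy comes down to three steps: run each path, keep the strongest candidate and record which path produced it. The sketch below uses two toy paths. One is an exact part-number lookup. The other is a Jaccard token-overlap score standing in for embedding similarity, a deliberate simplification rather than the semantic matcher itself. The catalogue data, product IDs, path names and threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    product_id: str
    confidence: float
    path: str  # which match path produced this candidate


def match_line(description: str,
               paths: List[Callable[[str], Optional[Candidate]]],
               threshold: float) -> dict:
    """Run every path, keep the strongest candidate, and report which path won."""
    found = [c for c in (p(description) for p in paths) if c is not None]
    best = max(found, key=lambda c: c.confidence, default=None)
    if best is None or best.confidence < threshold:
        return {"rubix_product_id": "", "unmatched_flag": True}
    return {"rubix_product_id": best.product_id, "confidence_score": best.confidence,
            "match_rationale": best.path, "unmatched_flag": False}


# Toy data and paths for illustration only.
EXACT_PART_NUMBERS: Dict[str, str] = {"SKF 6204-2RS": "P-1001"}
CATALOGUE: Dict[str, str] = {"P-1001": "deep groove ball bearing 6204 sealed",
                             "P-2002": "v-belt spa 1250"}


def exact_part_number(desc: str) -> Optional[Candidate]:
    pid = EXACT_PART_NUMBERS.get(desc.strip().upper())
    return Candidate(pid, 0.99, "manufacturer_part_exact") if pid else None


def token_overlap(desc: str) -> Optional[Candidate]:
    # Jaccard token overlap: a crude stand-in for embedding similarity.
    words = set(desc.lower().split())
    best: Optional[Candidate] = None
    for pid, text in CATALOGUE.items():
        cat = set(text.split())
        score = len(words & cat) / len(words | cat)
        if best is None or score > best.confidence:
            best = Candidate(pid, score, "semantic_stand_in")
    return best


paths = [exact_part_number, token_overlap]
print(match_line("SKF 6204-2RS", paths, threshold=0.7)["match_rationale"])  # manufacturer_part_exact
print(match_line("ball bearing 6204", paths, threshold=0.7)["unmatched_flag"])  # True
```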
Edge cases:
Multiple SKUs at similar confidence: The best candidate is returned if customer history indicates a preference; otherwise unmatched_flag: true, with all candidates in alternatives[].
Discontinued SKU: Returns the successor with rationale, if a successor is mapped.
Competitor part with no equivalent: unmatched_flag: true with reason no_rubix_equivalent.
Quantity unit mismatch: The match resolves, and a unit_conversion_required: true flag is added with a proposed conversion.
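The near-tie rule for multiple SKUs can be written as a short function. If several candidates sit within a margin of the top score, customer history breaks the tie. Otherwise the line is returned unmatched, with every candidate as an alternative. The 0.05 margin, the history set and the candidate shape are assumptions.

```python
from typing import Dict, List, Set


def resolve_near_tie(candidates: List[Dict], history: Set[str], margin: float = 0.05) -> Dict:
    """Choose among near-tied candidates using customer order history.

    Mirrors the documented rule: pick the best candidate if history shows a
    preference, otherwise set unmatched_flag and return every candidate.
    """
    if not candidates:
        return {"rubix_product_id": "", "unmatched_flag": True, "alternatives": []}
    ranked = sorted(candidates, key=lambda c: c["confidence_score"], reverse=True)
    top = ranked[0]["confidence_score"]
    tied = [c for c in ranked if top - c["confidence_score"] <= margin]
    if len(tied) == 1:
        chosen = tied[0]
    else:
        preferred = [c for c in tied if c["rubix_product_id"] in history]
        if len(preferred) != 1:
            return {"rubix_product_id": "", "unmatched_flag": True, "alternatives": ranked}
        chosen = preferred[0]
    return {"rubix_product_id": chosen["rubix_product_id"], "unmatched_flag": False,
            "alternatives": [c for c in ranked if c is not chosen]}


cands = [{"rubix_product_id": "P-1001", "confidence_score": 0.82},
         {"rubix_product_id": "P-1007", "confidence_score": 0.80}]
print(resolve_near_tie(cands, history={"P-1007"})["rubix_product_id"])  # P-1007
```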
Performance targets:
p50 latency (10 line items): under 800 ms
p95 latency (10 line items): under 2.0 s
p50 latency (100 line items): under 4.0 s
Configuration knobs:
Match thresholds per product category.
Substitute / equivalent rules.
Country-specific catalogue scoping.
Per-customer override lists.
Branch-local-knowledge ingestion (SFTP, API, or scheduled file drop).
The four modules are independently callable. The orchestrator can choose any of these patterns:
Orchestration patterns:
Intent only: Routing decisions only. Cheap and fast.
Intent + Extract: Extracting content for non-quote intents (e.g., status requests with attached PDFs).
Full pipeline (all four): Standard quote handling. Faction shares case context internally across the four calls when they are invoked within a short window with the same case_id and a shared correlation_id.
Single-module reuse: Re-running just product matching after a rep edits a line. Idempotent; safe to call repeatedly.
There is no requirement to call modules in a specific order. The orchestrator decides. Modules do not call each other; the caller is always in charge of orchestration.
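This is a minimal sketch of caller-side orchestration for the full pipeline. Each module is injected as a plain function, so the control flow can be read and tested without network calls. Two details are assumptions: passing correlation_id in the request body, and relying on the tenant-level quote_schema by omitting it per call. run_quote_case and its branching on intent are hypothetical.

```python
import uuid
from typing import Callable, Dict

Module = Callable[[dict], dict]  # stands in for one HTTP call to a Faction endpoint


def run_quote_case(case: dict, classify: Module, extract: Module,
                   match_customer: Module, match_product: Module) -> Dict[str, dict]:
    """Caller-side orchestration: modules never call each other; this function does."""
    correlation_id = str(uuid.uuid4())  # one id shared by every call for this case

    def call(module: Module, payload: dict) -> dict:
        return module({**payload, "case_id": case["case_id"], "correlation_id": correlation_id})

    results = {"intent": call(classify, case)}
    if results["intent"].get("intent") not in ("quote", "rfq"):
        return results  # "intent only" pattern: the routing decision is all we need
    # quote_schema omitted: the tenant-level default is assumed to apply.
    results["extract"] = call(extract, {"body": case["body"]})
    results["customer"] = call(match_customer, {"sender": case["sender"], "body": case["body"]})
    results["product"] = call(match_product, {
        "line_items": results["extract"].get("line_items", []),
        "customer_id": results["customer"].get("customer_id", ""),
    })
    return results


def stub(response: dict) -> Module:
    """Test double that ignores its payload and returns a canned response."""
    return lambda payload: response


out = run_quote_case({"case_id": "c-9", "sender": "a@acme.com", "subject": "RFQ", "body": "10x 6204"},
                     stub({"intent": "rfq"}), stub({"line_items": []}),
                     stub({"customer_id": "C-1"}), stub({"matches": []}))
print(sorted(out))  # ['customer', 'extract', 'intent', 'product']
```

Because modules do not call each other, re-running only product matching after a rep edits a line is just another call to match_product with the edited line_items.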