Modules

The four modules are independently callable. Each has well-defined inputs, outputs, edge-case behavior, performance targets, and configuration knobs.

Intent Classifier

Purpose. Determine whether an inbound case is a quote request, an order, an update, a status request, an RFQ, or something else, so the orchestrator can route it correctly.

Endpoint. POST /v1/intent/classify

Required inputs

Field	Type	Notes
`case_id`	string	Caller-owned identifier; opaque to Faction.
`sender`	string (email)	Used as a signal, not as authentication.
`subject`	string	Empty string allowed.
`body`	string	Plain text or HTML; HTML is sanitized server-side.

Optional inputs

Field	Type	Why it helps
`attachments[]`	array	PDFs, images, spreadsheets. Improves classification when subject and body are sparse.
`country`	ISO-3166 alpha-2	Enables country-scoped taxonomy and rules.
`branch_id`	string	Allows branch-tuned routing.
`thread_history[]`	array	Prior messages in the same email thread.
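
A minimal sketch of assembling a request body for the endpoint above. It assumes the body is JSON whose keys mirror the documented field names (without the `[]` suffix); the helper name `build_classify_payload` and the sample values are hypothetical, and transport and authentication are left out because this section does not describe them.

```python
import json

REQUIRED = ("case_id", "sender", "subject", "body")
OPTIONAL = ("attachments", "country", "branch_id", "thread_history")


def build_classify_payload(case_id, sender, subject, body, **optional):
    """Assemble a JSON-serializable body for POST /v1/intent/classify.

    Assumption: keys mirror the documented field names.
    """
    unknown = set(optional) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown optional fields: {sorted(unknown)}")
    payload = {"case_id": case_id, "sender": sender, "subject": subject, "body": body}
    # An empty subject is allowed, so only None is rejected here.
    for name in REQUIRED:
        if payload[name] is None:
            raise ValueError(f"missing required field: {name}")
    # Send only the optional fields the caller actually supplied.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


example = build_classify_payload(
    "case-001", "buyer@example.com", "", "Please quote 20 bearings.", country="FR"
)
print(json.dumps(example, indent=2))
```

Omitting unset optional fields keeps the request small and avoids sending explicit nulls, whose handling is not specified here.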

Outputs

Field	Type	Notes
`intent`	enum	`quote`, `order`, `update`, `status`, `rfq`, `other`. Configurable per tenant.
`confidence_score`	float (0–1)	Calibrated; comparable across modules.
`rationale`	string	Natural-language justification with cited tokens.
`secondary_intents[]`	array	Ranked alternatives with confidence scores.

Edge cases

Case	Behavior
Empty subject and body	Returns `intent: "other"` with low confidence; rationale flags missing content.
Attachment-only request	Engine reads attachments via the same pipeline used by the Info Extractor. Result is still a single intent.
Multi-intent message	Dominant intent in `intent`, secondary in `secondary_intents`.
Foreign language	Auto-detected. Supported: English, French, German, Spanish, Italian, Dutch. Other languages return `language_unsupported: true`.
Auto-reply / out-of-office (OOO)	Detected; returns `auto_reply: true` to suppress downstream processing.
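
The flags in the table above lend themselves to a small triage step on the caller side. The sketch below assumes the response is a dict carrying the documented fields; the function name `triage`, the returned labels, and the 0.5 default threshold are hypothetical.

```python
def triage(response, min_confidence=0.5):
    """Return a coarse next step for an Intent Classifier response (a dict)."""
    if response.get("auto_reply"):
        return "suppress"  # auto-reply / out-of-office: skip downstream processing
    if response.get("language_unsupported"):
        return "manual_review"
    if response.get("confidence_score", 0.0) < min_confidence:
        return "manual_review"  # covers the empty subject-and-body case
    return "route:" + response["intent"]


print(triage({"intent": "quote", "confidence_score": 0.91}))
```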

Performance

Metric	Target
p50 latency	Under 600 ms
p95 latency	Under 1.5 s
Max payload	10 MB total. Larger payloads use the async pattern.
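
A caller can decide between the synchronous call and the async pattern before sending anything. This is a rough sketch: it assumes "total" means the message body plus all attachment bytes, and that 1 MB is 1,000,000 bytes, neither of which this section states. The helper names are hypothetical.

```python
MB = 1_000_000  # assumption: decimal megabytes


def total_payload_bytes(body, attachments):
    """Size of the UTF-8 encoded body plus the raw attachment bytes."""
    return len(body.encode("utf-8")) + sum(len(a) for a in attachments)


def call_mode(body, attachments, limit_mb=10):
    """Return 'async' when the payload exceeds the limit, otherwise 'sync'."""
    over_limit = total_payload_bytes(body, attachments) > limit_mb * MB
    return "async" if over_limit else "sync"


print(call_mode("Please quote 20 bearings.", [b"%PDF-1.7 ..."]))
```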

Configuration knobs (caller-controlled)

Intent taxonomy (the enum values).
Confidence threshold per intent class.
Routing rules attached to thresholds.
Country-scoped overrides.
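
One way to picture how per-intent thresholds and country-scoped overrides interact is a lookup that checks the override first and falls back to the tenant default. The configuration shape, key names, and numbers below are all hypothetical; the actual configuration format is not described in this section.

```python
CONFIG = {
    "thresholds": {"quote": 0.80, "order": 0.90, "other": 0.50},
    "default_threshold": 0.75,
    "country_overrides": {"DE": {"thresholds": {"quote": 0.85}}},
}


def threshold_for(intent, country=None, config=CONFIG):
    """Resolve the confidence threshold for an intent, honoring country overrides."""
    override = config["country_overrides"].get(country, {}).get("thresholds", {})
    if intent in override:
        return override[intent]
    return config["thresholds"].get(intent, config["default_threshold"])


print(threshold_for("quote", "DE"), threshold_for("quote", "FR"))
```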

Quote Info Extractor

Optional inputs

Field	Type	Why it helps
`attachments[]`	array	PDFs, spreadsheets, images, scanned documents.
`customer_id`	string	If already resolved, allows customer-specific format conventions.
`country`, `branch_id`	strings	Locale defaults (date, number, currency).

Outputs

Field	Type	Notes
`line_items[]`	array	Description, quantity, unit, requested delivery. Per-field confidence and source span.
`delivery_requirements`	object	Ship-to hint, requested-by date, urgency.
`urgency_signals`	object	Detected signals: explicit deadline, "ASAP" language, escalation tone.
`payment_terms`	object	If detectable.
`special_conditions[]`	array	Free-form (export controls, hazmat, certifications required).
`attachments_processed[]`	array	Which attachments were read; which were skipped and why.

Extraction is schema-aware. If the tenant's `quote_schema` defines a field, the extractor will look for it. If a field isn't in the schema, it isn't returned. This keeps outputs aligned with the caller's data model.
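
The schema-aware rule can be illustrated with a simple filter. This is a sketch of the behavior described above, not the extractor's implementation; it assumes `quote_schema` is a mapping from field name to a definition, and the definitions and sample values are invented for illustration.

```python
def filter_to_schema(candidates, quote_schema):
    """Keep only the candidate fields that the tenant's schema defines."""
    return {name: value for name, value in candidates.items() if name in quote_schema}


# Hypothetical schema shape: field name -> definition.
quote_schema = {
    "line_items": {"type": "array", "required": True},
    "payment_terms": {"type": "object", "required": False},
}
candidates = {
    "line_items": [{"description": "bearing", "quantity": 20}],
    "payment_terms": {"net_days": 30},
    "incoterms": "DAP",  # not in the schema, so it is dropped
}
print(filter_to_schema(candidates, quote_schema))
```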

Edge cases

Case	Behavior
Handwritten note (photo / scan)	OCR pipeline runs first. Per-field confidence reflects OCR quality.
Multi-language document	Language detected per attachment. Mixed-language documents handled.
Spreadsheet attachment	Tabular extraction with header detection. Multi-sheet workbooks: relevant sheets retained, others skipped with reason.
Quantity / unit ambiguous	Field returned with low confidence and `ambiguous: true` flag.
Encrypted / password-protected attachment	Skipped; reason recorded in `attachments_processed[]`. No exception thrown.
Email signature with line-item-like text	Filtered out using signature-block detection.
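
Because ambiguous fields and skipped attachments are reported in the response rather than raised as errors, the caller has to look for them. The sketch below assumes each line-item field is a dict with `confidence` and an optional `ambiguous` key, and that each `attachments_processed[]` entry has `name`, `status`, and `reason` keys. Those key names and the 0.7 threshold are hypothetical.

```python
def fields_needing_review(line_items, min_confidence=0.7):
    """Return (item index, field name) pairs that are ambiguous or low confidence."""
    flagged = []
    for index, item in enumerate(line_items):
        for name, field in item.items():
            if field.get("ambiguous") or field.get("confidence", 0.0) < min_confidence:
                flagged.append((index, name))
    return flagged


def skipped_attachments(attachments_processed):
    """Return (name, reason) for every attachment that was skipped."""
    return [
        (entry["name"], entry["reason"])
        for entry in attachments_processed
        if entry["status"] == "skipped"
    ]


items = [
    {
        "description": {"value": "bearing", "confidence": 0.95},
        "quantity": {"value": 20, "confidence": 0.90, "ambiguous": True},
    },
    {"description": {"value": "seal", "confidence": 0.40}},
]
print(fields_needing_review(items))
```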

Performance

Metric	Target
p50 (body only)	Under 1.5 s
p50 (one PDF, < 5 pages)	Under 4 s
p50 (multi-page scan, OCR required)	Under 12 s
Max payload	50 MB. Larger sizes use async with callback.

Configuration knobs

Quote schema (field definitions, types, required vs. optional).
Per-field confidence thresholds.
Locale defaults.
Customer-specific format hints.

Customer Matcher

Purpose. Resolve the inbound case to a customer ID, ship-to ID, and contact ID in master data.

Endpoint. POST /v1/match/customer

Required inputs

Field	Type	Notes
`case_id`	string	Caller-owned identifier.
`sender`	string (email)	Primary signal.

Optional inputs

Field	Type	Why it helps
`phone`	string (E.164)	For WhatsApp / phone-originated cases.
`body`	string	Allows extraction of customer references.
`signature_block`	string	If extracted separately.
`country`, `branch_id`	strings	Scopes the match space.

Outputs

Field	Type	Notes
`customer_id`	string	Empty if no match clears threshold.
`ship_to_id`	string	Best ship-to inferred from request or customer default.
`contact_id`	string	Best contact match for the sender.
`confidence_score`	float (0–1)	Joint score across customer, ship-to, contact.
`match_rationale`	string	Which signals matched.
`alternatives[]`	array	Ranked candidates above floor threshold.
`branch_hint`	string	Inferred from customer's typical branch.

Match strategy

The matcher combines three signal sources, weighted by tenant configuration:

Deterministic (highest weight): exact email domain → account, phone → contact, prior-thread linkage.
Structured: address normalization, name normalization with company-suffix handling.
Behavioral: prior order patterns, typical branch, typical ship-to.
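
How the three sources are combined is not specified here. As an illustration only, a weighted average is one plausible rule: each source contributes a score between 0 and 1, and tenant configuration supplies the relative weights. The function name and the weights below are hypothetical.

```python
def combine_signals(scores, weights):
    """Weighted average of per-source scores; sources without a score are skipped."""
    total_weight = sum(weights[source] for source in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[source] * weights[source] for source in scores) / total_weight


# Hypothetical tenant weights, with deterministic signals weighted highest.
weights = {"deterministic": 3, "structured": 2, "behavioral": 1}
score = combine_signals(
    {"deterministic": 1.0, "structured": 0.5, "behavioral": 0.0}, weights
)
print(round(score, 3))
```

Normalizing by the weights of the sources that actually produced a score means a missing signal does not drag the result toward zero; whether the matcher behaves this way is an assumption.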

Disambiguation policy

For customers with multiple subsidiaries on shared domains (e.g., a holding company), the matcher returns the most likely entity in `customer_id` and the rest in `alternatives[]` with reason codes. The orchestrator can choose to surface the disambiguation to the rep.
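
On the orchestrator side, the choice between showing a single best match and surfacing the alternatives can be a small function. The sketch assumes each `alternatives[]` entry carries a `customer_id` key, and the policy names `single_best` and `surface_alternatives` are hypothetical labels.

```python
def candidates_for_rep(response, policy="single_best"):
    """Return the customer IDs the orchestrator would show to the rep."""
    if not response.get("customer_id"):
        return []  # no match cleared the threshold
    shown = [response["customer_id"]]
    if policy == "surface_alternatives":
        shown += [alt["customer_id"] for alt in response.get("alternatives", [])]
    return shown


response = {
    "customer_id": "C-100",
    "alternatives": [{"customer_id": "C-101", "reason_code": "shared_domain"}],
}
print(candidates_for_rep(response, policy="surface_alternatives"))
```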

Edge cases

Case	Behavior
Generic email domain (gmail.com, hotmail.com)	Email-domain signal weighted near zero. Falls back to phone, signature, prior-thread, body extraction.
New customer (no master-data match)	Returns empty `customer_id`, `confidence_score: 0`, `unmatched_reason`. Orchestrator can route to onboarding.
Multiple ship-tos on one customer	Best ship-to inferred from request body; otherwise customer default with reduced confidence.
Stale contact (left the company)	Contact match drops; customer match still resolves via domain.
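
The generic-domain case can be sketched as a weight adjustment applied before signals are combined. The domain list contains only the two examples from the table, and the weights (1.0 and 0.05) are hypothetical stand-ins for "near zero".

```python
GENERIC_DOMAINS = {"gmail.com", "hotmail.com"}


def domain_weight(sender, base_weight=1.0, generic_weight=0.05):
    """Down-weight the email-domain signal for free-mail providers."""
    domain = sender.rsplit("@", 1)[-1].lower()
    return generic_weight if domain in GENERIC_DOMAINS else base_weight


print(domain_weight("jane@gmail.com"), domain_weight("jane@acme-industrial.example"))
```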

Performance

Metric	Target
p50 latency	Under 400 ms
p95 latency	Under 1.0 s

Configuration knobs

Match thresholds per customer type.
Branch-aware matching rules.
Disambiguation policy (single best vs. surface alternatives).
Refresh cadence for master data.

Product Matcher

Purpose. Map extracted line-item descriptions to product IDs, with substitutes and rationale.

Endpoint. POST /v1/match/product

Required inputs

Field	Type	Notes
`case_id`	string	Caller-owned identifier.
`line_items[]`	array	From the Info Extractor, or constructed by the orchestrator.

Optional inputs

Field	Type	Why it helps
`customer_id`	string	Enables customer-specific product history boost.
`branch_id`	string	Enables branch-level catalogue scoping.
`country`	string	Country-specific catalogue scoping.
`match_strategy`	enum	`strict`, `balanced` (default), `permissive`.

Outputs

Field	Type	Notes
`matches[]`	array	One entry per input line item.
`matches[].rubix_product_id`	string	Empty if `unmatched_flag: true`.
`matches[].confidence_score`	float (0–1)	Calibrated.
`matches[].match_rationale`	string	Which match path won.
`matches[].alternatives[]`	array	Substitutes / equivalents with reason codes.
`matches[].unmatched_flag`	bool	True if no candidate cleared threshold.
`unmatched[]`	array	Convenience list for orchestrator routing.

Match paths

The matcher tries multiple paths and returns the strongest:

Manufacturer + part number exact match (highest confidence path).
Manufacturer cross-reference (competitor part to stocked equivalent).
Semantic match (description embeddings against catalogue).
Historical-pattern match (this customer ordered this SKU before).
Branch-local override (a branch-supplied spreadsheet).

The path that won is reported in `match_rationale`.
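
Selecting the strongest path can be pictured as taking the highest-confidence candidate and checking it against a threshold. This is an illustrative sketch, not the matcher's implementation; the `Candidate` type, the path labels, and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    path: str  # which match path produced this candidate
    confidence: float


def strongest(candidates, threshold=0.7):
    """Return the best candidate, or None if nothing clears the threshold."""
    best = max(candidates, key=lambda c: c.confidence, default=None)
    if best is None or best.confidence < threshold:
        return None
    return best


found = [
    Candidate("P-1", "semantic", 0.74),
    Candidate("P-1", "manufacturer_part_exact", 0.98),
    Candidate("P-2", "historical_pattern", 0.61),
]
winner = strongest(found)
print(winner.path if winner else "unmatched")
```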

Edge cases

Case	Behavior
Multiple SKUs at similar confidence	Best candidate if customer history indicates preference; otherwise `unmatched_flag: true` with all candidates in `alternatives[]`.
Discontinued SKU	Returns the successor with rationale, if mapped.
Competitor part with no equivalent	`unmatched_flag: true` with reason `no_rubix_equivalent`.
Quantity unit mismatch	Resolves match; `unit_conversion_required: true` flag added with proposed conversion.
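
The first row of the table describes a tie-breaking rule that is easy to misread, so here is a sketch of it under stated assumptions: "similar confidence" is modeled as a score within a hypothetical `margin` of the top candidate, and customer preference is modeled as a set of previously ordered product IDs. Threshold checks are left out, and the function is illustrative rather than the matcher's actual logic.

```python
def resolve(candidates, preferred_ids=frozenset(), margin=0.05):
    """Pick one product, or flag the line as unmatched when candidates tie."""
    ranked = sorted(candidates, key=lambda c: c["confidence_score"], reverse=True)
    if not ranked:
        return {"rubix_product_id": "", "unmatched_flag": True, "alternatives": []}
    top_score = ranked[0]["confidence_score"]
    contenders = [c for c in ranked if top_score - c["confidence_score"] <= margin]
    if len(contenders) > 1:
        # Similar confidence: only customer history may break the tie.
        contenders = [c for c in contenders if c["rubix_product_id"] in preferred_ids]
        if not contenders:
            return {"rubix_product_id": "", "unmatched_flag": True, "alternatives": ranked}
    winner = contenders[0]
    return {
        "rubix_product_id": winner["rubix_product_id"],
        "unmatched_flag": False,
        "alternatives": [c for c in ranked if c is not winner],
    }


a = {"rubix_product_id": "A", "confidence_score": 0.82}
b = {"rubix_product_id": "B", "confidence_score": 0.80}
c = {"rubix_product_id": "C", "confidence_score": 0.40}
print(resolve([a, b, c])["unmatched_flag"], resolve([a, b, c], {"B"})["rubix_product_id"])
```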

Performance

Metric	Target
p50 latency (10 line items)	Under 800 ms
p95 latency (10 line items)	Under 2.0 s
p50 latency (100 line items)	Under 4.0 s

Configuration knobs

Match thresholds per product category.
Substitute / equivalent rules.
Country-specific catalogue scoping.
Per-customer override lists.
Branch-local-knowledge ingestion (SFTP, API, or scheduled file drop).

Modular vs. unified call patterns

Because each module is independently callable, the orchestrator can choose any of these patterns:

Pattern	When to use	Notes
Intent only	Routing decisions only.	Cheap, fast.
Intent + Extract	Extract content for non-quote intents (e.g., status requests with attached PDFs).
Full pipeline (all four)	Standard quote handling.	Faction shares case context internally across the four calls when invoked within a short window with the same `case_id` and a shared `correlation_id`.
Single module reuse	Re-running just product matching after a rep edits a line.	Idempotent; safe to call repeatedly.

There is no requirement to call modules in a specific order. The orchestrator decides. Modules do not call each other; the caller is always in charge of orchestration.
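
A minimal sketch of the full-pipeline pattern, with the caller doing all orchestration. Several things here are assumptions: `post(path, payload)` is a hypothetical transport function supplied by the caller, `correlation_id` is sent as a body field, the Quote Info Extractor is assumed to accept the same message fields as the classifier, and its path is passed in as `extract_path` because this section does not state that endpoint. The call order shown is one choice among many.

```python
import uuid

INTENT_PATH = "/v1/intent/classify"
CUSTOMER_PATH = "/v1/match/customer"
PRODUCT_PATH = "/v1/match/product"


def run_full_pipeline(post, extract_path, case):
    """Call all four modules for one case, reusing case_id and correlation_id."""
    shared = {"case_id": case["case_id"], "correlation_id": str(uuid.uuid4())}
    message = {"sender": case["sender"], "subject": case["subject"], "body": case["body"]}

    intent = post(INTENT_PATH, {**shared, **message})
    extracted = post(extract_path, {**shared, **message})
    customer = post(CUSTOMER_PATH, {**shared, "sender": case["sender"], "body": case["body"]})

    product_payload = {**shared, "line_items": extracted.get("line_items", [])}
    if customer.get("customer_id"):
        # Optional input: enables the customer-specific history boost.
        product_payload["customer_id"] = customer["customer_id"]
    products = post(PRODUCT_PATH, product_payload)

    return {"intent": intent, "extracted": extracted, "customer": customer, "products": products}
```

Injecting `post` keeps the sketch independent of any HTTP library and makes it straightforward to exercise with a stub.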
