FactionDocs
Technical Reference

Data Requirements

Data domains, schemas, ingestion patterns, initial load, delta refresh, data quality, tenant scoping, PII.

Data domains and ownership

| Domain | Owner | Faction's use |
| --- | --- | --- |
| Intent taxonomy | Caller | Classification labels. |
| Quote schema | Caller | Extraction target. |
| Customer master | Caller | Customer matching. |
| Contact list | Caller | Customer matching, contact resolution. |
| Ship-to addresses | Caller | Ship-to inference. |
| Product catalogue | Caller | Product matching. |
| Historical quotes / orders | Caller | Customer-specific patterns. |
| Branch-level knowledge | Caller (per branch) | Product matching overrides. |
| Cross-reference tables | Caller or supplier | Product matching alternates. |

Representative schemas

Final fields are agreed during design; these examples are illustrative.

Customer master

{
  "customer_id": "CUST-UK-1042",
  "country": "GB",
  "name": "ACME Engineering Ltd",
  "name_aliases": ["Acme Eng", "ACME"],
  "primary_branch_id": "BR-UK-014",
  "domains": ["acme-engineering.co.uk"],
  "billing_address": { "...": "..." },
  "ship_to_locations": [
    { "ship_to_id": "SHIP-UK-1042-BHM", "name": "Birmingham depot", "address": { "...": "..." } }
  ],
  "customer_type": "named_account",
  "active": true,
  "updated_at": "2026-04-15T08:30:00Z"
}

Product master

{
  "rubix_product_id": "GB-BRG-6205-2RS",
  "country": "GB",
  "description": "SKF 6205-2RS deep groove ball bearing",
  "manufacturer": "SKF",
  "manufacturer_part_number": "6205-2RS",
  "category": "bearings",
  "uom": "each",
  "alternative_uoms": [{ "uom": "box_of_50", "factor": 50 }],
  "substitutes": ["GB-BRG-6205-2Z"],
  "discontinued": false,
  "successor_id": null,
  "active": true,
  "updated_at": "2026-04-20T11:00:00Z"
}
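
The alternative_uoms entries suggest a conversion back to the base uom. The sketch below shows one way a consumer of this schema might apply it. It assumes that factor means "base units per alternative unit", which the example implies but does not state; the function name to_base_quantity is hypothetical.

```python
def to_base_quantity(product: dict, quantity: float, uom: str) -> float:
    """Convert a quantity expressed in `uom` to the product's base unit of measure."""
    if uom == product["uom"]:
        return quantity
    for alt in product.get("alternative_uoms", []):
        if alt["uom"] == uom:
            # Assumption: factor = number of base units in one alternative unit.
            return quantity * alt["factor"]
    raise ValueError(
        f"Unknown unit of measure {uom!r} for {product['rubix_product_id']}"
    )


# Trimmed copy of the product master example above.
product = {
    "rubix_product_id": "GB-BRG-6205-2RS",
    "uom": "each",
    "alternative_uoms": [{"uom": "box_of_50", "factor": 50}],
}

# Two boxes of 50 expressed in the base unit ("each").
print(to_base_quantity(product, 2, "box_of_50"))
```

Raising on an unknown unit, rather than guessing a factor of 1, keeps a mislabelled quantity from silently passing through.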

Historical quote / order

{
  "transaction_id": "QUO-2025-09-887442",
  "type": "quote",
  "customer_id": "CUST-UK-1042",
  "ship_to_id": "SHIP-UK-1042-BHM",
  "branch_id": "BR-UK-014",
  "lines": [
    { "rubix_product_id": "GB-BRG-6205-2RS", "quantity": 50, "unit_price": 4.85 }
  ],
  "outcome": "won",
  "created_at": "2025-09-10T09:00:00Z"
}
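
As an illustration of how records in this shape could feed "customer-specific patterns", the sketch below counts how often each product appears on each customer's transactions. This is only one plausible aggregation, not a description of Faction's internals; the function name product_frequency_by_customer is hypothetical.

```python
from collections import Counter, defaultdict


def product_frequency_by_customer(transactions: list) -> dict:
    """Count, per customer, how many transaction lines reference each product."""
    freq = defaultdict(Counter)
    for tx in transactions:
        for line in tx["lines"]:
            freq[tx["customer_id"]][line["rubix_product_id"]] += 1
    return freq


# Sample data in the shape of the historical quote / order example above.
history = [
    {
        "transaction_id": "QUO-2025-09-887442",
        "customer_id": "CUST-UK-1042",
        "lines": [{"rubix_product_id": "GB-BRG-6205-2RS", "quantity": 50, "unit_price": 4.85}],
    },
    {
        "transaction_id": "QUO-2025-10-000001",  # made-up identifier for the example
        "customer_id": "CUST-UK-1042",
        "lines": [
            {"rubix_product_id": "GB-BRG-6205-2RS", "quantity": 20, "unit_price": 4.90},
            {"rubix_product_id": "GB-BRG-6205-2Z", "quantity": 10, "unit_price": 4.10},
        ],
    },
]

patterns = product_frequency_by_customer(history)
print(patterns["CUST-UK-1042"].most_common(1))
```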

Ingestion patterns

Faction supports four ingestion patterns. The first three are listed in order of preference for ongoing operation; the fourth applies to onboarding only.

| Pattern | Best for | Notes |
| --- | --- | --- |
| API push from caller | Real-time updates of customer master and catalogue. | Lowest latency, lowest data-warehouse footprint. |
| Scheduled pull from caller API | Hourly or daily refresh. | Faction polls; caller supplies endpoints. |
| Batch file drop (SFTP, S3, Azure Blob) | Large catalogue diffs, branch-level spreadsheets. | Faction watches a path; ingests on arrival. |
| One-time historical load | Initial onboarding. | CSV, SQL extract, or Parquet. |

Mix patterns by domain

Callers can mix patterns by domain. For example: customer master via scheduled pull (daily), catalogue via batch file (nightly), and branch spreadsheets via SFTP (ad hoc).
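
A per-domain plan like the one in the example can be written down as data and checked before onboarding. The sketch below is hypothetical throughout: the pattern keys, the DomainIngestion class, and the domain names are illustrative and are not part of any documented Faction configuration format.

```python
from dataclasses import dataclass

# Hypothetical short names for the four patterns in the table above.
PATTERNS = {"api_push", "scheduled_pull", "batch_file", "historical_load"}


@dataclass(frozen=True)
class DomainIngestion:
    domain: str
    pattern: str
    cadence: str

    def __post_init__(self):
        # Fail early on a typo instead of at the first scheduled run.
        if self.pattern not in PATTERNS:
            raise ValueError(f"Unknown ingestion pattern: {self.pattern!r}")


# The mix described in the example: one pattern per domain.
plan = [
    DomainIngestion("customer_master", "scheduled_pull", "daily"),
    DomainIngestion("product_catalogue", "batch_file", "nightly"),
    DomainIngestion("branch_spreadsheets", "batch_file", "ad hoc"),  # delivered via SFTP
]

for entry in plan:
    print(f"{entry.domain}: {entry.pattern} ({entry.cadence})")
```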

Initial load process

| Phase | Activity | Owner |
| --- | --- | --- |
| 1. Schema agreement | Confirm fields, types, identifiers, scoping. | Joint. |
| 2. Sample extract | Caller supplies a small representative extract per domain. | Caller. |
| 3. Validation | Faction validates schema, completeness, identifier uniqueness. Issues a DQ report. | Faction. |
| 4. Full historical load | Caller supplies full dataset. | Caller. |
| 5. Ingest and index | Faction ingests, normalizes, builds embeddings and indexes. | Faction. |
| 6. Sandbox validation | Run sample queries; compare against expected outputs. | Joint. |
| 7. Cutover to production | Promote to production tenant. | Joint. |

Delta refresh

Faction tracks an updated_at timestamp on each record. A delta refresh proceeds as follows:

  1. Caller supplies records changed since last sync (or Faction polls with ?since=<timestamp>).
  2. Faction validates the diff (no schema drift, identifiers stable).
  3. Faction applies updates and re-indexes affected records.
  4. Faction reports record counts in / out, plus any rejections.
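
The four steps above can be sketched as a small upsert loop. This is a simplified in-memory model, not Faction's implementation: the store is a plain dict, validation is reduced to checking that the identifier and updated_at are present, and the names apply_delta and parse_ts are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def apply_delta(store: dict, incoming: list, last_sync: str,
                id_field: str = "customer_id") -> dict:
    """Apply records changed since `last_sync` and return a counts report."""
    since = parse_ts(last_sync)
    report = {"received": len(incoming), "applied": 0, "skipped": 0, "rejected": []}
    for record in incoming:
        record_id = record.get(id_field)
        # Step 2 (simplified): reject records without an identifier or timestamp.
        if record_id is None or "updated_at" not in record:
            report["rejected"].append({"id": record_id, "reason": "missing required field"})
            continue
        # Records not changed since the last sync are skipped, not rejected.
        if parse_ts(record["updated_at"]) <= since:
            report["skipped"] += 1
            continue
        # Step 3: upsert; a real system would also re-index the record here.
        store[record_id] = record
        report["applied"] += 1
    return report


store = {
    "CUST-UK-1042": {"customer_id": "CUST-UK-1042", "name": "ACME Engineering Ltd",
                     "updated_at": "2026-04-10T08:30:00Z"},
}
incoming = [
    {"customer_id": "CUST-UK-1042", "name": "ACME Engineering Limited",
     "updated_at": "2026-04-15T08:30:00Z"},
    {"customer_id": "CUST-UK-2001", "name": "Unchanged Ltd",
     "updated_at": "2026-04-01T00:00:00Z"},
    {"name": "No identifier", "updated_at": "2026-04-15T09:00:00Z"},
]

# Step 4: the report carries counts in and out, plus rejections.
report = apply_delta(store, incoming, last_sync="2026-04-14T00:00:00Z")
print(report)
```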

Default cadence: hourly for customer master, daily for catalogue, on-arrival for branch files.

Data quality requirements

| Check | Severity | Behavior on failure |
| --- | --- | --- |
| Identifier uniqueness within domain | Hard | Reject batch; report. |
| Required fields populated | Hard | Reject record; rest of batch proceeds. |
| Foreign key integrity | Soft | Accept record; flag in DQ report. |
| Encoding (UTF-8) | Hard | Reject batch. |
| Date format (RFC 3339) | Hard | Reject record. |
| Catalogue category coverage | Soft | Accept; flag unknown categories. |

DQ reports are emitted per ingest run and available via API.
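
To make the difference between batch-level, record-level, and soft failures concrete, here is a minimal sketch covering four of the checks in the table. The required-field list, the function names, and the report shape are assumptions for illustration; the date check is an approximation built on ISO 8601 parsing, not a full RFC 3339 validator.

```python
from datetime import datetime

# Assumed required fields for the customer master; the real list is agreed during design.
REQUIRED = ("customer_id", "country", "name", "updated_at")


def is_rfc3339(value) -> bool:
    """Approximate check: parseable ISO 8601 date-time with an explicit offset."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.tzinfo is not None


def run_dq(batch: list, known_branches: set) -> dict:
    result = {"batch_rejected": False, "accepted": [], "rejected": [], "flags": []}
    ids = [r["customer_id"] for r in batch if r.get("customer_id") is not None]
    # Hard, batch-level: duplicate identifiers reject the whole batch.
    if len(ids) != len(set(ids)):
        result["batch_rejected"] = True
        return result
    for r in batch:
        missing = [f for f in REQUIRED if r.get(f) in (None, "")]
        # Hard, record-level: reject this record; the rest of the batch proceeds.
        if missing:
            result["rejected"].append((r.get("customer_id"), f"missing {missing}"))
            continue
        if not is_rfc3339(r["updated_at"]):
            result["rejected"].append((r["customer_id"], "bad date format"))
            continue
        # Soft: an unknown foreign key is flagged but the record is accepted.
        branch = r.get("primary_branch_id")
        if branch is not None and branch not in known_branches:
            result["flags"].append((r["customer_id"], f"unknown branch {branch}"))
        result["accepted"].append(r)
    return result


batch = [
    {"customer_id": "C1", "country": "GB", "name": "A", "updated_at": "2026-04-15T08:30:00Z",
     "primary_branch_id": "BR-UK-014"},
    {"customer_id": "C2", "country": "GB", "name": "", "updated_at": "2026-04-15T08:30:00Z"},
    {"customer_id": "C3", "country": "GB", "name": "C", "updated_at": "15/04/2026"},
    {"customer_id": "C4", "country": "GB", "name": "D", "updated_at": "2026-04-15T08:30:00Z",
     "primary_branch_id": "BR-UK-999"},
]
dq = run_dq(batch, known_branches={"BR-UK-014"})
print(len(dq["accepted"]), len(dq["rejected"]), len(dq["flags"]))
```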

Tenant and country scoping

All data is tagged with tenant_id and country. Faction's data layer enforces:

  • A request authenticated as tenant X can only read records tagged tenant_id: X.
  • A request scoped to country GB returns only country: GB records.
  • UK-only data physically resides in UK-region storage when the UK-region option is selected.

Country scoping is enforced at storage and query layers. Cross-country reads are blocked by default and require explicit configuration to enable.
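
The read rules above amount to two filters applied to every request. The sketch below models them as an in-memory filter purely for illustration; it does not show how Faction enforces scoping at the storage layer, and the function name scoped_read and the allow_cross_country flag are hypothetical.

```python
def scoped_read(records: list, tenant_id: str, country: str,
                allow_cross_country: bool = False) -> list:
    """Return only the records the given tenant and country scope may see."""
    visible = []
    for r in records:
        # Tenant isolation is unconditional.
        if r["tenant_id"] != tenant_id:
            continue
        # Cross-country reads are blocked unless explicitly enabled.
        if not allow_cross_country and r["country"] != country:
            continue
        visible.append(r)
    return visible


records = [
    {"id": "CUST-UK-1042", "tenant_id": "X", "country": "GB"},
    {"id": "CUST-FR-0007", "tenant_id": "X", "country": "FR"},
    {"id": "CUST-UK-9001", "tenant_id": "Y", "country": "GB"},
]

print([r["id"] for r in scoped_read(records, tenant_id="X", country="GB")])
```

Note that enabling cross-country reads in this model never relaxes the tenant check.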

PII handling

| Data type | Treatment |
| --- | --- |
| Customer business contact (name, email, phone) | Used at inference time. Stored. Encrypted. |
| End-customer PII (e.g., a sales rep's email content) | Used at inference time. Pseudonymized in stored logs unless tenant opts in to retain. |
| Sensitive PII (national ID, financial accounts) | Not expected. Detected and redacted before storage if observed. |
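
This page does not specify how pseudonymization in stored logs is performed. As one common approach, the sketch below replaces email addresses with a keyed hash (HMAC-SHA256), so the same address always maps to the same token without being recoverable from the log. The technique, the token format, the simplified email pattern, and the function name pseudonymize_emails are all assumptions.

```python
import hashlib
import hmac
import re

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_emails(text: str, key: bytes) -> str:
    """Replace each email address in `text` with a stable keyed-hash token."""
    def replace(match: re.Match) -> str:
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        # Truncated digest keeps log lines readable; length is an arbitrary choice.
        return f"<email:{digest[:12]}>"
    return EMAIL_RE.sub(replace, text)


log_line = "Please quote 50 bearings, reply to Jane.Doe@acme-engineering.co.uk today"
masked = pseudonymize_emails(log_line, key=b"demo-key-not-for-production")
print(masked)
```

A per-tenant key would keep tokens from being correlated across tenants; that is a design suggestion, not documented behavior.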

Subject access requests (GDPR Article 15) are supported via API and operational process. SLA: 30 days, faster on request.

Configuration management

Tenant configuration (taxonomy, schemas, thresholds, scoping rules) is versioned. Each change produces a config version with author, timestamp, and diff. Rollback is available.

GET /v1/admin/config/versions
[
  { "version": "v23", "author": "admin@example.com", "applied_at": "2026-04-25T10:00:00Z", "summary": "Tightened product matcher thresholds for high_value category" },
  { "version": "v22", "author": "admin@example.com", "applied_at": "2026-04-20T08:30:00Z", "summary": "Added BR-UK-014 branch threshold override" }
]
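
A client that wants to roll back first needs to know which version preceded the current one. The sketch below works only on the response body shown above and picks that version by applied_at. The rollback call itself is not documented on this page, so it is left out; the function name previous_version is hypothetical.

```python
from datetime import datetime


def applied_at(version: dict) -> datetime:
    # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
    return datetime.fromisoformat(version["applied_at"].replace("Z", "+00:00"))


def previous_version(versions: list):
    """Return the identifier of the version applied just before the latest one."""
    ordered = sorted(versions, key=applied_at, reverse=True)
    return ordered[1]["version"] if len(ordered) > 1 else None


# Response body from GET /v1/admin/config/versions, trimmed to the fields used here.
versions = [
    {"version": "v23", "author": "admin@example.com", "applied_at": "2026-04-25T10:00:00Z"},
    {"version": "v22", "author": "admin@example.com", "applied_at": "2026-04-20T08:30:00Z"},
]

print(previous_version(versions))
```

Sorting by applied_at rather than trusting list order or the version label avoids assuming how the API orders its results.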
