Data Requirements
Data domains, schemas, ingestion patterns, initial load, delta refresh, data quality, tenant scoping, PII.
Data domains and ownership
| Domain | Owner | Faction's use |
|---|---|---|
| Intent taxonomy | Caller | Classification labels. |
| Quote schema | Caller | Extraction target. |
| Customer master | Caller | Customer matching. |
| Contact list | Caller | Customer matching, contact resolution. |
| Ship-to addresses | Caller | Ship-to inference. |
| Product catalogue | Caller | Product matching. |
| Historical quotes / orders | Caller | Customer-specific patterns. |
| Branch-level knowledge | Caller (per branch) | Product matching overrides. |
| Cross-reference tables | Caller or supplier | Product matching alternates. |
Representative schemas
Final fields are agreed during design; these examples are illustrative.
Customer master
{
"customer_id": "CUST-UK-1042",
"country": "GB",
"name": "ACME Engineering Ltd",
"name_aliases": ["Acme Eng", "ACME"],
"primary_branch_id": "BR-UK-014",
"domains": ["acme-engineering.co.uk"],
"billing_address": { "...": "..." },
"ship_to_locations": [
{ "ship_to_id": "SHIP-UK-1042-BHM", "name": "Birmingham depot", "address": { "...": "..." } }
],
"customer_type": "named_account",
"active": true,
"updated_at": "2026-04-15T08:30:00Z"
}
Product master
{
"rubix_product_id": "GB-BRG-6205-2RS",
"country": "GB",
"description": "SKF 6205-2RS deep groove ball bearing",
"manufacturer": "SKF",
"manufacturer_part_number": "6205-2RS",
"category": "bearings",
"uom": "each",
"alternative_uoms": [{ "uom": "box_of_50", "factor": 50 }],
"substitutes": ["GB-BRG-6205-2Z"],
"discontinued": false,
"successor_id": null,
"active": true,
"updated_at": "2026-04-20T11:00:00Z"
}
Historical quote / order
{
"transaction_id": "QUO-2025-09-887442",
"type": "quote",
"customer_id": "CUST-UK-1042",
"ship_to_id": "SHIP-UK-1042-BHM",
"branch_id": "BR-UK-014",
"lines": [
{ "rubix_product_id": "GB-BRG-6205-2RS", "quantity": 50, "unit_price": 4.85 }
],
"outcome": "won",
"created_at": "2025-09-10T09:00:00Z"
}
Ingestion patterns
Faction supports four patterns, in order of preference for ongoing operation:
| Pattern | Best for | Notes |
|---|---|---|
| API push from caller | Real-time updates of customer master and catalogue. | Lowest latency, lowest data-warehouse footprint. |
| Scheduled pull from caller API | Hourly or daily refresh. | Faction polls; caller supplies endpoints. |
| Batch file drop (SFTP, S3, Azure Blob) | Large catalogue diffs, branch-level spreadsheets. | Faction watches a path; ingests on arrival. |
| One-time historical load | Initial onboarding. | CSV, SQL extract, or Parquet. |
Mix patterns by domain
Callers can use a different pattern for each domain: for example, customer master via scheduled pull (daily), catalogue via batch file (nightly), and branch spreadsheets via SFTP (ad hoc).
Initial load process
| Phase | Activity | Owner |
|---|---|---|
| 1. Schema agreement | Confirm fields, types, identifiers, scoping. | Joint. |
| 2. Sample extract | Caller supplies a small representative extract per domain. | Caller. |
| 3. Validation | Faction validates schema, completeness, identifier uniqueness. Issues a DQ report. | Faction. |
| 4. Full historical load | Caller supplies full dataset. | Caller. |
| 5. Ingest and index | Faction ingests, normalizes, builds embeddings and indexes. | Faction. |
| 6. Sandbox validation | Run sample queries; compare against expected outputs. | Joint. |
| 7. Cutover to production | Promote to production tenant. | Joint. |
Delta refresh
Faction tracks updated_at timestamps per record. Delta refresh:
- Caller supplies records changed since last sync (or Faction polls with ?since=<timestamp>).
- Faction validates the diff (no schema drift, identifiers stable).
- Faction applies updates and re-indexes affected records.
- Faction reports record counts in / out, plus any rejections.
Default cadence: hourly for customer master, daily for catalogue, on-arrival for branch files.
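The delta-refresh steps above can be sketched in a few lines of Python. This is a minimal illustration, not Faction's implementation: the function names `parse_ts` and `apply_delta`, the in-memory `store` dictionary, and the report keys are all assumptions made for the example.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2026-04-15T08:30:00Z."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset")
    return parsed


def apply_delta(store: dict, incoming: list, since: datetime,
                id_field: str = "customer_id") -> dict:
    """Apply records changed after `since` to `store` and return a count report."""
    report = {"records_in": len(incoming), "records_applied": 0,
              "records_skipped": 0, "rejected": []}
    for record in incoming:
        record_id = record.get(id_field)
        try:
            updated_at = parse_ts(record["updated_at"])
        except (KeyError, ValueError, AttributeError, TypeError):
            report["rejected"].append(
                {"id": record_id, "reason": "missing or invalid updated_at"})
            continue
        if record_id is None:
            report["rejected"].append({"id": None, "reason": "missing identifier"})
            continue
        if updated_at <= since:
            report["records_skipped"] += 1  # not changed since the last sync
            continue
        store[record_id] = record  # a real system would also re-index here
        report["records_applied"] += 1
    return report
```

Calling `apply_delta(store, incoming, parse_ts("2026-04-15T00:00:00Z"))` would update only records whose `updated_at` is later than the given timestamp and return the in / applied / skipped / rejected counts for the run.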
Data quality requirements
| Check | Severity | Behavior on failure |
|---|---|---|
| Identifier uniqueness within domain | Hard | Reject batch; report. |
| Required fields populated | Hard | Reject record; rest of batch proceeds. |
| Foreign key integrity | Soft | Accept record; flag in DQ report. |
| Encoding (UTF-8) | Hard | Reject batch. |
| Date format (RFC 3339) | Hard | Reject record. |
| Catalogue category coverage | Soft | Accept; flag unknown categories. |
DQ reports are emitted per ingest run and available via API.
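As an illustration of how hard and soft checks differ, the sketch below runs a subset of the checks from the table over a batch of customer records. It is a hedged example only: `REQUIRED_FIELDS`, `run_dq_checks`, the report layout, and the use of `primary_branch_id` as the foreign key are assumptions, and the UTF-8 and category-coverage checks are omitted. The date check uses `datetime.fromisoformat`, which approximates RFC 3339 rather than validating it strictly.

```python
from collections import Counter
from datetime import datetime

# Assumed required fields for the customer master; the real list is agreed during design.
REQUIRED_FIELDS = ("customer_id", "country", "name", "updated_at")


def is_rfc3339(value) -> bool:
    """Approximate RFC 3339 check: ISO 8601 date-time with an explicit UTC offset."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.tzinfo is not None


def run_dq_checks(batch: list, known_branch_ids: set,
                  id_field: str = "customer_id") -> dict:
    report = {"batch_rejected": False, "accepted": [], "rejected": [], "flags": []}

    # Hard, batch-level: identifiers must be unique within the domain.
    counts = Counter(r.get(id_field) for r in batch)
    duplicates = sorted(i for i, n in counts.items() if i is not None and n > 1)
    if duplicates:
        report["batch_rejected"] = True
        report["rejected"] = [{"id": i, "reason": "duplicate identifier"}
                              for i in duplicates]
        return report

    for record in batch:
        record_id = record.get(id_field)
        # Hard, record-level: required fields and date format.
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            report["rejected"].append(
                {"id": record_id, "reason": "missing fields: " + ", ".join(missing)})
            continue
        if not is_rfc3339(record["updated_at"]):
            report["rejected"].append({"id": record_id, "reason": "invalid date format"})
            continue
        # Soft: foreign key integrity is flagged, but the record is still accepted.
        branch = record.get("primary_branch_id")
        if branch is not None and branch not in known_branch_ids:
            report["flags"].append({"id": record_id, "reason": "unknown branch " + branch})
        report["accepted"].append(record_id)
    return report
```

A hard batch-level failure returns early with nothing accepted, a hard record-level failure drops only that record, and a soft failure adds a flag while the record is still accepted.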
Tenant and country scoping
All data is tagged with tenant_id and country. Faction's data layer enforces:
- A request authenticated as tenant X can only read records tagged tenant_id: X.
- A request scoped to country GB returns only country: GB records.
- UK-only data physically resides in UK-region storage when the UK-region option is selected.
Country scoping is enforced at storage and query layers. Cross-country reads are blocked by default and require explicit configuration to enable.
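The query-layer part of these rules amounts to a filter applied to every read. The sketch below is illustrative only: `RequestScope`, `visible_records`, and the `allow_cross_country` flag are hypothetical names, and the storage-layer enforcement and regional residency described above are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestScope:
    tenant_id: str
    country: str
    # Hypothetical flag standing in for the explicit configuration that
    # enables cross-country reads; blocked by default.
    allow_cross_country: bool = False


def visible_records(records: list, scope: RequestScope) -> list:
    """Return only the records the scoped request is allowed to read."""
    result = []
    for record in records:
        if record.get("tenant_id") != scope.tenant_id:
            continue  # tenant isolation is never relaxed
        if not scope.allow_cross_country and record.get("country") != scope.country:
            continue
        result.append(record)
    return result
```

In this sketch the tenant check is unconditional, while the country check can be relaxed only through the explicit flag, which mirrors the default-deny behavior described above.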
PII handling
| Data type | Treatment |
|---|---|
| Customer business contact (name, email, phone) | Used at inference time. Stored. Encrypted. |
| End-customer PII (e.g., a sales rep's email content) | Used at inference time. Pseudonymized in stored logs unless tenant opts in to retain. |
| Sensitive PII (national ID, financial accounts) | Not expected. Detected and redacted before storage if observed. |
Subject access requests (GDPR Article 15) are supported via API and operational process. SLA: 30 days, faster on request.
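The document does not specify how pseudonymization in stored logs is performed. One common approach, shown here purely as an assumption, is to replace each email address with a keyed hash so that the same address always maps to the same token without being recoverable from the log. The function name `pseudonymize_emails`, the token format, and the simplified email pattern are all hypothetical.

```python
import hashlib
import hmac
import re

# Simplified email pattern for illustration; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize_emails(text: str, key: bytes) -> str:
    """Replace email addresses in `text` with stable keyed-hash tokens."""
    def _token(match: re.Match) -> str:
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_PATTERN.sub(_token, text)
```

Using a secret key (rather than a plain hash) means tokens cannot be reproduced by someone who only knows the candidate addresses, and a per-tenant key would keep tokens from matching across tenants.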
Configuration management
Tenant configuration (taxonomy, schemas, thresholds, scoping rules) is versioned. Each change produces a config version with author, timestamp, and diff. Rollback is available.
GET /v1/admin/config/versions
[
{ "version": "v23", "author": "admin@example.com", "applied_at": "2026-04-25T10:00:00Z", "summary": "Tightened product matcher thresholds for high_value category" },
{ "version": "v22", "author": "admin@example.com", "applied_at": "2026-04-20T08:30:00Z", "summary": "Added BR-UK-014 branch threshold override" }
]