What is the Field Registry?

The Field Registry is the unified knowledge graph of all canonical fields discovered across documents. Fields are organized into three tiers based on frequency: Tier 1 (core), Tier 2 (established), Tier 3 (emerging).

How does the 4-phase extraction pipeline work?

Phase 1 (Resolve) fills roughly 30% of cells from graph matches, with no AI needed. Phase 2 (Agent) uses AI strategies. Phase 3 (Validation) runs cross-field checks. Phase 4 (Re-read) fills remaining gaps with targeted document re-reading.
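The phase ordering can be illustrated with a minimal sketch. Everything in it is hypothetical: the phase functions, the sample values, and the choice to have validation clear a failing cell so that Re-read can fill it are assumptions made for illustration, not Talonic's implementation. Confidence handling is left out here.

```python
from typing import Dict, List, Optional, Tuple

Cells = Dict[str, Optional[str]]

# Hypothetical stand-ins for the four phases. Each receives the current cells
# and returns the updates it wants to apply.
def resolve(cells: Cells) -> Cells:
    graph_matches = {"currency": "EUR"}  # pretend graph lookup, no AI involved
    return {k: v for k, v in graph_matches.items() if cells.get(k) is None}

def agent(cells: Cells) -> Cells:
    extracted = {"net": "100", "tax": "19", "total": "120"}  # pretend AI output
    return {k: v for k, v in extracted.items() if cells.get(k) is None}

def validate(cells: Cells) -> Cells:
    # Cross-field check (assumed rule): net + tax must equal total.
    try:
        ok = float(cells["net"]) + float(cells["tax"]) == float(cells["total"])
    except (KeyError, TypeError, ValueError):
        ok = True  # not enough data to run the check
    return {} if ok else {"total": None}  # clear the failing cell

def reread(cells: Cells) -> Cells:
    reread_values = {"total": "119"}  # pretend targeted re-read of the document
    return {k: v for k, v in reread_values.items() if cells.get(k) is None}

PHASES = [("resolve", resolve), ("agent", agent),
          ("validation", validate), ("re-read", reread)]

def run_pipeline(fields: List[str]) -> Tuple[Cells, Dict[str, str]]:
    """Run the phases in order and record which phase last touched each cell."""
    cells: Cells = {name: None for name in fields}
    trace: Dict[str, str] = {}
    for phase_name, phase in PHASES:
        for field, value in phase(cells).items():
            cells[field] = value
            trace[field] = phase_name
    return cells, trace
```

Calling `run_pipeline(["currency", "net", "tax", "total"])` fills `currency` in the resolve phase, `net` and `tax` in the agent phase, and `total` in the re-read phase after validation rejects the agent's value.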

What are cases in Talonic?

Cases are groups of two or more documents connected through shared entities (names, reference numbers, project codes). They are automatically discovered by the linking pipeline and include evidence chains and AI narration.
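Grouping documents by shared entities amounts to finding connected components. The sketch below uses a union-find structure for that; the function name `discover_cases` and the input shape are assumptions, and the real linking pipeline (with evidence chains and narration) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Set

def discover_cases(doc_entities: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group documents that are linked, directly or transitively, by shared entities."""
    parent = {doc: doc for doc in doc_entities}

    def find(doc: str) -> str:
        # Walk up to the root, shortening the path along the way.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    first_doc_for_entity: Dict[str, str] = {}
    for doc, entities in doc_entities.items():
        for entity in entities:
            if entity in first_doc_for_entity:
                # Same entity seen before: merge the two documents' groups.
                parent[find(doc)] = find(first_doc_for_entity[entity])
            else:
                first_doc_for_entity[entity] = doc

    groups: Dict[str, Set[str]] = defaultdict(set)
    for doc in doc_entities:
        groups[find(doc)].add(doc)
    # A case needs at least two documents.
    return [group for group in groups.values() if len(group) >= 2]
```

For example, an invoice and a contract that both mention the same vendor name end up in one case, while a memo with no shared entities is left out.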

How does the confidence gate work?

Once a cell is filled with confidence ≥ 0.7, no later pipeline phase can overwrite it. This prevents high-confidence lookup results (0.95) from being replaced by lower-confidence agent extractions (0.65).
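A minimal sketch of such a gate is shown below. The `Cell` type and `write_cell` function are hypothetical names; only the 0.7 threshold and the example confidences come from the text above. How cells below the gate are handled is an assumption: here any later write may replace them.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GATE = 0.7  # threshold stated in the text

@dataclass
class Cell:
    value: Optional[str] = None
    confidence: float = 0.0
    source: Optional[str] = None

def write_cell(cells: Dict[str, Cell], field: str, value: str,
               confidence: float, source: str) -> bool:
    """Write a value unless the cell is already locked by the gate."""
    current = cells.get(field)
    if current is not None and current.value is not None and current.confidence >= GATE:
        return False  # locked: no later phase may overwrite it
    # Assumption: cells below the gate can be replaced by any later write.
    cells[field] = Cell(value, confidence, source)
    return True
```

With this sketch, a lookup result written at 0.95 locks the cell, and a later agent extraction at 0.65 is rejected and leaves the value unchanged.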

Field Registry

The Field Registry is Talonic's unified knowledge graph of canonical fields. It maps raw extracted values to standardized field definitions across all documents and schemas.

How it works

When the four-phase pipeline extracts a value, it resolves the raw field name against the Field Registry. The registry contains thousands of canonical field definitions built from the 529-type ontology and refined through production usage. Each field has a canonical name, description, expected type, and tier.

Tier system

Fields are organized into three tiers based on their specificity and standardization level. Higher tiers (Tier 1 being the highest) provide stronger guarantees about data quality and cross-document consistency.

Tier	Name	Description	Example
1	Universal	Fields standard across all document types	date, currency, language
2	Domain	Fields standard within a document category	invoice_number, vendor_name, ISIN
3	Custom	User-defined fields specific to a schema	internal_project_code, custom_tag
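One way to picture a registry entry is as a small record carrying the four attributes named earlier (canonical name, description, expected type, tier). The sketch below is illustrative only: the class names, descriptions, and expected types are assumptions, and the field names are taken from the table's examples.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List

class Tier(IntEnum):
    UNIVERSAL = 1  # standard across all document types
    DOMAIN = 2     # standard within a document category
    CUSTOM = 3     # user-defined, specific to a schema

@dataclass(frozen=True)
class FieldDefinition:
    canonical_name: str
    description: str
    expected_type: str  # assumed type labels, not the registry's real vocabulary
    tier: Tier

REGISTRY: List[FieldDefinition] = [
    FieldDefinition("date", "Document date", "date", Tier.UNIVERSAL),
    FieldDefinition("currency", "Currency of monetary amounts", "string", Tier.UNIVERSAL),
    FieldDefinition("invoice_number", "Invoice identifier", "string", Tier.DOMAIN),
    FieldDefinition("vendor_name", "Name of the vendor", "string", Tier.DOMAIN),
    FieldDefinition("internal_project_code", "User-defined project code", "string", Tier.CUSTOM),
]

def fields_in_tier(tier: Tier) -> List[FieldDefinition]:
    """Return all registry entries that belong to the given tier."""
    return [field for field in REGISTRY if field.tier == tier]
```

Using an `IntEnum` keeps the tier numbers from the table while still allowing comparisons such as `field.tier <= Tier.DOMAIN`.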

Field resolution

During the Resolve phase of the extraction pipeline, raw field names are matched against the registry using semantic similarity. A field labeled "total" in the document might resolve to the canonical total_amount field. This resolution is recorded in the per-cell provenance trace.
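The matching step can be sketched as follows. Note the substitution: the registry uses semantic similarity, whereas this sketch uses plain string similarity from Python's standard `difflib` module as a stand-in so that it runs without a model. The function name, the 0.5 threshold, and the shape of the returned record are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

def resolve_field(raw_name: str, canonical_names: List[str],
                  threshold: float = 0.5) -> Optional[Dict[str, object]]:
    """Pick the best-scoring canonical field, or None if nothing clears the threshold."""
    best_name, best_score = None, 0.0
    for candidate in canonical_names:
        # String similarity as a stand-in for semantic similarity.
        score = SequenceMatcher(None, raw_name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is None or best_score < threshold:
        return None
    # A record like this could be stored in the cell's provenance trace.
    return {"raw": raw_name, "canonical": best_name,
            "score": round(best_score, 3), "phase": "resolve"}
```

With the candidates `total_amount`, `invoice_number`, and `vendor_name`, the raw label "total" resolves to `total_amount`, while a label with no resemblance to any candidate returns `None`.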

Schema Graph integration

The Schema Graph connects schemas to Field Registry entries. When you create a schema via the schemas API, each field is automatically linked to its canonical registry entry. This enables cross-schema field search via the search API.
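The reason this linking enables cross-schema search can be shown with an in-memory index keyed by canonical field. The `SchemaGraph` class and its methods below are hypothetical and are not the schemas API or the search API; they only illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

class SchemaGraph:
    """Toy index from canonical registry entries to the schema fields linked to them."""

    def __init__(self) -> None:
        self._by_canonical: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def link(self, schema: str, field: str, canonical: str) -> None:
        # Record that schema.field points at the given canonical entry.
        self._by_canonical[canonical].add((schema, field))

    def search(self, canonical: str) -> List[Tuple[str, str]]:
        # Find every (schema, field) pair linked to the canonical entry.
        return sorted(self._by_canonical.get(canonical, set()))
```

If an `invoices` schema links its `total` field and a `receipts` schema links its `sum` field to `total_amount`, a search for `total_amount` returns both, even though the schema-level names differ.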

DIN SPEC 91491 compliance

The Field Registry's tier system and canonical naming conventions are aligned with DIN SPEC 91491, the German standard for AI-assisted document processing that Talonic co-authored. Tier 1 and Tier 2 fields map directly to the standard's recommended field catalog.