Field Registry
The Field Registry is Talonic's unified knowledge graph of canonical fields. It maps the raw field names extracted from documents to standardized field definitions across all documents and schemas.
How it works
When the four-phase pipeline extracts a value, it resolves the raw field name against the Field Registry. The registry contains thousands of canonical field definitions built from the 529-type ontology and refined through production usage. Each field has a canonical name, description, expected type, and tier.
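As a rough illustration of the four attributes listed above, a registry entry could be modeled like this. The class name `CanonicalField`, the type labels, and the sample descriptions are assumptions made for this sketch, not Talonic's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalField:
    """Hypothetical shape of one registry entry (illustrative only)."""
    name: str           # canonical name, e.g. "invoice_number"
    description: str    # human-readable meaning of the field
    expected_type: str  # assumed type label such as "string" or "date"
    tier: int           # 1 = Universal, 2 = Domain, 3 = Custom

    def __post_init__(self):
        # Reject tiers outside the three documented levels.
        if self.tier not in (1, 2, 3):
            raise ValueError(f"tier must be 1, 2, or 3, got {self.tier}")


# A tiny in-memory registry keyed by canonical name (sample entries only).
registry = {
    f.name: f
    for f in [
        CanonicalField("date", "Document date", "date", 1),
        CanonicalField("invoice_number", "Invoice identifier", "string", 2),
    ]
}
```

Keying the registry by canonical name keeps lookups cheap once a raw field name has been resolved.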
Tier system
Fields are organized into three tiers based on their specificity and standardization level. Lower-numbered tiers provide stronger guarantees about data quality and cross-document consistency, with Tier 1 the most standardized.
| Tier | Name | Description | Example |
|---|---|---|---|
| 1 | Universal | Fields standard across all document types | date, currency, language |
| 2 | Domain | Fields standard within a document category | invoice_number, vendor_name, ISIN |
| 3 | Custom | User-defined fields specific to a schema | internal_project_code, custom_tag |
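
The tier table can be expressed as a small enum, which makes it easy to filter for the standardized tiers. This is a sketch only: the names `Tier`, `EXAMPLE_TIERS`, and `standardized_fields` are hypothetical, and the mapping contains just the example fields from the table.

```python
from enum import IntEnum


class Tier(IntEnum):
    UNIVERSAL = 1  # standard across all document types
    DOMAIN = 2     # standard within a document category
    CUSTOM = 3     # user-defined, specific to a schema


# Example fields taken from the table above.
EXAMPLE_TIERS = {
    "date": Tier.UNIVERSAL,
    "currency": Tier.UNIVERSAL,
    "language": Tier.UNIVERSAL,
    "invoice_number": Tier.DOMAIN,
    "vendor_name": Tier.DOMAIN,
    "ISIN": Tier.DOMAIN,
    "internal_project_code": Tier.CUSTOM,
    "custom_tag": Tier.CUSTOM,
}


def standardized_fields(tiers, max_tier=Tier.DOMAIN):
    """Return field names whose tier number is at most max_tier."""
    return sorted(name for name, tier in tiers.items() if tier <= max_tier)
```

With the default `max_tier`, the function returns the Tier 1 and Tier 2 examples and leaves out the custom fields.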
Field resolution
During the Resolve phase of the extraction pipeline, raw field names are matched against the registry using semantic similarity. A field labeled "total" in the document might resolve to the canonical total_amount field. This resolution is recorded in the per-cell provenance trace.
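The matching step can be sketched as follows. Note the substitution: this example uses character-level string similarity from Python's standard `difflib` as a stand-in for semantic similarity, which a real resolver would more likely compute from embeddings. The candidate list, the `resolve` function, the 0.5 threshold, and the shape of the returned record are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical subset of canonical names to match against.
CANONICAL_NAMES = ["total_amount", "invoice_number", "vendor_name", "date", "currency"]


def resolve(raw_name: str, threshold: float = 0.5) -> Optional[dict]:
    """Pick the best-scoring canonical name, or None if nothing clears the threshold."""
    normalized = raw_name.strip().lower()
    best_name, best_score = None, 0.0
    for candidate in CANONICAL_NAMES:
        # String similarity in [0, 1]; a stand-in for semantic similarity.
        score = SequenceMatcher(None, normalized, candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is None or best_score < threshold:
        return None
    # A record like this could feed a provenance trace (shape is assumed).
    return {"raw": raw_name, "canonical": best_name, "score": round(best_score, 3)}
```

Calling `resolve("total")` selects `total_amount` from this candidate list, while a label with no close candidate returns `None` rather than forcing a match.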
Schema Graph integration
The Schema Graph connects schemas to Field Registry entries. When you create a schema via the schemas API, each field is automatically linked to its canonical registry entry. This enables cross-schema field search via the search API.
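One way to picture why linking enables cross-schema search is an inverted index from canonical fields to the schemas that use them. The sketch below is an in-memory model under that assumption. It does not call or mirror the schemas API or the search API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class SchemaGraph:
    """Toy model: canonical field name -> set of schemas linked to it."""

    def __init__(self):
        self._schemas_by_field = defaultdict(set)

    def add_schema(self, schema_name, field_links):
        """field_links maps each schema field name to its canonical registry name."""
        for canonical in field_links.values():
            self._schemas_by_field[canonical].add(schema_name)

    def schemas_using(self, canonical):
        """Return the sorted names of schemas linked to a canonical field."""
        return sorted(self._schemas_by_field.get(canonical, set()))


graph = SchemaGraph()
graph.add_schema("invoice", {"total": "total_amount", "vendor": "vendor_name"})
graph.add_schema("receipt", {"sum": "total_amount"})
```

Here `graph.schemas_using("total_amount")` finds both schemas even though they name the field differently, because both link to the same canonical entry.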
DIN SPEC 91491 compliance
The Field Registry's tier system and canonical naming conventions are aligned with DIN SPEC 91491, the German standard for AI-assisted document processing that Talonic co-authored. Tier 1 and Tier 2 fields map directly to the standard's recommended field catalog.