
Field Registry

The Field Registry is Talonic's unified knowledge graph of canonical fields. It maps raw extracted values to standardized field definitions across all documents and schemas.

How it works

When the four-phase pipeline extracts a value, it resolves the raw field name against the Field Registry. The registry contains thousands of canonical field definitions built from the 529-type ontology and refined through production usage. Each field has a canonical name, description, expected type, and tier.
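The sketch below makes the shape of a registry entry concrete: the four attributes listed above, held in a simple in-memory lookup. It is a minimal sketch. The class and attribute names (`FieldDefinition`, `FieldRegistry`, `canonical_name`, `expected_type`) are hypothetical and are not Talonic's actual data model or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical data model: these names are illustrative, not Talonic's schema.
@dataclass(frozen=True)
class FieldDefinition:
    canonical_name: str  # e.g. "total_amount"
    description: str     # human-readable meaning of the field
    expected_type: str   # e.g. "date", "decimal", "string"
    tier: int            # 1 = Universal, 2 = Domain, 3 = Custom


class FieldRegistry:
    """Minimal in-memory registry keyed by canonical name (hypothetical)."""

    def __init__(self) -> None:
        self._fields: dict[str, FieldDefinition] = {}

    def register(self, field: FieldDefinition) -> None:
        # Canonical names must be unique across the registry.
        if field.canonical_name in self._fields:
            raise ValueError(f"duplicate canonical name: {field.canonical_name}")
        self._fields[field.canonical_name] = field

    def get(self, canonical_name: str) -> FieldDefinition | None:
        # Returns None for names the registry does not know.
        return self._fields.get(canonical_name)

    def __len__(self) -> int:
        return len(self._fields)


registry = FieldRegistry()
registry.register(
    FieldDefinition("invoice_number", "Issuer-assigned invoice identifier", "string", 2)
)
print(registry.get("invoice_number").tier)  # prints 2
```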

Tier system

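Fields are organized into three tiers based on their specificity and standardization level. Lower-numbered tiers are standardized more broadly and provide stronger guarantees about data quality and cross-document consistency: Tier 1 fields are shared across all document types, while Tier 3 fields are defined for a single schema.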

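| Tier | Name      | Description                                  | Example                               |
|------|-----------|----------------------------------------------|---------------------------------------|
| 1    | Universal | Fields standard across all document types    | date, currency, language              |
| 2    | Domain    | Fields standard within a document category   | invoice_number, vendor_name, ISIN     |
| 3    | Custom    | User-defined fields specific to a schema     | internal_project_code, custom_tag     |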
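Because the tiers are ordered, they fit naturally into an ordered enum. The sketch below is a hypothetical illustration (the `Tier` enum, `is_standardized`, and `stronger` are not part of any documented Talonic API) in which a lower value means a more broadly shared definition.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical encoding of the three tiers; lower value = broader standard."""

    UNIVERSAL = 1  # standard across all document types (date, currency, ...)
    DOMAIN = 2     # standard within a document category (invoice_number, ISIN, ...)
    CUSTOM = 3     # user-defined and specific to one schema

    @property
    def is_standardized(self) -> bool:
        # Tiers 1 and 2 are shared definitions; Tier 3 is schema-specific.
        return self <= Tier.DOMAIN


def stronger(a: Tier, b: Tier) -> Tier:
    """Return whichever tier carries the stronger consistency guarantee."""
    return min(a, b)


print(stronger(Tier.CUSTOM, Tier.DOMAIN).name)  # prints DOMAIN
```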

Field resolution

During the Resolve phase of the extraction pipeline, raw field names are matched against the registry using semantic similarity. A field labeled "total" in the document might resolve to the canonical total_amount field. This resolution is recorded in the per-cell provenance trace.
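The matching step can be sketched as a best-match search with a cut-off. The similarity model Talonic uses is not described here, so this sketch swaps in character-trigram cosine similarity as a simple, dependency-free stand-in; the `resolve` function, the `Resolution` record, and the 0.3 threshold are all assumptions. The returned record shows the kind of information a provenance entry might capture.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


def _trigrams(name: str) -> Counter[str]:
    # Lower-case and treat "_" as a space so "Total Amount" matches "total_amount";
    # the padding spaces mark word boundaries.
    text = "  " + name.lower().replace("_", " ").strip() + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def similarity(a: str, b: str) -> float:
    """Cosine similarity of character-trigram counts (a lexical stand-in)."""
    va, vb = _trigrams(a), _trigrams(b)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


@dataclass(frozen=True)
class Resolution:
    """Hypothetical record of one resolution, as a provenance trace might store it."""

    raw_name: str
    canonical_name: str | None  # None when no candidate clears the threshold
    score: float


def resolve(raw_name: str, canonical_names: list[str], threshold: float = 0.3) -> Resolution:
    best_name: str | None = None
    best_score = 0.0
    for name in canonical_names:
        score = similarity(raw_name, name)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        best_name = None  # leave the field unresolved rather than guess
    return Resolution(raw_name, best_name, round(best_score, 3))


canonical = ["total_amount", "tax_amount", "invoice_date", "vendor_name"]
print(resolve("total", canonical).canonical_name)  # prints total_amount
```

A lexical measure like this only catches spelling and formatting variants. Genuinely semantic matches, such as "amount due" resolving to total_amount, would need an embedding model in place of `similarity`.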

Schema Graph integration

The Schema Graph connects schemas to Field Registry entries. When you create a schema via the schemas API, each field is automatically linked to its canonical registry entry. This enables cross-schema field search via the search API.
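The linkage can be pictured as an index keyed by canonical field. The sketch below is a hypothetical in-memory model, not a client for the schemas or search APIs: `SchemaGraph`, `add_schema`, and `search` are illustrative names, and the automatic linking step is replaced by an explicit field-to-canonical mapping.

```python
from __future__ import annotations

from collections import defaultdict


class SchemaGraph:
    """Hypothetical in-memory model of schema-to-registry links."""

    def __init__(self) -> None:
        # canonical field -> {schema name -> field name as written in that schema}
        self._links: defaultdict[str, dict[str, str]] = defaultdict(dict)

    def add_schema(self, schema: str, field_links: dict[str, str]) -> None:
        # field_links maps each schema field to its canonical registry entry.
        # Here the mapping is supplied explicitly; the product links automatically.
        for field_name, canonical in field_links.items():
            self._links[canonical][schema] = field_name

    def search(self, canonical: str) -> dict[str, str]:
        # Cross-schema search: every schema that uses this canonical field.
        return dict(self._links.get(canonical, {}))


graph = SchemaGraph()
graph.add_schema("invoices_eu", {"Invoice No.": "invoice_number", "Supplier": "vendor_name"})
graph.add_schema("invoices_us", {"invoice_id": "invoice_number"})
print(graph.search("invoice_number"))
# prints {'invoices_eu': 'Invoice No.', 'invoices_us': 'invoice_id'}
```

Keying the index by canonical name is what makes search cross-schema: two schemas that label the same concept differently still meet at one registry entry.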

DIN SPEC 91491 compliance

The Field Registry's tier system and canonical naming conventions are aligned with DIN SPEC 91491, the German standard for AI-assisted document processing that Talonic co-authored. Tier 1 and Tier 2 fields map directly to the standard's recommended field catalog.

Frequently asked questions