Schemas as Entities

In Talonic, schemas are versioned entities that define the structure of extraction output. Each schema links its fields to the Field Registry and tracks changes over time.

Schema structure

A schema defines a set of fields, each with a name, type, and optional description. When you create a schema via the schemas API, Talonic resolves each field against the Field Registry and records the mapping in the Schema Graph.

curl -X POST https://api.talonic.com/v1/schemas \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice Schema",
    "fields": {
      "vendor_name": "string",
      "invoice_number": "string",
      "total_amount": "number",
      "due_date": "date"
    }
  }'
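
For readers who script against the API, the same request can be sketched in Python with only the standard library. The URL, headers, and payload mirror the curl example above. The function name build_create_schema_request and the TALONIC_API_KEY environment variable are hypothetical names chosen for illustration, and the response format is not covered here, so the send step is left commented out.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
SCHEMAS_URL = "https://api.talonic.com/v1/schemas"


def build_create_schema_request(name, fields, api_key):
    """Build (but do not send) the POST request that creates a schema."""
    body = json.dumps({"name": name, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        SCHEMAS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# TALONIC_API_KEY is an assumed variable name, not one defined by Talonic.
request = build_create_schema_request(
    "Invoice Schema",
    {
        "vendor_name": "string",
        "invoice_number": "string",
        "total_amount": "number",
        "due_date": "date",
    },
    os.environ.get("TALONIC_API_KEY", "<your-api-key>"),
)

# To send the request, uncomment the lines below.
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```

Reading the key from the environment keeps the secret out of source control, which is the main reason to prefer this over pasting the key into the script.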

Versioning

Every update to a schema via the update endpoint creates a new version. Previous versions are retained, so existing extractions remain tied to the schema version that produced them. This keeps per-cell provenance intact: each extracted value can still be traced to the field definition that was in effect when it was extracted.
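
The versioning behavior described above can be illustrated with a small in-memory model. This is a conceptual sketch only, not Talonic's implementation: the classes SchemaVersion and VersionedSchema, the integer version numbers starting at 1, and the schema_version key on the extraction record are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaVersion:
    """One immutable snapshot of a schema's fields (hypothetical model)."""
    version: int
    fields: dict


@dataclass
class VersionedSchema:
    """Append-only list of versions; old versions are never overwritten."""
    name: str
    versions: list = field(default_factory=list)

    def update(self, fields):
        # Each update appends a new version instead of mutating the last one.
        new = SchemaVersion(version=len(self.versions) + 1, fields=dict(fields))
        self.versions.append(new)
        return new

    def get(self, version):
        # Versions are numbered from 1 in this sketch.
        return self.versions[version - 1]


schema = VersionedSchema("Invoice Schema")
v1 = schema.update({"vendor_name": "string", "total_amount": "number"})

# An extraction records the version that produced it (assumed record shape).
extraction = {"schema_version": v1.version}

# A later update adds a field and yields version 2.
v2 = schema.update({**v1.fields, "due_date": "date"})

# The earlier extraction still resolves to the version 1 field set.
original_fields = schema.get(extraction["schema_version"]).fields
```

Because versions are only ever appended, looking up an extraction's version always returns the field set that was current when the extraction ran, regardless of later updates.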

Schema Graph

The Schema Graph is the relationship layer connecting schemas, fields, and the Field Registry. It enables cross-schema search: for example, you can find all schemas containing a total_amount field via the search API. The graph also powers the 529-type ontology classification by mapping schemas to document types.
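
To make cross-schema search concrete, here is a minimal sketch of the underlying idea: an inverted index from field name to the schemas that contain it. It does not call the search API and makes no claim about how the Schema Graph is stored. The function build_field_index and the Receipt Schema and Contract Schema entries are hypothetical, added only so the lookup has something to match.

```python
from collections import defaultdict


def build_field_index(schemas):
    """Map each field name to the sorted names of schemas that contain it."""
    index = defaultdict(set)
    for schema_name, fields in schemas.items():
        for field_name in fields:
            index[field_name].add(schema_name)
    return {name: sorted(owners) for name, owners in index.items()}


# Illustrative schemas; only "Invoice Schema" appears in this page's example.
schemas = {
    "Invoice Schema": {"vendor_name": "string", "total_amount": "number"},
    "Receipt Schema": {"merchant": "string", "total_amount": "number"},
    "Contract Schema": {"party_a": "string", "effective_date": "date"},
}

index = build_field_index(schemas)
print(index["total_amount"])  # ['Invoice Schema', 'Receipt Schema']
```

The index is keyed by exact field name in this sketch, so schemas only match each other when they use the same name for a field.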

Best practices

Use descriptive schema names that reflect the document type. Include field descriptions to improve extraction accuracy. In production, reference schemas by ID via schema_id rather than inline JSON (see Schema Formats for details). Monitor schema usage in the extractions list.

Schemas API

CRUD operations for schemas.

Schema Formats

Inline JSON, schema_id, and auto-detect.

Field Registry

Canonical field naming and tiers.

Per-Cell Provenance

Provenance tied to schema versions.

Extract Endpoint

Use schemas for extraction.