Schema Formats

Talonic supports three schema formats for defining extraction targets. Choose inline JSON for quick experiments, schema_id for production, or auto-detect to let the platform infer fields.

Inline JSON schema

Pass a JSON object that maps field names to types directly in the schema parameter of the extract endpoint (sent as a form field in the request below). This is the fastest way to start extracting data. Supported types are string, number, date, boolean, and array.

curl -X POST https://api.talonic.com/v1/extract \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -F "file=@invoice.pdf" \
  -F 'schema={"vendor_name":"string","invoice_number":"string","total_amount":"number","due_date":"date"}'
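Hand-writing the schema string makes it easy to mistype a type name. The following is a minimal sketch of a local helper that assembles the inline schema from name:type pairs and rejects any type outside the five listed above. The function name build_schema is hypothetical and not part of Talonic's tooling, and the sketch assumes simple field names that need no JSON escaping.

```shell
# Hypothetical helper (not part of the Talonic API or CLI).
# Builds an inline schema JSON object from name:type arguments and
# fails if a type is not one of the types listed in this section.
# Assumes field names contain no quotes, backslashes, or colons.
build_schema() {
  local json="" pair name type
  for pair in "$@"; do
    name="${pair%%:*}"   # text before the first colon
    type="${pair#*:}"    # text after the first colon
    case "$type" in
      string|number|date|boolean|array) ;;
      *)
        echo "unsupported type '$type' for field '$name'" >&2
        return 1
        ;;
    esac
    # Append "name":"type", adding a comma separator after the first pair.
    json="${json:+$json,}\"$name\":\"$type\""
  done
  printf '{%s}' "$json"
}

# Example use with the request shown above (left commented out so that
# sourcing this file makes no network call):
# schema=$(build_schema vendor_name:string total_amount:number due_date:date) &&
#   curl -X POST https://api.talonic.com/v1/extract \
#     -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
#     -F "file=@invoice.pdf" \
#     -F "schema=$schema"
```

Checking types locally only catches typos early; the API remains the authority on what it accepts.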

Schema ID reference

For production workloads, create a reusable schema via the schemas API and reference it by ID. This enables schema versioning, Field Registry enrichment, and consistent extraction across documents. The Schema Graph links fields to the Field Registry for canonical naming.

curl -X POST https://api.talonic.com/v1/extract \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -F "file=@invoice.pdf" \
  -F "schema_id=sch_abc123"

Auto-detect mode

Omit both schema and schema_id to let Talonic automatically discover fields. The four-phase pipeline uses the 529-type ontology to classify the document and infer relevant fields. Auto-detect is ideal for exploratory analysis where you do not know the document structure in advance.

curl -X POST https://api.talonic.com/v1/extract \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -F "file=@unknown-document.pdf"

Choosing a format

Use inline JSON for prototyping and one-off extractions. Use schema_id for production pipelines where you need versioning and per-cell provenance tied to a stable schema. Use auto-detect when the document type is unknown. All three formats produce the same output structure; see extraction data for the response format.
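Because the three modes differ only in which form field accompanies the file, a script can switch between them with one small function. This is a sketch under stated assumptions: schema_field and the TALONIC_API_KEY environment variable are hypothetical names chosen for illustration, and only the schema and schema_id field names come from the examples above.

```shell
# Hypothetical helper (not part of the Talonic API or CLI).
# Prints the form field for the chosen mode:
#   schema_field inline '<json>'   -> schema=<json>
#   schema_field id <schema_id>    -> schema_id=<schema_id>
#   schema_field auto              -> prints nothing (send neither field)
schema_field() {
  local mode="${1:-}" value="${2:-}"
  case "$mode" in
    inline) [ -n "$value" ] || return 1; printf 'schema=%s' "$value" ;;
    id)     [ -n "$value" ] || return 1; printf 'schema_id=%s' "$value" ;;
    auto)   ;;
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Example use (left commented out so that sourcing this file makes no
# network call). The ${field:+...} expansion adds -F only when a field
# was produced, so auto mode sends the file alone:
# field=$(schema_field id sch_abc123) &&
#   curl -X POST https://api.talonic.com/v1/extract \
#     -H "Authorization: Bearer $TALONIC_API_KEY" \
#     -F "file=@invoice.pdf" \
#     ${field:+-F "$field"}
```

Keeping the mode as an explicit argument means moving a pipeline from prototyping to production is a one-word change from inline to id.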