Extract a Document
The extract endpoint is the primary entry point for the Talonic API. Send a document (PDF, DOCX, image, or spreadsheet) and receive schema-validated structured data with per-cell provenance and confidence scores.
POST /v1/extract
Request
Send a multipart/form-data request that identifies the document (an uploaded file, a URL, or the ID of an existing document) and optionally includes a schema. See authentication for header requirements and schema formats for all schema options.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | One of file/file_url/document_id | The document file to extract (PDF, DOCX, image, spreadsheet) |
| file_url | string | One of file/file_url/document_id | URL of a publicly accessible document to extract |
| document_id | string | One of file/file_url/document_id | ID of an existing document to re-extract |
| schema | string (JSON) | No | Inline JSON schema mapping field names to types |
| schema_id | string | No | ID of a saved schema to use for extraction |
| instructions | string | No | Natural language instructions to guide extraction |
| include_markdown | boolean | No | Include markdown representation in the response |
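The "one of file/file_url/document_id" rule in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of any Talonic SDK: `build_extract_fields` is a hypothetical helper, and it assumes that exactly one document source is expected and that booleans are sent as the form values "true" and "false".

```python
import json
from typing import Optional


def build_extract_fields(
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
    schema: Optional[dict] = None,
    schema_id: Optional[str] = None,
    instructions: Optional[str] = None,
    include_markdown: Optional[bool] = None,
) -> dict:
    """Validate the arguments and return form fields for an extract request.

    Hypothetical helper. For "file", the returned value is the local path;
    attaching the file contents is left to whichever HTTP client is used.
    """
    sources = {"file": file_path, "file_url": file_url, "document_id": document_id}
    given = [name for name, value in sources.items() if value is not None]
    # Assumption: the API expects exactly one document source per request.
    if len(given) != 1:
        raise ValueError(
            "provide exactly one of file, file_url, document_id; got: %s"
            % (", ".join(given) or "none")
        )

    fields = {given[0]: sources[given[0]]}
    if schema is not None:
        # The schema parameter is typed "string (JSON)", so serialize the mapping.
        fields["schema"] = json.dumps(schema)
    if schema_id is not None:
        fields["schema_id"] = schema_id
    if instructions is not None:
        fields["instructions"] = instructions
    if include_markdown is not None:
        # Assumption: booleans travel as "true"/"false" form values.
        fields["include_markdown"] = "true" if include_markdown else "false"
    return fields
```

Calling `build_extract_fields(file_path="invoice.pdf", schema={"total_amount": "number"})` yields a dictionary with a `file` entry and a JSON-encoded `schema` entry, while calling it with no source raises `ValueError` before any network traffic happens.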
Examples
curl -X POST https://api.talonic.com/v1/extract \
-H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
-F "file=@invoice.pdf" \
-F 'schema={"vendor_name":"string","invoice_number":"string","total_amount":"number","due_date":"date"}'
Response
For small documents, the API returns 200 OK with the extraction result inline. For larger documents, it returns 202 Accepted with a job ID; poll the jobs endpoint or use webhooks to be notified when the job finishes.
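Because the same call can come back either synchronously or asynchronously, client code usually branches on the status code. The sketch below shows one way to do that in Python. `interpret_extract_response` is a hypothetical helper that takes an already-parsed JSON body; the field names `data`, `rows`, and `job_id` follow the sample responses on this page, and how the job is polled afterwards is left to the jobs endpoint documentation.

```python
from typing import Any, Dict, Tuple


def interpret_extract_response(status_code: int, body: Dict[str, Any]) -> Tuple[str, Any]:
    """Classify an extract response as finished or still running.

    Returns ("completed", rows) for a 200 response and ("pending", job_id)
    for a 202 response. Any other status raises RuntimeError so the caller
    can route it to its error handling.
    """
    if status_code == 200:
        # Synchronous result: the extracted rows are inline.
        return "completed", body["data"]["rows"]
    if status_code == 202:
        # Asynchronous result: keep the job ID for polling or webhook matching.
        return "pending", body["job_id"]
    raise RuntimeError(f"unexpected status code {status_code}")
```

Returning a small tagged tuple keeps the calling code explicit about which path it is on, rather than probing the body for whichever keys happen to be present.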
# 200: synchronous response
{
  "extraction_id": "ext_abc123",
  "document_id": "doc_xyz789",
  "schema_id": null,
  "status": "completed",
  "data": {
    "rows": [
      {
        "vendor_name": "Acme Corp",
        "invoice_number": "INV-2026-0042",
        "total_amount": 1250.00,
        "due_date": "2026-05-15"
      }
    ]
  }
}
# 202: async response
{
  "job_id": "job_def456",
  "status": "processing",
  "document_id": "doc_xyz789"
}
Pipeline processing
The extract endpoint triggers the four-phase pipeline. The Resolve phase classifies the document using the 529-type ontology. The Agent phase extracts values. The Validate phase applies the confidence gate. The Re-read phase cross-checks flagged values. Results include per-cell provenance linking each value to its source region.
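To make the idea of a confidence gate concrete, here is a purely illustrative Python sketch of how values might be split into accepted and flagged sets before a re-read. It is not Talonic's implementation: the `Cell` type, the `confidence_gate` function, the 0 to 1 confidence range, and the 0.8 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Cell:
    """One extracted value (hypothetical shape)."""
    field: str
    value: Any
    confidence: float  # assumed to lie between 0 and 1


def confidence_gate(cells: List[Cell], threshold: float = 0.8) -> Tuple[List[Cell], List[Cell]]:
    """Split cells into (accepted, flagged) around an assumed threshold.

    Cells at or above the threshold pass through; the rest are flagged,
    which in the pipeline described above would send them to a re-read.
    """
    accepted = [cell for cell in cells if cell.confidence >= threshold]
    flagged = [cell for cell in cells if cell.confidence < threshold]
    return accepted, flagged
```

The gate only partitions the values; what happens to the flagged ones (a second extraction pass, a cross-check, or manual review) is a separate decision.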
Error handling
See the error reference for all error codes. Common errors include file_too_large (exceeds plan limits), unsupported_format, and invalid_schema.
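A client can turn the common error codes into actionable messages with a small lookup. The Python sketch below covers only the three codes named above. `describe_error` is a hypothetical helper that takes the code as a plain string, since the shape of the error response body is not shown here.

```python
# Hints for the error codes named in this section; wording is illustrative.
ERROR_HINTS = {
    "file_too_large": "The file exceeds the plan limits; send a smaller file or review the plan.",
    "unsupported_format": "The document format is not supported; use a PDF, DOCX, image, or spreadsheet.",
    "invalid_schema": "The schema was rejected; check it against the schema formats.",
}


def describe_error(code: str) -> str:
    """Return a human-readable hint for an error code, with a generic fallback."""
    return ERROR_HINTS.get(
        code, f"Unrecognized error code {code!r}; see the error reference."
    )
```

Keeping the fallback generic means that codes not listed here still produce a usable message that points back to the error reference.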