Extract a Document

The extract endpoint is the primary entry point for the Talonic API. Send any document and receive schema-validated structured data with per-cell provenance and confidence scores.

POST /v1/extract

Request

Send a multipart/form-data request that identifies the document (as an uploaded file, a URL, or the ID of an existing document) and, optionally, a schema. See authentication for header requirements and schema formats for all schema options.

Parameters

Parameter | Type | Required | Description
file | file | One of file/file_url/document_id | The document file to extract (PDF, DOCX, image, spreadsheet)
file_url | string | One of file/file_url/document_id | URL of a publicly accessible document to extract
document_id | string | One of file/file_url/document_id | ID of an existing document to re-extract
schema | string (JSON) | No | Inline JSON schema mapping field names to types
schema_id | string | No | ID of a saved schema to use for extraction
instructions | string | No | Natural language instructions to guide extraction
include_markdown | boolean | No | Include markdown representation in the response
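
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Talonic API or any official SDK: the helper name build_extract_fields is hypothetical, it assumes "one of file/file_url/document_id" means exactly one, and it assumes the boolean is sent as the string "true" or "false".

```python
import json
from typing import Any, Optional


def build_extract_fields(
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
    schema: Optional[dict[str, str]] = None,
    schema_id: Optional[str] = None,
    instructions: Optional[str] = None,
    include_markdown: Optional[bool] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate and collect form fields for POST /v1/extract."""
    sources = {"file": file_path, "file_url": file_url, "document_id": document_id}
    given = [name for name, value in sources.items() if value is not None]
    # Assumption: exactly one document source must be supplied.
    if len(given) != 1:
        raise ValueError(
            "provide exactly one of file, file_url, document_id; got: "
            + (", ".join(given) or "none")
        )

    # For "file", the value is a local path; the caller still has to open it
    # and attach it as the multipart file part.
    fields: dict[str, Any] = {given[0]: sources[given[0]]}

    if schema is not None:
        # The schema parameter is a JSON string, so serialize the mapping.
        fields["schema"] = json.dumps(schema)
    if schema_id is not None:
        fields["schema_id"] = schema_id
    if instructions is not None:
        fields["instructions"] = instructions
    if include_markdown is not None:
        # Assumption: booleans are encoded as "true"/"false" form values.
        fields["include_markdown"] = "true" if include_markdown else "false"
    return fields
```

Calling build_extract_fields(file_url="https://example.com/invoice.pdf", schema={"vendor_name": "string"}) returns a dictionary with a file_url entry and a JSON-encoded schema entry, while calling it with no document source raises ValueError.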

Examples

curl -X POST https://api.talonic.com/v1/extract \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -F "file=@invoice.pdf" \
  -F 'schema={"vendor_name":"string","invoice_number":"string","total_amount":"number","due_date":"date"}'

Response

For small documents, the API returns 200 OK with the extraction result inline. For larger documents, it returns 202 Accepted with a job ID; poll the jobs endpoint or use webhooks to be notified when the job finishes.

# 200 — Synchronous response
{
  "extraction_id": "ext_abc123",
  "document_id": "doc_xyz789",
  "schema_id": null,
  "status": "completed",
  "data": {
    "rows": [
      {
        "vendor_name": "Acme Corp",
        "invoice_number": "INV-2026-0042",
        "total_amount": 1250.00,
        "due_date": "2026-05-15"
      }
    ]
  }
}

# 202 — Async response
{
  "job_id": "job_def456",
  "status": "processing",
  "document_id": "doc_xyz789"
}
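
Because the same request can come back as either 200 or 202, client code usually branches on the status code before reading the body. The sketch below shows one way to do that using only the fields that appear in the two sample responses above. ExtractOutcome and interpret_extract_response are hypothetical names, and how a pending job is later polled is left out because the jobs endpoint is documented elsewhere.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractOutcome:
    """Hypothetical container for either a finished result or a pending job."""

    document_id: str
    rows: Optional[list[dict[str, Any]]] = None
    job_id: Optional[str] = None

    @property
    def is_pending(self) -> bool:
        # A job ID is only present in the 202 (async) response.
        return self.job_id is not None


def interpret_extract_response(status_code: int, body: dict[str, Any]) -> ExtractOutcome:
    """Map an HTTP status code and parsed JSON body to an ExtractOutcome."""
    if status_code == 200:
        # Synchronous response: extracted rows are inline under data.rows.
        return ExtractOutcome(
            document_id=body["document_id"],
            rows=body["data"]["rows"],
        )
    if status_code == 202:
        # Async response: only a job ID is available for now.
        return ExtractOutcome(
            document_id=body["document_id"],
            job_id=body["job_id"],
        )
    raise ValueError(f"unexpected status code: {status_code}")
```

A caller can check outcome.is_pending and either read outcome.rows immediately or hand outcome.job_id to whatever polling or webhook handling it uses.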

Pipeline processing

The extract endpoint triggers the four-phase pipeline. The Resolve phase classifies the document using the 529-type ontology. The Agent phase extracts values. The Validate phase applies the confidence gate. The Re-read phase cross-checks flagged values. Results include per-cell provenance linking each value to its source region.

Error handling

See the error reference for all error codes. Common errors include file_too_large (exceeds plan limits), unsupported_format, and invalid_schema.
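
As an illustration, a client might map the common error codes named above to short hints for the person who submitted the document. This is a sketch only: the hint wording is ours, hint_for_error is a hypothetical helper, and the shape of the error response body is not shown on this page, so the function takes an already-extracted code string.

```python
from typing import Optional

# Codes taken from the list above; the hint texts are illustrative only.
ERROR_HINTS = {
    "file_too_large": "The file exceeds the plan limits.",
    "unsupported_format": "Send a PDF, DOCX, image, or spreadsheet.",
    "invalid_schema": "Check the inline schema JSON or the schema_id.",
}


def hint_for_error(code: Optional[str]) -> str:
    """Return a short hint for a known error code, or a generic fallback."""
    fallback = "Unrecognized error code; see the error reference."
    return ERROR_HINTS.get(code or "", fallback)
```

All three of these codes describe problems with the request itself, so resending the same request unchanged is unlikely to succeed.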

Frequently asked questions