What file formats does the Talonic API support?

PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS, XLSM, PNG, JPG, JPEG, GIF, WEBP, TXT, MD, HTML, XML, JSON, EML, CSV, MSG, BMP, and ZIP archives.

How does authentication work?

All API requests require a Bearer token in the Authorization header. API keys carry the tlnc_ prefix and are scoped to a source. Create and manage keys from Settings → API Keys.
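A minimal sketch of building the header described above. The helper name `auth_headers` is hypothetical, and the early prefix check is a client-side convenience rather than documented server behaviour.

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for a Talonic API request."""
    # The tlnc_ prefix comes from the text above; failing fast on other
    # values is an assumption made for this sketch.
    if not api_key.startswith("tlnc_"):
        raise ValueError("Talonic API keys are expected to start with 'tlnc_'")
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the returned dictionary as the request headers in whichever HTTP client you use.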

What schema formats are supported?

Three formats: JSON Schema (full control), simplified fields (recommended), and flat key-type maps (quick prototyping). Supported types: string, number, integer, boolean, date, array, object, enum.
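To illustrate how the flat key-type map relates to JSON Schema, the sketch below converts one into the other. The function name and the mapping table are hypothetical; in particular, representing `date` as a string with `format: date` is an assumption, and `array`, `object`, and `enum` are left out because they need more structure than a flat map carries.

```python
# Scalar type names from the list above, mapped to JSON Schema fragments.
# Assumption: "date" is modelled as a string with format "date".
SCALAR_TYPES = {
    "string": {"type": "string"},
    "number": {"type": "number"},
    "integer": {"type": "integer"},
    "boolean": {"type": "boolean"},
    "date": {"type": "string", "format": "date"},
}


def flat_map_to_json_schema(flat: dict) -> dict:
    """Convert a flat key-type map into an equivalent JSON Schema object."""
    properties = {}
    for name, type_name in flat.items():
        if type_name not in SCALAR_TYPES:
            raise ValueError(f"unsupported scalar type for {name!r}: {type_name!r}")
        # Copy the fragment so callers can modify the result safely.
        properties[name] = dict(SCALAR_TYPES[type_name])
    return {"type": "object", "properties": properties}
```

For example, `flat_map_to_json_schema({"total_amount": "number"})` yields an object schema with a single numeric property.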

What are the rate limits?

Free: 50 extractions/day, 5/min burst, 10 MB max file size. Pro: 2,000 extractions/day, 30/min burst, 50 MB max file size. Enterprise: unlimited, with custom rates.
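The limits above can be checked on the client before sending a request. This is a rough sketch: `PlanLimits`, `check_upload`, and the two limit labels are hypothetical names, only `file_too_large` appears in this documentation, and treating 1 MB as 1024 × 1024 bytes is an assumption. Enterprise is omitted because its rates are custom.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024  # assumption: the docs do not say whether MB means 10^6 or 2^20 bytes


@dataclass(frozen=True)
class PlanLimits:
    per_day: int
    per_minute: int
    max_file_bytes: int


PLAN_LIMITS = {
    "free": PlanLimits(per_day=50, per_minute=5, max_file_bytes=10 * MB),
    "pro": PlanLimits(per_day=2000, per_minute=30, max_file_bytes=50 * MB),
}


def check_upload(plan: str, size_bytes: int, sent_today: int,
                 sent_last_minute: int) -> Optional[str]:
    """Return a reason the request would likely be rejected, or None if it fits."""
    limits = PLAN_LIMITS[plan]
    if size_bytes > limits.max_file_bytes:
        return "file_too_large"
    if sent_today >= limits.per_day:
        return "daily_limit_reached"  # hypothetical label
    if sent_last_minute >= limits.per_minute:
        return "burst_limit_reached"  # hypothetical label
    return None
```

The server remains the source of truth; a check like this only avoids requests that are certain to fail.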

How do webhooks work?

Webhooks deliver POST requests with HMAC-SHA256-signed JSON payloads. Events: extraction.complete, extraction.failed, document.ingested. Failed deliveries are retried with exponential backoff (after 1 min, 5 min, 30 min, and 4 hr).
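A receiver would typically recompute the signature and compare it in constant time. The sketch below assumes the signature is a hex-encoded HMAC-SHA256 of the raw request body; the encoding, the signed content, and the header that carries the signature are not specified here, so treat all three as assumptions. The function names are hypothetical.

```python
import hashlib
import hmac

# Retry delays from the text above, in seconds: 1 min, 5 min, 30 min, 4 hr.
RETRY_DELAYS_SECONDS = [60, 300, 1800, 14400]


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 digest of the raw body (assumed encoding)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Return True if the received signature matches the recomputed one."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Verify against the exact bytes received, before any JSON parsing, since re-serialising the payload can change the digest.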

Extract a Document

The extract endpoint is the primary entry point for the Talonic API. Send any document and receive schema-validated structured data with per-cell provenance and confidence scores.

POST /v1/extract

Request

Send a multipart/form-data request with the document file and an optional schema. See authentication for header requirements and schema formats for all schema options.

Parameters

Parameter	Type	Required	Description
file	file	One of file/file_url/document_id	The document file to extract (PDF, DOCX, image, spreadsheet)
file_url	string	One of file/file_url/document_id	URL of a publicly accessible document to extract
document_id	string	One of file/file_url/document_id	ID of an existing document to re-extract
schema	string (JSON)	No	Inline JSON schema mapping field names to types
schema_id	string	No	ID of a saved schema to use for extraction
instructions	string	No	Natural language instructions to guide extraction
include_markdown	boolean	No	Include markdown representation in the response
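The table above requires one of file, file_url, or document_id. A small client-side check can catch mistakes before the request is sent. This sketch reads "one of" as "exactly one", which is an assumption, and the function name `prepare_extract_fields` is hypothetical.

```python
import json

SOURCE_PARAMS = ("file", "file_url", "document_id")


def prepare_extract_fields(params: dict) -> dict:
    """Validate the document source and serialise an inline schema."""
    sources = [name for name in SOURCE_PARAMS if params.get(name) is not None]
    if len(sources) != 1:
        raise ValueError(
            f"expected exactly one of {SOURCE_PARAMS}, got {sources or 'none'}"
        )
    fields = dict(params)
    # The schema parameter is typed "string (JSON)", so dicts are serialised.
    if isinstance(fields.get("schema"), dict):
        fields["schema"] = json.dumps(fields["schema"])
    return fields
```

The returned dictionary can then be sent as the form fields of the request.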

Examples

curl -X POST https://api.talonic.com/v1/extract \
  -H "Authorization: Bearer tlnc_sk_live_7f3a...x9k2" \
  -F "file=@invoice.pdf" \
  -F 'schema={"vendor_name":"string","invoice_number":"string","total_amount":"number","due_date":"date"}'

Response

For small documents, the API returns 200 OK with the extraction result inline. For larger documents, it returns 202 Accepted with a job ID; poll the jobs endpoint or use webhooks to be notified when the job finishes.

# 200 — Synchronous response
{
  "extraction_id": "ext_abc123",
  "document_id": "doc_xyz789",
  "schema_id": null,
  "status": "completed",
  "data": {
    "rows": [
      {
        "vendor_name": "Acme Corp",
        "invoice_number": "INV-2026-0042",
        "total_amount": 1250.00,
        "due_date": "2026-05-15"
      }
    ]
  }
}

# 202 — Async response
{
  "job_id": "job_def456",
  "status": "processing",
  "document_id": "doc_xyz789"
}
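Client code has to branch on the two response shapes shown above. The following sketch does that using only the fields that appear in the sample responses; the function name and the returned tuple layout are hypothetical, and polling is left out because the jobs endpoint is not described in this section.

```python
def interpret_extract_response(status_code: int, body: dict) -> tuple:
    """Return ("rows", rows) for a 200 response or ("job", job_id) for a 202."""
    if status_code == 200:
        # Synchronous result: extracted rows are nested under data.rows.
        return ("rows", body["data"]["rows"])
    if status_code == 202:
        # Asynchronous result: keep the job ID for polling or webhook matching.
        return ("job", body["job_id"])
    raise RuntimeError(f"unexpected status code: {status_code}")
```

A caller would hand the job ID to its polling or webhook logic and process rows immediately in the synchronous case.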

Pipeline processing

The extract endpoint triggers the four-phase pipeline. The Resolve phase classifies the document using the 529-type ontology. The Agent phase extracts values. The Validate phase applies the confidence gate. The Re-read phase cross-checks flagged values. Results include per-cell provenance linking each value to its source region.

Error handling

See the error reference for all error codes. Common errors include file_too_large (exceeds plan limits), unsupported_format, and invalid_schema.