Per-Cell Provenance

Every extracted value in Talonic carries per-cell provenance: a record that links the value to its exact source region in the original document, together with a confidence score and a reasoning trace.

Provenance structure

Each cell in the extraction output includes a provenance object with the source page number, bounding box coordinates, the raw text extracted from that region, a confidence score between 0 and 1, and a reasoning trace explaining why the pipeline chose that value.

# Provenance object in extraction data response
{
  "field": "total_amount",
  "value": 1250.00,
  "provenance": {
    "page": 1,
    "bbox": [320, 450, 420, 470],
    "raw_text": "Total: EUR 1,250.00",
    "confidence": 0.97,
    "reasoning": "Matched 'Total' label adjacent to currency value"
  }
}
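
A consumer may want to load this object into typed structures before using it. The following is a minimal Python sketch that relies only on the field names shown in the sample above. The class names `Provenance` and `Cell` and the helper `parse_cell` are hypothetical, and reading `bbox` as `[x0, y0, x1, y1]` is an assumption, since the coordinate order and units are not stated here.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    page: int
    bbox: tuple[float, ...]  # assumed order: x0, y0, x1, y1 (not confirmed)
    raw_text: str
    confidence: float
    reasoning: str


@dataclass(frozen=True)
class Cell:
    field: str
    value: object
    provenance: Provenance


def parse_cell(payload: str) -> Cell:
    """Parse one extracted cell from JSON text and validate basic invariants."""
    data = json.loads(payload)
    prov = data["provenance"]

    confidence = float(prov["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")

    bbox = tuple(float(c) for c in prov["bbox"])
    if len(bbox) != 4:
        raise ValueError("bbox must have four coordinates")

    return Cell(
        field=data["field"],
        value=data["value"],
        provenance=Provenance(
            page=int(prov["page"]),
            bbox=bbox,
            raw_text=prov["raw_text"],
            confidence=confidence,
            reasoning=prov["reasoning"],
        ),
    )


# The sample object from above, without the leading caption line.
EXAMPLE = """
{
  "field": "total_amount",
  "value": 1250.00,
  "provenance": {
    "page": 1,
    "bbox": [320, 450, 420, 470],
    "raw_text": "Total: EUR 1,250.00",
    "confidence": 0.97,
    "reasoning": "Matched 'Total' label adjacent to currency value"
  }
}
"""

cell = parse_cell(EXAMPLE)
```

Validating the confidence range and the bounding box length at parse time keeps malformed records from reaching downstream audit code.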

Confidence scores

Confidence scores range from 0.0 to 1.0. The confidence gate threshold (configurable per schema) determines whether a value is accepted or flagged for review. Values below the gate appear in the output with a flagged: true marker. See the extraction data endpoint for the full response format.
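
The gate rule can be illustrated with a short Python sketch. It is not Talonic's implementation: the function name `apply_confidence_gate`, the placement of `flagged` at the top level of each cell, the treatment of a value exactly at the threshold as accepted, and the `due_date` sample cell are all assumptions made for illustration.

```python
def apply_confidence_gate(cells, threshold):
    """Return copies of the cells, adding flagged=True where confidence < threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")

    gated = []
    for cell in cells:
        out = dict(cell)  # shallow copy so the input cell is not mutated
        if cell["provenance"]["confidence"] < threshold:
            out["flagged"] = True  # assumed location of the marker
        gated.append(out)
    return gated


# Illustrative data only; the second cell is hypothetical.
cells = [
    {"field": "total_amount", "value": 1250.00, "provenance": {"confidence": 0.97}},
    {"field": "due_date", "value": "2024-03-01", "provenance": {"confidence": 0.62}},
]

gated = apply_confidence_gate(cells, threshold=0.8)
review_queue = [c["field"] for c in gated if c.get("flagged")]
```

A client would normally read the `flagged` marker from the response rather than recompute it. The sketch only shows how a per-schema threshold separates accepted values from those sent for review.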

Reasoning traces

The reasoning trace is a human-readable explanation of the extraction logic. It records which phase of the four-phase pipeline produced the value, which Field Registry entry was matched, and any schema constraints that influenced the result.

Audit and compliance

Per-cell provenance supports DIN SPEC 91491 compliance requirements for traceable AI-assisted document processing. Every value can be traced back to its source region, enabling audit trails for regulated industries. Use the markdown endpoint to retrieve the source document with provenance annotations.

Correcting values

When a human corrects an extracted value via the corrections endpoint, the provenance is updated to record both the original extraction and the human override. This maintains a complete audit trail of how each value was produced and modified.
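
One way to picture the resulting record is the Python sketch below. The shape of the updated provenance is not documented here, so the keys `original` and `override`, the fields `corrected_by` and `corrected_at`, and the function `record_override` are hypothetical. The sketch only demonstrates the principle that the original extraction is preserved alongside the human override.

```python
from copy import deepcopy
from datetime import datetime, timezone


def record_override(cell, new_value, corrected_by):
    """Return a copy of the cell with the human override recorded in provenance."""
    updated = deepcopy(cell)
    prov = updated["provenance"]

    # Keep what the pipeline originally produced (hypothetical key names).
    prov["original"] = {
        "value": cell["value"],
        "confidence": cell["provenance"]["confidence"],
    }
    # Record who changed the value and when (hypothetical key names).
    prov["override"] = {
        "value": new_value,
        "corrected_by": corrected_by,
        "corrected_at": datetime.now(timezone.utc).isoformat(),
    }
    updated["value"] = new_value
    return updated


cell = {
    "field": "total_amount",
    "value": 1250.00,
    "provenance": {"page": 1, "confidence": 0.97},
}

corrected = record_override(cell, 1520.00, corrected_by="reviewer@example.com")
```

Because the original value and confidence are copied rather than overwritten, an auditor can see both what the pipeline extracted and what the reviewer changed it to.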