
DIN SPEC 91491:2026-05

Your AI stack is only as good as its schema layer.

DIN SPEC 91491 is the first standard for transforming PDFs, spreadsheets, APIs, and legacy data into structured, traceable, reusable schemas — using LLMs and prompt engineering. No more one-off ETL. No more brittle mappings. One governed schema layer for enterprise AI.

Initiated by Talonic. Published by DIN. Built with Fraunhofer IIS, GIIC, Humboldt-Innovation, DIN Solutions, and partners.

THE MESS

Enterprise AI fails when schemas are unmanaged.

Every team builds their own extraction pipeline. Every pipeline invents its own schema. Every schema drifts independently. The result: a graveyard of one-off data structures that cannot be audited, versioned, reused, or trusted.

73% of enterprise AI projects stall at data integration

5–12 hours per schema designed manually

0 standards existed for AI-generated schemas before this

Every one of these schemas was someone's production infrastructure. None of them talk to each other.

THE STANDARD

Five rules for making AI-generated schemas trustworthy.

DIN SPEC 91491 defines a complete framework — from data ingestion through prompt-driven schema generation to governed harmonization and delivery. These are the principles that make it work.

1

Let AI read the data, not a template

LLMs understand semantic context. They extract entities and relationships without rigid pre-definition, eliminating custom ETL per data source.

2

Control schema generation through prompts

Versioned, domain-specific prompt templates guide the model. Every schema can be traced back to the exact prompt and model that produced it.
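
One way to picture this traceability is a small provenance record stored next to each generated schema. The sketch below is illustrative only: the record fields, the template name, and the model identifier are assumptions made for this example, not names taken from DIN SPEC 91491.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptTemplate:
    name: str      # hypothetical template name
    version: str   # version of the template text
    text: str      # the prompt text sent to the model


def fingerprint(payload: str) -> str:
    """Short, stable SHA-256 fingerprint of a string."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def provenance_record(template: PromptTemplate, model_id: str, schema: dict) -> dict:
    """Link a generated schema to the prompt and model that produced it."""
    canonical_schema = json.dumps(schema, sort_keys=True)  # key order must not affect the hash
    return {
        "prompt_name": template.name,
        "prompt_version": template.version,
        "prompt_hash": fingerprint(template.text),
        "model_id": model_id,
        "schema_hash": fingerprint(canonical_schema),
    }


# Hypothetical values, for illustration only.
template = PromptTemplate("invoice-extraction", "1.3.0", "Extract invoice fields as JSON Schema.")
schema = {"type": "object", "properties": {"invoice_number": {"type": "string"}}}
record = provenance_record(template, "example-model-2026-01", schema)
print(record)
```

Because the hashes depend only on the prompt text and the canonicalized schema, re-running the same inputs yields the same record, and any edit to the prompt shows up as a different `prompt_hash`.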

3

Use a universal intermediate schema

A neutral representation captures structure and meaning before mapping to target formats. Decouple input from output. Enable schema portability.

4

Classify every change before accepting it

Schema changes are categorized as safe, moderate, or critical. Safe changes auto-accept. Critical changes require human review. Drift is governed, not ignored.

5

Validate with synthetic data, verify with real data

Synthetic datasets test edge cases without exposing sensitive information. Real-world data confirms production behavior. Both are mandatory.
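
As a rough illustration of the synthetic half of this rule, the sketch below generates records from a schema, deliberately including boundary values and missing optional fields, and checks them with a minimal validator. The schema, the value choices, and both helper functions are assumptions made for this example; a real deployment would more likely rely on a full JSON Schema validator.

```python
import random

# Hypothetical schema, reduced to three scalar fields for the example.
SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "quantity": {"type": "integer"},
        "unit_price": {"type": "number"},
    },
    "required": ["invoice_number", "quantity"],
}

PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float)}


def synthesize(schema: dict, rng: random.Random) -> dict:
    """Generate one synthetic record; optional fields are sometimes omitted."""
    record = {}
    for name, spec in schema["properties"].items():
        required = name in schema.get("required", [])
        if not required and rng.random() < 0.5:
            continue  # exercise the "optional field missing" edge case
        if spec["type"] == "string":
            record[name] = f"SYN-{rng.randint(0, 99999):05d}"
        elif spec["type"] == "integer":
            record[name] = rng.choice([0, 1, 10**9])  # boundary-style values
        elif spec["type"] == "number":
            record[name] = rng.choice([0.0, 0.01, 1e12])
    return record


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
        elif isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
            errors.append(f"wrong type for {name}")
    return errors


rng = random.Random(42)  # fixed seed keeps the synthetic set reproducible
synthetic = [synthesize(SCHEMA, rng) for _ in range(100)]
failures = [r for r in synthetic if validate(r, SCHEMA)]
print(f"{len(failures)} of {len(synthetic)} synthetic records failed validation")
```

The same `validate` call can then be pointed at real records, which is where production-only surprises would surface.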

THE MACHINE

Nine components. Each auditable. Each replaceable.


Ingest
Extract
Prompt
Generate
Classify
Harmonize
Store
Map
Export
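
To make "auditable" and "replaceable" concrete, the nine components can be pictured as interchangeable functions run in a fixed order, with a hash of each stage's output written to an audit trail. This is a minimal sketch under that assumption; the function signatures, the placeholder stages, and the audit format are invented for illustration and are not defined by the standard.

```python
import hashlib
import json
from typing import Callable

Stage = Callable[[dict], dict]

# Component names as listed above, in pipeline order.
STAGE_NAMES = ["ingest", "extract", "prompt", "generate", "classify",
               "harmonize", "store", "map", "export"]


def passthrough(name: str) -> Stage:
    """Placeholder stage that only records that it ran."""
    def run(artifact: dict) -> dict:
        return {**artifact, "stages": artifact.get("stages", []) + [name]}
    return run


def run_pipeline(stages: dict[str, Stage], artifact: dict) -> tuple[dict, list[dict]]:
    """Run every stage in order and log a hash of each stage's output."""
    audit = []
    for name in STAGE_NAMES:
        artifact = stages[name](artifact)
        payload = json.dumps(artifact, sort_keys=True).encode("utf-8")
        audit.append({"stage": name, "output_sha256": hashlib.sha256(payload).hexdigest()})
    return artifact, audit


def custom_extract(artifact: dict) -> dict:
    """Hypothetical replacement for the extract stage."""
    return {**artifact,
            "stages": artifact.get("stages", []) + ["extract"],
            "entities": ["invoice"]}


stages = {name: passthrough(name) for name in STAGE_NAMES}
stages["extract"] = custom_extract  # swap one component, leave the rest untouched

result, audit = run_pipeline(stages, {"source": "sample.pdf"})
for entry in audit:
    print(entry["stage"], entry["output_sha256"][:12])
```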

THE LAB

Run a document through the standard.

Pick an input. Watch it get ingested, parsed, prompted, schema-generated, and harmonized — with real JSON artifacts at every stage.

Select Input (Clause 7.2)

Choose a sample document to process through the DIN SPEC 91491 pipeline. Each format demonstrates different parsing and schema generation paths.


Each step is modular, traceable, and auditable. That's not a feature. It's a requirement.


SCHEMA EVOLUTION

Can your schema survive production?

Every change is classified by impact. Safe changes auto-accept. Critical changes halt the pipeline and require human approval. This is how you prevent schema drift at scale.

Classification | Examples | System Action
Safe | Add optional field, reorder fields, metadata change | Auto-accept
Moderate | Type coercion (int → float), field renaming, optional → required | Configurable
Critical | Remove field, restructure nesting, merge/split entities | Manual review required
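
The classification in the table above can be approximated with a plain diff between two schema versions. The sketch below handles flat schemas only, and it makes two assumptions the table does not settle: a newly added required field is treated like optional → required (moderate), and any type change other than int → float is treated as critical. Renames are not detected, because in a bare diff they look like a removal plus an addition.

```python
# Severity order and the system action for each class, as listed in the table.
SEVERITY = {"safe": 0, "moderate": 1, "critical": 2}
ACTIONS = {
    "safe": "auto-accept",
    "moderate": "configurable",
    "critical": "manual review required",
}


def classify_changes(old: dict, new: dict) -> list[tuple[str, str]]:
    """Compare two flat JSON-Schema-style objects and label each difference."""
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))
    changes = []

    for name in sorted(old_props.keys() - new_props.keys()):
        changes.append(("critical", f"field removed: {name}"))

    for name in sorted(new_props.keys() - old_props.keys()):
        # Assumption: a new required field is handled like optional -> required.
        level = "moderate" if name in new_req else "safe"
        changes.append((level, f"field added: {name}"))

    for name in sorted(old_props.keys() & new_props.keys()):
        old_type, new_type = old_props[name].get("type"), new_props[name].get("type")
        if old_type != new_type:
            widened = (old_type, new_type) == ("integer", "number")
            # Assumption: any type change other than int -> float is critical.
            level = "moderate" if widened else "critical"
            changes.append((level, f"type changed: {name} {old_type} -> {new_type}"))
        if name in new_req and name not in old_req:
            changes.append(("moderate", f"optional -> required: {name}"))

    return changes


def overall_level(changes: list[tuple[str, str]]) -> str:
    """The strictest class among all changes; no changes counts as safe."""
    return max((level for level, _ in changes), key=SEVERITY.__getitem__, default="safe")


# Hypothetical schema versions, for illustration only.
old_schema = {
    "properties": {"invoice_number": {"type": "string"}, "quantity": {"type": "integer"}},
    "required": ["invoice_number"],
}
new_schema = {
    "properties": {
        "invoice_number": {"type": "string"},
        "quantity": {"type": "number"},
        "note": {"type": "string"},
    },
    "required": ["invoice_number"],
}

changes = classify_changes(old_schema, new_schema)
level = overall_level(changes)
print(changes)
print(level, "->", ACTIONS[level])
```

Taking the strictest class across all changes means a single critical edit is enough to route the whole schema version to manual review.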

One schema. Every target format.

{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "order_date": { "type": "string", "format": "date" },
    "customer_id": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product": { "type": "string" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "number" }
        }
      }
    }
  },
  "required": ["invoice_number", "order_date", "line_items"]
}
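
As one example of a target format, the schema above could be translated into SQL table definitions. The sketch below is a simplified assumption of how such a mapping might look: the type table, the table naming, and the decision to turn an array of objects into a child table are choices made for this example, and the child table omits the foreign key a real design would need.

```python
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "order_date": {"type": "string", "format": "date"},
        "customer_id": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "product": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "order_date", "line_items"],
}

# Assumed type mapping; real targets would need a richer table.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "REAL"}


def column_type(spec: dict) -> str:
    """Pick a SQL column type for a scalar property."""
    if spec.get("format") == "date":
        return "DATE"
    return SQL_TYPES[spec["type"]]


def to_sql(table: str, schema: dict) -> list[str]:
    """Emit CREATE TABLE statements; arrays of objects become child tables."""
    required = set(schema.get("required", []))
    columns, children = [], []
    for name, spec in schema["properties"].items():
        if spec["type"] == "array" and spec["items"]["type"] == "object":
            children.extend(to_sql(f"{table}_{name}", spec["items"]))
            continue
        not_null = " NOT NULL" if name in required else ""
        columns.append(f"  {name} {column_type(spec)}{not_null}")
    ddl = f"CREATE TABLE {table} (\n" + ",\n".join(columns) + "\n);"
    return [ddl] + children


statements = to_sql("invoice", INVOICE_SCHEMA)
print("\n\n".join(statements))
```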

Before: unmanaged

  • invoice_v3_final.json — unknown author, no version history
  • Undocumented field renames break downstream consumers
  • No audit trail from source document to output schema
  • Type changes discovered in production, not in review

After: DIN SPEC 91491

  • invoice v2.4 — harmonized, versioned, traceable to prompt-98aa3f
  • Alias resolution: client_ref → customer_id, logged and reversible
  • Full lineage: document → prompt → schema → harmonization → export
  • Type coercion int → float classified as moderate, auto-accepted by policy
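
The "logged and reversible" alias resolution mentioned above can be sketched as a rename step that records every substitution it makes. Only the client_ref → customer_id pair comes from the example; the function names, the log format, and the sample record are assumptions made for illustration.

```python
# Alias table: the single entry is taken from the example above.
ALIASES = {"client_ref": "customer_id"}


def resolve_aliases(record: dict, aliases: dict) -> tuple[dict, list[dict]]:
    """Rename aliased keys to their canonical names and log each rename."""
    resolved, log = {}, []
    for key, value in record.items():
        canonical = aliases.get(key, key)
        if canonical in resolved:
            raise ValueError(f"alias collision on {canonical!r}")
        if canonical != key:
            log.append({"from": key, "to": canonical})
        resolved[canonical] = value
    return resolved, log


def revert_aliases(record: dict, log: list[dict]) -> dict:
    """Undo the renames recorded in the log."""
    back = {entry["to"]: entry["from"] for entry in log}
    return {back.get(key, key): value for key, value in record.items()}


# Hypothetical input record.
original = {"invoice_number": "INV-1", "client_ref": "C-17"}
resolved, log = resolve_aliases(original, ALIASES)
print(resolved)
print(log)
print(revert_aliases(resolved, log) == original)
```

Raising on a collision, rather than silently overwriting, keeps the step reversible: two source fields can never collapse into one canonical field without someone noticing.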

READINESS CHECK

How schema-ready is your organization?

Eight questions. Two minutes. Find out where you stand — and where DIN SPEC 91491 would make the biggest difference.

WHERE IT APPLIES

From invoices to sensor streams.

The standard defines six reference implementations. Each one follows the same pipeline — ingest, extract, prompt, generate, harmonize, deliver.

Financial data consolidation

Clause 9.1

Input

Complex Excel workbooks with nested sheets, merged headers, cross-sheet references

Processing

Input Parser reads all sheets preserving layout. Feature Extractor detects hierarchical headers and merged cells. Prompt Engine selects a financial report template. The LLM infers nested JSON objects from multi-level headers and suggests relational table splits.
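
The step from multi-level headers to nested JSON can be illustrated without an LLM. The sketch below assumes the header rows have already been read into lists, with merged cells showing up as blanks to the right of the first cell, which is how many spreadsheet readers report them; the sample headers and figures are invented for the example.

```python
def fill_merged(header_row: list) -> list:
    """Forward-fill blanks left by merged cells in a header row."""
    filled, last = [], None
    for cell in header_row:
        if cell not in (None, ""):
            last = cell
        filled.append(last)
    return filled


def nest_row(header_paths: list[tuple], values: list) -> dict:
    """Fold one data row into a nested dict, one nesting level per header level."""
    nested: dict = {}
    for path, value in zip(header_paths, values):
        node = nested
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return nested


# Hypothetical two-level header: "Revenue" and "Costs" each span two columns.
top_header = ["Revenue", None, "Costs", None]
sub_header = ["Q1", "Q2", "Q1", "Q2"]
header_paths = list(zip(fill_merged(top_header), sub_header))

row = [100, 120, 80, 90]
nested = nest_row(header_paths, row)
print(nested)
```

In the use case described above, the model would infer this structure rather than follow a fixed rule, but the shape of the output is the same idea: one JSON object per top-level header.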

Output

Machine-readable schema capturing all fields and relationships from the workbook, aligned with organizational standards.

Benefits

  • Increased automation: seconds instead of days of manual modeling
  • Reduced manual modeling effort for complex spreadsheets
  • Improved interoperability with databases, APIs, and reporting tools

WHO BUILT THIS

Talonic initiated DIN SPEC 91491.

Talonic co-authored this standard with Fraunhofer IIS, DIN Solutions, GIIC, Humboldt-Innovation, and four other consortium members. The Talonic platform is a production implementation of the framework described in this document — from AI-generated schemas to harmonization, versioning, and delivery.

DIN SPEC 91491 Implementation Kit

Everything a Head of Data, CTO, or AI lead needs to evaluate and adopt schema governance.

  • AI-ready data readiness checklist
  • Schema lifecycle maturity scorecard
  • Reference architecture diagram
  • Prompt and schema governance template
  • RFP questions for schema-layer vendors
  • Sample schema evolution diff
  • Executive one-pager for internal buy-in

Schema Audit

Send us a sample. We'll return a schema read, an accuracy estimate, and a recommendation within 5 business days.


See it in production

The Talonic platform implements DIN SPEC 91491 end-to-end.
