DIN SPEC 91491:2026-05
Your AI stack is only as good as its schema layer.
DIN SPEC 91491 is the first standard for transforming PDFs, spreadsheets, APIs, and legacy data into structured, traceable, reusable schemas — using LLMs and prompt engineering. No more one-off ETL. No more brittle mappings. One governed schema layer for enterprise AI.
Initiated by Talonic. Published by DIN. Built with Fraunhofer IIS, GIIC, Humboldt-Innovation, DIN Solutions, and partners.
THE MESS
Enterprise AI fails when schemas are unmanaged.
Every team builds their own extraction pipeline. Every pipeline invents its own schema. Every schema drifts independently. The result: a graveyard of one-off data structures that cannot be audited, versioned, reused, or trusted.
73%
of enterprise AI projects stall at data integration
5–12
hours per schema designed manually
0
standards existed for AI-generated schemas before this
Every one of these schemas was someone's production infrastructure. None of them talk to each other.
THE STANDARD
Five rules for making AI-generated schemas trustworthy.
DIN SPEC 91491 defines a complete framework — from data ingestion through prompt-driven schema generation to governed harmonization and delivery. These are the principles that make it work.
Let AI read the data, not a template
LLMs understand semantic context. They extract entities and relationships without rigid pre-definition, eliminating custom ETL per data source.
Control schema generation through prompts
Versioned, domain-specific prompt templates guide the model. Every schema can be traced back to the exact prompt and model that produced it.
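One way to picture this traceability is a small provenance record stored next to each generated schema. The sketch below is illustrative only: the class names, the fields, and the `prompt-` plus six hex characters ID format are assumptions, not structures defined by DIN SPEC 91491.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PromptTemplate:
    """A versioned, domain-specific prompt template (hypothetical structure)."""
    domain: str
    version: str
    text: str

    def fingerprint(self) -> str:
        # Hash domain, version and text together so any edit yields a new ID.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return "prompt-" + hashlib.sha256(payload).hexdigest()[:6]


@dataclass(frozen=True)
class SchemaProvenance:
    """Links a generated schema back to the prompt and model that produced it."""
    schema_name: str
    schema_version: str
    prompt_id: str
    model: str


def record_provenance(schema_name, schema_version, template, model):
    return SchemaProvenance(schema_name, schema_version, template.fingerprint(), model)


# Example values are made up for illustration.
template = PromptTemplate("finance", "1.2.0", "Extract invoice fields as JSON Schema.")
prov = record_provenance("invoice", "2.4", template, "example-llm-2026")
print(asdict(prov))
```

Because the ID is derived from the template content, two schemas with the same `prompt_id` were produced from byte-identical prompts, and any edit to the template is visible as a new ID.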
Use a universal intermediate schema
A neutral representation captures structure and meaning before mapping to target formats. Decouple input from output. Enable schema portability.
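As a rough illustration of decoupling input from output, the sketch below keeps a neutral field list and renders it into two target formats. The `Field` class, the neutral type names, and both type tables are assumptions made for this example; the standard's actual intermediate schema may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str              # neutral type: "string", "integer", "number", "date"
    required: bool = False


# Hypothetical type tables, one per target format.
JSON_TYPES = {
    "string": {"type": "string"},
    "integer": {"type": "integer"},
    "number": {"type": "number"},
    "date": {"type": "string", "format": "date"},
}
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "number": "NUMERIC", "date": "DATE"}


def to_json_schema(fields):
    """Render the neutral field list as a flat JSON Schema object."""
    return {
        "type": "object",
        "properties": {f.name: dict(JSON_TYPES[f.kind]) for f in fields},
        "required": [f.name for f in fields if f.required],
    }


def to_sql_ddl(table, fields):
    """Render the same field list as a CREATE TABLE statement."""
    cols = [
        f"{f.name} {SQL_TYPES[f.kind]}" + (" NOT NULL" if f.required else "")
        for f in fields
    ]
    return f"CREATE TABLE {table} (" + ", ".join(cols) + ");"


invoice = [
    Field("invoice_number", "string", required=True),
    Field("order_date", "date", required=True),
    Field("customer_id", "string"),
]
print(to_json_schema(invoice))
print(to_sql_ddl("invoice", invoice))
```

Adding a third target format would mean adding one more type table and renderer, without touching how the fields were extracted.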
Classify every change before accepting it
Schema changes are categorized as safe, moderate, or critical. Safe changes auto-accept. Critical changes require human review. Drift is governed, not ignored.
Validate with synthetic data, verify with real data
Synthetic datasets test edge cases without exposing sensitive information. Real-world data confirms production behavior. Both are mandatory.
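The synthetic half of this rule can be sketched with nothing but the standard library. The example below generates fake records with boundary values and checks them against a flat schema. The validator is deliberately minimal and hand-written for illustration; it is an assumption of this sketch, not a substitute for a full JSON Schema validator, and the field names are made up.

```python
import random

SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "quantity": {"type": "integer"},
        "unit_price": {"type": "number"},
    },
    "required": ["invoice_number", "quantity"],
}

PY_TYPES = {"string": str, "integer": int, "number": (int, float)}


def validate(record, schema):
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for name in schema["required"]:
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # unknown fields are tolerated in this sketch
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


def synthetic_record(schema, rng, drop_optional=False):
    """Build one fake record; no real customer data is involved."""
    makers = {
        "string": lambda: "SYN-" + str(rng.randint(1000, 9999)),
        "integer": lambda: rng.choice([0, 1, 10**9]),      # boundary values
        "number": lambda: rng.choice([0.0, 0.01, 1e12]),
    }
    record = {}
    for name, spec in schema["properties"].items():
        if drop_optional and name not in schema["required"]:
            continue
        record[name] = makers[spec["type"]]()
    return record


rng = random.Random(0)  # fixed seed keeps test runs reproducible
good = synthetic_record(SCHEMA, rng)
bad = dict(good, quantity="ten")  # deliberately corrupted edge case
print(validate(good, SCHEMA), validate(bad, SCHEMA))
```

The same `validate` function would then be run over a sample of real records to confirm production behavior.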
THE MACHINE
Nine components. Each auditable. Each replaceable.
THE LAB
Run a document through the standard.
Pick an input. Watch it get ingested, parsed, prompted, schema-generated, and harmonized — with real JSON artifacts at every stage.
Select Input (Clause 7.2)
Choose a sample document to process through the DIN SPEC 91491 pipeline. Each format demonstrates different parsing and schema generation paths.
Each step is modular, traceable, and auditable. That's not a feature. It's a requirement.
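A minimal sketch of what "modular, traceable, and auditable" could mean in code: each stage is a plain function, and a runner records a hash of every stage's input and output. The stage names and their placeholder bodies are assumptions for illustration; real stages would parse files, call an LLM, and so on.

```python
import hashlib
import json


def digest(obj):
    """Short, stable hash of any JSON-serializable artifact."""
    raw = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]


def run_pipeline(document, stages):
    """Run stages in order and record what went in and out of each one."""
    trail = []
    artifact = document
    for name, stage in stages:
        result = stage(artifact)
        trail.append({"stage": name, "input": digest(artifact), "output": digest(result)})
        artifact = result
    return artifact, trail


# Placeholder stages (hypothetical); swap any one out without touching the others.
stages = [
    ("ingest", lambda doc: {"text": doc["raw"].strip()}),
    ("parse", lambda a: {"tokens": a["text"].split()}),
    ("generate", lambda a: {"properties": {t: {"type": "string"} for t in a["tokens"]}}),
]

schema, trail = run_pipeline({"raw": " invoice_number order_date "}, stages)
for entry in trail:
    print(entry)
```

Each stage's output hash equals the next stage's input hash, so a gap or mismatch in the trail shows exactly where an artifact was altered outside the pipeline.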
SCHEMA EVOLUTION
Can your schema survive production?
Every change is classified by impact. Safe changes auto-accept. Critical changes halt the pipeline and require human approval. This is how you prevent schema drift at scale.
| Classification | Examples | System Action |
|---|---|---|
| Safe | Add optional field, reorder fields, metadata change | Auto-accept |
| Moderate | Type coercion (int → float), field renaming, optional → required | Configurable |
| Critical | Remove field, restructure nesting, merge/split entities | Manual review required |
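The table above can be approximated by a small diff function over two flat JSON Schemas. This is a sketch under stated assumptions: it covers only added fields, removed fields, type changes, and optional-to-required changes; it cannot detect renames or restructured nesting; and treating a new required field as moderate and any type change other than integer to number as critical are choices made for this example, not rules quoted from the standard.

```python
SAFE, MODERATE, CRITICAL = "safe", "moderate", "critical"
RANK = {SAFE: 0, MODERATE: 1, CRITICAL: 2}


def classify_changes(old, new):
    """Compare two flat JSON Schemas and label each detected change."""
    changes = []
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))

    for name in sorted(old_props.keys() - new_props.keys()):
        changes.append((CRITICAL, f"removed field {name}"))
    for name in sorted(new_props.keys() - old_props.keys()):
        # Assumption: a new required field is treated like optional -> required.
        level = MODERATE if name in new_req else SAFE
        changes.append((level, f"added field {name}"))
    for name in sorted(old_props.keys() & new_props.keys()):
        before, after = old_props[name].get("type"), new_props[name].get("type")
        if (before, after) == ("integer", "number"):
            changes.append((MODERATE, f"type coercion on {name}: integer -> number"))
        elif before != after:
            # Assumption: any other type change is escalated to critical.
            changes.append((CRITICAL, f"incompatible type change on {name}"))
        if name in new_req and name not in old_req:
            changes.append((MODERATE, f"{name} changed from optional to required"))
    return changes


def overall(changes):
    """The most severe classification decides the system action."""
    return max((level for level, _ in changes), key=RANK.__getitem__, default=SAFE)


v1 = {"properties": {"qty": {"type": "integer"}, "note": {"type": "string"}},
      "required": ["qty"]}
v2 = {"properties": {"qty": {"type": "number"}, "memo": {"type": "string"}},
      "required": ["qty"]}
for level, message in classify_changes(v1, v2):
    print(level, message)
print("overall:", overall(classify_changes(v1, v2)))
```

A policy layer would then map the overall level to an action: auto-accept for safe, a configurable rule for moderate, and a halt pending human review for critical.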
One schema. Every target format.
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "order_date": { "type": "string", "format": "date" },
    "customer_id": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product": { "type": "string" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "number" }
        }
      }
    }
  },
  "required": ["invoice_number", "order_date", "line_items"]
}
Before: unmanaged
- invoice_v3_final.json — unknown author, no version history
- Undocumented field renames break downstream consumers
- No audit trail from source document to output schema
- Type changes discovered in production, not in review
After: DIN SPEC 91491
- invoice v2.4 — harmonized, versioned, traceable to prompt-98aa3f
- Alias resolution: client_ref → customer_id, logged and reversible
- Full lineage: document → prompt → schema → harmonization → export
- Type coercion int → float classified as moderate, auto-accepted by policy
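The "logged and reversible" alias resolution in the list above could be sketched as follows. The function names and the log format are assumptions for illustration, and the sketch does not handle the case where a record already contains both the alias and the canonical key.

```python
def apply_aliases(record, aliases):
    """Rename aliased keys to canonical names and log each rename."""
    resolved, log = {}, []
    for key, value in record.items():
        canonical = aliases.get(key, key)
        if canonical != key:
            log.append({"from": key, "to": canonical})
        resolved[canonical] = value
    return resolved, log


def revert_aliases(record, log):
    """Undo the renames using the log alone."""
    back = {entry["to"]: entry["from"] for entry in log}
    return {back.get(key, key): value for key, value in record.items()}


ALIASES = {"client_ref": "customer_id"}  # alias pair taken from the example above
source = {"invoice_number": "INV-1", "client_ref": "C-17"}

harmonized, log = apply_aliases(source, ALIASES)
print(harmonized)
print(log)
print(revert_aliases(harmonized, log) == source)
```

Keeping the log with the harmonized record is what makes the rename auditable: the original field name can always be recovered without access to the source document.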
READINESS CHECK
How schema-ready is your organization?
Eight questions. Two minutes. Find out where you stand — and where DIN SPEC 91491 would make the biggest difference.
WHERE IT APPLIES
From invoices to sensor streams.
The standard defines six reference implementations. Each one follows the same pipeline — ingest, extract, prompt, generate, harmonize, deliver.
Financial data consolidation
Clause 9.1
Input
Complex Excel workbooks with nested sheets, merged headers, cross-sheet references
Processing
Input Parser reads all sheets preserving layout. Feature Extractor detects hierarchical headers and merged cells. Prompt Engine selects a financial report template. The LLM infers nested JSON objects from multi-level headers and suggests relational table splits.
Output
Machine-readable schema capturing all fields and relationships from the workbook, aligned with organizational standards.
Benefits
- Increased automation: seconds instead of days of manual modeling
- Reduced manual modeling effort for complex spreadsheets
- Improved interoperability with databases, APIs, and reporting tools
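To make the Processing step above more concrete, here is a deterministic sketch of how multi-level headers can map to nested JSON objects. In the standard's pipeline this inference is performed by the LLM; the function below is a hand-written stand-in for illustration, and its rule that a blank header cell inherits the label to its left is an assumption about how merged cells appear after parsing.

```python
def nest_row(header_rows, values):
    """Turn one data row into nested dicts using stacked header rows.

    A blank header cell is assumed to belong to a merged cell and inherits
    the nearest non-blank label to its left.
    """
    filled = []
    for row in header_rows:
        last, out = "", []
        for cell in row:
            last = cell or last
            out.append(last)
        filled.append(out)

    record = {}
    for col, value in enumerate(values):
        path = [level[col] for level in filled]
        node = record
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return record


# Made-up workbook fragment: two header rows, one data row.
headers = [
    ["Revenue", "", "Costs", ""],   # merged top-level headers
    ["Q1", "Q2", "Q1", "Q2"],
]
row = [120, 135, 80, 90]
print(nest_row(headers, row))
```

Each top-level header becomes an object and each second-level header a property inside it, which is also a natural boundary for the relational table splits mentioned above.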
WHO BUILT THIS
Talonic initiated DIN SPEC 91491.
Talonic co-authored this standard with Fraunhofer IIS, DIN Solutions, GIIC, Humboldt-Innovation, and four other consortium members. The Talonic platform is a production implementation of the framework described in this document — from AI-generated schemas to harmonization, versioning, and delivery.
DIN SPEC 91491 Implementation Kit
Everything a Head of Data, CTO, or AI lead needs to evaluate and adopt schema governance.
- AI-ready data readiness checklist
- Schema lifecycle maturity scorecard
- Reference architecture diagram
- Prompt and schema governance template
- RFP questions for schema-layer vendors
- Sample schema evolution diff
- Executive one-pager for internal buy-in
Schema Audit
Send us a sample. We'll return a schema read, an accuracy estimate, and a recommendation within 5 business days.