What happens when I hit a rate limit?

The API returns HTTP 429 Too Many Requests with a Retry-After header giving the number of seconds to wait. The request is not processed, so retry it after that delay.

Can I increase my rate limits?

Upgrade to Pro for higher limits. Enterprise plans offer custom quotas tailored to your workload. Contact hello@talonic.com for Enterprise pricing.

Do rate limits apply to all endpoints?

Yes. Rate limits apply globally across all API endpoints for a given API key. Read-only endpoints (GET) share the same quota as write endpoints.

Are rate limits per API key or per workspace?

Rate limits are per API key. If you have multiple keys in the same workspace, each key has its own independent rate limit.

Rate Limits

The Talonic API enforces per-tier rate limits to ensure fair usage. Limits apply per API key and reset on a rolling window.

Plan limits

Each plan tier sets a daily request quota, a per-minute burst limit, a maximum file size, and caps on concurrent extractions, schemas, and sources. Upgrade your plan in the Talonic platform to increase limits.

Limit	Free	Pro	Enterprise
Requests per day	50	2,000	Custom
Requests per minute	5	30	Custom
Max file size	10 MB	50 MB	Custom
Concurrent extractions	1	5	Custom
Schemas	3	50	Unlimited
Sources	1	10	Unlimited
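
One way to stay under the per-minute limits in the table is to throttle on the client side. The sketch below is a minimal sliding-window limiter. The class name SlidingWindowLimiter and its parameters are hypothetical and not part of the Talonic API, and the server's own window accounting may differ, so treat it as an approximation rather than a guarantee.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side throttle: at most max_requests per window_seconds (hypothetical helper)."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Return how many seconds to wait before another request fits in the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The oldest request must leave the window before a new one fits.
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self._clock())
```

A caller on the Free tier might construct SlidingWindowLimiter(5), sleep for wait_time() before each request, and call record() after sending it. The clock parameter exists only so the limiter can be tested without real delays.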

Rate limit headers

Every API response includes rate limit headers so you can monitor usage proactively. Check these headers to avoid hitting limits during bulk extraction workloads.

Header	Description
X-RateLimit-Limit	Maximum requests per minute for your plan
X-RateLimit-Remaining	Requests remaining in the current window
X-RateLimit-Reset	Unix timestamp when the window resets
Retry-After	Seconds to wait before retrying (only on 429)
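
As an illustration of reading these headers, the following sketch parses them from any mapping of response headers. The names RateLimitStatus and parse_rate_limit_headers are hypothetical helpers, and the code assumes X-RateLimit-Reset is expressed in seconds since the Unix epoch.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset (assumed: Unix timestamp in seconds)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_at) - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed rate limit state, or None if a header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitStatus(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

During a bulk run, a client could pause for seconds_until_reset() whenever remaining reaches zero instead of sending a request that is likely to be rejected.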

Handling 429 responses

When you receive a 429 Too Many Requests response, read the Retry-After header and wait that many seconds before retrying. Implement exponential backoff with jitter for resilience. See the error handling guide for a full retry strategy.
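
A minimal sketch of this retry behavior follows. It is transport-agnostic: send is a hypothetical callable that performs one request and returns a (status, headers, body) tuple, so it can wrap any HTTP client. The sketch honors Retry-After when present and falls back to full-jitter exponential backoff when the header is missing or unparseable. That fallback is one reasonable design choice, not the strategy from the error handling guide, and the base and cap values are assumptions.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def retry_delay(headers: Mapping[str, str], attempt: int) -> float:
    """Prefer the server's Retry-After (seconds); otherwise use jittered backoff."""
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        return backoff_delay(attempt)


def call_with_retry(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 up to max_retries times."""
    attempt = 0
    while True:
        response = send()
        status, headers, _body = response
        # Return on any non-429 status, or once the retry budget is spent.
        if status != 429 or attempt >= max_retries:
            return response
        sleep(retry_delay(headers, attempt))
        attempt += 1
```

The sleep parameter defaults to time.sleep and is injectable so the loop can be exercised in tests without waiting. If the final attempt still returns 429, the caller receives that response and can decide how to surface the failure.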

Best practices

Batch documents using source ingestion instead of individual extract calls. Use webhooks to receive results asynchronously rather than polling jobs. Cache schema responses to avoid unnecessary reads.
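
The caching advice can be sketched with a small in-memory cache that expires entries after a fixed time. TTLCache and get_or_fetch are hypothetical names, the fetch callable stands in for whatever client call reads a schema, and the right expiry time depends on how often your schemas change.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only if it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()  # this is the only path that spends an API request
        self._entries[key] = (now, value)
        return value
```

With a cache like this, repeated reads of the same schema within the expiry time cost one request against the quota rather than one per read.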