FAQ: How much concurrency can the API handle?
This FAQ is part of the NexoRouter documentation.
How much concurrency you can sustain depends on request rate, token rate, model latency, upstream availability, and your client timeout. One successful request does not prove that production concurrency is safe.
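To see why rate and latency interact, Little's law gives the average number of requests in flight as arrival rate multiplied by average latency. The Python sketch below applies it; the 5-second latency is an illustrative assumption, not a measured NexoRouter figure.

```python
def expected_in_flight(requests_per_minute: float, avg_latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate (req/s) * average latency (s)."""
    if requests_per_minute < 0 or avg_latency_s < 0:
        raise ValueError("rate and latency must be non-negative")
    return (requests_per_minute / 60.0) * avg_latency_s


# Example: 120 RPM with an ASSUMED 5 s average model latency
# keeps about 10 requests open at the same time.
print(expected_in_flight(120, 5.0))  # → 10.0
```

At the same RPM, a slower model holds more requests open, so your client timeout has to cover its slow tail, not just the typical response.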
Current public defaults
| Limit | Default |
|---|---|
| Requests per minute | 120 |
| Estimated tokens per minute | 120000 |
Deployment configuration can change these values. For production capacity planning, run your own load test and contact support with your target RPM, TPM, model IDs, and latency requirements.
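Both limits apply at once, so the token limit can bind before the request limit: at the default 120 RPM, staying under 120000 TPM leaves an average of 1000 estimated tokens per request. The check below is a minimal sketch, assuming a steady request rate and an average token count per request that you estimate yourself; it does not reproduce how NexoRouter estimates tokens, and the `Limits` values can be overridden if your deployment differs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Limits:
    requests_per_minute: int = 120      # public default from the table above
    tokens_per_minute: int = 120_000    # public default (estimated tokens)


def check_workload(target_rpm: float, avg_tokens_per_request: float,
                   limits: Limits = Limits()) -> List[str]:
    """Return the names of the limits a steady workload would exceed."""
    exceeded = []
    if target_rpm > limits.requests_per_minute:
        exceeded.append("requests_per_minute")
    if target_rpm * avg_tokens_per_request > limits.tokens_per_minute:
        exceeded.append("tokens_per_minute")
    return exceeded


print(check_workload(100, 1_500))  # → ['tokens_per_minute']
print(check_workload(100, 1_000))  # → []
```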
How to test safely
- Start with one API key and one model.
- Increase request rate gradually.
- Watch for `rate_limit_exceeded` and `token_rate_limit_exceeded` errors, timeouts, and cost.
- Keep prompts representative of production token size.
- Check Usage Logs for status and latency distribution.
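The checklist above can be scripted. Below is a minimal, hypothetical ramp harness in Python: `send` stands in for your own client call, and the `"timeout"` status, the 5% stop threshold, and all function names are assumptions of this sketch. It submits requests through a thread pool so they overlap at the target rate rather than waiting for each other, and it summarizes latency with nearest-rank p50/p95 values you can compare with Usage Logs.

```python
import math
import time
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical client hook: performs ONE real API call and returns
# (status, latency_seconds), e.g. ("ok", 1.8) or ("rate_limit_exceeded", 0.1).
# "timeout" is an assumed status your own client would report.
SendFn = Callable[[], Tuple[str, float]]

WATCHED = {"rate_limit_exceeded", "token_rate_limit_exceeded", "timeout"}


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def run_step(send: SendFn, rpm: float, n: int,
             sleep: Callable[[float], None]) -> List[Tuple[str, float]]:
    """Submit n calls paced at `rpm`, letting them overlap like real traffic."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = []
        for _ in range(n):
            futures.append(pool.submit(send))
            sleep(60.0 / rpm)  # gap between submissions for the target rate
        return [f.result() for f in futures]


def ramp_test(send: SendFn, rpm_steps: Iterable[float], requests_per_step: int = 20,
              max_bad_fraction: float = 0.05,
              sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Raise the rate step by step; stop after the first step whose share of
    rate-limit errors and timeouts exceeds max_bad_fraction."""
    report = []
    for rpm in rpm_steps:
        results = run_step(send, rpm, requests_per_step, sleep)
        statuses = Counter(status for status, _ in results)
        latencies = [latency for _, latency in results]
        bad = sum(statuses[s] for s in WATCHED) / len(results)
        report.append({
            "rpm": rpm,
            "statuses": dict(statuses),
            "p50_s": percentile(latencies, 50),
            "p95_s": percentile(latencies, 95),
            "bad_fraction": bad,
        })
        if bad > max_bad_fraction:
            break
    return report
```

A cautious run might use one key, one model, production-sized prompts, and steps such as `[10, 20, 40, 80, 120]`, stopping to review the report and Usage Logs before going higher.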