FAQ: How much concurrency can the API handle? in the NexoRouter documentation.

FAQ: How much concurrency can the API handle?

The concurrency you can sustain depends on the request rate limit, the token rate limit, model latency, upstream availability, and your client timeout. Do not treat one successful request as proof that production concurrency is safe: a single request exercises none of these limits under load.

Current public defaults

Limit                          Default
Requests per minute            120
Estimated tokens per minute    120000

Deployment configuration can change these values. For production capacity planning, run your own load test and contact support with your target requests per minute (RPM), tokens per minute (TPM), model IDs, and latency requirements.
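As a rough illustration of how the two default limits translate into concurrency, the sketch below picks the tighter of the request and token limits and then applies Little's law (requests in flight = arrival rate × average latency). The function name, the 2,000 tokens per request, and the 4 second latency are illustrative assumptions, not NexoRouter figures. Replace them with measurements from your own traffic.

```python
def estimate_capacity(rpm_limit, tpm_limit, avg_tokens_per_request, avg_latency_s):
    """Return (effective requests per minute, average requests in flight).

    Hypothetical helper for back-of-envelope planning only.
    """
    # Express the token budget as a request rate.
    rpm_from_tokens = tpm_limit / avg_tokens_per_request
    # Whichever limit is tighter is the one you will hit first.
    effective_rpm = min(rpm_limit, rpm_from_tokens)
    # Little's law: in-flight requests = arrival rate (per second) * latency.
    in_flight = (effective_rpm / 60.0) * avg_latency_s
    return effective_rpm, in_flight


# Defaults from the table above; token size and latency are assumed values.
rpm, in_flight = estimate_capacity(120, 120_000, 2_000, 4.0)
print(f"effective RPM: {rpm}, average requests in flight: {in_flight}")
# prints "effective RPM: 60.0, average requests in flight: 4.0"
```

In this example the token limit is the binding one: at 2,000 tokens per request, 120000 tokens per minute allows only 60 requests per minute, half of the request limit. The estimate is an average and ignores bursts, so treat it as a starting point for a load test rather than a guarantee.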

How to test safely

  1. Start with one API key and one model.
  2. Increase request rate gradually.
  3. Watch for rate_limit_exceeded and token_rate_limit_exceeded errors, timeouts, and cost.
  4. Keep prompts representative of production token size.
  5. Check Usage Logs for status and latency distribution.
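
The steps above can be sketched as a small ramp harness. This is a minimal sketch under stated assumptions, not an official client: `fake_send` is a hypothetical stand-in for one real API call, and the harness assumes your own `send` coroutine returns a status string such as "ok", "rate_limit_exceeded", or "token_rate_limit_exceeded". How you map real responses to those strings depends on your client.

```python
import asyncio
import random
import time
from collections import Counter


async def fake_send():
    """Hypothetical stand-in for one API call; returns a status string."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return "ok"


async def timed_call(send, timeout_s):
    """Run one call, enforce the client timeout, and measure its latency."""
    start = time.perf_counter()
    try:
        status = await asyncio.wait_for(send(), timeout_s)
    except asyncio.TimeoutError:
        status = "timeout"
    return status, time.perf_counter() - start


async def run_step(send, rpm, duration_s, timeout_s):
    """Issue requests at a fixed rate for roughly duration_s seconds."""
    interval = 60.0 / rpm
    count = max(1, round(duration_s * rpm / 60.0))
    tasks = []
    for _ in range(count):
        tasks.append(asyncio.create_task(timed_call(send, timeout_s)))
        await asyncio.sleep(interval)
    results = await asyncio.gather(*tasks)
    statuses = Counter(status for status, _ in results)
    latencies = sorted(latency for _, latency in results)
    # Simple nearest-rank approximation of the 95th percentile.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return statuses, p95


async def ramp(send, rpm_steps, duration_s=0.5, timeout_s=30.0, max_error_ratio=0.05):
    """Increase the rate step by step; stop after the first step with too many errors."""
    report = []
    for rpm in rpm_steps:
        statuses, p95 = await run_step(send, rpm, duration_s, timeout_s)
        total = sum(statuses.values())
        errors = total - statuses["ok"]
        report.append((rpm, dict(statuses), p95))
        if errors / total > max_error_ratio:
            break
    return report


if __name__ == "__main__":
    for rpm, statuses, p95 in asyncio.run(ramp(fake_send, [600, 1200])):
        print(f"{rpm} RPM: {statuses}, p95 latency {p95:.3f}s")
```

For a real test, replace `fake_send` with a coroutine that sends a prompt of production-representative size using one API key and one model, use longer steps (minutes rather than the half second shown here), and pick rate steps that start well below your limits. The harness does not track cost; read that, along with the server-side status and latency distribution, from Usage Logs.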