Sign in Get API key

Documentation

FAQ: How should I use max_tokens? in the NexoRouter documentation.

FAQ: How should I use max_tokens?

max_tokens limits the size of the model's completion. It does not reduce the tokens already present in your prompt.

When to set it

Situation	Recommendation
First setup test	Use a small value such as `64` or `128`.
Cost control	Set a cap that matches the expected answer length.
Long-form generation	Increase gradually and watch Usage Logs.
Agent or tool loops	Keep it bounded to avoid expensive repeated outputs.

What it does not fix

It does not make an oversized prompt fit.
It does not change the selected model's context capacity.
It does not reduce input token cost.
It does not fix request_too_large if the input is already too large.

If output is cut off

Increase max_tokens moderately.
Ask the model for a shorter format.
Split the task into sections.
Check Usage Logs for completion tokens and cost.

FAQ: How should I use max_tokens? — NexoRouter