Documentation
FAQ: How should I use max_tokens? in the NexoRouter documentation.
FAQ: How should I use max_tokens?
max_tokens limits the size of the model's completion. It does not reduce the tokens already present in your prompt.
When to set it
| Situation | Recommendation |
|---|---|
| First setup test | Use a small value such as 64 or 128. |
| Cost control | Set a cap that matches the expected answer length. |
| Long-form generation | Increase gradually and watch Usage Logs. |
| Agent or tool loops | Keep it bounded to avoid expensive repeated outputs. |
What it does not fix
- It does not make an oversized prompt fit.
- It does not change the selected model's context capacity.
- It does not reduce input token cost.
- It does not fix
request_too_largeif the input is already too large.
If output is cut off
- Increase
max_tokensmoderately. - Ask the model for a shorter format.
- Split the task into sections.
- Check Usage Logs for completion tokens and cost.