Documentation

FAQ: How should I use max_tokens? in the NexoRouter documentation.

FAQ: How should I use max_tokens?

max_tokens limits the size of the model's completion. It does not reduce the tokens already present in your prompt.

When to set it

SituationRecommendation
First setup testUse a small value such as 64 or 128.
Cost controlSet a cap that matches the expected answer length.
Long-form generationIncrease gradually and watch Usage Logs.
Agent or tool loopsKeep it bounded to avoid expensive repeated outputs.

What it does not fix

  • It does not make an oversized prompt fit.
  • It does not change the selected model's context capacity.
  • It does not reduce input token cost.
  • It does not fix request_too_large if the input is already too large.

If output is cut off

  1. Increase max_tokens moderately.
  2. Ask the model for a shorter format.
  3. Split the task into sections.
  4. Check Usage Logs for completion tokens and cost.
FAQ: How should I use max_tokens? — NexoRouter