POST /v1/chat/completions — OpenAI-compatible chat
Support for optional capabilities (tools, response_format) comes from the provider, not Routify. Requests are authenticated with a Bearer header of the form Authorization: Bearer $ROUTIFY_API_KEY.
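A minimal client sketch using Python's standard library. The base URL and function name are hypothetical; only the endpoint path and the Authorization header format come from this page.

```python
import json
import os
import urllib.request

# Hypothetical base URL -- substitute your actual Routify endpoint.
BASE_URL = "https://api.routify.example/v1"

def build_chat_request(messages, model, api_key=None, **params):
    """Build a POST /v1/chat/completions request with Bearer auth."""
    api_key = api_key or os.environ.get("ROUTIFY_API_KEY", "")
    body = json.dumps({"model": model, "messages": messages, **params})
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body.encode("utf-8"),
        headers={
            # Header form from this page: Authorization: Bearer $ROUTIFY_API_KEY
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Send with urllib.request.urlopen(build_chat_request(...)) and
# json.load() the response body.
```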
Model ID used to generate the response. Use GET /v1/models to list all available models.
A list of messages comprising the conversation so far. Minimum length: 1.
If set to true, the response is streamed to the client as it is generated using server-sent events. The stream ends with data: [DONE].
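When streaming, each SSE data: line carries one JSON chunk until the data: [DONE] sentinel. A minimal parser sketch; the chunk schema is not specified on this page, so chunks are yielded as raw dicts:

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an SSE stream; stop at data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel per this page
        yield json.loads(payload)
```

In practice `lines` would be the decoded line iterator of the HTTP response body.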
An upper bound for the number of tokens that can be generated in the completion. Required range: x >= 1.
Constrains effort on reasoning for reasoning models (e.g. o3, o4-mini). Supported values: low, medium, high, xhigh. Lower effort reduces latency and cost; higher effort improves accuracy on complex tasks.
Controls verbosity of the model response. Supported values: low, medium, high.
Controls the format of reasoning summaries in the response. Supported values: auto, detail, concise.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We recommend altering this or top_p but not both. Required range: 0 <= x <= 2.
An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We recommend altering this or temperature but not both. Required range: 0 <= x <= 1.
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
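A request body combining the parameters above might look like the following. It assumes OpenAI-standard parameter names (model, messages, stream, max_tokens, temperature, stop); the model ID is illustrative — list real ones with GET /v1/models.

```json
{
  "model": "example/model-id",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize server-sent events in one sentence."}
  ],
  "stream": false,
  "max_tokens": 256,
  "temperature": 0.2,
  "stop": ["\n\n"]
}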
Successful response (JSON or SSE stream)
A unique identifier for the chat completion.
The object type. Always chat.completion.
The Unix timestamp (in seconds) of when the chat completion was created.
The model used for the chat completion.
A list of chat completion choices. Minimum length: 1.
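A non-streaming response assembling the fields above might look like the following. The id, timestamp, and model are illustrative, and the shape of each choice (index, message, finish_reason) is assumed to follow the standard OpenAI schema, which this page does not spell out:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "example/model-id",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ]
}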