Create a chat completion
This endpoint implements a strict MVP subset of the API; unknown top-level fields are rejected.
Streaming uses server-sent events (SSE) and the stream terminates with a data: [DONE] event.
Every request requires an Authorization: Bearer $ROUTIFY_API_KEY header.
Authentication
Bearer authentication header of the form Authorization: Bearer $ROUTIFY_API_KEY.
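As a minimal sketch, the header can be constructed like this in Python, assuming ROUTIFY_API_KEY is set in the environment (the fallback value is a placeholder for illustration only):

```python
import os

# Read the API key from the environment; the fallback here is a
# placeholder, not a real key.
api_key = os.environ.get("ROUTIFY_API_KEY", "example-key")

# Every request to the API carries this header.
headers = {"Authorization": f"Bearer {api_key}"}
```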
Request
Model ID used to generate the response. Use GET /v1/models to list all available models.
If set to true, the response is streamed to the client via server-sent events as it is generated. The stream terminates with a data: [DONE] event.
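A sketch of consuming such a stream, assuming each event arrives as a line of the form data: &lt;json&gt; and parsing stops at the data: [DONE] sentinel. The chunk shape in the sample lines is illustrative, not actual wire output from this API:

```python
import json

def read_stream(lines):
    """Collect parsed JSON chunks from SSE lines until data: [DONE]."""
    chunks = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines, comments, etc.
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # terminal sentinel: the stream is complete
        chunks.append(json.loads(payload))
    return chunks

# Illustrative event lines only; the real chunk schema may differ.
events = [
    'data: {"delta": "Hel"}',
    'data: {"delta": "lo"}',
    "data: [DONE]",
]
chunks = read_stream(events)  # two parsed chunks, sentinel excluded
```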
Constrains effort on reasoning for reasoning models (e.g. o3, o4-mini). Supported values: low, medium, high, xhigh. Lower effort reduces latency and cost; higher effort improves accuracy on complex tasks.
Controls the format of reasoning summaries in the response. Supported values: auto, detail, concise.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We recommend altering this or top_p but not both.
An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We recommend altering this or temperature but not both.
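Putting the request fields above together, a sketch of a request body. The field names (model, messages, stream, reasoning_effort, temperature) follow common chat-completion conventions and are assumptions where this page does not spell them out; the model ID is a placeholder:

```python
import json

# Hypothetical request body; field names follow common chat-completion
# conventions, and the model ID is a placeholder.
body = {
    "model": "example-model",      # pick a real ID from GET /v1/models
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,                # respond as an SSE stream
    "reasoning_effort": "medium",  # low | medium | high | xhigh
    "temperature": 0.2,            # alter this or top_p, not both
}

# Unknown top-level fields are rejected, so send only documented ones.
payload = json.dumps(body)
```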
Response
Successful response (JSON or SSE stream)
The object type. Always chat.completion.
The Unix timestamp (in seconds) of when the chat completion was created.
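For illustration, a fragment of a successful non-streaming response covering just the two fields documented above; the rest of the object shape is omitted here, and the timestamp is an example value:

```python
import json
from datetime import datetime, timezone

# Illustrative response fragment: only object and created are shown;
# any other fields in the real response are omitted here.
raw = '{"object": "chat.completion", "created": 1700000000}'
resp = json.loads(raw)

# The object type is always chat.completion.
assert resp["object"] == "chat.completion"

# created is a Unix timestamp in seconds, convertible to a datetime.
created = datetime.fromtimestamp(resp["created"], tz=timezone.utc)
```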