Create a chat completion

This endpoint implements a strict MVP subset: unknown top-level fields are rejected. Streaming uses server-sent events (SSE) and ends with data: [DONE]. Requests require an Authorization: Bearer $ROUTIFY_API_KEY header.

Authentication

Authorization (Bearer)

Bearer authentication header of the form Authorization: Bearer $ROUTIFY_API_KEY.
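A minimal sketch of assembling the headers in Python. The build_headers helper is hypothetical (not part of any SDK); it simply produces the Authorization header in the form described above.

```python
import os

def build_headers(api_key: str) -> dict:
    # Bearer authentication header of the form Authorization: Bearer <key>,
    # plus the JSON content type expected by the endpoint.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Read the key from the environment; "sk-test" is a placeholder fallback.
headers = build_headers(os.environ.get("ROUTIFY_API_KEY", "sk-test"))
```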

Request

This endpoint expects an object.
model (string, Required)

Model ID used to generate the response. Use GET /v1/models to list all available models.

messages (list of objects, Required)
A list of messages comprising the conversation so far.
stream (boolean, Optional, defaults to false)

If set to true, the response is streamed to the client as it is generated using server-sent events. The stream ends with data: [DONE].
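A sketch of consuming such a stream on the client side. The iter_sse_chunks helper is hypothetical, and the delta shape inside each chunk is an assumption based on OpenAI-compatible streaming responses; only the data: prefix and the [DONE] terminator are guaranteed by this reference.

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of decoded SSE lines,
    stopping when the data: [DONE] terminator is reached."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # the stream always ends with data: [DONE]
        yield json.loads(payload)

# Simulated stream for illustration (the delta shape is an assumption):
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    '',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    'data: [DONE]',
]
chunks = list(iter_sse_chunks(sample))
```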

max_tokens (integer, Optional, >=1)
An upper bound for the number of tokens that can be generated in the completion.
reasoning_effort (enum, Optional)

Constrains effort on reasoning for reasoning models (e.g. o3, o4-mini). Supported values: low, medium, high, xhigh. Lower effort reduces latency and cost; higher effort improves accuracy on complex tasks.

Allowed values: low, medium, high, xhigh
verbosity (enum, Optional)
Controls the verbosity of the model's response.
Allowed values:
reasoningSummary (enum, Optional)

Controls the format of reasoning summaries in the response. Supported values: auto, detail, concise.

Allowed values: auto, detail, concise
temperature (double, Optional, 0 to 2)

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We recommend altering this or top_p but not both.

top_p (double, Optional, 0 to 1)

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We recommend altering this or temperature but not both.

stop (string or list of strings, Optional)
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
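The request fields above can be assembled and validated client-side before sending. This is a sketch only: the build_request helper is hypothetical, and the model ID used below is a placeholder (use GET /v1/models for the real list). The validation mirrors the documented constraints (temperature 0-2, top_p 0-1, max_tokens >= 1, at most 4 stop sequences).

```python
import json

def build_request(model, messages, **opts):
    """Assemble a chat completion request body, enforcing the
    documented parameter constraints before the call is made."""
    if "temperature" in opts and not 0 <= opts["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if "top_p" in opts and not 0 <= opts["top_p"] <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if "max_tokens" in opts and opts["max_tokens"] < 1:
        raise ValueError("max_tokens must be >= 1")
    stop = opts.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    body = {"model": model, "messages": messages}
    # Note: the server rejects unknown top-level fields, so only pass
    # parameters listed in the Request section.
    body.update(opts)
    return body

body = build_request(
    "example-model",  # placeholder model ID
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    stop=["\n\n"],
)
payload = json.dumps(body)
```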

Response

Successful response (JSON or SSE stream)

id (string)
A unique identifier for the chat completion.
object (string)

The object type. Always chat.completion.

created (integer)

The Unix timestamp (in seconds) of when the chat completion was created.

model (string)
The model used for the chat completion.
choices (list of objects)
A list of chat completion choices.
usage (object)
Usage statistics for the completion request.

Errors