POST /v1/chat/completions
cURL
curl https://api.routify.ru/v1/chat/completions \
  -H "Authorization: Bearer $ROUTIFY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
Example response
{
  "id": "<string>",
  "object": "<string>",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "<string>"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 1,
    "total_tokens": 1,
    "prompt_tokens_details": {
      "cached_tokens": 1,
      "cache_write_tokens": 1
    }
  }
}
Unknown parameters are forwarded to the upstream provider as-is. Errors for unsupported parameters (e.g. tools, response_format) come from the provider, not Routify.
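The cURL request above can be sketched in Python with only the standard library. This is a minimal, hypothetical helper, not an official SDK: the endpoint URL and required fields come from this page, while the helper names (`build_chat_request`, `chat_completion`) are illustrative.

```python
import json
import urllib.request

ROUTIFY_BASE_URL = "https://api.routify.ru/v1"  # base URL from the cURL example above


def build_chat_request(model, messages, **params):
    """Build the JSON body for POST /v1/chat/completions.

    `model` and `messages` are required; any extra keyword arguments
    (temperature, top_p, max_tokens, stop, ...) are passed through
    unchanged -- per the note above, unknown parameters are forwarded
    to the upstream provider as-is.
    """
    if not messages:
        raise ValueError("messages must contain at least one item")
    body = {"model": model, "messages": messages}
    body.update(params)
    return body


def chat_completion(api_key, body):
    """POST the body with a Bearer token and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{ROUTIFY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```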

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Authorization: Bearer $ROUTIFY_API_KEY.

Body

application/json
model
string
required

Model ID used to generate the response. Use GET /v1/models to list all available models.

messages
object[]
required

A list of messages comprising the conversation so far.

Minimum array length: 1
stream
boolean
default:false

If set to true, the response is streamed to the client as it is generated using server-sent events. The stream ends with data: [DONE].
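A streamed response can be consumed by reading SSE lines until the `data: [DONE]` sentinel documented above. A minimal parsing sketch — the `data: <json>` line format and the terminal sentinel come from this page, while the per-chunk payload shape (OpenAI-style `choices[].delta`) is an assumption:

```python
import json


def iter_stream_chunks(lines):
    """Yield parsed JSON chunks from an SSE stream of `data: ...` lines.

    Stops at the `data: [DONE]` sentinel. Lines that do not start with
    `data:` (blank separators, keep-alive comments) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream
        yield json.loads(payload)
```

In practice `lines` would be the decoded body of a `stream: true` request read line by line.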

max_tokens
integer

An upper bound for the number of tokens that can be generated in the completion.

Required range: x >= 1
reasoning_effort
enum<string>

Constrains effort on reasoning for reasoning models (e.g. o3, o4-mini). Supported values: low, medium, high, xhigh. Lower effort reduces latency and cost; higher effort improves accuracy on complex tasks.

Available options:
low,
medium,
high,
xhigh
verbosity
enum<string>

Controls the verbosity of the model's response.

Available options:
low,
medium,
high
reasoningSummary
enum<string>

Controls the format of reasoning summaries in the response. Supported values: auto, detail, concise.

Available options:
auto,
detail,
concise
temperature
number

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We recommend altering this or top_p but not both.

Required range: 0 <= x <= 2
top_p
number

An alternative to sampling with temperature, called nucleus sampling, where the model considers only the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We recommend altering this or temperature but not both.

Required range: 0 <= x <= 1
stop

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Response

Successful response (JSON or SSE stream)

id
string
required

A unique identifier for the chat completion.

object
string
required

The object type. Always chat.completion.

Allowed value: "chat.completion"
created
integer
required

The Unix timestamp (in seconds) of when the chat completion was created.

model
string
required

The model used for the chat completion.

choices
object[]
required

A list of chat completion choices.

Minimum array length: 1
usage
object
required
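The `usage` object carries the token accounting shown in the example response above. A small sketch for pulling those fields out, treating `prompt_tokens_details` and its `cached_tokens` field as optional (an assumption; the required fields come from the example):

```python
def summarize_usage(response):
    """Return (prompt, completion, total, cached) token counts
    from a chat completion response dict.

    `cached` is read from the optional
    usage.prompt_tokens_details.cached_tokens field, defaulting to 0.
    """
    usage = response["usage"]
    details = usage.get("prompt_tokens_details", {})
    return (
        usage["prompt_tokens"],
        usage["completion_tokens"],
        usage["total_tokens"],
        details.get("cached_tokens", 0),
    )
```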