API Parameters Support

Completion API

Supported Parameters

  • model - ID of the model to use

  • prompt - The prompt(s) to generate completions for

  • suffix - The suffix that comes after a completion of inserted text

  • max_tokens - Maximum number of tokens to generate

  • temperature - Sampling temperature (0-2)

  • top_p - Nucleus sampling parameter

  • n - Number of completions to generate

  • stream - Whether to stream back partial progress

  • logprobs - Include log probabilities of the most likely tokens

  • echo - Echo back the prompt

  • stop - Sequences where generation should stop

  • presence_penalty - Penalize new tokens based on whether they already appear in the text

  • frequency_penalty - Penalize new tokens based on their frequency in the text so far

  • best_of - Generate multiple completions server-side and return the best one

  • logit_bias - Modify likelihood of specified tokens

  • user - A unique identifier representing your end-user
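
For concreteness, here is one way to call the Completion endpoint with a few of these parameters from Python, using the requests library. This is a minimal sketch assuming an OpenAI-compatible HTTP API; the endpoint URL, API key, and model are placeholders.

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/completions",  # illustrative endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "mistralai/mistral-small-24b-instruct-2501",
        "prompt": "Write a haiku about the sea.",
        "max_tokens": 64,    # cap on generated tokens
        "temperature": 0.7,  # sampling temperature (0-2)
        "stop": ["\n\n"],    # stop at the first blank line
    },
)
print(response.json())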

Chat API

Supported Parameters

  • model - ID of the model to use

  • messages - List of conversation messages

  • temperature - Sampling temperature (0-2)

  • top_p - Nucleus sampling parameter

  • n - Number of chat completion choices to generate

  • stream - Stream partial message deltas

  • stop - Sequences where generation should stop

  • max_tokens - Maximum number of tokens to generate

  • presence_penalty - Penalize new tokens based on whether they already appear in the text

  • frequency_penalty - Penalize new tokens based on their frequency in the text so far

  • logit_bias - Modify likelihood of specified tokens

  • user - A unique identifier representing your end-user

  • transforms - Array of transformation strategies to apply when needed (see below)
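
A minimal Chat request looks similar; again a sketch assuming an OpenAI-compatible HTTP API, with a placeholder endpoint and key.

import requests

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # illustrative endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "mistralai/mistral-small-24b-instruct-2501",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 128,
    },
)
print(response.json()["choices"][0]["message"]["content"])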

Message Transforms

The Chat API supports message transformations through the transforms parameter to handle situations where conversations exceed the model's maximum context size or message count limits.

Available Transforms

  • middle-out - Intelligently compresses conversations by removing messages from the middle while preserving context from the beginning and end.

How Middle-Out Transform Works

When middle-out compression is enabled, the API will:

  1. Check if the conversation exceeds the model's context length or message count limits

  2. If within limits, no transformation is applied

  3. If limits are exceeded:

    • Preserve all system messages to maintain important instructions

    • Keep messages from the beginning of the conversation for essential context

    • Keep recent messages from the end of the conversation

    • Remove messages from the middle

    • Add a basic summary message indicating what was removed

The algorithm targets about 80% of the model's maximum context size or message count limit, providing a buffer while preserving as much context as possible.
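
The sketch below illustrates this flow in Python. It is an approximation of the behavior described above, not the service's actual implementation; the 4-characters-per-token estimate and the wording of the summary message are assumptions.

def estimate_tokens(message):
    # Crude assumption for the sketch: roughly 4 characters per token.
    return max(1, len(message["content"]) // 4)

def middle_out(messages, max_context_tokens):
    # 1. Check whether the conversation exceeds the context limit.
    total = sum(estimate_tokens(m) for m in messages)
    if total <= max_context_tokens:
        return messages  # 2. Within limits: no transformation.

    # 3. Limits exceeded: target ~80% of the maximum to leave a buffer.
    target = int(max_context_tokens * 0.8)
    system = [m for m in messages if m["role"] == "system"]  # always preserved
    rest = [m for m in messages if m["role"] != "system"]

    # Estimate how many non-system messages fit in the remaining budget.
    avg = max(1, total // max(1, len(messages)))
    budget = max(2, (target - sum(estimate_tokens(m) for m in system)) // avg)
    if budget >= len(rest):
        return messages  # nothing would actually be removed

    head = budget // 2    # messages kept from the beginning
    tail = budget - head  # messages kept from the end
    summary = {
        "role": "user",  # role and wording of the summary are assumptions
        "content": f"[{len(rest) - head - tail} earlier messages were removed]",
    }
    return system + rest[:head] + [summary] + rest[-tail:]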

Token Limits

For token context limits, the algorithm:

  • Estimates the average tokens per message

  • Calculates how many messages to remove to reach the target token count

  • Distributes the kept messages evenly between the start and end of the conversation
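
As a concrete illustration of that arithmetic, with assumed numbers throughout:

# Worked example of the token-based calculation (all values assumed).
max_context = 8192                        # model context size
target = int(max_context * 0.8)           # 6553-token target
total_tokens = 12000                      # estimated conversation size
n_messages = 200
avg = total_tokens // n_messages          # 60 tokens per message on average
keep = target // avg                      # 109 messages fit the target
remove = n_messages - keep                # 91 messages dropped from the middle
head, tail = keep // 2, keep - keep // 2  # 54 kept from the start, 55 from the end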

Message Count Limits

For models with message count limits (e.g., Claude, with a ~1000-message limit):

  • Targets 80% of the maximum allowed messages

  • Keeps a balanced number of messages from the beginning and end

  • Adds a single summary message to maintain continuity
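
The message-count case follows the same 80% targeting, again with assumed numbers:

# Worked example for a message-count limit (all values assumed).
max_messages = 1000               # e.g. a ~1000-message model limit
target = int(max_messages * 0.8)  # 800 messages kept
n = 2500                          # current conversation length
head = target // 2                # 400 kept from the beginning
tail = target - head              # 400 kept from the end
removed = n - target              # 1700 removed, replaced by one summary message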

Usage Example

{
  "model": "mistralai/mistral-small-24b-instruct-2501",
  "transforms": ["middle-out"],
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    // ... many more messages ...
    {"role": "user", "content": "What's the weather today?"}
  ]
}

When this feature is enabled with the Chat API, you can work with very long conversations that would otherwise exceed model limits. The API will intelligently compress the conversation while keeping the most relevant parts, ensuring continuity and context preservation.
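
To send such a conversation end to end, a request like the following would work; the endpoint and API key are again placeholders assuming an OpenAI-compatible HTTP API.

import requests

# Build a long conversation that would exceed most context limits.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(1500):
    history.append({"role": "user", "content": f"Question {i}"})
    history.append({"role": "assistant", "content": f"Answer {i}"})
history.append({"role": "user", "content": "What's the weather today?"})

response = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",  # illustrative endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "mistralai/mistral-small-24b-instruct-2501",
        "transforms": ["middle-out"],  # let the API compress the history
        "messages": history,
    },
)
print(response.json()["choices"][0]["message"]["content"])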
