API Parameters Support
Completion API
Supported Parameters
model - ID of the model to use
prompt - The prompt(s) to generate completions for
suffix - The suffix that comes after a completion of inserted text
max_tokens - Maximum number of tokens to generate
temperature - Sampling temperature (0-2)
top_p - Nucleus sampling parameter
n - Number of completions to generate
stream - Whether to stream back partial progress
logprobs - Include log probabilities of the most likely tokens
echo - Echo back the prompt in addition to the completion
stop - Sequences where generation should stop
presence_penalty - Penalize new tokens based on whether they appear in the text so far
frequency_penalty - Penalize new tokens based on their frequency in the text so far
best_of - Generate multiple completions server-side and return the one with the highest log probability
logit_bias - Modify likelihood of specified tokens
user - A unique identifier representing the end user
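To see these parameters in context, here is a minimal sketch of a completion request. The endpoint URL, authentication header, and model ID are placeholders, not values defined by this documentation; any HTTP client works the same way.

```python
# A hedged example: endpoint, key, and model ID are placeholders.
import requests

response = requests.post(
    "https://api.example.com/v1/completions",      # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "your-model-id",     # ID of the model to use
        "prompt": "Once upon a time",
        "max_tokens": 64,             # cap on generated tokens
        "temperature": 0.7,           # sampling temperature (0-2)
        "top_p": 1.0,                 # nucleus sampling
        "n": 1,                       # number of completions
        "stop": ["\n\n"],             # stop sequences
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
        "user": "user-1234",          # end-user identifier
    },
)
print(response.json()["choices"][0]["text"])
```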
Chat API
Supported Parameters
model - ID of the model to use
messages - List of conversation messages
temperature - Sampling temperature (0-2)
top_p - Nucleus sampling parameter
n - Number of chat completion choices to generate
stream - Stream partial message deltas
stop - Sequences where generation should stop
max_tokens - Maximum tokens to generate
presence_penalty - Penalize new tokens based on whether they appear in the text so far
frequency_penalty - Penalize new tokens based on their frequency in the text so far
logit_bias - Modify likelihood of specified tokens
user - Unique identifier representing end-user
transforms - Array of transformation strategies to apply when needed (see below)
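A minimal chat request looks much the same as the completion example above; the transforms field is the only addition. The endpoint, key, and model ID are again placeholders.

```python
import requests

response = requests.post(
    "https://api.example.com/v1/chat/completions",  # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "your-model-id",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain nucleus sampling briefly."},
        ],
        "max_tokens": 256,
        "transforms": ["middle-out"],  # opt in to middle-out compression
    },
)
print(response.json()["choices"][0]["message"]["content"])
```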
Message Transforms
The Chat API supports message transformations through the transforms
parameter to handle situations where conversations exceed the model's maximum context size or message count limits.
Available Transforms
middle-out - Intelligently compresses conversations by removing messages from the middle while preserving context from the beginning and end.
How Middle-Out Transform Works
When middle-out compression is enabled, the API will:
1. Check whether the conversation exceeds the model's context length or message count limit
2. Apply no transformation if the conversation is within limits
3. If limits are exceeded:
   - Preserve all system messages to maintain important instructions
   - Keep messages from the beginning of the conversation for essential context
   - Keep recent messages from the end of the conversation
   - Remove messages from the middle
   - Add a brief summary message indicating what was removed
The algorithm targets about 80% of the model's maximum context size or message count limit, providing a buffer while preserving as much context as possible.
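To make these steps concrete, here is a minimal client-side sketch of the same strategy. It is not the API's actual server-side implementation: the middle_out function name, the count_tokens callback, and the summary wording are assumptions for illustration only.

```python
# A simplified sketch of middle-out compression, assuming a caller-supplied
# count_tokens(text) -> int function (e.g., a tokenizer for the target model).

def middle_out(messages, max_context_tokens, count_tokens):
    """Drop middle messages until the conversation fits ~80% of the limit."""
    total = sum(count_tokens(m["content"]) for m in messages)
    if total <= max_context_tokens:
        return messages  # within limits: no transformation applied

    target = int(max_context_tokens * 0.8)  # aim for 80% to leave a buffer

    # All system messages are preserved to keep important instructions.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    # Estimate average tokens per message, then how many non-system
    # messages fit in the budget left after the system messages.
    avg = max(1, total // max(1, len(messages)))
    budget = target - sum(count_tokens(m["content"]) for m in system)
    keep = max(2, budget // avg)
    if keep >= len(rest):
        return messages  # nothing to trim

    # Split the kept messages evenly between the start and the end,
    # and insert a summary marker where the middle was removed.
    head = rest[: keep // 2]
    tail = rest[len(rest) - (keep - keep // 2):]
    summary = {"role": "system",
               "content": f"[{len(rest) - keep} earlier messages removed]"}
    return system + head + [summary] + tail
```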
Token Limits
For token context limits, the algorithm:
1. Estimates the average number of tokens per message
2. Calculates how many messages must be removed to reach the target token count
3. Distributes the kept messages evenly between the start and end of the conversation
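As a worked example of this arithmetic (all numbers are illustrative, not limits of any particular model):

```python
# Illustrative token-limit arithmetic; every number here is made up.
max_context = 8192                        # model's maximum context size
target = int(max_context * 0.8)           # 6553 tokens, leaving a buffer

total_tokens = 12000                      # current conversation size
n_messages = 60
avg_tokens = total_tokens // n_messages   # ~200 tokens per message

keep = target // avg_tokens               # ~32 messages fit the target
head, tail = keep // 2, keep - keep // 2  # 16 from the start, 16 from the end
print(head, tail)
```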
Message Count Limits
For models with message count limits (e.g., Claude models, which accept roughly 1,000 messages), the algorithm:
1. Targets 80% of the maximum allowed messages
2. Keeps a balanced number of messages from the beginning and end
3. Adds a single summary message to maintain continuity
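For the ~1,000-message cap mentioned above, the same 80% rule works out as follows:

```python
# Message-count arithmetic for a ~1000-message cap.
max_messages = 1000
target = int(max_messages * 0.8)    # 800 messages kept
head = target // 2                  # 400 from the beginning
tail = target - head                # 400 from the end
# plus one summary message inserted where the middle was removed
print(head, tail, head + tail + 1)  # 400 400 801
```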
Usage Example
When this feature is enabled with the Chat API, you can work with very long conversations that would otherwise exceed model limits. The API compresses the conversation intelligently, keeping the most relevant parts so that continuity and context are preserved.
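The sketch below shows what such a request might look like: a conversation with thousands of turns is sent with middle-out enabled, and the server trims the middle before the model sees it. As before, the endpoint, key, and model ID are placeholders.

```python
# Hypothetical end-to-end usage with a conversation far larger than most
# models' limits; endpoint, key, and model ID are placeholders.
import requests

messages = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(2000):  # far more turns than most models accept
    messages.append({"role": "user", "content": f"Note {i}: ..."})
    messages.append({"role": "assistant", "content": f"Recorded note {i}."})
messages.append({"role": "user",
                 "content": "What were my first and last notes?"})

response = requests.post(
    "https://api.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "your-model-id",
          "messages": messages,
          "transforms": ["middle-out"]},
)
print(response.json()["choices"][0]["message"]["content"])
```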