Key Capabilities
- OpenAI-compatible — Drop-in replacement for the OpenAI SDK, no other code changes needed
- 1M context window — 128K max output tokens
- Adaptive thinking — Triggers reasoning only when the task requires it, reducing unnecessary thinking tokens
- Fast mode — Up to 2.5× output speed (research preview, premium pricing)
- In-conversation system messages — Append updated instructions without resending the full system prompt
- Lower cache threshold — Minimum cacheable prompt length reduced to 1,024 tokens
- Streaming — Real-time token streaming via SSE
Quick Example
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Fixed value: claude-opus-4-8 |
messages | array | Yes | Array of { role, content } objects |
max_tokens | integer | No | Maximum number of tokens to generate |
stream | boolean | No | Enable SSE streaming, default false |
stop | string / array | No | Sequences where generation stops |
Claude Opus 4.8 does not support
temperature, top_p, and top_k parameters. Setting non-default values returns a 400 error — guide model behavior through prompts instead.API Reference
View the interactive API Playground for Claude Opus 4.8.

