Key Capabilities
- OpenAI-compatible — Drop-in replacement for the OpenAI SDK, no other code changes needed
- Thinking mode — Deep thinking with configurable reasoning intensity
- Function calling — Native support for tools and function calls
- JSON mode — Structured output via
response_format - Long context — Supports large document processing and multi-turn conversations
- Streaming — Real-time token streaming via SSE
- Prompt caching — Automatic caching with hit/miss statistics returned
Quick Example
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
model | string | Yes | Fixed value: deepseek-v4-pro |
messages | array | Yes | Array of { role, content } objects; supports system, user, assistant, tool roles |
thinking | object | No | Enable/disable thinking: {"type": "enabled"} or {"type": "disabled"}. Optional reasoning_effort: high, max, low, medium |
max_tokens | integer | No | Maximum number of tokens to generate |
temperature | float | No | 0–2, controls randomness, default 1 |
stream | boolean | No | Enable SSE streaming, default false |
stream_options | object | No | {"include_usage": true} to return usage statistics at end of stream |
top_p | float | No | Nucleus sampling threshold, default 1 |
stop | string / array | No | Sequences where generation stops (up to 16) |
response_format | object | No | {"type": "json_object"} to enable JSON mode |
tools | array | No | List of function calling tools |
tool_choice | string / object | No | Tool selection control: none, auto, required, or specify a function |
user_id | string | No | Custom user ID for content safety and KVCache isolation |
API Reference
View the interactive API Playground for DeepSeek V4 Pro.

