Key Capabilities
- SSE streaming: Real-time delivery of thinking chunks and image chunks
- Thinking mode: Internal reasoning chunks (thought: true) streamed before the image
- Text-to-image: Generate images from text descriptions
- Image editing: Pass a reference image via inline_data combined with text instructions
- Aspect ratio control: 1:1, 4:3, 3:4, 16:9, 9:16
- Resolution control: 1K (~1024px), 2K (~2048px), 4K (~4096px), measured by longest side
SSE Response Format
The streaming endpoint returns newline-delimited SSE data lines, each starting with data: followed by a JSON object. There are three chunk types:
- Thinking chunk: Arrives first; parts[0].thought is true
- Image chunk: Contains parts[0].inlineData with mimeType and base64 data (note: camelCase in streaming responses)
- Final usage chunk: Contains top-level usageMetadata with thoughtsTokenCount and per-modality token details
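The three chunk types can be told apart with a small parser. The following is a minimal sketch, not the official client: it assumes the parts array sits under candidates[0].content.parts (this section only names parts[0]), and the function names and sample lines are made up for illustration.

```python
import base64
import json
from typing import Iterable, Iterator, Tuple


def classify_chunk(chunk: dict) -> str:
    """Return 'thinking', 'image', 'usage', or 'other' for one decoded chunk."""
    # Assumption: parts are nested under candidates[0].content.parts.
    candidates = chunk.get("candidates") or [{}]
    parts = (candidates[0].get("content") or {}).get("parts") or []
    if parts and parts[0].get("thought") is True:
        return "thinking"
    if parts and "inlineData" in parts[0]:
        return "image"
    if "usageMetadata" in chunk:
        return "usage"
    return "other"


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (kind, chunk) for every 'data:' line, skipping everything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload:
            chunk = json.loads(payload)
            yield classify_chunk(chunk), chunk


def extract_image(chunk: dict) -> Tuple[str, bytes]:
    """Return (mime type, decoded bytes) from an image chunk."""
    inline = chunk["candidates"][0]["content"]["parts"][0]["inlineData"]
    return inline["mimeType"], base64.b64decode(inline["data"])


# Hypothetical sample lines; real payloads carry more fields.
sample_lines = [
    'data: {"candidates": [{"content": {"parts": [{"text": "Planning", "thought": true}]}}]}',
    "",
    'data: {"candidates": [{"content": {"parts": [{"inlineData": {"mimeType": "image/png", "data": "aGVsbG8="}}]}}]}',
    "",
    'data: {"usageMetadata": {"thoughtsTokenCount": 12}}',
]
kinds = [kind for kind, _ in iter_sse_chunks(sample_lines)]
```

Checking the parts before usageMetadata means a chunk that carries both an image and usage data is still treated as an image chunk.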
In streaming responses, the image field is inlineData (camelCase), while in the request body it is inline_data (snake_case). This is native Gemini API behavior.
Text-to-Image Example
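A minimal sketch of a text-to-image request body, built only from the fields listed in the Parameters table. The helper name and the prompt are hypothetical, and the endpoint URL is not given in this section, so the sketch stops at building the JSON body rather than sending it.

```python
import json


def build_text_to_image_body(prompt, aspect_ratio="1:1", image_size="1K"):
    """Build a request body for text-to-image generation (illustrative helper)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # The table allows ["IMAGE"] or ["TEXT", "IMAGE"]; this sketch picks the latter.
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {
                "aspectRatio": aspect_ratio,
                "imageSize": image_size,
            },
        },
    }


body = build_text_to_image_body(
    "A watercolor lighthouse at dawn",
    aspect_ratio="16:9",
    image_size="2K",
)
print(json.dumps(body, indent=2))
```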
Image Editing Example (with Reference Image)
Pass both a text instruction and an inline_data reference image in the same parts array.
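A minimal sketch of such a request body, assuming the reference image is already available as raw bytes. The helper name is hypothetical; the field names (inline_data, mime_type, data) and the accepted image types come from the Parameters table.

```python
import base64

# Reference image types listed in the Parameters table.
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp"}


def build_image_edit_body(instruction, image_bytes, mime_type="image/png"):
    """Build a request body with a text instruction and a reference image."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"unsupported reference image type: {mime_type}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": instruction},
                    # snake_case in the request body, unlike the streamed response.
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ],
        "generationConfig": {"responseModalities": ["IMAGE"]},
    }


# Placeholder bytes stand in for a real image file read with open(path, "rb").
edit_body = build_image_edit_body("Make the sky purple", b"fake-image-bytes")
```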
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| key | string | Yes | API key (query parameter) |
| alt | string | No | Set to sse to explicitly enable SSE mode (optional, streaming is the default behavior) |
| contents[].parts[].text | string | Yes | Text prompt or instruction |
| contents[].parts[].inline_data.mime_type | string | No | Reference image type: image/jpeg, image/png, image/webp |
| contents[].parts[].inline_data.data | string | No | Base64-encoded reference image data |
| generationConfig.responseModalities | array | Yes | ["IMAGE"] or ["TEXT", "IMAGE"] |
| generationConfig.imageConfig.aspectRatio | string | No | 1:1 / 4:3 / 3:4 / 16:9 / 9:16 |
| generationConfig.imageConfig.imageSize | string | No | 1K / 2K / 4K (default 1K) |
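The allowed values in the table can be checked on the client before a request is sent. This is a sketch under assumptions: the helper names are hypothetical, and whether the server rejects other values is not stated here, so the checks only mirror what the table lists.

```python
from urllib.parse import urlencode

ASPECT_RATIOS = {"1:1", "4:3", "3:4", "16:9", "9:16"}
IMAGE_SIZES = {"1K", "2K", "4K"}
RESPONSE_MODALITIES = (["IMAGE"], ["TEXT", "IMAGE"])


def build_query(api_key, explicit_sse=False):
    """Encode the query parameters; alt=sse is optional since streaming is the default."""
    params = {"key": api_key}
    if explicit_sse:
        params["alt"] = "sse"
    return urlencode(params)


def validate_generation_config(config):
    """Return a list of problems found in a generationConfig dict (empty if none)."""
    errors = []
    if config.get("responseModalities") not in RESPONSE_MODALITIES:
        errors.append('responseModalities must be ["IMAGE"] or ["TEXT", "IMAGE"]')
    image_config = config.get("imageConfig", {})
    ratio = image_config.get("aspectRatio")
    if ratio is not None and ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspectRatio: {ratio}")
    size = image_config.get("imageSize", "1K")  # 1K is the documented default
    if size not in IMAGE_SIZES:
        errors.append(f"unsupported imageSize: {size}")
    return errors


query = build_query("YOUR_API_KEY", explicit_sse=True)
problems = validate_generation_config(
    {"responseModalities": ["IMAGE"], "imageConfig": {"aspectRatio": "2:1"}}
)
```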
API Reference
View the interactive API Playground for Gemini 3.1 Flash Image Preview (Streaming).

