Key Capabilities
| Feature | Description |
|---|---|
| Text-to-video | Generate video from a text description |
| Image-to-video (first frame) | Use one image as the first frame |
| Image-to-video (first and last frame) | Use two images as the first and last frames respectively |
| Multimodal reference | Combine images, video, and audio as references (1–9 images, up to 3 videos, up to 3 audio clips) |
| Video editing | Modify elements in an existing video using a reference image |
| Video extension | Extend and concatenate reference videos |
| Audio generation | Automatically generate synchronized voice, sound effects, and background music |
| Web search enhancement | Enhance generation with real-time internet content (text-to-video only) |
| Return last frame | Retrieve the last frame of the generated video |
Output Specifications
| Property | Value |
|---|---|
| Resolution | 480p, 720p, 1080p |
| Aspect ratio | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive |
| Duration | 4–15 seconds |
| Format | mp4 |
| Frame rate | 24 fps |
Workflow
Asset Management Workflow
If you need to use persistent image, video, or audio assets (e.g. a fixed character reference), you can pre-upload them via the Asset Management API and then reference them with asset://<ID> in generation requests.
Step 1: Create an Asset Group
First, create an asset group to obtain a group ID.
Step 2: Create an Asset in the Group
Using the group ID from Step 1, upload an image asset (e.g., a character reference face image).
Step 3: Generate Video Using the Asset
Reference the asset ID from Step 2 to generate a video. The request is asynchronous and returns a task ID.
Step 4: Poll for Results
Use the task ID to query the generation status.
Notes
- Download URLs expire in 12 hours and must be re-fetched after expiry.
- If task progress reaches 100% but returns an error, it typically means the generated content was blocked by the provider’s content moderation system (e.g., celebrity likenesses or copyrighted content). In this case, try modifying the prompt or replacing the reference image.
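
The asset:// reference format and the 12-hour expiry noted above can be handled with two small helpers. This is a minimal sketch, not part of the API: the names `asset_uri` and `download_url_expired` are hypothetical, and it assumes you record the time at which each download URL was fetched.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Download URLs are documented to expire after 12 hours.
DOWNLOAD_URL_TTL = timedelta(hours=12)


def asset_uri(asset_id: str) -> str:
    """Format an asset ID as an asset://<ID> reference (hypothetical helper)."""
    if not asset_id:
        raise ValueError("asset_id must be a non-empty string")
    return f"asset://{asset_id}"


def download_url_expired(fetched_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once a URL fetched at `fetched_at` should be re-fetched."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current - fetched_at >= DOWNLOAD_URL_TTL
```

Checking `download_url_expired` before each download lets a client re-query the task for a fresh URL instead of failing on a stale one.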
Examples
Text-to-Video
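A text-to-video request needs only the model name and a single text item. The sketch below builds such a request body in Python using the field names from the Request Parameters table; the helper name `text_to_video_body` and the prompt are illustrative, and the endpoint and authentication are deliberately left out because they are not specified here.

```python
import json


def text_to_video_body(prompt: str, duration: int = 5) -> dict:
    """Build a text-to-video request body (hypothetical helper)."""
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return {
        "model": "seedance-2.0",
        "content": [{"type": "text", "text": prompt}],
        "resolution": "720p",
        "ratio": "16:9",
        "duration": duration,
        "generate_audio": True,
    }


body = text_to_video_body("A paper boat drifting down a rainy street at dusk", duration=8)
print(json.dumps(body, indent=2))
```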
Multimodal Reference (Image + Video + Audio)
In the prompt you can reference content array items using @image1, @video1, @audio1, numbered by media type starting from 1.
Important: Assets must be passed in strict order: text, image_url, video_url, audio_url. Reordering them may cause errors. When passing several assets of the same type, keep them together so that no other asset type is interleaved between them.
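
One way to satisfy the ordering rule is to sort the content array before sending it. The sketch below assumes the item shape implied by the Request Parameters table; `media_item` and `order_content` are hypothetical helpers and the example.com URLs are placeholders.

```python
# Required order of item types in the content array.
TYPE_ORDER = {"text": 0, "image_url": 1, "video_url": 2, "audio_url": 3}


def media_item(kind: str, url: str) -> dict:
    """Build one reference item; kind is 'image', 'video', or 'audio'."""
    return {"type": f"{kind}_url", f"{kind}_url": {"url": url}, "role": f"reference_{kind}"}


def order_content(items: list) -> list:
    """Return items grouped by type in the required order.

    sorted() is stable, so items of the same type keep their relative
    order and @image1, @image2, ... still refer to the same images.
    """
    return sorted(items, key=lambda item: TYPE_ORDER[item["type"]])


content = order_content([
    media_item("audio", "https://example.com/theme.mp3"),
    media_item("image", "https://example.com/hero.png"),
    {"type": "text", "text": "The character from @image1 dances like @video1 to @audio1"},
    media_item("video", "https://example.com/dance.mp4"),
    media_item("image", "https://example.com/backdrop.png"),
])
print([item["type"] for item in content])
```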
Video Editing
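Video editing combines a text instruction, a reference image, and the video to modify. The following sketch builds such a request body, assuming the roles listed in the Input Modes table; `video_edit_body` is a hypothetical helper and the URLs are placeholders.

```python
def video_edit_body(prompt: str, image_url: str, video_url: str) -> dict:
    """Build a video-editing request body: text, then image, then video."""
    return {
        "model": "seedance-2.0",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}, "role": "reference_image"},
            {"type": "video_url", "video_url": {"url": video_url}, "role": "reference_video"},
        ],
    }


edit_body = video_edit_body(
    "Replace the jacket in @video1 with the one shown in @image1",
    "https://example.com/jacket.png",
    "https://example.com/clip.mp4",
)
```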
Web Search Enhancement (Text-to-Video Only)
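Web search is enabled through the tools parameter and applies to text-to-video requests only. A minimal sketch of a guard that adds the tool and rejects bodies containing media items (the helper name `with_web_search` is an assumption, not part of the API):

```python
import copy


def with_web_search(body: dict) -> dict:
    """Return a copy of body with web search enabled; text-only content required."""
    non_text = [item["type"] for item in body["content"] if item["type"] != "text"]
    if non_text:
        raise ValueError(f"web search is text-to-video only, found: {non_text}")
    enabled = copy.deepcopy(body)  # leave the caller's dict untouched
    enabled["tools"] = [{"type": "web_search"}]
    return enabled


search_body = with_web_search({
    "model": "seedance-2.0",
    "content": [{"type": "text", "text": "A short clip about this week's weather in Paris"}],
})
```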
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | seedance-2.0 |
| content | array | Yes | Input content array (text, image_url, video_url, audio_url) |
| content[].type | string | Yes | text, image_url, video_url, or audio_url |
| content[].text | string | For text type | Text prompt (recommended ≤500 Chinese chars or ≤1000 English words) |
| content[].image_url.url | string | For image type | Image URL, Base64, or asset://<ID> |
| content[].video_url.url | string | For video type | Video URL or asset://<ID> (mp4/mov, ≤50 MB, 2–15 seconds) |
| content[].audio_url.url | string | For audio type | Audio URL, Base64, or asset://<ID> (wav/mp3, ≤15 MB) |
| content[].role | string | Conditionally | first_frame, last_frame, reference_image, reference_video, reference_audio |
| generate_audio | boolean | No | Generate synchronized audio, default true |
| resolution | string | No | 480p, 720p, or 1080p, default 720p |
| ratio | string | No | 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive, default adaptive |
| duration | integer | No | 4–15 seconds, default 5 |
| tools | array | No | [{"type": "web_search"}] enables web search (text-to-video only) |
| watermark | boolean | No | Whether to add a watermark, default false |
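
The optional parameters and their defaults can be checked on the client before a request is sent. The sketch below mirrors the values in the table above; `with_defaults` is a hypothetical helper, and whether the server enforces the same checks is not stated here.

```python
RESOLUTIONS = ("480p", "720p", "1080p")
RATIOS = ("16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive")
DEFAULTS = {
    "generate_audio": True,
    "resolution": "720p",
    "ratio": "adaptive",
    "duration": 5,
    "watermark": False,
}


def with_defaults(options: dict) -> dict:
    """Merge caller options over the documented defaults and validate them."""
    unknown = set(options) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    merged = {**DEFAULTS, **options}
    if merged["resolution"] not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")
    if merged["ratio"] not in RATIOS:
        raise ValueError(f"ratio must be one of {RATIOS}")
    duration = merged["duration"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int) or not 4 <= duration <= 15:
        raise ValueError("duration must be an integer from 4 to 15")
    return merged
```

For example, `with_defaults({"resolution": "1080p"})` keeps the default ratio, duration, audio, and watermark settings and overrides only the resolution.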
Input Modes
| Mode | Content items | role values |
|---|---|---|
| Text-to-video | 1× text | — |
| Image-to-video (first frame) | text (optional) + 1× image_url | first_frame or omitted |
| Image-to-video (first and last frame) | text (optional) + 2× image_url | first_frame + last_frame |
| Multimodal reference | text (optional) + images/videos/audio | reference_image, reference_video, reference_audio |
| Video editing | text + image_url + video_url | reference_image + reference_video |
| Video extension | text + video_url(s) | reference_video |
Note: First frame, first-and-last-frame, and multimodal reference are three mutually exclusive modes that cannot be combined.
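
The mutual exclusivity of the frame roles and the reference roles can be checked before sending a request. This is a minimal sketch under the assumption that the mode is determined by the role values listed above; `role_family` and its return labels are hypothetical.

```python
FRAME_ROLES = {"first_frame", "last_frame"}
REFERENCE_ROLES = {"reference_image", "reference_video", "reference_audio"}


def role_family(content: list) -> str:
    """Classify a content array by its roles and reject mixed modes."""
    roles = {item["role"] for item in content if "role" in item}
    unknown = roles - FRAME_ROLES - REFERENCE_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    if roles & FRAME_ROLES and roles & REFERENCE_ROLES:
        raise ValueError("frame roles and reference roles cannot be combined")
    if roles & REFERENCE_ROLES:
        return "reference"
    if "last_frame" in roles:
        return "first_and_last_frame"
    if "first_frame" in roles:
        return "first_frame"
    return "unspecified"  # no roles given, e.g. text-to-video
```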
Referencing Assets in Prompts
In the text prompt you can use @<type><N> placeholders to reference media items in the content array, numbered in order of appearance within each media type:
| Placeholder | Referenced item |
|---|---|
| @image1, @image2, … | 1st, 2nd, … image_url item |
| @video1, @video2, … | 1st, 2nd, … video_url item |
| @audio1, @audio2, … | 1st, 2nd, … audio_url item |
Example: "The character from @image1 walks through the scene in @video1, with @audio1 as background music."
Note: Assets must be passed in strict order: text, image_url, video_url, audio_url. Reordering them may cause errors. When passing several assets of the same type, keep them together so that no other asset type is interleaved between them.
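
Because placeholders are numbered per media type, a prompt can be checked against the content array before it is submitted. The sketch below is illustrative only; `unresolved_placeholders` is a hypothetical helper and the API may handle dangling placeholders differently.

```python
import re

PLACEHOLDER = re.compile(r"@(image|video|audio)(\d+)")
KIND_BY_TYPE = {"image_url": "image", "video_url": "video", "audio_url": "audio"}


def unresolved_placeholders(prompt: str, content: list) -> list:
    """Return placeholders in prompt that have no matching content item."""
    counts = {"image": 0, "video": 0, "audio": 0}
    for item in content:
        kind = KIND_BY_TYPE.get(item["type"])
        if kind is not None:
            counts[kind] += 1
    missing = []
    for match in PLACEHOLDER.finditer(prompt):
        kind, index = match.group(1), int(match.group(2))
        # Numbering starts at 1 within each media type.
        if not 1 <= index <= counts[kind]:
            missing.append(match.group(0))
    return missing
```

An empty list means every placeholder in the prompt points at an existing item.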
Resolution and Pixel Values by Aspect Ratio
| Resolution | 16:9 | 4:3 | 1:1 | 3:4 | 9:16 | 21:9 |
|---|---|---|---|---|---|---|
| 480p | 864x496 | 752x560 | 640x640 | 560x752 | 496x864 | 992x432 |
| 720p | 1280x720 | 1112x834 | 960x960 | 834x1112 | 720x1280 | 1470x630 |
| 1080p | 1920x1080 | 1664x1248 | 1440x1440 | 1248x1664 | 1080x1920 | 2208x944 |
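
The table above translates directly into a lookup structure, which is convenient when a client needs the output dimensions ahead of time. In this sketch, `output_size` is a hypothetical helper; it returns `None` for the adaptive ratio because the table lists no fixed size for it.

```python
from typing import Optional, Tuple

# (width, height) in pixels, copied from the table above.
PIXELS = {
    "480p": {"16:9": (864, 496), "4:3": (752, 560), "1:1": (640, 640),
             "3:4": (560, 752), "9:16": (496, 864), "21:9": (992, 432)},
    "720p": {"16:9": (1280, 720), "4:3": (1112, 834), "1:1": (960, 960),
             "3:4": (834, 1112), "9:16": (720, 1280), "21:9": (1470, 630)},
    "1080p": {"16:9": (1920, 1080), "4:3": (1664, 1248), "1:1": (1440, 1440),
              "3:4": (1248, 1664), "9:16": (1080, 1920), "21:9": (2208, 944)},
}


def output_size(resolution: str, ratio: str) -> Optional[Tuple[int, int]]:
    """Look up the output size; None when the ratio is 'adaptive'."""
    if ratio == "adaptive":
        return None
    return PIXELS[resolution][ratio]
```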
API Reference
View the interactive API documentation for Seedance 2.0.

