Wan Model Image-to-Video API Documentation¶
Wan/Alibaba Cloud provides high-quality image-to-video generation models. This document describes the complete API specification for image-to-video generation with these models. All video generation calls use the same /v1/video/generations endpoint, with different parameters depending on the use case.
Supported Models¶
Currently supported models include:
| Model | Description |
|---|---|
| wan2.5-i2v-preview | Wan 2.5 image-to-video generation model (preview) |
| wan2.6-i2v | Wan 2.6 image-to-video generation model |
| wan2.1-kf2v-plus | Wan 2.1 first-last frame to video generation model |
Overview¶
The Wan model image-to-video feature provides an asynchronous task processing mechanism:
- Submit Task: Send an image and text prompt to create a video generation task
- Query Status: Query generation progress and status through task ID
- Get Results: Retrieve the generated video file after task completion
Task Status Flow¶
- queued: Task has been submitted and is waiting to be processed
- in_progress: Task is being processed
- completed: Task completed successfully, video has been generated
- failed: Task failed
API List¶
| Method | Path | Description |
|---|---|---|
| POST | /v1/video/generations | Submit video generation task |
| GET | /v1/video/generations/{task_id} | Query task status |
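The sections below document each call in detail. As a quick orientation, the following is a minimal Python sketch of the submit-then-poll flow. It assumes the third-party requests library and an API_KEY environment variable; the fields it reads (id, data.status, data.data.output.video_url) follow the response examples shown later in this document:
import os
import time

import requests  # third-party HTTP client; install with: pip install requests

BASE_URL = "https://computevault.unodetech.xyz"
HEADERS = {"Authorization": f"Bearer {os.environ['API_KEY']}"}


def submit_task(payload: dict) -> str:
    """Submit a video generation task and return the task ID."""
    resp = requests.post(f"{BASE_URL}/v1/video/generations", json=payload, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]


def wait_for_video(task_id: str, poll_seconds: int = 10) -> str:
    """Poll the task until it reaches a terminal status; return the video URL on success."""
    while True:
        resp = requests.get(f"{BASE_URL}/v1/video/generations/{task_id}", headers=HEADERS)
        resp.raise_for_status()
        data = resp.json()["data"]
        if data["status"] == "SUCCESS":
            return data["data"]["output"]["video_url"]
        if data["status"] == "FAILURE":
            raise RuntimeError(f"Task failed: {data.get('fail_reason', '')}")
        time.sleep(poll_seconds)  # still queued or IN_PROGRESS: wait and poll again


task_id = submit_task({
    "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
    "model": "wan2.5-i2v-preview",
    "input_reference": "https://example.com/first-frame.png",
})
print(wait_for_video(task_id))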
Usage Examples¶
1. Basic Image-to-Video (First Frame)¶
The simplest form of image-to-video generation uses a single image as the first frame. The first frame is specified via the input_reference field of the request. It can be either a URL or base64-encoded data.
Note: Unlike Veo, the base64 data must be sent in data URI format, with the encoded data prefixed by its MIME type: data:{MIME_TYPE};base64,{base64_data}. Plain base64 strings are not accepted. See the official documentation for examples and further detail.
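For example, a data URI can be built from a local image file as follows (a minimal Python sketch; the file path is a placeholder and the MIME type is inferred from the file extension):
import base64
import mimetypes


def image_to_data_uri(path: str) -> str:
    """Read a local image and return it in data URI format for input_reference."""
    mime_type, _ = mimetypes.guess_type(path)  # e.g., "image/png"
    if mime_type is None:
        raise ValueError(f"Could not determine MIME type for {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Produces e.g. "data:image/png;base64,iVBORw0KGgo..."
input_reference = image_to_data_uri("first-frame.png")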
Request Body:
{
"prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
"model": "wan2.5-i2v-preview",
"input_reference": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
"metadata": {
"input": {
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"resolution": "1080P",
"duration": 5,
"audio": true,
"watermark": false,
"prompt_extend": false
}
}
}
Or using a URL:
{
"prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
"model": "wan2.5-i2v-preview",
"input_reference": "https://example.com/first-frame.png",
"metadata": {
"input": {
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"resolution": "1080P",
"duration": 5,
"audio": true,
"watermark": false,
"prompt_extend": false
}
}
}
Complete Request (base64):
curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_KEY" \
-d '{
"prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
"model": "wan2.5-i2v-preview",
"input_reference": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
"metadata": {
"input": {
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"resolution": "1080P",
"duration": 5,
"audio": true,
"watermark": false,
"prompt_extend": false
}
}
}'
Complete Request (URL):
curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_KEY" \
-d '{
"prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
"model": "wan2.5-i2v-preview",
"input_reference": "https://example.com/first-frame.png",
"metadata": {
"input": {
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"resolution": "1080P",
"duration": 5,
"audio": true,
"watermark": false,
"prompt_extend": false
}
}
}'
2. First and Last Frames¶
This feature currently only supports the wan2.1-kf2v-plus model. The first and last frames are specified via the metadata.input.first_frame_url and metadata.input.last_frame_url fields.
Note: Unlike the first-frame only image-to-video generation use case, these fields only accept URLs, not base64-encoded data.
Limitations: In first-and-last-frame mode, resolution is fixed at 720P, duration is fixed at 5 seconds, and audio and shot_type parameters are not available.
Request Body:
{
"prompt": "The hand-shaped statue cracks and collapses, with pieces from above the wrist falling into the water.",
"model": "wan2.1-kf2v-plus",
"metadata": {
"input": {
"first_frame_url": "https://example.com/first-frame.png",
"last_frame_url": "https://example.com/last-frame.png",
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"watermark": false,
"prompt_extend": false,
"seed": 12345
}
}
}
Complete Request:
curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_KEY" \
-d '{
"prompt": "The hand-shaped statue cracks and collapses, with pieces from above the wrist falling into the water.",
"model": "wan2.1-kf2v-plus",
"metadata": {
"input": {
"first_frame_url": "https://example.com/first-frame.png",
"last_frame_url": "https://example.com/last-frame.png",
"negative_prompt": "blurry, low quality, distorted"
},
"parameters": {
"watermark": false,
"prompt_extend": false,
"seed": 12345
}
}
}'
Request Parameters:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name, e.g., wan2.5-i2v-preview or wan2.1-kf2v-plus |
| prompt | string | Yes | Text prompt describing the video content to be generated |
| input_reference | string | Yes (first frame mode) | URL or base64-encoded data (data URI format) for the first frame |
| metadata | object | No | Metadata object containing input and parameters sub-objects for specifying optional fields from the official Wan request format |
metadata.input Parameters:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| img_url | string | No | URL for the first frame image. Note: In first-frame mode, this can also be provided via the top-level input_reference field. For first-and-last-frame mode (wan2.1-kf2v-plus), use first_frame_url and last_frame_url instead |
| first_frame_url | string | Yes (first and last frame mode) | URL for the first frame image. Supported model: wan2.1-kf2v-plus (first-and-last-frame mode only, accepts URLs only, not base64-encoded data) |
| last_frame_url | string | Yes (first and last frame mode) | URL for the last frame image. Supported model: wan2.1-kf2v-plus (first-and-last-frame mode only, accepts URLs only, not base64-encoded data) |
| negative_prompt | string | No | Negative prompt text to exclude certain elements from the video |
| audio_url | string | No | URL of custom audio file for audio-visual synchronization. When provided, the parameters.audio parameter is ignored. Supported models: wan2.5-i2v-preview, wan2.6-i2v. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter |
metadata.parameters Parameters:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| resolution | string | No | Video resolution. Options: "480P" (wan2.5 only), "720P", "1080P". Note: The aspect ratio of the output video is determined by the input first frame image, with minor adjustments to meet technical requirements (width and height must be divisible by 16). First-and-last-frame mode (wan2.1-kf2v-plus) is fixed at 720P |
| prompt_extend | boolean | No | Enable intelligent prompt rewriting |
| duration | integer | No | Video duration in seconds. Options: 5, 10, 15 (wan2.6 only). First-and-last-frame mode (wan2.1-kf2v-plus) is fixed at 5 seconds |
| audio | boolean | No | Enable automatic dubbing/background audio generation. When input.audio_url is not provided, setting to true will automatically generate matching background audio or music. Supported models: wan2.5-i2v-preview, wan2.6-i2v. Note: wan2.2 and earlier versions output only silent videos. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter |
| watermark | boolean | No | Add watermark to the video |
| seed | integer | No | Random seed for generation reproducibility; the same seed can produce similar results across runs |
| shot_type | string | No | Specifies the shot type of the generated video, i.e., whether the video consists of a single continuous shot or multiple switched shots. Options: "single" (default, outputs a single-shot video) or "multi" (outputs a multi-shot video). Supported model: wan2.6-i2v. Note: This parameter takes effect only when prompt_extend is set to true. Parameter priority: shot_type > prompt. First-and-last-frame mode (wan2.1-kf2v-plus) does not support this parameter |
Audio Parameter Notes:
Audio behavior is controlled by input.audio_url and parameters.audio parameters. Priority: audio_url > audio. Three modes are supported:
- Generate silent video: Do not pass audio_url, and set audio to false
- Automatically generate audio: Do not pass audio_url, and set audio to true (the model automatically generates matching background audio or music based on the prompt and video content)
- Use custom audio: Pass audio_url (the audio parameter is ignored, and the video content attempts to align with the audio content, such as lip movements and rhythm), as shown in the example below
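For example, a first-frame request using the custom-audio mode could be built as in the sketch below (a Python dict in the request format described above; the image and audio URLs are placeholders, and parameters.audio is omitted because it is ignored once audio_url is provided):
# Custom-audio mode: provide input.audio_url; the parameters.audio flag is then ignored.
payload = {
    "prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
    "model": "wan2.5-i2v-preview",
    "input_reference": "https://example.com/first-frame.png",   # placeholder first-frame URL
    "metadata": {
        "input": {
            "audio_url": "https://example.com/soundtrack.mp3",  # placeholder custom audio track
            "negative_prompt": "blurry, low quality, distorted",
        },
        "parameters": {
            "resolution": "1080P",
            "duration": 5,
        },
    },
}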
1. Submit Video Generation Task¶
Endpoint:¶
POST /v1/video/generations
Request Headers:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| Content-Type | string | Yes | application/json |
| Authorization | string | Yes | Bearer API_KEY |
Response Example:¶
{
"id": "...",
"object": "video",
"model": "wan2.5-i2v-preview",
"status": "queued",
"progress": 0,
"created_at": 1765328779
}
Response Field Descriptions:¶
| Field | Type | Description |
|---|---|---|
| id | string | Task ID for subsequent task status queries |
| object | string | Object type, fixed as "video" |
| model | string | Model used to generate the video |
| status | string | Task status, initially "queued" |
| progress | integer | Task progress, 0-100 |
| created_at | integer | Task creation time (Unix timestamp, seconds) |
2. Query Task Status¶
Complete Request¶
curl -X GET "https://computevault.unodetech.xyz/v1/video/generations/TASK_ID" \
-H "Authorization: Bearer API_KEY"
Endpoint:¶
GET /v1/video/generations/{task_id}
Request Headers:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer API_KEY |
Path Parameters:¶
| Parameter | Type | Required | Description |
|---|---|---|---|
| task_id | string | Yes | Task ID |
Response Example (Processing):¶
{
"code": "success",
"message": "",
"data": {
"task_id": "...",
"action": "textGenerate",
"status": "IN_PROGRESS",
"fail_reason": "",
"submit_time": 1765328779,
"start_time": 1765328794,
"finish_time": 0,
"progress": "30%",
"data": {
"output": {
"scheduled_time": "2025-12-10 09:06:19.749",
"submit_time": "2025-12-10 09:06:19.731",
"task_id": "...",
"task_status": "RUNNING"
},
"request_id": "..."
}
}
}
Response Example (Success):¶
{
"code": "success",
"message": "",
"data": {
"task_id": "...",
"action": "textGenerate",
"status": "SUCCESS",
"fail_reason": "<OUTPUT_URL>",
"submit_time": 1765328779,
"start_time": 1765328794,
"finish_time": 1765328947,
"progress": "100%",
"data": {
"output": {
"actual_prompt": "<EDITED_PROMPT>",
"end_time": "2025-12-10 09:08:53.863",
"orig_prompt": "The natural light above gains a red tint, and the water in the shallow pool surrounding the hand statue begins to overflow, flooding the surrounding area.",
"scheduled_time": "2025-12-10 09:06:19.749",
"submit_time": "2025-12-10 09:06:19.731",
"task_id": "...",
"task_status": "SUCCEEDED",
"video_url": "<OUTPUT_URL>"
},
"request_id": "...",
"usage": {
"video_count": 1,
"video_duration": 5,
"video_ratio": "1920*1080"
}
}
}
}
You can retrieve the video URL from the data.data.output.video_url field.
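For example, the URL can be fetched and saved locally as follows (a minimal Python sketch using the third-party requests library; TASK_ID, API_KEY, and the output filename are placeholders):
import requests  # third-party HTTP client; install with: pip install requests

BASE_URL = "https://computevault.unodetech.xyz"
HEADERS = {"Authorization": "Bearer API_KEY"}  # replace API_KEY with your key

# Query the finished task and read the video URL from the nested response.
result = requests.get(f"{BASE_URL}/v1/video/generations/TASK_ID", headers=HEADERS).json()
video_url = result["data"]["data"]["output"]["video_url"]

# Download the generated video to a local file.
video = requests.get(video_url, timeout=120)
video.raise_for_status()
with open("output.mp4", "wb") as f:
    f.write(video.content)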
Response Example (Failed):¶
{
"code": "success",
"message": "",
"data": {
"task_id": "...",
"action": "textGenerate",
"status": "FAILURE",
"fail_reason": "task failed, code: InvalidParameter , message: image_url must provided",
"submit_time": 1765407269,
"start_time": 1765407278,
"finish_time": 1765407294,
"progress": "100%",
"data": {
"output": {
"code": "InvalidParameter",
"end_time": "2025-12-11 06:54:49.934",
"message": "image_url must provided",
"scheduled_time": "2025-12-11 06:54:29.557",
"submit_time": "2025-12-11 06:54:29.529",
"task_id": "...",
"task_status": "FAILED"
},
"request_id": "..."
}
}
}
Response Field Descriptions:¶
| Field | Type | Description |
|---|---|---|
| code | string | Response status code, "success" indicates success |
| message | string | Response message |
| data | object | Task data object |
| data.task_id | string | Task ID |
| data.status | string | Task status: IN_PROGRESS, SUCCESS, FAILURE |
| data.progress | string | Task progress percentage |
| data.data.output.video_url | string | Video access URL (when task succeeds) |
| data.data.output.task_status | string | Task status: RUNNING, SUCCEEDED, FAILED |
| data.data.usage | object | Usage statistics (when task succeeds) |
| data.data.usage.video_count | integer | Number of videos generated |
| data.data.usage.video_duration | integer | Video duration (seconds) |
| data.data.usage.video_ratio | string | Video resolution |
Important Notes¶
- Base64 Data Format: For first frame mode, base64 data must use data URI format: data:{MIME_TYPE};base64,{base64_data}, not plain base64 strings.
- First and Last Frame Mode Limitations: The first and last frame fields for the wan2.1-kf2v-plus model only accept URLs, not base64-encoded data.
- Model Selection:
  - wan2.5-i2v-preview and wan2.6-i2v: Support first frame mode image-to-video
  - wan2.1-kf2v-plus: Supports first and last frame mode image-to-video
- Metadata: The request's metadata field can be used to set any field that exists in the official request format. For example, to specify the official format's parameters.resolution, use metadata.parameters.resolution. See the official documentation for details about optional request parameters and their allowed values.