Veo Model Image-to-Video API Documentation

Veo is a high-quality video generation model developed by Google. This document specifies the API for generating video from images with Veo. All generation requests go to the same /v1/video/generations endpoint; the parameters vary by use case. Image data is provided as a base64-encoded string.


Overview

Image-to-video generation with Veo is asynchronous and proceeds in three steps:

  1. Submit Task: Send an image and a text prompt to create a video generation task
  2. Query Status: Query generation progress and status using the task ID
  3. Get Results: Retrieve the generated video file after the task completes

Task Status Flow

queued → in_progress → completed
                     ↘ failed

  • queued: Task has been submitted and is waiting to be processed
  • in_progress: Task is being processed
  • completed: Task completed successfully, video has been generated
  • failed: Task failed
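
The submit, query, and retrieve flow above can be sketched as a polling loop. This is a minimal sketch, not part of the API: wait_for_task and its fetch_status argument are hypothetical names, the polling interval and attempt limit are arbitrary, and the set of terminal statuses is an assumption that accepts both "completed" (listed above) and "succeeded" (the value shown in the query response examples).

```python
import time
from typing import Callable

# Assumption: statuses after which the task no longer changes. Both
# "completed" and "succeeded" are accepted because this document uses both.
TERMINAL_STATUSES = {"completed", "succeeded", "failed"}


def wait_for_task(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_attempts: int = 120,
) -> str:
    """Call fetch_status until it returns a terminal status, then return it.

    fetch_status is a hypothetical caller-supplied function that performs
    the status query and returns the status string.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"task still not finished after {max_attempts} polls")
```

Passing the status query in as a function keeps the loop independent of any particular HTTP client.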

API List

Method  Path                             Description
POST    /v1/video/generations            Submit video generation task (standard format)
GET     /v1/video/generations/{task_id}  Query task status (standard format)
POST    /v1/videos                       Submit video generation task
GET     /v1/videos/{task_id}             Query task status
GET     /v1/videos/{task_id}/content     Get video content (streaming download)

Usage Examples

1. Basic Image-to-Video

The simplest form of image-to-video generation uses a single image as the first frame.

Request Body:

{
  "model": "veo-3.0-fast-generate-001",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {}
}
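
A request body like the one above could be assembled in Python as follows. This is a minimal sketch: build_basic_payload is a hypothetical helper, and it assumes the API expects plain standard base64 with no data-URI prefix, which this document does not spell out.

```python
import base64
import json


def build_basic_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Return a request body that uses image_bytes as the first frame."""
    return {
        "model": model,
        "prompt": prompt,
        # Standard base64 without line breaks; decoded to str for JSON.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "metadata": {},
    }


# Placeholder bytes stand in for a real image read with open(path, "rb").
payload = build_basic_payload(
    "veo-3.0-fast-generate-001",
    "A cat playing piano in a beautiful garden",
    b"not-a-real-image",
)
body = json.dumps(payload)  # JSON text ready to send as the request body
```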

2. First and Last Frames

The image in the image field specifies the first frame of the video. The image in metadata.lastFrame specifies the last frame. This allows you to control both the starting and ending frames of the generated video.

Note: This feature is only supported by Veo 3.1 models.

Request Body:

{
  "model": "veo-3.1-generate-preview",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {
    "lastFrame": "<BASE64_ENCODED_IMAGE_DATA>"
  }
}
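
Adding a last frame to an existing request body might look like the sketch below. The helper name with_last_frame is hypothetical, and the check that the model name contains "3.1" is only an assumed client-side convenience based on the note above; the API is not documented to identify Veo 3.1 models this way.

```python
import base64


def with_last_frame(payload: dict, last_frame_bytes: bytes) -> dict:
    """Return a copy of payload with metadata.lastFrame set."""
    # Assumption: Veo 3.1 model names contain "3.1".
    if "3.1" not in payload["model"]:
        raise ValueError("lastFrame is only supported by Veo 3.1 models")
    # Copy metadata so the caller's payload is left unchanged.
    metadata = dict(payload.get("metadata") or {})
    metadata["lastFrame"] = base64.b64encode(last_frame_bytes).decode("ascii")
    return {**payload, "metadata": metadata}
```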

3. Reference Images

Reference images are passed as an array in metadata.referenceImages, containing up to 3 elements. Each element is an object with two fields: image (base64-encoded image data) and referenceType (a string, either "asset" or "style").

Note: This feature is only supported by veo-3.1-generate-preview.

Request Body:

{
  "model": "veo-3.1-generate-preview",
  "prompt": "A cat playing piano in a beautiful garden",
  "image": "<BASE64_ENCODED_IMAGE_DATA>",
  "metadata": {
    "referenceImages": [
      {
        "image": "<BASE64_ENCODED_IMAGE_DATA>",
        "referenceType": "asset"
      },
      {
        "image": "<BASE64_ENCODED_IMAGE_DATA>",
        "referenceType": "style"
      }
    ]
  }
}
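
The constraints on reference images (at most 3 elements, referenceType of "asset" or "style") can be checked before a request is sent. The following is a sketch under those stated constraints; build_reference_images is a hypothetical helper, not part of the API.

```python
import base64

ALLOWED_REFERENCE_TYPES = {"asset", "style"}
MAX_REFERENCE_IMAGES = 3


def build_reference_images(images: list[tuple[bytes, str]]) -> list[dict]:
    """Turn (image_bytes, reference_type) pairs into referenceImages entries."""
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images allowed")
    entries = []
    for image_bytes, reference_type in images:
        if reference_type not in ALLOWED_REFERENCE_TYPES:
            raise ValueError(f"unsupported referenceType: {reference_type!r}")
        entries.append({
            "image": base64.b64encode(image_bytes).decode("ascii"),
            "referenceType": reference_type,
        })
    return entries
```

The returned list can be assigned directly to metadata["referenceImages"] in the request body.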

Request Parameters:

Parameter  Type    Required  Description
model      string  Yes       Model name, e.g., veo-3.0-fast-generate-001
prompt     string  Yes       Text prompt describing the video content to be generated
image      string  Yes       Base64-encoded image data for the first frame
metadata   object  No        Extended parameters object

metadata Parameters:

Parameter         Type    Required  Description
aspectRatio       string  No        Video aspect ratio, options: "16:9", "9:16"
durationSeconds   number  No        Video duration (seconds), options: 4, 6, 8
negativePrompt    string  No        Negative prompt describing content not desired in the video
personGeneration  string  No        Person generation strategy, options: "allow_all" (text-to-video), "allow_adult" (image-to-video)
resolution        string  No        Video resolution, e.g., "1080p", "720p"
sampleCount       number  No        Number of videos to generate, default 1
storageUri        string  No        Google Cloud Storage URI for storing generated videos
lastFrame         string  No        Base64-encoded image data for the last frame (Veo 3.1 models only)
referenceImages   array   No        Array of reference images, up to 3 elements (veo-3.1-generate-preview only)
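
The enumerated options in the table above lend themselves to a client-side check before submission. This is a sketch only: validate_metadata is a hypothetical helper, and it covers just the parameters whose allowed values are listed exhaustively here (resolution is skipped because its values are given only as examples).

```python
# Allowed values copied from the metadata parameter table.
CHOICES = {
    "aspectRatio": ("16:9", "9:16"),
    "durationSeconds": (4, 6, 8),
    "personGeneration": ("allow_all", "allow_adult"),
}
MAX_REFERENCE_IMAGES = 3


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in metadata; empty means none found."""
    problems = []
    for key, allowed in CHOICES.items():
        if key in metadata and metadata[key] not in allowed:
            problems.append(f"{key}={metadata[key]!r} is not one of {list(allowed)}")
    if len(metadata.get("referenceImages", [])) > MAX_REFERENCE_IMAGES:
        problems.append("referenceImages has more than 3 elements")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.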

referenceImages Array Elements:

Parameter      Type    Required  Description
image          string  Yes       Base64-encoded image data
referenceType  string  Yes       Reference type, options: "asset" or "style"

1. Submit Video Generation Task

Complete Request:

curl -X POST "https://computevault.unodetech.xyz/v1/video/generations" -H "Content-Type: application/json" -H "Authorization: Bearer API_KEY" -d @veoImageToVideoTest.json

Endpoint:

POST /v1/video/generations

Request Headers:

Parameter      Type    Required  Description
Content-Type   string  Yes       application/json
Authorization  string  Yes       Bearer API_KEY

Response Example:

{
  "task_id": "TASK_ID"
}

Response Field Descriptions:

Field    Type    Description
task_id  string  Task ID for subsequent task status queries
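
The curl call above has a rough equivalent using only the Python standard library. This is a sketch, not an official client: build_submit_request and submit_task are hypothetical names, the 60-second timeout is an arbitrary choice, and error handling for non-2xx responses is left out.

```python
import json
import urllib.request

BASE_URL = "https://computevault.unodetech.xyz"


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a task."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/video/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_task(payload: dict, api_key: str) -> str:
    """Send the request and return the task_id from the JSON response."""
    request = build_submit_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["task_id"]
```

Separating request construction from sending makes the headers and body easy to inspect without network access.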

2. Query Task Status

Complete Request (standard format):

curl -X GET "https://computevault.unodetech.xyz/v1/video/generations/TASK_ID" -H "Authorization: Bearer API_KEY"

Endpoint:

GET /v1/video/generations/{task_id}

Request Headers:

Parameter      Type    Required  Description
Authorization  string  Yes       Bearer API_KEY

Path Parameters:

Parameter  Type    Required  Description
task_id    string  Yes       Task ID

Response Example (Processing):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "processing",
    "task_id": "TASK_ID",
    "url": ""
  }
}

Response Example (Success):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "succeeded",
    "task_id": "TASK_ID",
    "url": "https://computevault.unodetech.xyz/v1/videos/TASK_ID/content"
  }
}

Note: Depending on the AI service provider, the video will be returned either as base64-encoded data in the bytes_base64_encoded field (Vertex) or via a content URL in the url field (Gemini).
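
Because the video may arrive either inline or as a URL, client code needs to handle both cases. The sketch below shows one way to do that; extract_video is a hypothetical helper that takes the data object of a successful response and treats an empty string as "not provided", which matches the examples above but is otherwise an assumption.

```python
import base64
from typing import Optional, Tuple


def extract_video(data: dict) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (video_bytes, video_url) from the data object of a finished task.

    One of the two is expected to be set, depending on the provider.
    """
    encoded = data.get("bytes_base64_encoded") or ""
    if encoded:
        # Inline delivery: decode the base64 payload into raw video bytes.
        return base64.b64decode(encoded), None
    url = data.get("url") or ""
    if url:
        # URL delivery: the caller downloads the content separately.
        return None, url
    raise ValueError("response contains neither inline video data nor a URL")
```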

Response Example (Failed):

{
  "code": "success",
  "message": "",
  "data": {
    "bytes_base64_encoded": "",
    "error": null,
    "format": "mp4",
    "metadata": null,
    "status": "failed",
    "task_id": "TASK_ID",
    "url": "Reference to video does not support this mix of reference images."
  }
}

When a task fails, the url field contains the error message instead of a video URL.
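
Note that in the failed example the top-level code is still "success" and data.error is null, so a client has to inspect data.status to detect failure. A minimal sketch of that check follows; VideoGenerationError and raise_if_failed are hypothetical names, and the fallback to data.error and message is an assumption in case those fields are populated.

```python
class VideoGenerationError(Exception):
    """Raised when a task reports status "failed" (hypothetical helper)."""


def raise_if_failed(response: dict) -> dict:
    """Return the data object, or raise if the task failed."""
    data = response.get("data") or {}
    if data.get("status") == "failed":
        # The error text is carried in url; fall back to data.error and
        # the top-level message in case they are populated instead.
        reason = data.get("url") or data.get("error") or response.get("message")
        raise VideoGenerationError(str(reason or "unknown error"))
    return data
```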

Response Field Descriptions:

Field                      Type    Description
code                       string  Response status code, "success" indicates success
data                       object  Task data object
data.task_id               string  Task ID
data.status                string  Task status: queued, in_progress, succeeded, failed
data.format                string  Video format, e.g., "mp4"
data.url                   string  Video access URL (when task succeeds), or error message (when task fails)
data.bytes_base64_encoded  string  Base64-encoded video data (when available)
data.error                 object  Error information (when task fails)
message                    string  Error message