Lip sync generation makes a person in a video appear to say the speech you supply. It supports two input modes: text-driven (with built-in TTS synthesis) and audio-driven (from an audio file). The endpoint requires a session_id, which you must obtain first by calling the face identification endpoint.

Workflow Overview

1. identify-face  →  session_id
2. advanced-lip-sync (session_id + audio input)  →  task_id
3. Poll GET /kling/v1/videos/advanced-lip-sync/{task_id}  →  video URL

Input Modes

Text-Driven — Built-in TTS

Provide text, voice_id, and voice_language. The platform synthesizes speech from the text using the specified voice, then drives the lip movement.
curl https://ai.alad.com/kling/v1/videos/advanced-lip-sync \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "session_id": "YOUR_SESSION_ID",
      "text": "Hello, welcome to my channel",
      "voice_id": "girlfriend_1_cn",
      "voice_language": "zh"
    }
  }'

Audio-Driven — Using an Existing Audio File

Provide audio_url to drive lip movement directly with an audio file.
curl https://ai.alad.com/kling/v1/videos/advanced-lip-sync \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "session_id": "YOUR_SESSION_ID",
      "audio_url": "https://example.com/speech.mp3"
    }
  }'
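
For callers working from Python rather than the shell, the two curl examples above can be mirrored with the standard library. This is a minimal sketch, not an official client: the helper name build_lip_sync_request is hypothetical, and the request is only constructed, not sent, because the shape of the creation response is not described in this section.

```python
import json
import urllib.request

# Endpoint URL taken from the curl examples above.
API_URL = "https://ai.alad.com/kling/v1/videos/advanced-lip-sync"


def build_lip_sync_request(api_key: str, input_fields: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl examples.

    `build_lip_sync_request` is a hypothetical helper name used for illustration.
    """
    body = json.dumps({"input": input_fields}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Text-driven mode: same fields as the first curl example.
text_request = build_lip_sync_request(
    "YOUR_API_KEY",
    {
        "session_id": "YOUR_SESSION_ID",
        "text": "Hello, welcome to my channel",
        "voice_id": "girlfriend_1_cn",
        "voice_language": "zh",
    },
)

# Audio-driven mode: same fields as the second curl example.
audio_request = build_lip_sync_request(
    "YOUR_API_KEY",
    {
        "session_id": "YOUR_SESSION_ID",
        "audio_url": "https://example.com/speech.mp3",
    },
)
```

Passing either request object to urllib.request.urlopen would submit it; how the task_id is read from the response is left out here because this section does not document that response.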

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input.session_id | string | Yes | Session ID returned from the face identification step |
| input.face_image_url | string | No | Face reference image URL for improved character consistency |
| input.text | string | Required (text mode) | The text the character should speak |
| input.voice_id | string | Required (text mode) | TTS voice ID. See the voice ID reference to preview and choose a voice. |
| input.voice_language | string | Required (text mode) | Language code: zh (Chinese) or en (English) |
| input.audio_url | string | Required (audio mode) | Public URL of the audio file |
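
The per-mode requirements in the parameter list above can be checked on the client before a request is sent. The sketch below is one possible way to do that; build_input is a hypothetical helper, and treating the two modes as mutually exclusive is an assumption, since this section does not say how the API handles a request that mixes text and audio fields.

```python
from typing import Optional

# Language codes listed for input.voice_language.
ALLOWED_LANGUAGES = {"zh", "en"}


def build_input(
    session_id: str,
    *,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
    voice_language: Optional[str] = None,
    audio_url: Optional[str] = None,
    face_image_url: Optional[str] = None,
) -> dict:
    """Return the `input` object for one mode, or raise ValueError.

    Hypothetical helper: the checks follow the parameter list, plus the
    assumption that text mode and audio mode must not be combined.
    """
    if not session_id:
        raise ValueError("session_id is required")

    text_fields = {"text": text, "voice_id": voice_id, "voice_language": voice_language}
    text_mode = any(value is not None for value in text_fields.values())

    if text_mode and audio_url is not None:
        # Assumption: exactly one mode per request.
        raise ValueError("use either text mode or audio mode, not both")
    if not text_mode and audio_url is None:
        raise ValueError("provide text fields or audio_url")

    payload = {"session_id": session_id}
    if text_mode:
        missing = [name for name, value in text_fields.items() if value is None]
        if missing:
            raise ValueError("text mode is missing: " + ", ".join(missing))
        if voice_language not in ALLOWED_LANGUAGES:
            raise ValueError("voice_language must be 'zh' or 'en'")
        payload.update(text_fields)
    else:
        payload["audio_url"] = audio_url

    if face_image_url is not None:
        payload["face_image_url"] = face_image_url  # optional in both modes
    return payload
```

Failing early with a ValueError keeps malformed requests from consuming a face identification session, at the cost of duplicating rules that the server presumably enforces as well.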

Polling Results

After creating a task, use GET /kling/v1/videos/advanced-lip-sync/{task_id} to query the status. Refer to the task query documentation. Status progression: queued → processing → succeeded / failed. On success, the video download URL is at data.data.task_result.videos[0].url.
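
A polling loop for this status progression might look like the sketch below. It makes several assumptions: wait_for_video and extract_video_url are hypothetical names, the path data.data.task_result.videos[0].url is read as nested keys of the parsed JSON body, and the location of the status field is not given in this section, so the caller supplies a get_status function. The HTTP call is also injected as fetch_task so the loop can be exercised without network access.

```python
import time
from typing import Callable


def extract_video_url(response: dict) -> str:
    """Read the URL at data.data.task_result.videos[0].url.

    Assumption: the path refers to nested keys of the parsed JSON body.
    """
    return response["data"]["data"]["task_result"]["videos"][0]["url"]


def wait_for_video(
    fetch_task: Callable[[], dict],
    get_status: Callable[[dict], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task succeeds or fails, then return the video URL.

    fetch_task: performs GET /kling/v1/videos/advanced-lip-sync/{task_id}
                and returns the parsed JSON body (supplied by the caller).
    get_status: extracts the status string from that body; the field's
                location is not documented here, so it is not hard-coded.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_task()
        status = get_status(response)
        if status == "succeeded":
            return extract_video_url(response)
        if status == "failed":
            raise RuntimeError("lip sync task failed")
        # queued or processing: wait and try again until the deadline.
        if time.monotonic() >= deadline:
            raise TimeoutError("lip sync task did not finish in time")
        sleep(interval)
```

The default interval and timeout are arbitrary placeholder values; the task query documentation is the place to check for recommended polling rates.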

Prerequisite: Face Identification

Call the face identification endpoint first to obtain the session_id.

Voice ID Reference

Preview all available voices online and choose the right voice_id parameter value.

API Reference

View the interactive API documentation for Kling Lip Sync Generation.