Kling Lip Sync Generation

Lip sync generation makes a person in a video appear to say the speech you specify. It supports two input modes: text-driven (with built-in TTS synthesis) and audio-driven (from an existing audio file). This endpoint requires a session_id, which you must obtain first by calling the face identification endpoint.

Workflow Overview

identify-face  →  session_id
advanced-lip-sync (session_id + audio input)  →  task_id
Poll GET /kling/v1/videos/advanced-lip-sync/{task_id}  →  video URL

Input Modes

Text-Driven — Built-in TTS

Provide text, voice_id, and voice_language. The platform synthesizes speech from the text using the specified voice, then drives the lip movement.

curl https://ai.alad.com/kling/v1/videos/advanced-lip-sync \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "session_id": "YOUR_SESSION_ID",
      "text": "Hello, welcome to my channel",
      "voice_id": "girlfriend_1_cn",
      "voice_language": "zh"
    }
  }'

Audio-Driven — Using an Existing Audio File

Provide audio_url to drive lip movement directly with an audio file.

curl https://ai.alad.com/kling/v1/videos/advanced-lip-sync \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "session_id": "YOUR_SESSION_ID",
      "audio_url": "https://example.com/speech.mp3"
    }
  }'
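
The two curl calls above differ only in the contents of the input object, so a client can share one request builder. The following is a minimal Python sketch using only the standard library. The helper name build_lip_sync_request is hypothetical (it is not part of any SDK), and the request is built but not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl examples above.
LIP_SYNC_URL = "https://ai.alad.com/kling/v1/videos/advanced-lip-sync"


def build_lip_sync_request(api_key: str, input_fields: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a lip sync task.

    build_lip_sync_request is a hypothetical helper used for illustration.
    """
    body = json.dumps({"input": input_fields}).encode("utf-8")
    return urllib.request.Request(
        LIP_SYNC_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Audio-driven request, mirroring the second curl example.
request = build_lip_sync_request(
    "YOUR_API_KEY",
    {"session_id": "YOUR_SESSION_ID", "audio_url": "https://example.com/speech.mp3"},
)
# To send it, pass the object to urllib.request.urlopen(request).
```

For the text-driven mode, pass text, voice_id, and voice_language in place of audio_url.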

Request Parameters

Parameter	Type	Required	Description
`input.session_id`	string	Yes	Session ID returned from the face identification step
`input.face_image_url`	string	No	Face reference image URL for improved character consistency
`input.text`	string	Required (text mode)	The text the character should speak
`input.voice_id`	string	Required (text mode)	TTS voice ID. See the voice ID reference to preview and choose a voice.
`input.voice_language`	string	Required (text mode)	Language code: `zh` (Chinese) or `en` (English)
`input.audio_url`	string	Required (audio mode)	Public URL of the audio file
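
The per-mode requirements in the table above can be checked on the client before a request is sent. The sketch below is one possible validator, not an official one. The function name build_lip_sync_input is hypothetical, and treating the two modes as mutually exclusive is an assumption that the table does not state explicitly.

```python
from typing import Optional


def build_lip_sync_input(
    session_id: str,
    *,
    text: Optional[str] = None,
    voice_id: Optional[str] = None,
    voice_language: Optional[str] = None,
    audio_url: Optional[str] = None,
    face_image_url: Optional[str] = None,
) -> dict:
    """Return the `input` object for one mode, or raise ValueError."""
    if not session_id:
        raise ValueError("session_id is required")

    text_fields = {"text": text, "voice_id": voice_id, "voice_language": voice_language}
    uses_text = any(value is not None for value in text_fields.values())

    # Assumption: text mode and audio mode cannot be combined in one request.
    if uses_text and audio_url is not None:
        raise ValueError("use either the text fields or audio_url, not both")

    if uses_text:
        # Text mode needs all three fields from the table.
        missing = [name for name, value in text_fields.items() if not value]
        if missing:
            raise ValueError(f"text mode is missing: {', '.join(missing)}")
        if voice_language not in ("zh", "en"):
            raise ValueError("voice_language must be 'zh' or 'en'")
        payload = {"session_id": session_id, **text_fields}
    elif audio_url:
        payload = {"session_id": session_id, "audio_url": audio_url}
    else:
        raise ValueError("provide the text mode fields or audio_url")

    # Optional reference image for character consistency.
    if face_image_url:
        payload["face_image_url"] = face_image_url
    return payload
```

The returned dictionary goes under the top-level "input" key of the request body, as in the curl examples.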

Polling Results

After creating a task, query its status with GET /kling/v1/videos/advanced-lip-sync/{task_id}; see the task query documentation for details. The status progresses from queued to processing and then ends in either succeeded or failed. On success, the video download URL is at data.data.task_result.videos[0].url.
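
A polling loop for the status progression described above might look like the following Python sketch. It rests on several assumptions: fetch_task is a caller-supplied function that performs the GET request and returns the parsed JSON body, the path data.data.task_result.videos[0].url is read relative to that body, and the status field name task_status is a guess that should be checked against the task query documentation.

```python
import time
from typing import Callable


def wait_for_video_url(
    fetch_task: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    status_key: str = "task_status",  # assumed field name, not confirmed by this page
) -> str:
    """Poll until the task succeeds, then return the video download URL."""
    for attempt in range(max_attempts):
        task = fetch_task()["data"]["data"]
        status = task.get(status_key)
        if status == "succeeded":
            return task["task_result"]["videos"][0]["url"]
        if status == "failed":
            raise RuntimeError("lip sync task failed")
        # queued or processing: wait before querying again
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError("task did not finish within the polling budget")
```

Passing the fetch step in as a function keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.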

Prerequisite: Face Identification

Call the face identification endpoint first to obtain the session_id.

Voice ID Reference

Preview all available voices online and choose a value for the voice_id parameter.

API Reference

View the interactive API documentation for Kling Lip Sync Generation.
