How to Generate AI Videos with Your Agent (OpenClaw + Veevid)
Install the Veevid skill, say one sentence, and let your AI agent generate videos for you. 10 models, automatic quoting, and zero manual API work.

You can generate AI videos by talking to your agent. No API calls, no dashboards, no manual polling - just say what you want and get a video back.
The Veevid skill connects your OpenClaw agent to Veevid's API, giving it access to 10 video generation models. Your agent handles model selection, cost estimation, generation, and delivery.
What You Need
- An OpenClaw agent
- A Veevid account with credits (sign up at veevid.ai)
- An API key from veevid.ai/settings/api-keys
Step 1: Install the Skill
One command:
npx clawhub@latest install veevid
Then save your API key:
mkdir -p ~/.config/veevid
echo "vv_sk_your_key_here" > ~/.config/veevid/api_key
Your agent now knows how to generate videos.
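For a sense of what the saved key file looks like from the consuming side, here is a minimal Python sketch that reads it back. The path matches the commands above; the function name `load_api_key` and the empty-file check are illustrative assumptions, not part of the Veevid skill.

```python
from pathlib import Path

# Path used by the setup commands in this article.
DEFAULT_KEY_PATH = Path.home() / ".config" / "veevid" / "api_key"


def load_api_key(path=DEFAULT_KEY_PATH):
    """Read the API key from disk, dropping the trailing newline echo adds.

    Hypothetical helper for illustration; the real skill may read the key differently.
    """
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file is empty: {path}")
    return key
```

Stripping whitespace matters because `echo` writes a trailing newline, which would otherwise end up inside any header built from the key.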
Step 2: Talk to Your Agent
Text to Video
Describe what you want:
You: "Generate a cinematic video of a golden retriever running through sunflowers"
Agent: "I'll use Veo 3.1 standard mode. Cost: 20 credits. You have 451 credits. Proceed?"
You: "Yes"
Agent: "Done! Here's your 8-second video: [video URL]"
What the agent does:
- Picks the best model for your request
- Calls /api/quote to get the exact cost
- Asks you to confirm before spending credits
- Generates the video and polls until ready
- Returns the video URL
Image to Video
Send an image and ask your agent to animate it:
You: "Turn this product photo into a video" (attach image)
Agent: (finds your image in chat history and quotes the message) "Is this the right image?"
You: "Yes, use Kling 3.0, 10 seconds"
Agent: "Kling 3.0 Standard, 10 seconds with audio. Cost: 250 credits. You have 400 credits. Proceed?"
You: "Go ahead"
Agent: "Video ready! [video URL]"
The agent handles image upload, model configuration, and polling - you just talk.
On a Budget
You: "Make a quick test video, keep it cheap"
Agent: "The cheapest option is Grok Imagine at 480p for 6 seconds - 10 credits. Want me to use that?"
The 10 Models
Your agent can access all of these:
| Model | Best For | Credits |
|---|---|---|
| Veo 3.1 | Budget-friendly, native audio | 20-140 credits |
| Grok Imagine | Fastest generation, native audio | 10-60 credits |
| Veevid 1.0 Pro | Audio-visual sync | 12-288 credits |
| Seedance 1.5 Pro | Audio, wide aspect ratios | 12-288 credits |
| Kling 3.0 | Multi-shot, audio, 3-15s | 48-495 credits |
| LTX 2.3 | Up to 4K (2160p), audio | 48-960 credits |
| Kling 2.6 | Cinematic, multi-shot, audio | 70-280 credits |
| Sora 2 Stable | Best prompt accuracy, native audio | 80-2000 credits |
| Sora 2 | Longest clips (25s), storyboard mode | 20-315 credits |
| Wan 2.6 | Video-to-video, multi-shot, audio | 100-450 credits |
You can let the agent choose, or specify:
"Use Kling 3.0 for this one"
"Give me the cheapest option"
"I need a 20-second video, what can do that?"
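To make "give me the cheapest option" concrete, here is a rough Python sketch of budget-based selection. The credit ranges are copied from the table above; the selection logic and the function names `cheapest_model` and `models_within` are assumptions for illustration, and the real skill may weigh duration, resolution, and audio as well.

```python
# (min, max) credits per model, copied from the article's table.
MODEL_CREDITS = {
    "Veo 3.1": (20, 140),
    "Grok Imagine": (10, 60),
    "Veevid 1.0 Pro": (12, 288),
    "Seedance 1.5 Pro": (12, 288),
    "Kling 3.0": (48, 495),
    "LTX 2.3": (48, 960),
    "Kling 2.6": (70, 280),
    "Sora 2 Stable": (80, 2000),
    "Sora 2": (20, 315),
    "Wan 2.6": (100, 450),
}


def cheapest_model(models=MODEL_CREDITS):
    """Return the model with the lowest starting price (hypothetical helper)."""
    return min(models, key=lambda name: models[name][0])


def models_within(budget, models=MODEL_CREDITS):
    """List models whose starting price fits the budget, cheapest first."""
    affordable = (name for name, (low, _high) in models.items() if low <= budget)
    return sorted(affordable, key=lambda name: models[name][0])
```

With the table's numbers, `cheapest_model()` picks Grok Imagine, which matches the budget conversation earlier. The starting price is only a lower bound, so the exact cost still has to come from the quote step.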
Advanced: Multi-Shot and Storyboard
Kling 3.0 Multi-Shot
Kling 3.0 supports multi-shot sequences - multiple prompts with different durations in one video:
You: "Make a multi-shot product ad: first 3 seconds close-up of the phone, then 5 seconds of someone using it, then 2 seconds logo reveal"
The agent constructs the multi_prompt array and handles it for you.
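As an illustration of what such an array might look like, the Python sketch below turns the three shots from the example into a `multi_prompt` list. Only the name `multi_prompt` comes from this article; the `prompt` and `duration` keys are hypothetical, and applying Kling 3.0's 3-15s range to the total length is an assumption.

```python
def build_multi_prompt(shots, min_total=3, max_total=15):
    """Build a multi_prompt list from (text, seconds) pairs.

    The per-shot keys are hypothetical; check the API docs for the real schema.
    The 3-15s bounds come from the model table and are assumed to apply to the total.
    """
    multi_prompt = [{"prompt": text, "duration": seconds} for text, seconds in shots]
    total = sum(shot["duration"] for shot in multi_prompt)
    if not min_total <= total <= max_total:
        raise ValueError(f"total duration {total}s is outside {min_total}-{max_total}s")
    return multi_prompt


product_ad = build_multi_prompt([
    ("close-up of the phone", 3),
    ("someone using the phone", 5),
    ("logo reveal", 2),
])
```

Validating the total before sending the request lets the agent catch an over-long sequence without spending credits.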
Sora 2 Storyboard
Sora 2 Pro has a storyboard mode for multi-scene narratives:
You: "Create a storyboard video: Scene 1 - sunrise over mountains (5s), Scene 2 - hiker reaching the summit (5s), Scene 3 - panoramic view (5s)"
Start + End Frames (Kling 3.0)
Send two images for precise control:
You: "Use these two images as start and end frames for a 10-second transition" (attach two images)
How It Works
The Veevid skill teaches your agent this workflow:
1. Parse user intent - pick model + parameters
2. POST /api/quote - get exact credit cost
3. Confirm with user - "250 credits, proceed?"
4. POST /api/generate-video - start generation
5. Poll /api/video-generation/{id}/status - wait
6. Return video URL to user
Your agent reads the skill's SKILL.md and follows these steps. No coding needed.
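If you are curious what those six steps amount to, the following Python sketch walks through them. The endpoint paths come from the list above; the response fields (`credits`, `id`, `status`, `video_url`), the status values, and the injected `transport` callable are all assumptions, so treat this as a picture of the control flow rather than Veevid's actual API contract.

```python
import time

# Endpoint paths from the article; everything else below is hypothetical.
QUOTE_PATH = "/api/quote"
GENERATE_PATH = "/api/generate-video"
STATUS_PATH = "/api/video-generation/{id}/status"


def generate_video(request, transport, confirm,
                   poll_interval=5.0, max_polls=120, sleep=time.sleep):
    """Quote, confirm, generate, and poll until the video is ready.

    transport(method, path, payload) -> dict performs the HTTP call.
    confirm(credits) -> bool asks the user whether to spend the credits.
    Returns the video URL, or None if the user declines.
    """
    # Step 2: get the exact cost before spending anything.
    quote = transport("POST", QUOTE_PATH, request)
    credits = quote["credits"]  # assumed field name

    # Step 3: stop here unless the user approves the cost.
    if not confirm(credits):
        return None

    # Step 4: start generation.
    job = transport("POST", GENERATE_PATH, request)
    status_path = STATUS_PATH.format(id=job["id"])  # assumed field name

    # Step 5: poll until the job finishes, fails, or we give up.
    for _ in range(max_polls):
        status = transport("GET", status_path, {})
        if status["status"] == "completed":  # assumed status value
            return status["video_url"]  # Step 6
        if status["status"] == "failed":  # assumed status value
            raise RuntimeError("video generation failed")
        sleep(poll_interval)
    raise TimeoutError("video was not ready after polling")
```

Passing `transport` and `confirm` in as functions keeps the sketch independent of any HTTP library and makes the confirmation gate explicit: no generation request is sent unless `confirm` returns true.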
Tips
- Costs are confirmed first - The agent quotes before generating, so no surprises
- Be specific - "10-second vertical product video with Kling 3.0" works better than "make a video"
- Pick the right model - Veo 3.1 for quick and cheap, Kling 3.0 for quality, Sora 2 for long-form
- Check your balance - Ask "How many credits do I have?"
Get Started
- Install: npx clawhub@latest install veevid
- Get your API key: veevid.ai/settings/api-keys
- Save it: mkdir -p ~/.config/veevid && echo "vv_sk_xxx" > ~/.config/veevid/api_key
- Tell your agent: "Generate a video of a cat playing piano"
One skill, 10 models, no friction.
The Veevid skill is on ClawHub. For API docs, see veevid.ai/developers.