Summary
TL;DR: Generates high-quality, expressive speech from text using the Kokoro voice engine, an alternative to major cloud TTS providers.
Kokoro TTS is an OpenClaw skill that generates spoken audio from text using the local Kokoro TTS engine. Use it when the user asks to "say" something, requests a voice message, or wants text converted to speech.
Created by edkief, this skill has been downloaded more than 4,000 times on ClawHub. It installs with a single command, after which your AI agent can use it immediately.
Use cases
- Build voice interfaces where natural intonation matters for user experience
- Generate expressive narration for storytelling apps or interactive fiction
- Create voice samples for prototyping voice assistant features before committing to a provider
- Add spoken output to educational tools where clear, engaging speech improves learning
Installation
Run this command to install the skill on your OpenClaw agent:
npx clawhub@latest install kokoro-tts
Security scan
The skill is a straightforward local TTS wrapper, but it references an undeclared environment variable and will POST user text to whatever KOKORO_API_URL is set to. Review the configured endpoint and the skill's provenance before installing.
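The scan's main concern is where user text ends up. The sketch below is a hypothetical pre-install check and is not part of the skill. It resolves the endpoint the same way SKILL.md documents it: `KOKORO_API_URL` if set, otherwise `http://localhost:8880/v1/audio/speech`. It then reports whether the host is a loopback name. The function names `kokoro_endpoint` and `endpoint_is_local` are illustrative only. The host parsing is deliberately simple and does not handle IPv6 literals or `user@host` forms.

```bash
#!/usr/bin/env bash
# Hypothetical check (not shipped with kokoro-tts): where would text be POSTed?

# Mirrors the documented fallback: KOKORO_API_URL, else the localhost default.
kokoro_endpoint() {
  printf '%s\n' "${KOKORO_API_URL:-http://localhost:8880/v1/audio/speech}"
}

# Returns 0 if the URL's host is localhost or 127.0.0.1, 1 otherwise.
# Simplified parsing: no IPv6 literals, no user@host prefixes.
endpoint_is_local() {
  local host="$1"
  host="${host#*://}"   # drop the scheme, e.g. "http://"
  host="${host%%/*}"    # drop the path, e.g. "/v1/audio/speech"
  host="${host%:*}"     # drop a trailing ":port" if present
  case "$host" in
    localhost|127.0.0.1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn before any text could leave this machine.
url="$(kokoro_endpoint)"
endpoint_is_local "$url" || echo "warning: kokoro-tts would send text to $url" >&2
```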
SKILL.md
---
name: kokoro-tts
description: Generate spoken audio from text using the local Kokoro TTS engine. Use when the user asks to "say" something, requests a voice message, or wants text converted to speech.
---

# Kokoro TTS

This skill allows you to generate high-quality AI speech using a local or remote Kokoro-TTS instance.

## Configuration

The skill uses the `KOKORO_API_URL` environment variable to locate the API.

- **Default:** `http://localhost:8880/v1/audio/speech`
- **To Configure:** Add `KOKORO_API_URL=http://your-server:port/v1/audio/speech` to your `.env` file or environment.

## Usage

To generate speech, run the included Node.js script.

### Command

```bash
node skills/kokoro-tts/scripts/tts.js "<text>" [voice] [speed]
```

- **text**: The text to speak. Wrap in quotes.
- **voice**: (Optional) The voice ID. Defaults to `af_heart`.
- **speed**: (Optional) Speech speed (0.25 to 4.0). Defaults to `1.0`.

### Example

```bash
node skills/kokoro-tts/scripts/tts.js "Hello Ed, this is Theosaurus speaking." af_nova
```

### Output

The script will output a single line starting with `MEDIA:` followed by the path to the generated MP3 file. OpenClaw will automatically pick this up and send it as an audio attachment.

Example Output: `MEDIA: media/tts_1706745000000.mp3`

## Available Voices

Common choices:

- `af_heart` (Default, Female, Warm)
- `af_nova` (Female, Professional)
- `am_adam` (Male, Deep)
- `bf_alice` (British Female)

For a full list, see [references/voices.md](references/voices.md) or query the API.
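The command-line contract in SKILL.md is enough to script against. It takes positional `text`, `voice`, and `speed` arguments, accepts speeds from 0.25 to 4.0, and prints one `MEDIA:` line on stdout. Below is a minimal sketch, assuming bash, awk, and sed are available. `kokoro_say`, `check_speed`, and `media_path` are hypothetical helper names and are not part of the skill. The wrapper applies the documented defaults and rejects out-of-range speeds before calling `tts.js`. It also strips the `MEDIA:` prefix so the caller receives only the file path.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around skills/kokoro-tts/scripts/tts.js (not part of the skill).

# Succeeds if $1 is a plain decimal number within the documented range [0.25, 4.0].
check_speed() {
  awk -v s="$1" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s >= 0.25 && s <= 4.0) }'
}

# Reads the script's stdout and prints the path following the first "MEDIA:" prefix.
media_path() {
  sed -n 's/^MEDIA:[[:space:]]*//p' | head -n 1
}

# Usage: kokoro_say "<text>" [voice] [speed]; defaults match SKILL.md.
kokoro_say() {
  local text="$1" voice="${2:-af_heart}" speed="${3:-1.0}"
  if ! check_speed "$speed"; then
    echo "speed must be between 0.25 and 4.0, got: $speed" >&2
    return 2
  fi
  node skills/kokoro-tts/scripts/tts.js "$text" "$voice" "$speed" | media_path
}
```

For example, `path="$(kokoro_say "Hello Ed, this is Theosaurus speaking." af_nova)"` should leave a path such as `media/tts_1706745000000.mp3` in `$path`, provided the script behaves as SKILL.md describes. Because `media_path` is the last stage of the pipeline, a failure inside `tts.js` does not change the wrapper's return status unless `set -o pipefail` is enabled.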
Version history
kokoro-tts 0.1.0
- Initial release of the Kokoro TTS skill for generating speech audio from text.
- Supports configuration via the KOKORO_API_URL environment variable.
- Includes a Node.js script for generating audio files with customizable voice and speed.
- Outputs MP3 file paths in a format compatible with OpenClaw for automatic audio attachment.
- Provides several built-in voice options and instructions for listing all available voices.
Frequently asked questions
How does Kokoro TTS compare with OpenAI TTS?
Kokoro tends to produce more expressive, emotionally varied speech, while OpenAI TTS is very consistent and professional-sounding. The better choice depends on whether your use case calls for expressiveness or consistency.
Skill data sourced from ClawHub.