ElevenLabs Speech-to-Text
Summary
TL;DR: Transcribes audio files to text using ElevenLabs' speech recognition, with support for speaker detection and multiple languages.
ElevenLabs Speech-to-Text is an OpenClaw skill that transcribes audio files using ElevenLabs Speech-to-Text (Scribe v2).
Created by clawdbotborges, this skill has been downloaded 3k+ times on ClawHub. Install it with one command and your AI agent gains these capabilities right away.
Use cases
- Transcribe meeting recordings and extract action items from the text
- Convert podcast episodes to text for show notes, blog posts, or search indexing
- Process customer support calls into searchable transcripts for quality review
- Create subtitles or captions for video content from the audio track
Installation
Run this command to install the skill on your OpenClaw agent:
npx clawhub@latest install elevenlabs-stt
Security scan
The skill is internally consistent with its stated purpose (uploads audio to ElevenLabs Scribe v2 using ELEVENLABS_API_KEY); the only notable issue is a minor tooling omission (the script uses jq but the declared required binaries list only curl).
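The point about `jq` can be checked before first use. The snippet below is a minimal sketch and is not part of the skill: `check_deps` is a hypothetical helper that reports which of the named binaries are missing from `PATH`. It is written in portable shell so it runs under both `bash` and `sh`.

```bash
# Hypothetical preflight helper (not shipped with the skill).
# curl is declared in the skill metadata; jq is used by the script per the security scan.
check_deps() {
  missing=""
  for bin in "$@"; do
    # command -v succeeds only if the name resolves to something executable
    command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

check_deps curl jq || echo "install the missing tools before using elevenlabs-stt" >&2
```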
SKILL.md
---
name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["curl"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
---
# ElevenLabs Speech-to-Text
Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.
## Quick Start
```bash
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3
# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize
# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en
# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
```
## Options
| Flag | Description |
|------|-------------|
| `--diarize` | Identify different speakers |
| `--lang CODE` | ISO language code (e.g., en, pt, es) |
| `--json` | Output full JSON with word timestamps |
| `--events` | Tag audio events (laughter, music, etc.) |
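The table does not show how each flag reaches the API. The sketch below is one plausible mapping, not the skill's actual script. It assumes the script sends a multipart `POST` to ElevenLabs' `https://api.elevenlabs.io/v1/speech-to-text` endpoint, assumes the form field names `model_id`, `file`, `diarize`, `language_code`, and `tag_audio_events`, and assumes `scribe_v2` as the model ID; check all three against ElevenLabs' current API reference. `build_form_args` is a hypothetical helper that prints the `curl -F` arguments one per line.

```bash
# Hypothetical helper: translate transcribe.sh-style flags into curl -F form fields.
# Assumed: field names from ElevenLabs' speech-to-text API, and "scribe_v2" as the model ID.
build_form_args() {
  audio="$1"
  shift
  printf '%s\n' "-F" "model_id=scribe_v2" "-F" "file=@$audio"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --diarize) printf '%s\n' "-F" "diarize=true" ;;
      --events)  printf '%s\n' "-F" "tag_audio_events=true" ;;
      --lang)
        # --lang consumes the next argument as the ISO language code
        [ "$#" -ge 2 ] || { echo "--lang needs a code" >&2; return 2; }
        printf '%s\n' "-F" "language_code=$2"
        shift ;;
      --json)    : ;;  # output formatting only; no request field in this sketch
      *) echo "unknown option: $1" >&2; return 2 ;;
    esac
    shift
  done
}
```

Under those assumptions, `build_form_args meeting.mp3 --diarize --lang en` corresponds to a request such as `curl -s https://api.elevenlabs.io/v1/speech-to-text -H "xi-api-key: $ELEVENLABS_API_KEY" -F model_id=scribe_v2 -F file=@meeting.mp3 -F diarize=true -F language_code=en`. The sketch sends `tag_audio_events` only when `--events` is given, so without the flag the API's own default applies.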
## Supported Formats
All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.
## API Key
Set the `ELEVENLABS_API_KEY` environment variable, or configure the key in `clawdbot.json`:
```json5
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
```
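A script can fail fast when neither route has supplied a key. This sketch only inspects the environment variable. It assumes the `apiKey` from `clawdbot.json` is exported as `ELEVENLABS_API_KEY`; the `primaryEnv` field in the front matter suggests this, but the page does not say so. `require_api_key` is a hypothetical name.

```bash
# Hypothetical guard: stop early with a clear message when no key is available.
# Assumes a clawdbot.json apiKey, if configured, is exported as ELEVENLABS_API_KEY.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set; export it or set skills.entries.elevenlabs-stt.apiKey" >&2
    return 1
  fi
}

# ElevenLabs API requests carry the key in the xi-api-key header, e.g.:
# require_api_key && curl -s -H "xi-api-key: $ELEVENLABS_API_KEY" ...
```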
## Examples
```bash
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en
# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
```
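The same script can be looped over a folder of recordings. This is a sketch that assumes the script writes the transcript to stdout, as the `> transcript.json` example implies. `batch_transcribe` is a hypothetical helper: it takes the script path as its first argument, writes one `.txt` next to each input, skips inputs that already have one, and removes partial output when a run fails.

```bash
# Hypothetical batch helper: run a transcriber script over several audio files.
batch_transcribe() {
  script="$1"
  shift
  status=0
  for audio in "$@"; do
    # Replace the extension with .txt; append .txt if the file name has no extension
    case "${audio##*/}" in
      *.*) out="${audio%.*}.txt" ;;
      *)   out="$audio.txt" ;;
    esac
    if [ -e "$out" ]; then
      echo "skip: $out exists" >&2
      continue
    fi
    if "$script" "$audio" > "$out"; then
      echo "ok: $out" >&2
    else
      echo "failed: $audio" >&2
      rm -f "$out"   # do not leave a partial transcript behind
      status=1
    fi
  done
  return "$status"
}

# Usage (paths are illustrative):
# batch_transcribe "{baseDir}/scripts/transcribe.sh" ~/Recordings/*.m4a
```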
Version history
- Initial release of the elevenlabs-stt skill.
- Transcribes audio files using the ElevenLabs Scribe v2 model.
- Supports 90+ languages and speaker diarization.
- Multiple output options: plain text, JSON with timestamps, and audio event tagging.
- Works with all major audio and video formats.
- Requires ELEVENLABS_API_KEY for authentication.
Frequently asked questions
Accuracy depends on audio quality, accents, and background noise. For clear recordings, you can expect high accuracy. The skill handles accented English and multiple languages well, though very noisy environments reduce quality.
Skill data sourced from ClawHub