# OpenClaw Voice

Open-source, browser-based voice interface for AI assistants.

Talk to your AI like you talk to Alexa, but self-hosted, private, and connected to your own agent.

🌐 Website: openclawvoice.com

## Features

| Feature | Description |
|---|---|
| 🎤 **Local STT** | Whisper runs locally via faster-whisper. Your voice never leaves your machine. |
| 🔊 **Streaming TTS** | ElevenLabs with sentence-by-sentence streaming. Hear responses while they generate. |
| 🎯 **Voice Activity Detection** | Silero VAD filters background noise. Works in noisy environments. |
| 🧹 **Smart Text Cleaning** | Strips markdown, hashtags, and URLs before TTS. No more "hash hash". |
| 🔌 **Any AI Backend** | OpenAI, Claude, or a full OpenClaw agent with memory and tools. |
| 🌐 **Browser-Based** | No app install. Works on desktop and mobile. |
| 🚗 **Continuous Mode** | Hands-free conversation. Auto-listens after each response. |
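The text-cleaning step can be sketched roughly as follows. This is a hypothetical `clean_for_tts`, not the project's actual implementation; it just illustrates the kind of stripping involved:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup that sounds bad when spoken aloud (illustrative sketch)."""
    # Keep link text, drop the target: [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove bare URLs entirely
    text = re.sub(r"https?://\S+", "", text)
    # Drop heading hashes at line start (avoids "hash hash")
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # Drop standalone hashtags like #ai
    text = re.sub(r"(?<!\S)#\w+", "", text)
    # Remove emphasis and code markers
    text = re.sub(r"[*_`]+", "", text)
    # Collapse leftover whitespace
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_tts("## Hello **world**, see [docs](https://example.com)!"))
# -> Hello world, see docs!
```

The order matters: markdown links are rewritten before bare-URL removal, so the greedy URL pattern cannot swallow the closing bracket and surrounding text.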

## Quick Start

```bash
# Clone
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice

# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: install VAD support (better noise handling)
pip install torch torchaudio

# Configure (create .env from the example)
cp .env.example .env
# Edit .env with your API keys

# Run
PYTHONPATH=. python -m src.server.main

# Open http://localhost:8765
```

### For OpenClaw Agents (Auto-Install)

One-liner for AI agents to self-install:

```bash
cd /tmp && \
git clone https://github.com/Purple-Horizons/openclaw-voice.git && \
cd openclaw-voice && \
python3 -m venv .venv && source .venv/bin/activate && \
pip install -r requirements.txt torch torchaudio && \
PYTHONPATH=. ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" OPENAI_API_KEY="$OPENAI_API_KEY" \
  nohup python -m src.server.main > /tmp/voice-server.log 2>&1 &
```

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `ELEVENLABS_API_KEY` | Yes* | – | ElevenLabs API key for TTS |
| `OPENAI_API_KEY` | Yes* | – | OpenAI API key (if not using the gateway) |
| `OPENCLAW_GATEWAY_URL` | No | – | OpenClaw gateway URL for the full agent |
| `OPENCLAW_GATEWAY_TOKEN` | No | – | Gateway auth token |
| `OPENCLAW_PORT` | No | `8765` | Server port |
| `OPENCLAW_STT_MODEL` | No | `base` | Whisper model size |
| `OPENCLAW_STT_DEVICE` | No | `auto` | Device: `auto`, `cpu`, `cuda`, `mps` |
| `OPENCLAW_REQUIRE_AUTH` | No | `false` | Require API keys for clients |

*One of `OPENAI_API_KEY` or `OPENCLAW_GATEWAY_URL` is required.
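A minimal sketch of how these variables might be read at startup, including the either/or check. This is illustrative only; the server's actual config loader may differ, and `load_config` is a hypothetical name:

```python
import os

def load_config(env=None):
    """Read voice-server settings with the documented defaults (illustrative)."""
    env = dict(os.environ) if env is None else env
    cfg = {
        "port": int(env.get("OPENCLAW_PORT", "8765")),
        "stt_model": env.get("OPENCLAW_STT_MODEL", "base"),
        "stt_device": env.get("OPENCLAW_STT_DEVICE", "auto"),
        "require_auth": env.get("OPENCLAW_REQUIRE_AUTH", "false").lower() == "true",
        "elevenlabs_key": env.get("ELEVENLABS_API_KEY"),
        "openai_key": env.get("OPENAI_API_KEY"),
        "gateway_url": env.get("OPENCLAW_GATEWAY_URL"),
        "gateway_token": env.get("OPENCLAW_GATEWAY_TOKEN"),
    }
    # One of OPENAI_API_KEY or OPENCLAW_GATEWAY_URL must be set
    if not (cfg["openai_key"] or cfg["gateway_url"]):
        raise ValueError("Set OPENAI_API_KEY or OPENCLAW_GATEWAY_URL")
    return cfg

print(load_config({"OPENAI_API_KEY": "sk-test"})["port"])  # -> 8765
```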

### Whisper Model Sizes

| Model | Speed | Quality | VRAM | Best For |
|---|---|---|---|---|
| `tiny` | Fastest | Fair | ~400 MB | Quick testing |
| `base` | Fast | Good | ~1 GB | Default; good balance |
| `small` | Medium | Better | ~2 GB | Clearer transcription |
| `medium` | Slower | Great | ~5 GB | Accuracy priority |
| `large-v3-turbo` | Slow | Best | ~6 GB | Maximum accuracy |

### TTS Options

| Backend | Type | Quality | Latency | Notes |
|---|---|---|---|---|
| ElevenLabs | Cloud | Excellent | ~500 ms | Default; streaming supported |
| Chatterbox | Local | Very good | ~1 s | MIT license, voice cloning |
| XTTS-v2 | Local | Excellent | ~1 s | Voice cloning supported |
| Mock | Local | None | 0 ms | For testing (silence) |

ElevenLabs uses `eleven_turbo_v2_5` for the fastest response.

## OpenClaw Gateway Integration

Connect to your full OpenClaw agent (same memory, tools, and persona as text chat):

```bash
# .env
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_GATEWAY_TOKEN=your-token
ELEVENLABS_API_KEY=your-key
```

Add to your `openclaw.json`:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```

## Architecture

```
┌─────────────┐   WebSocket   ┌─────────────────────────────────────┐
│   Browser   │◄─────────────►│          Voice Server               │
│  (mic/spk)  │               │                                     │
└─────────────┘               │  ┌─────────┐  ┌─────┐  ┌──────────┐ │
                              │  │ Whisper │─►│ AI  │─►│ElevenLabs│ │
                              │  │  (STT)  │  │     │  │  (TTS)   │ │
                              │  └─────────┘  └─────┘  └──────────┘ │
                              │       ↑                      │      │
                              │     [VAD]              [streaming]  │
                              └─────────────────────────────────────┘
```

**Streaming flow:**

1. User speaks → Whisper transcribes locally
2. AI responds (streamed) → sentences are buffered
3. First sentence completes → TTS starts immediately
4. Audio streams to the browser while the AI continues
5. Result: ~50% faster perceived response
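The sentence buffering in step 2 can be sketched as a small generator. This is illustrative, not the server's actual code; a real implementation would also handle abbreviations and decimals:

```python
import re

# A sentence ends at ., !, or ? followed by whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(chunks):
    """Yield complete sentences as soon as they close, holding back the rest."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        parts = SENTENCE_END.split(buf)
        # Everything but the last split piece is a finished sentence
        for sentence in parts[:-1]:
            yield sentence
        buf = parts[-1]
    # Flush whatever remains when the stream ends
    if buf.strip():
        yield buf.strip()

stream = ["Hello the", "re. How are", " you? I am fi", "ne."]
print(list(sentences_from_stream(stream)))
# -> ['Hello there.', 'How are you?', 'I am fine.']
```

Each yielded sentence can be handed to TTS immediately, which is what lets audio start before the model has finished generating.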

## HTTPS for Mobile

Mobile browsers require HTTPS for microphone access. Options:

**Tailscale Funnel** (easiest):

```bash
tailscale funnel 8765
# Access via https://your-machine.tailnet-name.ts.net
```

**nginx + Let's Encrypt:**

```nginx
server {
    listen 443 ssl;
    server_name voice.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8765;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

## API

### WebSocket Protocol

Connect to `ws://localhost:8765/ws`:

```javascript
// Start recording
{ "type": "start_listening" }

// Send audio (base64 PCM float32, 16 kHz)
{ "type": "audio", "data": "base64..." }

// Stop recording
{ "type": "stop_listening" }

// Receive events:
{ "type": "transcript", "text": "...", "final": true }
{ "type": "response_chunk", "text": "..." }                     // streaming text
{ "type": "audio_chunk", "data": "...", "sample_rate": 24000 }  // streaming audio
{ "type": "response_complete", "text": "..." }                  // full response
{ "type": "vad_status", "speech_detected": true }               // VAD feedback
```
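The browser client normally builds these messages in JavaScript; as a rough Python sketch of the `audio` message encoding (the little-endian byte order is an assumption, since the protocol above only specifies float32 at 16 kHz):

```python
import base64
import json
import struct

def audio_message(samples):
    """Pack float32 samples (16 kHz mono) into the protocol's `audio` message.

    Byte order is assumed little-endian; check it against your client code.
    """
    pcm = struct.pack(f"<{len(samples)}f", *samples)
    return json.dumps({"type": "audio", "data": base64.b64encode(pcm).decode("ascii")})

# 10 ms of silence at 16 kHz -> 160 samples, 640 bytes of PCM before encoding
msg = audio_message([0.0] * 160)
print(json.loads(msg)["type"])  # -> audio
```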

## Roadmap

- WebSocket voice gateway
- Whisper STT (local)
- ElevenLabs TTS
- Streaming TTS (sentence-by-sentence)
- Voice Activity Detection (Silero)
- Text cleaning (markdown/hashtags/URLs)
- Continuous conversation mode
- OpenClaw gateway integration
- WebRTC for lower latency
- Voice cloning UI
- Docker support

## License

MIT License. See LICENSE.

## Credits

Made with 🦞 by Purple Horizons
