# OpenClaw Voice

Open-source, browser-based voice interface for AI assistants.

Talk to your AI like you talk to Alexa, but self-hosted, private, and connected to your own agent.

🌐 Website: openclawvoice.com

## Features

| Feature | Description |
|---|---|
| 🎤 **Local STT** | Whisper runs locally via faster-whisper. Your voice never leaves your machine. |
| 🔊 **Streaming TTS** | ElevenLabs with sentence-by-sentence streaming. Hear responses while they generate. |
| 🎯 **Voice Activity Detection** | Silero VAD filters background noise. Works in noisy environments. |
| 🧹 **Smart Text Cleaning** | Strips markdown, hashtags, and URLs before TTS. No more "hash hash". |
| 🔌 **Any AI Backend** | OpenAI, Claude, or a full OpenClaw agent with memory and tools. |
| 🌐 **Browser-Based** | No app install. Works on desktop and mobile. |
| 🚗 **Continuous Mode** | Hands-free conversation. Auto-listens after each response. |
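The text-cleaning step can be sketched roughly as follows. This is a hypothetical `clean_for_tts`, not the project's actual implementation; it just illustrates the kind of stripping involved:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip markup that sounds bad when spoken aloud (illustrative sketch)."""
    # Keep link text, drop the target: [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove bare URLs entirely
    text = re.sub(r"https?://\S+", "", text)
    # Drop heading hashes at line start (avoids "hash hash")
    text = re.sub(r"^#{1,6}\s*", "", text, flags=re.MULTILINE)
    # Drop standalone hashtags like #ai
    text = re.sub(r"(?<!\S)#\w+", "", text)
    # Remove emphasis and code markers
    text = re.sub(r"[*_`]+", "", text)
    # Collapse leftover whitespace
    return re.sub(r"\s+", " ", text).strip()

print(clean_for_tts("## Hello **world**, see [docs](https://example.com)!"))
# -> Hello world, see docs!
```

The order matters: markdown links are rewritten before bare-URL removal, so the greedy URL pattern cannot swallow the closing bracket and surrounding text.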

## Quick Start

```bash
# Clone
git clone https://github.com/Purple-Horizons/openclaw-voice.git
cd openclaw-voice

# Install
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: install VAD support (better noise handling)
pip install torch torchaudio

# Configure (create .env from the example)
cp .env.example .env
# Edit .env with your API keys

# Run
PYTHONPATH=. python -m src.server.main

# Open http://localhost:8765
```

### For OpenClaw Agents (Auto-Install)

One-liner for AI agents to self-install:

```bash
cd /tmp && \
git clone https://github.com/Purple-Horizons/openclaw-voice.git && \
cd openclaw-voice && \
python3 -m venv .venv && source .venv/bin/activate && \
pip install -r requirements.txt torch torchaudio && \
PYTHONPATH=. ELEVENLABS_API_KEY="$ELEVENLABS_API_KEY" OPENAI_API_KEY="$OPENAI_API_KEY" \
  nohup python -m src.server.main > /tmp/voice-server.log 2>&1 &
```

## Configuration

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `ELEVENLABS_API_KEY` | Yes* | – | ElevenLabs API key for TTS |
| `OPENAI_API_KEY` | Yes* | – | OpenAI API key (if not using the gateway) |
| `OPENCLAW_GATEWAY_URL` | No | – | OpenClaw gateway URL for the full agent |
| `OPENCLAW_GATEWAY_TOKEN` | No | – | Gateway auth token |
| `OPENCLAW_PORT` | No | `8765` | Server port |
| `OPENCLAW_STT_MODEL` | No | `base` | Whisper model size |
| `OPENCLAW_STT_DEVICE` | No | `auto` | Device: `auto`, `cpu`, `cuda`, `mps` |
| `OPENCLAW_REQUIRE_AUTH` | No | `false` | Require API keys for clients |

*One of `OPENAI_API_KEY` or `OPENCLAW_GATEWAY_URL` is required.
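A minimal sketch of how these variables might be read at startup, including the either/or check. This is illustrative only; the server's actual config loader may differ, and `load_config` is a hypothetical name:

```python
import os

def load_config(env=None):
    """Read voice-server settings with the documented defaults (illustrative)."""
    env = dict(os.environ) if env is None else env
    cfg = {
        "port": int(env.get("OPENCLAW_PORT", "8765")),
        "stt_model": env.get("OPENCLAW_STT_MODEL", "base"),
        "stt_device": env.get("OPENCLAW_STT_DEVICE", "auto"),
        "require_auth": env.get("OPENCLAW_REQUIRE_AUTH", "false").lower() == "true",
        "elevenlabs_key": env.get("ELEVENLABS_API_KEY"),
        "openai_key": env.get("OPENAI_API_KEY"),
        "gateway_url": env.get("OPENCLAW_GATEWAY_URL"),
        "gateway_token": env.get("OPENCLAW_GATEWAY_TOKEN"),
    }
    # One of OPENAI_API_KEY or OPENCLAW_GATEWAY_URL must be set
    if not (cfg["openai_key"] or cfg["gateway_url"]):
        raise ValueError("Set OPENAI_API_KEY or OPENCLAW_GATEWAY_URL")
    return cfg

print(load_config({"OPENAI_API_KEY": "sk-test"})["port"])  # -> 8765
```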

### Whisper Model Sizes

| Model | Speed | Quality | VRAM | Best For |
|---|---|---|---|---|
| `tiny` | Fastest | Fair | ~400 MB | Quick testing |
| `base` | Fast | Good | ~1 GB | Default; good balance |
| `small` | Medium | Better | ~2 GB | Clearer transcription |
| `medium` | Slower | Great | ~5 GB | Accuracy priority |
| `large-v3-turbo` | Slow | Best | ~6 GB | Maximum accuracy |

### TTS Options

| Backend | Type | Quality | Latency | Notes |
|---|---|---|---|---|
| ElevenLabs | Cloud | Excellent | ~500 ms | Default; streaming supported |
| Chatterbox | Local | Very good | ~1 s | MIT license, voice cloning |
| XTTS-v2 | Local | Excellent | ~1 s | Voice cloning supported |
| Mock | Local | None | 0 ms | For testing (silence) |

ElevenLabs uses `eleven_turbo_v2_5` for the fastest response.

## OpenClaw Gateway Integration

Connect to your full OpenClaw agent (same memory, tools, and persona as text chat):

```bash
# .env
OPENCLAW_GATEWAY_URL=http://localhost:18789
OPENCLAW_GATEWAY_TOKEN=your-token
ELEVENLABS_API_KEY=your-key
```

Add to your `openclaw.json`:

```json
{
  "gateway": {
    "http": {
      "endpoints": {
        "chatCompletions": { "enabled": true }
      }
    }
  },
  "agents": {
    "list": [
      {
        "id": "voice",
        "workspace": "/path/to/workspace",
        "model": "anthropic/claude-sonnet-4-5"
      }
    ]
  }
}
```

## Architecture

```
┌─────────────┐   WebSocket   ┌─────────────────────────────────────┐
│   Browser   │◄─────────────►│          Voice Server               │
│  (mic/spk)  │               │                                     │
└─────────────┘               │  ┌─────────┐  ┌─────┐  ┌──────────┐ │
                              │  │ Whisper │─►│ AI  │─►│ElevenLabs│ │
                              │  │  (STT)  │  │     │  │  (TTS)   │ │
                              │  └─────────┘  └─────┘  └──────────┘ │
                              │       ↑                      │      │
                              │     [VAD]              [streaming]  │
                              └─────────────────────────────────────┘
```

**Streaming flow:**

1. User speaks → Whisper transcribes locally
2. AI responds (streamed) → sentences are buffered
3. First sentence completes → TTS starts immediately
4. Audio streams to the browser while the AI continues
5. Result: ~50% faster perceived response
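The sentence buffering in step 2 can be sketched as a small generator. This is illustrative, not the server's actual code; a real implementation would also handle abbreviations and decimals:

```python
import re

# A sentence ends at ., !, or ? followed by whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(chunks):
    """Yield complete sentences as soon as they close, holding back the rest."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        parts = SENTENCE_END.split(buf)
        # Everything but the last split piece is a finished sentence
        for sentence in parts[:-1]:
            yield sentence
        buf = parts[-1]
    # Flush whatever remains when the stream ends
    if buf.strip():
        yield buf.strip()

stream = ["Hello the", "re. How are", " you? I am fi", "ne."]
print(list(sentences_from_stream(stream)))
# -> ['Hello there.', 'How are you?', 'I am fine.']
```

Each yielded sentence can be handed to TTS immediately, which is what lets audio start before the model has finished generating.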

## HTTPS for Mobile

Mobile browsers require HTTPS for microphone access. Options:

**Tailscale Funnel** (easiest):

```bash
tailscale funnel 8765
# Access via https://your-machine.tailnet-name.ts.net
```

**nginx + Let's Encrypt:**

```nginx
server {
    listen 443 ssl;
    server_name voice.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8765;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

## API

### WebSocket Protocol

Connect to `ws://localhost:8765/ws`:

```javascript
// Start recording
{ "type": "start_listening" }

// Send audio (base64 PCM float32, 16 kHz)
{ "type": "audio", "data": "base64..." }

// Stop recording
{ "type": "stop_listening" }

// Receive events:
{ "type": "transcript", "text": "...", "final": true }
{ "type": "response_chunk", "text": "..." }                     // streaming text
{ "type": "audio_chunk", "data": "...", "sample_rate": 24000 }  // streaming audio
{ "type": "response_complete", "text": "..." }                  // full response
{ "type": "vad_status", "speech_detected": true }               // VAD feedback
```
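The browser client normally builds these messages in JavaScript; as a rough Python sketch of the `audio` message encoding (the little-endian byte order is an assumption, since the protocol above only specifies float32 at 16 kHz):

```python
import base64
import json
import struct

def audio_message(samples):
    """Pack float32 samples (16 kHz mono) into the protocol's `audio` message.

    Byte order is assumed little-endian; check it against your client code.
    """
    pcm = struct.pack(f"<{len(samples)}f", *samples)
    return json.dumps({"type": "audio", "data": base64.b64encode(pcm).decode("ascii")})

# 10 ms of silence at 16 kHz -> 160 samples, 640 bytes of PCM before encoding
msg = audio_message([0.0] * 160)
print(json.loads(msg)["type"])  # -> audio
```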

## Roadmap

- WebSocket voice gateway
- Whisper STT (local)
- ElevenLabs TTS
- Streaming TTS (sentence-by-sentence)
- Voice Activity Detection (Silero)
- Text cleaning (markdown/hashtags/URLs)
- Continuous conversation mode
- OpenClaw gateway integration
- WebRTC for lower latency
- Voice cloning UI
- Docker support

## License

MIT License. See LICENSE.

## Credits

Made with 🦞 by Purple Horizons
