One-click local PDF/TXT/DOCX/EPUB to audiobook studio for Windows + RTX 3050 4 GB laptops.
The goal of v4 is simple: create a default reference voice once, then drop in a PDF and press Start / Resume Audiobook. The app chooses safe defaults automatically:
- Smart PDF text extraction with OCR fallback per page.
- OCR cache, so OCR is not repeated when you resume.
- Safe XTTS chunks, kept below the 250-character length at which XTTS warns for English text.
- Resumable chunk generation. If Windows sleeps or the GPU resets, run the same job again.
- Failed chunk skip/retry so one bad line does not destroy a 1000-page job.
- FLAC part export instead of one giant file.
- Natural room-tone gaps, ACX-style mastering target, and 44.1 kHz MP3 delivery support.
- Gradio one-click interface.
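The chunking rule in the list above (chunks kept below the 250-character English warning) can be illustrated with a small sketch. This is not the app's actual splitter; it is one plausible approach, assuming greedy sentence packing with a 220-character budget. The function names `split_into_chunks` and `_wrap_long` are hypothetical.

```python
import re


def _wrap_long(sentence: str, max_chars: int) -> list[str]:
    """Break one over-long sentence at word boundaries (hypothetical helper)."""
    if len(sentence) <= max_chars:
        return [sentence]
    pieces: list[str] = []
    current = ""
    for word in sentence.split(" "):
        # A single word longer than the limit is cut into fixed-size slices.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def split_into_chunks(text: str, max_chars: int = 220) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters."""
    # Collapse whitespace so PDF line breaks do not inflate chunk lengths.
    text = " ".join(text.split())
    if not text:
        return []
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        for piece in _wrap_long(sentence, max_chars):
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries rather than at a fixed offset keeps each chunk a natural spoken unit, which is why the budget (220 here) sits a little under the 250-character warning.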
cd D:\download\voicebook_studio_pro_v4_ideal\voicebook_studio_pro
& "D:\download\elevenlabs_style_tts_lab\elevenlabs_style_tts_lab\.venv\Scripts\Activate.ps1"
python -m pip install -e .
voicebook doctor
voicebook ui
Open the URL shown by Gradio, normally:
http://127.0.0.1:7860
If you already have a reference at data/ui_reference.wav, you can skip this step. Otherwise create one:
voicebook reference-slice "D:\download\my_voice.mp3" --out data\ui_reference.wav --start 20 --duration 15
Then test:
voicebook speak --text "This is a studio narration test." --out outputs\test.wav --emotion studio --master
To convert a book in one step:
voicebook oneclick "D:\download\book.pdf"
This uses the saved reference voice automatically and writes output to:
outputs/<book_name>_audiobook/
voicebook audiobook "D:\download\book.pdf" --out outputs\book_audiobook --emotion studio --format flac --ocr auto --chunk-chars 220 --part-segments 80 --fail-policy skip
Use the defaults. The job is resumable. If stopped, run the same command again with the same output folder.
Files created:
outputs/book_audiobook/chunks/chunk_000001.wav
outputs/book_audiobook/parts/part_0001.flac
outputs/book_audiobook/state.json
outputs/book_audiobook/manifest.json
outputs/book_audiobook/ocr_cache/
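To show how a resumable job with a skip policy might use the files listed above, here is a minimal sketch. It is not the app's implementation: the `state.json` schema (`done` and `failed` lists), the `run_job` function, and the `synthesize` callback are all assumptions made for illustration. Only the `chunks/chunk_000001.wav` naming and the `state.json` filename come from this README.

```python
import json
from pathlib import Path
from typing import Callable


def _save_state(state_path: Path, done: set, failed: set) -> None:
    # Write to a temporary file first so a crash cannot leave half-written JSON.
    tmp = state_path.with_suffix(".tmp")
    payload = {"done": sorted(done), "failed": sorted(failed)}
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(state_path)


def run_job(
    chunks: list[str],
    out_dir: Path,
    synthesize: Callable[[str, Path], None],
    fail_policy: str = "skip",
) -> dict:
    """Generate only the missing chunks and record progress after each one."""
    out_dir = Path(out_dir)
    chunk_dir = out_dir / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    state_path = out_dir / "state.json"

    # Hypothetical schema: the real state.json layout is not documented here.
    done: set = set()
    if state_path.exists():
        done = set(json.loads(state_path.read_text(encoding="utf-8"))["done"])
    failed: set = set()

    for index, text in enumerate(chunks, start=1):
        if index in done:
            continue  # finished in an earlier run
        wav_path = chunk_dir / f"chunk_{index:06d}.wav"
        try:
            synthesize(text, wav_path)
            done.add(index)
        except Exception:
            if fail_policy != "skip":
                raise
            failed.add(index)  # not marked done, so the next run retries it
        _save_state(state_path, done, failed)

    return {"done": sorted(done), "failed": sorted(failed)}
```

Because a failed chunk is never added to `done`, running the same job again against the same output folder retries only the failures and anything not yet generated.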
Readable PDFs do not need OCR. Scanned/image PDFs need Tesseract OCR installed.
Try:
winget install -e --id UB-Mannheim.TesseractOCR
If WinGet gives 403, install Tesseract manually from the UB Mannheim Windows build page, then restart PowerShell/VS Code.
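The per-page OCR fallback and the OCR cache could work roughly as in the sketch below. This is an illustration under stated assumptions, not the app's code: the `page_text` function, the `min_chars` threshold of 20, the `page_00001.txt` cache file naming, and the `ocr_page` callback (which would wrap Tesseract in practice) are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def page_text(
    page_number: int,
    extracted_text: str,
    ocr_page: Callable[[int], str],
    cache_dir: Path,
    min_chars: int = 20,
) -> str:
    """Use the PDF text layer when it looks usable, otherwise cached or fresh OCR."""
    # A readable page already has enough text, so OCR is skipped entirely.
    if len(extracted_text.strip()) >= min_chars:
        return extracted_text

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"page_{page_number:05d}.txt"

    # A resumed job finds the earlier OCR result here and does not repeat the work.
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")

    text = ocr_page(page_number)
    cache_file.write_text(text, encoding="utf-8")
    return text
```

Deciding per page rather than per document means a mostly readable PDF with a few scanned inserts only pays the OCR cost for those pages.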
Use studio for the default audiobook style. It is intentionally faster-paced and drags less than the older story and warm presets.
Recommended outputs:
- flac: best practical audiobook master.
- wav: editing/training master.
- mp3: delivery/share version, 44.1 kHz mono 192 kbps CBR.
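For reference, the MP3 delivery settings above (44.1 kHz, mono, 192 kbps CBR) map onto encoder options as in the sketch below. This README does not say which encoder the app uses; the sketch assumes an `ffmpeg` binary on PATH that was built with `libmp3lame`, and the function name `mp3_delivery_command` is hypothetical.

```python
def mp3_delivery_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg argument list for a 44.1 kHz mono 192 kbps MP3."""
    return [
        "ffmpeg",
        "-i", src,             # input master, for example a FLAC part
        "-ac", "1",            # one audio channel (mono)
        "-ar", "44100",        # 44.1 kHz sample rate
        "-c:a", "libmp3lame",  # LAME MP3 encoder
        "-b:a", "192k",        # fixed bitrate; libmp3lame encodes CBR unless VBR/ABR is requested
        dst,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a single string avoids quoting problems with Windows paths that contain spaces.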
Only clone your own voice or voices where you have permission. Check model and book licenses before publishing or selling generated audiobooks.