VoiceBook Studio Pro v4

A one-click, local PDF/TXT/DOCX/EPUB-to-audiobook studio for Windows laptops with an NVIDIA RTX 3050 (4 GB VRAM).

The goal of v4 is simple: after you create a default reference voice once, you can drop in a PDF and press Start / Resume Audiobook. The app chooses safe defaults automatically:

Smart PDF text extraction with OCR fallback per page.
OCR cache, so OCR is not repeated when you resume.
Safe XTTS chunk sizes, kept below XTTS's 250-character warning limit for English.
Resumable chunk generation. If Windows sleeps or the GPU resets, run the same job again.
Skip/retry for failed chunks, so one bad line does not ruin a 1000-page job.
FLAC part export instead of one giant file.
Natural room-tone gaps, ACX-style mastering target, and 44.1 kHz MP3 delivery support.
Gradio one-click interface.
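The chunk-size rule can be pictured as greedy sentence packing. The sketch below is a minimal illustration, not the app's actual splitter: split_into_chunks and _split_words are hypothetical helpers, and the 220-character default simply mirrors the --chunk-chars 220 value used in the full manual command, which leaves headroom under the 250-character limit.

```python
import re

# Hypothetical splitter; the real one in src/voicebook_studio may differ.
# A sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_chunks(text: str, max_chars: int = 220) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    chunks: list[str] = []
    current = ""
    normalized = " ".join(text.split())  # collapse newlines and runs of spaces
    for sentence in SENTENCE_END.split(normalized):
        if not sentence:
            continue
        # An over-long sentence falls back to splitting between words.
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Break a long sentence between words (a single huge word stays whole)."""
    parts: list[str] = []
    current = ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts
```

Packing whole sentences keeps prosody natural at chunk boundaries; the word-level fallback only kicks in when a single sentence is longer than the limit.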

Quick start using your existing working venv

cd D:\download\voicebook_studio_pro_v4_ideal\voicebook_studio_pro
& "D:\download\elevenlabs_style_tts_lab\elevenlabs_style_tts_lab\.venv\Scripts\Activate.ps1"
python -m pip install -e .
voicebook doctor
voicebook ui

Open the URL shown by Gradio, normally:

http://127.0.0.1:7860

First-time voice setup

If you already have a reference at data/ui_reference.wav, you can skip this. Otherwise create one:

voicebook reference-slice "D:\download\my_voice.mp3" --out data\ui_reference.wav --start 20 --duration 15
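Conceptually, --start 20 --duration 15 keeps seconds 20 to 35 of the source recording. A minimal sketch of that cut using Python's standard wave module, assuming WAV input; slice_wav is a hypothetical helper, and unlike the real command it cannot read MP3 files.

```python
import wave


def slice_wav(src, dst, start_s: float, duration_s: float) -> float:
    """Copy duration_s seconds starting at start_s from src WAV to dst WAV.

    src and dst may be file paths or binary file objects (wave accepts both).
    Returns the duration actually written, which is shorter if src ends early.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        start_frame = min(int(start_s * rate), reader.getnframes())
        reader.setpos(start_frame)
        frames = reader.readframes(int(duration_s * rate))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # header frame count is corrected on close
        writer.writeframes(frames)
    bytes_per_frame = params.sampwidth * params.nchannels
    return len(frames) / bytes_per_frame / rate
```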

Then test:

voicebook speak --text "This is a studio narration test." --out outputs\test.wav --emotion studio --master

One-click CLI mode

voicebook oneclick "D:\download\book.pdf"

This uses the saved reference voice automatically and writes output to:

outputs/<book_name>_audiobook/

Full manual command

voicebook audiobook "D:\download\book.pdf" --out outputs\book_audiobook --emotion studio --format flac --ocr auto --chunk-chars 220 --part-segments 80 --fail-policy skip
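How --part-segments and --fail-policy skip might interact can be sketched as follows, assuming --part-segments is the number of chunks per part file. ChunkResult, plan_parts and part_filename are hypothetical names; the naming pattern copies the part_0001.flac files listed under "Files created". Only skip appears in this README, so any other policy value is treated here as "abort" purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChunkResult:
    """Hypothetical record of one synthesized chunk (not the app's real type)."""

    index: int  # 1-based chunk number, as in chunk_000001.wav
    ok: bool  # False if synthesis failed after retries


def plan_parts(results: list[ChunkResult], part_segments: int = 80,
               fail_policy: str = "skip") -> list[list[int]]:
    """Group successful chunk indices into parts of at most part_segments."""
    usable: list[int] = []
    for result in results:
        if result.ok:
            usable.append(result.index)
        elif fail_policy != "skip":
            # Illustrative "abort" behaviour for any policy other than skip.
            raise RuntimeError(f"chunk {result.index} failed")
        # With "skip", a failed chunk is simply left out of the parts.
    return [usable[i:i + part_segments] for i in range(0, len(usable), part_segments)]


def part_filename(part_no: int) -> str:
    """Map a 1-based part number to a name like part_0001.flac."""
    return f"part_{part_no:04d}.flac"
```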

For huge PDFs

Use the defaults. Jobs are resumable: if a run stops, rerun the same command with the same output folder and generation continues from where it left off.

Files created:

outputs/book_audiobook/chunks/chunk_000001.wav
outputs/book_audiobook/parts/part_0001.flac
outputs/book_audiobook/state.json
outputs/book_audiobook/manifest.json
outputs/book_audiobook/ocr_cache/
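Resuming from state.json can be sketched like this. The JSON layout ({"done": [...]}) is an assumption, not the app's documented format; pending_chunks and mark_done are hypothetical helpers that follow the chunks/chunk_000001.wav naming shown above.

```python
import json
from pathlib import Path


def _load_done(out_dir: Path) -> set[int]:
    """Read the set of finished chunk indices (assumed {"done": [...]} layout)."""
    state_path = out_dir / "state.json"
    if not state_path.exists():
        return set()
    return set(json.loads(state_path.read_text(encoding="utf-8")).get("done", []))


def pending_chunks(out_dir: Path, total_chunks: int) -> list[int]:
    """1-based chunk indices that still need synthesis.

    A chunk counts as done only if it is recorded in state.json AND its WAV exists.
    """
    done = _load_done(out_dir)
    return [
        i for i in range(1, total_chunks + 1)
        if i not in done or not (out_dir / "chunks" / f"chunk_{i:06d}.wav").exists()
    ]


def mark_done(out_dir: Path, index: int) -> None:
    """Record a finished chunk by writing a temp file, then renaming it over state.json."""
    done = _load_done(out_dir)
    done.add(index)
    tmp = out_dir / "state.json.tmp"
    tmp.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
    tmp.replace(out_dir / "state.json")
```

Requiring both the state entry and the WAV file means a crash between writing audio and updating state.json only costs a re-render of that one chunk, and the temp-file rename avoids leaving a half-written state.json behind.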

OCR

PDFs with a selectable text layer do not need OCR. Scanned or image-only PDFs need Tesseract OCR installed.

Try:

winget install -e --id UB-Mannheim.TesseractOCR

If WinGet returns a 403 error, install Tesseract manually from the UB Mannheim Windows build page, then restart PowerShell/VS Code.
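Per-page OCR fallback with a cache can be sketched as follows. The 25-character threshold, the cache file naming, and the helper names are assumptions; run_ocr stands in for whatever actually calls Tesseract (for example pytesseract.image_to_string on a rendered page image).

```python
import hashlib
from pathlib import Path
from typing import Callable


def needs_ocr(extracted: str, min_chars: int = 25) -> bool:
    """Assumed heuristic: fewer than min_chars letters/digits means a scanned page."""
    return sum(ch.isalnum() for ch in extracted) < min_chars


def document_id(pdf_bytes: bytes) -> str:
    """Content hash, so a renamed copy of the same PDF reuses its cache."""
    return hashlib.sha256(pdf_bytes).hexdigest()[:16]


def page_text(doc_id: str, page_no: int, extracted: str,
              run_ocr: Callable[[int], str], cache_dir: Path) -> str:
    """Return the text layer if usable, otherwise cached or fresh OCR output."""
    if not needs_ocr(extracted):
        return extracted
    cache_file = cache_dir / f"{doc_id}_p{page_no:05d}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")  # resumed run: no re-OCR
    text = run_ocr(page_no)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text
```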

Quality target

Use the studio emotion preset for the default audiobook style. It is intentionally faster-paced and less drawn-out than the older story/warm presets.

Recommended outputs:

flac: best practical audiobook master.
wav: editing/training master.
mp3: delivery/share version, 44.1 kHz mono 192 kbps CBR.
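To sanity-check a master against the ACX-style target, a rough level meter is enough. ACX's published requirements are RMS between -23 dB and -18 dB, peaks no higher than -3 dB, and a noise floor no higher than -60 dB RMS. The sketch below, with hypothetical names acx_levels and to_dbfs, uses a plain unweighted RMS over all samples, which is a simplification of how dedicated ACX checkers measure it.

```python
import math
from typing import Sequence

# ACX submission targets, applied here to float samples where 1.0 = full scale.
ACX_RMS_RANGE_DB = (-23.0, -18.0)
ACX_MAX_PEAK_DB = -3.0


def to_dbfs(value: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return -math.inf if value <= 0 else 20 * math.log10(value)


def acx_levels(samples: Sequence[float]) -> dict:
    """Measure unweighted RMS and peak of mono float samples in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    rms_db, peak_db = to_dbfs(rms), to_dbfs(peak)
    low, high = ACX_RMS_RANGE_DB
    return {
        "rms_db": rms_db,
        "peak_db": peak_db,
        "rms_ok": low <= rms_db <= high,
        "peak_ok": peak_db <= ACX_MAX_PEAK_DB,
    }
```

The noise-floor requirement can be checked by running the same RMS calculation on a room-tone gap alone.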

Safety/license note

Only clone your own voice or voices where you have permission. Check model and book licenses before publishing or selling generated audiobooks.
