SamirYMeshram/audiobook-generator-v2

VoiceBook Studio Pro v4

One-click local studio that turns PDF, TXT, DOCX, and EPUB files into audiobooks, aimed at Windows laptops with an RTX 3050 4 GB GPU.

The goal of v4 is simple: after you create a default reference voice one time, you can drop in a PDF and press Start / Resume Audiobook. The app automatically chooses safe defaults:

  • Smart PDF text extraction with OCR fallback per page.
  • OCR cache, so OCR is not repeated when you resume.
  • Safe XTTS chunks, kept below the 250-character length at which XTTS warns for English text.
  • Resumable chunk generation. If Windows sleeps or the GPU resets, run the same job again.
  • Skip or retry for failed chunks, so one bad line does not destroy a 1000-page job.
  • FLAC part export instead of one giant file.
  • Natural room-tone gaps, ACX-style mastering target, and 44.1 kHz MP3 delivery support.
  • Gradio one-click interface.
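
The chunking rule in the list above can be illustrated with a short sketch. This is not the project's actual splitter; it is a minimal example that assumes a 220-character budget (the value passed as --chunk-chars in the full manual command) and prefers sentence boundaries. The names split_into_chunks and _wrap_words are hypothetical.

```python
import re

MAX_CHARS = 220  # assumed budget, taken from --chunk-chars 220 in the manual command


def _wrap_words(sentence: str, max_chars: int) -> list[str]:
    """Greedy word wrap; a single word longer than max_chars is hard-cut."""
    lines: list[str] = []
    current = ""
    for word in sentence.split():
        while len(word) > max_chars:
            if current:
                lines.append(current)
                current = ""
            lines.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Collapse whitespace, then split after sentence-ending punctuation.
    normalized = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", normalized)

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # An over-long sentence is broken at word boundaries first.
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _wrap_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences into each chunk keeps pauses natural, and the hard cut only applies to pathological input such as a long URL with no spaces.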

Quick start using your existing working venv

cd D:\download\voicebook_studio_pro_v4_ideal\voicebook_studio_pro
& "D:\download\elevenlabs_style_tts_lab\elevenlabs_style_tts_lab\.venv\Scripts\Activate.ps1"
python -m pip install -e .
voicebook doctor
voicebook ui

Open the URL shown by Gradio, normally:

http://127.0.0.1:7860

First-time voice setup

If you already have a reference at data/ui_reference.wav, you can skip this. Otherwise create one:

voicebook reference-slice "D:\download\my_voice.mp3" --out data\ui_reference.wav --start 20 --duration 15

Then test:

voicebook speak --text "This is a studio narration test." --out outputs\test.wav --emotion studio --master

One-click CLI mode

voicebook oneclick "D:\download\book.pdf"

This uses the saved reference voice automatically and writes output to:

outputs/<book_name>_audiobook/

Full manual command

voicebook audiobook "D:\download\book.pdf" --out outputs\book_audiobook --emotion studio --format flac --ocr auto --chunk-chars 220 --part-segments 80 --fail-policy skip
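
As a rough illustration of how the --part-segments value could map chunks to exported parts, here is a minimal sketch. It assumes that --part-segments means "chunks per part" and that parts are numbered from 1 as in part_0001.flac; both are assumptions, and group_into_parts and part_filename are hypothetical helper names, not the tool's API.

```python
def group_into_parts(chunk_ids: list[int], part_segments: int = 80) -> list[list[int]]:
    """Group consecutive chunk ids into parts of at most part_segments chunks."""
    if part_segments < 1:
        raise ValueError("part_segments must be at least 1")
    return [
        chunk_ids[start:start + part_segments]
        for start in range(0, len(chunk_ids), part_segments)
    ]


def part_filename(part_index: int) -> str:
    """Build a part file name in the part_0001.flac style (1-based index)."""
    return f"part_{part_index:04d}.flac"
```

Under these assumptions, a book that yields 200 chunks would be exported as three parts: two of 80 chunks and one of 40.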

For huge PDFs

Use the defaults. The job is resumable: if it stops, run the same command again with the same output folder.

Files created:

outputs/book_audiobook/chunks/chunk_000001.wav
outputs/book_audiobook/parts/part_0001.flac
outputs/book_audiobook/state.json
outputs/book_audiobook/manifest.json
outputs/book_audiobook/ocr_cache/
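
To show how a resumable job with a skip policy can work in principle, here is a minimal sketch built around the file layout above. The state.json schema used here (the "done" and "failed" lists), the run_resumable function, and the synthesize callable are all hypothetical; the real tool may store its state differently.

```python
import json
import os
from pathlib import Path
from typing import Callable


def _save_state(state_path: Path, state: dict) -> None:
    """Write state via a temp file so a crash cannot leave a half-written file."""
    tmp_path = state_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(state), encoding="utf-8")
    os.replace(tmp_path, state_path)


def run_resumable(
    chunks: list[str],
    out_dir: str,
    synthesize: Callable[[str, Path], None],
    fail_policy: str = "skip",
) -> dict:
    """Synthesize only chunks not yet marked done; record failures of this run."""
    root = Path(out_dir)
    chunk_dir = root / "chunks"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    state_path = root / "state.json"

    done: set[int] = set()
    if state_path.exists():
        done = set(json.loads(state_path.read_text(encoding="utf-8"))["done"])
    failed: set[int] = set()  # previously failed chunks are retried on each run

    for index, text in enumerate(chunks, start=1):
        if index in done:
            continue  # finished in an earlier run
        wav_path = chunk_dir / f"chunk_{index:06d}.wav"
        try:
            synthesize(text, wav_path)
            done.add(index)
        except Exception:
            if fail_policy != "skip":
                raise
            failed.add(index)
        # Persist after every chunk so an interruption loses at most one chunk.
        _save_state(state_path, {"done": sorted(done), "failed": sorted(failed)})

    return {"done": sorted(done), "failed": sorted(failed)}
```

Because a chunk is recorded as done only after its synthesis call returns, rerunning with the same output folder picks up at the first unfinished chunk and retries anything that failed earlier.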

OCR

Readable PDFs do not need OCR. Scanned/image PDFs need Tesseract OCR installed.

Try:

winget install -e --id UB-Mannheim.TesseractOCR

If WinGet returns a 403 error, install Tesseract manually from the UB Mannheim Windows build page, then restart PowerShell or VS Code so the updated PATH is picked up.
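
The per-page OCR fallback and OCR cache could look roughly like the sketch below. It is an illustration only: the 20-character threshold, the page_000001.txt cache file naming, and the extract_text and run_ocr callables are assumptions, standing in for whatever PDF and Tesseract wrappers the project actually uses.

```python
from pathlib import Path
from typing import Callable


def page_text_with_ocr_fallback(
    page_number: int,
    extract_text: Callable[[int], str],
    run_ocr: Callable[[int], str],
    cache_dir: str,
    min_chars: int = 20,
) -> str:
    """Return embedded page text, falling back to cached or fresh OCR."""
    text = extract_text(page_number).strip()
    if len(text) >= min_chars:
        return text  # readable page: no OCR needed

    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)
    cache_file = cache_path / f"page_{page_number:06d}.txt"
    if cache_file.exists():
        # OCR result from an earlier run: reuse it instead of repeating OCR.
        return cache_file.read_text(encoding="utf-8")

    ocr_text = run_ocr(page_number).strip()
    cache_file.write_text(ocr_text, encoding="utf-8")
    return ocr_text
```

Deciding page by page means a mostly readable PDF with a few scanned inserts only pays the OCR cost for those pages.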

Quality target

Use studio for the default audiobook style. It is intentionally faster-paced and less draggy than the old story/warm presets.

Recommended outputs:

  • flac: best practical audiobook master.
  • wav: editing/training master.
  • mp3: delivery/share version, 44.1 kHz mono 192 kbps CBR.
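
For readers who want to produce the MP3 delivery format from a FLAC part by hand, the sketch below builds an ffmpeg command with the settings listed above. It assumes ffmpeg with the libmp3lame encoder is installed and on PATH; how the app itself performs the export is not documented here, and mp3_delivery_command is a hypothetical helper name.

```python
def mp3_delivery_command(source: str, target: str) -> list[str]:
    """Build an ffmpeg argument list for 44.1 kHz mono 192 kbps CBR MP3."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the target if it already exists
        "-i", source,
        "-ac", "1",            # mono
        "-ar", "44100",        # 44.1 kHz sample rate
        "-c:a", "libmp3lame",
        "-b:a", "192k",        # fixed bitrate; libmp3lame encodes CBR when only -b:a is set
        target,
    ]
```

The list can be passed to subprocess.run(command, check=True); keeping it as a list rather than a single string avoids quoting problems with Windows paths that contain spaces.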

Safety/license note

Only clone your own voice or voices you have permission to use. Check model and book licenses before publishing or selling generated audiobooks.

About

New version: fast and relevant, with an interface.
