An author friend's excitement about using AI to create an Audible audiobook of his book quickly turned into a reality check after evaluating the workflow, quality issues, and platform requirements.
Despite the hype around AI narration, creating a professional-quality, human-like 5-hour audiobook suitable for platforms like Audible is still a demanding process with current AI capabilities.
"The reality is you're still better off paying a pro to record it or doing it yourself," Dunn concluded. "Because you're going to get the quality you need for a four- or five-hour performance."
We talked about using Descript and ElevenLabs to clone his voice from his hundreds of videos and turn the book into an audiobook for Audible.
The promise says it's that easy, but within 30 minutes of evaluation the conclusion was clear:
It's easier to record the book than to have AI do it. In this case, AI doubles the time it takes. GAME OVER!
Maybe audio and video jobs are an early growth sector because it takes time to get audio that is good enough for Audible.
We’ll cover what to do, what not to do, and what it means for AI audio in the short and long terms.
1. PROMISE: How to create an audiobook
2. REALITY: The facts of AI. Even with cloning, you must record a set amount of audio, preferably more.
3. Why can't you get on Audible with an AI audiobook yet?
AI narration is accessible to almost anyone, making audiobooks available to more people beyond the high-quality Audible tier.
As AI improves, it may eventually master human narration's authenticity, nuance, and listener experience. But the "promise" of effortless AI audiobook creation remains mostly hype.
Outline:
I. The Initial Idea and Promise
- Friend's excitement about AI narration to complement his book/YouTube
- Evaluating workflow using tools like Descript, ElevenLabs voice cloning
- Promise of easy, automated audiobook creation for Audible
II. Realities of Human vs AI Narration
- Human narrator costs ($250 min, $1250-2000 for 5 hrs)
- Time involved (10+ hrs), even for an experienced narrator
- AI limitations with tone, emotion, and pronunciation for long-form
"A voice actor would spend 20 hours to do ten final finished hours."
III. Attempting AI Narration Workflow
- Recording snippets for voice cloning training data
- Using AI mistake detection tools like Pozotron
- Editing/post-processing required even with AI
"So the key differences between AI-generated and human-narrated audiobooks are that, number one, the authenticity and nuance are off the charts better with humans."
IV. Audible's Requirements and Future
- Sample submission, chapter-by-chapter formatting rules
- Audible working on "virtual voice" AI narration tag
- But human narration is preferred for quality for now
"Many listeners prefer the natural human quality of a professional narrator's performance to synthetic sound."
V. The Verdict
- "The reality is you're still better off paying a pro to record it or doing it yourself."
- AI has promise for accessibility but not pro quality yet
PROMISE – How to Create an Audiobook
The starting point: a 176-page book is already on Amazon, and the goal is to create an Audible audiobook by cloning the author's voice from existing audio.
For a professional narrator, a 176-page book of 44,000 words (240,000 characters) yields 4 to 5 hours of finished audio, and recording and editing take 8 to 15 hours.
- ACX, Amazon's platform, estimates that 9,300 words per hour is typical for professional audiobook narration.
Applying these estimates to the 44,000-word book:
- At 9,300 words per hour, a 44,000-word book would take around 4.7 hours to narrate.
- Accounting for the 2x time required in the studio, the total time to record this book would be 9-10 hours.
- The finished audiobook would then be approximately 4-5 finished hours long. Narrator rates are quoted PFH, meaning per finished hour.
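The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the 9,300 words per hour and the 2x studio multiplier come from the estimates quoted here, and the function and constant names are made up for illustration.

```python
WORDS_PER_FINISHED_HOUR = 9_300  # ACX estimate quoted above
STUDIO_MULTIPLIER = 2            # studio time per finished hour, per the text


def estimate_hours(word_count):
    """Return (finished_hours, studio_hours) for a manuscript of word_count words."""
    finished = word_count / WORDS_PER_FINISHED_HOUR
    studio = finished * STUDIO_MULTIPLIER
    return finished, studio


finished, studio = estimate_hours(44_000)
print(f"Finished audio: {finished:.1f} h, studio time: {studio:.1f} h")
# prints: Finished audio: 4.7 h, studio time: 9.5 h
```

A first-time narrator should expect the studio multiplier to be higher than 2.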
For a professional single-voice nonfiction audiobook recording, expect to pay $150-$400 PFH, with $250 PFH being a standard rate for experienced narrators and the ACX union minimum.
- Generally, a voice actor spends about 20 hours in the studio recording a typical 10-hour audiobook.
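To see what those per-finished-hour rates mean for a 4 to 5 hour book, here is a small sketch. It assumes narration is billed purely per finished hour at the rates quoted above; the function name is illustrative, and real quotes may add other fees.

```python
def narration_cost(finished_hours, rate_per_finished_hour):
    """Cost of narration billed per finished hour (PFH)."""
    return finished_hours * rate_per_finished_hour


# Rates quoted above: low end, standard, high end (USD per finished hour).
for rate in (150, 250, 400):
    low = narration_cost(4, rate)   # 4 finished hours
    high = narration_cost(5, rate)  # 5 finished hours
    print(f"${rate}/PFH: ${low:,.0f} to ${high:,.0f}")
```

At the $250 standard rate, a 5-hour audiobook comes to $1,250 under these assumptions.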
Instead of spending $750-$2500 for professional audio, the AI audio hype machine says:
Upload your audio, train Descript or ElevenLabs or whomever, and get it done.
HOW TO CLONE A VOICE
https://elevenlabs.io/docs/voices/voice-lab/professional-voice-cloning
We have hundreds of YouTube videos with good audio, but they are recorded outdoors.
You have to tell the AI what to do, and giving it audio in different styles may confuse it.
Even if the audio was good enough for Audible, the styles vary from video to video.
ElevenLabs suggests the same: a consistent style and one or more hours of training data will produce better results.
We're doing a narration, and to get the voice, pacing, cadence, and tone we want out of the book, we need to record my friend reading his book.
He'll take twice as long since he's never done it, and it takes a pro 10 hours.
We could download YouTube videos (all 2-17 minutes), stitch them together, and fix the audio if the voice is consistent.
So we tested a short 2-minute YouTube video. The results could be better. The answer is clear: video audio doesn't always work for an audiobook.
The fix: record him reading the book.
This project involves about 4.5 hours of audio.
Instant Voice works from 30 seconds to 5 minutes of audio. This is what most people think of with AI audio, and it's great for shorter pieces.
TRAINING THE AI
30-60 minutes of training audio is the minimum for audiobooks, with 3 hours preferred (that's 3 hours of content; recording usually takes twice as long as the content runs).
·       To train your voice, you need to be in a quiet environment. Think of it as working through the book, not just stepping up to the microphone and grinding through 10-20 hours of audio (assuming a lack of experience).
·       Bonus tip: Run your script through an AI voice checker to identify phrases that will be challenging for AI to pronounce.
·       Go through each challenging word, spell it out phonetically so anyone can say it, and even record it in the cloned voice.
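As a rough illustration of the bonus tip, the sketch below flags words that are likely to trip up a synthetic voice. It is a crude stand-in and not how any particular checker works: the heuristics (acronyms, mixed-case brand names, anything containing digits) and the function name are assumptions made for this example.

```python
import re


def flag_tricky_words(text):
    """Return words that may need a phonetic respelling, in order of first appearance."""
    flagged = []
    for word in re.findall(r"[A-Za-z0-9][A-Za-z0-9'\-]*", text):
        if word in flagged:
            continue
        has_digit = any(ch.isdigit() for ch in word)
        is_acronym = len(word) >= 2 and word.isupper()
        # Capital letters after the first character, e.g. brand names.
        mixed_case = any(ch.isupper() for ch in word[1:]) and not word.isupper()
        if has_digit or is_acronym or mixed_case:
            flagged.append(word)
    return flagged


sample = "Upload the WAV to ElevenLabs, then check ACX rules and Pozotron notes for chapter 12."
print(flag_tricky_words(sample))
# prints: ['WAV', 'ElevenLabs', 'ACX', '12']
```

Each flagged word can then be respelled by hand (for example, "ACX" as "A C X") before the script goes to the AI voice. Note that an ordinary-looking name such as Pozotron slips through, so a human read-through is still needed.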
Pozotron is an AI-powered software suite aiming to streamline and improve audiobook production.
Words: just because a word is clear in writing doesn't mean AI can say it. You'll find that tons of words aren't easy (yet) for AI.
AI improves bad audio. Still, there's only so much it can do.
3. Why can't you get on Audible with an AI audiobook yet?
Audible does allow AI-generated audiobooks, but they are labeled as "Virtual Voice" and are in a separate category from human-narrated audiobooks.
And there's a waiting list, though some famous authors are in the beta.
Some try to fool Audible by doing a live 5-minute recording and hoping Audible doesn't check that the longer piece is AI-narrated.
You can also use other audiobook production services like Findaway Voices or Audiobooks.com to produce and distribute your audiobook.
https://help.acx.com/s/article/acx-audio-submission-requirements
The Human Difference - Emotion and performance
The differences between AI-generated audiobooks and human-narrated audiobooks are:
1.    Authenticity and Nuance: Human narrators bring authenticity, emotion, and nuance to the performance that AI narration lacks. Skilled human narrators convey the written work's meaning, tone, and character in ways AI cannot yet match.
2.    Listener Experience: Many listeners prefer the natural, human quality of a professional narrator's performance over the more synthetic sound of AI narration. AI narration is improving and may become more accepted over time.
3.    Availability: AI narration makes audiobook production affordable, allowing more written content to be converted to audio, especially in languages and markets that lack a robust audiobook ecosystem.
Dunn discusses the practical advantages of AI, particularly for short audio pieces, highlighting the efficiency and versatility of editing tools like Pozotron that detect and correct errors with human review.
The Superiority of Human Narration for Long-Form Audiobooks:
Despite the advantages of AI, Dunn makes a case for the superiority of human narration in longer audiobooks.
"There's a warmth and nuance in human narration that AI can't replicate, especially in longer stories where emotional depth and character development are key."
This section of the podcast emphasizes the emotional connection that human narrators create, crucial for engaging listeners and making the narrative more relatable and impactful.
Outline with Key Elements and Details:
1.    Introduction of AI in Audiobooks:
·        Key Elements: Discovery of AI potentials, initial enthusiasm.
·        Details: Declan recounts the pivotal Monday morning meeting with an author friend.
"I remember that Monday morning so distinctly."
2.    Advantages of AI in Audiobook Production:
·        Key Elements: Efficiency, editing tools.
·        Details: Use of Pozotron for error detection and correction, benefits for short audio pieces.
"Using Pozotron allows us to focus more on the creative aspects rather than getting bogged down by the technicalities."
3.    Human Narration for Emotional Depth:
·        Key Elements: Emotional engagement, character development.
·        Details: Importance of human touch in longer audiobooks.
"There's a warmth and nuance in human narration that AI currently can't replicate, especially in longer stories where emotional depth and character development are key."
This breakdown offers a concise roadmap into the intersection of technology and traditional storytelling within the audiobook industry, highlighting the innovations and limitations of current AI apps.