Generate a Screenplay Table Read using AI Models

Leon Nicholls
10 min read · Dec 5, 2024


Hey everyone! Are you ready to take your screenplay from the page to the screen? In the last post, we showed you how to whip up a screenplay using the power of Google Gemini 1.5 Pro. Get ready to bring that script to life with an AI-powered audio table read with generated voices and visuals. It’s like movie magic but with a whole lot less Hollywood and a whole lot more AI!

Why bother with a table read, you ask? Even though we’re working with cutting-edge AI, a good old-fashioned table read is still valuable. It’s like a test drive for your screenplay. You get to hear how the dialogue flows, identify awkward bits, and get a feel for the overall pacing. Plus, it’s cool to hear your characters come to life, even if they are AI-generated voices!

So, what’s the game plan? This post will walk you through creating an audio-visual table read experience. We’ll use some seriously excellent AI tools to manage the whole shebang. By the end, you’ll have a video of your screenplay with AI-generated voices playing over AI-generated images. Pretty neat.

Note: This article spotlights techniques for the Google Gemini Advanced chatbot (a paid service). While these concepts also apply to the free version, we’ll focus on the enhanced capabilities offered by the Advanced subscription.

Building Your AI Production Studio

Before proceeding to the fun stuff, like generating images and voices, we need to ensure you have the right tools for the job.

Gear Up!

  • Hardware: First things first, you're going to need some serious processing power. I tackled this project using an Nvidia RTX 4090 GPU. Why so beefy? Well, these AI models are pretty demanding! If you've got a different GPU, you might still be able to make it work, but be prepared for longer processing times. Also, remember that these models take up a good chunk of storage space, so make sure you have enough room on your hard drive.
  • Software: Next, I’ll use ComfyUI, a fantastic tool for creating custom workflows for your AI models. Head to their GitHub page and install it on your local machine. I’ll also be doing some automation magic with code written in Node.js.

Model Setup

  • Downloading the Stars: Now for the leading players! Download and configure the FLUX.1 model for image generation and the F5-TTS model for text-to-speech. These models are the engines of our AI production, so treat them with care!
  • Workflows: ComfyUI uses workflows with various nodes connected to generate content. Many of these are available online. I used the simple_ComfyUI_F5TTS_workflow to generate audio and the flux_dev_checkpoint_example workflow to generate images.

Note: I won’t go into the details of installing or configuring ComfyUI; many good online resources cover that. Although ComfyUI requires some technical knowledge, it is very easy to use once installed.

A Quick Note About Costs:

I chose to run these models locally on my machine to keep costs down while experimenting. However, some excellent online services offer even better models, often with faster processing times. Consider those options if you're serious about AI content creation, especially once commercial use and copyright implications come into play. But for now, let's stick with our local setup and keep those creative juices flowing!

Scene Visualization

Now that we’ve set up our AI studio, it’s time to start bringing those screenplay scenes to life! But hold on — we’re not creating full-blown movie scenes here. Instead, we will focus on generating background images that capture the mood and atmosphere of each scene. Think of it like setting the stage for your AI actors to perform on.

Why just backgrounds? For this table read, we want the focus to be on the voices and the dialogue. Adding character images can get complicated (and we'll tackle that in a future post!). Let's keep things simple and let the backgrounds do the visual heavy lifting. Plus, it's a great way to experiment with FLUX.1 and see what amazing images you can create!

Gemini as Your Prompt Engineer

How do we generate these images? Enter Google Gemini, our trusty AI assistant! We will use Gemini to craft detailed prompts that FLUX.1 can use to create stunning visuals. Think of Gemini as the director, guiding FLUX.1, the artist, to paint the perfect picture.

Here is the prompt I used:

You are tasked with generating image prompts for a screenplay. Here is the screenplay in Fountain Markdown:

```

[SCREENPLAY]

```

Here is the list of scene headings:

```

[SCENE HEADINGS]

```

Here are some notes on using Flux for image generation:

```

# Flux-Realism prompts

To get the most out of Flux image generation, you need to write your prompts in English and structure them clearly and precisely. Here are 7 points to keep in mind and try to include in your prompts to develop detailed and effective prompts:

1. Main topic: Describe the physical characteristics and attire of the main subject of your photo. For example: middle-aged man with salt-and-pepper hair and a neatly trimmed beard.

2. Action/Pose: Specify the subject’s activity, action and/or posture. For example: sitting at a rustic wooden table in a sunlit café.

3. Context/Location: Detail the environment around your subject. For example: The café backdrop features exposed brick walls, hanging plants, and a chalkboard menu.

4. Specific details: Refine your description with distinctive elements. For example: wearing a crisp white shirt and navy blazer / The café backdrop features exposed brick walls, hanging plants, and a chalkboard menu.

5. Ambiance: Define the overall atmosphere. For example: The overall atmosphere is one of urban sophistication and quiet contemplation.

6. Theme: Explain the context or main activity. For example: His posture and focused expression convey a sense of calm concentration during his morning routine.

7. Style: Indicate a visual or artistic reference. For example: in the style of amateur photography.

```

For each scene heading, generate a Flux image prompt for that scene. Use all the information you know about the scene, the location and the story up to that point to provide a detailed image description. Avoid repeating scene descriptions. Avoid using any character names in the description. Avoid including any people or animals in the scene. The image will be used as a background image. Avoid referencing other scene prompts or assuming any knowledge about the prompts of other scenes, even for the same location.

Note how I used the instructions at the end of the prompt to steer the model away from various problems uncovered during testing.

Once you get the hang of prompt engineering, you can automate the process using Node.js. Imagine Gemini churning out FLUX.1 prompts for every scene in your screenplay and feeding them straight into the ComfyUI image workflow.

(If you are curious about what the code does, here is a Python sample for generating images. Generating audio is similar, and the code can be easily converted to other languages like Node.js with the help of your favorite LLM).
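To make that concrete, here's a minimal Node.js sketch of the queuing step. It assumes ComfyUI is running locally on its default port (8188) and that you've exported the FLUX.1 workflow with ComfyUI's "Save (API Format)" option; the file names and the node id for the prompt text ("6" here) come from my setup, so check your own workflow JSON before running it.

```
// Minimal sketch: queue one FLUX.1 image job in ComfyUI per scene prompt.
// Assumptions: ComfyUI on 127.0.0.1:8188, flux_workflow.json exported in
// API format, scene_prompts.json holding one Gemini prompt per scene, and
// node id "6" as the positive-prompt text node (yours may differ).
const fs = require('fs/promises');
const crypto = require('crypto');

async function queueScene(workflowTemplate, scenePrompt, sceneNumber) {
  const workflow = structuredClone(workflowTemplate);
  workflow['6'].inputs.text = scenePrompt; // inject the generated prompt

  const res = await fetch('http://127.0.0.1:8188/prompt', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: workflow, client_id: crypto.randomUUID() }),
  });
  const { prompt_id } = await res.json();
  console.log(`Scene ${sceneNumber} queued as job ${prompt_id}`);
}

async function main() {
  const template = JSON.parse(await fs.readFile('flux_workflow.json', 'utf8'));
  const prompts = JSON.parse(await fs.readFile('scene_prompts.json', 'utf8'));
  for (const [i, prompt] of prompts.entries()) {
    await queueScene(template, prompt, i + 1); // ComfyUI works through its queue in order
  }
}

main().catch(console.error);
```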

Voice Casting Call

We’ve visualized our scenes — now it’s time to give those characters a voice! This is where things get interesting. We will use the magic of AI to transform plain text dialogue into spoken words, creating a truly immersive table-read experience.

But first, we need to assemble our cast! There’s no need to call up any agents or hold auditions — we will build our cast from the ground up using AI.

Finding the Perfect Voice

This is where Google Gemini steps in again, this time as casting director. We'll use Gemini to create detailed character bios for each role in your screenplay. Think age, gender, personality traits, even accents! The more information you give Gemini, the better it can help you find the perfect voice.
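Here's a sketch of the kind of casting prompt I mean; adapt the fields to whatever matters for your story:

```
You are a casting director for a screenplay table read. Here is the screenplay in Fountain Markdown:

[SCREENPLAY]

For each speaking character, write a short voice bio: age range, gender, personality traits, accent, and distinctive vocal qualities such as pace, pitch, and energy. Keep each bio to a few sentences.
```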

Next, we’ll dive into the Common Voice dataset. This treasure trove of voice recordings is our source for realistic character voices. Browse through the dataset, listen to different samples, and find voices that match your Gemini-generated character bios. It’s like having a massive library of voice actors at your fingertips!

From Text to Speech with F5-TTS

Once your voice actors (voice samples) are lined up, it’s time to use F5-TTS. This powerful text-to-speech model will use your chosen voice samples to generate dialogue for each character.

Efficiency is Key!

We'll use Node.js to batch-process the dialogue, so you don't have to generate each line individually. This will save you a lot of time, especially with a long screenplay.
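Here's a minimal sketch of that batching step. The input files (dialogue.json, voices.json), the workflow export name, and the F5-TTS node ids are all placeholders from my setup; your workflow JSON will use different ids, so inspect it before wiring anything up.

```
// Minimal batching sketch, assumptions flagged in the comments below.
// dialogue.json: [{ "character": "JANE", "text": "...", "index": 1 }, ...]
// parsed from the Fountain screenplay; voices.json maps character names to
// Common Voice sample files. Node ids "14" (text), "17" (reference audio),
// and "20" (save node) come from my export and will differ in yours.
const fs = require('fs/promises');
const crypto = require('crypto');

async function queueLine(template, { sample, text, outputName }) {
  const workflow = structuredClone(template);
  workflow['17'].inputs.audio = sample;               // reference voice sample
  workflow['14'].inputs.text = text;                  // dialogue line to speak
  workflow['20'].inputs.filename_prefix = outputName; // output file name

  const res = await fetch('http://127.0.0.1:8188/prompt', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: workflow, client_id: crypto.randomUUID() }),
  });
  return (await res.json()).prompt_id;
}

async function main() {
  const template = JSON.parse(await fs.readFile('f5tts_workflow.json', 'utf8'));
  const lines = JSON.parse(await fs.readFile('dialogue.json', 'utf8'));
  const voices = JSON.parse(await fs.readFile('voices.json', 'utf8'));

  for (const line of lines) {
    const sample = voices[line.character];
    if (!sample) throw new Error(`No voice cast for ${line.character}`);
    // Zero-padded names keep the clips in screenplay order for assembly.
    await queueLine(template, {
      sample,
      text: line.text,
      outputName: `line_${String(line.index).padStart(4, '0')}`,
    });
  }
}

main().catch(console.error);
```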

And there you have it! You’ve officially cast your AI voice actors and generated their lines. Pretty cool. In the next section, we’ll polish those voices and prepare them for their big debut.

Audio Production

Okay, so we’ve got our scenes and voices, but let’s be honest — sometimes, AI voices can sound rough around the edges. Don’t worry, though! We won’t let a little robotic twang ruin our little experiment. It’s time to wear our audio engineer hats and polish those voices until they shine!

Audacity to the Rescue!

Audacity is our weapon of choice for this audio cleanup mission. It’s a free, open-source audio editor that’s packed with features. If you haven’t already, download it and get ready to work some magic!

Noise Reduction Ninja

First things first, let's tackle those pesky background noises. Remember those voice samples from the Common Voice dataset? Well, not all of them were recorded in professional studios. You might hear some background chatter, hums, or even the occasional dog bark! Audacity has some excellent noise-reduction tools that can help minimize those distractions.

Leveling Up

Next, let's make sure our voices are balanced. Some characters' samples might be naturally louder than others, and we want every line to sit at a consistent level. Audacity's normalization tools will help you even out those audio levels so everyone has their moment in the spotlight.

AI Voices: Limitations and Workarounds

Now, let’s be honest — AI voices aren’t perfect (yet!). One of the biggest challenges is emotional range. You might find that your AI actors sound a bit monotone, even in emotionally charged scenes.

While we can’t magically give our AI voices the full spectrum of human emotion, there are some workarounds. Try experimenting with different voice samples or even layering multiple takes to create a sense of variety.

Ideally, the voice-generation model would support a markup language such as SSML for controlling delivery, but for now, we'll have to live with the limitations of the free model.
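For context, here's the kind of delivery control standard SSML offers in TTS systems that do support it (F5-TTS currently doesn't); pacing, pauses, and emphasis markup would go a long way toward fixing that monotone:

```
<speak>
  <prosody rate="slow" pitch="-2st">I can't believe you're leaving.</prosody>
  <break time="500ms"/>
  <emphasis level="strong">Don't go.</emphasis>
</speak>
```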

Assembling the Table Read

Alright, folks, we’re in the home stretch! We’ve got our scenes, our voices, and our polished audio. It’s time to combine everything and create our AI-powered table read masterpiece! Think of this as the editing room, where all the pieces form the final product.

Automation is Your Best Friend

Remember our friend Node.js? It's back to help us automate this final step. We'll build a Node.js pipeline to orchestrate all the assembly tasks. This includes things like:

  • Organizing Assets: Making sure all our images and audio files are in the right place.
  • Setting the Stage: Matching each audio file with its corresponding image.
  • Timing is Everything: Calculating each scene's duration so the audio and visuals stay perfectly synced.

This automation will save you time and headaches, especially if you have a long screenplay with many scenes.
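As an example of the timing step, here's a small Node.js helper that measures an audio clip's length with ffprobe (which ships with ffmpeg), so the pipeline knows how long each scene runs. It assumes ffprobe is on your PATH; the file naming is my own convention.

```
// Measure an audio file's duration in seconds via ffprobe.
const { execFile } = require('child_process');
const { promisify } = require('util');
const run = promisify(execFile);

async function audioDuration(file) {
  const { stdout } = await run('ffprobe', [
    '-v', 'error',
    '-show_entries', 'format=duration',
    '-of', 'default=noprint_wrappers=1:nokey=1',
    file,
  ]);
  return parseFloat(stdout); // e.g. 12.43
}

// Usage: const seconds = await audioDuration('scene_0001.wav');
```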

Note: This article won't walk through the full pipeline code, primarily because much of the automation logic can be generated with the help of an LLM, even if you don't have coding expertise.

Lights, Camera, ffmpeg!

Now for the final act: using ffmpeg to bring it all together! ffmpeg is a powerful command-line tool for video editing and processing. We’ll use it to combine our audio and images into a seamless video, ready for the world to see (or rather, hear!).
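Here's a sketch of that final assembly in Node.js, under the same file-naming assumptions as before: render one still-image clip per scene, then stitch the clips together with ffmpeg's concat demuxer.

```
// Combine each background image with its scene audio, then concatenate.
const { execFile } = require('child_process');
const { promisify } = require('util');
const fs = require('fs/promises');
const run = promisify(execFile);

// Render one still-image video clip per scene.
async function renderScene(image, audio, outFile) {
  await run('ffmpeg', [
    '-y', '-loop', '1', '-i', image, '-i', audio,
    '-c:v', 'libx264', '-tune', 'stillimage', '-pix_fmt', 'yuv420p',
    '-c:a', 'aac', '-shortest', // end the clip when the audio ends
    outFile,
  ]);
}

// Stitch the per-scene clips together without re-encoding them.
async function concatScenes(clipFiles, outFile) {
  const list = clipFiles.map((f) => `file '${f}'`).join('\n');
  await fs.writeFile('scenes.txt', list);
  await run('ffmpeg', ['-y', '-f', 'concat', '-safe', '0',
    '-i', 'scenes.txt', '-c', 'copy', outFile]);
}
```

The -shortest flag ends each clip when its audio track runs out, which is what keeps the visuals and dialogue in sync without any manual timing math.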

Post-Production Reflections and Refinements

Okay, take a deep breath. You've done it! You've successfully navigated the wild world of AI and emerged with an audio-visual table read ready to share with the world (or at least your friends and family). But before you start raking in those YouTube views, let's take a moment to reflect on the journey and see what we learned along the way.

No creative project is ever truly “finished,” right? Besides, for this project, we were focused more on the process than on the results. There's always room for improvement and refinement. Here are a few things to consider:

  • Voice Quality: Without fine-grained control over the generated voices, they sound monotonous. That's acceptable for a table read, though, since this isn't the final production.
  • Prompt Perfection: The automatically generated image prompts were reasonably good, but I could have done better by manually editing them.
  • Hardware Limits: Even though I could run the models locally, they pushed my GPU to its limits in both VRAM usage and processing. It took roughly a day to generate and combine all the different media files.
  • Model Files: The model files are enormous and ate up a big chunk of my monthly ISP download allowance.

Looking back, it's incredible to realize that I started with just a raw screenplay and, with the help of some incredible AI tools, ended up with a fully produced table read, complete with visuals and voices. Sure, it wasn't Hollywood-level production, but the quality was surprisingly good for a first attempt! More importantly, the journey was a crash course in AI filmmaking. I learned a ton about prompt engineering, voice cloning, audio processing, and the power of automation. There were moments of frustration and challenges to overcome, but that's all part of the learning process, right? And hey, who knows what amazing things I'll be able to create with more practice and the continued evolution of AI?

Conclusion

Now that you’ve learned the basics, imagine applying these techniques to larger projects. Think full-length feature films, animated series, and interactive narratives where the audience can influence the story! With AI by our side, the creative potential is limitless.

This project focused on creating an audio-visual experience, but AI can do much more. Generating video with AI models has come a long way in just the past year. I'll be exploring that in the next phase of this project. The future of filmmaking is here, and AI powers it.

We’re all pioneers in this exciting new world of AI-powered creativity. So go out there, experiment, push the boundaries, and see what amazing things you can create.

Check out my reading list of other Google Gemini articles.

This post was created with the help of AI writing tools, carefully reviewed, and polished by the human author.
