There’s no better way to start the new year than by procrastinating on my own blog! So, without further ado, let’s dive into the audio stack of AI filmmaking.
Although I split the tech stack into two parts, the visual and audio elements are not independent of each other. The score must work with the scene, the sound effects must match the action, and the lip movement must match the speech. In short, audio and visuals have to exist in harmony, as demonstrated in the image below.
Music
My favorite part of editing is finding that perfect piece of music. If you get the right music, you double the magic. Music has the power to add meaning to a scene or shape its emotion. Watch the following video for a fun demo.
Some people believe that the end game for AI music generation is talking to a computer and expecting Bach to reincarnate, but I firmly believe that’s not the case. Instead, AI music generation is more about generating inspiration. Granted, if what you are after is generic elevator music, then by all means just type in some words and hit ‘generate’. Sure, the infinite monkeys may have a stroke of genius once in a long while, but good music simply cannot be generated by prompting alone.
Keen readers may ask: why didn’t you take this stance toward AI video generation? Why do you believe prompting can yield good video but not good music?
Good question, so let me explain. If you paint a mental picture of a scene, you can usually describe what’s in the footage, how its elements interact with each other, and the overall visual style of the scene. In other words, you visualize, then verbalize. For music, however, the equivalent of “visualization” is humming a melody. In that case, just grab a keyboard and start composing. Modern software like Logic or Ableton already lets you be your own one-person band. Text prompts that describe genre, mood, pace, etc. simply do not carry the intentionality that good music requires.
As a result, I never intended for AI music generation tools to be the actual place for production, and the following table does not account for editing features like stemming or in-painting. There are also features like mood, genre, BPM, and instrument dials that I don’t care about, because I believe the models will just get better at complying with the prompt, making those hard-coded dials obsolete.
Most notably, I find the combination of the Upload Audio and Extend features surprisingly useful in my own creative process. In my 2025 New Year piece, I was stuck on one section for a really long time. Since I was writing this blog, I fed Suno my composition up until that point and asked it to extend it. I generated about 20 minutes’ worth of melodies and in the end found 15 seconds that had the spark. What’s more interesting is that I would never have come up with those melodies myself, and deviating from my own creative inertia and toying with anticipation made the piece much more interesting. Can you guess which section is the AI-generated melody?
Sound Effect
Moving on, we are now in the sound effect department. Here’s a demonstration of how important sound effects are.
Sorry, wrong video. Here you go again.
Most of the tools can only generate sound effects through text prompting, but Lightricks, MovieGen and FineVoice have a neat feature that generates more fitting, tailored sound effects based on the actual video footage.
Voice
The three main voice capabilities are:
Text to speech: this comes in handy when you need a narrator who talks over the footage, such as a documentary host
Voice changer: you might decide that you don’t like the actor/actress’s voice and want something more cinematic
Voice clone: what if you screwed up an actual voice recording and cannot get the actor/actress to re-record? Try voice cloning, with consent of course.
Next Up
Next, I will be making an AI short film using the tech stack I just went over, and god knows how long that’s gonna take or whether I will make anything that I find worth sharing. So in the meantime, I will be publishing on some other fun topics. Stay tuned!