I've been following AI video generators for a while and this week I’ve spent my evenings working through what’s possible.
AI video generators use deep learning models to understand and interpret a wide array of inputs, ranging from text descriptions to existing video clips, and turn them into new, original video content. In theory, they can be used for a variety of applications, from creating realistic video simulations for training and educational purposes to generating animated sequences for entertainment and storytelling.
In practice, most of the tools I’ve found online seem to focus on simple online videos. Although in theory they should be able to synthesize scenes, objects, and characters, and animate them according to specified actions or dialogue, what’s available today is closer to an AI newsreader.
So now that we know where we’re heading, what have we got today?
If you want to create a professional-looking video in a fraction of the time and cost, then I think they’re pretty good. It would be fun to start a business scraping Google News and producing an evening broadcast, but I don’t think the world is ready for the Tom Weiss Network.
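For what it’s worth, the boring half of that imaginary pipeline is easy to sketch. Here’s a toy Python example (purely an illustration, assuming the feedparser package and Google News’s public RSS feed, and nothing actually wired into a video tool) that turns the day’s headlines into a script you could paste into one of these avatar generators:

```python
# Toy sketch: turn today's Google News headlines into a broadcast script.
# Assumes the feedparser package (pip install feedparser) and the public
# Google News RSS feed; the resulting text would be pasted into an avatar
# tool like Synthesia by hand.
import feedparser

FEED_URL = "https://news.google.com/rss"

def build_broadcast_script(max_stories: int = 5) -> str:
    feed = feedparser.parse(FEED_URL)
    lines = ["Good evening, and welcome to the news."]
    for entry in feed.entries[:max_stories]:
        lines.append(f"Our next story: {entry.title}.")
    lines.append("That's all for tonight. Thanks for watching.")
    return "\n".join(lines)

if __name__ == "__main__":
    print(build_broadcast_script())
```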
Of all the ones I’ve tried, the only one I’ve managed to get working is Synthesia. It offers a selection of different avatars and voices; you just provide a script and the avatar will read it out.
You always need a test project, so I decided to try and make a demo video for a company that I’m an investor in:
It took me about 30 minutes of active time to make, including screen-recording the demo, and I think it’s pretty neat. But what did I learn?
Don’t expect it to feel like a real person - it works pretty well for short-form video, but I imagine it would grate if you had to listen for more than a minute or so.
Editing is really easy - you can add pauses and gestures in the script, which speeds up or slows down the delivery. As with most generative AI, the text input is the meat and potatoes.
Rendering is slow - it took over 15 minutes to render the first version, and then it went into a manual QC mode, which resulted in a re-render. The second version didn’t go through the manual QC, so I can’t tell whether they’re just checking for illegal content or also for render issues; the re-render makes me suspect both.
What’s the impact? I don’t have any data points on this, but it probably has less impact than a professional video, and more than a text-based post.