Leveraging AI

120 | Create impactful product videos with AI - Step by Step guide w/ Rory Flynn

September 03, 2024 Isar Meitis, Rory Flynn Season 1 Episode 120

In today’s fast-paced digital world, the combination of AI-driven image and video generation is no longer a futuristic concept—it's a competitive necessity. But how can your business effectively integrate these powerful tools to stand out?

Join us for an exclusive, step-by-step walkthrough designed for everyone who wants to dive into the latest use cases and benefits of merging text and image generation with video creation using leading tools like Midjourney and Runway. From crafting the perfect prompt to generating visually stunning content, we’ll cover the key concepts and strategies you need to know.

Our expert guest, Rory Flynn, a recognized leader in AI and business transformation, will share insider tips and practical examples from his work at the cutting edge of AI technology. With years of experience helping businesses leverage AI for growth, Rory is the perfect guide to take you from concept to execution.

About Leveraging AI

If you’ve enjoyed or benefited from some of the insights of this episode, leave us a five-star review on your favorite podcast platform, and let us know what you learned, found helpful, or liked most about this show!

Isar:

Hello and welcome to another live episode of the Leveraging AI podcast, the podcast that shares practical, ethical ways to improve efficiency, grow your business, and advance your career with AI. This is Isar Meitis, your host, and I've got an incredible session for you today. I'm really excited about this personally for several different reasons. Reason number one is video. Video is the ultimate communication channel that we have that is not face to face. It's literally the best that we have. Why? Because it's the closest thing to real life. If you want to convey an atmosphere, or a party, or promote a product, video is the best way to do it. The problem with video is that it's a pain in the ass, and it's a lot of money and a lot of effort to get a video production in place. What I mean by that is you have to have a crew of people: photographers, videographers, sound people, actors, equipment, lighting. You need all of that for even a very simple shoot. If you want to shoot a product on a table with the right lighting and the right environment around it, you usually hire a company to do it for you, they charge you a lot of money, and it takes a bunch of time to get the results. Then you do your critique, and the whole thing starts all over again. And we're talking small productions, thousands of dollars; reasonable-size productions, tens of thousands of dollars; and for the big stuff there's no upper limit, right? There's Hollywood. So as much as it's awesome to create video for marketing purposes, it's just time consuming and very costly. At least until recently. What happened in the past few weeks (really months, or a year and a half, but the past few weeks have been absolutely insane), if you've been following what's happening with generative AI for video, is that three to five different models came out within the past six weeks that are already at a completely different level than anything we had access to before, and they're available to us either for free or almost free, depending on which tools you pick and exactly what you want to do with them. That enables us to do things from a marketing perspective that were literally science fiction until eight weeks ago. So that's reason number one. Reason number two I'm excited is the person we have with us today. Rory Flynn is probably the biggest person on the planet today on creating AI assets, both images and videos, whether from just a prompt or from a combination of an image and a prompt. We're going to talk all about that. So reason number two is that Rory is the absolute ninja when it comes to this. And reason number three I'm excited is Rory's personal story. I'm now in a position where I'm teaching and training a lot of people through my courses, and I always tell people, even in the intro to this podcast, right? The intro says grow your business, but also advance your career. People are finding and getting better jobs and getting paid more money because they know AI. I was recently asked, and I hope it's going to happen, to teach AI courses to veterans, to give them a higher chance of finding jobs in the civilian world.
And I'm personally very excited about this because I'm a veteran myself, and I know the transition is not easy. Even Rory's story: you'll see that what he's doing right now versus what he did a few months ago is on a completely different scale, because he's been investing in teaching himself AI. So there are multiple reasons to be excited about this, but that was a very long intro just to say: Rory, my friend, I'm really excited to have you on the show, and I can't wait for you to share your brilliance on how to generate videos with AI for business purposes with us today.

Rory:

Thanks Isar. I really appreciate you having me back; it's always fun to do these. Just so everyone is aware, this is basically the first time I'm giving a full presentation just on AI video. I've broken it into pieces in some other talks, but never just straight up, here's what I'm doing with video. So I'm really excited about it. I put together some stuff; let me share my screen and we'll get into it, so we don't waste anyone's time here.

Isar:

Yeah, those of you who are joining us on the live, whether you're on LinkedIn or on Zoom, please tell us where you're from and what you're excited about when it comes to video production with AI. We will describe everything that's on the screen, so if you're not watching, don't worry: we'll tell you exactly what we're doing and what we're seeing, so you can keep track. If you're driving, or walking your dog, or washing your dishes, or whatever it is you're doing, you can keep on doing it while understanding what we're going through. The presentation is mostly for us to have a linear process to talk through.

Rory:

Yeah, that's perfect. So there are going to be a bunch of videos, and I'm going to try to talk through everything you're seeing, just so people listening at home can get some understanding of what I'm saying and what they should be seeing. But to jump right into this, a little bit about myself. My name is Rory Flynn. I am the founder of Systematic AI. I am not a designer; I am a marketer by trade, mostly coming from the world of email marketing, paid media, and e-commerce. That's where I've been for the majority of my career. What I'm doing now in Systematic AI is what I'd consider an operational AI company, meaning we look into businesses, find holes in their operations, and then plug them with conventional AI tools. We do a number of things across training, consulting, and also some done-for-you creative services, just to help businesses inject this into their workflows and make their lives a little bit easier. Here's where it started. Being quite candid, the agency I was working with was not doing so great. If you've been in a marketing agency before, you know the common problems: there are always insane creative needs, there are always minimal assets to work with, and oftentimes there's no bandwidth. I feel like that's a very relatable position for anyone who's worked in marketing. Now, we had 90-plus clients, and that's a pretty hefty load, especially with a smaller team. If we isolate just the email marketing we were doing for the e-commerce side of things: 90 clients, roughly 10 emails per month each. That's 900 emails a month right out of the gate. Extrapolate that over the course of a year and we're talking pretty big numbers of emails, and that's not even including revisions or things along those lines. With a smaller team that's trying to be nimble, you're just always overworked: how are we going to keep the boat floating and not sinking? That's where AI came into play. We jumped into it really early, and luckily we had a little bit of runway (no pun intended) to play with this before we needed to actually use it. So we got a lot of testing time; we started around November of 2022. It was really good for us to have that runway, to test things, to have a lot of opportunity before we had to jump into everything. That being said.

Isar:

I want to pause you just for one second to generalize what you said, because this is not just for agencies, right? I've worked myself at small companies, medium companies, and really large companies; the largest company I worked for was a corporation with 10,000 employees all over the world. It doesn't matter whether you have a marketing team of one or of 150, which is what we had at the large company: they're always stretched thin. There's never enough resources and bandwidth. So this is relatable even if you're not in an agency. And by the way, this is true not just for marketing departments. It's the same in sales departments, the same in finance, the same in customer service. A company is always trying to run as efficiently as possible, so there are always slightly fewer resources than you think you need. And this process we're going to talk about, specifically in marketing and production, you can do with AI across every aspect of your business.

Rory:

Yeah. And even for a business like mine: I'm one person, right? I would not have been able to do this 12 or 16 months ago, but here we are. It's interesting, because I think a lot of times you want to look at this through the frame of big business, but it's really small business too. This is the best opportunity we've had for smaller businesses to utilize powerful tools and not need a ton of different people to help them. With that being said, here's how we were using it at my agency, the standard talking points: we wanted to amplify productivity, reduce costs, and generally create happy clients. That was achieved, and there are a lot of contributing factors along the way, but a lot of it was compounding results. Compounding results meaning: if we're trimming an hour, or even minutes, off of each individual task across, I don't know, 10 to 20 people, that compounds over the course of a week, over a month, over a year. It's a really easy way for us to get more operationally efficient. That being said, things have changed a little bit; my life is a little bit different now. I partnered with a massive agency called Superside. They're awesome: a design firm that does creative for some of the biggest companies in the world, with an 800-person team. Our goal there is to really operationalize AI and push AI's boundaries, and that's taken me in a little bit different direction. What used to be 90 clients is now 500 plus; demand is at an all-time high, and our capabilities are seemingly endless. But with these new tools there are new expectations: people now know what we can do, so there's a different level of expectation. So let's talk about video. This is something we've been pushing into a lot lately, and I'm going to go through the landscape, how everything looks, how you can personally take advantage of it, and what to do with it. The current state: we've seen a decade's worth of advancement in AI video over the last, let's call it three months. It's extremely powerful now. It used to be a novelty; even as recently as earlier this year, it wasn't something I would say I'd use on a daily basis. But it's now capable: it can be involved in a lot of different processes in your business, and it can be used for a number of different purposes. We'll talk about those, but in the last three months you've seen this explosion of tools. The thing right now is that they all have different strengths; I'd say every one of them has its own unique selling point. Some are better than others, and most of them have one particular strength that I'm typically looking for, but a lot of them are multi-purpose as well. And I'll talk about this later, but it's not just video that comes out of these things. They can also be turned into GIFs; they can be turned into other things. Think about the various channels you're marketing in, whether that's email marketing or organic social: it doesn't always have to be a cinematic production. It could be something as small as a GIF of your product with a little bit of motion. It doesn't have to be a full-scale Hollywood movie.
Now, the current tools, the landscape, based on what I think and what I've tested: these tools are closed source, the majority of them, or all of the ones listed here. There are some open-source models, and you can do a little bit more with those, but it's way more complex, so we're going to focus on the closed-source ones today. Runway Gen-3 released recently, I think about a month and a half ago; it's extremely powerful, and I love this tool. Luma has been a very strong competitor to them with their Dream Machine, pretty much on point with Runway. Kling is new to the game, probably just the last four to five weeks. Krea is a little bit different of an experience. So you can utilize these tools for different end results, essentially, is what I'm trying to say. And then we don't know what Sora is going to be. Everyone got really excited about Sora when they announced it, and from there it was radio silence for, God knows how long now, eight months. So who even knows; I'm not even considering it in the workflow right now, because it doesn't exist to us yet. Everyone saw it and put it up as the gold standard, here it is, but we can't use it. So now everyone else, I think, is using it as the model to build toward and to drive their ship.

Isar:

I want to pause just for one second. For those of you who don't know Sora and what the conversation is about: earlier this year, really early this year, I don't remember whether it was February or so, OpenAI shared this new model that creates videos from a prompt, without even an image, that look incredibly realistic. And it's a minute long, at full HD, 1080p resolution. Back then, nothing we'd seen was even remotely close to that, and everybody went, oh my God, that's going to change the world. Only OpenAI never released it to the public. They've done a limited release, mostly to production companies and studios, including Hollywood, to test it out and get comments and so on, but it hasn't been released yet. There are a lot of rumors, but no actual information on when or how, or if, this thing is going to be released. But what it did, I always say, was light the fuse of the dynamite, because all the companies who are in this field, or considering being in this field, now had a completely different benchmark to aim for. And I think that accelerated a lot of what we're seeing today from Luma and from Runway Gen-3. Runway has been in this game very long; they've probably been in this game the longest, and their models were pretty cool to play with, but like you said, they were not something you could use for business. It was cool for people like you and me to play with: oh, this is awesome, I can move this thing around. But things would morph all the time, the resolution wasn't there, and the clips were like four seconds long. It wasn't something you could actually use. And now, six months later, it's a completely different universe.

Rory:

100%. And Runway Gen 2, I think, also set the scene for what we're seeing now, because what was revolutionary was that you could take your images and animate them. You didn't have to do text-to-video, where essentially you come up with a prompt and it generates on its own; I could put a reference image in and then make that image move. Again, we were using this, and I'll get into it later in the presentation: even if we can't create a full cinematic experience with it, can we use it for other things? Like I said, GIFs to me are one of the biggest selling points for tools like that. They don't need to be 10 seconds; they only need to be four, and they can be really small and sometimes just punchy, pure visual stimulation. Something like that was much easier than going to look for a stock one with good enough resolution to send in an email or put on organic social. So it's transitioned, let's put it that way, from that to now, where we can use some of this stuff way more effectively.

Isar:

I'll add one more thing to that, about the length of the videos. I'm from Israel. Most of you don't know that, but I'm from Israel originally, and I still watch Israeli TV every now and then; there are a bunch of shows I follow, and I follow Israeli TV commercials. The vast majority of them are six seconds long. That's it, really. Six seconds. And you would be shocked by what you can do in six seconds if you have the right creativity. Before, those six seconds would cost you either a high-profile celebrity, because that captures attention, or an amazing production. Now you need neither to actually make this happen. You can do incredible things in six seconds; I've been consuming that content for a few years now.

Rory:

That's amazing. I didn't even know that, but I love it. That would be a great exercise: trying to figure out how to make something sell in six seconds. That's awesome. But yeah, how I'm looking at the landscape right now is that each one of these tools has different strengths. Runway: the quality is definitely superior to a lot of the other ones, in my opinion, and you can get a 10-second generation. Those two things to me are pretty impressive. It has really good levels of texture and coloring and lighting, so the output looks very professional. Luma is great to me because they have key frames. What a key frame is, essentially: if I want to do an image reference, I can put a picture of myself in as the first frame and another as the last frame. So if it's a picture of me like this, looking at the camera, and then a picture of me turned to the side, basically on the front and back end, it will animate the motion of me turning my head. It's a good way to direct the beginning and the end shot and let it fill in the middle, which I love. It also has really great motion; I think Luma has more creativity in its image-reference motion versus Runway, and Runway has more creativity in its text-to-video. We'll get into that a little bit. Kling to me has been very solid. I wouldn't say it's my favorite, but it's good for a number of things, a good generalist tool. Krea is another example: they're a tool with more creative capability, so if you're looking for something more abstract, less cinematic, Krea might be a good option for you. And then again, Sora, we don't even know. So that's where everything lies right now. But it's also good to know the cons. There are obviously pros, but the cons: it's still glitchy, and not every generation is going to be perfect every single time you push send. They're very sensitive to the images you put in and the text you put in. And if you're going to generate anything with text, I've found this across the board on every platform: if you put in a soda can with small text on it, or a logo, there's potential for some morphing. So branding gets a little bit harder, especially with a product-focused image, where you can't have the Coca-Cola logo distorted, right? That's not going to fly for them. It's something you have to be aware of. Sometimes it works, sometimes it doesn't, but that's where we're weighing the balance between, can we save a million dollars, hypothetically, versus, can we run this 30 times to get it right. There's a give and take. The other thing I've found with a lot of the tools in their current state is that they're expensive. If you're not particularly advanced with the tools, it's going to take you a lot of rerolls, a lot of regenerations, to get where you want. You end up burning through monthly credits, and then you've got to pay for more. So learning the tools, getting an understanding of how to actually leverage them, and having a framework will actually save you money too, because you'll just generate less crap, if that makes sense. So I think it's important.

Isar:

I want to combine two of the things you said. One: it's never going to work the first time. It might for Rory, because he's done this about a thousand times already, so his muscle is better trained; he knows how to prompt it better, and which starting image will probably work and which won't. And a lot of it has to do with getting the image right in order to get the outcome you want. But even ignoring that for a second, you will still need to run it X number of times, and that, Rory is saying, is what makes it expensive. But going back to "expensive": it's still significantly cheaper, by probably three orders of magnitude, than actually shooting the video you wanted to produce, if you can even shoot it. The beauty of this is that you can have a dark alley with demons flying in it to create a creepy environment, which you just can't do unless you're Hollywood; it's just not possible. And even the things you could shoot, say, a regular person walking down the street, smiling, talking to people, that means blocking a street and having people play the different roles; it's not an easy thing to do. So even if you have to run it a hundred times, and each run costs you $4, and you've now spent $400 on this one video, that's less than the hourly rate of one of those actors, not including all the other equipment and people you need. So when he says it's expensive, it's expensive compared to, oh, I use ChatGPT for 20 bucks a month and can use it as much as I want. It's not expensive compared to producing the actual video.
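To make that comparison concrete, here is a back-of-the-envelope sketch in Python. The per-generation price and the shoot budgets are illustrative assumptions pulled from the numbers mentioned in this conversation, not published rates:

```python
# Rough cost comparison: rerolling an AI generation vs. a traditional shoot.
# All figures are illustrative assumptions from the conversation above.
generations = 100            # rerolls to land one usable clip
cost_per_generation = 4.00   # assumed dollars per generation

ai_cost = generations * cost_per_generation   # $400 for the whole video
small_shoot = 5_000                           # "thousands of dollars"
mid_shoot = 50_000                            # "tens of thousands of dollars"

print(f"AI route:          ${ai_cost:,.0f}")
print(f"Small shoot is     {small_shoot / ai_cost:.0f}x that")
print(f"Mid-size shoot is  {mid_shoot / ai_cost:.0f}x that")
```

How many orders of magnitude you actually save depends entirely on the assumed numbers; the point is only that the arithmetic heavily favors regenerating over reshooting.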

Rory:

Yes, good clarification there. I also think, and correct me if I'm wrong here, that with the majority of these tools, when they release a new model, the models are often big, and when I say expensive I mean they probably eat up a lot of resources and GPU time. But as they keep working on these models, they all tend to get cheaper, just like you saw with GPT-4 and then GPT-4o, which is significantly cheaper, with air quotes on it. The tools release these models, then bang, they get faster, and then they get cheaper. It seems like the standard progression across all the rollouts.

Isar:

By the way, just to put things in perspective: GPT-4o mini, the latest fast model from OpenAI, is a hundred times, two orders of magnitude, cheaper than GPT-4 when it first came out, in API token pricing. So 1 percent of the cost, and that was in what, like seven months?

Rory:

Yeah, and I expect the same trajectory for a lot of these tools. It's the same progression: quality, then speed, then cost. That seems to be the natural order. For me, it's really about understanding what these tools do well and what their strengths are. If I know I need key frames for something, I'll go to Luma. If I know I need a little more creativity and quality, I'll go to Runway. It's understanding where these things are. It doesn't have to be, I only create on Runway, or I only create on Luma, or I only create on Krea; you can mix and match different tools to get to the same end result. A lot of times, if you're using image references, you can keep that aesthetic, and sometimes you just need different shot types that one tool might not be able to do and another can. So it's always thinking about how you can string all of these things together; you don't have to be super brand loyal and only use one. That being said, how are we using them today? Let's talk about that a little bit. So, current usage: I'm sure this will expand, but this is where we've found the most common practice. Internally, we use it a lot to move projects along faster or to build a vision for somebody; externally, we're using it to expand assets and storytelling. Internally means that if we're storyboarding, we can animate a storyboard really quickly to show somebody what a project could look like, instead of just providing a sketch, whether that's for a TV spot or a Meta ad; storyboards are typically used for project planning. Now, with something like Midjourney, we can put a real look and feel to it, then take it into Runway and animate it real quick; even just a four-second little production can tell the story better. And if we're working with clients the way a marketing agency would, showing them that might give them a better perspective of what the end result can look like, so we can trim time off the decision-making process. That's where it's been really helpful for us. The other reason we're using it that way is smaller formats. Right now we're using it for things like asset expansion, meaning if we have a product image, we'll take it and turn it into a video; now one image of a product becomes two assets. Same with stories, reels, things for TikTok and YouTube: it's a storytelling enhancement. If we're creating a reel, there are typically a lot of short cuts, maybe a voiceover, and we can add these images or videos in to enhance the story and make it better. Same thing with product advertising. I'm sure you've seen a lot of static ads in your Instagram feed while you're scrolling: oh, here's a product, here's a picture of lipstick, buy now. Great. We can also make that lipstick move. Now, that's a really important thing in my world.
Because we have a static asset of the lipstick and a moving asset of the lipstick, and we're running multiple tests; whichever one works better, that's where we're going to get more results. So we're looking at how to expand our asset base, then test, then optimize, then build more assets that look just like the winners. That's the story behind that piece of it. Now, why we're using it externally in those mediums, mostly social or mobile, is because they're small formats. They don't have to be on an IMAX screen, so you don't have to worry as much about resolution. They're shorter formats too; on social you don't need 30-minute videos, they can be 20 seconds. And a lot of the social platforms, whether Instagram or TikTok, are all 9:16 aspect ratio, the size of your phone. So these are really important to us. And oftentimes there's less scrutiny, to be quite honest, because in a smaller format it's really hard to spot the more particular details, so sometimes we can get away with that stuff. And being quite candid, marketing is a little bit of deception; I hate to break the news, but sometimes it can be. So that's why we're utilizing it in that fashion. Now, how can you do this? That's what we're going to talk about next.

Isar:

Just before you dive into this, one of the questions: Curtis is asking which tool you're using for storyboards. I assume it's a mix of all of the above, depending on exactly what you need?

Rory:

It depends. Most of the time I'm using Runway, to be quite honest.

Isar:

So Midjourney to create the images, and then Runway to create the videos?

Rory:

That's correct. Good answer. Yeah, Runway to me has given the most consistent results, and I don't end up spending a ton of time on there. I do use the others; Luma especially is probably my secondary, because of the key frames: I want it to start here and end here, and that can be super helpful, especially if you can't get something out of Runway. But everything you're seeing here, everything we're going to go into right now, was done on Runway. So, to sum up what Runway has done with this Gen-3 model: it's a huge improvement over the last model, a significant jump, not just a random update. They have a text-to-video model, so you can type text and generate a video. They have an image-to-video model, where you can put a reference image in and generate from it. The natural motion is really impressive, as you can see in some of these clips; the roller skater in the top right is moving, and it doesn't really seem like AI: very smooth motion, and the body movement is correct. The prompt coherence is strong, so when you're text prompting, it's going to listen to you, meaning you have to be better at prompting it, or else it's just going to go off and do its own thing. For output length you have an option I love: five seconds or 10 seconds. The 10-second generations are more expensive, of course, from a processing standpoint, but they're really good, especially if you're doing something in slow motion. A little tip there: with slow motion you effectively get more frames per second, so it's going to look smoother and more natural. The generation time is surprisingly fast with their new turbo model; a clip can be done in less than 30 seconds. So it's pretty impressive that it can render that quickly.

Isar:

As their CEO said, it renders faster than you can write your prompt. And that's not a joke. It's really incredible.

Rory:

Yeah. And the output resolution is 720p, so it's not the best yet. That's the other reason we don't use these in large formats: to me, the upscalers in the AI video space are not great yet. Images seem to have that figured out; you can upscale an image and get really high-definition quality. Video is way harder. I'm hoping one of these tools makes a concerted effort to build in a video upscaler, because I would love that feature. Right now you have tools like Topaz Labs that can do some upscaling, and their new model is pretty good, but it runs locally on your own hardware, so it can take a really long time. I don't have exceptional hardware, to be quite candid; a 30-second video could take me two or three hours to upscale. So that's something I'm going to wait on. Regardless, that's why we're using this tool. Now, for prompting: there are frameworks you want to have going into this. I try to keep my prompts very short and punchy first; if they don't work, then I expand them. Everything is built on tokens, the same way ChatGPT or any of the large language models are, so the more tokens you add, the more confused the model can get, especially if your language isn't punchy and compact. Any time you add a lot of fluff words, it's pulling randomly from its memory, and that's going to affect the output. So I keep things short, concise, visceral. Then, if something isn't working, I become overly communicative. Meaning: say you want a shot of feet walking away from you, someone walking on a sidewalk, but you just want the feet. If you just prompt that, sometimes it'll have the feet facing forward and walking backward toward you, a very weird visualization, because you didn't tell it "walking away from us." Instead of just "walking," they're "walking away from us": you're giving it the direction, the character is walking forward and we're shooting from behind. It just takes one extra step, like explaining something to someone who cannot see. That's the best way to think about prompting: it's like you're writing a novel, and you want to tell people exactly what's happening so they can build a picture in their mind. Now, text prompting, which I'll focus on first, is really important to the output with Runway. You can just put an image in, press generate, and it'll do whatever it wants, but really you control the experience in the text-prompting portion. And it's important because it's controllable: you have significant control over every little aspect of the image or video output. But there are certain things you need to look at. Whenever I take something on, whether in Midjourney or in Runway, I break it down into its DNA. For an image: what is the shot type, the subject and character, the environment, the colors, the textures, the composition? That's all important. Same thing with video; we break it down the same way, there are just more variables.
I like to use the example of the difference between pizza and a calzone. They're essentially the same thing, but when you take pizza and break it down into individual components, it's dough, sauce, and cheese, and you can rebuild those three ingredients into anything else you want; a calzone comes from the same three ingredients. That's how I think about breaking these prompts down. Now, the variables we look at: there's the shot type, which is important in both image and video, but camera movement is specific to video, because that's something you can control. You might not think that's a big deal, but the difference between a static camera and something orbiting 360 degrees around an object or a person looks totally different, and it can be way cooler. So camera motion is a big one for me. Then we always look at the subject and action: who or what is in the image, and what are they doing? The environment: where is it set? The lighting plays a big role in storytelling; whether something is happening in the morning or at night, or you have silhouette lighting versus something really bright, it tells a different story with each type of shot. Color grading can also be utilized here: if you want to keep a consistent look and aesthetic across a movie or an ad or anything like that, having that color grading in your prompt can be really helpful. Cuts and transitions actually work in Runway; I wouldn't say they work all the time, but if you want specific cuts, like a "quick cut to" or a "wipe cut to," meaning we show somebody in one area and then quick cut to them in another area, that can be done. And then the style, which I find to be the easiest way to guide Runway. There are a lot of different styles in there: 35 millimeter might be your standard photorealistic look; cinematic is going to be bigger and more exaggerated; animation will give you a more illustrated style; and something like 3D render or Pixar will give you that look. It's basically a general guiding idea for Runway, so it goes, oh, that's what I need to do for the whole thing. Those can be really helpful.
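To make that "DNA" breakdown usable, here is a minimal sketch in Python. This is not any tool's API; the slot names and the helper function are hypothetical, and all it does is assemble the text prompt you would paste into Runway or a similar tool:

```python
# Illustrative prompt builder: each variable Rory describes becomes a slot.
# Only the slots you fill in end up in the prompt, in DNA order.

def build_video_prompt(shot_type=None, camera_movement=None, subject_action=None,
                       environment=None, lighting=None, color_grading=None,
                       transition=None, style=None):
    """Join the provided slots into one comma-separated prompt string."""
    slots = [shot_type, camera_movement, subject_action, environment,
             lighting, color_grading, transition, style]
    return ", ".join(s for s in slots if s)

prompt = build_video_prompt(
    camera_movement="360 degree orbiting shot",
    subject_action="a man standing on a mountaintop",
    style="cinematic",
)
print(prompt)
# -> 360 degree orbiting shot, a man standing on a mountaintop, cinematic
```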

Isar:

I want to pause just for one second, because I'm sure some people, especially people listening and not watching the screen, lost us with all of these parameters and are thinking, what do you want me to do, write all of this down every single time? It's just like best practices for any of these AI models: you need a prompt library that has the structure with all the different components in it, or the components relevant for that shot; you can keep all of them and delete the ones that aren't relevant. That makes it very easy. You have your camera motion, your subject, the action, the lighting, all in a prompt structure, and I'm sure Rory has examples we're going to dive into in a minute. You have the prompt structure, and you just fill in the parameters. And the parameters themselves can also be a library, a cheat sheet you keep on the side in the beginning. Once you've used them 50,000 times, like Rory, you know what they are and you know the look you're looking for. But if you have the vision in your head of the outcome you're trying to create, you just need to translate it into professional terms, and those you don't need to invent: these camera motions exist, the actions exist, the color tones exist. So in the beginning, literally keep a cheat sheet on the side of your desk, or in a document you can open from Notion or Google Docs, it doesn't matter, and use it until you get a feel for what works and how. There is no magic: there is a prompt structure, and then what can go into every segment of that structure. Repeat this 50 times, and you'll know how to do it pretty consistently.
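Pairing with the builder sketch above, here is what such a cheat sheet could look like as a small data structure. The entries are vocabulary actually used in this episode; the structure itself is hypothetical and meant to be extended as you find looks you like:

```python
# A starter "cheat sheet" of prompt vocabulary from this episode.
# Pull one entry per slot into your prompt structure as needed.
PROMPT_LIBRARY = {
    "camera_movement": ["360 degree orbiting shot", "sweeping shot",
                        "descending shot", "zoom out", "panning shot", "zoom in"],
    "lighting": ["strobe lighting", "warm light", "cool light",
                 "volumetric lighting", "high contrast lighting"],
    "time_of_day": ["twilight", "midday", "midnight"],
    "style": ["35mm film", "cinematic", "animation", "3D render"],
}

# Example: browse the options for one slot while drafting a prompt.
for option in PROMPT_LIBRARY["camera_movement"]:
    print(option)
```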

Rory:

Great point, that's exactly it. We will go through some prompt structures here so that it's easier to start with a formula and then just plug things in. And it doesn't even have to come off the top of your brain; you can take a prompt formula into something like ChatGPT and say, can you fill this in for me? It can be very useful there. But I think the biggest hurdle for me personally, coming from the image generators, is that you're working in a different dimension, if that makes sense. Images are very flat: you can create some depth in an image, but you can't control motion within an image. That's where this gets into a bigger meta theory: we're working in four-dimensional space, if I'm conceptualizing that correctly, because you can move forward, backward, up, down, diagonally, toward the camera, away from the camera. There are a lot of different ways to move, and it can really alter your prompting. We have to take that into consideration, because there's a lot of depth within video; it's not as flat as an image. So what's really important to me is the camera motion and the subject's motion direction. This is very granular, but those two have the most effect on the output of a video, and they give you more control if you know what to do with them. So I'll show you a couple of examples. This is a very simple prompt structure, and everyone can go take it and use it right now; I guarantee you will trim your re-prompting down by 80 percent. Typically I'm using this simple structure: shot type or camera movement, then the subject and the action, and then a style. As an example: a 360-degree orbiting shot, meaning, think of a drone shot, someone standing in the center of the frame and the camera circling around them; of a man on a mountaintop; and then cinematic as the style. So it's going to look and feel cinematic, and the camera is going to orbit around the guy. When we do that, you see the prompt is very simple. And here are a couple of different ways to change the shot type: if it's not a 360-degree orbit, all I'm changing is the camera motion; it's the same prompt every single time. Looking here, we have a sweeping camera motion, a descending camera motion, a zoom out where the camera zooms way out, a panning shot where the camera moves from left to right, and a zoom in. That camera motion is the one variable that can change the entire look and feel, because they all tell a different story.
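As a concrete illustration of that one-variable swap, here is a short, self-contained sketch (plain string assembly, nothing tool-specific) that produces the six mountaintop variants by changing only the camera motion:

```python
# Same subject and style every time; only the camera motion changes.
camera_motions = [
    "360 degree orbiting shot", "sweeping shot", "descending shot",
    "zoom out", "panning shot", "zoom in",
]
subject_action = "a man standing on a mountaintop"
style = "cinematic"

for motion in camera_motions:
    print(f"{motion} of {subject_action}, {style}")
# Each printed line is a complete prompt telling a slightly different story.
```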

Isar:

Those of you who are listening and not watching: we have six different examples of video clips on the screen, showing really spectacular footage of somebody at the top of a mountain, and each one is very different. You can use them in different sections, or you can sequence them, like the zoom in and then the close-up on the face. You can use these different camera shots to tell the continuation of the story: the camera comes in from the sky, turns to the person, zooms in on his face, all while he's standing on the top of the mountain with the scenery around him, the clouds, the sunrise, and so on. So you can use them either as one-offs or, as I mentioned, combine them together.

Rory:

Yeah, and that's why I like it when you have a structure: what you're filling in the blank for is the camera motion, and you have the subject and the action, a man standing on top of a mountain. Then it's just, okay, this is how we control this little piece. Very simple, but you can see the varying effects from essentially one or two words in a prompt. That's why it's important to work off of structure, and honestly to have, like you said before, the end result you're trying to get to, a little bit of vision. Not that you have to have one, but sometimes it's just easier to fill in that blank. Now let's go a little bit further: I want to control more of the variables here. Going back to the variables we broke down before, our DNA of a video: we have our shot type or camera movement; our subject and action, who's in the scene and what they're doing; and our lighting description, which I think can be very important in telling a story. The difference between really bright sunlight and a really moody dark light gives you a different feeling; lighting really shapes emotion. The example prompt here is: close-up shot of a man wearing sunglasses, walking in the Tokyo streets, strobe lighting, rapid flashes creating a staccato effect. The reason we add that little lighting description is that "strobe lighting" on its own might be too general, and you need to explain it; this is what I mean about being direct. "Rapid flashes create a staccato effect" tells it: do this quick flashing motion. I put a little compilation together showing how these different lighting effects look. Looking at four different lighting effects, they all affect the output differently. Even the difference between warm light and cool light: warm is going to be more orange, cool is going to be more blue, and volumetric is going to be more hazy. There are a lot of ways to change this one little factor and give the shot way more life. With lighting you can also go into categories like time of day; each one tells a different story, whether that's twilight, midday, or midnight. They all give you a different look and feel. Again, it's a way to isolate these little variables and show where the output will go with them. But you can go really far into the descriptive side of things. This is when you want real control, when you have a real vision; I just want to show you that something like this is possible. I'm going to use a really grand prompt structure, if you want to call it that; I'm asking a lot of it. The structure here is: a cinematic behind-the-back shot of a character, then their movement away from us, then the environment and details of where they are, then the lighting and the color scheme. This is a larger prompt structure; there's a lot going on, so I'm asking a lot of it. But as you can see when we rip through it: you have a cinematic behind-the-back shot, so we're shooting from behind the back, and it's following him as he goes somewhere.
Following him, meaning he's walking away from the camera. And it's of the Pope; there's our subject. Heading towards the balcony: he's walking away from us, and we're directing exactly what he's doing. Through a doorway: we have him going through a doorway. Overlooking a massive crowd: he's overlooking this giant, epic crowd, for those who can't see it. And there's high-contrast lighting, a big difference between light elements and dark elements, as you can see as he walks through the doorway. And then this contrasting red and white color scheme. I asked it to do all of these things, and it did. That's not a simple prompt by any means, but you can see that as you ask more of it, if you're directing it correctly, it can deliver. A lot of people blame the tools, but the tools are capable; it's how you're instructing them. Same with ChatGPT, same with any of the LLMs or the image generators. Now, the other thing we can do is use image references. Everything so far was straight text prompting: me typing text into the generator, pressing enter, and the video generating. Now I'm going to guide it with an image, so we have a starting point. Image references to me are great for consistency. The one thing with tools like Runway, Luma, and Kling is that if I wrote that text prompt of the Pope we just saw and then ran it again, he'd look different in every single video; he's not going to look exactly the same. With an image reference, I can use consistent characters, consistent settings, consistent styles, and it'll be very similar. So you're thinking branding, that look and feel. Images play a huge influence in guiding the tools. The one thing I will say is that it's less controllable with the majority of them, because you're giving Runway or Luma 50 percent of the work: the image is already generated, and now the tool has to take that image and animate it. The base image, as Isar said before, plays a huge role in all of this. If it's a really crappy image, to be honest, it's going to be a crappy output. What I mean by that is, if it's bad resolution, grainy, or fuzzy, don't expect really great output from Runway. And one tip, I don't know if I put this in here, we'll see: if you want better motion from Runway and you're generating your images ahead of time, in Midjourney, DALL-E, Leonardo, whatever, sometimes baking motion into the images themselves beforehand gets you a lot more good video generations, because you're giving Runway a head start. This car, for example: when it comes back around, you'll see there's already some motion blur in the image, so it looks like the car is already driving. That tells the model what direction to drive, and I don't have to prompt it as much. You can see the motion blur here, and the wheels are spinning, which also gives it a good starting point: the wheels are spinning on the car, meaning it's driving.
Sometimes when you prompt something like a car, and this is just a random example but a big problem I see for a lot of people: they have an image of a car and they want to animate it driving. If the rims in the image are static, meaning you can see all the spokes and the tires super clearly, the model is just going to push the car across the frame. It doesn't look like it's driving; it looks like it's hovering, with no tire motion. Prompting spinning wheels onto your car in the image-generation process can then help the video-generation process, because the image is already doing what you want it to do. So it's problem solving in multiple ways. A long sidetrack there, but I wanted to make sure we hit on that. Now, when you're prompting with images, I'd keep the prompts even shorter than with pure text prompting, because the characters and the setting are already in there. You really want to focus on the motion; that to me is the most important thing, the motion of what you want it to do and how you want it to move, because that helps the outcome. Now, with this example, I'm going to show you a longer prompt. We have a tracking aerial shot of a white pickup truck. The image is already that, but I want it to look like it's tracking the car, I want to show the car driving, and I also want to show something that doesn't exist yet: the environment. So: a tracking aerial shot of a white pickup truck. It's driving away from us, so it knows to push the car forward. On a muddy road, so we get that little detail in, some mud splashes, because it would look pretty weird if this car were driving down a muddy, wet road without any splashes. Dynamic motion, which is just something I add every once in a while as a trigger; it's a little hack that works with a lot of prompts. A muted color palette, to keep the consistency, because I know I'm going to show a part of the environment that isn't in the picture, so I'm telling Runway what to do there. And then the second point: revealing a vast Wyoming landscape. So when you look at this, we have our tracking aerial shot, the white pickup truck driving away from us, the mud splashes, the dynamic motion, and the muted color palette, making sure that when we pan up it stays muted and doesn't turn into some lush green environment, which it's obviously not; and then revealing a vast Wyoming landscape. Again, we're directing every piece of this from this one image. It might be an extreme example; I'm just trying to show everyone what's possible when you direct it correctly. Now, the other thing we can do here is very interesting. All the videos you just saw were first-frame generations, meaning you put the image in and it animates everything that happens after that frame. What Runway can also do is use your image as the last frame of the generation. Meaning: you put that image in, and everything before that image gets generated. You can see what's going on in these videos; if you're not watching, I'm sorry.
Basically, we have these images of food, and the food is super close up to the camera, and then we run this.

Isar:

For those who are not watching: it's a very detailed sandwich or burger, which now makes me hungry because it's lunchtime and I didn't eat yet. An amazing view of a burger, with somebody holding it between their hands, showing it to the camera in an extreme close-up. That's the original image, and the video outcome is the person bringing it closer to the camera, or the camera zooming in on the person holding it, and so on.

Rory:

So for something like this, the reason I like to use the last frame for a close-up is that it will keep all of that detail from the image. If you have a really detailed, crisp, sharp image, using it as the last frame sometimes makes that detail way more prevalent, because it doesn't give Runway the opportunity to morph it or take it in a different direction. So it's a great process. Now, it becomes a bit more of a mind game when you're using the last frame, because you have to think: okay, I want it to end here; what do I want to happen before it? You're prompting toward the end, which is the reverse of how you'd normally think, so it's a little crazy to reason about in those terms. I like to use this for certain things, like brands: if I'm working with a product that has text or logos or specific color schemes, I use the last frame, because I know the details are going to be held in place. They have to be, because that's the last frame; it's where the generation ends. That, to me, is the big thing for branding, products, or marketing purposes, and it's super useful. You can utilize PNGs, you can utilize static images, and like I said, it works best with the last frame. The top image right here is just a PNG; I didn't create these, I pulled them as examples because I wanted something with a PNG. If you're not familiar with PNGs, a PNG file is basically a product image with a transparent background; there's nothing else, it's just the soda can, or the lipstick, or whatnot. You can plug that in and generate the environment around it. And you can see the last frame here is super helpful because it keeps the details of the logo, which says Prime, for anyone listening.
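Here is a tiny sketch of that rule of thumb. The helper name and flags are hypothetical; in practice this is simply the choice, in Runway's interface, of whether your reference image is used as the first or the last frame:

```python
# Hypothetical helper encoding Rory's rule of thumb: if the asset carries
# branding (logos, text, exact colors) that must survive intact, condition
# on the LAST frame so the generation has to end on your untouched image.
def keyframe_position(has_logo_or_text: bool, is_closeup: bool) -> str:
    if has_logo_or_text or is_closeup:
        return "last"   # details are locked in place at the end of the clip
    return "first"      # let the model animate freely forward from the image

print(keyframe_position(has_logo_or_text=True, is_closeup=False))  # -> last
```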

Isar:

First of all, I think this is absolutely brilliant and amazing. Where it becomes really cool for specific products is that you can use the actual product. It's impossible right now to recreate an exact product in Midjourney or Flux or any of these tools; you cannot generate the exact image of something. But in this particular case, you can use the actual image of the product in the video. The way to do it is to use it as the last frame and then build whatever you want around it, whatever graphics and motion. And it's perfect for, again, what Rory said at the beginning: social media stuff, shorts, punchy and to the point. You can create whatever environment you want around it, in a short video that's perfect for social media. The other thing I want to add is how you take the next step, and the next step is old-school video editing. You can create four, five, six, twenty shots with these tools and then use any editing tool of your choice; if you're a beginner, you can use Canva, and just stitch them together into a longer video. If you have the story in your head, great; if you don't, use ChatGPT or Claude or Gemini to help you craft the story of the idea you have: I want to promote this to this audience, I want to end with a close-up of that, this is the feel I want, this is the audience I'm trying to target. Then create the images in Midjourney or any of the other tools, bring them here, create the short videos, pick the ones you like that connect into a logical story, and then do the cuts between them in any basic, traditional editing tool. That way you can go beyond three-, five-, ten-second videos to a 20- or 30-second video that makes sense, because there's a story and you move from one frame to the next. And going back to what Rory said about starting with images as references: using a consistent background, style, and colors across the original images makes a very big difference; otherwise the transitions between shots will look weird.
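For the stitching step itself, here is a minimal sketch using the MoviePy library (assuming MoviePy 1.x is installed; the file names are placeholders). Canva or any traditional editor does the same job; this just shows how little is needed to concatenate a handful of generated clips:

```python
# Stitch several short AI-generated clips into one longer video.
# Assumes MoviePy 1.x (pip install moviepy); clip paths are placeholders.
from moviepy.editor import VideoFileClip, concatenate_videoclips

clip_paths = ["shot_01.mp4", "shot_02.mp4", "shot_03.mp4"]  # your picks
clips = [VideoFileClip(path) for path in clip_paths]

# method="compose" pads clips of different sizes instead of failing.
final = concatenate_videoclips(clips, method="compose")
final.write_videofile("product_story.mp4", fps=24)
```

But that's the next evolution you can go after, beyond all the incredible, amazing stuff Rory showed us. Rory, this was magic. People who haven't seen this before, or maybe have seen it but never knew how to do it: you really shared the whole process with us, from your thought process, to the tools, to the prompts, everything people need to get started. If people want to learn more from you, follow you, work with you, et cetera, what are the best ways to do that?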

Rory:

LinkedIn's a great spot. I'm normally there just sharing what I'm doing; I think it's the best way for everyone to pick up on this stuff and go. If you have any questions, please feel free to hit me in the LinkedIn DMs. I'm a little slow on there, candidly, just because there are a lot of messages, but I will definitely try to get back to as many as I possibly can. I hope this was helpful. Like I said, it's the first time going through all of it in sequence, so I know there are a couple of rough spots I need to clean up, but it was good to lay this all out.

Isar:

No, this was fantastic, and people in the chat are saying it was amazing; I think it was amazing as well. So yeah, follow Rory on LinkedIn. He really shares everything, like what he's done today. Every time he figures something out, he puts together these amazing guides and just shares them with the world. So I really appreciate you, and everything you're doing for AI and the community, just helping people out. Like I said, follow Rory; you can learn a lot if this is what you're trying to learn, how to create imagery and videos with AI. And I really thank everybody who joined us. We had a pretty big crowd on both LinkedIn and definitely on the Zoom call today, so I appreciate all of you taking an hour out of your day to come and geek out with us on AI and how to do things. And again, thank you, Rory, so much.

Rory:

Thank you for having me.

Isar:

Have a great day, everyone.
