How AI Image Generation Can Be Used In Visual Marketing
ChatGPT blew everyone’s minds at the end of last year, and now visual AI is blowing everyone’s eyes. It has walked naked and muscular into the bar demanding your clothes, your boots, and your motorcycle, so choose wisely here. There’s a mountain of hype around all things “AI” at the moment, but how can AI image generation actually be used in visual marketing?
In more ways than we might think!
How does AI image generation work?
There are already tons of resources on how AI image generation works, so we’ll stick to the short of it: it’s essentially image recognition run in reverse, gaffer-taped to networks that generate images from random noise. This video goes into the nitty-gritty of how the networks are trained, for the curious!
Because these networks are trained on billions of images until they develop a tendency towards certain outputs, you can start with a prompt like “photo of chicken and chips” and end up with something like:
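The core loop behind that, stripped of all the real machinery, is: start from random noise and repeatedly nudge it towards something the model considers plausible. The toy sketch below illustrates only that idea; nothing here is a real diffusion model, and the hand-made `target` array just stands in for what a trained network has learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for what a trained network considers a "plausible image":
# a tiny 4x4 greyscale gradient. In a real diffusion model this
# knowledge lives in billions of learned weights, not a fixed array.
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)

# Generation starts from pure random noise...
image = rng.normal(loc=0.0, scale=1.0, size=(4, 4))

# ...and iteratively removes a little of that noise each step,
# nudging the picture towards the plausible output.
for _ in range(50):
    image = image + 0.1 * (target - image)

# After enough steps, the noise has been refined into the target.
error = np.abs(image - target).mean()
```

The point of the sketch is the shape of the process, not the maths: every step throws away a bit of randomness and keeps a bit of structure, which is why a text prompt can steer pure static into chicken and chips.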
Which, quite frankly, is amazing, and a little scary, and would probably be appetising if I ate meat.
Can image generation be used in marketing?
At the moment, it’s early days. Midjourney is, so far, the most impressive image generator that I’ve seen, but it still has a bunch of problems: with accuracy, with bias, and with a copyright status that remains legally unclear. Many of the artists, photographers, and designers whose images were used to train the algorithms (for most image AIs) presently have no recourse to benefit from that, or any meaningful way to opt out.
The question, then, is what would we want to use AI for in a marketing context?
And the easy answer here is usually something along the lines of “well, we can speed up creative work and remove the need to take pictures and stuff I guess.”
So, let’s play this out.
Let’s imagine that we’ve just landed a big new client, a burger chain, just about to go for a massive expansion. Let’s call them B-Chops Burgs, because why not? They’re ready to spend stacks of cash, and we’re ready to pocket that. So, let’s see what we can get for “a person eating burger”:
A bit wacky, but alright – they might be up for that. Let’s see if we can massage the output a little with “photo advert of a man eating a burger”:
Ok, so there are a few problems here: the dude on the top left has three arms, and the rest seem to have the same archetypal “guy” haircut… but they look pretty cool. B-Chops might be on board with a ‘bold’ style like this.
Let’s see if we can get a more naturalistic image to offset these with “natural photo of a man eating a burger at a burger shop table. Interior lighting, the table is a little messy. He looks happy.”:
And these are also kinda cool, but they all seem to trend towards the same lighting, towards the same framing, towards the same colour palette. So there are only so many iterations we’ll get here before we’re in some sort of fever dream. Perhaps we need to be even more explicit:
“natural photo of a man eating a burger at a burger shop table during the day. He’s wearing a bus driver uniform. You can see a London street through the window. Fluorescent interior lighting. The table is a little messy. He looks happy.”
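That progression, starting broad and layering in subject, setting, lighting, and mood, is easy to systematise once you’re iterating with a client. A minimal sketch, using only the prompt fragments from the walkthrough above (the `build_prompt` helper is hypothetical, not part of any tool):

```python
def build_prompt(subject, *details):
    """Join a base subject with any number of refining detail clauses."""
    return ". ".join([subject, *details])

# The walkthrough's progression, from broad to specific:
broad = build_prompt("photo advert of a man eating a burger")
specific = build_prompt(
    "natural photo of a man eating a burger at a burger shop table during the day",
    "He's wearing a bus driver uniform",
    "You can see a London street through the window",
    "Fluorescent interior lighting",
    "The table is a little messy",
    "He looks happy",
)
```

Keeping the subject and the refinements separate like this makes it trivial to swap one detail at a time, which is exactly what you end up doing when the client asks for “the same, but sunnier”.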
The lighting is still trending towards the same palette as before, high-contrast warm tones with lots of darkness at the frame edges, but these are very, very impressive. I’ve actually met the guy in the hi-viz before in a Lewisham pub – nice guy!
So, what’s the catch here?
Midjourney is a little over a year old at this point, and it’s good enough to spit out images like these. I think that the technical problems are going to be sorted in no time. We’ve gone from an irritating paper clip popping up at the least convenient times to this in a few decades and, even in the last couple of years, the technology’s bounded from hilariously bad pixel vomit to something amazingly convincing.
The real problem here is that, for all this young man’s rugged good looks, this isn’t a real person and this isn’t a real world. I don’t think the value here is going to be in generating the worlds from the get-go, I think the real value of AI image generators is going to be in the boring parts: helping creatives fix things.
The real value of AI image generators
The ability to generate an image of a handsome man in a burger shop is technically impressive, but not very business-useful – for two reasons:
- Everyone gets bored of seeing the same fundamental composition, lighting, expression, and palette.
- It’s actually quite easy to hire a model and book out a location for a few hours of shooting – you have full creative control and you’re capturing real people in real places that can actually be visited.
I don’t want to go to B-Chops Burgs in some in-between-nowhere part of my city in order to enter some moodily-lit half-dream where everyone looks great but none of the menus say anything.
Marketing works because it displays real, lived experience (even if aspirationally) and ties that to something I can buy or vote for.
So I don’t think getting rid of creative workers is the end result here, as much as some may think/want that to be the case – the number of times I’ve seen some viral tweet along the lines of “Graphic Designers Your Job Is Over Look At This Piece Of Crap I Just Generated” is astounding… but I won’t rant here because I’m on the clock.
The real use case is that generating something from prompts and inputs is the same kind of ability that Photoshop offers with “smart paint”, magic object selection, and “healing” images. Doing the same things manually took ages – I remember having to spend literal hours path-selecting complex objects like people with hair – feathering, fixing, crying. The ability to fix things quickly is what’s helpful.
Let’s say we’ve hired the models, we’ve booked out the burger bar, and we’ve taken the image of the bus driver. We’re happy with the composition but actually we need a lot more room around him for graphic elements for the rest of the design. Historically, we were stuffed. You’d either have to add a process colour around it (like a thick, solid-colour border), which might not suit what you’re going for, or try to do some mirroring & blur, which is very difficult to make look natural.
Now, we can plug that image into Midjourney and yell “computer, enhance” like Harrison Ford in Blade Runner:
This is kind of a big deal. And this is the revolutionary thing – not the ability to generate kinda-average compositions. We’ve already seen this tied into professional tools: Photoshop’s Content-Aware Fill, for example, is a serious time-saver, because so much of a designer’s time is spent tweaking, massaging, and fixing images to suit designs.
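Under the hood, most outpainting workflows start from the same place: the original photo pasted onto a larger, partly transparent canvas, with the empty margin left for the model to fill. A minimal sketch with Pillow – the sizes are arbitrary, and the grey rectangle stands in for the real photograph, which a real workflow would `Image.open()` instead:

```python
from PIL import Image

# Stand-in for the shot we're happy with; a real workflow would
# open the photograph from disk instead of creating a grey block.
photo = Image.new("RGB", (512, 512), (90, 90, 90))

# A wider, fully transparent canvas: 50% more room on the left for
# the graphic elements the design needs.
canvas = Image.new("RGBA", (768, 512), (0, 0, 0, 0))
canvas.paste(photo, (256, 0))  # pin the original to the right-hand side

# The transparent left margin is the region an outpainting model is
# asked to fill so the new surroundings blend with the real photo.
```

Tools like Midjourney’s pan/zoom and Photoshop’s Generative Expand hide this step, but it’s useful to know that the real pixels stay untouched – only the margin is invented.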
Of course, it’s not ideal to invent new surroundings for an image, but it’s infinitely better than having to use a completely different image. And when removing elements from an image, you’re not necessarily bothered about full-fidelity backgrounds; they just need to look the part enough to get the job done.
This opens up a lot that wasn’t even possible before: “Ah, gosh, we actually didn’t want the guy centred.” – Just pan to the left.
Computer, enhance, enter hyperspeed, find the flux etc etc… I suppose this is pretty Sci-Fi stuff.