Fea­ture  ·  Tech­nol­o­gy & Visu­al Cul­ture

The Image and the Machine


How AI image generation grew from a research curiosity into something nobody quite knows how to categorise

For most of history, making a convincing image required technical skill, artistic training, or both. That changed with remarkable speed. Within the space of roughly five years, AI-powered image generation moved from producing blurry, vaguely face-shaped smears to rendering photorealistic portraits, architectural concepts, and fantasy landscapes of a quality that would have taken a skilled illustrator days to achieve. The technology did not creep forward. It arrived.

Where it came from

The foun­da­tions were laid in 2014 with the intro­duc­tion of Gen­er­a­tive Adver­sar­i­al Net­works — GANs — in which two neur­al net­works are set against each oth­er: one gen­er­at­ing images, one cri­tiquing them, each improv­ing in response to the oth­er. Ear­ly results were mod­est by cur­rent stan­dards, but the prin­ci­ple was estab­lished. By the late 2010s, GAN-based sys­tems were pro­duc­ing syn­thet­ic faces indis­tin­guish­able from pho­tographs, which attract­ed both gen­uine admi­ra­tion and a rea­son­able amount of unease.

The more significant shift came with diffusion models, a different approach in which an image is gradually reconstructed from noise, guided by a text prompt. DALL-E 2, released by OpenAI in 2022, brought this within reach of ordinary users for the first time. Midjourney, Stable Diffusion, and Adobe Firefly followed in quick succession, each with a different emphasis: Midjourney for aesthetic richness, Stable Diffusion for open-source flexibility, Firefly for professional integration with existing creative tools. Platforms like NightCafé assembled multiple models under one roof, adding community features that gave the whole enterprise a social dimension it had previously lacked.

What it is used for

The appli­ca­tions divide fair­ly clean­ly into pro­fes­sion­al and per­son­al, though the bound­ary between them is increas­ing­ly porous. On the pro­fes­sion­al side, adver­tis­ing agen­cies use AI gen­er­a­tion for rapid con­cept visu­al­i­sa­tion — pro­duc­ing a dozen mood-board options in the time it would pre­vi­ous­ly have tak­en to brief a sin­gle illus­tra­tor. Game stu­dios gen­er­ate tex­ture vari­a­tions and back­ground assets. Pub­lish­ers com­mis­sion cov­er con­cepts. Archi­tects pro­duce atmos­pher­ic ren­ders of unbuilt spaces. Film and tele­vi­sion use it exten­sive­ly in pre-pro­duc­tion, where the speed of iter­a­tion mat­ters more than the pol­ish of the final image.

Mar­ket­ing depart­ments have adopt­ed it with par­tic­u­lar enthu­si­asm, for the straight­for­ward rea­son that it pro­duces usable visu­al con­tent at a frac­tion of the pre­vi­ous cost. This is, depend­ing on where you stand, either a wel­come democ­ra­ti­sa­tion of visu­al pro­duc­tion or a sig­nif­i­cant struc­tur­al prob­lem for the illus­tra­tion and stock pho­tog­ra­phy indus­tries. Both things are true simul­ta­ne­ous­ly, which is an uncom­fort­able posi­tion that the indus­try has not yet ful­ly resolved.

The tech­nol­o­gy did not grad­u­al­ly replace human image-mak­ing. It appeared beside it, doing some of the same things faster and cheap­er, and left every­one to work out the impli­ca­tions.

Who does it

The user base is broad­er than the tech­nol­o­gy press tends to acknowl­edge. Pro­fes­sion­al design­ers and art direc­tors rep­re­sent one seg­ment — using gen­er­a­tion tools as part of an exist­ing work­flow rather than as a replace­ment for it. A larg­er seg­ment con­sists of hob­by­ists: pho­tog­ra­phers, graph­ic design enthu­si­asts, peo­ple with a visu­al imag­i­na­tion and no par­tic­u­lar train­ing in tra­di­tion­al media, for whom these tools rep­re­sent the first gen­uine­ly acces­si­ble route to mak­ing the images in their heads. Plat­forms with com­mu­ni­ty fea­tures and dai­ly free cred­its have encour­aged a cul­ture of reg­u­lar, habit­u­al cre­ation — peo­ple who gen­er­ate an image every morn­ing the way oth­ers do a cross­word.

There is also a grow­ing cat­e­go­ry of cre­ators who have built com­mer­cial oper­a­tions around AI-gen­er­at­ed work: print-on-demand mer­chan­dise, stock image libraries, self-pub­lished illus­trat­ed books. The results vary con­sid­er­ably in qual­i­ty and orig­i­nal­i­ty, and the mar­ket is becom­ing crowd­ed, but the com­mer­cial via­bil­i­ty is real for those who approach it with some rigour.

Hobby or something more

The hon­est answer is both, in rough­ly equal mea­sure, and the dis­tinc­tion may mat­ter less than it seems. The hob­by­ist argu­ment — that AI gen­er­a­tion is essen­tial­ly a pas­time for peo­ple who wish they could draw — under­es­ti­mates what skilled prompt­ing actu­al­ly involves. Direct­ing a mod­el toward a gen­uine­ly orig­i­nal result, rather than the path of least resis­tance toward the aes­thet­i­cal­ly gener­ic, requires a devel­oped visu­al sen­si­bil­i­ty, patience, and a will­ing­ness to work against the grain of what the mod­el finds eas­i­est to pro­duce. It is not illus­tra­tion, but it is not noth­ing.

The stronger objec­tion is one of orig­i­nal­i­ty. Dif­fu­sion mod­els are trained on exist­ing images, and they show it — they are, at their most unguid­ed, very good at pro­duc­ing work that resem­bles the mid­point of every­thing they have seen. The out­put can be tech­ni­cal­ly flaw­less and aes­thet­i­cal­ly inert. Push­ing beyond that requires the human in the loop to have some­thing spe­cif­ic to say, which returns the ques­tion of mer­it to where it usu­al­ly ends up: not with the tool, but with who­ev­er is using it.

AI image gen­er­a­tion is, by now, nei­ther new nor going away. What it is — art form, pro­duc­tion tool, cre­ative hob­by, or slow-motion dis­rup­tion of an indus­try — depends almost entire­ly on who is doing it and why. Which is, come to think of it, true of most things.

Posts about AI Image Generation

Fifties retro sci-fi street scene - commuters, flying buses, drones

Absurdity Day

National Absurdity Day is celebrated on November 20th. It is an unofficial “fun” holiday, encouraging people to be aware of, and to celebrate, the illogical and nonsensical aspects of everyday life.

The Toad

The Toad was­n’t a real toad. It didn’t need to be. It was the kind of crea­ture that lurks in the damp cor­ners of child­hood night­mares, in the half-remem­bered warn­ings of old sto­ries.

GPT Image Creation

AI (of course AI) says that GPT is revolutionizing image generation, particularly with the release of state-of-the-art models like GPT Image 2.0, which promises near-perfect prompt adherence, advanced visual reasoning, and the ability to generate and accurately render complex text directly inside images.