I feel very weird about DALL-E 2, OpenAI’s new text-to-image artificial-intelligence engine. If you are unfamiliar, it’s a system that takes written prompts and produces at-times stunning, high-resolution images in an array of artistic styles.
One reason I feel weird is that I don’t really understand the technology as well as I’d like. I think that I get the broad strokes. I understand that DALL-E 2 was supposedly trained on roughly 650 million image-text pairs that were scraped from the internet, and that it uses that data set to make connections between images and the words that described them. I also know that DALL-E 2 uses a process called diffusion to generate images from text. In a great article for IEEE Spectrum, Eliza Strickland described diffusion as a process “which begins with a random pattern of dots and slowly alters the pattern to create an image.”
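For readers who, like me, want a slightly more concrete intuition, the idea Strickland describes can be sketched in a few lines of Python. This is a toy illustration only, not DALL-E 2’s actual model: a real diffusion model learns its denoising step from those millions of image-text pairs, whereas this sketch simply blends random noise toward a known target image to show the “start from random dots, slowly alter the pattern” loop.

```python
import numpy as np

def toy_diffusion(target, steps=50, seed=0):
    """Toy sketch of the diffusion idea: begin with a random
    pattern of dots and repeatedly alter it toward a coherent
    image. (A real diffusion model learns its denoising step
    from training data; here the 'denoiser' just blends toward
    a known target, purely for illustration.)"""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal(target.shape)  # pure random noise
    for _ in range(steps):
        # Each step removes a little noise, nudging the pattern
        # toward structure.
        image = 0.9 * image + 0.1 * target
    return image

# A tiny 4x4 gradient stands in for the image the model "wants".
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)
result = toy_diffusion(target)
# After enough steps, the initial noise has largely washed out
# and `result` closely resembles `target`.
```

The design point the sketch makes is just the iterative one: no single step produces an image, but fifty small denoising steps in a row do.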
All of that makes sense to me in the simplest sense. But go a step further, and I’m lost. Which is probably why the images that DALL-E 2 generates feel to me like magic. Intellectually, I understand that the pictures the engine returns when you type “full body photo of a horse in a space suit” are just the output of a machine that has processed a lot of images, learned associations, and can replace one element of an image with another. I know that DALL-E 2 isn’t sentient and what it does isn’t magic. But big leaps in technology often feel a bit like magic. And because DALL-E 2 spits out colorful, detailed, high-resolution images, it’s an especially evocative use case of artificial intelligence, and its effects feel very profound. Other AI engines like GPT-3, which generates human-sounding text, are also powerful pieces of technology, but there’s something about DALL-E 2’s visual nature that elicits a particularly potent response. Typing anything you can imagine into a little box and having it render in front of your eyes in seconds feels like something cribbed from science fiction—like some kind of digital witchcraft.
For the past few months, my Twitter feed has been full of people with early access to DALL-E 2 posting their creations after playing with the technology (as well as with other, less powerful AI text-to-image generators like Midjourney and the unaffiliated DALL-E-mini, now known as Craiyon). Usually the results appear in my timeline as a series of four images alongside the prompt that the person typed in. Some of them are prosaic—like a photo of an animal dressed like a person or a Renaissance painting of a dog. Others, though, are wildly imaginative. There’s a certain type of creative person who, when given access to the tool, tries to push the limits of the technology. Their goal is to try to confuse DALL-E 2 with ridiculous, fantastical prompts or to find the places where the machine cannot match the associations a human brain is capable of. And in the process of trying to stretch the limits of the AI’s associative abilities, they often create something wonderful.
Two of my favorite DALL-E 2 prompt creators are Jason Scott and Andy Baio. For the last few weeks, they’ve been posting delightful, interesting images. Here are some that I love:
The two have entered into an informal competition of sorts. Baio seems to build out some of the more interesting photorealistic renderings, while Scott is constantly trying to break DALL-E 2:
What I love about these prompts is that reading them feels a bit like being granted access into the pair’s brains. Scott is an artist, as well as a self-described “history boy” with a deep reservoir of random knowledge. His background means he’s able to force DALL-E 2 to spit out extra-weird stuff. So instead of asking the engine to generate a “lion using a computer,” his brain adds in an extra complication, and we all get “exquisite royal tapestry depicting a lion using a computer.”
I don’t have access to DALL-E 2 yet, so I reached out to both of them to get their thoughts about the technology, the conflicted nature of AI-generated art, and what makes for a great prompt.
“I’m trying to satisfy my own curiosities,” Baio told me. “Often that means throwing out interesting or funny juxtapositions—it seems that I’m usually drawn to high-low culture mashups.” Like this:
Baio told me he doesn’t like to be overly specific with his prompts. “I want to give it enough room to generate creative results,” he said. In the process, he told me, he’s learning DALL-E 2’s limitations. The engine doesn’t generate text well at all—it can generally only create short words. It’s bad at creating faces, and most trademarked material and images of famous people are blocked so that people don’t abuse the software. But Baio has also found more interesting limitations. “Anything that’s an opposite, like a horse riding a man or a hand with six fingers, is a real struggle for it,” he said. “We want to think that DALL-E has this wild imagination and that it is capable of generating wildly unusual images, but because it’s been trained on tens of thousands of images of people with 10 fingers, it really struggles.”