Tech's New Frontier Raises a “Buffet of Unwanted Questions”

If tools like DALL-E 2 really are the next great leap, it’s worth thinking about who owns that future, and what we want it to look like.

An AI-generated picture of "robots painting pictures on easels detailed dystopian future sci fi landscape." (Midjourney)

I feel very weird about DALL-E 2, OpenAI’s new text-to-image artificial-intelligence engine. If you are unfamiliar, it’s a system that takes written prompts and produces high-resolution images, at times stunning ones, in an array of artistic styles.

One reason I feel weird is that I don’t really understand the technology as well as I’d like. I think that I get the broad strokes. I understand that DALL-E 2 was reportedly trained on roughly 650 million image-text pairs scraped from the internet, and that it uses that data set to make connections between images and the words that describe them. I also know that DALL-E 2 uses a process called diffusion to generate images from text. In a great article for IEEE Spectrum, Eliza Strickland described diffusion as a process “which begins with a random pattern of dots and slowly alters the pattern to create an image.”
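
To make Strickland’s one-sentence description slightly more concrete, here is a minimal toy sketch in Python of the iterative idea behind diffusion. Everything in it is a stand-in, not OpenAI’s actual method: a real system like DALL-E 2 uses a large trained neural network, conditioned on the text prompt, to decide how to alter the pattern at each step, while this toy’s `toy_denoise_step` just nudges random noise toward a known target.

```python
import numpy as np

def toy_denoise_step(x, t, target):
    """Hypothetical stand-in for a trained denoising network.

    A real diffusion model predicts which noise to strip away using
    statistics learned from training images, conditioned on the text
    prompt; this toy simply nudges the noisy array toward a known target.
    """
    return x + (target - x) / (t + 1)

rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))   # stand-in for "the image the model wants"
x = rng.normal(size=(64, 64, 3))   # begin with a random pattern of dots

# Reverse diffusion, per Strickland's description: start from noise and
# slowly alter the pattern, step by step, until an image emerges.
for t in reversed(range(50)):
    x = toy_denoise_step(x, t, target)
```

The structure, not the math, is the point: each pass replaces a little randomness with a little structure, which is why early diffusion samples look like static and late ones look like pictures.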

All of that makes sense to me in the simplest sense. But go a step further, and I’m lost. Which is probably why the images that DALL-E 2 generates feel to me like magic. Intellectually, I understand that the pictures the engine returns when you type “full body photo of a horse in a space suit” are just the predictions of a machine that has processed a lot of images and can make connections, as well as replace one or more elements of an image with another. I know that DALL-E 2 isn’t sentient and that what it does isn’t magic. But big leaps in technology often feel a bit like magic. And because DALL-E 2 spits out colorful, detailed, high-resolution images, it’s an especially evocative use case of artificial intelligence, and its effects feel very profound. Other AI engines like GPT-3, which generates human-sounding text, are also powerful pieces of technology, but there’s something about DALL-E 2’s visual nature that elicits a particularly potent response. Typing anything you can imagine into a little box and having it render in front of your eyes in seconds feels like something cribbed from science fiction—like some kind of digital witchcraft.

For the past few months, my Twitter feed has been full of people with early access to DALL-E 2 posting their creations after playing with the technology (as well as with other, less powerful AI text-to-image generators like Midjourney and the unaffiliated DALL-E mini, now known as Craiyon). Usually the results appear in my timeline as a series of four images alongside the prompt that the person typed in. Some of them are prosaic—like a photo of an animal dressed like a person or a Renaissance painting of a dog. Others, though, are wildly imaginative. There’s a certain type of creative person who, when given access to the tool, tries to push the limits of the technology. Their goal is to confuse DALL-E 2 with ridiculous, fantastical prompts, or to find the places where the machine cannot match the associations a human brain is capable of. And in stretching the AI’s associative abilities, they often create something wonderful.

Two of my favorite DALL-E 2 prompt creators are Jason Scott and Andy Baio. For the last few weeks, they’ve been posting delightful, interesting images. Here are some that I love:

The two have entered into an informal competition of sorts. Baio seems to build out some of the more interesting photorealistic renderings, while Scott is constantly trying to break DALL-E 2:

What I love about these prompts is that reading them feels a bit like being granted access to the pair’s brains. Scott is an artist, as well as a self-described “history boy” with a deep reservoir of random knowledge. His background means he’s able to force DALL-E 2 to spit out extra-weird stuff. So instead of asking the engine to generate a “lion using a computer,” his brain adds an extra complication, and we all get “exquisite royal tapestry depicting a lion using a computer.”

I don’t have access to DALL-E 2 yet, so I reached out to both of them to get their thoughts about the technology, the conflicted nature of AI-generated art, and what makes for a great prompt.

“I’m trying to satisfy my own curiosities,” Baio told me. “Often that means throwing out interesting or funny juxtapositions—it seems that I’m usually drawn to high-low culture mashups.” Like this:

Baio told me he doesn’t like to be overly specific with his prompts. “I want to give it enough room to generate creative results,” he said. In the process, he told me, he’s learning DALL-E 2’s limitations. The engine doesn’t generate text well at all—it can generally only create short words. It’s bad at creating faces, and most trademarked material and images of famous people are blocked so that people don’t abuse the software. But Baio has also found more interesting limitations. “Anything that’s an opposite, like a horse riding a man or a hand with six fingers, is a real struggle for it,” he said. “We want to think that DALL-E has this wild imagination and that it is capable of generating wildly unusual images, but because it’s been trained on tens of thousands of images of people with 10 fingers, it really struggles.”

Scott takes it even further. “It’s so easy to ask this thing to draw a cat in the style of Rembrandt. And that’s fun at first, but now I’m up to abstract stuff. I’m like, ‘Please draw an embarrassed sky.’” Like Baio, he’s fascinated by the unexpected side effects of these mashups. “It’s like, wow, I didn’t know the machine would make these associations to do this!”

For Baio, the most staggering of these associations was when he asked DALL-E 2 to conjure “two slugs in wedding attire getting married, stunning editorial photo for bridal magazine shot at golden hour.”

“It blew my entire mind,” Baio told me. The first time he did the prompt he was too specific, and included words like tuxedos. The results were fine, he said, but boring. It wasn’t until he went a little less obvious that the output changed in a delightful way. “Wedding attire was vague enough that it coughed up all these odd variations. The results were so unique,” he said. “One slug had a flower on its head; another had this cottontail honeycomb headdress. It was all stuff I would never have thought of, and yet it all made total sense.” For Baio, it was one of the moments where DALL-E 2 behaved as if it were truly intelligent, or at least imaginative in an almost human way. When he posted it to Twitter, one person responded, “This is fucking remarkable and I wonder if we have a Mechanical Turk situation here. It’s too good.”

Listening to Scott and Baio detail their prompt creations underscored for me that DALL-E 2 is a tool that one can learn to master, not unlike, say, Photoshop. Over time, one can develop an understanding of how to manipulate the engine.

Among DALL-E 2 aficionados, prompt crafting has become its own very specific art. Guy Parsons, a researcher testing the AI engine, wrote an entire ebook about prompt crafting, complete with suggestions, hacks, and shortcuts to get DALL-E 2 to understand what a person is really after. Reading through the prompt book is a bit like reading a guide to writing descriptive fiction; its pages are full of hyper-specific adjectives, as well as technical image genres, to help people get closer to the specific vibes they are trying to evoke. For example, DALL-E 2 is especially adept at generating an image in the style of a specific popular artist, so adding “Annie Leibovitz” to a photo descriptor or “Picasso” to a painting descriptor will yield solid results, because the AI has a wealth of images to draw from.
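
For what it’s worth, the mechanics of prompt crafting can be sketched in a few lines of Python: a subject gets combined with medium, style, and lighting modifiers. The lists below are my own illustrative examples, not drawn from Parsons’s book.

```python
import itertools

# Illustrative modifier lists only; Parsons's ebook catalogs far richer
# vocabularies of mediums, styles, and lighting terms than these.
subject = "a lion using a computer"
mediums = ["editorial photo", "oil painting", "exquisite royal tapestry"]
styles = ["in the style of Annie Leibovitz", "in the style of Picasso"]
lighting = ["shot at golden hour", "dramatic studio lighting"]

# Enumerate every combination of medium, style, and lighting modifier.
for medium, style, light in itertools.product(mediums, styles, lighting):
    print(f"{medium} of {subject}, {style}, {light}")
```

The craft, as Baio’s slug-wedding story suggests, is in knowing which of these slots to leave vague enough for the engine to surprise you.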

But DALL-E 2’s ability to reproduce in the style of specific artists is also part of the reason that many observers—myself included—have real reservations about the technology.

“DALL-E is trained on the creative work of countless artists, and so there’s a legitimate argument to be made that it is essentially laundering human creativity in some way for commercial product,” Baio told me. “I am convinced that this will all end up going to court some day.” It’s easy to imagine the scenario playing out: A litigious artist frustrated by a viral piece of work based on their style could very well argue that OpenAI took their art without compensation to train its engine and turn a profit. And because OpenAI already limits which images go into the engine and which prompts it blocks, a good lawyer could argue that the company has total control over what goes into its system and what comes out, and only selectively removes art from entities it’s worried about upsetting.

When I talked to Scott about the implications of this technology, he described DALL-E 2 as “an arm holding an open can of worms that can generate more arms opening up more cans of worms.” What he’s saying is that, as with any technological leap, there’s going to be a series of deeply disruptive externalities—in this case, that anyone willing to shell out a few bucks might have access to on-demand art direction.

“There’s a realm of commercial art that this destroys,” Scott said. “It’s the kind of art where the art director commissions you to take stock photos of a McDonald’s for the slush pile so that they can dip into [it] whenever there’s a story that features McDonald’s. It’s a better version of the clip-art department.” As a one-time working artist, Scott sees the potential issues, but he told me he’s also not sure that DALL-E 2 represents some great paradigm shift. “I think what’s happening here is that we’re seeing a new set of people—this time it’s creative types—who didn’t ever believe that their job could be slightly, crappily automated. And they’re going to be understandably mad.”

But in this messy, possibly litigious future, there’s also something akin to the democratization of art. “There are going to be a lot of tough dialogues as a result of this,” Scott said. “Will people really say, ‘You shouldn’t be able to create this type of art if you can’t illustrate or if you can’t compose a photograph’? It goes to the question of who is allowed to be an artist. It’s basically going to be a buffet of unwanted questions,” he said.

There is, of course, the possibility that tools like DALL-E 2 or Midjourney or even Craiyon could unleash a new era of creativity. Both Baio and Scott told me they could see artists and creatives using DALL-E 2 as a starting point for illustration projects or as a mood board for brainstorming. There’s also the possibility of taking an AI-generated image and then importing it into other editing software to personalize it with a human touch. One could imagine game-design companies using DALL-E 2 to generate weird in-game imagery at scale and plugging it into virtual worlds. It’s easy to see DALL-E 2’s work less as an endpoint for artists and more as another building block.

But what unsettles me most about DALL-E 2 is its commercial aspirations. Its creator, OpenAI, is ostensibly a research laboratory, and its experiments with artificial intelligence are naturally interesting from a scientific perspective. But the company’s decision to open DALL-E 2 up to new users for a fee suggests that research is far from its only goal. Since OpenAI now operates as a capped-profit company, it’s reasonable to ask just what the endgame of all this research and development really is. Last week Sam Altman, one of the company’s founders—alongside Elon Musk—tweeted that “AI creative tools are going to be the biggest impact on creative work flows since the computer itself. We are all going to get amazing visual art, music, games, etc.” Personally, I find the phrasing a little ominous. We are all going to get. That doesn’t sound very empowering. Altman isn’t suggesting we’ll be the ones making the art or even having much of a say in it—we will simply get what we are given.

This is especially concerning when you consider the money and raw computing power behind DALL-E 2. As Baio and Scott noted, the technology will open up a series of messy conversations and ethical dilemmas. That’s to say nothing of the potential environmental concerns about the processing power and electricity it takes to make the engine work at scale. If DALL-E 2 continues to make huge technological strides and produce more and more compelling art and media, the stakes are only going to get higher. And simply put, I don’t know that I’d entrust any of those questions to a company that counts Elon Musk and Peter Thiel among its founders and early backers.

For the past decade or so, I’ve tried to approach new technologies by gaming out the following question: What happens if the founders and creators succeed in their lofty ambitions? I’m generally less worried about a technology failing completely than I am about it succeeding (and the founders failing to plan for the eventual negative externalities of its success). As with many of OpenAI’s initiatives, DALL-E 2 seems like an immediately successful and useful technology. In my conversation with Baio, we settled on the notion that it’s almost the exact opposite of hyped Web3 or blockchain technologies. While so many of those technologies sound complex, they’re actually, at their core, not that difficult to understand. And despite the hype, it’s hard to have a wow moment using a blockchain-based technology. So much of it is marketing paired with an intellectually interesting cryptographic ledger. A technology like DALL-E 2, on the other hand, is full of those wow moments the second that you see or use it. Its potential utilities are immediately understandable. And while the technology feels simple to use, under the hood it is quite complex.

I’m not, at present, worried that sentient AI is going to take over the human race. I think that, if anything, DALL-E 2’s constraints show that artificial intelligence is indeed very artificial, and lacks the improvisatory imagination of a human brain. But I am concerned about its power as a cultural and commercial tool. I’m concerned about the innumerable ethical implications that you can feel bubbling up upon using this technology. Most importantly, I’m concerned about who we’re entrusting with this technology and how those groups will use it. I get so much satisfaction watching people make joyful AI-generated art, and seeing it fill my timeline feels like peeking into the future. But if tools like DALL-E 2 really are the next great leap, it’s worth thinking about who owns that future and what exactly we want it to look like.

Charlie Warzel is a staff writer at The Atlantic and the author of its newsletter Galaxy Brain, about technology, media, and big ideas. He can be reached via email.