A powerful new form of artificial intelligence has burst onto the scene and captured the public’s imagination in recent months: text-to-image AI.
Text-to-image AI models generate original images based solely on simple written inputs. Users can input any text prompt they like—say, “a cute corgi lives in a house made out of sushi”—and, as if by magic, the AI will produce a corresponding image. …
These models produce images that have never existed in the world nor in anyone’s imagination. They are not simple manipulations of existing images on the Internet; they are novel creations, breathtaking in their originality and sophistication.
The most well-known text-to-image model is OpenAI’s DALL-E. OpenAI debuted the original DALL-E model in January 2021. DALL-E 2, its successor, was announced in April 2022. DALL-E 2 has attracted widespread public attention, catapulting text-to-image technology into the mainstream.
In the wake of the excitement around DALL-E 2, it hasn’t taken long for competitors to emerge. Within weeks, a lightweight open-source version dubbed “DALL-E Mini” went viral. Unaffiliated with OpenAI or DALL-E, DALL-E Mini has since been rebranded as Craiyon following pressure from OpenAI.
In May, Google published its own text-to-image model, named Imagen. …
Soon thereafter, a startup named Midjourney emerged with a powerful text-to-image model that it has made available for public use. Midjourney has seen astonishing user growth: launched only two months ago, the service has over 1.8 million users in its Discord group as of this writing. Midjourney has recently been featured on the cover of The Economist and on John Oliver’s late-night TV show.
Another key entrant in this category is Stability.ai, the startup behind the Stable Diffusion model. Unlike any other competitor, Stability.ai has publicly released all the details of its AI model, publishing the model’s weights online for anyone to access and use. This means that, unlike DALL-E or Midjourney, there are no filters or limitations on what Stable Diffusion can be used to generate—including violent, pornographic, racist, or otherwise harmful content.
Stability.ai’s completely unrestricted release strategy has been controversial. At the same time, the company’s unapologetically open ethos is helping it build a strong community of developers and users around its platform, which may prove to be a valuable competitive advantage.
There is much to be said about the groundbreaking technology that underlies today’s generative AI, but one key innovation in particular is worth briefly highlighting: diffusion models. Originally inspired by concepts from thermodynamics, diffusion models have seen a surge of popularity over the past year, rapidly displacing generative adversarial networks (GANs) as the go-to method for AI-based image generation. DALL-E 2, Imagen, Midjourney and Stable Diffusion all use diffusion models.
In a nutshell, diffusion models learn by corrupting their training data with incrementally added noise and then figuring out how to reverse this noising process to recover the original image. Once trained, diffusion models can then apply these denoising methods to synthesize novel “clean” data from random input.
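The forward half of that process, corrupting data with incrementally added noise, can be sketched in a few lines. The snippet below is an illustrative toy, not the implementation of any particular model: the function names and the linear noise schedule are assumptions chosen for clarity.

```python
import numpy as np

def make_noise_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear schedule of per-step noise variances (betas), an illustrative
    choice; real models use various schedules."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    # Cumulative product: fraction of the original signal surviving at step t.
    alpha_bars = np.cumprod(alphas)
    return alpha_bars

def add_noise(image, alpha_bars, t, rng):
    """Jump directly to noising step t:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    eps = rng.standard_normal(image.shape)
    x_t = np.sqrt(alpha_bars[t]) * image + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
image = rng.random((8, 8))            # stand-in for a training image
alpha_bars = make_noise_schedule(1000)

slightly_noisy, _ = add_noise(image, alpha_bars, t=10, rng=rng)
nearly_pure_noise, _ = add_noise(image, alpha_bars, t=999, rng=rng)
```

Training teaches a neural network to predict the added noise from the corrupted image; generation then runs the schedule in reverse, starting from pure random noise and denoising step by step until a clean image emerges.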
Stepping back, what are we to make of all the recent activity and buzz in this space? Where will things go from here? Below are four hot takes that aim to cut through the noise and give you novel perspectives on the wild new world of generative AI. …
3. Text-to-image AI will unleash a hornet’s nest of copyright, legal, and ethical issues. Don’t expect these to slow the technology down.
Any new technology that offers to profoundly shake up the status quo will generate frictions and challenges with existing societal norms and policy frameworks. Generative AI is no exception.
There are a number of big-picture issues that this technology raises: the ever-present topic of AI-driven job displacement, the looming threat of deepfakes that these models intensify, the philosophical question of what constitutes true art and whether AI can ever create it. There are no easy answers to these questions, and the public discourse about them will continue for years.
There is one near-term issue that is worth briefly touching on here: the question of who owns and has the right to commercialize the images that these models produce.
Can the person who came up with a text prompt and fed it into an AI model take the resulting image and do whatever he or she likes with it (including in a commercial setting)? Or does the organization that built the AI model retain rights to all media that the model produces? What if the AI model is open source?
Complicating things further, consider the fact that the way companies like Google and OpenAI create these models in the first place is by training them on vast troves of publicly available images that those companies do not own, including the work of countless other artists, designers and organizations.
These questions are not just theoretical; they will have very real and immediate business consequences. Whether and how these issues are resolved will have a significant impact on the strategies and opportunities available to companies working with this technology. Entrepreneurs and investors need to pay attention. …
OpenAI’s currently stated policy is that DALL-E’s individual users get full rights to commercialize the images that they create with the model—including the right to reprint, sell, or merchandise the images—but that OpenAI retains ultimate ownership over the original images. Midjourney’s terms of service say something similar.
But when high-stakes disputes involving these images inevitably get litigated, will courts see it this way? This is uncharted territory; no direct legal precedent exists.
Jim Flynn, senior partner at law firm Epstein Becker & Green, provided a concrete example that illustrates the dynamics at play: “If I were representing one of the advertising agencies, or the clients of the advertising agencies, I wouldn’t advise them to use this software to create a campaign, because I do think the AI provider would [currently] have some claims to the intellectual property. I’d be looking to negotiate something more definitive.”
Ultimately, these issues should be seen not as showstoppers for the technology but rather as unresolved points that will be in play as this nascent industry barrels ahead at full speed. Make no mistake: legal ambiguity will not deter entrepreneurs and technologists from pushing forward the state of the art in this field and from building businesses that bring this technology to the masses.