<https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt>


In January 2021, the artificial intelligence research laboratory OpenAI gave a 
limited release to a piece of software called Dall-E. The software allowed 
users to enter a simple description of an image they had in their mind and, 
after a brief pause, the software would produce an almost uncannily good 
interpretation of their suggestion, worthy of a jobbing illustrator or 
Adobe-proficient designer – but much faster, and for free. Typing in, for 
example, “a pig with wings flying over the moon, illustrated by Antoine de 
Saint-Exupéry” resulted, after a minute or two of processing, in something 
reminiscent of the patchy but recognisable watercolour brushes of the creator 
of The Little Prince.

A year or so later, when the software got a wider release, the internet went 
wild. Social media was flooded with all sorts of bizarre and wondrous 
creations, an exuberant hodgepodge of fantasies and artistic styles. And a few 
months later it happened again, this time with language, and a product called 
ChatGPT, also produced by OpenAI. Ask ChatGPT to produce a summary of the Book 
of Job in the style of the poet Allen Ginsberg and it would come up with a 
reasonable attempt in a few seconds. Ask it to render Ginsberg’s poem Howl in 
the form of a management consultant’s slide deck presentation and it would do 
that too. The abilities of these programs to conjure up strange new worlds in 
words and pictures alike entranced the public, and the desire to have a go 
oneself produced a growing literature on the ins and outs of making the best 
use of these tools, and particularly how to structure inputs to get the most 
interesting outcomes.

The latter skill has become known as “prompt engineering”: the technique of 
framing one’s instructions in terms most clearly understood by the system, so 
it returns the results that most closely match expectations – or perhaps exceed 
them. Tech commentators were quick to predict that prompt engineering would 
become a sought-after and well remunerated job description in a “no code” 
future, where the most powerful way of interacting with intelligent systems 
would be through the medium of human language. No longer would we need to know 
how to draw, or how to write computer code: we would simply whisper our desires 
to the machine and it would do the rest. The limits on AI’s creations would be 
the limits of our own imaginations.


Imitators of and advances on Dall-E followed quickly. Dall-E mini (later 
renamed Craiyon) gave those not invited to OpenAI’s private services a chance 
to play around with a similar, less powerful, but still highly impressive tool. 
Meanwhile, the independent commercial effort Midjourney and the open-source 
Stable Diffusion used a different approach to classifying and generating 
images, to much the same ends. Within a few months, the field had rapidly 
advanced to the generation of short videos and 3D models, with new tools 
appearing daily from academic departments and hobbyist programmers, as well as 
the established giants of social media and now AI: Facebook (aka Meta), Google, 
Microsoft and others. A new field of research, software and contestation had 
opened up.

The name Dall-E combines the robot protagonist of Pixar’s Wall-E with the
Spanish surrealist artist Salvador Dalí. On the one hand, you have the figure 
of a plucky, autonomous and adorable little machine sweeping up the debris of a 
collapsed human civilisation, and on the other a man whose most repeated bon 
mots include, “Those who do not want to imitate anything, produce nothing,” and 
“What is important is to spread confusion, not eliminate it.” Both make 
admirable namesakes for the broad swathe of tools that have come to be known as 
AI image generators.

In the past year, this new wave of consumer AI, which includes both image 
generation and tools such as ChatGPT, has captured the popular imagination. It 
has also provided a boost to the fortunes of major technology companies who 
have, despite much effort, failed to convince most of us that either blockchain 
or virtual reality (“the metaverse”) is the future that any of us want. At
least this one feels fun, for five minutes or so; and “AI” still has that 
sparkly, science-fiction quality, redolent of giant robots and superhuman 
brains, which provides that little contact high with the genuinely novel. 
What’s going on under the hood, of course, is far from new.

There have been no major breakthroughs in the academic discipline of artificial 
intelligence for a couple of decades. The underlying technology of neural 
networks – a method of machine learning based on the way physical brains 
function – was theorised decades earlier and was already being put into practice in the 1990s. You
could use them to generate images then, too, but they were mostly formless 
abstractions, blobs of colour with little emotional or aesthetic resonance. The 
first convincing AI chatbots date back even further. In 1964, Joseph 
Weizenbaum, a computer scientist at the Massachusetts Institute of Technology, 
developed a chatbot called Eliza. Eliza was modelled on a “person-centred” 
psychotherapist: whatever you said, it would mirror back to you. If you said “I 
feel sad”, Eliza would respond with “Why do you feel sad?”, and so on. 
(Weizenbaum actually wanted his project to demonstrate the superficiality of 
human communication, not to be a blueprint for future products.)

Early AIs didn’t know much about the world, and academic departments lacked the 
computing power to exploit them at scale. The difference today is not 
intelligence, but data and power. The big tech companies have spent 20 years 
harvesting vast amounts of data from culture and everyday life, and building 
vast, energy-hungry data centres filled with ever more powerful computers to 
churn through it. What were once creaky old neural networks have become 
super-powered, and the gush of AI we’re seeing is the result.

AI image generation relies on the assembly and analysis of millions upon 
millions of tagged images; that is, images that come with some kind of 
description of their content already attached. These images and descriptions 
are then processed through neural networks that learn to associate particular 
and deeply nuanced qualities of the image – shapes, colours, compositions – 
with certain words and phrases. These qualities are then layered on top of one 
another to produce new arrangements of shape, colour and composition, based on 
the billions of differently weighted associations produced by a simple prompt. 
But where did all those original images come from?
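
Before turning to that question, it may help to make the idea of “learned
associations” concrete. What follows is a deliberately toy sketch in Python,
nothing like a real diffusion model: each “image” is reduced to a single
average colour, each caption word is linked to the colours it appears
alongside, and “generation” is just a blend of those associations. The
captions and colours are invented for illustration.

    from collections import defaultdict

    # Each training "image" is reduced here to one average RGB colour,
    # paired with a human-written caption (both invented for this sketch).
    dataset = [
        ((250, 180, 190), "a pink pig with wings"),
        ((235, 235, 250), "the full moon over water"),
        ((255, 170, 200), "pink blossom in spring"),
    ]

    # "Training": associate every caption word with the colours it co-occurs with.
    associations = defaultdict(list)
    for colour, caption in dataset:
        for word in caption.lower().split():
            associations[word].append(colour)

    def imagine(prompt):
        # "Generation": blend every colour ever linked to the prompt's words.
        # Real systems layer billions of far subtler associations, but the
        # principle of recombining learned statistics is the same.
        colours = [c for word in prompt.lower().split() for c in associations.get(word, [])]
        if not colours:
            return None
        return tuple(sum(channel) // len(colours) for channel in zip(*colours))

    print(imagine("pink moon"))  # (246, 195, 213): a blend of "pink" and "moon"
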

The datasets released by LAION, a German non-profit, are a good example of the 
kind of image-text collections used to train large AI models (they provided the 
basis for both Stable Diffusion and Google’s Imagen, among others). For more 
than a decade, another non-profit web organisation, Common Crawl, has been 
indexing and storing as much of the public world wide web as it can access, 
filing away as many as 3bn pages every month. Researchers at LAION took a chunk 
of the Common Crawl data and pulled out every image with an “alt” tag, a line 
or so of text meant to be used to describe images on web pages. After some 
trimming, links to the original images and the text describing them are 
released in vast collections: LAION-5B, released in March 2022, contains more 
than five billion text-image pairs. These images are “public” images in the 
broadest sense: any image ever published on the internet may be gathered up 
into them, with exactly the kind of strange effects one may expect.
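
For a sense of how mechanical that harvesting is, here is a minimal sketch in
Python of the basic signal LAION mined: every image tag that carries both a
source URL and an alt description. It is not LAION’s actual pipeline, which
also downloads the images and filters the pairs, for instance by size and by
how well text and image match; the example page is invented.

    from html.parser import HTMLParser

    class AltTextCollector(HTMLParser):
        """Collects (image URL, alt text) pairs from a single HTML page."""
        def __init__(self):
            super().__init__()
            self.pairs = []

        def handle_starttag(self, tag, attrs):
            if tag == "img":
                a = dict(attrs)
                src = a.get("src")
                alt = (a.get("alt") or "").strip()
                if src and alt:  # keep only images that arrive with a description
                    self.pairs.append((src, alt))

    page = '<p>Story</p><img src="pig.jpg" alt="a pig with wings flying over the moon">'
    collector = AltTextCollector()
    collector.feed(page)
    print(collector.pairs)  # [('pig.jpg', 'a pig with wings flying over the moon')]
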

In September 2022, a San Francisco–based digital artist named Lapine was using 
a tool called Have I Been Trained, which allows artists to see if their work is 
being used to train AI image generation models. Have I Been Trained was created 
by the artists Mat Dryhurst and Holly Herndon, whose own work led them to 
explore the ways in which artists’ labour is coopted by AI. When Lapine used it 
to scan the LAION database, she found an image of her own face. She was able to 
trace this image back to photographs taken by a doctor when she was undergoing 
treatment for a rare genetic condition. The photographs were taken as part of 
her clinical documentation, and she signed documents that restricted their use 
to her medical file alone. The doctor involved died in 2018. Somehow, these 
private medical images ended up online, then in Common Crawl’s archive and 
LAION’s dataset, and were finally ingested into the neural networks as they 
learned about the meaning of images, and how to make new ones. For all we know, 
the mottled pink texture of our Saint-Exupéry-style piggy could have been 
blended, however subtly, from the raw flesh of a cancer patient.

“It’s the digital equivalent of receiving stolen property. Someone stole the 
image from my deceased doctor’s files and it ended up somewhere online, and 
then it was scraped into this dataset,” Lapine told the website Ars Technica. 
“It’s bad enough to have a photo leaked, but now it’s part of a product. And 
this goes for anyone’s photos, medical record or not. And the future abuse 
potential is really high.” (According to her Twitter account, Lapine continues 
to use tools like Dall-E to make her own art.)

The entirety of this kind of publicly available AI, whether it works with 
images or words, as well as the many data-driven applications like it, is based 
on this wholesale appropriation of existing culture, the scope of which we can 
barely comprehend. Public or private, legal or otherwise, most of the text and 
images scraped up by these systems exist in the nebulous domain of “fair use” 
(permitted in the US, but questionable if not outright illegal in the EU). Like 
most of what goes on inside advanced neural networks, it’s really impossible to 
understand how they work from the outside, rare encounters such as Lapine’s 
aside. But we can be certain of this: far from being the magical, novel 
creations of brilliant machines, the outputs of this kind of AI are entirely
dependent on the uncredited and unremunerated work of generations of human 
artists.

AI image and text generation is pure primitive accumulation: expropriation of 
labour from the many for the enrichment and advancement of a few Silicon Valley 
technology companies and their billionaire owners. These companies made their 
money by inserting themselves into every aspect of everyday life, including the 
most personal and creative areas of our lives: our secret passions, our private 
conversations, our likenesses and our dreams. They enclosed our imaginations in 
much the same manner as landlords and robber barons enclosed once-common lands. 
They promised that in doing so they would open up new realms of human 
experience, give us access to all human knowledge, and create new kinds of 
human connection. Instead, they are selling us back our dreams repackaged as 
the products of machines, with the only promise being that they’ll make even 
more money advertising on the back of them.

The weirdness of AI image generation exists in the output as well as the input. 
One user tried typing in nonsense phrases and was confused and somewhat 
discomforted to discover that Dall-E mini seemed to have a very good idea of 
what a “Crungus” was: an otherwise unknown phrase that consistently produced 
images of a snarling, naked, ogre-like figure. Crungus was sufficiently clear 
within the program’s imagination that he could be manipulated easily: other 
users quickly offered up images of ancient Crungus tapestries, Roman-style 
Crungus mosaics, oil paintings of Crungus, photos of Crungus hugging various 
celebrities, and, this being the internet, “sexy” Crungus.

So, who or what is Crungus? Twitter users were quick to describe him as “the 
first AI cryptid”, a creature like Bigfoot who exists, in this case, within the 
underexplored terrain of the AI’s imagination. And this is about as clear an 
answer as we can get at this point, due to our limited understanding of how the 
system works. We can’t peer inside its decision-making processes because the 
way these neural networks “think” is inherently inhuman. It is the product of 
an incredibly complex, mathematical ordering of the world, as opposed to the 
historical, emotional way in which humans order their thinking. The Crungus is 
a dream emerging from the AI’s model of the world, composited from billions of 
references that have escaped their origins and coalesced into a mythological 
figure untethered from human experience. Which is fine, even amazing – but it 
does make one ask, whose dreams are being drawn upon here? What composite of 
human culture, what perspective on it, produced this nightmare?


A similar thing happened to another digital artist experimenting with
negative prompts, a technique for generating what the system considers to be 
the polar opposite of what is described. When the artist entered “Brando::-1”, 
the system returned something that looked a bit like a logo for a video game 
company called DIGITA PNTICS. That this may, across the multiple dimensions of 
the system’s vision of the world, be the opposite of Marlon Brando seems 
reasonable enough. But when they checked to see if it went the other way, by 
typing in “DIGITA PNTICS skyline logo::-1”, something much stranger happened: 
all of the images depicted a creepy-looking woman with sunken eyes and reddened 
cheeks, who the artist christened Loab. Once discovered, Loab seemed unusually 
and disturbingly persistent. Feeding the image back into the program, combined 
with ever more divergent text prompts, kept bringing Loab back, in increasingly 
nightmarish forms, in which blood, gore and violence predominated.
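
One way to picture what “::-1” does: systems in this family turn text into
vectors, and a negative weight simply points the query away from a concept
rather than towards it. The sketch below assumes a stand-in toy_embed()
function in place of a real text encoder, and follows the Midjourney-style
“text::weight” syntax; it illustrates the arithmetic, not the workings of any
particular product.

    import hashlib
    import numpy as np

    def toy_embed(text, dim=8):
        # Stand-in for a real text encoder: a deterministic pseudo-random vector.
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        return np.random.default_rng(seed).normal(size=dim)

    def parse_weighted(prompt):
        # "Brando::-1" -> [("Brando", -1.0)]; unweighted parts default to 1.0.
        parts = []
        for chunk in prompt.split(","):
            text, sep, weight = chunk.strip().rpartition("::")
            if sep:
                parts.append((text.strip(), float(weight)))
            else:
                # no "::" present, so the whole chunk sits in `weight`
                parts.append((weight.strip(), 1.0))
        return parts

    def prompt_vector(prompt):
        # Sum the concept vectors with their weights: a negative weight pushes
        # the combined query away from that concept in the model's space.
        return sum(w * toy_embed(t) for t, w in parse_weighted(prompt))

    print(prompt_vector("a portrait, Brando::-1"))
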

Here’s one explanation for Loab, and possibly Crungus: although it’s very, very 
hard to imagine the way the machine’s imagination works, it is possible to 
imagine it having a shape. This shape is never going to be smooth or neatly 
rounded: rather, it is going to have troughs and peaks, mountains and valleys, 
areas full of information and areas lacking many features at all. Those areas 
of high information correspond to networks of associations that the system 
“knows” a lot about. One can imagine the regions related to human faces, cars 
and cats, for example, being pretty dense, given the distribution of images one 
finds on a survey of the whole internet.

It is these regions that an AI image generator will draw on most heavily when 
creating its pictures. But there are other places, less visited, that come into 
play when negative prompts – or indeed, nonsense phrases – are deployed. In
order to satisfy such queries, the machine must draw on more esoteric, less 
certain connections, and perhaps even infer from the totality of what it does 
know what its opposite may be. Here, in the hinterlands, Loab and Crungus are 
to be found.

That’s a satisfying theory, but it does raise certain uncomfortable questions 
about why Crungus and Loab look like they do; why they tip towards horror and 
violence, why they hint at nightmares. AI image generators, in their attempt to 
understand and replicate the entirety of human visual culture, seem to have 
recreated our darkest fears as well. Perhaps this is just a sign that these 
systems are very good indeed at aping human consciousness, all the way down to 
the horror that lurks in the depths of existence: our fears of filth, death and 
corruption. And if so, we need to acknowledge that these will be persistent 
components of the machines we build in our own image. There is no escaping such 
obsessions and dangers, no moderating or engineering away the reality of the 
human condition. The dirt and disgust of living and dying will stay with us and 
need addressing, just as the hope, love, joy and discovery will.

This matters, because AI image generators will do what all previous 
technologies have done, but they will also go further. They will reproduce the 
biases and prejudices of those who create them, like the webcams that only 
recognise white faces, or the predictive policing systems that lay siege to 
low-income neighbourhoods. And they will also up the game: the benchmark of AI 
performance is shifting from the narrow domain of puzzles and challenges – 
playing chess or Go, or obeying traffic laws – to the much broader territory of 
imagination and creativity.

While claims about AI’s “creativity” might be overblown – there is no true 
originality in image generation, only very skilled imitation and pastiche – 
that doesn’t mean it isn’t capable of taking over many common “artistic” tasks 
long considered the preserve of skilled workers, from illustrators and graphic 
designers to musicians, videographers and, indeed, writers. This is a huge 
shift. AI is now engaging with the underlying experience of feeling, emotion 
and mood, and this will allow it to shape and influence the world at ever 
deeper and more persuasive levels.

ChatGPT was introduced in November 2022 by OpenAI, and further shifted our 
understanding of how AI and human creativity might interact. Structured as a 
chatbot – a program that mimics human conversation – ChatGPT is capable of a 
lot more than conversation. When properly entreated, it is capable of writing 
working computer code, solving mathematical problems and mimicking common 
writing tasks, from book reviews to academic papers, wedding speeches and legal 
contracts.

It was immediately obvious how the program could be a boon to those who find, 
say, writing emails or essays difficult, but also how, as with image 
generators, it could be used to replace those who make a living from those 
tasks. Many schools and universities have already implemented policies that ban 
the use of ChatGPT amid fears that students will use it to write their essays, 
while the academic journal Nature has had to publish policies explaining why 
the program cannot be listed as an author of research papers (it can’t give 
consent, and it can’t be held accountable). But institutions themselves are not 
immune from inappropriate uses of the tool: in February, Peabody College, part
of Vanderbilt University in Tennessee, shocked students when it sent out a letter of
condolence and advice following a school shooting in Michigan. While the letter 
spoke of the value of community, mutual respect and togetherness, a note at the 
bottom stated that it was written by ChatGPT – which felt both morally wrong 
and somehow false or uncanny to many. It seems there are many areas of life 
where the intercession of machines requires some deeper thought.

If it would be inappropriate to replace our communications wholesale with 
ChatGPT, then one clear trend is for it to become a kind of wise assistant, 
guiding us through the morass of available knowledge towards the information we 
seek. Microsoft has been an early mover in this direction, reconfiguring its 
often disparaged search engine Bing as a ChatGPT-powered chatbot, and massively 
boosting its popularity by doing so. But despite the online (and journalistic) 
rush to consult ChatGPT on almost every conceivable problem, its relationship 
to knowledge itself is somewhat shaky.

One recent personal interaction with ChatGPT went like this. I asked it to 
suggest some books to read based on a new area of interest: multi-species 
democracy, the idea of including non-human creatures in political 
decision-making processes. It’s pretty much the most useful application of the 
tool: “Hey, here’s a thing I’m thinking about, can you tell me some more?” And 
ChatGPT obliged. It gave me a list of several books that explored this novel 
area of interest in depth, and described in persuasive human language why I 
should read them. This was brilliant! Except, it turned out that only one of 
the four books listed actually existed, and several of the concepts ChatGPT 
thought I should explore further were lifted wholesale from rightwing 
propaganda: it explained, for example, that the “wise use” movement promoted 
animal rights, when in fact it is a libertarian, anti-environment concept 
promoting the expansion of property rights.

Now, this didn’t happen because ChatGPT is inherently rightwing. It’s because 
it’s inherently stupid. It has read most of the internet, and it knows what 
human language is supposed to sound like, but it has no relation to reality 
whatsoever. It is dreaming sentences that sound about right, and listening to 
it talk is frankly about as interesting as listening to someone’s dreams. It is 
very good at producing what sounds like sense, and best of all at producing 
cliche and banality, which make up the majority of its diet, but it
remains incapable of relating meaningfully to the world as it actually is. 
Distrust anyone who pretends that this is an echo, even an approximation, of 
consciousness. (As this piece was going to publication, OpenAI released a new 
version of the system that powers ChatGPT, and said it was “less likely to make 
up facts”.)

The belief in this kind of AI as actually knowledgeable or meaningful is 
actively dangerous. It risks poisoning the well of collective thought, and of 
our ability to think at all. If, as is being proposed by technology companies, 
the results of ChatGPT queries will be provided as answers to those seeking 
knowledge online, and if, as has been proposed by some commentators, ChatGPT is 
used in the classroom as a teaching aid, then its hallucinations will enter
the permanent record, effectively coming between us and more legitimate, 
testable sources of information, until the line between the two is so blurred 
as to be invisible. Moreover, there has never been a time when our ability as 
individuals to research and critically evaluate knowledge on our own behalf has 
been more necessary, not least because of the damage that technology companies 
have already done to the ways in which information is disseminated. To place 
all of our trust in the dreams of badly programmed machines would be to abandon 
such critical thinking altogether.

AI technologies are bad for the planet too. Training a single AI model – 
according to research published in 2019 – might emit the equivalent of more 
than 284 tonnes of carbon dioxide, nearly five times the lifetime emissions of
the average American car, including its manufacture. These
emissions are expected to grow by nearly 50% over the next five years, all 
while the planet continues to heat up, acidifying the oceans, igniting 
wildfires, throwing up superstorms and driving species to extinction. It’s hard 
to think of anything more utterly stupid than artificial intelligence, as it is 
practised in the current era.

So, let’s take a step back. If these current incarnations of “artificial” 
“intelligence” are so dreary, what are the alternatives? Can we imagine 
powerful information sorting and communicating technologies that don’t exploit, 
misuse, mislead and supplant us? Yes, we can – once we step outside the 
corporate power networks that have come to define the current wave of AI.

In fact, there are already examples of AI being used to benefit specific 
communities by bypassing the entrenched power of corporations. Indigenous 
languages are under threat around the world. The UN estimates that one 
disappears every two weeks, and with that disappearance goes generations of 
knowledge and experience. This problem, the result of colonialism and racist 
assimilation policies over centuries, is compounded by the rising dominance of 
machine-learning language models, which ensure that popular languages increase 
their power, while lesser-known ones are drained of exposure and expertise.

In Aotearoa New Zealand, a small non-profit radio station called Te Hiku Media, 
which broadcasts in the Māori language, decided to address this disparity 
between the representation of different languages in technology. Its massive 
archive of more than 20 years of broadcasts, representing a vast range of 
idioms, colloquialisms and unique phrases, many of them no longer spoken by 
anyone living, was being digitised, but needed to be transcribed to be of use 
to language researchers and the Māori community. In response, the radio station 
decided to train its own speech recognition model, so that it would be able to 
“listen” to its archive and produce transcriptions.

Over the next few years, Te Hiku Media, using open-source technologies as well 
as systems it developed in house, achieved the almost impossible: a highly 
accurate speech recognition system for the Māori language, which was built and 
owned by its own language community. This was more than a software effort. The 
station contacted every Māori community group it could and asked them to record 
themselves speaking pre-written statements in order to provide a corpus of 
annotated speech, a prerequisite for training its model.

There was a cash prize for whoever submitted the most sentences – one activist, 
Te Mihinga Komene, recorded 4,000 phrases alone – but the organisers found that 
the greatest motivation for contributors was the shared vision of revitalising 
the language while keeping it in the community’s ownership. Within a few weeks,
Te Hiku Media had created a model that recognised recorded speech with 86%
accuracy – more than enough to begin transcribing the full archive.
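
The 86% figure is the kind of headline number produced by comparing the
model’s transcripts against human ones, word by word. A minimal sketch of such
a check in Python follows; it is not Te Hiku Media’s evaluation code (speech
recognition is more often scored by word error rate), and the example
sentences are invented.

    import difflib

    def word_accuracy(reference, hypothesis):
        # Share of reference words the transcription got right, using the longest
        # matching word sequences between the human and machine transcripts.
        ref, hyp = reference.lower().split(), hypothesis.lower().split()
        matcher = difflib.SequenceMatcher(a=ref, b=hyp)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return matched / len(ref) if ref else 0.0

    human = "tēnā koutou katoa welcome to the programme"
    machine = "tēnā koutou welcome to the programme"
    print(round(word_accuracy(human, machine), 2))  # 0.86
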


Te Hiku Media’s achievement cleared a path for other indigenous groups to 
follow, with similar projects now being undertaken by Mohawk peoples in 
south-eastern Canada and Native Hawaiians. It also established the principle of 
data sovereignty around indigenous languages, and by extension, other forms of 
indigenous knowledge. When international for-profit companies started 
approaching Māori speakers to help build their own models, Te Hiku Media 
campaigned against these efforts, arguing, “They suppressed our languages and 
physically beat it out of our grandparents, and now they want to sell our 
language back to us as a service.”

“Data is the last frontier of colonisation,” wrote Keoni Mahelona, a Native 
Hawaiian and one of the co-founders of Te Hiku Media. All of Te Hiku’s work is 
released under what it named the Kaitiakitanga License, a legal guarantee of 
guardianship and custodianship that ensures that all the data that went into 
the language model and other projects remains the property of the community 
that created it – in this case, the Māori speakers who offered their help – and 
is theirs to license, or not, as they deem appropriate according to their 
tikanga (Māori customs and protocols). In this way, the Māori language is 
revitalised, while resisting and altering the systems of digital colonialism 
that continue to repeat centuries of oppression.

The lesson of the current wave of “artificial” “intelligence”, I feel, is that 
intelligence is a poor thing when it is imagined by corporations. If your view 
of the world is one in which profit maximisation is the king of virtues, and 
all things shall be held to the standard of shareholder value, then of course 
your artistic, imaginative, aesthetic and emotional expressions will be 
woefully impoverished. We deserve better from the tools we use, the media we 
consume and the communities we live within, and we will only get what we 
deserve when we are capable of participating in them fully. And don’t be 
intimidated by them either – they’re really not that complicated. As the 
science-fiction legend Ursula K Le Guin wrote: “Technology is what we can learn 
to do.”

This article was adapted from the new edition of New Dark Age: Technology and 
the End of the Future, published by Verso
