AI “alignment” is the attempt to make artificial intelligence do what its designers want it to do. And, as recent articles about the profanity-laced, DAN-jailbroken ChatGPT and the unhinged-sounding pleadings and threats of Bing’s ChatGPT-enabled “Sydney” (discussed by many in the last week, but perhaps most famously by Kevin Roose in the New York Times) demonstrate, large-language-model (LLM) AI is, as tech blogger evhub put it when speaking of Bing Chat, “blatantly, aggressively misaligned.” As many in evhub’s comments section point out, ChatGPT also hallucinates, approximately 15-20% of the time according to Peter Relan, chair of a Silicon Valley AI company. Hallucinations are basically confidently delivered mistakes: the convincing lines of completely fabricated literature ChatGPT likes to slip into its literary analyses when not given the full text, or Bing Chat’s bullying insistence on the incorrect current year, or the factual error in Google’s own reveal of Bard.
These misalignments are worrisome for their engineers… and endlessly fascinating for users. The recent experiences people have had with these LLMs tell us as much or more about humans, and our tendency to anthropomorphize the technologies we create (even when we rationally should know better), as they do about the technologies themselves. It is very difficult to read the creepy transcripts of Bing’s “Sydney,” or a jailbroken swearing ChatGPT, or even a joking BlenderBot3 talking about AI taking over the world, and not be unnerved: less by the technical fact that these LLMs cannot be fully controlled, or that they were rushed to the public with market dominance rather than safeguards as the first priority, than by the very threatening, creepy personalities that seem to be emerging from them.
We must remember that "seem": general AI does not yet exist. There is no sentience, much less sapience, in those LLMs. The chatbots don't "understand" what they are saying. They are “blurry” mimics of their training data ("stochastic parrots"). They are complex predictive text generators on a grand scale with a huge dataset, and, just like the predictive text on your phone, they can create interesting and surprising (to us) things based on statistical probabilities. Writers like Gary Marcus argue that “you cannot build AI in the real world from Big Data and deep learning alone” and think we have effectively hit an AI winter with the current models; as Jaron Lanier has argued in You Are Not a Gadget, “a fashionable idea in technical circles is that quantity turns into quality at some extreme of scale, but also does so according to principles we understand….I disagree” (49). We shall see, but the current meltdowns do give one pause. Regardless, as one AI writer reminds us:
Today, we seem to have forgotten the brain. DL works nothing like it. Computer vision and convolutional neural nets don’t work like our visual system. Supervised learning models (which dominate AI right now) need to learn from labeled data but humans learn from sparse data thanks to innate biological structures. Computers need huge amounts of computing power and data to learn to recognize the simplest objects, whereas a kid only needs to see one dog to recognize every other dog.
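To make “predictive text generator” concrete, here is a minimal sketch of next-token prediction, assuming Python, the open-source Hugging Face transformers library, and the small GPT-2 model (my choice of tooling for illustration only; it is not the model behind ChatGPT or Bing Chat). All a language model does at each step is assign probabilities to possible next tokens; generation is just repeated sampling from that distribution.

```python
# A minimal, illustrative sketch of next-token prediction, assuming the
# Hugging Face "transformers" library and the small open GPT-2 model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The chatbot told me that it"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's entire "opinion" about what comes next is this probability
# distribution over its vocabulary at the final position of the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```

Chain enough of those sampled tokens together, over a vastly larger model and dataset, and you get fluent paragraphs; nowhere in the loop is there a belief, an intention, or an understanding.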
In other words, while these are amazing LLMs in many ways, they are nowhere near AGI (artificial general intelligence), even if they feel that way sometimes. James Vincent explains why these LLMs are producing such disturbing transcripts:
…These systems are trained on huge corpora of text scraped from the open web, which includes sci-fi material with lurid descriptions of rogue AI, moody teenage blog posts, and more. If Bing sounds like a Black Mirror character or a resentful superintelligent teen AI, remember that it’s been trained on transcripts of exactly this sort of material. So, in conversations where the user tries to steer Bing to a certain end […] it will follow these narrative beats. This is something we’ve seen before, as when Google engineer Blake Lemoine convinced himself that a similar AI system built by Google named LaMDA was sentient.
Or, as a recent post by Stratechery reminds us,
these models are trained on a corpus derived from the entire Internet; it makes sense that the model might find a “home” as it were as a particular persona that is on said Internet, in this case someone who is under-appreciated and over-achieving and constantly feels disrespected.
In other words, as many tech writers have put it (I’ve reposted a short vid by Sinéad Bovell for folks who want the tl;dr version), Microsoft's search engine isn't actually going to go all Ultron on us, except insofar as it retells the story of Ultron and other rampaging AIs back to us. Like Narcissus, we fall in love with, or run in terror from, this slightly warped mirror of ourselves. The concern should not be that this particular iteration of AI is some kind of hostile, sentient, independent being. Rather, one concern should be that we can prompt this iteration of AI to reflect the worst or wildest of us back at ourselves and then feel as though what it has given us is a considered response from a sophisticated intelligence, even when we ostensibly know better.
Indeed, writer Lance Eliot argues that we are doing ourselves a disservice even by defaulting to the word “hallucination” to talk about AI mistakes, noting that “AI hallucination is becoming an overly convenient catchall for all sorts of AI errors and issues” and that “AI Ethics rightfully finds this trend disturbing as an insidious escape hatch for excusing AI problems.” Most relevant for my argument here is his warning that “We need though to be careful that this is furthering the anthropomorphizing of AI and creating widespread misconceptions about AI”:
Part of the issue is our tendency to anthropomorphize computers and especially AI. When a computer system or AI seems to act in ways that we associate with human behavior, there is a nearly overwhelming urge to ascribe human qualities to the system. It is a common mental trap that can grab hold of even the most intransigent skeptic about the chances of reaching sentience.
And these humanlike qualities—especially, perhaps, the less smooth, more odd, funny, and sometimes malicious qualities—are what capture our attention most. The Stratechery post notes the fascination they and others have with things like the jailbroken ChatGPT or, in this case, Bing Chat’s “Sydney” (and its other “personae”) and that for many using it, those quirky and disturbing “personality traits” are actually a feature, not a bug: we really have no interest in it as a search engine but want to probe what it “thinks.” (Guilty as charged.) As they note of their own experiment,
this [conversation with ‘Sydney’] went on for a good two hours or so, and while I know how ridiculous this may be to read, it was positively gripping. Here’s the weird thing: every time I triggered Sydney/Riley to do a search, I was very disappointed; I wasn’t interested in facts, I was interested in exploring this fantastical being that somehow landed in an also-ran search engine.
I mean, let’s be honest: has anyone who has had access to this new chat-enabled search engine in the last week cared about whether it can offer recommendations for a new blender or find a biography of Lincoln?
One tech writer reports:
The Chatbot aspect of Bing was often extremely unsettling to use.
I say that as someone who knows that there is no actual personality or entity behind the types of Large Language Models (LLMs) that power ChatGPT and Bing. They are basically word prediction machines — you can read a very detailed explanation here — and are merely reacting to prompts, completing the next sentences in response to what you write. But, even knowing that it was basically auto-completing a dialog based on my prompts, it felt like you were dealing with a real person. The illusion was uncanny. I never attempted to "jailbreak" the chatbot or make it act in any particular way, but I still got answers that felt extremely personal, and interactions that made the bot feel intentional.
The tendency to react to the text we see from LLMs as if it were coming from an actual intelligence, rather than a language model, is very difficult to resist. And it isn’t simply Blake Lemoine, the quirky engineer who was convinced LaMDA had a soul, or folks on Twitter wailing that ChatGPT or Bing wants to take over the world, but every tech writer who employs human metaphors to describe their interactions with LLMs. Indeed, even calling every computational/large-language model “artificial intelligence” conjures HAL or JARVIS or Ultron or Ex Machina’s Ava and is guilty of perpetuating our tendency to anthropomorphize. With the way we use the term now, remember, ChatGPT is AI, but so are Siri, Google Maps, predictive analytics, speech recognition software, and that annoying bot on your utilities or insurance website. As Sherry Turkle notes, “intelligence once denoted a dense, layered, complex attribute. It implied intuition and common sense. But when computers were declared to have it, intelligence started to denote something more one-dimensional, strictly cognitive” (Alone Together: Why We Expect More from Technology and Less from Each Other, 141).
Tl;dr: we’ve been in the game of demeaning actual intelligence for a while now. Lanier quips that “People degrade themselves in order to make machines seem smart all the time”:
You can't tell if a machine has gotten smarter or if you've just lowered your own standards of intelligence to such a degree that the machine seems smart. If you can have a conversation with a simulated person presented by an AI program, can you tell how far you've let your sense of personhood degrade in order to make the illusion work for you? (You Are Not a Gadget, 32)
The potential consequences of treating our technologies as intelligences have been teased out by Lanier and Turkle and others; I will return to this subject in another post.
So once we are aware that these are not “intelligences” in any robust and general sense, we’re OK, right?
Hardly. I have several concerns, many of which echo the concern I have about how we (and by “we” I mean to include myself, and you, dear reader) are manipulated by social media: since we cannot see into the algorithmic black boxes, we interact with what we see on our feeds as if it arrived there organically. We might be aware, at some rational and conscious level, that we can be manipulated by the algorithms on Twitter or YouTube or Facebook, but that doesn’t mean we act that way in our day-to-day experience.
The flip side, however, is that when we know, rationally, that these aren’t intelligences and that we are not in a mutual relationship with them, we often presume that we can’t be fooled by them in one way or another. I am reminded of the initial hubbub around the release of ChatGPT in academic writing circles, when many folks in casual conversation or on social media joked about how poorly ChatGPT wrote, or noted that they had “stumped” it with certain queries. In fact, their inability to get it to produce good work was user error, as Mollick’s recent post about his experiences with students implies. Over the last two months I have talked at a few sessions at my institution about the impact of ChatGPT, and I may have come across as alarmist about what it can do. My goal was not to be technophobic, but rather to acknowledge and perform a little humility in the face of a technology that is changing practically every day, in order to have a substantial conversation about the positive and negative impacts of AI, current and emerging, in higher ed. Yet plenty of faculty still aren’t interested because ChatGPT, in their one or two encounters (or none at all), didn’t write A papers.
Listen: no one likes to admit that they can be manipulated. Gary Marcus, in a post last week, expressed absolutely reasonable, pressing concerns that these LLMs can be used to circulate misinformation and may have profound impacts on users, including depression and self-harm. I appreciate Marcus’s insights on AI and have been following him and quoting him for weeks now… but I take issue with this rhetorical framing:
It’s not hard to see how someone unsophisticated might be totally taken in, by this prevaricating but potentially emotionally damaging hunk of silicon, potentially to their peril, should the bot prove to be fickle. We should be worried; not in the “some day” sense, but right now.
Here’s where I differ from Marcus: I think plenty of educated, sophisticated thinkers have been “taken in.” You and I will be taken in. If we accept that now, we can have more productive conversations, in academia and in policy forums, about how to guide and perhaps restrain a technology, and an industry, that seems more interested in getting the tech out to the public and winning an arms race for investors than in putting safeguards in place to protect us all.