Mental (State) Breakdown: Consciousness in AI Explained, Part 1: Global Workspace Theory
if you like acronyms, you're gonna love this one
This is my first post as an Asterisk AI Fellow!
I work for Eleos AI Research, an organization that researches AI consciousness. My elevator pitch for Eleos’ schtick is “you can think about consciousness in AI in an evidence-based, reasonable way.” The reaction I get to this statement is usually some variant of “oh shit, really?” I think people are surprised because they have overindexed on one of the central questions in this area being called “the hard problem of consciousness.” Another common mistake is to assume that, to be willing and/or able to theorize about consciousness, you have to be some flavor of Berkeley psychonaut with a thick dossier of trip reports, talking about consciousness from your own personal experiences of its more exotic manifestations.
Neither of these assumptions is true. We don’t have to solve the hard problem, or cook up a grand theory of consciousness, or become Erowid all-stars in order to make headway on the topic of AI consciousness. This is good news, because we need to understand AI consciousness better in order to ensure we don’t do a moral atrocity (dismissing consciousness when we shouldn’t) or waste astronomical resources (attributing consciousness when we shouldn’t). And there’s a LOT we have yet to understand when it comes to AI consciousness. There are so many research questions in this space that haven’t been explored yet, or have only been explored cursorily1.
To help stoke the fires of your inquisitive, research-oriented mind, I want to lay out some of the things that might be important to look for when thinking about AI consciousness. These are from a 2023 report called, appropriately enough, “Consciousness in AI.” It’s a great read, but it’s really, really long2, and requires a good chunk of background knowledge in both neuroscience and AI. It basically breaks down some of the top scientific theories of consciousness and lays out some indicator properties — some of the ways in which they might translate to consciousness in AI systems. So in this post, let’s look at one of these theories. We’re going to start with everyone’s favorite theory of consciousness – no, literally, everyone’s favorite – Global Workspace Theory.
To be clear, this is a very high-level blog post; this is not meant to be a replacement for any formal books or papers, more an amuse-bouche to test your appetite for further reading (and possibly research). I am also writing this in my own voice, as if I were sitting across from you over a pint at the bar. I’m going to use some fuzzy analogies and be a little silly tonally. As a friend said to me, “You really just talk like this.” You might be begging for the sweet embrace of neutral academic language after this.
What is a scientific theory of consciousness?
Basically, an empirically testable model that ties specific cognitive or neural processes to consciousness. If you believe that computational functionalism is legit (which many experts in this area do), then these theories would transfer to AI systems. Leaning on these theories is preferable because looking solely at model behavior can be really unreliable in AI systems.
What is an indicator property?
The report calls these properties “indicator properties.” What does that mean? And where did that big table of indicators come from?
When I say “indicator properties”, I don’t mean “smoke alarm” or “check engine light.” It’s more like “risk factors for heart disease.” When you go to a new doctor for the first time, you might get grilled on things like family history of disease, lifestyle choices, diet, blood panel results, etc. Let’s say Person A eats large amounts of garbage, sits on the couch all day, smokes three packs a day, and has a family history of heart disease. Person B eats like Bryan Johnson, exercises regularly, and doesn’t smoke, and also has a family history of heart disease. A doctor might suggest different treatments or health interventions to these two hypothetical people – the combination of factors means that the risk of heart disease is greater in Person A than in Person B. Person A could theoretically live to be 80 and die from something totally unrelated, whereas Person B could keel over from a heart attack at 403, but generally speaking one would expect Person B to live longer.
Similarly, even though we haven’t solved the “hard problem of consciousness,” from what we do know, and given that we have to make some kind of decision anyway, it seems prudent to consider how many indicator properties an AI system has, drawing on the various scientific theories of consciousness on offer. The more indicator properties an AI system checks off, the more attentive we should be to its potential consciousness.
Not all these properties are equally weighted for importance (that would have been a huge undertaking, probably way out of scope even for a megapaper; Rethink Priorities, however, is taking on this challenge - read more here). In fact, they’re not weighted at all. All the authors claim about each indicator is that, if a system has it, then, all else being equal, it’s more likely to be conscious. As I’ve said before, we’re still in the very early days of figuring out how to approach model welfare – there’s a lot of uncertainty and a lot of unknowns – but we’re not at square zero. Maybe square two?
Okay, but why think that “Input modules using predictive processing” or “Generative, top-down or noisy perception modules” make it more likely that an AI system has a heart attack – sorry, is conscious? How did this table get filled with mouthfuls like that, and what do they have to do with consciousness?
The authors looked at a bunch of scientific theories of consciousness. None of them are obviously true, but they’re all taken somewhat seriously by people who spend all of their time thinking about this kind of stuff, and they each have some evidence behind them. Those theories say things roughly to the effect of “here are some computations that, when the human brain does them, are associated with people being conscious.” They usually don’t say exactly what those computations are, or what it would mean for an AI system to perform them instead of a human. But the premise of this paper is that we can try to do that, and use various computational / architectural properties as “indicators” of consciousness. Looking under the hood, so to speak, is much more reliable than relying solely on model self-reports, which can be quite misleading.

Global workspace theory is one of these scientific theories of consciousness from which we can pull indicator properties. “But I want to hear about all of these theories and their associated indicator properties,” you say, as you choke back tears and gnash your teeth. “Why won’t you discuss my favorite scientific theory of consciousness?” As high as reader demand is for computational indicators of consciousness explained in a quippy, terminally-online voice, those will have to wait for future posts.
Global workspace theory
All the world’s a burrito bowl, and all the beans and salsas merely players (generated with Nano Banana)
The basic idea for Global Workspace Theory (GWT from here on out) is that the brain is like a bunch of silos and a stage. Human and animal brains use a bunch of specialized systems (modules) that perform certain kinds of cognitive tasks independently and in parallel - one module handles vision, one handles language, one handles emotions, etc. Like the individual pans of ingredients at Chipotle, which are individually delicious but rarely enjoyed in total isolation from one another.
But we don’t experience ourselves as, say, a disembodied sense of smell; we’re multimodal girls living in a multimodal world(s)4. We typically experience all these different sensory components together, in one coherent mind, thanks to a shared area called the global workspace.
Once a module puts something in the global workspace, it can be seen and used by the other modules. Vision informs memory, memory informs language, language informs decision-making. The classic example is to say that the global workspace is like a stage production, but if you’ve ever worked in theater, you know that backstage is often quite chaotic and anything but siloed, so perhaps the global workspace is more like the burrito bowl in my initial analogy. (The Chipotle analogy falls apart like a poorly-rolled burrito when you consider the “vision informs memory” stuff, but I think there’s perhaps something to it. Feel free to argue and drive engagement to my Substack in the comments!) The catch is that the workspace has limited capacity; there’s only so much room in the burrito bowl. What wins a spot is either the strongest signal (hot sauce) or whatever you’re deliberately paying attention to (your protein).
All this to say, GWT’s core schtick is that a piece of information is conscious if and only if it makes it into the global workspace and gets globally broadcast.5
In meat brains, the global workspace is implemented by a big network of “workspace neurons” distributed across the brain, especially in the frontal and parietal areas. When you sense something, sensory regions generate a representation, which gets passed into the global workspace if it’s important or strong enough; activity then suddenly surges in these workspace neurons as they process the information. This surge is called “ignition.” Ignition is a binary switch: either it happens and the thing gets broadcast, or it doesn’t. There’s no semi-ignition.
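If code lands better for you than burritos, here’s a tiny toy sketch of the loop I just described. To be clear, this is my own made-up illustration, not anything from the report or from Dehaene, and every name and number in it is arbitrary: modules run in parallel, the strongest signal “ignites” the limited-capacity workspace, and whatever ignites gets broadcast back to every module.

```python
import random

# Toy sketch of the GWT loop (my made-up illustration, not code from the report):
# specialized modules run in parallel, the strongest signal "ignites" the
# limited-capacity workspace, and whatever ignites is broadcast to every module.

class Module:
    def __init__(self, name):
        self.name = name
        self.inbox = []  # broadcasts this module has received from the workspace

    def propose(self):
        # Each module independently offers a candidate representation with a salience score.
        return {"source": self.name,
                "content": f"{self.name} has something to say",
                "salience": random.random()}

    def receive(self, broadcast):
        # Global broadcast: every module can see and use what made it into the workspace.
        self.inbox.append(broadcast)

WORKSPACE_CAPACITY = 1    # the bottleneck: far smaller than the modules' combined output
IGNITION_THRESHOLD = 0.5  # ignition is all-or-nothing; weak signals never get broadcast

modules = [Module(n) for n in ("vision", "memory", "language", "emotion")]

for step in range(3):
    candidates = [m.propose() for m in modules]  # parallel, independent processing
    candidates.sort(key=lambda c: c["salience"], reverse=True)
    workspace = [c for c in candidates[:WORKSPACE_CAPACITY]
                 if c["salience"] >= IGNITION_THRESHOLD]
    for content in workspace:
        for m in modules:
            m.receive(content)  # vision informs memory, memory informs language, etc.

print("What the memory module has seen:", modules[1].inbox)
```

That’s basically the whole pitch in miniature: a bottleneck, an all-or-nothing ignition, and a broadcast that lets every silo see what the others are up to.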
So why is GWT legit? We’re going off a lot of studies that use contrastive analysis: brain activity is measured while something is processed consciously and while it’s processed unconsciously, and the two are compared. This activity is measured in the brain using fMRI, MEG, EEG, et cetera. During conscious processing, there’s usually a lot of activity across pretty wide networks in the brain, including the prefrontal cortex, whereas unconscious processing tends to stay pretty confined to specific areas.
So why does this matter for AI systems? Well, you can implement something like GWT in AI agents, so it’s not really hypothetical anymore, more a matter of scale and implementation. I could give a much more long-winded explanation, but “it’s here (in some form/to some extent)” and “it’s the leading theory” are pretty good reasons to keep close tabs on this.
“But Larissa!”, you cry, “what are the specific things to look for? I need checklists! I want specifics!”
Cool. Let’s think through this step by step and go through the GWT indicator properties:
GWT-1: Multiple specialised systems capable of operating in parallel (modules)
The AI system has to have specialized systems that can work in parallel. What matters is the process more than the exact characteristics of the modules. More modules might make it more likely that a system is conscious, but that’s not super important.
GWT-2: Limited capacity workspace, entailing a bottleneck in information flow and a selective attention mechanism
On top of the parallel modules from GWT-1 (vision, memory, and motor systems in humans, for example), the system needs a workspace whose capacity is smaller than the collective capacity of all the modules feeding into it. That bottleneck forces the modules to be efficient about information-sharing (notably not the case with a lot of transformers), maybe by making thriftier representations of things. It also means there has to be some kind of selective attention mechanism that chooses what makes it into the workspace.
GWT-3: Global broadcast: availability of information in the workspace to all modules
The workspace makes its information available to all the modules, and all modules must be able to take input from the global workspace. This means information has to flow back into the modules from whatever goes down in the workspace (this is “recurrent processing,” which connects to another theory and indicator-property set from the table we introduced earlier).
GWT-4: State-dependent attention, giving rise to the capacity to use the workspace to query modules in succession to perform complex tasks
The attention mechanism needs to respond to flashy new external information (bottom-up attention) as well as to the current state of the system and its longer-term goals (top-down attention). That’s what lets the workspace query different modules in succession to get through complex, multi-step tasks.
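Here’s one more toy sketch, again entirely mine and with completely made-up scoring, of what GWT-4-style state-dependent attention could look like: what gets into the workspace depends both on how loud a signal is and on how relevant it is to whatever the system is currently trying to do.

```python
# Toy sketch of state-dependent attention (my made-up scoring, not from the report):
# what enters the workspace depends on bottom-up salience AND on relevance to the
# system's current goal, so the workspace can query different modules in
# succession as a multi-step task unfolds.

def select_for_workspace(candidates, current_goal, top_down_weight=0.6):
    """Pick the candidate that best balances raw salience with goal relevance."""
    def score(candidate):
        bottom_up = candidate["salience"]                             # flashy new information
        top_down = 1.0 if current_goal in candidate["tags"] else 0.0  # relevance to current state/goals
        return (1 - top_down_weight) * bottom_up + top_down_weight * top_down
    return max(candidates, key=score)

candidates = [
    {"source": "vision", "content": "a bright flash outside", "salience": 0.9, "tags": {"novelty"}},
    {"source": "memory", "content": "where I left my keys",   "salience": 0.4, "tags": {"find-keys"}},
]

# With no relevant goal in play, the flash would win on raw salience; with a
# "find-keys" goal active, memory's lower-salience content gets pulled in instead.
print(select_for_workspace(candidates, current_goal="find-keys"))
```

The particular weighting is beside the point; what matters is that selection depends on the system’s own state and goals, not just on whatever is shouting loudest.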
_____________________________________________________________________
It’s hard to be 100% certain how much research on human brains translates to AI systems. While the science of consciousness in AI has begun in earnest, it’s not settled yet. That’s the virtue of the indicators: we can keep track of several variables, see how they’re evolving, and, at any given time, come to the most informed conclusion we can.
In future posts, I’ll cover other theories of consciousness and their corresponding indicators, and say a bit about what we should actually do given this approach.
Thank you to Robert Long, Miles Brundage, and the Asterisk AI fellows for feedback and encouragement.
If you have AI welfare research questions of your own, you should apply to come to our conference.
One might even say Robert Long
“consumes a daily glass of The Thought-Experiment Juice That Gives You A Heart Attack” wasn’t on this doctor’s checklist, but in this thought experiment, that does indeed negatively contribute to heart health!
One could also say that many other “Barbie Girl” lyrics are helpful in understanding GWT – “imagination, life is your creation” is also relevant here
Consciousness and the Brain by Stanislas Dehaene is a big source for this section of my post, and is also a pretty easy read as far as consciousness books go. As always, Stanford Encyclopedia of Philosophy also is a great place to start.
I might take burritos a bit too seriously, but I think the metaphor actually holds up: the ingredients go together for a reason. That’s kind of the whole “vision informs memory” thing: each module adds something that changes how the others taste.
One thing I've been wondering about is whether normal LLMs could satisfy the GWT indicator properties by default. Some half-formed thoughts from thinking aloud:
GWT-1: There are in fact two places where individual modules show up in LLMs: experts in MoEs, and attention heads. You could think of one or both of these as modules, and the residual stream as the workspace. However, I think the analogy is not great. With attention heads, there's no competition for global representation; the outputs are just concatenated and then added to the residual stream (there's a rough sketch of this at the end of the post). With experts, selection is done based on expected relevance, but that happens before the modules get a chance to do their thinking rather than after.
GWT-2: As I speculated above, I think the residual stream is one candidate for being a workspace. But its case seems weak. Layer-by-layer updates to the residual stream are incremental, so the focus doesn't change dramatically from layer to layer based on which module was selected. It does change a lot from token to token, but only because the stream gets discarded at the end of each one.
GWT-3: Both experts and attention heads satisfy this one: they both contribute to the residual stream and take its current value as input when they're used.
GWT-4: TBH not sure how to evaluate this one. There's not really much flashy new information while a token is being processed.
Overall this doesn't make it seem like the conditions are satisfied, although there is information that is locally computed and globally used.
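To spell out the attention-head point from GWT-1 and GWT-2 above, here's a schematic of how multi-head attention writes into the residual stream. The shapes and names are illustrative stand-ins rather than any particular model's code: every head's output gets concatenated, projected, and added, so nothing has to win a limited slot the way a GWT-style workspace would demand.

```python
import numpy as np

# Schematic of the point above (illustrative shapes/names, not any real model's code):
# multi-head attention concatenates every head's output, projects it, and adds it to
# the residual stream. Every head contributes; there is no capacity bottleneck and
# nothing has to "win" the way GWT-2 would require.

d_model, n_heads = 512, 8
d_head = d_model // n_heads

residual = np.random.randn(d_model)                                # the candidate "workspace"
head_outputs = [np.random.randn(d_head) for _ in range(n_heads)]   # stand-ins for each head's attention result
W_O = np.random.randn(d_model, n_heads * d_head)                   # output projection

# Transformer-style update: concatenate all heads, project, add. No competition,
# no selection, no ignition threshold.
residual = residual + W_O @ np.concatenate(head_outputs)

# A GWT-2-ish update would instead keep only the strongest contribution and drop the rest:
strongest = max(head_outputs, key=lambda h: float(np.linalg.norm(h)))
```

If something like a workspace bottleneck were in there, you'd expect a step that keeps a few winners and throws the rest away; the standard transformer update just doesn't have one.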