The video explores the current advancements in AI technologies that can decode human thoughts without implanting electrodes in the brain. These technologies, developed at institutions like the University of Texas and the University of Technology Sydney, use fMRI and EEG scans to interpret brain activity and translate it into understandable language. Such innovations have the potential to revolutionize how we communicate, particularly for individuals who are paralyzed or suffer from conditions that restrict verbal communication.
This subject is crucial because it intertwines the ethical implications of AI technologies with their benefits in medical and social contexts. While these technologies promise vast improvements in aiding those with speech impairments or physical limitations, they also challenge our perception of mental privacy. Researchers and ethicists express concerns about the potential misuse of these advancements, emphasizing the need for ethical guidelines and individuals' consent before decoding thoughts.
Main takeaways from the video:
Key Vocabulary and Common Phrases:
1. sanctuary [ˈsæŋktʃuˌɛri] - (noun) - A safe or holy place where one is protected from danger or harm. - Synonyms: (refuge, haven, safe haven)
Charlotte Bronte wrote those words well over a century ago, at a time when the mind was the ultimate refuge of any human being, the sacred sanctuary into which no one could enter
2. decoding [diːˈkoʊdɪŋ] - (verb) - The process of converting coded or encrypted data back into its original form. - Synonyms: (interpret, decipher, decrypt)
A team at the University of Technology, Sydney, has gone a step further, decoding brainwaves into complex speech with an EEG monitor, which, unlike an fMRI scanner, is portable
3. ephemeral [ɪˈfɛmərəl] - (adjective) - Lasting for a very short time, transient or fleeting. - Synonyms: (short-lived, fleeting, temporary)
By way of brief background, we have long sought to understand the ephemeral qualities of mind through observations and measurements of the physical structures that support the mysterious phenomena of thought.
4. phrenology [fəˈrɛnələdʒi] - (noun) - A debunked field that involved the study of the skull's shape as indicative of mental faculties and character. - Synonyms: (pseudo-science, craniology, obsolete science)
Phrenology. That was among the early attempts, a deeply flawed effort led by the German physiologist Franz Joseph Gall in the 19th century, seeking to link measurements of the skull to intelligence and personality traits.
5. rigorous [ˈrɪɡərəs] - (adjective) - Characterized by thoroughly considering every detail; meticulous and strict. - Synonyms: (meticulous, strict, precise)
Despite its initial popularity, phrenology was discredited for lack of scientific rigor and its use in justifying racial and social hierarchies.
6. neurons [ˈnjʊrɒnz] - (noun) - The cells in the brain and nervous system responsible for transmitting information through electrical and chemical signals. - Synonyms: (nerve cells, brain cells, neural cells)
Some of these tools, they are invasive probes inserted directly into the brain, capable of measuring the firing of individual neurons.
7. bespoke [bɪˈspoʊk] - (adjective) - Made or tailored for a particular purpose or user. - Synonyms: (custom, tailored, made-to-order)
So, again, with the reasonable assumption that a given thought requires bespoke brain activity, an fMRI reading, much like an EEG signal, can be likened, again, to a fingerprint, a fingerprint of thought.
8. proxy [ˈprɑːksi] - (noun) - A figure or representative entity used to stand in for another. - Synonyms: (agent, substitute, stand-in)
Recall that fMRI, functional magnetic resonance imaging, measures blood flow, which is a proxy for brain activity.
9. invasive [ɪnˈveɪsɪv] - (adjective) - Involving the introduction of instruments into the body, often to examine or modify it. - Synonyms: (penetrative, intruding, entering)
Some of these tools, they are invasive probes inserted directly into the brain, capable of measuring the firing of individual neurons.
10. implications [ˌɪmplɪˈkeɪʃənz] - (noun) - The likely consequences or effects of an action or decision. - Synonyms: (consequences, ramifications, repercussions)
We always consider the privacy and societal implications of what we do.
Can AI Read Your Mind?
The human heart has hidden treasures, in secret kept, in silence sealed; the thoughts, the hopes, the dreams, the pleasures, whose charms were broken if revealed. Charlotte Bronte wrote those words well over a century ago, at a time when the mind was the ultimate refuge of any human being, the sacred sanctuary into which no one could enter unbidden. Anything else could be taken from us, but our thoughts, they were ours to control. But now, a century and a half later, in a world in which our privacy is under increasing assault from advances in technology and applications in surveillance and social media, Bronte's vision of hidden mental treasures is being revised.
In recent years, researchers in at least two different laboratories around the world have made significant breakthroughs in a quest to decode complex human thoughts with custom-built AI powered by large language models. And they've accomplished this without implanting electrodes in a subject's brain. At the University of Texas, this has been accomplished via fMRI scans. In Australia, a team at the University of Technology, Sydney, has gone a step further, decoding brainwaves into complex speech with an EEG monitor, which, unlike an fMRI scanner, is portable. These innovations may one day give voice to the voiceless. They may one day help doctors to assess a patient's level of consciousness after an accident, help families communicate with a loved one who has had a stroke, vastly enrich the communication of someone who's paralyzed.
But these advances, they may also one day spell the end of a freedom our species has always taken for granted, freedom of inner thought. By way of brief background, we have long sought to understand the ephemeral qualities of mind through observations and measurements of the physical structures that support the mysterious phenomena of thought. Phrenology. That was among the early attempts, a deeply flawed effort led by the German physiologist Franz Joseph Gall in the 19th century, seeking to link measurements of the skull to intelligence and personality traits. Despite its initial popularity, phrenology was discredited for lack of scientific rigor and its use in justifying racial and social hierarchies. Nevertheless, phrenology spawned a series of subsequent attempts to use cranial capacity measurements, as well as psychometrics and IQ testing, to quantify mental faculties. But a true science seeking to measure the mind only emerged in the late 20th century with the advent of tools capable of probing brain processes themselves.
Some of these tools, they are invasive probes inserted directly into the brain, capable of measuring the firing of individual neurons. But other tools are noninvasive external probes that can nevertheless measure brain function. And in this category, the workhorses in use today are EEGs and fMRIs. Now, EEG, as I'm sure most, if not all of you know, that stands for electroencephalography and refers to a device that records the electrical activity taking place in a brain through sensitive electrodes placed on the scalp, capable of measuring the intensity of electric fields produced by brain functions. The role of EEGs in mind reading technology relies on the fact that thoughts in the brain are carried by electrical signals. And these electrical signals, they generate electric and magnetic fields.
Now, these are tiny fields, but they are measurable fields. So the basic idea is that these electric and magnetic fields generated by the brain, the thinking brain, they yield a kind of fingerprint of thought. That is, the idea is that each and every distinct thought has a distinct electromagnetic signature, a distinct fingerprint. So if you learn the dictionary between thoughts and fingerprints, between thoughts and electromagnetic fields, then by measuring the fields, you should be able to determine what a subject is thinking. That's the essence of the approach. Now, the approach with fMRI, it follows the same essential path.
Recall that fMRI, functional magnetic resonance imaging, measures blood flow, which is a proxy for brain activity. Scientists have long known that more oxygen-carrying blood is drawn to regions where more neurons are firing. So, again, with the reasonable assumption that a given thought requires bespoke brain activity, an fMRI reading, much like an EEG signal, can be likened, again, to a fingerprint, a fingerprint of thought. So if researchers can learn how to read those fingerprints provided by an fMRI scanner, they will have learned how to read thoughts. Now, of course, this is all far easier said than done.
EEGs and fMRIs, they are still relatively coarse measurements of brain activity. So the question is, how far can scientists go in carrying out this mind reading program? Well, cutting edge research, as we will shortly see, is already showing intriguing hints that our unspoken thoughts are not confined to the insides of our heads, but can be detected and decoded by external devices. Now, the potential applications, they are heartening, right? Giving a new means of communication to those silenced by medical maladies. But like so many tools of modern science, these advancements are also cause for concern. Right? In a world where so few vestiges of privacy remain, is the privacy of our inner thoughts in peril?
In this series of conversations, we're going to talk to some of the scientists on the frontiers of this area of research. And we will also speak with some of those thinkers who are deeply concerned, deeply troubled by the all too real possibility that these advances may one day breach the sanctity of our minds. All right, let's bring to the stage Professor Michael Blumenstein of the University of Technology, Sydney, where he serves as deputy dean for research and innovation in the Faculty of Engineering and Information Technology. Welcome, Michael. And joining us as well from Austin, Texas, is Jerry Tang. He's from the Department of Computer Science at the University of Texas. He works on language neuroscience and brain decoding as the lead author of a groundbreaking paper published in Nature Neuroscience, describing the successful decoding of natural language noninvasively using fMRI. Welcome, Jerry.
And I will say, Jerry, you look remarkably awake. What time is it there in Austin, Texas? It's almost five. Almost 5:00 a.m.? Almost 5:00 a.m. Well, thank you for your dedication to science and bringing your work to the greater public. So, Michael, let me just quickly begin with you. We're going to get into the detail of the work that both you and Jerry and your teams have undertaken and the accomplishments that you've reached. But before we get to the actual science, does the ethical side of this give you pause? You don't have to give me a deep answer, but is this something that we should worry about from an ethical point of view, or is it something that we're just gonna acclimate to?
I think that's a good question, Brian. My view is that, as you pointed out, there's always lots of positives to the artificial intelligence, to technology more broadly. My view is that in this case, because there is a connection, medical elements and the health field, the health and wellbeing of people, there probably needs to be one extra layer of ethical considerations. The good news is that when it's done in a university, there are ethical standards and approaches that you can use to look at those things. Because we are such ethical professors. All of us.
Exactly, yes. Jerry, how about you? Does it give you. I mean, do you lie awake at night ever worrying about where the work that you and your team are pioneering that we'll discuss in a moment where that might one day lead? Or is that not something that really occupies your thoughts? We definitely think that mental privacy is incredibly important. We think that nobody's brain should be decoded without their full cooperation. And in doing the research that we're doing, we always consider the privacy and societal implications of what we do. So that's a motivating factor for a lot of our work.
Okay. All right, so we'll get to perhaps a little bit of a further discussion on that side of things. But before we do, I just want to get to an understanding of the advances that both of your teams have accomplished. So, Jerry, just give us a rough overview of what it is that you're trying to do, and then we can get into some examples of how far you've gotten. Yeah, so our goal is to take brain recordings from a user and predict the words that the user was hearing or imagining.
So doing this is a two step process. First, we have to train a language decoder on brain responses from the user, and then we can apply this language decoder to new brain responses from that user. So I can start by going through the training process. Basically, we want to draw a connection between words and the user's brain activity. So we want to fit a model that can take in any sequence of words and predict how the user's brain would respond when thinking about those words. And in order to do this, we recorded 16 hours of brain activity from the user while they listened to narrative stories. So just stories from podcasts. And this shows us how the user's brain responds to a very wide range of concepts.
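The two-step process described here (fit a model that predicts brain responses from words, then use it on new recordings) can be sketched in a few lines. This is only an illustrative sketch, not the team's actual system: it assumes a linear ridge-regression map, synthetic random data in place of real fMRI recordings, and a small fixed set of candidate word sequences to choose from. The function names `fit_encoding_model` and `decode`, the feature sizes, and the `alpha` value are all hypothetical.

```python
import numpy as np

def fit_encoding_model(word_features, brain_responses, alpha=1.0):
    """Step 1 (training): ridge regression from word features to voxel responses.

    word_features:   (n_samples, n_features) numeric description of the words heard
    brain_responses: (n_samples, n_voxels) recorded activity for the same moments
    Returns a (n_features, n_voxels) weight matrix.
    """
    n_features = word_features.shape[1]
    gram = word_features.T @ word_features + alpha * np.eye(n_features)
    return np.linalg.solve(gram, word_features.T @ brain_responses)

def decode(observed_response, candidate_features, weights):
    """Step 2 (decoding): pick the candidate whose predicted brain response
    correlates best with the response that was actually observed."""
    predicted = candidate_features @ weights
    scores = [np.corrcoef(p, observed_response)[0, 1] for p in predicted]
    return int(np.argmax(scores))

# Synthetic stand-in for the training recordings (assumption: purely random data).
rng = np.random.default_rng(0)
n_samples, n_features, n_voxels = 500, 8, 50
true_weights = rng.normal(size=(n_features, n_voxels))
train_features = rng.normal(size=(n_samples, n_features))
train_responses = train_features @ true_weights + 0.5 * rng.normal(size=(n_samples, n_voxels))

weights = fit_encoding_model(train_features, train_responses)

# A new noisy response produced by candidate number 3 out of 5 candidates.
candidates = rng.normal(size=(5, n_features))
target = 3
observed = candidates[target] @ true_weights + 0.5 * rng.normal(size=n_voxels)
best = decode(observed, candidates, weights)
print("decoded candidate:", best, "true candidate:", target)
```

Because the model is fitted to one set of responses, the weights are specific to whoever produced them, which is consistent with the point made later that a decoder trained on one person does not transfer to another.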
So, using this data set, we can build machine learning models that can take in a sequence of words and predict how that user's brain would respond. So, basically, that's how we train our decoder. We have this data set of brain responses to stories, and we learn to relate words to activity patterns in the person's brain. But do you find that there is a uniformity across the subjects in how their brain responds to the same collections of words, or does it wildly vary? And so it really is something that's iconic, one person to another?
It's a bit of both. We find that the general organization of language is pretty consistent across individuals, but there's also very fine grained differences. So each, when thinking about a concept or hearing a word, this activates a really precise pattern of brain activity in a person's brain. And we find that it's actually very hard to. It's not possible right now to take a decoder trained on one person and apply it to a different person. It seems like the level of precision we need needs to use data from the person being decoded.
And is it independent of the data set that you use? So you said that you use podcasts or news stories. Did you curate those in a special way, or did you just say, there's some popular podcasts and we're just going to grab their content and use it? Was it independent of the details of the data set? Yeah. So in order to create our training set, we just grabbed a bunch of popular podcasts. So podcasts that are like narrative stories, that are very easy to understand, that are entertaining, and that cover a lot of different topics. This gives us a very rich dataset that shows us how the user's brain responds to many different types of concepts.
And do you have some examples? I believe that we do have some initial examples of your system at work. Can we bring up an example? So, I don't know if you're seeing exactly what we're seeing here, but on the left hand side, it says we start to trade stories about our lives. We're both from up north. I gather that on the left hand side, that might be the data that the person was subsequently supplied with, and they were thinking about that. And I gather on the right hand side is what your decoder gave rise to, which I can read for the audience here. We started talking about our experiences in the area he was born in. I was from the north. So they're not identical word for word between them, but at least as I read them, they feel very similar, as if you sort of caught the essence of the idea as opposed to the word for word details of what the person was responding to. Is that a reasonable summary of where the work gets you?
Yeah, we think that's exactly right. We think that our decoder generally recovers the meaning of what a person is hearing or thinking about, but it doesn't always do so in exact words. So often, it will paraphrase what the person is actually hearing. So, for instance, in this example, it takes trading stories and turns it into talking about our experiences. And this is using fMRI, is that correct? That's right.
And so can you just remind the audience of how that old technology works? And in particular, what's it particularly good at for the application that you're using it for? Yeah, so fMRI is a pretty old technology at this point. It basically uses a large magnet to measure brain activity from outside the skull. So a person will lay in the scanner, and the fMRI machine will tell us what the brain activity is like in every three by three by three millimeter cube of the brain. So, one of the big advantages that fMRI offers for our research is that it has really high spatial resolution, so it can tell us what's going on in every small piece of the brain. And as I mentioned earlier, the representations of the meaning of language are fine grained and precisely distributed in a person's brain. So fMRI allows us to measure that.
So if you're training someone on 16 hours of podcasts, I mean, I find that when I listen to a podcast, even for, you know, a half an hour, my mind is wandering for maybe 50% of the time, you know, in 16 hours of podcast, I presume the person's not doing it at one shot. I'm sure you break it up into reasonable chunks, but won't your data set be so noisy because the person's, you know, hearing about, you know, what's happening in the Middle East or in Russia, and they're worrying about what to cook for dinner at the same time? So how do you correct for that? Yeah, that's a great question. That's something that we can't really control from, like, outside the scanner. It's part of the reason why we chose these narrative podcasts. We find that generally, our participants find the podcasts entertaining, engaging, and this helps them pay attention more, and it's also why we collect such large amounts of data. So we have 16 hours of data from each user, and this is far greater than the typical amount of data collected in an fMRI experiment. And so this helps us kind of find the signal in the noise.
Now, there's one other example I think we have, and don't bring it up yet. I wonder if you can just give us a sense of it. In the example that we looked at of your work, the person was long since trained. They went through the 16 hours of podcasts. You got all the data of how their brain responds when they encounter particular words and particular ideas. You then had them read a new sentence, look at a new sentence, presumably, and you were able to decode the gist of it from the system that had been trained in the 16 hours. If a person is not being presented with a stimulus, like reading a sentence or hearing something that you provide for them, but they're just imagining. If they're just allowing their mind to imagine things, can you also apply the system to that situation?
Yeah. So, our decoder is trained on responses while people listen to stories. Um, but one of our main goals is to decode responses when people are just thinking about words. We think that this is one of the main use cases of our approach. We want to help people who, like, struggle to translate their thoughts into words. So we'd like to be able to apply this while people are just thinking in language. So we had a new experiment where we had our same users, and we just told them to imagine telling these stories in their heads without hearing or seeing anything for reference. And we found that our same decoder, which is trained on responses to perceived speech, can also work on responses to imagined speech. And so I think we're seeing on the left hand side here in the gray word cloud, the imagined speech. So Marco leaned over to me and whispered, you are the bravest girl I know. And then you decoded it to. He runs up to me and hugs me tight and whispers, you saved me.
So, again, not a word for word, direct decoding of the thoughts, but certainly it feels like it's in the same vicinity of ideas. And so how do you measure success here? Because it feels good to me. It feels right in the sense that they're close. But do you have an actual mathematical measure of how good the system is decoding the thoughts?
Yeah, that's a great question. That's something that we spent a lot of time thinking about because it's kind of a new problem for us. There are ways to measure how similar two sentences are, but these traditional approaches usually just count the number of words that are shared. So, since our decoder seems to recover like meaning, we needed a metric that captures whether two sentences share a meaning. And so, recently in machine learning, there have been approaches that use neural networks to take in two sequences and assign a similarity score. And it turns out that these types of approaches are highly correlated with human judgments of similarity. So we use these types of approaches to evaluate our decoder. And with that measure, then how well are you doing? I mean, can you typically, the kind of ways that we imagine the result to a general public being expressed is, you know, it gets it right 30% of the time or 80% of the time or something like that. It's a very blunt way of describing things. But is there a version of that assessment that you're comfortable articulating?
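The limitation of the traditional word-counting approach mentioned above can be seen on the example pair from earlier in the conversation. The sketch below implements only that baseline, as a simple set overlap (Jaccard) score; it does not implement the neural similarity metric the team used, and the function name `word_overlap` and the tokenization rule are assumptions made for illustration.

```python
import re

def word_overlap(reference, decoded):
    """Share of distinct words that two sentences have in common (Jaccard overlap).

    This is the kind of word-counting measure described as insufficient above:
    it rewards identical words and ignores paraphrases that keep the meaning.
    """
    ref_words = set(re.findall(r"[a-z']+", reference.lower()))
    dec_words = set(re.findall(r"[a-z']+", decoded.lower()))
    return len(ref_words & dec_words) / len(ref_words | dec_words)

# Stimulus and decoder output as read aloud earlier in the conversation.
reference = "we start to trade stories about our lives we're both from up north"
decoded = ("we started talking about our experiences in the area he was born in "
           "I was from the north")

paraphrase_score = word_overlap(reference, decoded)
identical_score = word_overlap(reference, reference)
print(f"paraphrase overlap: {paraphrase_score:.2f}, identical overlap: {identical_score:.2f}")
```

The paraphrase scores far below an exact repeat even though a human reader would call the two sentences close in meaning, which is the motivation given for switching to a learned, meaning-based similarity score.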
Yeah. So, using our metric, it's a bit less clear than that. We can say that our decoder is doing significantly better than expected by chance, but it's also hard to tell exactly how good it could be. So we did another test, which is similar to what you're describing, where we had other participants write multiple choice questions based on the story that the user was hearing, and we looked at whether people could answer these multiple choice questions based on the decoder prediction. So using just the words that the decoder is producing, can people understand the general gist of the story? And I think we found that of the 16 multiple choice questions that we asked, we found that people did very well on nine of the 16. So sort of half of the questions, they did really well, but half were questionable based on that. And so do you see that this is the first step in an ongoing refinement of this that will get to a larger percentage?
In fact, have you seen an improvement over time as you've carried on this research? Yeah, we definitely think that this is just an initial step. We think that for this to be actually really useful, we would want the decoder predictions to be much more precise and also much more consistent. So we've been exploring different ways to improve our model. We think that, like collecting more data, using more users, we also think that, like, the machine learning models that we're using in our decoder are rapidly improving. And by using better models, we might also expect to see better performance.
Excellent. So, Michael, let me now turn to you. You're doing your own version with your group of a similar goal, trying to decode the words that people are thinking. Can you give us a sense of the approach that you and your team are taking?
Yeah, the approach of the team that's working on it at UTS, there are some similarities to what was just mentioned there. One thing that I think needs to be said from the outset is the way in which the thoughts are measured. You know, a cap is used, a noninvasive cap that can be worn by a user. So not going into an fMRI machine, you're just having a cap? Correct. A cap with sensors that can actually analyze the EEG or brain signals. An EEG, again, I'm sure everyone's familiar, just to give a quick sense, it's an electrical signal generated by the brain, which is in the form of a wave. And basically that is, you know, measured.
And then similarly, we're using words here like encoder and decoder. We actually have a pipeline of activity. It's quite. It actually looks quite complicated, but the, you know, discrete steps are actually pretty interesting and intuitive. Basically, it takes the wave and uses what we call feature extraction to make it a little bit more palatable for the computer to sort of process. And the, you know, cutting down of the size of the signal then actually allows manipulation and correlation of the signal so that it can start being cut up into discrete components. So the big thing here is actually trying to separate the waves to actually correlate them to words. So individual words, each wave component to individual words. And we actually use what's called self supervised learning, which is another form of machine learning, to actually not require a data set that is generated by a human to put in there, but rather itself correlates and can actually assign disparate components of the waves to associate with the words.
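The first stage described here, feature extraction that cuts the raw wave down to something smaller before it is split into discrete components, can be illustrated with a generic band-power computation. This is a common EEG preprocessing idea and only a sketch; the speaker does not say which features the UTS pipeline uses. The sampling rate, window length, frequency bands, and the function name `band_power_features` are all assumptions, and the input is a synthetic sine wave rather than a real recording.

```python
import numpy as np

def band_power_features(signal, sampling_rate, window_seconds, bands):
    """Cut a 1-D signal into fixed windows and summarize each window by its
    average spectral power in a few frequency bands.

    Returns an array of shape (n_windows, n_bands), far smaller than the raw wave.
    """
    window = int(sampling_rate * window_seconds)
    n_windows = len(signal) // window
    freqs = np.fft.rfftfreq(window, d=1.0 / sampling_rate)
    features = np.zeros((n_windows, len(bands)))
    for i in range(n_windows):
        segment = signal[i * window:(i + 1) * window]
        power = np.abs(np.fft.rfft(segment)) ** 2
        for j, (low, high) in enumerate(bands):
            in_band = (freqs >= low) & (freqs < high)
            features[i, j] = power[in_band].mean()
    return features

# Synthetic 4-second "recording": a pure 10 Hz oscillation (illustrative only).
sampling_rate = 128
t = np.arange(0, 4, 1 / sampling_rate)
signal = np.sin(2 * np.pi * 10 * t)

# Conventional theta, alpha, and beta ranges in Hz (assumed band edges).
bands = [(4, 8), (8, 13), (13, 30)]
features = band_power_features(signal, sampling_rate, 1.0, bands)
print(features.shape)  # 512 raw samples reduced to a 4 x 3 feature table
```

Each row of the feature table is one discrete window of the wave, which is the kind of unit a later stage could try to associate with individual words.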
And so just to compare the measurement process. So Jerry and his team are using fMRI. You're using more the electric and the magnetic fields that are created inside the head that you measure externally. Jerry made a point that fMRI is really good at spatial resolution, getting us into order of millimeter cubes inside the brain and knowing what's happening in each of those little cubes. In your EEG or MEG, if you're going the magnetic version, you don't have that kind of spatial resolution, right?
No, there are two elements which I think are different, but actually quite interesting. One is that the quality of the data, you could say, is a bit coarser, which the point made there by Jerry was that it's very detailed and you can get good high resolution. We've found in the work that's been done that you don't need that for the purpose of actually doing the segmentation into the components of the waves to correlate with the words. The other interesting part is that with a lot of large language models or deep learning, we traditionally consider as AI, it requires a lot of information, but at that stage of encoding, it doesn't require as much information. And you can use a smaller number of samples, smaller number of individuals. And what Jerry was saying about the variation between individuals, the process of encoding that we use, it's called DeWave, actually enables it to not take away that variance. So you're sort of normalizing. So there is a recognition, there are differences between people, but it actually, through the process of what we use the machine learning for, normalizes that process to try and make it a bit more averaged out, so to speak.
And so what's your specific protocol? So Jerry had the people who are in these studies listen to hours and hours of podcasts inside of an fMRI machine. What do you guys do?
What was used there for that protocol was reading text, you know, not requiring listening, but reading it and then monitoring the waves. And that was actually. That's a very, you know, it's probably reading it out loud or. Yeah, reading it out loud, and actually, you know, being able to then, you know, record the EEG signals coming. And how much text was it? Hours and hours. Yeah, I think the total. And this is what I was saying, that the volume that is used there seems to be quite different. I think we're talking about 68 hours over all participants. And how many participants?
There was about, I think, 29. Yeah. And also the research group at the university also works on collecting its own data as well, just to correlate and to also grow the data. But this is the thing that we find is the biggest challenge. Getting reams of data and being able to do it accurately for the reasons that were mentioned about people wandering off and not doing what they're supposed to, that is a challenge. So we're trying to restrict the amount of data required and still be accurate.
And so how well do you do in this approach? And do you have any examples that we might take a look at?
Yeah, I think we've got a couple of examples to show of someone actually thinking about a sentence, and then it. You know, so just thinking about a sentence, and they've got the hat on, and then you decode what they're thinking. Let's have a look. Yes, I'd like a bowl of chicken soup, please. Yes, a bowl of beef soup. So chicken went to beef. Is that. Yeah, a good meat eater. For me, it would have been tofu. The fairy tale was filled with magical adventures in a charming enchanted forest. The fairy tale had lots of magic, eh, journey forest magic. It's solid and affecting and exactly as thought provoking as it should be. It's believable, is what provoking as the sounds.
Be curious. Right. I mean, so, again, some of the words are spot on, some are not. And so, again, how do you measure how well you're doing? Do you just go, do you like Jerry, go for gist? Or you're actually going for word for word and seeing how you do? I think it's, in a sense, similar to what Jerry was describing. We do a qualitative analysis, but there are quantitative methods as well. So, qualitatively, does this look the same as the sentence? But we've got to look at two very important things. Firstly, there is a process to actually cut the words up. The fact that that's possible from coarse EEG signals from a cap is in itself quite a huge feat. Right. So the fact that we've got words in that in, well, I'd say reasonable order is something. The thing that I didn't mention was afterwards, once we've done the encoding to actually get the waves according to individual words, then we use a large language model to actually then take the tokens from that whole array of signals and then try and interpret them. That's what you see there.
So it's a pre trained model. That means it's not trained on any sequences that this individual has thought about. It's actually taking a very small subset of things that have been trained on the self supervised learning and the feature extraction that we've done, and it's just playing out like you would to ChatGPT. But instead of typing into a keyboard, you're actually taking the thought tokens into ChatGPT. Now, that's particularly interesting, because, Jerry, you were noting that when you train on a given individual, if you then try to use that training on somebody else who wasn't trained, it doesn't work, right, because it's very specific, the signal to the person that you're training on. But, Michael, when you're now talking about a certain consistency across people and using, I mean, large language models, that's the most average kind of data set you could imagine. It's averaging over the entire Internet, right, over the entire corpus of human output. I mean, that's how you build these large language models. So you guys seem in some sense, at far ends of a spectrum. And so is that a matter of where you both are currently in the work, or do we imagine that one day you won't need to train specifically on individuals and you'll just have the decoder, the encoder, and it will simply work? Jerry, I mean, what are your thoughts on that?
Yeah, I think right now, at least for fMRI, we're not observing that. But I think one of my big takeaways from working on language decoding is that we never know, like, what kinds of information can be decoded, and, like, the situation five years from now can be very different from what we have right now. So I think that's when thinking about, like, the capabilities and also like, the privacy implications, I think that's important to keep in mind.
If I could just add to that, I think that's very, very true. I think we've got to look at how the technology, the algorithms, and how we capture the data will evolve over time. At the moment, we're talking about, you know, in Jerry's case, it's fMRI. In our case, it's EEG, coarser EEG signals. But, you know, what we all hope is that when we're trying to support someone or help someone that would love to have thoughts to speech because they have no other medium, is to give them the most comfortable experience possible. And the technology, coarse as it may be now, will become potentially more sophisticated, but still become probably more affordable and more accurate.
So it's the data capture piece. It's how you process the data. How do you actually, you know, get away from this whole business of large language models consuming infinite information? How does the data get structured? I don't think in the short term we're going to get away from training, although there are pushes into, like I said earlier, self-supervised models that don't require training sets, but they are only on very small problems, you know. But things will evolve, and we'll need to get to a stage where, you know, at the end of the day, the goal is to help people.
Yeah, well, I remember, you know, when I was a summer student working at IBM back in the 1980s, and they were working on language translation and automatic language recognition. You as an individual, had to train on a very specific microphone and say a whole collection of words before the system had any chance of being able to decode what you were saying. And now, you know, we all know, we take out our phones and we talk to them all the time, and it decodes it immediately. Right. So there is a commonality among people that, at least in that domain, allows that technology to be user agnostic. And one can imagine that the same ultimately will happen here.
But a couple other questions, if you have patience for it. So Jerry's focus is with fMRI, which is really good at spatial resolution, not so good at the temporal resolution, and that seems to capture the gist of ideas. Your EEG or MEG approach has really good temporal resolution, not so much in the spatial resolution, and it seems perhaps better to capture the individual words. What does this tell us about how words affect the brain? I mean, when you encounter a word, what happens inside of our heads? Is the idea spread widely through space? Is that why Jerry's approach is just capturing the gist?
Look, I think my personal view is that if we have an understanding of how language models work in the brain to some extent, you know, I think we understand that.
My personal view is that if we could understand the exact. You know, I come from a school where, if we understand the biologics better, you know, we will probably make better AI. You know, that's my perspective. It's not everyone's perspective, but that's. So I think there should be some time devoted to actually studying that in a bit more detail, because what we're seeing today is AI that is literally a model of something that someone thinks is like the brain. It's the simplest possible model. The way large language models understand language, it's not understanding language, it's matching patterns.
So I think what would ultimately be the best approach would be for us to be able to, in a simpler way, adopt the complexities of how our brain understands language, how the communication occurs, how thoughts occur, and then being able to properly take the approach of encoding it in AI. But I think we're still quite far away from that.
And so where do you see your own work going? I mean, if you were willing to throw out a percent accuracy, I mean, we saw obviously some right, some wrong. Where are you right now?
The state of the art in being able to decode words from thoughts. 30%, 50%, 40%. Where would you say you are?
Well, I think this work has attracted a lot of attention internationally. You know, it's gone to the top conference, the NIPS conference, which is the top neural processing conference in the world. But as you saw, the outcomes seemed quite modest from a qualitative point of view. If I was to put a number on it, which is more difficult to do, I'd say it's above 60%, you know, but that's because, you know, even when you saw the language that was spoken after the thoughts were processed, you know, the audience laughed, you know, because it's coming out as potentially something that is a little bit not correct. You know, it looks a bit foreign to us, that's not entirely English in the right sequence. But imagine where we were just a few years ago. There was literally no one being able to read brain thoughts. You know, it's amazing, state of the art, but, you know, we've still got a long way to go, so that's why I'm putting a 60% number on that.
Yeah, but do you see hurdles ahead that right now you don't know how to resolve? Or is it simply a matter of we have faster computers, you know, better EEGs or MEG, then we'll have the better data, and that's all that we need? There's nothing fundamental standing in our way to get to 90% or 95%?
So my view is that the reason AI came to where it is today is because of the convergence of data storage and sophisticated algorithms, you know, and the sophistication in each has grown. This can be, you know, a micro version of that. You know, we're talking about the way the data is captured. That's going to get better for sure, even with simple cap-type, you know, things. I think what I mentioned at the start, that the pipeline for doing this work to actually encode was actually quite complex. If anyone cares to read the article, it's quite complicated, but actually the elements of it, as I mentioned, were actually interesting and quite intuitive. So it was very clever in the way the pipeline was formed.
If each of those elements were augmented, you would see an augmentation. If the data that's captured is a little bit better in quality, it'll augment. At the end of the day, we're using a simple LLM that's publicly available, and if you're not using that and using something a bit more optimized and better, it's going to shoot up. So that's in the short term, but in the long term, I think there needs to be a whole fundamental shift in where AI is to actually make it into the nineties, in my personal opinion.
But you think that's conceivable?
Absolutely, it's conceivable.
Jerry, how about you? I mean, what do you foresee on the horizon of the work that you're doing? Is there some big hurdle that you've got to surmount, or is it just a matter of plugging away, as Mike was saying, just getting better, capture better data, faster computers, larger data sets? Is that all that you need to get to a significant degree of accuracy in this approach?
Yeah, I think there's a bunch of different factors that go into improving our model and taking us to where we want to go. One thing is accuracy, which, like Michael was saying, I think this can come from many different places. This can come from better quality data, more powerful models, new experimental approaches.
I think another important factor is like practicality. So we like, right now, our model is limited by the fact that we use functional MRI. So we are really excited to see decoding done in more portable technologies like EEG. And then finally, I think it's really important to, like, work with the patients, the potential users, in order to design interfaces that best meet their needs. So I think in my mind, those are the three major directions going forward.
Well, yeah, go ahead. I was just gonna mention, it's something that Jerry just said just got me thinking. Apart from working with the people, I think there's also the opportunity to work with clinicians and health practitioners. And I think that's another thing where AI is really taking off at the moment, is AI in health, and it's broad, but this essentially is a crossover of that, because the sort of things you're capturing normally are captured under health type conditions.
So I think that's another dimension to add to the people piece, is also the clinical and the health angle.
Well, look, it's enormously exciting, and I think the point that you made is the vital one. You know, a handful of years ago, the idea that you could put on a cap and start to read some of the words that someone is thinking would be kind of nutty, and the fact that you guys, in various ways, are beginning to do that is really spectacular. Thank you both for joining us for this conversation.
Technology, Science, Innovation, Mind Reading Technology, Neurology, Brain Decoding, World Science Festival