ENSPIRING.ai: Machines of Loving Grace, Entropix, AI and elections, GSM8K
The discussion on the show revolves around the potential of AI in shaping the future, particularly concerning its role in combating infectious diseases and its overall impact on scientific advancement. Maya Merad, Kautar El Magrawie, and Ruben Bonen share their insights, addressing a mix of optimism and skepticism based on the current progress and limitations of AI technology. They weigh in on the practicality and realism of achieving world peace, solving climate change, or drastically accelerating GDP growth purely through AI advancements.
The participants dive into the technical specifics of AI advancements, such as developments in AI samplers, reasoning capabilities of large language models, and the implications of open-source platforms competing with proprietary models. The conversation extends to touch upon the reliability and robustness of AI models, exploring the potential and risks of using AI in real-world applications and industries. There is an exploration of the balance between excitement around AI potentialities and caution due to the existing technological and societal challenges.
Key Vocabularies and Common Phrases:
1. adversary [ˈædvərˌsɛri] - (noun) - One's opponent in a conflict or competition. - Synonyms: (antagonist, opponent, rival)
Ruben Bonen, CNE capability lead for adversary services.
2. optimist [ˈɑːptəmɪst] - (noun) - A person who tends to be hopeful and confident about the future. - Synonyms: (idealistic, hopeful, sanguine)
The optimist in me would love to say yes.
3. agenda [əˈdʒɛndə] - (noun) - A plan or list of items to be considered or acted upon. - Synonyms: (program, schedule, plan)
There's definitely an agenda and a social context.
4. collaboration [kəˌlæbəˈreɪʃən] - (noun) - The action of working with someone to produce something. - Synonyms: (partnership, cooperation, teamwork)
It has to be kind of a collaboration between the intelligence and also society and humans.
5. pathogens [ˈpæθədʒənz] - (noun) - Microorganisms that cause disease. - Synonyms: (germs, microbes, viruses)
Viruses evolve faster than algorithms, and the battle between pathogens and progress is far from over.
6. anthropomorphizing [ˈænθrəpəˌmɔrfʌɪzɪŋ] - (verb) - Attributing human characteristics to a god, animal, or object. - Synonyms: (humanizing, personifying, embodying)
Whenever I talk about reasoning, I like to say reasoning between quotation marks, because it's us anthropomorphizing.
7. ecosystem [ˈiːkoʊˌsɪstəm] - (noun) - A biological community of interacting organisms and their physical environment. - Synonyms: (habitat, environment, community)
I think it's encouraging because open source is given all the ingredients for free with approaches that are more accessible to everyone in the ecosystem.
8. inference [ˈɪnfərəns] - (noun) - The process of drawing a conclusion based on evidence and reasoning. - Synonyms: (conclusion, deduction, reasoning)
So yeah, I spent quite some time focusing on LLM inference
9. hype [hʌɪp] - (noun) - Excessive publicity or exaggerated claims. - Synonyms: (publicity, buzz, promotion)
Of course, there is also a hype
10. resampling [riːˈsæmplɪŋ] - (verb) - To sample again; in language model decoding, to draw a new token from the distribution instead of keeping the current choice. - Synonyms: (redrawing, regenerating, retrying)
Helping the model decide if it should branch or resample based on future token possibilities.
Machines of Loving Grace, Entropix, AI and elections, GSM8K
We're jumping ahead. It's October 17, 2034. Has AI helped us solve nearly all natural infectious diseases? Maya Merad is a product manager for AI incubation. Maya, welcome to the show. What do you think? Thank you for having me. So, of course, the optimist in me would love to say yes, but I don't know if history has always proven us right. And I think it really depends on how we choose to use this technology. Kautar El Magrawie is a principal research scientist, AI engineering, at the AI hardware center. Kautar, welcome back to the show. Tell us what you think. Thank you, Tim. It's great to be back. Well, AI is making strides in tackling infectious diseases, but it's not a magic bullet. Viruses evolve faster than algorithms, and the battle between pathogens and progress is far from over. So there is a lot more work to be done.
All right, so some skeptics on the call. And finally, last but not least, joining us for the first time on the show is Ruben Bonen, CNE capability lead for adversary services. Ruben, welcome and let us know what you think. Thanks. Glad to be here. I think we can get there, provided, you know, scaling continues, but I think it's mostly going to be an issue of competing human interests if we do. All right, great. Well, all that and more on today's Mixture of Experts.
I'm Tim Hwang, and it's Friday, which means that it's time again to take a whirlwind tour of the biggest stories moving artificial intelligence. We'll talk about a hot new sampler that's getting a lot of attention and Apple raining on AI's parade. But first, I want to talk about Machines of Loving Grace, an essay by Dario Amodei, who's the CEO of Anthropic. And he makes some very wild predictions. He says that AI might solve all infectious diseases. It could 10x the rate of scientific discovery. He promises that one, you know, wild but not implausible outcome is 20% GDP growth in the developing world and potentially even world peace. And so I think I just wanted to kind of bring this topic up because the essay has been getting a lot of play and a lot of people have been talking a little bit about it. And I guess, Maya, I'll start with you. I mean, how believable do you think these visions are? And what is more or less believable in what Dario is predicting here?
So Dario definitely paints a picture that we would all love to believe in. But of course, people are going to be skeptical, because a technology, which is a tool, can be used in different ways. And currently, the way that we're seeing AI being used is not necessarily materializing into all this optimism that he set out; it's a mixed bag of how it's being used. So definitely there have been advances in drug discovery, but at the same time, we're seeing articles about the rise of misinformation. So I think the article overemphasizes the positive, and I don't think it sets out the prerequisites for getting to this positive picture. I think it's going to have to come hand in hand with a lot of social change, and not just technological change. Yeah. So you think the end result of AI is likely to be neutral, if anything, is that right? I don't think technology is neutral. I think in how you put it in motion, there's definitely an agenda and a social context and an economic context behind it, and that just unleashes it in different directions.
Yeah, for sure. Kautar, I want to bring you into this discussion, because I know when you responded to that first question, you seemed a little bit more skeptical. Do you sort of agree with Maya that this is ultimately achievable, or are you just kind of thinking this is, I don't know, marketing or over-optimism about the technology? I think certainly there are lots of things that we can achieve with AI. Of course, there is also a hype. So in Dario's essay, he explored the potential and also some limitations of AI and how it might shape society as it advances. So one thing I particularly found interesting is how he emphasized the need to rethink AI as "powerful AI" and tap into that potential. But also, there are lots of challenges that I see, and that requires continuous work, continuous progress, continuous algorithmic improvement. So, for example, look at biology and health, which he wrote a lot about. I mean, we've seen what AI can do, a lot of strides, and how it can significantly enhance research in biology and medicine.
But progress is often constrained by the speed of experiments, the availability of quality data, and regulatory frameworks like, for example, clinical trials. And despite all of these revolutionary tools, like AlphaFold, there is also a need for things like virtual scientists driving not just data analysis, but the entire scientific process. And I think there's a lot of work to be done there. If we want to look at it in terms of pragmatic versus long-term impacts, in the short term AI might be limited by existing infrastructure and societal barriers. However, over time, I hope that these things can be resolved and that the intelligence can create new pathways, for example, reforming the way experiments are conducted and reducing bureaucratic inefficiencies through better system design. So it has to be kind of a collaboration between the intelligence and also society and humans.
And things need to be regulated, because also, like Maya mentioned, there's all this fake news and these articles and data. So there is also that danger that we have to be careful about, and all those threats. And I think we're going to talk about that later also. So how do we balance all of this so we can push it in a direction that is productive, that is helping us, and not in a direction that can impede our progress or create issues? Yeah, for sure. Kautar, I have one question I want to ask you before I move on to Ruben, because I think he'll have some really interesting angles on this, because he works on the many ways these systems can break or be used for not-so-great purposes.
You work a lot on hardware, and I think part of Dario's dream is the idea that eventually these systems will be able to control physical robotics out here in the real world, and that will be just this huge boost to the effect this technology has. Do you buy that? Are we close to that kind of world where it's really easy to instrument these models to control real-world systems? Or are we still pretty far away from that, you think? I think we're making progress towards that. There's a lot of work right now on making the hardware and the infrastructure more efficient, and sustainability is a big part of that, because right now we're hitting the physical limits, the limits of physics, and there's a lot of work needed to create chips that are capable of acting in resource-constrained devices, especially with these huge compute needs that AI keeps driving forward with things like large language models. And so the computational needs are just growing.
And now, if you want to also do things like reasoning, it's going to be kind of an arms race; more is required algorithmically. But on the compute side, there are a lot of innovations that need to be done at the semiconductor level, in the physics, the materials science, all of that, to create these chips that are capable of handling this huge demand while still doing it in a sustainable way and a cost-effective way. To Kautar's point, this subject was not addressed in this article at all. I think it was overly optimistic that, yeah, AI will solve climate change, but in developing AI we're actually missing a lot of sustainability targets that companies have set, and that was not at all addressed. If I want to use it to solve climate change, I don't want to have data centers that are also emitting tons of carbon and consuming tons of energy to solve that problem.
Ruben, maybe I'll bring you in, because I think as a security guy, I mean, my friends who are security people look at this kind of essay and they're like, this is ridiculous, right? Like, this technology is largely going to be used for bad purposes, or these systems will be so vulnerable that they'll never actually achieve their full potential. How do you size up these claims as a security expert? Do you sort of buy into the optimistic vision here, or are you more skeptical? I am an optimist personally. Yeah. But like I mentioned in our introduction as well, I think the technological achievements are one thing, but how people with competing interests manage the outcomes of those achievements is, I think, something else.
For example, in the article he talks about authoritarian regimes and how AI systems clearly have applications to restrict what people can do and how they can think, and to manage all of that. And I think we can already see some of those dynamics at play currently in the West and the East; we've sort of diverged on AI development paths. And I think those things are going to continue as we get closer to those more powerful systems. Also, for example, with medical advancements, I don't want to make any proclamations about whether what he says is possible or not. I don't think I'm a subject matter expert in that area, but it will also depend on whether companies are willing to make those advancements available to people who may not be able to afford them currently, and how that distribution is made among the population.
And then finally, what I want to mention also is that we talked a little bit about disinformation already, and we'll talk about that later, I think. But one thing he didn't mention in the article is education, which is something I'm personally very hopeful for: that more free access to information and high quality AI-assisted education is going to be a big uplift for a lot of people, and I think will also help make our society more democratic and more accepting of these technologies. Because I think a lot of times when there is some conflict, it's also because people don't have the same basis to understand the facts, for example with anti-vaccination campaigns and things like that. So I think it's a complex picture. So I'm going to move us on to our next topic.
One of the things I've been watching most carefully in the X/Twitter chatter on AI is a bunch of hype around a repo called Entropix. Effectively, the story behind it is that an AI researcher has introduced a sampler that attempts to replicate some of the cool chain of thought features, in effect, that we saw in the OpenAI o1 release just a few weeks back. And I guess, Maya, I'll turn to you because you're going to have to help me out here a little bit. What is a sampler anyway? And why should we care? I love this question.
So yeah, I spent quite some time focusing on LLM inference. So when we talk about AI, we mostly mean large language models. What a large language model does is, given the start of a sentence, so a few words, it predicts what the next word is. So if I say "on the table, there is a...", automatically in your head a few probable words pop up: there is a book, there might be a glass of water, et cetera. So the model does something similar. There's a statistical representation of all possible words that could come next, and then there's a probability attributed to each word, to the book, to the glass, et cetera.
And all of these probabilities are based on the data it has seen in the past. So these models ingest a lot of data, and then, based on what they've seen in the past, the model kind of says: most logically, this is the word that's going to come next. So what a sampler does is determine, given the words the model has seen so far, what the model should output next. And the sampling technique that's most widely used today is called greedy. And by greedy we mean just outputting the token, or word, that has the highest statistical probability. So I hope I answered your question on what sampling is.
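To make the greedy decoding Maya describes concrete, here is a minimal sketch in Python. It assumes a toy vocabulary and hand-picked logits rather than the output of a real model; it is only an illustration of the idea.

```python
import numpy as np

def softmax(logits):
    # Convert raw model scores (logits) into a probability distribution.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def greedy_sample(logits, vocab):
    # Greedy decoding: always pick the single most probable next token.
    probs = softmax(logits)
    return vocab[int(np.argmax(probs))]

# Toy example: candidate continuations of "On the table, there is a ..."
vocab = ["book", "glass", "cat", "laptop"]
logits = np.array([2.1, 1.7, 0.3, 0.9])  # hypothetical scores from a model
print(greedy_sample(logits, vocab))  # -> "book"
```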
I think this paper is really interesting and takes advantage of additional information that we can get out of large language models and out of the additional metadata that we have. So I think it's an interesting paper, and, yeah, happy to hear more about other people's thoughts on it. Yeah, for sure. And I guess, Kautar, I'll throw it to you. I think one of the most interesting bits is that it introduces a new sampler, and I think part of the promise, one of the reasons people are so excited about it, is: oh, it really seems to boost the performance of these models across all these different types of tasks.
And I think the other interesting thing is that it seems to replicate, in part, as I mentioned a little bit earlier, what OpenAI kind of touted as the special sauce for its great new model. And I guess I'm sort of sitting here thinking, well, OpenAI seems like the Goliath in the space because they can do all these crazy cool new algorithmic changes or improvements to their model. But do you think that the existence of something like Entropix means that open source will be getting almost as good, almost as fast, as these proprietary models and what these proprietary companies can do? It almost seems like maybe there actually is no special sauce, because some random researcher can just launch this repo that seems to do something close to what these big companies can do.
Yeah, I totally agree with that. And actually I love what Entropix is doing. I think they have an innovative approach here that also reflects this fast-moving evolution of the open source AI community, where new methods like this adaptive sampling are explored without requiring massive computational resources, which is key here, but also demonstrating the collaborative and experimental nature of the field. We can explore more in open source and kind of mimic, or even exceed, the secret sauce of these big companies. So, of course, Entropix aims to replicate some of the unique features associated with OpenAI's models, particularly the reasoning capabilities. And they have this interesting way of experimenting with entropy-based and what they also call varentropy sampling techniques, which try to reflect the uncertainty in the model's next step, examine the surrounding token landscape, and help the model decide if it should branch or resample based on future token possibilities.
Really interesting approach. And I think at the end of the day, open source is going to catch up with what's happening; there's a lot of innovation happening there. And we see that not just in these algorithmic things, but even with efforts like Triton on the GPU hardware or accelerator side; there is a lot of work also happening in open source to go CUDA-free. And you see a lot of these things, for example, in vLLM, where what's happening in open source is kind of on par with some of the secret sauce that proprietary companies are building in the space of AI, across the whole stack. What I think is also interesting is that open source is giving all the ingredients away for free, with approaches that are more accessible to everyone in the field.
So to explain my point, what OpenAI did with o1 is take a big frontier model and do lots of reinforcement learning in order to train it on how to do chain of thought reasoning at scale. What this open source repo did is take an open source model, Llama 3.1, bypass all this reinforcement learning that OpenAI did, and take advantage of this additional information that you get at the inference level. So like Kautar said, the model has ways of telling us that it's uncertain of the next token to predict. So in certain situations, you could see that with high probability it's going to be this word. But there might be forks in the road where lots of different options are equally probable. So taking advantage of this sort of information, you could do a lot with it.
In this repo, they propose to do chain of thought or to start over from scratch. But I'm actually quite interested in uncertainty quantification as a means of giving information and tools for people to use these models in different ways. So if the model could tell you the answer is uncertain, you could use that to build different systems that output something different. So I think the choice could be different from what this repo does, but I do think it's an interesting research direction. Yeah, and I think that's such an interesting subtlety here: it's not just replicating the end result, but this engineer seems to have basically found a way to do it a lot cheaper.
Basically, it's just: we edit the sampler rather than having to do this completely complex reinforcement learning process. This is also encouraging deeper reasoning through token control at inference time. So it's kind of paving the way and opening up different directions. Like Maya also mentioned, it's figuring out ways of how we do this sampling, these selections. This goes much deeper, incorporating other information about the uncertainty of the model, and about the future predictions you can make with the model, to take the right next steps.
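As a rough illustration of the entropy and varentropy signals described above, here is a sketch in Python. The thresholds and the decision labels are invented for this example and are not taken from the actual Entropix code; it only shows the general shape of "measure uncertainty over the next-token distribution, then choose a decoding strategy."

```python
import numpy as np

def entropy_varentropy(logits):
    # Entropy: how spread out the next-token distribution is.
    # Varentropy: how much per-token surprisal varies around that entropy.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    surprisal = -np.log(probs + 1e-12)
    ent = float((probs * surprisal).sum())
    varent = float((probs * (surprisal - ent) ** 2).sum())
    return ent, varent

def decide(logits, ent_thresh=2.0, varent_thresh=2.0):
    # Illustrative policy only: thresholds are made up for this sketch.
    ent, varent = entropy_varentropy(logits)
    if ent < ent_thresh and varent < varent_thresh:
        return "greedy"              # model is confident: take the top token
    elif ent >= ent_thresh and varent < varent_thresh:
        return "inject_pause"        # uniformly uncertain: e.g. insert a "thinking" step
    else:
        return "branch_or_resample"  # uncertainty is uneven: explore alternatives

logits = np.array([1.2, 1.1, 1.0, 0.2, 0.1])  # a "fork in the road" distribution
print(decide(logits))
```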
So I think this emerged as a joke in the last episode, but I'm thinking about turning it into a bit for Mixture of Experts, which is that we've got to talk about agents on every single episode. It's just part of what we need to do, I guess. Maya, in particular, you raised a question when we were talking about this episode before the recording, about the relationship between these types of uncertainty systems and getting more agentic behaviors out of these models. Do you want to talk a little bit more about that? Because I think that relationship is really interesting, and it's maybe not entirely clear to folks who are not as deep on it as you are.
First of all, any model of a certain size that responds well to chain of thought, kind of step-by-step "thinking" between quotation marks, can be turned agentic. Now, how well that will perform is up to the inherent model and its performance on various benchmarks. And we're going to be talking about benchmarks in an upcoming segment. What is interesting about this new innovation, taking advantage of information about uncertainty, is that it could be really interesting in the context of agentic systems, because you can basically stop an agent in its tracks if it's uncertain of the next step. And I think in the agent world, we're facing a lot of problems with reliability.
And actually, users are over-trusting the agent's performance because it looks like it's performing in a way that is human-relatable. It has thought step by step. There is a plan. The plan at a high level seems reasonable. Actually, catching hallucinations in an agentic approach is harder than with just text in and text out.
So I think uncertainty quantification is a tool that would be really important to bring agentic systems to the next level. And I see it being used in multiple ways: stopping an agent in its tracks, or maybe, based on the repo that we've seen, just starting again or starting a new chain of thought workflow. So we're at the very beginning of this, but this is something we've been discussing on my team as well, a really interesting research direction to integrate into our work. I think it goes hand in hand with what the agentic approaches are doing, because what Entropix is doing is introducing this entropy-based sampling. And with the entropy technique, they're assessing future token distributions.
And this is what agentic systems need as well: the behavior here requires foresight, planning, and mimicking human-like flexibility and dynamic, adaptive decision-making. So I think they kind of go hand in hand here, and there's a lot that can be learned in each direction. Agentic systems could incorporate those techniques to get this human-like flexibility and foresight, or vice versa. I think it's exciting, as the other two panelists mentioned, that there is this real push in open source, though I don't know how well we can quantify whether it's catching up to the frontier models or the efforts those companies are making.
But I think it's great that this is happening in public. Yeah, for sure. And I think, basically to what Maya said earlier, we will see more of the pattern that we see here, which is that open source may be very clever about solving the problem in a much more resource-constrained way, which may actually keep it ahead of the proprietary models and their kind of much more expensive approach to some of these problems. So definitely another dynamic that we'll be returning to in future episodes.
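As a sketch of the "stop the agent in its tracks" idea Maya raises, here is a minimal Python example. It assumes a hypothetical `propose_next_step` function that returns both a proposed action and an uncertainty score; it is not drawn from any particular agent framework.

```python
from typing import Callable, Tuple

def run_agent(
    propose_next_step: Callable[[str], Tuple[str, float]],  # hypothetical: returns (step, uncertainty)
    goal: str,
    uncertainty_threshold: float = 0.5,
    max_steps: int = 10,
):
    """Halt and escalate to a human whenever the model reports high uncertainty."""
    history = goal
    for _ in range(max_steps):
        step, uncertainty = propose_next_step(history)
        if uncertainty > uncertainty_threshold:
            # Stop the agent in its tracks: hand off instead of guessing.
            return {"status": "needs_human_review", "history": history, "stalled_on": step}
        history += "\n" + step
    return {"status": "completed", "history": history}

# Toy usage with a dummy step generator (becomes uncertain after two steps).
def dummy_propose(history: str) -> Tuple[str, float]:
    n_steps = history.count("\n")
    return (f"step {n_steps + 1}", 0.1 if n_steps < 2 else 0.9)

print(run_agent(dummy_propose, "Plan a 3-step task")["status"])  # -> needs_human_review
```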
So I'm going to turn us to our next topic. Apple released a paper that was of some controversy recently, and I was joking a little bit earlier in the intro that they are kind of raining on the AI parade. Effectively, what they did is they took a benchmark called GSM8K, which contains a variety of mathematical questions. And what they said is, okay, we're going to make some quick variations to this benchmark and create a new benchmark, which they call GSM-Symbolic.
And these changes are very, very small and subtle and don't really change the substantive nature of the mathematical problem. So you can imagine kind of like a grade school question about, you know, John having ten apples and you need to subtract three apples and add four apples. And kind of what they're doing is they're saying, okay, well, rather than John, we'll talk about Sally, and rather than apples, we'll talk about pears, and maybe rather than ten apples, the person will have twelve apples. And what they find is that these really kind of small changes can create some pretty significant drops in performance of the models against these benchmarks. So on one level we know this, which is that there's a bunch of overfitting on benchmarks, and people are always kind of like gaming the benchmarks and models look better against these benchmarks.
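The kind of variation Tim describes can be thought of as templating a word problem: names, objects, and numbers become slots while the arithmetic structure stays identical. The sketch below is a made-up example in that spirit, not the actual GSM-Symbolic code or data.

```python
import random

# A grade-school word problem turned into a template.
TEMPLATE = ("{name} has {start} {fruit}. {name} gives away {lose} and then buys "
            "{gain} more. How many {fruit} does {name} have?")

def make_variant(seed=None):
    rng = random.Random(seed)
    name = rng.choice(["John", "Sally", "Ava", "Omar"])
    fruit = rng.choice(["apples", "pears", "oranges"])
    start, lose, gain = rng.randint(8, 15), rng.randint(1, 5), rng.randint(1, 5)
    question = TEMPLATE.format(name=name, start=start, fruit=fruit, lose=lose, gain=gain)
    answer = start - lose + gain  # the underlying math never changes
    return question, answer

q, a = make_variant(seed=0)
print(q, "->", a)
```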
But this is also kind of worrisome. And maybe, Ruben, I'll toss it back to you. Right. Because it sort of suggests that maybe these models' reasoning is actually nowhere near as strong as it appears to be. I don't know if you buy that conclusion. Yeah, I mean...
Okay, first of all, it makes sense that people want to benchmark models that get released. And so I think there is an incentive for companies to also do well on those benchmarks, because otherwise people are going to say, oh, okay, this model isn't appreciably better than it was before. And obviously public data will end up in the training data for these models. So I think that makes sense. But when I looked at the figures in the paper, I saw they have different sorts of tests that they ran the models through. One is, like you mentioned, they changed the names and maybe the figures or the objects, and there was a drop, I think, of between 0.3 and 9% or something like that. But looking at the more frontier models, I think the drop was not really that large, in my opinion.
I think for GPT-4o, it was only 0.3% or something like that. And then they had some other harder benchmarks where they added and removed conditions in the statements, or even added multiple conditions, where there were much larger drops, like, I think, up to 40%. For Romini, I think I would have to look at the paper to get the exact figure. It was up to, like, 65.7% in one of the worst cases. Yeah. And so I think even for what we consider the frontier models, you have a lot of drop there.
But then, you know, when we have been talking about reasoning and chain of thought, I think you saw that the o1 benchmarks dropped by substantially less. It was still a lot. It was like 17% or something. So I'm not really sure how to feel about the results of this paper, or what they mean, or whether this is a problem that will get resolved over time as reasoning gets better in these types of models or not. Yeah, for sure. And Kautar, you just chimed in there. I don't know if you've got views on this paper.
I guess, Ruben, you made it sound, and I don't want to put words in your mouth, kind of like: meh, big whoop, right? We kind of know that these models have lower performance when you change the benchmarks, and even then the effect doesn't seem that big. And so maybe not too much to worry about. I don't know if Kautar feels the same way. I think some of the results were surprising to me.
And this work from these Apple researchers provided a very critical evaluation of the reasoning capabilities of large language models. From what I saw, they're exposing that LLMs rely heavily on pattern matching right now rather than doing true reasoning. I don't think that LLMs are really engaged in formal reasoning; instead, they use sophisticated pattern recognition. And this approach, of course, is very brittle and prone to failure with the minor changes that they have exposed.
So, for example, if you look at the GSM-Symbolic test performance: they created the variations like Ruben mentioned, and what they're seeing is that these drops can sometimes be very big if they just include things that are irrelevant to the problem. The reasoning should stay the same. But if you just say, oh, you know, some of these apples, for example, are smaller than others, which does nothing to the reasoning itself, it's just additional irrelevant information, the Llama model was taking that, and it actually took the smaller apples and used that in the calculation.
And another thing they expose is the variation, the inconsistent results across runs. So they showed very high variance between different runs with the same models, which also highlights the inconsistency. Even slight changes in the problem structure resulted in accuracy drops of up to 65% in certain cases. So I think the key highlight here is that LLMs try to mimic reasoning while mostly relying on data patterns, but their capability to perform consistent logical reasoning is still limited. And the findings also suggest that current benchmarks may overestimate the reasoning capabilities of LLMs. And I think we need improved evaluation methods to really get at the capabilities of LLMs, especially with respect to reasoning.
I love this new benchmark that Apple put up. I know we've been on previous podcast sessions where we talked about all the issues with benchmarks, so I think this is a great step in the right direction to force more generalizable insights based on benchmarks. I also think, for me, it was really predictable that this was going to happen. Whenever I talk about reasoning, I like to say reasoning between quotation marks, because it's us anthropomorphizing what we're seeing coming out of LLMs. And like Kautar said, they're doing pattern matching. So it's pattern matching at scale.
They showed the model patterns it hasn't seen before. So you could update the model's training with some new patterns, and it can infer and maybe unlock new use cases based on that. That's great. So it's a technology, it's an imperfect technology, but it can do useful things. I don't think we're in a world where this current technology can do logical reasoning. It's just pattern matching at scale.
And I think we have to accept it for what it is. And when we're thinking about making these systems useful, I think we're always going to be in a scenario where there's going to be a human in the loop or on the loop. We need to have ways of surfacing whether there's high confidence or low confidence in the LLM's trajectory. So I think we have to use these tools, and use this knowledge that it is an imperfect technology, to make it more robust. And there are a lot of papers that say pairing this sort of technology with humans can increase the overall robustness of the system if we factor in a human as part of the system. And I think we should accept that, as opposed to thinking that with the current technology we have, we're on a pathway to what is called AGI. Yeah, for sure. And I'd love to make that very concrete with maybe a last question to Ruben. You know, right now, and we've talked about this in previous episodes, there's a lot of excitement about, say, using AI to harden computer networks as a form of cybersecurity defense. I guess on the framework that Maya just laid out, it is an interesting question: is cybersecurity a pattern matching question or is it a reasoning question?
Because I guess it would suggest here that if a lot of what we're doing in cybersecurity defense is just pattern matching, okay, maybe the technology really has some very strong legs here; but if something more is needed, there are actually some really interesting questions about whether or not it's fit for purpose. Just a final thought. I'm curious about whether or not you agree with that framing. Yeah, I mean, security is a vast and complex domain, and in some cases reasoning is very important, but in other cases it's all about data collection, correlating those data sources, and summarizing. And I think for many years already, there has been use of sort of traditional machine learning in endpoint detection and response solutions, to great effect, by the way.
Just want to say that. And then now with generative AI, there's a lot of push to integrate that also sort of into the backend where those events are correlated and maybe synthesized in a way that people had to do manually previously and sort of speed up those processes. But humans are definitely involved there. They have to be to evaluate those events. But, yeah, I think it's going to be big for our industry. So we're going to end today's episode on more of a maybe stress inducing segment.
As you know, there's a big election coming up in the US and big elections coming up around the world. And OpenAI did a disclosure recently where they observed that they're seeing state actors increasingly try to leverage AI for election interference. And this involves using models for generating fake articles and fake social media content and other sorts of persuasive tactics, which I think is a really interesting development: the technology is finally becoming mature enough that your sort of enterprising election interferer really wants to bring this technology into the field.
And I guess, Ruben, I'll start with you because you think a lot about security and vulnerability in these types of systems. What do you think about this? Is it kind of an issue that we're going to just be able to solve at some point? Is it going to get worse or better over time? I guess one of the really interesting things I'm trying to think about is what's the trajectory of these types of trends? Do we just live in this world now or is this a temporary thing? No, I think we just live in this world. Okay. This is my hot take.
Yeah. I think obviously AI has a lot of implications. I think what I would categorize this as is social engineering. And there are many varieties of that. There might be persuasive messaging; it might be persuasive generated images or videos. That's one category where I think the risks are more immediately evident. There is another category where malicious actors are using AI to speed up their malicious attacks, and I think that is much less mature at this point.
But when I was going through OpenAI's report on this, and I think it's great that they're being proactive and working with industry partners to sort of combat these threats as they appear, it must be very new to them as well, my sort of conclusion was that they found there was limited effect from what they saw. And I think the most effective post was sort of a hoax post about an account on X, where it replied with a message that said, oh, you haven't paid your OpenAI bill.
But they said in the report that this wasn't actually generated by the API. So I think the impacts might still be limited, but we may also be biased in that assessment, because we're obviously looking only at threats, between quotation marks, that we detected and stopped. So it wouldn't surprise me if there are actually much more successful influence campaigns on social media that we don't detect, because they are not behaving in a way that's out of the ordinary, or they're using self-hosted open source models to generate that content, so we don't have as much telemetry on what they're doing, and things like that. That's a little paranoia-inducing. Thank you.
I think that's where we are. Maya, any thoughts on this? I mean, I guess the obvious question is, is there anything we can do to fix this? Or is this pretty much just, you know, we're doomed to live in a world of fake AI influence operations all the time now? Yeah, I think it's just the state of the world, unfortunately. So there are bad actors. When social media came about, everyone was really excited because it brought us all closer together. It felt like we were all part of one big global community.
But for bad actors, this means better scale and bigger reach, and I think that's the same thing with AI. I think the world is moving very fast, and I do wonder about the ability of our societies, and the people who are putting their brainpower towards solving these issues, to catch up with what's going on. I think we're already in a state where the school and educational system hasn't caught up to a post-AI world. And I wonder, in the field of keeping information factual and how our society is organized, whether we'll be able to get there.
I do think it should be a concerted effort, and I think more global focus and public spending should go toward these issues, because we need more resources to catch up to where the technology is taking us. I want to quickly jump in as well and say, I think, again, I'm coming back to competing incentives here. I think a lot of the time it's not clear to me that social media platforms have the correct incentives to say, okay, actually we could deploy our own AI systems to do sentiment analysis and see which posts are promoting misinformation, or giving harmful information to people, or are clearly part of some network that is generating similar messages. Because if those messages generate a lot of interaction, that might be good for those platforms.
There is a problem with misaligned incentives sometimes, I think, which is getting in our way as well. Yeah, I think that is actually a really important question. It's not just what the technology can detect, but is it actually being implemented and used, and what reasons do people have to actually do that? Kautar, do you want to round us out on this one with a final comment? I'm curious about how you think about these issues. And yeah, whether you think we're doomed.
Yeah, this is actually, for me, a scary state of affairs. Of course, as the technology gets better and more sophisticated, especially gen AI, these threats are also going to get more sophisticated and more clever in how they can reach massive audiences and then try to do harm. So of course, to mitigate the misuse of AI models, like those reported by OpenAI, there is a lot that needs to be done. Things like robust AI detection tools: how do we develop and deploy tools that detect AI-generated content?
And also that we ensure fake materials are identified. Then regulation and oversight: governments and tech companies need to work together, need to collaborate, to set clear guidelines and policies for AI use and transparency. And also, I think user education is very important, public awareness about AI-generated misinformation, to help people critically evaluate online content. Not everything you see on a website or the Internet is something you have to believe. So you have to look at the content critically and maybe check other sources: is this really true or not?
And also, I think, partnership across industry, cooperation to share insights and prevent misuse. I think increasing awareness about this is really important. I mean, OpenAI did some clever things to at least identify some 20 operations that they said used AI for content creation, which they halted and stopped, that were focused on election-related misinformation.
So I think we need more of those. But again, like Maya and Ruben said, this is the world we live in, and it's going to be an arms race. As the technology gets better, the threats are going to get more sophisticated. And again, I want to say, when I read OpenAI's reports, I find that the cases they highlight I would label as low sophistication, across the different use cases or properties of those campaigns that they detected. I wonder, with really good engineering efforts, whether there could be campaigns that it's not easy or even possible to detect are happening. I think this problem is just going to get worse. Yeah.
Especially if they use proprietary models that are outside the scope of OpenAI and other frontier models. Yeah. That's a really intriguing outcome that I haven't really considered: what's the evil OpenAI, right? Like, is there an evil Sam Altman that's running a criminal foundation model? Presumably, yes. I think that's definitely something that exists.
So you always know what you're going to get when you tune into Mixture of Experts. We've gone from solving all infectious diseases and 20% GDP growth to sinister, invisible influence operations controlling you as we speak. So from the very good to the very bad of AI, you'll always get it on Mixture of Experts. Kautar, thanks for joining us. Maya, thanks for coming back on the show, and Ruben, we'll hope to have you on again sometime.
Artificial Intelligence, Technology, Innovation, AI Regulation, Open Source AI, AI Misinformation, IBM Technology