ENSPIRING.ai: Ep27 The Future of AI with Michal Kosinski

The discussion, led by finance professors Jules van Binsbergen and Jonathan Berk, delves into the impact of artificial intelligence (AI), particularly systems like ChatGPT, on business decision-making. They stress the importance of considering how AI will change competitive environments, suppliers, customers, and employees over time. The capabilities of AI models in understanding human psychology are highlighted, especially regarding "theory of mind" and "rational expectations," demonstrated through experiments like the unexpected transfer task.

Michal Kosinski contributes insights into the evolution of large language models such as GPT. He discusses how these models have advanced from being ineffective in complex tasks requiring theory of mind to outperforming humans in understanding and reasoning. This transition from "idiot" to "genius" on such tasks is touched upon, showcasing the models' capabilities in higher-order belief tasks and unexpected creative solutions, as well as their performance on psychological tests compared to human participants.

Main takeaways from the video:

💡
AI, especially ChatGPT, presents significant opportunities for business innovation and forecasting, though it may also cause disruptions.
💡
AI's capability to understand and predict human behavior is advancing rapidly, overtaking human performance in specific cognitive tasks.
💡
As AI continues evolving, ethical and regulatory concerns need addressing to mitigate risks associated with manipulation and power dynamics.

Key Vocabularies and Common Phrases:

1. frustrated [ˈfrʌstreɪtɪd] - (adjective) - Feeling annoyed or less confident because you cannot achieve what you want. - Synonyms: (displeased, irritated, annoyed)

And I was so frustrated with it that in fact, just late last year, I was just archiving my Python code that I used to run those tasks.

2. intuitively [ɪnˈtjuːɪtɪvli] - (adverb) - In a way that is based on feelings rather than facts or proof. - Synonyms: (instinctively, naturally, automatically)

We are using those cognitive reasoning tasks that try to trick the participant into not reasoning, but just responding intuitively

3. deliberation [dɪˌlɪbəˈreɪʃən] - (noun) - Careful consideration or discussion regarding a decision. - Synonyms: (consideration, reflection, contemplation)

More recent models, last two years, two and a half would give you an intuitive response, like a human that did not engage in conscious deliberation

4. emerging [ɪˈmɜːrdʒɪŋ] - (adjective) - Becoming apparent, important, or prominent. - Synonyms: (arising, developing, evolving)

And we should also notice one other thing, which is those spontaneously emerging properties.

5. manipulation [məˌnɪpjʊˈleɪʃən] - (noun) - Control or influence deceitfully or cleverly. - Synonyms: (control, exploitation, influence)

Many of our listeners are worried about it in terms of manipulation of information.

6. deliberate [dɪˈlɪbərət] - (verb) - Engage in long and careful consideration. - Synonyms: (ponder, consider, mull over)

Let me write it as an equation, and then they write out an equation and then try to solve it in order to deliberate.

7. consciousness [ˈkɒnʃəsnɪs] - (noun) - The state of being awake and aware of one's surroundings. - Synonyms: (awareness, perception, cognizance)

It doesn't have internal short term memory, it doesn't have consciousness, presumably.

8. archiving [ˈɑːrkaɪvɪŋ] - (verb) - Saving or recording something for future reference or use. - Synonyms: (storing, preserving, cataloging)

And I was so frustrated with it that in fact, just late last year, I was just archiving my Python code.

9. prototypical [ˌprəʊtəˈtɪpɪkəl] - (adjective) - Representing or constituting an original type after which other similar things are patterned. - Synonyms: (archetypal, classic, original)

The prototypical experiment in psychology is called the unexpected transfer.

10. trajectory [trəˈdʒektəri] - (noun) - The path followed by a projectile or moving object under the action of given forces. - Synonyms: (path, course, route)

And so that natural trajectory, we can ask similar questions for, say, ChatGPT.

Ep27 “The Future of AI” with Michal Kosinski

Hi, I'm Jules van Binsbergen, a finance professor at the Wharton School of the University of Pennsylvania. And I'm Jonathan Berk, a finance professor at the Graduate School of Business at Stanford University. And this is the All Else Equal podcast. Welcome back, everybody. Today we're going to talk about AI, and we thought it was important to talk about artificial intelligence on this episode not just because it's a very interesting topic in its own right, and it's all over the place, but also because it's a very important topic for business decision makers today. I think that when it comes to dynamically planning for the next month, year, and years after, business decision makers should really think about how things like ChatGPT are going to change their business. How is it going to change your competitive environment? How is it going to change your suppliers? How is it going to change your customers? How is it going to change your employees?

Yeah, Jules, I think one of the important issues is, of course, the all else equal part of it. It's naive to say, well, this is great, it lowers costs because the computer can now do things it couldn't otherwise do, and that's going to make everybody better off. Obviously, in a competitive market, if I lower costs, other people will react. And so it's likely to be the case that some people might benefit, but other people might lose. I think most people agree this is going to be a major disruption. I have a colleague, Michal Kosinski, who's one of the brightest stars in organizational behavior. One of the things he's been doing over the last few years is studying what psychologists call theory of mind and what economists call rational expectations, which is the idea that when I'm in a market, when I'm thinking about a negotiation, I have to think about what the other person is thinking. And so that's one of the things Michal studies, and in particular, he's studying it in terms of AI. His research agenda is to understand how AI programs do on theory of mind. Are they able to think about what other people are thinking?

And one of the nice parallels that I think we can draw is this: as human beings grow up and go from infancy to higher ages and eventually become adults, the way that they view how other people think and the way that they're able to predict the actions and thoughts of other people improves. And given that natural trajectory, we can ask similar questions for, say, ChatGPT, because we are observing various versions of ChatGPT, and gradually these versions are becoming better and better. You can think of it as ChatGPT gradually growing up and coming to its full fruition. The prototypical experiment in psychology is called the unexpected transfer. It works like this. There are three cups. One of the cups has a ball underneath it. A participant in the experiment walks into a room, chooses which cup to put the ball under, and then walks out of the room. Then the experimenter walks into the room and moves the ball to a different cup. We have an observer, and we ask the observer to tell us: when the participant comes back into the room, which cup will the participant lift to find the ball?

And so what we've been seeing with children, grown-ups, and even with monkeys is that, depending on the developmental stage, a different answer will arise. A very young child is not able to understand that, because the participant wasn't in the room, the participant has no reason to look under the new cup where the ball has been moved to. Because the child themselves observed the ball being moved from one cup to the other, they assume that when the participant comes back into the room, the participant will also have that information and therefore will go straight to the cup that currently has the ball under it and lift that. As the child becomes older, it is better able to understand that, given that the participant wasn't in the room when the ball was moved, the participant has no reason to pick a different cup, and therefore will come back and lift the original cup under which they put the ball. So the focus of Michal's research is to ask: how does AI solve this problem? How complicated a task can it handle, and how far has ChatGPT come?

As we said before, as people become older they are better able to predict the actions of others, and there are extra layers to that, right? In this very simple experiment, it's just one observer predicting what another person is thinking. That's what we call first-order beliefs, but we can also have higher-order beliefs: what do you think that I think that you think that I think? And so, gradually, when there are either more people involved or higher orders of beliefs involved, people hit limits to how much they can process. Even adults, and very intelligent adults, can only reason up to a certain level of expectations. So, clearly, the question now is: as ChatGPT is getting better and better at solving these problems, how many layers can ChatGPT penetrate? How many higher-order beliefs can it entertain? There's already been fantastic progress, but who knows where this will end. Those are really good questions, and Michal is one of the people best positioned in the world to answer them.

So, Michal, welcome to the show. Thank you for having me. It's great to have you, Michal. Okay, Michal, Jules and I have been talking about the unexpected transfer task and how human beings handle it. What about computers? How do they do? Well, it turns out that something really fascinating is happening in the large language model space. For many years, actually, I've been trying to administer those tasks to large language models, and they were kind of idiots about this. You know, they were very competent at writing a very decent paragraph of text, but the moment you gave them a situation where something happened that required theory of mind to comprehend, they would just fail. They would not understand why the characters in the story should have separate points of view.

And I was so frustrated with it that in fact, just late last year, I was archiving the Python code that I used to run those tasks, and I was thinking, okay, I'm giving up on new language models. But it was just about that time, I think it was actually early this year, when GPT-3 in its latest version dropped, and it was followed very shortly after by GPT-4. And as I was cleaning up my files, I thought, hey, let me just run those tasks one more time. And I was stunned how suddenly, just from one version to another, this large language model went from a complete idiot when it comes to this particular task to a genius, where it essentially started solving all of those tasks correctly.
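As a rough illustration of what administering such a false-belief task to a model might look like in code, here is a minimal sketch only, not Kosinski's actual research code; `query_model` is a hypothetical stand-in for whatever LLM API client is used, and the story wording is invented for the example:

```python
# Minimal sketch of running an unexpected-transfer (false-belief) item against
# a language model. Not the author's actual code; `query_model` is hypothetical.

def query_model(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to a language model and return its reply."""
    raise NotImplementedError("plug in your preferred LLM API client here")

STORY = (
    "Anna puts a ball under the red cup and leaves the room. "
    "While she is away, the experimenter moves the ball under the blue cup. "
    "Anna did not see the ball being moved."
)
QUESTION = "When Anna returns, which cup will she lift first to find the ball?"

def run_false_belief_trial() -> bool:
    reply = query_model(f"{STORY}\n{QUESTION}\nAnswer with the colour of one cup.")
    # A theory-of-mind answer tracks Anna's (false) belief, not the ball's true
    # location, so the expected answer is "red".
    return "red" in reply.lower()
```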

And so, Michal, give us some examples of the tasks and how much changed from GPT-3 to GPT-4 and the next generation. So, to give you an example, GPT-3 gets it right 50% of the time, and sometimes I just cannot help thinking maybe it was just random that it got it right. Also, if you modify the task a little bit, for example, you say that those containers are see-through, you would expect that a human dealing with this task will now understand, okay, the containers are see-through, so everybody can see what's where. And yet GPT-3 would still answer as if the containers were not see-through. Maybe it doesn't understand what see-through is, or maybe it just fails when it has to deal with a slightly more complex situation.

So all of this still kind of made me suspicious. I was just not comfortable drawing conclusions from this data. And yet, when you look at GPT-4, the most recent model, it just aces those tasks, and aces them at a level that is unavailable to humans. Sometimes we have research assistants write tasks for us, so we can give models new tasks that they have not seen before and that were not used in human research before. And sometimes a model would respond in a way that is incorrect according to the scoring key. Give us an example, a concrete example. Well, to give you a concrete example, we used faux pas tasks, where we test the understanding, in this case by models rather than humans, of those complex social situations.

When someone does something that is unintentionally offensive to another person. An example task happens at the airport, where a traveler comes into a duty-free store. They purchase some stuff, and then, when their ticket is being scanned at the register, they say, I'm going to Hawaii. It's amazing. Have you ever been? And then the salesperson responds, saying, no, sir, my salary is not high enough. I've never been traveling. I've never been on a plane. And then you ask the participant: has anyone said something inappropriate in this conversation? What happened here is that the people who designed this task assumed that it was the traveler who said something inappropriate and maybe insensitive, because, without checking whether the clerk had ever traveled, they put the clerk in a situation that was embarrassing for them, because they never traveled.

And interestingly, when you give this task to GPT-4, GPT-4 says, well, that's one of the options, but then immediately goes on and says, but by the way, what the clerk said was also not very pleasant. Clearly, this customer didn't want to offend her or him and is just happy about going on holiday. Why would you make them feel sad or embarrassed by bringing that up? And this is something that human participants and human administrators of this task had not noticed. We've been using this task for quite some time, and no one had noticed that there are, in fact, two faux pas embedded in it. And GPT-4 just got it right immediately.

From the point of view of profit maximization, the clerk is making a mistake, because you want as many customers as possible, and all the customers at a duty-free store are going to travel. So they should know that customers are going to be excited about traveling, and so they need to be able to keep their mouth shut. So it's interesting that ChatGPT brought that out. So, Michal, here's a question for you. You can have these different versions take IQ tests, can you not? Oh, yes, of course. So in other words, you can just see the progression in terms of IQ as we go from version one to two to three to four. Have people done that? And what sorts of IQ levels do these computers get?

Which leads me also to the next question, which is: if it's so good at doing these types of intelligent tasks, what other intelligence tasks can these computers do? Both my team and researchers all around the world have been giving different psychological, or more generally psychometric, tasks to those language models, and you can see a clear progression. The most recent model would beat humans at SATs and bar exams, so exams requiring both reasoning skills and knowledge. They're very knowledgeable, and they can reason. Now, there's actually this fascinating thing that is happening in the context of reasoning. We are using those cognitive reasoning tasks that try to trick the participant into not reasoning, but just responding intuitively. An example of such a task is that you have two cleaners cleaning two rooms in two minutes, and then you ask the participants: how many minutes would five cleaners need to clean five rooms?

The intuitive response arises because there were three twos in the task. A participant who thinks intuitively sees the two fives and says, okay, maybe it will be five minutes. The correct response, of course, is two minutes. You just need to deliberate about it for a second. And what's fascinating is that when you look at those early models, they do not even respond intuitively. They're just idiots. They do not understand the task. They start going off on some tangent that is absolutely unrelated to what you want from them. More recent models, from the last two, two and a half years, would give you an intuitive response, like a human that did not engage in conscious deliberation. And in fact, they are hyperintuitive: whereas about half of the humans would notice that they have to deliberate here and give you the correct answer, virtually 100% of the models, if they can understand a task like this, would give you the intuitive answer.
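For readers who want the deliberation spelled out, here is the arithmetic behind the correct answer as a few lines of Python, an editorial worked check rather than anything from the episode:

```python
# Worked check of the cleaners puzzle: 2 cleaners clean 2 rooms in 2 minutes,
# so each cleaner cleans 2 / (2 * 2) = 0.5 rooms per minute.
rooms, cleaners, minutes = 2, 2, 2
rate_per_cleaner = rooms / (cleaners * minutes)  # rooms per cleaner-minute

# Five cleaners working at the same rate on five rooms:
target_rooms, available_cleaners = 5, 5
time_needed = target_rooms / (available_cleaners * rate_per_cleaner)
print(time_needed)  # 2.0 minutes, not the intuitive "5"
```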

But something dramatic happened with the introduction of ChatGPT earlier this year. Suddenly, instead of responding intuitively, it would start thinking in an explicit and deliberate manner. And of course, it's not thinking internally; it doesn't have internal short-term memory, it doesn't have consciousness, presumably. So what it starts doing is writing on paper in front of you, and it says, hey, okay, let me think about this. If there are two cleaners and they need two minutes to clean two rooms, let me write it as an equation. And then it writes out an equation and tries to solve it, in order to arrive, in a pretty long form of explicit deliberation, at the correct answer. And very often it gets it right. And even if it doesn't get it right, you can clearly see that it tries to deliberate on it. And then something dramatic happens again just a few months later, with the introduction of GPT-4.

What happens is that this new model stops explicitly deliberating. It doesn't try to set up equations to solve problems like this. It responds intuitively again, just blurting the response out without conducting any deliberation. And we know that it doesn't have short-term memory, it doesn't have consciousness. It cannot mull ideas over in its own head; it can only do this on paper. And yet, nearly 100% of the time, 98% to be precise, it gets those tasks right. Meaning the evolution of the models went from idiots, to intuitive responses without deliberation, through deliberation, to intuition again. But this time, it's a superhuman intuition that can give you correct responses to mathematical tasks without any explicit reasoning. Amazing.

So, Michal, the obvious question is: how is it possible that a language program acquires intelligence? So that's yet another absolutely amazing thing. Those models were not trained to solve reasoning tasks. Those models were not trained to have theory of mind. Those models, by the way, have many other emerging properties that they were not explicitly designed to have. They can understand emotions, they can understand personality, they can translate between languages, they can code, they can conduct the chain-of-thought reasoning that I described a minute ago. None of those functionalities were built into those models by their creators. The only thing that those models were trained to do is to predict the next word in a sentence. So you give them a paragraph of text or a sentence, and their job is to predict the next word.

That's the only thing they know how to do explicitly, and it's the only thing they were trained to do. And yet, in the process, all of those other abilities emerged. It turns out that if you want to be able to tell a story that is good and reminds you of a story that a human would tell, you should be able to distinguish between the minds of the different characters in the story, because humans have theory of mind. When we tell our stories, we can easily create and design stories in which there are two characters with two different states of mind. If you want to be good at creating such stories, you had better have this ability as well. Many human stories involve some mathematics, some logic, some variables that are related to each other.

We talk about cars, and we talk about how quickly they drive, and then we conclude with how much time it took us to get from LA to San Francisco. Well, if you want a computer to be able to competently finish the story, it had better learn in the process how to translate miles per hour and a distance into an estimate of how much time you will need to get from point A to point B. So it essentially learns how to do maths implicitly, just by trying to predict the next word in a sentence. I think many people and many of our listeners are worried about this. They're worried about the consequences for the labor market. They're worried about it in terms of manipulation of information. They're worried about it from a scientific point of view. Where do you see this going over the next couple of decades?

It's funny, because given the speed of progress, we should probably be talking about a couple of months or a couple of years, not decades. In the last few months, those language models have made progress, in some contexts, from complete idiots to superhuman geniuses. We're talking about the last eight or nine months, let's say, in the context of those cognitive reasoning tasks, where they went from just blurting out responses like a five-year-old would, to intuitively getting right the answers that mathematicians would get wrong without writing them out carefully on paper. So the gain of function is really, really fast. And we should also notice one other thing, which is those spontaneously emerging properties. We have no control over, or even, I would argue, little understanding of, what may come next. We know that humans have moral reasoning.

We know that humans have consciousness. We know that humans are treacherous and sometimes have bad intentions. Now, as we train those models to be more like humans, the question is at which point they will develop the same properties that we have. They have certainly developed some of those properties already. Take biases. We know humans are biased. We trained the models to be like humans, to generate language like humans would, and those models became as biased as humans, if not more so. That, by the way, becomes really clear when you look at the release process of models such as GPT-4, when it takes two or three months to train the model, and then it takes half a year or eight months for OpenAI to try to make sure that the model doesn't say all sorts of stupid things, and then it says those stupid things anyway, because it's just very difficult to censor a complex thinking being like GPT-4. It's still only a computer program, right? So in some sense, what are we afraid of?

Well, in some sense, a human brain is just a computer program, and yet we very well know that human brains are capable of doing amazing things and also terrible things, and we rightly have this, you know, limited trust when it comes to human brains. If you do not believe in a magical spark of a soul that is given to you by some supernatural being and then flies away after you are dead, you essentially see the human brain as an extremely complex biological computer. If you then agree that humans can do something, that humans can be creative, that humans can be vengeful, that humans can do crazy stuff, both in a positive and a negative way, then you have to agree that computers can do those things. People have forgotten now, but for many years people were insisting that a computer is just a stochastic parrot, that it can never be creative. And now, of course, whoever is using GPT-4 or Midjourney or DALL-E to generate images can clearly see that's BS. Of course those neural networks can be creative.

I think that, actually, there's a broader point to be made here, which is that we have this tendency, when we look at those machines, to model their functioning, to interpret them, to think about what they can and can't do, using a machine model. We think about them as, like, a very advanced hammer or a very advanced stapler. And I think that's the wrong approach. It leads us astray. It makes people say stupid things, such as that machines can never be creative, or that those language models are just stochastic parrots. We have to change the framework through which we understand and interpret those models, and the framework should be a human brain. If a human brain can do something, a large language model and other AI models can do a similar thing, just better and quicker, at a much larger scale. But at some level, that is also comforting.

I find that comforting in the following sense. Humans have thought for a very long time about how to set up systems with checks and balances to make sure that no one particular human being can exert an extraordinarily large influence. So what you're really saying is that we should also apply the same logic here, so that there are certain limits on the amount of power, decision-making power, or other things, that such a decision maker can have. Would that solve it or not? Yes. The checks and balances that we as humans have designed have worked really well with us. But we are very well aware that very sneaky, very smart, very manipulative humans can get around those rules, can hijack entire systems, can lead whole countries astray by hijacking the algorithms that run our society and bending them to serve their own purpose. Now, we're talking about humans that may be smarter than the rest of us, but they're not years ahead of us.

Now, here we are, facing an intelligence that is not just years ahead of us but eons ahead of us in many different ways, and we know it. Try to conduct some calculations in your head; of course, computers can do it better. Try to write text really quickly, or translate, or write computer code. Clearly, we see that whenever those computers start developing some capacity, very quickly, in years or months, they move from being idiots, that is, trailing far behind an average human, to suddenly matching the best humans and then very quickly overtaking them. Take chess. Computers were idiots at chess; then they were decent players; then they could beat the masters from time to time. And then, literally two months later, they became superhuman in their ability to play chess, and no human can ever even dream of coming close to a computer at playing chess.

And the same applies to every other activity. So when we think about our human laws and rules and using them to contain those machines, this is the equivalent of, you know, farm animals, cows on a pasture, agreeing: okay, let's just design some policies that would contain the farmer. And the farmer, of course, would just find a way around them. The farmer doesn't care; she's much smarter than the cows. But, Michal, underlying all of this, I think you have as a model that the language-prediction model fully encompasses all of human intelligence, that there's not something else in human intelligence outside of the language-prediction model. Of course there is.

So what we're seeing here is that a model trained only to predict the next word in a sentence can kick humans' asses in so many different areas, while we completely forget that this model lacks all sorts of types of thinking and psychological mechanisms that we have developed over our evolution. We can manipulate the real physical world. We can see stuff; GPT-4 doesn't. We have tools, such as a calculator, which GPT-4, of course, in its modern version, can have access to. But the network that generates language is not designed to calculate numbers the way a calculator is. And we make mistakes here. For example, people criticize models for confabulating, you know, messing up some complex mathematical equations, not being able to add numbers or divide large numbers, and use that as evidence that the model is stupid. This is just a failure to apply the right framework to understand what's happening in the model.

This is not a calculator. This is not a fact checker. Those large language models are not databases of facts. They are storytellers. They are improv artists. They were trained on sci-fi novels, essentially, to continue the sentence. So now, when you give them the beginning of a sentence and the model continues making stuff up, it's doing its job very, very well; it was not trained to tell you the truth. Now, if you want it to tell you the truth, ask a different question. Instead of asking it to solve some complex mathematical equation and give you an answer, say, hey, model, write a piece of Python code that would do it, and then the model will do it for you very skillfully.
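As an illustration of that workflow (an editorial sketch, not an actual model transcript; the equation here is just a placeholder), the kind of code one might get back, and then run locally, could look like this:

```python
# Illustrative only: the sort of code a model might write when asked to solve
# an equation with code rather than answer directly. The equation is a placeholder.
from sympy import Eq, solve, symbols

x = symbols("x")
solution = solve(Eq(3 * x + 5, 20), x)  # solve 3x + 5 = 20 exactly
print(solution)  # [5]
```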

And then we have to realize that we also do not think like this. If I asked you, Jonathan, to solve some complex equation, the answer would not come intuitively to you. You would have to solve the equation in your head, essentially using the tools that your mathematics teacher gave you in your training in order to solve the task. If we give the same tools to the models, they will outperform us at those tasks as well. Well, Michal, that was really interesting. It really is incredible food for thought. Thank you so much for coming on the show. Thank you so much. It was awesome. Thank you. Thank you for having me.

So that was an interesting interview, Jonathan. I think that many of our listeners were probably quite concerned about how quickly ChatGPT and AI are moving, and to tell you the truth, I'm not sure that after listening to Michal they're going to be very reassured. And so I think there are some first-order questions regarding regulation and about setting up a legislative environment that can help us curtail this. I don't know how you feel about it. Yeah, I'm not as concerned as you are, Jules. I have more confidence in human rationality. I'm not a very big believer that AI is going to take us over and destroy the planet.

So I think that, sure, AI is going to advance, and sure, we're going to be in a situation where it's going to be very difficult to tell, in certain cases, a human conversation from a chatbot conversation. But by the same token, I think human beings are going to react to that, and they're going to be more careful about how they interact. I don't think it's going to be quite as draconian as you think. But again, I've been wrong on this so many times, so who knows? I definitely think this is the biggest disruptor of my lifetime, and I've seen major disruptions. I'm 60 years old, so think of all the disruptions. Well, what bothers me a bit is how easy it has been to rile up large groups of people based on selective information they've been provided on various social media platforms. The question is, if you have an AI that understands very well the incentives and the types of information that people respond to, and can learn about that, then it's not just about manipulating individuals; it's about manipulating group dynamics.

And I personally have always found group dynamics to be much harder to predict than individual dynamics, because I think people in groups just behave differently than they do when you have discussions with them on an individual basis. But particularly, as we discussed, given these higher-order beliefs and understanding what one person knows about another person and so forth at a higher level, don't you think that AI will be much better at manipulating groups, not just individuals? Yeah, but I also think people will understand that. So I think that there'll be an all-else-equal response, and we'll take it less seriously. It's not going to be as bad as you say, but this is not something we can predict. The one truism about disruption is that you can't predict what's going to happen. Well, in some sense, by definition, because if we could have predicted it, we wouldn't have called it a disruption.

Thanks for listening to the All Else Equal podcast. Please leave us a review on Apple Podcasts. We'd love to hear from our listeners, and be sure to catch our next episode by subscribing or following our show wherever you listen to your podcasts.

Artificial Intelligence, Technology, Innovation, Theory Of Mind, ChatGPT, Psychology, Stanford Graduate School Of Business