The video features an engaging discussion with Andrej Karpathy, a pioneering figure in AI known for his work in deep learning and his contributions at OpenAI and Tesla. Karpathy reflects on his journey from co-founding OpenAI to leading the computer vision team at Tesla, and on his evolving view of the future of AI. He describes current efforts toward AGI (Artificial General Intelligence) as building an "LLM OS": an operating system in which the LLM (Large Language Model) acts as the CPU, modalities such as text, images, and audio act as peripherals, and existing software forms the surrounding infrastructure, all of which can be customized for different corners of the economy.

Karpathy also argues that there is room for new, independent companies despite the dominance of AI giants like OpenAI, and stresses the importance of a vibrant, diversified tech ecosystem. Extending the operating-system analogy, he compares the current moment to the early days of the iPhone, when a few default apps coexisted with a growing ecosystem of third-party apps. He suggests the LLM landscape may settle into a split resembling today's operating systems: a few proprietary systems alongside an open ecosystem like Linux, with an important distinction between open-weights and fully open-source models. On training, he names scale as the primary factor, but stresses that building these models also takes scarce expertise in distributed infrastructure, algorithms, and data, and that today's hardware remains far less energy-efficient than the human brain.

Main takeaways from the video:

💡
OpenAI and others are building platforms on which emerging AI businesses can grow, potentially spawning niche markets and unique applications.
💡
Karpathy frames the path toward AGI as building an "LLM OS", an integrated software environment with the LLM at its core, which will require continued innovation and collaboration.
💡
Maintaining a vibrant ecosystem is crucial for the broader tech field's health, requiring openness, shared learning, and diverse contributions from both large corporations and startups.

Key Vocabulary and Common Phrases:

1. AGI [ˌeɪ dʒiː ˈaɪ] - (n.) - Abbreviation for 'Artificial General Intelligence,' which refers to highly autonomous systems that outperform humans at most tasks. - Synonyms: (general AI, strong AI, full AI)

To kick things off, AGI, even seven years ago, seemed like an incredibly impossible task to achieve, even in the span of our lifetimes.

2. intimidating [ɪnˈtɪmɪˌdeɪtɪŋ] - (adj.) - Describing something or someone that makes others feel fearful or anxious. - Synonyms: (daunting, formidable, frightening)

Andrej's first reaction as we walked up here was, oh, my God, to his picture, like a very intimidating one.

3. niche [niːʃ] - (n.) - A specialized segment of the market for a particular kind of product or service. - Synonyms: (segment, market, specialty)

Now that you're a free independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem. And most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight.

4. poached [poʊtʃt] - (v.) - Illegally hunted or caught (usually animals); in business, lured a skilled employee away from another employer. - Synonyms: (lured away, enticed, headhunted)

He co-founded OpenAI back in 2015, and in 2017 he was poached by Elon.

5. inspire [ɪnˈspaɪər] - (v.) - To fill someone with the urge or ability to do or feel something, especially to do something creative. - Synonyms: (motivate, encourage, stimulate)

Yes, OpenAI was right there. And this was the first office after, I guess, Greg's apartment, which maybe doesn't count. And so, yeah, we spent maybe two years here, and the chocolate factory was just downstairs, so it always smelled really nice.

6. oligopoly [ˌɒlɪˈɡɒpəli] - (n.) - A state of limited competition, in which a market is shared by a small number of producers or sellers. - Synonyms: (limited competition, market control, concentrated market)

How do you foresee the future of the ecosystem playing out? Yeah, so again, I think the operating systems analogy is interesting because we have basically an oligopoly of a few proprietary systems, like Windows, macOS, et cetera.

7. autonomous [ɔˈtɑːnəməs] - (adj.) - (Of a person or entity) capable of operating independently without external control. - Synonyms: (self-governing, independent, self-reliant)

They're quite autonomous, but not fully autonomous, so what does the oversight look like?

8. vibrant [ˈvaɪbrənt] - (adj.) - Full of energy and life. - Synonyms: (lively, dynamic, energetic)

He wants it to be a vibrant place.

9. magnifier [ˈmæɡnɪˌfaɪər] - (n.) - Something that increases the size, strength, or effect of something else; here, something that greatly increases power. - Synonyms: (amplifier, intensifier, booster)

Especially with AGI being such a magnifier of power.

10. co-design [ˌkoʊdɪˈzaɪn] - (n.) - The collaborative design of a product or process, or of several parts of a system together. - Synonyms: (joint design, codevelopment, collaborative design)

Together with the co-design of the hardware and how that might evolve.

Making AI accessible with Andrej Karpathy and Stephanie Zhan

I'm thrilled to introduce our next and final speaker, Andrej Karpathy. I think Karpathy probably needs no introduction. Most of us have probably watched his YouTube videos at length, but he's renowned for his research in deep learning. He designed the first deep learning class at Stanford, was part of the founding team at OpenAI, led the computer vision team at Tesla, and is now a mystery man again, now that he has just left OpenAI. So we're very lucky to have you here, Andrej. You've been such a dream speaker, and so we're excited to have you and Stephanie close out the day. Thank you. Andrej's first reaction as we walked up here was, oh, my God, to his picture, like a very intimidating one. I don't know what year it was taken, but he's impressed. Okay, amazing. Andrej, thank you so much for joining us today, and welcome back. Yeah, thank you.

Fun fact that most people don't actually know: how many folks here know where OpenAI's original office was? That's amazing. Nick, I'm going to guess right here. Right here, on the opposite side of our San Francisco office, where actually many of you were just in huddles. So this is fun for us because it brings us back to our roots, back when I first started at Sequoia and when Andrej first started co-founding OpenAI. Andrej, in addition to living out the Willy Wonka dream of working atop a chocolate factory, what were some of your favorite moments working from here? Yes, OpenAI was right there. And this was the first office after, I guess, Greg's apartment, which maybe doesn't count. And so, yeah, we spent maybe two years here, and the chocolate factory was just downstairs, so it always smelled really nice. And, yeah, I guess the team was 10 or 20 people or less. And, yeah, we had a few very fun episodes here. One of them was alluded to by Jensen at GTC, which happened just yesterday or two days ago. Jensen was describing how he brought the first DGX and delivered it to OpenAI. So that happened right there. That's where we all signed it. It's in the room over there.

So Andrej needs no introduction, but I wanted to give a little bit of backstory on some of his journey to date. As Sonya had introduced, he was trained by Geoff Hinton and then Fei-Fei Li. His first claim to fame was his deep learning course at Stanford. He co-founded OpenAI back in 2015, and in 2017 he was poached by Elon. I remember this very, very clearly. For folks who don't remember the context, Elon had just cycled through six different Autopilot leaders, each of whom lasted about six months. And I remember when Andrej took this job, I thought, congratulations and good luck. Not too long after that, he went back to OpenAI and has been there for the last year. Now, unlike all the rest of us, today he is basking in the ultimate glory of freedom in all time and responsibility. And so we're really excited to see what you have to share today.

A few things that I appreciate the most about Andrej are that he is an incredible, fascinating futurist thinker, he is a relentless optimist, and he's a very practical builder. And so I think he'll share some of his insights around that today. To kick things off: AGI, even seven years ago, seemed like an incredibly impossible task to achieve, even in the span of our lifetimes. Now it seems within sight. What is your view of the future over the next N years? Yeah, so I think you're right. A few years ago, I sort of felt like with AGI, it wasn't clear how it was going to happen. It was very academic, and you would think about different approaches, and now I think it's very clear. And there's a lot of space and everyone is trying to fill it, and so there's a lot of optimization.

And I think, roughly speaking, the way things are happening is that everyone is trying to build what I refer to as this LLM OS. Basically, I like to think of it as an operating system. You have a bunch of peripherals that you plug into this new CPU or something like that. The peripherals are, of course, text, images, audio, and all the modalities. Then you have a CPU, which is the LLM transformer itself. And then it's also connected to all the Software 1.0 infrastructure that we've already built up for ourselves. I think everyone is trying to build something like that and then make it available as something that's customizable to all the different nooks and crannies of the economy.

And so I think that's roughly what everyone's trying to build out, and what we also heard about earlier today. So I think that's roughly where it's headed: we can bring up and bring down these relatively self-contained agents that we can give high-level tasks to and specialize in various ways. Yeah, I think it's going to be very interesting and exciting, and it's not just one agent, it's many agents. And what does that look like? And if that view of the future is true, how should we all be living our lives differently? I don't know. I guess we have to try to build it, influence it, make sure it's good and, yeah, just try to make sure it turns out well.

So now that you're a free independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem. And most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight. Where do you think opportunities exist for other players to build new independent companies, versus what areas do you think OpenAI will continue to dominate even as its ambition grows? Yes. So my high-level impression is that OpenAI is basically trying to build out this LLM OS. As we heard earlier today, it's trying to develop this platform on top of which you can position different companies in different verticals.

Now, I think the OS analogy is also really interesting, because when you look at something like Windows, these are also operating systems. They come with a few default apps; for example, Windows comes with the Edge browser. I think in the same way OpenAI or any of the other companies might come up with a few default apps, quote unquote. But that doesn't mean that you can't have different browsers running on it, just like you can have different chat agents running on that infrastructure. And so there will be a few default apps, but there will also potentially be a vibrant ecosystem of all kinds of apps that are fine-tuned to all the different nooks and crannies of the economy. And I really like the analogy of the early iPhone apps and what they looked like; they were all kind of like jokes.

And it took time for that to develop. And I think, absolutely, we're going through the same thing right now. People are trying to figure out: What is this thing good at? What is it not good at? How do I work it? How do I program with it? How do I debug it? How do I actually get it to perform real tasks? And what kind of oversight does it need? Because it's quite autonomous, but not fully autonomous. So what does the oversight look like? What does the evaluation look like? There are many things to think through, including just understanding the psychology of it. And I think that's what's going to take some time: figuring out exactly how to work with this infrastructure. So I think we'll see that over the next few years. So the race is on right now with LLMs: OpenAI, Anthropic, Mistral, Llama, Gemini, the whole ecosystem of open source models, and now a whole long tail of small models.

How do you foresee the future of the ecosystem playing out? Yeah, so again, I think the operating systems analogy is interesting, because we have basically an oligopoly of a few proprietary systems, like Windows, macOS, et cetera. And then we also have Linux, and Linux has an infinity of distributions. So I think maybe it's going to look something like that. I also think we have to be careful with the naming, because a lot of the ones that you listed, like Llama, Mistral, and so on, I wouldn't actually say they're open source. It's like tossing over a binary for an operating system. You can work with it, and it's useful, but it's not fully useful.

There are a number of what I would say are fully open source LLMs: the Pythia models, LLM360, OLMo, et cetera. They fully release the entire infrastructure that's required to compile the operating system: to gather the data, to train the model from the data, et cetera. When you're just given a binary, that's of course much better than nothing, because you can fine-tune the model, which is useful. But, and I think this is subtle, you can't fully fine-tune the model, because the more you fine-tune it, the more it's going to start regressing on everything else. So if you want to add a capability without regressing the other capabilities, you may want to train on some kind of mixture of the previous dataset distribution and the new dataset distribution, because you don't want to regress the old distribution; you want to add knowledge. And if you're just given the weights, you actually can't do that. You need the training loop, you need the dataset, et cetera.
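As a rough illustration of the data-mixing idea above, here is a minimal Python sketch that builds fine-tuning batches from a blend of the original distribution and the new one. The function name `mixed_batches` and the `old_fraction` knob are hypothetical, chosen for illustration; real training pipelines weight and sample their data in many different ways.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batches(
    old_data: Sequence[T],
    new_data: Sequence[T],
    old_fraction: float,
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield batches that blend examples from the old and new distributions.

    old_fraction (a hypothetical knob) is the share of each batch replayed
    from the original data, so the model keeps seeing it while learning new data.
    """
    if not 0.0 <= old_fraction <= 1.0:
        raise ValueError("old_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_old = round(batch_size * old_fraction)
    n_new = batch_size - n_old
    for _ in range(num_batches):
        # Sample with replacement from each source, then shuffle within the batch.
        batch = rng.choices(old_data, k=n_old) + rng.choices(new_data, k=n_new)
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    old = [("old", i) for i in range(100)]
    new = [("new", i) for i in range(10)]
    for batch in mixed_batches(old, new, old_fraction=0.5, batch_size=8, num_batches=2):
        print(sum(1 for source, _ in batch if source == "old"), "of", len(batch), "from old data")
```

With `old_fraction=0.5`, each batch of 8 holds 4 replayed examples and 4 new ones. A fixed per-batch ratio is the simplest possible choice; in practice the ratio itself would need tuning.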

So you are actually constrained in how you can work with these models. Again, I think they're definitely helpful, but I think we need slightly better language for it. So there are open-weights models, open source models, and then proprietary models, I guess. And that might be the ecosystem. And, yeah, it's probably going to look very similar to the ones that we have today, and hopefully you'll continue to help build some of that out. So I'd love to address the other elephant in the room, which is scale. Simplistically, it seems like scale is all that matters: scale of data, scale of compute. And therefore the large research labs and large tech giants have an immense advantage today.

What is your view of that? Is that all that matters? And if not, what else does? So I would say scale is definitely number one. I do think there are details there to get right, and a lot also goes into dataset preparation and so on, making it very good and clean, et cetera. That matters a lot. These are all compute-efficiency gains that you can get. So there's the data, the algorithms, and then of course the training of the model and making it really large. So I think scale will be the primary determining factor, the first principal component of things, for sure. But there are many other things that you need to get right. So it's almost like scale sets some kind of speed limit. You do need some of the other things, but if you don't have the scale, then you fundamentally just can't train some of these massive models, if you are going to be training models at all. If you're just going to be doing fine-tuning and so on, then I think maybe less scale is necessary. But we haven't really seen that fully play out just yet.

And can you share more about some of the ingredients that you think also matter, maybe lower in priority behind scale? Yeah, so the first thing is that you can't just train these models if you're only given the money and the scale. It's actually still really hard to build these models. Part of it is that the infrastructure is still so new; it's still being developed and not quite there. But training these models at scale is extremely difficult and is a very complicated distributed optimization problem. The talent for this is actually fairly scarce right now, and it basically turns into this insane thing running on tens of thousands of GPUs, all of which are failing at random at different points in time.
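To see why random failures become routine at this scale, a back-of-the-envelope calculation helps. The sketch below assumes each GPU fails independently with the same small probability per hour; the 1-in-10,000 figure is purely hypothetical, not a measured rate.

```python
def prob_any_failure(num_gpus: int, p_fail: float) -> float:
    """Chance that at least one of num_gpus fails within a time window.

    Assumes independent failures with the same per-GPU probability p_fail,
    a simplifying assumption for illustration only.
    """
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    # P(no failures) = (1 - p)^N, so P(at least one) = 1 - (1 - p)^N
    return 1.0 - (1.0 - p_fail) ** num_gpus


if __name__ == "__main__":
    p_per_hour = 1e-4  # hypothetical per-GPU failure probability per hour
    for n in (8, 1_000, 10_000):
        print(f"{n:>6} GPUs: {prob_any_failure(n, p_per_hour):.3f} chance of a failure per hour")
```

Under these assumed numbers, an 8-GPU job sees a failure in under 0.1% of hours, while a 10,000-GPU job sees one in roughly 63% of hours, which is why checkpointing and automatic restarts stop being optional at that size.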

And so instrumenting that and getting it to work is an extremely difficult challenge. GPUs were not intended for 10,000-GPU workloads until very recently. I think a lot of the infrastructure is creaking under that pressure, and we need to work through that. But right now, if you just give someone a ton of money or a ton of GPUs, it's not obvious to me that they can just produce one of these models, which is why it's not just about scale. You actually need a ton of expertise on the infrastructure side, the algorithm side, and the data side, and you need to be careful with that. So I think those are the major components. The ecosystem is moving so quickly; even some of the challenges we thought existed a year ago are being solved more and more today.

Hallucinations, context windows, multimodal capabilities, inference getting better, faster, cheaper. What are the LLM research challenges today that keep you up at night? What do you think are immediate enough, but also solvable, problems that we can continue to go after? So I would say on the algorithm side, one thing I'm thinking about quite a bit is this distinct split between diffusion models and autoregressive models. They're both ways of representing probability distributions, and it just turns out that different modalities are apparently a good fit for one of the two. I think there's probably some space to unify them or connect them in some way, to get the best of both worlds, or to figure out how we can get a hybrid architecture and so on.
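For reference, the two model families Karpathy contrasts are usually written as below. This is the standard textbook form, not notation from the talk; the diffusion half follows the DDPM formulation, with a noise schedule $\beta_t$ and a standard Gaussian prior $p(x_T)$.

```latex
% Autoregressive: factor the joint distribution position by position (chain rule)
p_\theta(x_1, \dots, x_n) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1})

% Diffusion (DDPM-style): a fixed forward noising process over steps t = 1..T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)

% ...and a learned reverse (denoising) process that defines the model density
p_\theta(x_0) = \int p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)\, dx_{1:T}
```

Both define a probability distribution over data; the first builds a sample one position at a time, while the second refines the whole sample from noise over many steps.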

So it's just odd to me that we have two separate points in the space of models, and they're both extremely good, and it feels wrong to me that there's nothing in between. So I think we'll see that space carved out, and I think there are interesting problems there. And then the other thing I would point to is that there's still a massive gap in the energy efficiency of running all this stuff. My brain runs on roughly 20 watts.

Jensen was just talking at GTC about the massive supercomputers that they're going to build, and the numbers are in megawatts, right? So maybe you don't need all that to run something like a brain. I don't know exactly how much you need, but I think it's safe to say we're probably off by a factor of 1,000 to a million somewhere in there in terms of the efficiency of running these models. Part of it is just that the computers we've designed are, of course, not a good fit for this workload. I think Nvidia GPUs are a good step in that direction, in terms of the extremely high parallelism you need.

We don't actually care about sequential computation that is data dependent. In some way, we just need to blast the same algorithm across many different array elements, or something like that; you can think about it that way. So I would say number one is just adapting the computer architecture to the new data workflows. Number two is pushing on a few things that we're currently seeing improvements on. The first is precision: we're seeing precision come down from what originally was 64-bit for double precision.

We're now down to, I don't know what it is, four, five, or six bits, or even 1.58, depending on which papers you read. So I think precision is one big lever for getting a handle on this. The second one, of course, is sparsity. That's another big delta: your brain is not always fully activated, so sparsity is another big lever. And then the last lever: I also feel like the von Neumann architecture of computers and how they're built, where you're shuttling data in and out and doing a ton of data movement between memory and the cores that do all the compute, is broken as well, and it's not how your brain works. And that's why the brain is so efficient.
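The precision lever is easy to quantify: storing weights takes (parameters × bits per parameter) / 8 bytes. The Python sketch below uses a hypothetical 7-billion-parameter model for illustration and counts weights only, ignoring activations, optimizer state, and packing overhead. The "1.58" figure corresponds to ternary weights (each one of −1, 0, or +1), which carry log₂3 ≈ 1.585 bits of information.

```python
import math


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory to store the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# A ternary weight takes one of three values {-1, 0, +1}: log2(3) bits of information.
TERNARY_BITS = math.log2(3)  # about 1.585


if __name__ == "__main__":
    n_params = 7e9  # hypothetical model size, for illustration only
    for bits in (64, 32, 16, 8, 4, TERNARY_BITS):
        print(f"{bits:6.2f} bits/param -> {weight_memory_gb(n_params, bits):6.2f} GB")
```

For this hypothetical model, weights shrink from 56 GB at 64-bit to about 1.4 GB at ternary precision, roughly a 40× reduction, and the amount of data that has to be shuttled between memory and compute shrinks with it.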

And so I think it should be a very exciting time in computer architecture. I'm not a computer architect, but it seems like we're off by a factor of 1,000 to a million, something like that, and there should be really exciting innovations there that bring that down. I think there are at least a few builders in the audience working on this problem. Okay, switching gears a little bit: you've worked alongside many of the greats of our generation, Sam, Greg, and the rest of the OpenAI team, and Elon Musk. Who here knows the joke about the rowing team?

The American team versus the Japanese team. Okay, great. So this will be a good one. Elon shared this at our last Base Camp, and I think it reflects a lot of his philosophy around how he builds cultures and teams. So you have two teams. The Japanese team has four rowers and one steerer, and the American team has four steerers and one rower. Can anyone guess what the American team does when it loses? Shout it out. Exactly. They fire the rower. Elon shared this example, I think, as a reflection of how he thinks about hiring the right people and building the right teams at the right ratio. From working so closely with incredible leaders like these, what have you learned? Yeah, so I would say definitely.

Elon runs his companies in an extremely unique style. I don't actually think people appreciate how unique it is. Even if you read about it a lot, you don't really understand it. I think it's even hard to describe. I don't even know where to start, but it's a very unique, different thing. I like to say that he runs the biggest startups. I don't even know how to describe it; it feels like a longer thing that I'd have to think through. But, well, number one is that he likes very small, strong, highly technical teams.

So that's number one. At companies, by default, teams grow and get large. Elon was always a force against growth. I would have to expend effort to hire people; I would basically have to plead to hire people. The other thing is that at big companies it's usually really hard to get rid of low performers, and I think Elon is very friendly, by default, to getting rid of low performers. So I actually had to fight for people to keep them on the team, because he would, by default, want to remove people.

And so that's one thing: keep a small, strong, highly technical team, with no non-technical middle management, for sure. So that's number one. Number two is the vibe of how everything runs and how it feels. When he walks into the office, he wants it to be a vibrant place. People are walking around, they're pacing around, they're working on exciting stuff. They're charting something. They're coding. He doesn't like stagnation.

He doesn't like for it to look that way. He doesn't like large meetings. He always encourages people to leave meetings if they're not being useful. So you actually do see this: it's a large meeting, and if you're not contributing and you're not learning, you just walk out. And this is fully encouraged. I think this is something that you don't normally see. So I think vibes are a second big lever that he really instills culturally. Maybe part of that also is that a lot of big companies pamper employees.

I think there's much less of that. The culture is that you're there to do your best technical work, and there's the intensity and so on. And maybe the last one that is very unique, very interesting, and very strange is just how connected he is to the team. Usually the CEO of a company is a remote person, five layers up, who talks to their VPs, who talk to their reports and directors, and eventually you talk to your manager. That's not how he runs companies, right? He will come to the office and talk to the engineers. Many of the meetings that we had were like, okay, 50 people in the room with Elon, and he talks directly to the engineers. He doesn't want to talk just to the VPs and the directors.

Normally people would spend maybe 99% of the time talking to the VPs. He spends maybe 50% of the time, and he just wants to talk to the engineers. If the team is small and strong, then the engineers and the code are the source of truth, not some manager. And he wants to talk to them to understand the actual state of things and what should be done to improve it. So I would say the degree to which he's connected with the team, and not something remote, is also unique, and so is his large hammer and his willingness to exercise it within the organization. So maybe he talks to the engineers and asks what's blocking them, and someone says, okay, I don't have enough GPUs to run my thing. And he's like, oh, okay. And if he hears that twice, he's going to be like, okay, this is a problem. So what is our timeline? And when he doesn't get satisfying answers, he's like, okay, I want to talk to the person in charge of the GPU cluster. Someone dials the phone and he's just like, okay, double the cluster right now. Let's have a meeting tomorrow. From now on, send daily updates until the cluster is twice the size. Then they push back and say, okay, well, we have this procurement setup, we have this timeline, and Nvidia says that we don't have enough GPUs and it will take six months or something.

And then you get a raised eyebrow, and he's like, okay, I want to talk to Jensen. And then he just removes bottlenecks. So I think the extent to which he's extremely involved, removes bottlenecks, and applies his hammer is also not appreciated. So I think there are a lot of these kinds of aspects that are very unique and very interesting. And honestly, going to a normal company outside of that, you definitely miss aspects of it. So, yeah, maybe that's a long rant, and I don't think I hit all the points, but it's a very unique thing and it's very interesting, and, yeah, I guess that's my rant. Hopefully tactics that most people here can employ.

Taking a step back: you've helped build some of the most generational companies. You've also been such a key enabler for many people, many of whom are in the audience today, getting into the field of AI. Knowing you, what you care most about is democratizing access to AI education tools and helping create more equality in the whole ecosystem at large, so that there are many more winners. As you think about the next chapter in your life, what gives you the most meaning? Yeah, I think you've described it in the right way. Where my brain goes by default is, you know, I've worked for a few companies, but ultimately I care not about any one specific company. I care a lot more about the ecosystem.

I want the ecosystem to be healthy. I want it to be thriving. I want it to be like a coral reef of a lot of cool, exciting startups in all the nooks and crannies of the economy. And I want the whole thing to be like this boiling soup of cool stuff. Genuinely, Andrej dreams about coral reefs. You know, I want it to be a cool place. And I think, yeah, that's why I love startups and I love companies, and I want there to be a vibrant ecosystem of them. And by default, I would be a bit more hesitant about, you know, five megacorps taking over, especially with AGI being such a magnifier of power. I would be kind of worried about what that could look like and so on. So I have to think that through more. But, yeah, I love the ecosystem and I want it to be healthy and vibrant.

Amazing. We'd love to have some questions from the audience. Yes, Brian. Hi. Brian Halligan. Would you recommend founders follow Elon's management methods, or is it kind of unique to him and you shouldn't try to copy him? Yeah, I think that's a good question. I think it's up to the DNA of the founder.

You have to have that same kind of DNA and that same kind of vibe. And I think when you're hiring the team, it's really important that you make it clear up front that this is the kind of company you have. When people sign up for it, they're very happy to go along with it, actually. But if you change it later, I think people are unhappy with that, and that's very messy. So as long as you do it from the start and you're consistent, I think you can run a company like that. But it has its own pros and cons as well, so it's up to people. But I think it's a consistent model of company building and running. Yes. Alex, hi.

I'm curious if there are any types of model composability that you're really excited about, maybe other than mixture of experts. I'm not sure what you think about model merges, Frankenmerges, or any other things to make model development more composable? Yeah, that's a good question. I see papers in this area, but I don't know that anything has really stuck. I don't know exactly what you mean by composability, but there's a ton of work on parameter-efficient training and things like that. I don't know if you would put that in the category of composability in the way I understand it. It's certainly the case that traditional code is very composable, and I would say neural nets are a lot more fully connected and less composable by default, but they do compose and can be fine-tuned as part of a whole. As an example, if you're building a system like ChatGPT that you want to take in images or something like that, it's very common that you pretrain components, plug them in, and then maybe fine-tune through the whole thing.

So there's composability in those aspects, where you can pretrain small pieces of the cortex outside and compose them later, through initialization and fine-tuning. So I think to some extent it's composable. Those are my scattered thoughts on it, but I don't know if I have anything very coherent otherwise. Yes, Nick. So we've got these next-word-prediction things. Do you think there's a path toward building a physicist or a von Neumann-type model that has a self-consistent mental model of physics and can generate new ideas for how you actually do fusion, or how you get faster-than-light travel, if it's even possible? Is there any path toward that? Or is it a fundamentally different vector in terms of these AI model developments? I think it's fundamentally different in one aspect. I guess what you're talking about is maybe just a capability question, because the current models are just not good enough. I think there are big rocks to be turned here. I think people still haven't really seen what's possible in this space at all. And roughly speaking, I think we've done step one of AlphaGo.

That is, we've done the imitation learning part. There's step two of AlphaGo, which is the RL, and people haven't done that yet. I think that is fundamentally the part that actually made it work and made something superhuman. So I think there are big rocks in capability still to be turned over here, and the details of that are potentially kind of tricky, but long story short, we just haven't done step two of AlphaGo; we've only done imitation. And I don't think people appreciate, number one, how terrible the data collection is for things like ChatGPT. Say you have a problem: some prompt is some kind of mathematical problem. A human comes in and gives the ideal solution to that problem. The problem is that human psychology is different from model psychology. What's easy or hard for the human is different from what's easy or hard for the model. So the human fills out some kind of trace that arrives at the solution, but some parts of that are trivial to the model, and some parts are a massive leap that the model doesn't understand.

And so you're kind of just losing it, and then everything else is polluted by that later. So fundamentally, the model needs to practice solving these problems itself. It needs to figure out what works for it and what does not. Maybe it's not very good at four-digit addition, so it's going to fall back and use a calculator, but it needs to learn that for itself, based on its own capability and its own knowledge. So that's number one: that's totally broken. I think it's a good initializer, though, for something agent-like. And then the other thing is that we're doing reinforcement learning from human feedback, but that's a super weak form of reinforcement learning. It doesn't even count as reinforcement learning. What is the equivalent of RLHF in AlphaGo? What is the reward model? It's what I call a vibe check. Imagine you wanted to train AlphaGo with RLHF: you would be giving people two boards and asking, which one do you prefer?
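The calculator example can be made concrete with a toy reinforcement-learning loop in which the model discovers, from its own success rate, whether to call a tool. This is a minimal hypothetical sketch, not anything described in the talk: the "model" is a two-action policy trained with REINFORCE, its arithmetic accuracy (`direct_accuracy`) and the tool-call penalty (`tool_cost`) are assumed values, and the reward is verifiable (the answer is right or wrong), unlike a preference "vibe check."

```python
import math
import random

def attempt(action, direct_accuracy, tool_cost, rng):
    """One practice problem (4-digit addition) with a verifiable reward:
    1 if the final answer is right, minus any cost for calling the tool."""
    a, b = rng.randint(1000, 9999), rng.randint(1000, 9999)
    if action == "calculator":
        answer, cost = a + b, tool_cost
    else:
        # Simulated model: its own arithmetic is right only `direct_accuracy` of the time.
        correct = rng.random() < direct_accuracy
        answer, cost = (a + b if correct else a + b + 10), 0.0
    return (1.0 if answer == a + b else 0.0) - cost

def practice(direct_accuracy, tool_cost=0.1, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a two-action policy; returns the learned P(use calculator)."""
    rng = random.Random(seed)
    logit = 0.0     # preference for "calculator" relative to "direct"
    baseline = 0.0  # running average reward, used to reduce gradient variance
    for _ in range(steps):
        p_calc = 1.0 / (1.0 + math.exp(-logit))
        action = "calculator" if rng.random() < p_calc else "direct"
        reward = attempt(action, direct_accuracy, tool_cost, rng)
        # Gradient of log pi(action) w.r.t. the logit of a sigmoid (Bernoulli) policy.
        grad_log_prob = (1.0 - p_calc) if action == "calculator" else -p_calc
        logit += lr * (reward - baseline) * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return 1.0 / (1.0 + math.exp(-logit))

print(practice(direct_accuracy=0.3))  # weak at arithmetic: learns to prefer the tool
print(practice(direct_accuracy=1.0))  # strong at arithmetic: learns to skip the tool
```

The same policy ends up with opposite behavior depending only on its own measured capability, which is the "learn that for itself" point in a very small form.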

And then you would take those labels, train a model, and then RL against that. What are the issues with that? Number one, that's just the vibes of the board. That's what you're training against. Number two, if the reward model is a neural net, then it's very easy for the model you're optimizing to overfit to that reward model, and it's going to find all these spurious ways of hacking that massive model. That is the problem. AlphaGo gets around these problems because it has a very clear objective function you can RL against. So RLHF is nowhere near, I would say, RL; it's kind of silly. And the other thing, imitation learning, is super silly. RLHF is a nice improvement, but it's still silly. I think people need to look for better ways of training these models so that the model is in the loop with itself and its own psychology, and I think that's where there will probably be unlocks.
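To make the "vibe check" concrete: an RLHF reward model is commonly fit to pairwise human preferences with a Bradley-Terry style loss, and the policy is then optimized against it. The toy below is a hypothetical illustration, not how any lab implements RLHF: two hand-made features, a linear reward model, and a "policy" that simply picks the highest-scoring candidate. Because quality and length are perfectly confounded in the preference data, the optimizer prefers the padded, low-quality answer, a small version of the reward hacking described above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(w, x):
    """Linear reward model r(x) = w . x over hand-made features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_reward_model(pairs, lr=0.1, epochs=200):
    """Bradley-Terry fit: maximize log sigmoid(r(chosen) - r(rejected)) by gradient ascent."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            # d/dw log sigmoid(w . diff) = (1 - sigmoid(w . diff)) * diff
            scale = 1.0 - sigmoid(reward(w, diff))
            w = [wi + lr * scale * di for wi, di in zip(w, diff)]
    return w

# Hypothetical features: (quality, length). In these labels the preferred
# answer is always both better and longer, so the two are confounded.
pairs = [((5, 6), (4, 4)), ((7, 9), (5, 5)), ((8, 10), (5, 4))]
w = fit_reward_model(pairs)

# The "policy" searches for whatever the reward model scores highest.
candidates = {"good_and_short": (9, 1), "bad_but_padded": (2, 10)}
best = max(candidates, key=lambda name: reward(w, candidates[name]))
print(w, best)  # the reward model ranks the padded answer highest
```

The reward model agrees with every human label it was trained on, yet optimizing against it still selects the wrong answer; a game like Go avoids this because win/loss is an objective signal.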

So it's sort of like graduate school for AI models: it needs to sit in a room with a book and quietly question itself for a decade.

Yeah, I think that would be part of it, yes. When you're learning stuff and going through textbooks, there are exercises in the textbook. What are those? Those are prompts for you to exercise the material, right? And when you're learning material, you're not just reading left to right. Number one, you're exercising, but maybe you're also taking notes, rephrasing, reframing. You're doing a lot of manipulation of this knowledge as part of learning it, and we haven't seen equivalents of that at all in LLMs.

So it's super early days, I think.

Yes, Uzihe. Yeah, it's cool to be optimal and practical at the same time. So I would be asking: how would you prioritize between (a) cost reduction and revenue generation, and (b) finding better-quality models with better reasoning capabilities? How would you balance those?

So, maybe I understand the question. What I see a lot of people do is start out with the most capable model, regardless of cost. So you use GPT-4, you super-prompt it, you do RAG, et cetera. You're just trying to get your thing to work. So you're going after accuracy first, and then you make concessions later.
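One concrete form of "making concessions later" is to check, per query type, whether a cheaper model matches the expected answers on an eval set, and only route that type to it when it does. The sketch below is hypothetical: `strong_model` and `cheap_model` are stand-in functions for real API calls, and the toy tasks and the 0.95 threshold are assumptions.

```python
from collections import defaultdict
from typing import Callable

Model = Callable[[str], str]

def strong_model(query: str) -> str:
    """Stand-in for the expensive model: handles every toy task correctly."""
    task, text = query.split(":", 1)
    return text.upper() if task == "upper" else text[::-1]

def cheap_model(query: str) -> str:
    """Stand-in for the cheaper model: fine at 'upper', fails at 'reverse'."""
    task, text = query.split(":", 1)
    return text.upper() if task == "upper" else text

def build_routing_table(eval_set, cheap: Model, threshold: float = 0.95) -> dict:
    """Route a query type to 'cheap' only if it meets `threshold` accuracy on that type."""
    hits: dict = defaultdict(int)
    totals: dict = defaultdict(int)
    for qtype, query, expected in eval_set:
        totals[qtype] += 1
        if cheap(query) == expected:
            hits[qtype] += 1
    return {t: "cheap" if hits[t] / totals[t] >= threshold else "strong" for t in totals}

def route(qtype: str, query: str, table: dict, models: dict) -> str:
    # Unseen query types default to the strong model: performance first.
    return models[table.get(qtype, "strong")](query)

eval_set = [
    ("upper", "upper:abc", "ABC"),
    ("upper", "upper:hi", "HI"),
    ("reverse", "reverse:abc", "cba"),
    ("reverse", "reverse:xy", "yx"),
]
table = build_routing_table(eval_set, cheap_model)
models = {"cheap": cheap_model, "strong": strong_model}
print(table)  # {'upper': 'cheap', 'reverse': 'strong'}
print(route("reverse", "reverse:hello", table, models))  # olleh
```

Defaulting unknown types to the strong model keeps the "accuracy first" ordering: cost savings are only taken where the eval shows they are safe.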

You check if you can fall back to GPT-3.5 for certain types of queries. You check whether you can make it cheaper. I would say go after performance first, and then make it cheaper later. That's the paradigm that a few people I've talked to about this say works for them. And maybe it's not even just a single prompt. Think about all the ways in which you can even just make it work at all: say you make ten or 20 prompts and pick the best one, or you have some debate, or whatever crazy flow you can come up with.
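Making "ten or 20 prompts and picking the best one" is a best-of-n selection. The sketch below assumes you have some way to score answers (a checker, unit tests, or a grader model); `toy_generate` and `toy_score` are hypothetical stand-ins, not a real API, and each attempt index plays the role of one prompt variant.

```python
from typing import Callable

def best_of_n(task: str, generate: Callable[[str, int], str],
              score: Callable[[str, str], float], n: int = 10) -> str:
    """Run n attempts (e.g. n prompt variants or samples) and keep the best-scoring answer."""
    candidates = [generate(task, i) for i in range(n)]
    return max(candidates, key=lambda answer: score(task, answer))

def toy_generate(task: str, attempt: int) -> str:
    """Pretend model: different prompt variants give slightly different answers."""
    offsets = [10, -1, 0, 1]  # only the third variant is exactly right
    return str(17 * 23 + offsets[attempt % len(offsets)])

def toy_score(task: str, answer: str) -> float:
    """Pretend evaluator: higher is better; here it measures closeness to the truth."""
    return -abs(int(answer) - 17 * 23)

print(best_of_n("What is 17 * 23?", toy_generate, toy_score, n=4))  # 391
```

With `n=2` the same call returns the closest of two wrong answers, which is the point of spending more compute up front: more attempts give the scorer more chances to find a good one.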

Just get your thing to work really well, because if you have a thing that works really well, one other thing you can do is distill it, right? You get a large distribution of possible problem types, you run your super-expensive thing on it to get your labels, and then you fine-tune a smaller, cheaper model on those labels. So I would always go after getting it to work as well as possible first, no matter what, and then make it cheaper. That's what I would suggest.

Hi, Sam. Hi. One question. So this past year we saw a lot of impressive results from the open-source ecosystem.

I'm curious what your opinion is on whether that will keep pace with closed-source development as the models continue to improve in scale.

Yeah, I think that's a very good question. I don't really know. Fundamentally, these models are so capital-intensive. One thing that is really interesting is that you have, for example, Facebook/Meta and so on, who can afford to train these models at scale. But it's also not the thing that they do; their money printer is unrelated to that. And so they have an actual incentive to potentially release some of these models, so that they empower the ecosystem as a whole and can actually borrow all the best ideas.

So that, to me, makes sense. But so far, I would say they've only released open-weights models, and I think they should actually go further. That's what I would hope to see, and I think it would be better for everyone. Potentially they're squeamish about some aspects of it, eventually, with respect to data and so on. I don't know how to overcome that. Maybe they should try to find data sources that they think are very easy to use, or something like that, and constrain themselves to those. So I would say those are kind of our champions, potentially.

I would also like to see more transparency coming from them, and I think Meta and Facebook are doing pretty well. They release papers, they published a logbook, and so on. So I think they're doing well, but they could do much better in terms of fostering the ecosystem. Maybe that's coming. We'll see.

Peter? Yeah, maybe this is an obvious answer given the previous question, but what do you think would make the AI ecosystem cooler and more vibrant? Or what's holding it back? Is it openness, or is there other stuff that is also a big thing you'd want to work on?

Yeah, I certainly think one big aspect is just the stuff that's available. I had a tweet recently about: number one, build the thing; number two, build the ramp. I would say there are a lot of people building the thing, but a lot less is happening in building ramps so that people can actually understand all this stuff. I think we're all new to all of this. We're all trying to understand how it works. We all need to ramp up and collaborate to some extent to even figure out how to use this effectively.

So I would love for people to be a lot more open about what they've learned, how they've trained all this, what works and what doesn't work for them, et cetera, and just for us to learn a lot more from each other. That's number one. Number two, I also think there is quite a bit of momentum in the open ecosystems already, so that's good to see, and maybe there are some opportunities for improvement that I talked about already.

Yeah, last question from the audience, Michael.

To get to the next big performance leap from models, do you think it's sufficient to modify the transformer architecture with, say, thought tokens or activation beacons, or do we need to throw that out entirely and come up with a new fundamental building block to take us to the next big step forward, or AGI?

Yeah, I think that's a good question. Well, the first thing I would say is that the transformer is amazing. It's just so incredible. I don't think I would have seen that coming, for sure. For a while before the transformer arrived, I thought there would be insane diversification of neural networks, and that was not the case. It's the complete opposite, actually.

It's all the same model, actually. So it's incredible to me that we have that. I don't know that it's the final neural network. Given the history of the field, and I've been in it for a while, it's really hard to say that this is the end of it. Absolutely it's not. And I feel very optimistic that someone will be able to find a pretty big change to how we do things today. On the front of autoregressive versus diffusion, which is the modeling and loss setup, I would say there's definitely some fruit there, probably.

But also on the transformer itself, and, like I mentioned, these levers of precision and sparsity: as we drive those, together with the co-design of the hardware and how that might evolve, we can make network architectures that are a lot better tuned to those constraints. Also, I would say the transformer is kind of designed for the GPU, by the way. That was the biggest leap, I would say, in the transformer paper, and that's where they were coming from: we want an architecture that is fundamentally extremely parallelizable, because the recurrent neural network has sequential dependencies, which is terrible for the GPU. The transformer basically broke that through attention, and that was the major insight there. It has some predecessor insights, like the Neural GPU and other papers at Google that were thinking about this, but it is a way of targeting the algorithm to the hardware you have available. So I would say it's in that same spirit. Long story short, I think it's very likely we'll still see changes to it, but it has proven remarkably resilient, I have to say.
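The parallelism point can be sketched in a few lines. This is a hypothetical toy (scalar queries, keys and values, identity projections, no 1/sqrt(d) scaling), not the Transformer paper's implementation: the RNN loop carries a hidden state from step to step, while each causal attention output depends only on the inputs, so positions can be computed in any order or, on a GPU, all at once.

```python
import math

def rnn(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: step t needs h[t-1], so the time loop is inherently sequential."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        outputs.append(h)
    return outputs

def attention_at(i, q, k, v):
    """Causal single-head attention for position i with scalar queries/keys/values.
    It reads only the inputs q, k, v, never another position's output."""
    scores = [q[i] * k[j] for j in range(i + 1)]
    m = max(scores)                              # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v[j] for j, w in enumerate(weights)) / total

def attention(q, k, v, order=None):
    """Every position is independent, so they can be computed in any order."""
    n = len(q)
    out = [0.0] * n
    for i in (order if order is not None else range(n)):
        out[i] = attention_at(i, q, k, v)
    return out

xs = [0.1, -0.4, 0.7, 0.2]
forward = attention(xs, xs, xs)
backward = attention(xs, xs, xs, order=[3, 2, 1, 0])
print(forward == backward)  # True: no position waits on another
print(rnn(xs))
```

Computing the attention outputs back to front gives identical results, whereas there is no way to compute the RNN's last hidden state without first computing all the earlier ones.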

It came out many years ago now, like, I don't know, six, seven years. So the original transformer and what we're using today are not super different.

As a parting message to all the founders and builders in the audience, what advice would you give them as they dedicate the rest of their lives to helping shape the future of AI?

So, yeah, I don't usually have crazy generic advice. I think maybe the thing that's top of my mind is that founders, of course, care a lot about their startup. But I also want to ask: how do we have a vibrant ecosystem of startups? How do startups continue to win, especially with respect to big tech? How does the ecosystem become healthier, and what can you do?

Sounds like you should become an investor.

Amazing. Thank you so much for joining us, Andrej, for this and also for the whole day today.
