ENSPIRING.ai: Founder Eric Steinberger on Magic's Counterintuitive Approach to Pursuing AGI
The video features an insightful conversation with Eric Steinberger, founder and CEO of Magic, as he discusses his journey and ambitions in artificial intelligence (AI). Eric shares his early passion for AI, influenced by a desire to find something meaningful and predictable to dedicate his life to. He discusses his fast-paced learning experience, including his collaboration with renowned researcher Noam Brown while still in high school. This unique journey highlights the importance of passion and determination in the field of AI research.
Eric opens up about his ambitious projects and the challenges he faced. These include his experience starting a nonprofit focused on climate change and subsequently redirecting his attention back to AI as AGI (Artificial General Intelligence) appeared to be closer than he originally anticipated. He elaborates on the need for AGI and the potential it holds to solve problems that require intensive computation. Eric also explains how he approaches AI research by synthesizing existing ideas and pushing for leaps in innovation.
Key Vocabularies and Common Phrases:
1. inference [ˈɪnfərəns] - (noun) - The process of deriving logical conclusions from premises known or assumed to be true. - Synonyms: (deduction, conclusion, reasoning)
The thing that remains to be solved is general-domain, long-horizon reliability, and I think you need inference-time compute.
2. theorem [ˈθɪərəm] - (noun) - A proposition in mathematics that is proven to be true based on previously established statements. - Synonyms: (proposition, statement, assertion)
When you try to prove a new theorem in math, or when you're writing a large software program...
3. wunderkind [ˈvʊndərˌkɪnt] - (noun) - A child prodigy; someone who achieves great success or shows great talent at a young age. - Synonyms: (prodigy, genius, whiz kid)
You're a Vienna born wunderkind whose early passion for math turned into a...
4. obsession [əbˈsɛʃən] - (noun) - An idea or thought that continually preoccupies or intrudes on a person's mind. - Synonyms: (fixation, mania, passion)
It took him, like, three months to write this paper, and it took me a couple years. I'm sure he would have come up with this the next day, but it was just, yeah, obsession is the right word.
5. meticulous [məˈtɪkjələs] - (adjective) - Showing great attention to detail; very careful and precise. - Synonyms: (diligent, thorough, careful)
It's just a very simple proposal, but I think that's what makes it really meticulous.
6. recursively [rɪˈkɜːrsɪvli] - (adverb) - Involving the repeated application of a process or procedure. - Synonyms: (iteratively, repeatedly, dynamically)
The idea is that you could use this to recursively improve alignment, as well as the models themselves, in a way that isn’t bottlenecked by human resources.
7. vertically integrated [ˈvɜːrtɪkli ˈɪntəˌgreɪtɪd] - (adjective) - Combining into one organization all phases of design, production, and distribution tasks. - Synonyms: (unified, consolidated, comprehensive)
Out of all the companies that are trying to build an AI software engineer, you are probably the only one that is really taking a vertically integrated approach.
8. compounding [kəmˈpaʊndɪŋ] - (noun) - The process of growing or increasing exponentially over time. - Synonyms: (accumulating, increasing, augmenting)
He is fantastic at picking the right problems and then spending a long time just grinding to make it better and better and better and better. So he's very good at the whole, like, compounding thing in research.
9. intrinsic [ɪnˈtrɪnzɪk] - (adjective) - Belonging naturally; essential. - Synonyms: (inherent, innate, fundamental)
I am not one of the intrinsic curiosity type of people.
10. proactive [ˌproʊˈæktɪv] - (adjective) - Creating or controlling a situation by causing something to happen rather than responding to it after it has happened. - Synonyms: (dynamic, enterprising, anticipatory)
So I suppose be proactive in seeking, like, the best people in the world to, in a time efficient manner, just distill their brain into yours.
Founder Eric Steinberger on Magic's Counterintuitive Approach to Pursuing AGI
The thing that remains to be solved is general-domain, long-horizon reliability, and I think you need inference-time compute, test-time compute, for that. When you try to prove a new theorem in math, or when you're writing a large software program, or when you're writing an essay of reasonable complexity, you usually wouldn't write it token by token. You'd want to think quite hard about some of those tokens, and finding ways to spend not 1x or 2x or 10x, but a million x the resources on that token in a productive way, I think, is really important. That is probably the last big problem.
Eric Steinberger, founder and CEO of Magic, has an epic backstory as a researcher, having caught the attention of Noam Brown and become one of his research collaborators while still a student in high school. Eric is known for his exquisite research taste, as well as his big ambition to build an AI software engineer. We're excited to ask Eric about what it takes to build a full-stack company in AI, his ambitions for Magic, and what separates a good AI researcher from a legendary one.
Eric, welcome to the show. Thank you so much for joining us. Thank you for having me, Sonya. Okay, so let's start with: who's Eric? You're a Vienna-born wunderkind whose early passion for math turned into, I think, what you described as a full-fledged obsession with AI by age 14. Take us back to age 14, Eric. What were you up to? How did you become so obsessed with AI? Thank you, Sonya. I think I just had my midlife crisis when I was 14, and I was just looking for something meaningful to do. I spent about a year looking at physics, math, bio, medicine, just anything, really, that seemed valuable to the world, and at some point bumped into simply the idea of AI. It hadn't occurred to me until then. And if you could just build a system, a computer system, that could do all this other stuff for me, great, I don't have to decide. So it felt like my decision paralysis was resolved, and it was this weird moment where I could just see the next 30 years of my life unfold in front of me, and I was like, okay, this is clearly what's going to happen. I have to do this. It was quite nice. I like predictability, so it was great to know what the world will look like.
And you started out loving math. Why AI, then? I think I'm naturally attracted to math. It's just what my brain gravitates to. AI just seems useful. The thing that's most important to me is just what is useful for humanity and the world. And math is nice, but not useful. At some point, 17-dimensional spheres are probably not going to be the best career choice if you want to be useful. AI seemed like something that I could get good at, but also just the most important thing ever. And so it was a very clear choice. AI was clear even ten years ago. It just wasn't close then, and now it's close and clear.
Can you tell the story of how you got to FAIR? I think it is such an epic story. Sure. I mean, when I started at 14, I didn't really know how to program. I didn't get into programming out of curiosity about computers. I just wanted to solve AI, basically. So after a couple years of just warming up on my own, I reached out to one of David Silver's PhD students. David Silver led the AlphaGo work at DeepMind, and this PhD student, I guess at that point he was a graduate and worked at DeepMind. I asked him if he could spend a year with me, just every two weeks, bashing my work, trying to get a sort of super-sped-up, mini PhD-like experience where I could just learn how to do research. And I sent him this giant email. You could print it out. I don't know how many pages it would be, but it'd be a lot of pages. I was basically just saying, I want to build, sorry, I want to beat, this algorithm you made in your PhD. Here's a list of ten ideas. I don't know if they're going to work, and I think I need your help to figure that out.
And then over a year, we eventually got there. His name is Johannes. Johannes was kind enough to just bash me every two weeks, roughly. And, yeah, it was brutal, dude, because he held me to the standard, you know, and I was like, don't be nice just because I'm in school. You were in high school? Yeah, I was in high school. And I had just graduated high school when I finished the project that I was trying to get to with this. And then Noam Brown, who is obviously one of the best RL researchers in the world, reached out, because he had worked on something similar, it turns out, and we had some ideas that were very similar and some ideas that were a little different. So we just published this, and he reached out, and then I got to work with Noam Brown for two years, which was great, and that continued, and so I got bashed for another year.
You were a high schooler? He was Noam Brown. Well, I mean, he published a paper called Deep Counterfactual Regret Minimization, and I published a paper called Single Deep Counterfactual Regret Minimization, and mine beat his by a little bit. And so you beat Noam Brown as a high schooler. I think I had just graduated. And also, it took him, like, three months to write this paper, and it took me a couple years. But, yeah, I mean, slightly. I'm sure he would have come up with this the next day, so there's a gap between the two things. But it was just, yeah, obsession is the right word. I just do things, like, 100%. Yeah. So that was a lot of fun. I kept working in RL with Noam Brown for a while, and, yeah, that's how I got to FAIR. Noam Brown worked at FAIR at the time, and he reached out. I was at university then and basically just worked part-time as a researcher at FAIR while studying, and anyway, that was it. That's awesome. It was a lot of fun. Noam is great. The brainstorm ping-pong sessions with Noam Brown, dude, there's nothing like this, where you would just think, there's this problem, and maybe you would start a six-month research project. No. Noam and I would get on a call and we'd just discuss it, and it's done. That was so great. I love that. I love that.
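For readers curious about the papers mentioned here: the engine inside counterfactual regret minimization (CFR) is regret matching, which turns accumulated regrets into a strategy. Here is a minimal sketch of that single update in Python, an illustration of the core idea only, not the training loop from either Deep CFR paper:

```python
import numpy as np

def regret_matching(cumulative_regrets):
    """Turn cumulative counterfactual regrets into a strategy.

    Actions with positive regret get probability proportional to that
    regret; if nothing has positive regret, play uniformly at random.
    """
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.ones_like(positive) / len(positive)

# Toy usage: three actions whose regrets were accumulated elsewhere.
print(regret_matching(np.array([2.0, -1.0, 6.0])))  # -> [0.25, 0.0, 0.75]
```

Roughly speaking, the Deep CFR line of work replaces the tabular regret storage with neural networks so the approach scales to imperfect-information games too large to enumerate.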
What makes him so great as a researcher? I think it's a number of things. As a researcher, generally, more from a meta level, he is fantastic at picking the right problems and then spending a long time just grinding to make it better and better and better and better. So he's very good at the whole compounding thing in research, also making bets that aren't obviously the right bets when he makes them, because he makes them earlier, I suppose, and making mistakes differently. So he's generally very good at picking problems and then attacking them consistently. He's also just very smart. I guess that helps. He works really hard. He used to do 100-hour weeks during his PhD. I don't know if he still does them, but he used to work really, really hard during his PhD. I imagine he still does.
Okay, so Noam arranged for you to become a researcher at FAIR while you were still a university student. That was in my first semester, I think, or something. So you were juggling that. You were juggling being a collaborator to Noam at FAIR. And then you became obsessed with yet another problem, climate change, and you actually started an NGO that is incredibly popular, ClimateScience. So you just didn't have enough on your plate. No, it was actually too crazy. That's when I dropped out. I was like, this is crazy, this is too much. I can do two things, I cannot do three. That was my conclusion after doing all three things for three months. That was terrible. That was fucking awful, because you just can't do well at three things. I mean, Elon can, but maybe I'll learn it in ten years. I couldn't at the time, so I dropped out.
But yeah, I started the NGO. I generally just think charity stuff is awesome and hugely underappreciated. It's super high status to start a startup, but I think it should be equally cool to start a charity. You're helping the world in other ways. And so, yeah, it's a nonprofit, but we started it like a startup. People were working insanely hard. We had clear objectives. It was a software product, effectively. It was much more similar to a startup, with the exception that there was no money in, no money out, which is very weird. But yeah, it's mostly volunteer-driven, or I guess it still is. I just no longer run it. But yeah, that was an interesting experience. You'd think there would be a lot of learning transfer from running a quote-unquote company then to running a quote-unquote company now.
But they are so different that there was, like, no transfer at all between ClimateScience and Magic. A thousand volunteers versus 20 hardcore engineers; no money at all versus, well, I can't say how much we raised because it's not announced yet, but, you know, giant. It's completely different in every imaginable way. But, yeah, it was a lot of fun.
So, Eric, ClimateScience became an incredibly successful nonprofit. It wasn't just any nonprofit. What made you decide to hand over the reins, hand over the torch, on that and go start a company in AI? I just thought AGI was further away when we started it. I would never have started anything else if I had thought AGI was so close. And once I realized it is, there was just no other option. My initial thing was always AI. That's what I did as a kid. I care about various issues in the world, but none of them are my unique calling in any way. You know, I'll hopefully be in a position to donate a bunch of money and whatever, but the thing I care about fundamentally is AGI. And it was like, oh, damn it, this is not 20 years away. So I have been running around with this AGI to-do list, which is somewhat of a meme internally, because we're sort of just going through it, trying to fix all these problems. You have seen it. Yes. We showed it to you. And I've been running around with a version of this since, actually, 2017 or so.
I was still in high school. I don't know why, but some conference invited me to present my AGI to-do list. It was wrong at the time. I was also sure it was wrong. But at some point, there was one thing I just couldn't at all figure out. And I don't like blue-sky research in the sense of just staring at a wall and trying to figure out what the right question is. I really like to have the question and then look for the right answer when starting an intense project, because you need to know which direction you're running in to really plan for it. And many things seemed clear, but it seemed completely unclear how to make these models reason in the general domain. And that became more clear with language models, especially code models. And so, yeah, when I saw some of these early results, I was like, okay, I know all this stuff from the RL world. I have a bunch of other thoughts. This seems great. We should just take LLMs and make them do the RL stuff. It's a very simple proposal, but I think that's why it makes a lot of sense. RL has been doing this for ten years. It works in restricted domains. If you can make something work in 20 restricted domains, and you have something else that works in the general domain, and you can combine them, maybe you get both the x and the y axis. You have your beautiful top-right corner of the matrix. So it seemed pursuable. And when something as important as AGI becomes an actually executable to-do list, obviously there are still things to figure out, details of the algorithms, how you make it efficient, et cetera, et cetera. It's not like we knew everything at all; there were many, many things to figure out. But the direction was clear, so it seemed like the right moment.
Okay, we're going to circle back to your AGI to-do list later, because I'm curious about it. Sure. Yeah. It might be wrong, still. I don't know until we have AGI. It is a hypothetical AGI to-do list, but we're trying. I think the research field is tracking pretty closely to your to-do list. I want to brag about you for a minute. I think you've been incredibly humble about your background, but, you know, as a high school student, you did catch Noam Brown's eye. And as one of his colleagues at FAIR, you became one of his top collaborators. Not even just one of many, because there are such talented people that work there, but one of his top collaborators. And when I speak to folks that know you, they just say extraordinary things about your capabilities as a researcher, your creativity, your work ethic. As far as I can tell, you work nonstop. I think you texted me at 2:00 a.m. in preparation for this podcast. Hope I didn't wake you up. No, no. Thank you, silent mode.
Anyways, I think it's safe to say that you are one of the brightest minds of the current research generation already, and you will certainly be one of the legends that people talk about for the next decade. And so, with that in mind, I'd love to ask you some questions of advice for aspiring researchers. Maybe first off: you did it all from a very untraditional background. How did you do it? And what advice would you give to others in your shoes? I can only really speak for the sort of profile of goals and person I am. I think I was lucky in the sense that I knew very, very early, at 14, as we said, exactly what I wanted to do with my life. I had no doubt at all. And uncertainty can be paralyzing to a lot of people. I also had a very clear sense that I did not at all have a plan B. There was no other path in life that I would have been even remotely above the neutral line on. It had to be: build AGI. Everything else was completely irrelevant. I understand that for many people, a well-paying job at Google is a great achievement. I mean, if it's on AGI, it's fine, but you get what I mean. I just knew that there was nothing else I could do and be fulfilled in life and look back happy when I'm 90. So, in a way, burning the boats very, very early gives you the opportunity to just do things that you'd otherwise do ten years later. And again, I sucked at the beginning. It took me two months to understand the first paper I tried to understand. I was terrible at programming for a long time. But when you're a teenager and you're, like, a decent researcher, you don't have to be great. That gets you things, like a great mentor who then bashes you for a year, which was very, very helpful.
And then you get better, and you're still young, so your brain shapes more easily, maybe. I don't know. So I feel like I benefited a lot from being early, but within that, I'd say just go for the end goal immediately. Doing anything like, oh, I'm going to do a PhD because I need a PhD to get a job: that's all bullshit. You don't. It's just completely bullshit. The other thing is, writing five-page emails to people actually works. I get a lot of these two-paragraph, meh things now, and I'm grateful I get emails, but I understand now why people think this stuff doesn't work. It certainly does when you're like, here is how I'm going to beat your algorithm, please help me, five pages. At least in my experience, every single person I wanted help from in this way was very helpful. So I suppose: be proactive in seeking out the best people in the world to, in a time-efficient manner, just distill their brain into yours, and show them that you can make use of that. If you tell someone who's very good, effectively, hey, I'm going to make good use of this; if you want to coach someone, I would love to be that person, they'll usually do it. They won't do it for ten people, but if they do it for one or two, that's enough. You just have to win that seat, I guess.
That's been really helpful in my experience. Also, just don't shy away from learning new things. Again, I didn't get into programming because I'm curious about computers. I'm not very curious about computers. I just like AI, and computers are the thing that's necessary. And it's fun. I enjoy programming now. It's great. But I wouldn't have gotten into it, I think, if it wasn't for AI. But still, you get into it, so don't be shy. We interview a lot of people who don't know how you'd implement an LLM, and it's kind of crazy to me. If you're a researcher and you couldn't implement sharding or whatever, it's just insane. So really understand the whole stack, going down. But not bottom-up, really top-down: here's the thing I care about, this is the problem I want to solve, okay, what do I need, what do I need, what do I need, and then all the way down. There are much more competent people at kernel programming and hardware design or whatever than I could ever dream to be, but I understand enough of it to do better work at the top of the stack than I could if I didn't. So I think, fundamentally, you need to understand the domain you work in. It's also really good to just read everything. I used to read, I don't have a precise number, but just every paper I could, every paper I would see, basically. And eventually you get so fast at it that it's feasible, and you build a database in your head of, this is similar to this thing. That was sort of my eye-opening moment.
Bill Gates has this interview where he says, oh, yeah, if you learn enough things, they're all similar to each other. So it's not linear; it gets easier. And at that point I was like, I should read every paper. So thanks for the advice, Bill. Obviously this was through a video; I never met him. But I just started reading every paper. And that's really, really helpful, because a lot of the best ideas that we had that work really well now at Magic were enabled by random things, where the idea would never work without some random thing that I would otherwise have had to come up with in tandem. But because I have this database in my head, I can go, oh, yeah, like this. So often one good idea is enabled by three other ideas that others have come up with. It's always this composition of stuff. So having a large database is really helpful. Yeah. And then just never stop. Never, never stop. It takes ages to do good work. There was actually one moment with Johannes, the DeepMind research scientist who mentored me for a year in high school, where we had a version of the algorithm that wasn't very good. It was all right. And we were thinking, should we publish this? We were both not really happy about it, and he was close to giving up on me. It was like, well, maybe this is just not going to work; I wouldn't want to publish this. I was like, dude, fuck you. I'm just going to get this done. And then we got it done a month or two later. I remember going on a walk after this and just being like, can I do this? I don't know if I can do this. But there is no other option, so I'd just better get it done. And then I went back home and I started programming again. I was still, like, sad that day, but the next day was fine again. You just keep going. I think that was a pretty formative experience, because I actually wasn't sure if I could do it, and then we just did it, like, super soon after. So I really haven't felt that insane level of doubt and pressure since then. I think it's actually beneficial. You have to be realistic, but you don't want to stop yourself. So anyway, I think those would be the main things. Also, be really fucking honest with yourself about what you suck at, because otherwise you're never going to get good at it. You need to search for the bad things. Actually, yeah, I think as a researcher, betting on your strengths is good only to the extent that you don't have necessary conditions that are completely missing. You can't bet on your strengths if they're not enabled. Again, back to the engineering thing, for example. So, yeah, I don't know, I'm rambling, but stuff like that.
That's great. That is such a fascinating glimpse into the inner mind of what it takes to be a great researcher, behind all the glamour of training large models. And so thank you for providing that peek. And I'm really glad that you mentioned reading every paper voraciously and having this database in your head, because one thing I've heard from your collaborators is that your superpower is understanding and absorbing new research. And so I'm curious, do you agree? Do you think that is your superpower as a researcher, or what traits do you think have made you such an exceptional researcher? So I think initially, in the RL work I did, it was synthesis, where I would read every paper and I would go, this thing plus this thing plus that thing with this modification. I think that's what they would mean, and that, yes, was definitely very helpful. I think it's a good way to do research. Generally, there's enough work out there for synthesis to be a successful strategy, I guess to an extent still. But after this, and it's actually really interesting that you bring this up, I realized this and tried very hard to get better at leaps: coming up with totally alien crap that there's just no reference for at all. Because ultimately, if you take the transformer, for example: attention existed, the idea of stacking a bunch of LSTM blocks existed, and you just had to remove the idea of recurrence, really, plus a couple other things that were necessary. Residual streams, the residual update in transformers, existed from ResNet. So it's synthesis. But there is an amount of leap in there to make it all work. It's a little more complex than just taking components and putting them together. You need to come up with new things too.
Like, you know, the normalization, and dividing by the square root of the head dimension, which is actually incorrect, but anyway, everyone now knows this; roughly, you should do some normalization there. So there are some new ideas in there that really help make it work, but it's still a large amount of synthesis. So I suppose most good ideas are synthesis, but the best ideas have some leaps in them, and I'm trying to get better at those. Still, it's mostly, I guess, take five things, throw away the stuff in them that doesn't work, make the rest work together, and configure. But yeah, some stuff needs leaps. And, I mean, our recipe, take LLMs, make them super efficient, make the context giant, throw RL on it, make it all work together: it's still mostly synthesis. I guess you're right.
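As a concrete picture of the synthesis Eric describes, here is scaled dot-product self-attention with a residual connection in a few lines of NumPy; the division by sqrt(d) is the normalization detail mentioned above. This is a deliberately stripped-down sketch: one head, no learned layer norm, no masking, random toy weights:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_block(x, Wq, Wk, Wv):
    """One attention step with a residual connection.

    x: (seq_len, d) token representations.
    The 1/sqrt(d) factor keeps dot products from growing with dimension
    and saturating the softmax.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    scores = softmax(q @ k.T / np.sqrt(d))  # attention weights, no recurrence
    return x + scores @ v                   # residual update, the idea from ResNet

# Toy usage with random weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
print(self_attention_block(x, Wq, Wk, Wv).shape)  # (5, 16)
```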
Who do you admire most in the research world, and what do you think those folks' superpowers are? Shazeer. Noam Shazeer? Me too. Yes. What is his superpower? I guess, to an extent, synthesis. I mean, he's just the best at synthesis. He is also great at everything in the stack. He has no weakness, really. He can implement the whole thing on his own if he had to. He sees the future. I think, in a way, he's very unconstrained. And I think everyone's sort of crediting a number of the labs for scaling laws, but this guy made a presentation where he was zipping through essays, or completions or whatever, written by models of various scales. He was like, this is a 100-million-parameter model. This is a 300-million-parameter model. This is a billion-parameter model. This is a 5-billion-parameter model. This is on YouTube somewhere. It's hilarious. And he's like, what if we make this bigger? He's presenting it in this hilarious way, and then everyone else is super scientific about it. I think Noam is generally just, if I had to put it, very, very intuitive. A lot of labs and researchers are, and I think this is not a bad thing, it's very good, very evals-driven, very mechanical, right? Sort of very empirical, in a way. Noam sort of just knows. He's like, ah, this would work, and then it works. So I think that's a superpower: just extremely great synthesis.
He also has a larger database, because he's been around for so long. He literally knows everything. I mean, he invented half of the stuff that everyone's doing now. There's no one who compares. There are a number of other people, I guess. Out of all the people who are sort of the OGs of deep learning, I think you could argue Hinton deserves by far the most credit, just because he went through all the bashing back when it was like, oh, this will never work, when they were training tiny, tiny, tiny things and people said, this will never work. And he somehow stuck with it. That level of grit and belief in something that is now obviously working deserves a huge amount of credit, whether capsule networks work or not, whatever. It's incredible to arrive at something like the conclusions the world is at now, and if you look at some of the older papers, a lot of the ideas that are important now were in there already. So that's important, and I think he just deserves a ton of credit. And Noam Brown? The army of Noams. I should name my kid Noam, is what I'm saying. It's a very good strategy. Yeah, it's a great strategy, actually.
I think 100% of Noams that are somewhat popular and well known in the research community are great. Yeah, no, he's also amazing. A number of labs were working on what he was working on during his PhD, and he basically soloed the thing and was way better and way faster than labs that put ten people on it, including some really famous names. And if you just look at the paper track record, it's like: here's the rest of the field, and then Noam 100x's the efficiency, and then here's the rest of the field, and then Noam does it again. The consistency with which he has just bashed out these hundred-x multipliers in RL data efficiency and compute efficiency is crazy. Yeah. So, yeah, the Noam army is pretty good.
I want to go back to this concept that leaps are still needed in research, and that you still have this AGI to-do list. What do you think are the most interesting unsolved problems in AI right now? Well, a lot of it is solved now, I think. And the thing that remains to be solved is general-domain, long-horizon reliability. And I think you need inference-time compute, test-time compute, for that. When you try to prove a new theorem in math, or when you're writing a large software program, or when you're writing an essay of reasonable complexity, you usually wouldn't write it token by token. You'd want to think quite hard about some of those tokens, and finding ways to spend not 1x or 2x or 10x, but a million x the resources on that token in a productive way, I think, is really important. That is probably the last big problem. Fascinating. The last one.
Okay, I hope so. I think it's reasonable to think that is the last big unsolved problem. I mean, look, over the last few years, all of this other stuff got solved. Can we do multimodal things? Can we do long context? It's all done. Reasonably smart models, you know, are quite efficient now in terms of cost. I mean, you'd have to be a reality denier to not see what's coming. This is just now becoming a realization to a lot of people in the online space, but RL has been doing this for ages, so it's just so clear that you need to do that. Or, I mean, maybe you don't need to. Maybe you can get away without doing it, which would be insane. But even if you don't need to, it will still help you a lot. It's just: do I want to spend a billion dollars on my pre-training run and then a little bit more money on inference, or do I need to spend $10 billion on my pre-training run? Like, 10 billion would be great, but I'm going to prefer spending one.
And is bringing the LLM and RL worlds together a research problem? Are there still fundamental, unsolved science problems, or is it more like, we have the recipe, we just need to execute it and have the compute and the data? I think there is no public successful recipe right now. There are good ideas. Okay, even if you just take best-of-n and make n large enough, it's sort of, you know, not terrible. So there are ideas. I don't know that the final idea exists. I think there's just a lot of room up from what is currently known. But there are ideas. See, I think it's very unlikely that, even if you stopped progress in research, we would not at some point hit something that everyone would agree is AGI. It's just that I think we can do better. Maybe it couldn't solve the Riemann hypothesis, maybe it couldn't do all these super hard things, but it'd be pretty good. And now I'm just curious: if we did all the things, how good would it get? I think there is research left to be done, and there are a lot of ideas floating around in the world now. Everyone's sort of working on this, but I don't know that the current set of ideas is even final. It'll keep moving.
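For context, best-of-n is the simplest version of the inference-time compute Eric alludes to: sample n candidate answers and keep the one a verifier scores highest, trading extra compute per query for reliability. A minimal sketch, where `generate` and `score` are hypothetical stand-ins for a model call and a verifier (unit tests, a reward model, and so on):

```python
import random

def generate(prompt: str) -> str:
    # Hypothetical stand-in for sampling one candidate from a model.
    return f"candidate answer {random.random():.3f} for: {prompt}"

def score(prompt: str, candidate: str) -> float:
    # Hypothetical stand-in for a verifier (tests, a reward model, etc.).
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """Spend n times the compute on one query and keep the best candidate."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("prove the lemma", n=16))
```

The appeal is that n is a dial: the same model gets more reliable on hard queries simply by spending more samples, which is the "1x versus a million x per token" trade-off described above in its crudest form.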
I think let's transition to talking about Magic. Maybe just: what is Magic? You've been very mysterious to date, so maybe share a little bit about what you're building. Yeah, I mean, we're trying to automate software engineering. It took us a while to figure out how to train super-giant models. That's a pretty interesting engineering challenge. Fundamentally, we're trying to automate software engineering from the product side, and a subset of that is a model that can build AGI, because if it's a great software engineer, then it should be able to do everyone's job at Magic, and then it can do everyone else's job. That would be a subset. The idea is that you could use this to recursively improve alignment, as well as the models themselves, in a way that isn't bottlenecked by human resources. And there aren't that many Noam Shazeers in the world. If I had a Noam Shazeer in my computer, I could spin up a million of them, and maybe alignment would just be solved. I'm simplifying a ton and being very idealistic in that statement. I'm happy to turn this whole thing into a scalable-oversight podcast if you'd like, but the core idea is: if I could just clone what we are doing into a computer and then press yes on the money button to run a cluster to do the work we would be doing next week, that would be phenomenal.
I think we're pursuing these two things in tandem, where we want to ship something that's a good AI software engineer for people to use. It's, I think, going to be one of the first domains to see higher levels of automation. I don't like talking around it. I don't think the whole assistant pitch is going to last very long. Once these models are good enough to automate, there's just no way the economy is not going to do that. I think everyone knows this, and they just don't like talking about it. It's totally fine. We used to all be farmers. We're not farmers anymore; we're fine. Everyone prefers this. I think we'll figure our way out in the economy. If it produces the same or more stuff with fewer inputs, we should be able to figure that out. That's not a hard problem from economic principles; you just have to figure out the distribution. Anyway, that's what we're trying to do. We're trying to automate software engineering and, as a part of that, automate ourselves in doing the work we want to do.
And so the reason to go after software engineering, then, is that it's the kind of lever that allows you to automate everything else. It's like the MVP of AGI, right? The minimum viable AGI. Yeah, because it then creates everything else. We wouldn't train something like Sora. Sora is great, you know, fantastic; generating video is awesome. It's just not interesting from an AGI perspective if you believe that models can code themselves soon. Totally. And so, out of all the companies that are trying to build an AI software engineer, you are probably the only one that is really taking a vertically integrated approach and training your own models. And that is either insanely brave or insanely crazy, and probably a combination of both. I'm curious: I know you love training models, and so I know that's part of it, but why do you think you need to own the model to get this right?
And how do you motivate yourself in the David-versus-Goliath of knowing that OpenAI exists and has great people and cares about coding and is great at building models, obviously? How do you think about that entire dynamic? Well, to build the best model, you need to build the model. And we want to solve these fundamental problems. You can't rely on anyone else. If the API guy solved it, then what the hell are you? We might as well have started the company three years later. It goes back to the point where we started, right? We started working on this stuff two years ago, so it took us some time to learn how to train these large models. I think it took OpenAI two years to get from GPT-3 to GPT-4 as well, and I thought we could be much faster, and this was going to be great. It's a pain, so it's definitely an engineering challenge, but it's necessary. It's not like we're doing it just because it's fun or because I like training models. It's a massive financial investment that people trust us with. And it's not one of those one-to-one ROI investments. If it works, it's fantastic. And if it doesn't work, the GPUs ran and the money is gone. So you're getting a lot of people's trust doing that. It's certainly not something you should do just because it's fun and you enjoy it.
Fundamentally, I think the value will accrue both at the AGI and at the hardware level, and never at the application level. There's no incentive at all to offer an API. If the API creates a hundred-billion-dollar company, you will just build that company internally. And if OpenAI doesn't, someone else will. It's just unimaginable to me that that would be how you would build these companies in the first place. So from a business perspective, I don't think that's necessarily the right way. Maybe there are some partnership potentials, like, oh, we'll get special access or whatever. And then why is that different from cloud computing, right? There have been many $10 billion companies built on the cloud. I mean, it's much, much, much harder to build Netflix and Airbnb and Uber than it is to build a chat interface. Fundamentally, Magic is an application you press download on, that we have a couple of guys working on, and it's just there. You can build this with, like, YC pre-seed money. The moat is with whoever owns the model.
I guess I can just make the API twice as expensive for the next model and then launch my own product and undercut everyone. It's really fucked to not own the model in this domain, and in any domain that's going to generate a ton of revenue for a single company. In the case where it's distributed, maybe it's fine, but I don't think this will be. So it's necessary, both for the market, which is good for us, because the market is incentivized to fund folks like us, which it isn't in other domains. Have fun writing an email assistant; you're not going to get that funded anymore. So that's helpful. But fundamentally, the reason we train our own models is that it's necessary for our mission. And I just wouldn't be interested in building a nice little SaaS wrapper. That's going to happen anyway.
What do you think, though, about competing against the 800-pound gorillas? You've raised a lot of money, but some people have raised boatloads. Yeah, they've raised a lot more money. Some people have 100 million plus in revenue a year that they could spend. It goes beyond even what they could raise. Yeah, absolutely. And so how do you motivate yourself to compete in that reality? The question is how much it costs to build AGI, not how much money you can raise. Because if you can build AGI for however much you can raise, then having more might help you, but it won't get you there substantially sooner. If you have all the right ideas and you can build it with a certain amount of hardware, by definition, okay, if someone had 100 times more hardware, would they be computing that much faster or whatever? It doesn't seem like a material advantage if your estimate for how much compute you need to build AGI is not as high as the revenue these companies could generate, or the funding they raise, and is in fact much lower. And I think that is the case. It's not by any means accessible. It's very damn hard to get that much money. But it's not 100 billion. And if I'm wrong, I'm wrong, and it'll be 100 billion, and we will not have 100 billion, and that's it. But if we can get to the point where we have AGI and a couple of others have AGI, and the benefit of additional compute is there, and you show an ROI, it's a reasonably even playing field in terms of additional revenue.
You're going to bring AGI to the market; you're going to raise more on it. So the starting condition is that you need sufficient hardware, but you don't need more than sufficient. That's a bet. You don't know. But I think it's a bet with a high enough probability of being right that it is reasonable to compete in the space. And I think it is actually reasonable to think that the ROI of having, quote-unquote, sufficient funding might be better than the ROI of having, like, infinite funding early on. And, that's not a question for me, that's for investors, but is there an ideal team size for researchers? Is there a certain point at which you reach diminishing marginal returns from adding an extra researcher? So, one of my biggest weaknesses, especially early on at Magic, was scaling the team effectively. We were very single-threaded, with a very small number of people doing basically all the work. And I think we're getting better at that now. You also just need a certain level of maturity of your code base and of your research ideas and everything to properly segment them.
So early on, I would have said five for that time. Now I would say closer to 20. And I'm not including folks working on other stuff; I'm including folks working on the models and everything. I'd say closer to 20. I could imagine that in a few months I'll say a slightly larger number, especially when you get into large-scale deployment, where you really want very, very good processes around high reliability and availability of services that are detached from each other, et cetera, et cetera. So then you can segment even more, which is obviously stuff we're working on now, but it grows over time. I don't see it ever exceeding the tens of people, and right now it's in the low tens, the very low tens. But I don't know, maybe it's a skill to be able to utilize more. If you're able to utilize 200 people, you're just a better CEO than I am. No, seriously, if you can, it's a good skill. And I think part of why I say a smaller number for us is that there is a ton of stuff we just don't do. If we built a video model, that would just be a separate team building the video model, and that's more scaling. So to an extent, we're more focused, and that's why we're smaller. But also, if we could double the team and be twice as fast, I would do it any day.
Back in, was it late 2022, when I first met you? At the time, marketing assistants and email assistants were all the rage, and you were the first pitch that I heard that was AI that feels like a colleague. I just remember that really sticking in my brain. So in some sense, you've been thinking about agents, to use a buzzword, longer than almost anyone else. Maybe share your vision for that and what you think it takes to build a great agent. Fundamentally, there are two tiers here, I guess three. One is useless. The next is an assistant that you have to micromanage. And the next is the thing that manages you, basically, where it's more like a colleague. I think the layer where it's exactly even doesn't really exist, because it's this little thin point. Once the model is more competent than you are, you are there to give it guidance on what you want accomplished and to answer clarification questions. But you'll never have to tell it, like, here's a bug. I'm not saying that this is v1 of everything. I'm not saying this is v1 of our product. But fundamentally, that has to be the goal. The way I feel when I talk to my best engineer, that's how I want to feel when I talk to Magic, where we have a discussion, he's almost always right, and then he just writes the code, and then someone else reviews it, and then it works. That experience, where my job is exclusively saying, here's kind of what I want, and then they even help clarify it.
Right? I just want to be clear that, specifically, it should feel like that, and everything else doesn't matter to the user. What tools the agent uses, how it works, does it run locally or in the cloud, does it need a VM, does it have a browser? I don't care. Doesn't fucking matter. Our problem, not your problem. You care about your problems getting solved. So, fundamentally, that's what I think matters to customers. And everything else is dependent on the exact product shape, exact domain, exact everything. And, like, I'm stubborn as fuck. I just don't want to launch anything that isn't that. We will probably have to, but I just really want to get to that thing.
I want to talk to my computer, go and have lunch, and come back, and it built AGI. That's the end goal. There'll be checkpoints, but I don't think anything else matters. How you accomplish that is up to each individual company. Yeah. How far away do you think we are from that? Or, I guess, maybe break it down a little bit more. I mean, we met in 2022. I've learned how to extrapolate Eric's timelines. So maybe 1.5x or double everything I say. But I think very soon. Like, a very small number of years. I don't want to give a number now, but a very small number. Less than ten? Oh, definitely less than ten. I mean, way less. Wow. Okay.
Because I'm seeing some of the SWE-agent stuff that just came out. They're at, like, 14% on SWE-bench, which feels like... I mean, 14%. I just don't care about 14%. I don't even know if 80 or 90 is good enough. I think you need 99. Even at 96, I don't trust my computer. I don't want to review the code. The tier of product where I have to review the code is fundamentally different from the tier of product where I don't have to review and understand the code. And you're not talking about 95 when you don't want to review; you're talking about 99-point-something. You're talking about whatever my developers accomplish, plus some. Same as with self-driving cars. The difference with self-driving cars is that you die if the thing crashes, and here you just have to review code, so it's launchable before that point. But fundamentally, you need way, way, way more, and usually the last few nines are hard to get. But no, I think you can get there. Models have surpassed all these benchmarks. Just recently, the MATH benchmark, right? Way faster than even the prediction markets assumed. I don't see that stopping. If everyone were stuck, maybe, and I get that there's some perception in the public that, oh, GPT-4 is it, it's not getting much better. No.
Okay, we're going to close out with a lightning round. One-word or one-sentence answers. One: what's your favorite AI app? Not Magic. Probably all the invisible ones still, like my spam filter and all this stuff. The things that keep life working, I think, are still, at the moment, more useful than the sort of AGI-like apps, because if you took them away, life would just be awful. Recommendation algorithms for whatever; I think that's really useful. Other than that, and other than, let's say, the programming world, other than Magic, I'd say whichever model is currently best. It's a very boring answer, but I do actually pick the spam filters, et cetera, the recommendation services, first.
What paper has been most influential to you? I don't think this paper is relevant at all in the world anymore, but it was the first paper I ever tried to deeply understand. I spent months on it and reimplemented it and everything. And so it was most influential to me as a person, not so much to my current work. The paper is called DeepStack. It's one of those neural-networks-plus-imperfect-information-game-solving papers. It was reasonably complex for its time. So, if folks are interested: it's nowhere near SOTA now, it's sort of just an irrelevant type of algorithm at this point, but back then it was useful. It was very influential for me because it was my first touch point with research, really. I had no idea how to do research at all, and I just dug into this the way people rabbit-hole through hyperlinks on Wikipedia.
Where you rabbit-hole. I did that with this paper. I love it. Okay, that's my weekend reading. Last question: what are you most excited about in AI in the next one, five, and ten years? Just how society is going to integrate with it. I think we're getting to the point now where, over the next one to five years, it's really going to impact how society does stuff, beyond just, you know, another tab in your browser that speeds you up by some percentage on some task. I think it'll get much more significant in that time frame. And ultimately, I am not one of the intrinsic-curiosity type of people. I know most researchers are. I really am not. I just care about the outcome, and that is the outcome. So I'm most excited for the outcome.
Eric, thank you for joining us again. Last time we recorded the podcast, we weren't actually able to talk about the thing that got us so excited about Magic, which was that you had shared with us your long-context eval, and our own AI researchers had gotten really excited by what you'd accomplished on it; that was actually what led to us investing in Magic in the first place. So you've just made some exciting new announcements around the eval. I was hoping you could share it with our audience. Yeah, for sure. Thank you so much. I mean, we've been running around with this HashHop eval for a while, basically just being frustrated by the needle-in-a-haystack eval, which everyone keeps complaining about. And now that we've decided to announce where we're currently at in terms of our context work, instead of just, blah, blah, talking about how we have so many tokens of context, it felt reasonable to share the eval as well.
We've used it in our fundraising, obviously, and thanks for backing us, and we've generally used it to guide our architecture development and our research. So, yeah, it felt right to open-source it and let others compare their architectures and their results with ours. It's exciting to share, and thank you for having me back on to talk about it. Sure. Cheers, of course. Thanks, Eric.
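For readers who want a feel for the shape of such an eval: the toy sketch below builds a multi-hop, hash-chain prompt in the spirit of what's described above. This is an illustration of the general idea only, not Magic's open-sourced HashHop harness, and the `model` call is a hypothetical stand-in:

```python
import random
import string

def random_hash(length=8):
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))

def build_eval(num_distractors=1000, hops=3):
    """Toy multi-hop haystack: shuffled 'a -> b' pairs, a few of which chain."""
    chain = [random_hash() for _ in range(hops + 1)]
    pairs = list(zip(chain, chain[1:]))  # the chain the model must follow
    pairs += [(random_hash(), random_hash()) for _ in range(num_distractors)]
    random.shuffle(pairs)
    haystack = "\n".join(f"{a} -> {b}" for a, b in pairs)
    question = (f"\nStarting from {chain[0]}, follow the arrows for {hops} hops. "
                f"Answer with the final hash only.")
    return haystack + question, chain[-1]

prompt, expected = build_eval()
# answer = model(prompt)                                   # hypothetical long-context model call
# print("pass" if answer.strip() == expected else "fail")  # exact-match scoring
print(f"prompt length: {len(prompt)} chars, expected answer: {expected}")
```

The appeal over needle-in-a-haystack is that random hashes carry no semantic cues, so a model can't shortcut the task by surface-level matching of one memorable phrase; it has to actually resolve each hop across the context.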
Artificial Intelligence, Technology, Innovation, AGI, Eric Steinberger, Software Engineering, Sequoia Capital