The video discusses the recent Consumer Electronics Show (CES) and highlights some of the exciting innovations and trends in artificial intelligence and computing technology showcased at the event. One of the focal points is Nvidia's new product, "Digits," which is a compact supercomputer capable of running large AI workloads locally on a desktop, potentially revolutionizing how computing power is accessed and utilized both in personal and business contexts.

The conversation delves into Nvidia's strategic shift towards desktop supercomputing and personal AI workloads. The speakers explore the potential enterprise uses, like industrial applications, that can benefit from having high computing power on location. They also discuss how this move positions Nvidia against traditional tech and computing companies like Apple, indicating a shift in their market approach to accommodate both consumer and enterprise segments.

Main takeaways from the video:

💡
Nvidia's "Digits" aims to democratize computing power by bringing supercomputing capabilities to the desktop, thus bridging the gap between personal and enterprise-level processing power.
💡
The shift could prompt widespread adoption in various industries, such as manufacturing and defense, where localized computing can reduce latency and costs.
💡
As AI tools evolve, trust and reliability remain points of discussion, indicating a need for continued development in AI safety, quality control, and integration into existing systems.
💡
The ongoing improvement and adoption of AI models pose both opportunities and challenges in fields like software development, where trustworthiness and the potential for self-correcting AI code are emphasized.
💡
The anticipation of rapid technological advancements and the potential integration of AI into everyday devices highlight the fast-paced nature of developments in this field, with leaders like Nvidia and Apple at the forefront.

Key Vocabulary and Common Phrases:

1. supercomputer [ˈsuːpərkəmˌpjutər] - (noun) - A high-performance computing machine designed to process and operate at maximum speed and large capacity for complex calculations. - Synonyms: (mainframe, megacomputer, high-performance computer)

The supercomputer right next to my laptop

2. parameter [pəˈræmɪtər] - (noun) - A measurable factor forming one of a set that defines a system or sets the conditions of its operation. - Synonyms: (factor, characteristic, criterion)

Imagine a 200 billion parameter model which is way bigger than what ChatGPT was when it came out two years back, right?

3. petaflop [ˈpetəˌflɒp] - (noun) - A unit of computing speed equal to one quadrillion (10^15) floating point operations per second. - Synonyms: (computing speed, FLOPS, processing power)

So petaflops of compute at your desktop.

4. latency [ˈleɪtənsi] - (noun) - The delay before a transfer of data begins following an instruction for its transfer. - Synonyms: (delay, lag, pause)

In a lot of these use cases, there's a lot of latency between calling a server or a cloud API and getting responses back.

5. autonomous [ɔːˈtɒnəməs] - (adjective) - Having the ability to operate independently without human intervention. - Synonyms: (independent, self-governing, self-sufficient)

They are trying to ensure that the entire industry moves closer to this physical AI and agentic AI and autonomous driving era.

6. trustworthiness [ˈtrʌstwɜːðinəs] - (noun) - The quality of being reliable, truthful, or able to be relied on. - Synonyms: (reliability, credibility, dependability)

The comment was well we want things like trustworthiness in the AI and it should be reliable and all the things that you would want.

7. robustness [roʊˈbʌstnəs] - (noun) - The quality of being strong and in robust working condition. - Synonyms: (sturdiness, resilience, durability)

So this big congregation and the topics are of course around safety, robustness, trustworthiness

8. hallucinating [həˈluːsɪˌneɪtɪŋ] - (verb) - Experiencing a perception of something that does not exist in reality, often used in AI as when a model generates incorrect or nonsensical outputs. - Synonyms: (imagining, perceiving, envisioning)

And they found that in many cases Apple was actually summarizing incorrectly, it was hallucinating.

9. iteration [ˌɪtəˈreɪʃən] - (noun) - The process of repeating a set of operations until a specific condition is met. - Synonyms: (repetition, cycle, loop)

They just released an equivalent compute for 550 bucks, right? So just imagine Apple will never do this, they'll never take a sixteen hundred dollar thing and the next iteration being a third of the price, right? So you're getting to this point where Nvidia wants to make sure that the compute is as easily accessible and democratized as plugging into electricity

10. democratize [dɪˈmɒkrəˌtaɪz] - (verb) - To make a system or activity accessible to everyone. - Synonyms: (make accessible, generalize, universalize)

Nvidia's "Digits" aims to democratize computing power by bringing supercomputing capabilities to the desktop.

CES 2025, NVIDIA DIGITS, Apple Intelligence fails, and Sam Altman’s reflections

What are you most excited about coming out of CES? Shobhit Varshney is senior partner consulting on AI for US, Canada, and Latin America. Shobhit, welcome back to the show. What do you think? Nvidia's Digits. The supercomputer right next to my laptop. Love it. Great. Skyler Speakman is a senior research scientist. Skyler, welcome back. What are you most interested in coming out of CES? As a longtime PC gamer, absolutely the new line of graphics cards coming out. And finally, last but not least, Volkmar Uhlig is vice president, AI infrastructure portfolio lead. Volkmar, what's your takeaway from CES this year? I'm with Shobhit. It's the Digits.

All right. All that and more on today's Mixture of Experts. I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE is dedicated to bringing you the debate, news, and analysis you need to keep up with the top headlines in artificial intelligence. Today we're going to be talking about a new report on developer use of AI tools, some trouble with Apple Intelligence, and Sam Altman's reflections on the second anniversary of ChatGPT.

But first, let's talk a little about CES. Shobhit, maybe I'll kick it to you first. We're all very excited by Digits. For those of us who are not obsessively watching all the headlines coming out of CES, what is Digits and why are you so excited about it? So the intention here is shrinking their DGX all the way down to a small machine right next to your laptop. So Nvidia has figured out a way to squeeze in a lot more firepower than their graphics GPU cards, with an insane amount of memory, 120 GB, attached to it so you can start to run some really large AI workloads on your desktop. Imagine a 200 billion parameter model which is way bigger than what ChatGPT was when it came out two years back, right? You're able to run that locally right next to your machine. Now, it comes with a flavor of Linux on it, but obviously, instead of Linux, you could have your Mac or Windows machine use it as a server and you can do some really cool things, right?
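
As a rough check on the numbers in this answer, the sketch below estimates how much memory the weights of a 200 billion parameter model would need at a few common precisions. It is a back-of-the-envelope illustration only: it counts weight storage and ignores activations, caches, and runtime overhead, the precision table is an assumption of this sketch rather than anything stated in the episode, and `weight_memory_gb` and `fits` are hypothetical helper names.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption of this sketch: only weights are counted (no activations,
# caches, or runtime overhead), and 1 GB is taken as 1e9 bytes.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate weight storage in gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: float, precision: str, memory_gb: float) -> bool:
    """Return True if the weights alone fit in the given memory."""
    return weight_memory_gb(num_params, precision) <= memory_gb


if __name__ == "__main__":
    params = 200e9   # the 200 billion parameter model mentioned in the episode
    memory = 120.0   # the memory figure quoted in the episode, in GB
    for precision in BYTES_PER_PARAM:
        need = weight_memory_gb(params, precision)
        ok = fits(params, precision, memory)
        print(f"{precision}: about {need:.0f} GB of weights, fits in {memory:.0f} GB: {ok}")
```

Under these assumptions, only the 4-bit case (about 100 GB of weights) fits within the quoted memory, which suggests the 200 billion parameter claim depends on running the model at low precision.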

But now you're talking about having personal supercomputers that you can literally keep on your desk or potentially even carry with you. It won't be out till May. It's about $3,000, which, just looking at the hardware that's going into it, is a ridiculously great price point. But this starts to move supercomputing from the cloud all the way down to your desk. So petaflops of compute at your desktop. And that is an insane value.

Yeah, absolutely. Volkmar, you were saying that you were excited about this as well. I think we've talked about it in the past. But if you want to give our listeners a little bit of an intuition: why is Nvidia moving into this market at all? Arguably, doesn't this put them in competition with Apple and all these other desktop personal computer makers? Whereas Nvidia's usual thing has of course been data centers. Do you have a sense of why they're moving into this market?

Yeah, I would not say that Nvidia traditionally is a data center company. They are a gaming company, and the data center kind of came along and hit them three, four years ago right in the face. Yeah, yeah. And, you know, good for them. They captured it, and it's visible in the market capitalization. I think what Nvidia is figuring out right now is that the developer market was kind of limited to, you know, buy an RTX and stick it into a developer machine. And now they are effectively going all in, saying we need to cover this whole value chain. And I think it's very, very hard today because you effectively need to buy a Windows or Linux box. And then you stick in a bunch of Nvidia cards and you rig this thing up.

And now they are effectively coming out and saying, okay, here's a ready-to-go system which is optimized for that specific workload. I think when you see what Apple did with the M1 to M4, they are effectively trying to capture that desktop market. And that is not CUDA and that is not Nvidia. And I think Nvidia is doing a preventive strike here. And if you look from a pricing perspective, they are sitting right between the smaller Apple Studio for 2000 and the bigger Apple Studio for 4000. So they are at $3000 and they have specs which are bigger. And it's an attachment, but at the same time you can use it as your primary desktop. So I think they are effectively trying to cover their bases.

What will be interesting to see is what people are now doing. For just $3,000 you can get that box. It's not a DGX, but in many cases it may be sufficient for running small-scale training jobs. And so I can imagine that people are just buying them by the truckload and putting them up in data centers and giving the developers not necessarily one at their desk, but, you know, maybe it's tethered, but it's on premise. And so it's a really good way of actually getting that development loop going. And you could even use it for production use cases. Right. So if you don't need a 19-inch rack server, you could use something smaller.

They, I think at three different points in the press release from Nvidia, they talk about how easy it is to take the models that you've trained on your small digits and move it to Nvidia's cloud. So I also think they're really pushing for this hook here in order to drive more business to their data centers. And one of these is start small on your own personalized local system and make it extremely easy for you to then scale that up onto, of course, their data centers. So I think that also plays a lot into the strategy of why they're really pushing this.

Yeah. Shobhit, maybe to turn back to you. What do you do with a petaflop? It's kind of funny, because it is very exciting, a supercomputer literally on your desktop. But with that level of computing power, what do we use it for? I mean, is it just gaming? Do you anticipate people doing a lot more homebrew AI stuff? What does this unlock if Digits really becomes super successful?

I think there are two different markets here: one is enterprise, one is consumer. Right. I think there will be some enthusiasts on the consumer side that'll obviously gravitate towards it, but I think there's a huge potential on the enterprise side. What that gives you is being able to run compute that's closer to where the action is happening. So think about industrial applications, where on the factory floor you want compute to be right next to where the manufacturing of the thing is happening. Or one of my clients, in the large auto industry, they have a lot of trucks and buses and things of that nature.

And you would want to have some mobile compute that you can actually run a model on. Right. In a lot of these use cases, there's a lot of latency between calling a server or a cloud API and getting responses back. Those are expensive. So imagine you're taking say pictures on the manufacturing conveyor belts. Right. You want to be able to process those near to where the images are being captured. There's less latency and there's a huge security concern here. Right. You want to make sure that the data, especially if it is related to something that's very sensitive, you don't want that leaving your premise either. So you want to be able to run those closer to it. Same thing goes for say, defense applications where you are doing something more tactical in the field.

You want to be able to process all the images coming in from all the drones and stuff at that particular place, because you may be in a territory where you really don't even have a cellular connection. Right. So all of those are heavy computing workloads that traditionally took cloud environments to scale up and run, and that you're now able to do closer to where the action is happening. That's a huge, huge unlock of value for enterprises. Today we've been constrained by some cutesy little small models that run on mobile devices and things of that nature, but we're not quite there yet where you can run a 200 billion parameter model right next to where the action is. Yeah, that's really exciting.

Well, a lot more to pay attention to. I'm definitely going to get one, as it sounds like many of the folks on this call are. So we'll definitely have to compare notes once they start arriving on our respective desktops. Tim, apart from the Digits, there were some insanely good things that Nvidia released during the keynote. There were like three different areas where Jensen wanted to ensure that people realized that this is what Nvidia really does. Right. So one was in physical AI, figuring out ways in which we can model the physical universe around us.

Give us a good set of open source starter AI that can understand the physics, and we can start to train things around it, which leads to things like robotics and humanoids around us in our environments. Right. The second big area of unlock was automotive, figuring out how we do autonomous driving. And you need the whole pipeline of millions of sensor data points coming in. How do you process that and make decisions on the vehicle itself? Right.

And then the third one was around digital workers, agents doing regular day-to-day work as you and I do inside of all the software that we work with. Jensen spent 90 minutes on stage wowing the audience. That's no easy feat. Right. If you analyze the entire 90-minute conversation, you start to realize what an incredible communicator he is, breaking down complex concepts with such clarity. So in each of those different sections he proved that Nvidia is in fact a leader. They are making some bold moves to ensure that the ecosystem comes along with them. They just bought Run:ai for maybe $700 million and they turned around and open sourced it. It's such a baller move: spend $700 million and then open source it.

So they're trying to ensure that the entire industry moves closer to this physical AI and agentic AI and autonomous driving era and they want to be born across each one of them. Last year they had in the gaming industry and Skyler's going to chuckle on this, right? The 40 series of their chips used to be 1600 bucks. They just released an equivalent compute for 550 bucks, right? So just imagine Apple will never do this, they'll never take a sixteen hundred dollar thing and the next iteration being a third of the price, right? So you're getting to this point where Nvidia wants to make sure that the compute is as easily accessible and democratized as plugging into electricity. But they want to be the electric superpower of the entire world. And if you look at those three different areas, my hot take, Nvidia is undervalued right now.

IBM is out with a new developer report taking a look at developers' views on the use of AI tools in their workflow. A couple of very interesting data points. But I think the place I wanted to start is this really interesting result where the developers were asked, okay, so what do you want most out of an AI tool? The comment was well we want things like trustworthiness in the AI and it should be reliable and all the things that you would want. And then they were asked, well, what are the current problems with the existing AI toolset? And it was exactly those same things. And so I do want to ask this question of the group: it does feel like despite all the hype around code assistants and agents in development and all this kind of stuff that we've been talking a lot about, there still is this big trust gap, and it is actually preventing adoption of a lot of these tools.

And I guess maybe Skyler, I'll turn it to you first: do you see that as a big problem? Do you think that it's ultimately going to kind of put a ceiling on the use of these tools, and what should we make of this? It was sort of an interesting result for me. I'm not sure a ceiling is the right term, but certainly a delay. I spent a good amount of time at the end of last year in San Francisco at the International Network of AI Safety Institutes. So this big congregation and the topics are of course around safety, robustness, trustworthiness. And those are the topics of the day in this. And when I talk to would-be clients, they aren't concerned about overall accuracy. That's not their concern.

It's how are these machines reaching their conclusions and can we trust them? That's the back and forth we have now, not accuracy or even costs. So it's a concern at a global level and even at just kind of an individual client engagement level. So yes, it's been part of the IBM Research strategy for many years now: what can we do with trust and governance in this space? Lots and lots of work to be done there. Yeah, that's right. And I think there's kind of one point of view, and Volkmar, I don't know if you agree, working with a lot of folks who are kind of in the nitty gritty of the technical aspects of this: I think the AI person's response also is, well, what do you care about trust or reliability? If it just works, then it just works, right?

You kind of think about the early days of Google where it's like, oh, the Google image search, there's this GPS thing that's going to tell you where to go. Yeah, sure, I don't trust that. And then over time it just turns out the fastest way to get from point A to point B is just to put it into GPS and kind of people get over like their fear about not really knowing how these systems make decisions. Do you think that'll kind of be the case here with kind of these, all these developer tools that say we're going to do CodeGen and you're like, I don't really need to understand because it just works. And I'm moving faster than developers that are not.

I don't think so. So the way development works right now, and I hope this is how it works for most companies, is you use the code generation kind of as, okay, I know what algorithm I want and I can proofread it. So I can proofread code about 10 times faster than I can write code. And so if I need to build something, I'm just going to an agent and then the agent produces the code. I'm still checking that the code works, and there is still an architecture behind it where you're kind of interacting with the system and you almost have an engineer at hand who is very fast and doesn't get tired. And you still need to do all the engineering practices we have. You still need to write unit tests, you still need to write integration tests. And so there is a rigor to it.

Now, if you have bad engineering practices and you don't write unit and integration tests, then you may actually litter your code base with bugs. But that's more of an organizational structure problem. So do you allow code which is untested in your code base? A developer can make mistakes and the model can make mistakes, and we are primarily now asking who has the higher likelihood to get it right. But in the end, confidence in your code base will always come from test coverage and reviewing that the tests are written well. And typically in engineering you're saying your tests should be 10 times easier to understand than the code you are actually writing, so that it's easier to check that the tests are correct than that the code itself is correct.
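
To make the point about tests being easier to read than the code concrete, here is a small illustrative sketch. The function `merge_intervals` is a hypothetical example chosen for this note, not something discussed in the episode; the idea is only that each test is a plain input/output pair a reviewer can check by eye, whether the function was written by a person or generated by a model.

```python
def merge_intervals(intervals):
    """Merge overlapping closed intervals given as (start, end) tuples."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def test_merge_intervals():
    # Each case is a literal input and expected output, readable at a glance.
    assert merge_intervals([]) == []
    assert merge_intervals([(1, 3), (2, 6)]) == [(1, 6)]
    assert merge_intervals([(5, 7), (1, 2)]) == [(1, 2), (5, 7)]
    assert merge_intervals([(1, 4), (4, 5)]) == [(1, 5)]
    assert merge_intervals([(1, 10), (2, 3)]) == [(1, 10)]


if __name__ == "__main__":
    test_merge_intervals()
    print("all tests passed")
```

A reviewer who never reads the loop body can still judge whether the five cases describe the intended behavior, which is the kind of check the speaker is arguing for.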

If you follow those practices, I think you will discover the bugs which get introduced. But if you don't. Yeah, good luck. Yeah, definitely. So do you think that this report is mostly just revealing the fact that effectively the sort of AI engineering is still more buggy than humans? Effectively, the lack of trust is kind of well warranted. I think we are not at the point that I can go blindly to a model and say, produce me 10,000 lines of code and they will be correct. I think the big challenge is that humans are lazy and so there is a tendency that we are overconfident that the model is doing.

And if we do that and we are not very skeptical about the output and we don't review it, we will actually get bugs into the code base. I would flip it around and say the more open-ended question we have right now is where we are actually putting the model in the middle of the execution. So this one is the code generation. But I can review this. What if the model actually executes code? And we see this right now already in ChatGPT: you ask it a random question and it goes out and it actually produces Python code and then it runs the Python code and it gives you an answer, but then you look at the Python code and it may be buggy, right? And so sometimes the code doesn't even run. You know, like when you do data aggregations, you have a table and it has five values in the first column and seven values in the other column and then it says, oh, pandas, sorry, I got an exception.

And this happens in real life. And so you get these answers which are just bogus simply because the code generation and then the code execution is wrong. And so that's, I think, where it becomes much more scary, where we are doing this on-the-fly code generation. And I do not think that with the current accuracy we are there yet. And so for small things that may be okay, but for large things, I think you still need human eyes. Will that go away? Yes, probably over the next three, four years we will get to a point that, you know, the code will be better than what a human can produce.
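
The table example above (five values in one column, seven in the other) can be sketched in plain Python. This is an illustration under stated assumptions, not how ChatGPT or pandas works internally: the table is modeled as a dict of lists, and `check_columns` and `column_means` are hypothetical helper names. The point is that validating the shape before aggregating turns a bogus answer into an explicit refusal.

```python
def check_columns(table):
    """Raise ValueError if the columns of a dict-of-lists table are unusable."""
    lengths = {name: len(values) for name, values in table.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    if any(length == 0 for length in lengths.values()):
        raise ValueError("columns are empty")


def column_means(table):
    """Return the mean of each column after validating the table shape."""
    check_columns(table)
    return {name: sum(values) / len(values) for name, values in table.items()}


if __name__ == "__main__":
    good = {"a": [1, 2, 3], "b": [4, 5, 6]}
    print(column_means(good))

    # The case from the episode: five values in one column, seven in the other.
    bad = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 5, 6, 7]}
    try:
        column_means(bad)
    except ValueError as err:
        print("refusing to answer:", err)
```

A guard like this does not make generated code correct, but it makes one class of failure visible instead of silently producing a wrong aggregate.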

Shobhit, to bring you back into this conversation, I gotta believe that this is like your life, right? Customers and clients saying, well, I don't know if I trust this stuff. And then you being like, no, the water's fine. I'm curious how this is playing out in your world because it feels like this is a conversation that you have day in, day out, all the time. So from an IBM consulting perspective, right? We have very strict guidelines and warranties and things of that nature. For any code that IBM produces for an end client, we have to be bound by what our master services agreement says and what will go into the code. Is it copyright free? Things of that nature as well, right?

So it's a pretty high bar for when our team members are producing code for our clients. And I think over time we have started to see that the quality of the engineer that is leveraging these copilots matters a lot. Right? If you are a software architect, somebody who's senior, you know how to make interns work for you, right? So say we get you some brilliant software developers and they have these sparks of brilliance. They'll show you some code that's like, oh my God, I don't believe that this intern wrote this. And you realize that they actually copied it off Stack Exchange and modified it a little bit or something of that sort. So it was brilliant, but it was because they had access to other things and stuff like that. But unless you know how to judge that piece of code, it's very difficult for you to even think about putting that into production.

So the bar for the manager of an intern is pretty high. Similarly, when you get a copilot who is behaving like an intern, the person who's using that copilot should understand how code is written, right? To the earlier points we've made, we need to know what good looks like. But if the code is being generated 100% by a copilot, then it's very difficult for you to understand what logic was used. Earlier you said that you can just proofread code, but then you need to be really good and have done this over and over again before to understand what to even look for.

What's happening in reality today is 70% of the code gets generated and it works pretty well. The last 30%, the last mile, is where we get stuck. Right. It's an iterative process. It takes one step forward, but then it ends up taking two steps backward and may introduce some other bugs. Right. So unless you really know how the code was written, how you would have written it yourself if you had the time, you're not really able to get to that 100% unlock of value. So with this tandem between a human and a copilot, we also need to figure out a little bit better how to ask the right questions and how to create the right test cases.

And I think having an agent that's going to go review and be the peer reviewer for the code that's been generated, we're moving towards that in a lot of our deployments with our clients. When we introduce other agents to review the code and review the errors, that multi-agent setup is delivering higher quality code for our teams than what we got from an LLM that would just start spitting out the code end to end. It's really interesting to think about. This is like part of a maturity of the overall AI tool chain that needs to happen. So the lack of trust is the fact that we have this AI codegen thing, but it's really not connected to any other AI tools around it, is sort of what you're saying.
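
The generate-then-review pattern described here can be sketched as a simple loop. Everything in this sketch is a stand-in: `stub_generator` and `stub_reviewer` are hypothetical placeholders for a code-writing model and a reviewing agent, and no real LLM or IBM tooling is called. It only shows the control flow of feeding reviewer feedback back into the next generation round.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int, int], int]


def stub_generator(feedback: Optional[str]) -> Candidate:
    """Hypothetical stand-in for a code-writing model.

    The first draft has a bug; once it receives feedback it returns a fix.
    """
    if feedback is None:
        return lambda a, b: a - b  # buggy first draft of "add two numbers"
    return lambda a, b: a + b


def stub_reviewer(candidate: Candidate) -> Optional[str]:
    """Hypothetical stand-in for a reviewing agent.

    Runs a few checks and returns feedback, or None if it has no complaints.
    """
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    for args, expected in cases:
        got = candidate(*args)
        if got != expected:
            return f"add{args} returned {got}, expected {expected}"
    return None


def generate_with_review(max_rounds: int = 3) -> Tuple[Candidate, List[str]]:
    """Alternate generation and review until the reviewer accepts."""
    feedback: Optional[str] = None
    history: List[str] = []
    for _ in range(max_rounds):
        candidate = stub_generator(feedback)
        feedback = stub_reviewer(candidate)
        if feedback is None:
            return candidate, history
        history.append(feedback)
    raise RuntimeError(f"no accepted candidate after {max_rounds} rounds: {history}")


if __name__ == "__main__":
    accepted, log = generate_with_review()
    print("review feedback along the way:", log)
    print("accepted candidate gives add(2, 3) =", accepted(2, 3))
```

In a real deployment the reviewer would itself be fallible, so a bounded number of rounds and a human check on the final result would still be sensible.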

It's a new year, we can be optimistic. What are the insights from the same study? What was the lowest item on this list of 10, the one that the developers don't think is a problem? And I think it's really interesting. It was the quality of the LLM. So these developers, I think, are correct and convinced that the LLM quality is going to continue to increase. That's not one of their concerns, and it's really interesting to see that play out here as the lowest of the 10 options given. This came up about half as often as the trustworthiness issues did. So I think that's a pretty interesting takeaway from here.

LLMs will get better. How we integrate them into the decision making process, that's a different story. But I think there is kind of a global optimism that these LLMs are going to become stronger. For our next segment we're going to talk a little bit about Apple Intelligence. There was a really interesting news story that popped up in the last week about how this news summarization feature that was part of Apple Intelligence had been messing up. Right. So this would be a summary of your voicemails, your text messages and importantly your news stories, your news headlines that you were getting.

And they found that in many cases Apple was actually summarizing incorrectly, it was hallucinating. So Apple apologized and promised that they'd be doing better on version two of this feature. I wanted to bring up this topic just because when we talked about this last year, before the feature came out, the opinion that we had was AI is going to be perfect for Apple and they're going to get this so right and it's going to be so targeted. So I wanted to just go back and talk a little bit about whether we were right or wrong. And I guess, I don't know, maybe, Shobhit, I'll throw it to you first on what your hot take is on that.

I think it underperforms in a lot of different scenarios. I think Apple is using a lot smaller models to do this on device, dialing up the security side of things, to make sure that they're small and can run. They're not using some insanely large models to do the summaries and stuff like that. So a little bit of the performance hit, I believe, is happening because of the size of the models that they're using, and we see this in the real world as well, as you're building multi-agent systems and stuff like that too. Right. So I think there's a bit of a balance between, hey, should I make sure that everything runs on device, and I'm going to constrain it only to a few things.

It cannot start draining the battery, and there are a few other things that they have to solve for, versus do I really get a really intelligent model to go do these summarizations and things of that nature? Skyler, maybe do you have a similar take? I think in addition to Apple getting burned here, at least from what I've seen from the headlines, it's other news agencies that were using Apple technology. So for example the BBC: you see this BBC breaking news coming up and it's completely made up, and so the BBC is actually feeling quite burnt in this. It's not just Apple with egg on their face, it's partners that they've gone with, because now they're getting these headlines blasted to their customers with the BBC icon next to it. Gibberish. So I think it's going to be there.

Yeah. Apple really has to think about, obviously, the technical challenges of getting these hallucinations taken care of, but then how do you really pass that messaging on to the consumer? Is going through another news agency the right way? Because I think they got hurt on this one. Yeah, that's right. Volkmar, one question I had in particular for you. I was having a conversation recently where a friend of mine was making the argument that Apple ultimately has hardware brain. Right. They do hardware. And he's a machine learning researcher, and he was saying machine learning is very different. Right. It's just like you throw a bunch of data at it and then the machine just sort of figures it out. And so its attitude is a lot more just like, just try it.

And then if it works, it works. It's basically a lot more shooting from the hip than the mentality it takes to do hardware. And so from this argument, he was saying, culturally, Apple's just not well positioned to play and win in this space because of how careful Apple is in a lot of ways. Do you think that's right? Is there a point of view here which is like, in some ways Apple was slow to launch the product and then it just can't bear organizationally the risk of these things? And so it's almost always kneecapped in really launching good features in the space?

I don't think so. Do you remember when Apple kicked out Google from their phone and did Apple Maps and it was a disaster, right? Yeah. And they took a lot of heat for it. And now it's one of the main routing applications. Right. I think what's happening is Apple was kind of in a bind because they were late to the game, they didn't build a really strong AI team. This was very visible. Like, you know, I was living in Silicon Valley and Apple was just not there. Like, they were not present. And now they were effectively in a bind of, okay, we need to bring something out, we need to make an announcement. So they made a big splash and they, you know, shipped the product. They tried to keep the functionality really limited.

But in effect, they made a strong statement: hey, we are going to put something on our devices. We are not missing the whole thing. And I think they had to rush it out. I think fundamentally there is a backtesting problem. Those things could have been found if they had done decent backtesting on very large-scale data. They have very large-scale data on the devices. They didn't, and so now they got burned. Do I believe that will get fixed? Yes.

I think what Apple is doing is defining, on edge devices, how you do deep integration. I think it's still clunky. Like the whole rewrite my email, rewriting my text messages, it's not good. The models are not good yet. We have much better models out. I think figuring out how to squeeze something into that form factor, with the resource constraints you have, with the power constraints you have, is a really tough challenge. On the flip side, what we are seeing now is that with every generation of a new model we pretty much get the same capabilities in the next smaller model. So the 70 billion parameter model gets to a 20 billion, and the 20 billion parameter model becomes a 13 billion, and a 13 billion parameter model becomes a 7 billion parameter model. And so just by waiting 6 to 12 months, capabilities which traditionally have only been possible in the cloud on two GPUs or so will be possible to run on a phone.
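To make the size progression above concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at each of the parameter counts mentioned. The helper name `weight_memory_gb` and the choice of 16-bit and 4-bit precision are assumptions for illustration, not something stated in the conversation, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: counts only the weights, not activations or caches.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the progression described in the transcript.
    for size in (70, 20, 13, 7):
        fp16 = weight_memory_gb(size, 16)  # assumed 16-bit weights
        int4 = weight_memory_gb(size, 4)   # assumed 4-bit quantized weights
        print(f"{size}B params: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions, a 70 billion parameter model needs on the order of 140 GB for 16-bit weights, while a 7 billion parameter model quantized to 4 bits needs roughly 3.5 GB, which is the kind of gap that separates a multi-GPU server from a phone.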

But if they had waited a year, they would have lost the market. So I think they were in this bind. It's like, okay, the technology is almost there, there's a lot of hype around it, we need to do something. So let's get something out. And now they burnt their fingers. Yeah, that's interesting. But it's cool to think of that. This is basically like Apple Maps again. As someone who actually just switched from Google Maps to Apple Maps, I'm like, wow, this is in fact way better. But it was such a funny thing because I remember the initial reputation of it was terrible.

So I didn't touch it for 10 years, I don't know how long I didn't touch it for. And by the way, it's still like that for anywhere except for their Apple offices. So we went to Japan and Apple Maps sends you into the forest. That's amazing. Shobhit, do you agree with that? It seems like Apple is the most disappointing, but it seems like what Volkmar is saying is give it time, they'll eventually win. Just because you can't beat Apple. So if you look at it, Apple actually did a lot in the open source community last year, and it's fairly impressive what they did with their Ferret-UI models.

They have these smaller adapter models that can run on device and things of that nature. The power envelope is pretty low. So they've done an incredibly good job. And they open-sourced a lot of that, right? There are a few things where I think Apple has a lead over some of the other mobile manufacturers. Understanding of what's on the screen, as an example. They have some brilliant work that they've open-sourced that lets you understand the different elements, so you can then build on top of that and create apps that can take actions on the screen, things of that nature. Right. So they've done some really good fundamental work in 2024, and I'm expecting that in 2025 they're going to start making better use of the compute power going up, as well as the fact that now they've learned so much. The challenge is with a really, really small model, and as you said earlier, this year the small models will get better than where they were last year, and so on, so forth.

So we are seeing that that'll get incrementally better. But the fact that you're picking a small model to cover news articles from every domain, that is a challenge, right? If you're asking a small model to do a bespoke piece of domain expertise, that works really well when we deploy this for our clients. But on an Apple phone you're expecting it to understand the nuances of negation and things of that nature, on a news article that could be around biology, or it could be around politics or sports and things of that nature. Right? It needs to have the understanding of every term that's used in golf, that's different from the way you talk about it when you talk about soccer, right? Soccer with the football, things of that nature. So you do need a larger model to do the summary.

But that's the balance there that they're trying to strike. And I think they will catch up in 2025. But in 2024, the fundamental work that they did was... I think I really agree with Shobhit here. Like the foundational work: how to think about UI integration, how to think about on-device processing and also the offload, and then also the cross-domain data integration, understanding maps, understanding your calendar, understanding your email. All that foundational work, I think it's incredible what they did. And so my expectation is that we will get aikit2 where also people can bring their own adapters.

Right now you can't. But that's just the next logical step, because you cannot have like 20 big models live on a phone, because you just don't have the memory capacity. And so the next logical step is like, okay, I can take the Apple model and I can fine-tune it for my specific domain and I can load my adapter into it, so that I can bring new AI capabilities on device but have shared base weights. And so this is where I think Apple did this foundational work by saying, hey, we are providing this as part of the operating system that, you know, people can build on. And this is, I think, their strength. So they will do it through the ecosystem play and give access to it.
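As an illustration of the "adapter on shared base weights" idea described above, here is a minimal sketch in plain Python. It assumes a low-rank (LoRA-style) adapter, which the speaker does not name; the class `SharedBaseModel`, its methods, and the toy matrices are all hypothetical and do not represent any Apple API.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedBaseModel:
    """Toy single-layer model: one frozen base weight, many small adapters.

    Assumption: adapters are low-rank pairs (down, up), so the effective
    weight is base + up @ down, but the base matrix is stored only once.
    """

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight  # shared by every adapter
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def load_adapter(self, name: str, down: Matrix, up: Matrix) -> None:
        # down: rank x d_in, up: d_out x rank (both much smaller than base)
        self.adapters[name] = (down, up)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        y = matvec(self.base_weight, x)
        if adapter is not None:
            down, up = self.adapters[adapter]
            delta = matvec(up, matvec(down, x))  # low-rank correction
            y = [a + b for a, b in zip(y, delta)]
        return y


if __name__ == "__main__":
    model = SharedBaseModel([[1.0, 0.0], [0.0, 1.0]])
    model.load_adapter("my_domain", down=[[1.0, 1.0]], up=[[1.0], [0.0]])
    print(model.forward([1.0, 2.0]))               # base behavior only
    print(model.forward([1.0, 2.0], "my_domain"))  # base plus adapter
```

The design point this sketch tries to capture is the memory argument from the conversation: each additional capability costs only the small `down` and `up` matrices, while the large base weight is loaded once and shared.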

But, you know, Apple always starts with the walled garden. You know, nobody can do anything until they've figured it out by themselves, until they've enabled all the applications. And then it will become kind of obvious how you build this, and then we'll run it on our Digits, you know. Right. Yeah, exactly. So for our last segment, let's do a little final around the horn. Sam Altman put out a reflections blog post on his personal blog, looking back at the last two years of ChatGPT. There's a lot in it. It's a very long blog post.

I think the big thing that came out of it for me was really just the degree to which Sam still really believes in AGI as the mission of OpenAI. He hits on it multiple times and it's still the big thing he's rallying the company towards. But I kind of want to get the view of all of you on the panel on what you thought was surprising, what you thought was interesting. Shobha, I'm curious if you have any thoughts on the blog post and if there's anything that you thought was surprising or kind of worth it for people to pay attention to.

Yeah, so he talked a lot about AGI, and I think we as a community have not agreed on what the levels for defining AGI should be. So I think we need to do a better job before we can even evaluate people's opinions on whether AGI is achievable or not. We don't agree on a definition of general intelligence even between humans. How do you even evaluate human intelligence? 10 kids in a classroom, or in high school or in college, it's very difficult for us to have a good measure for that. So the community in 2025 needs to have better definitions, just like we did with autonomous driving: different levels, and here are the scenarios, here are the test cases.

We should do a little bit better job of defining that before we can even evaluate if Sam is really telling the truth about how far we are from AGI. Yeah, for sure. Skyler, thoughts? Hot takes? Opinions? Yeah. Slightly humorous take. I had forgotten that he was fired, then hired back. So kudos to the PR team for that. And it wasn't until reviewing the blog that I had that trigger again. You're like, oh yeah, he was briefly not CEO. Exactly.

I had forgotten about that. And so I guess if you're asking for a hot take from reading that reflection, that's probably what jumped out at me: it just triggered that memory again. So yeah, that's my hottest take on that. That's right. Yeah. That's such a funny thing because that was such a big story, and I had a very similar experience where I was like, oh yeah, that was last year. So last but not least, Volkmar. Curious if you've got any takes. Yeah, I think it's a mix of both. So I think having been in startups and venture capital for more than 10 years, you know, I can feel for the pain he is going through and, you know, the ups and downs, and, you know, getting fired from your own company is really not fun.

But I think that it's really interesting to see the product evolution they are going through. And, you know, he's pointing this out, like, you know, we did ChatGPT and we released this thing into the wild and, you know, it's the fastest growing consumer product ever. Right. So it's really amazing to see how AI took off. I think in the end OpenAI created this new wave. They took the risk, they figured it out. Kudos to him. And now it's really the question: they have this really big North Star of, we want to get to AGI. And if you look at, you know, 2024 with o1, where they actually say we want to get to, you know, human-level reasoning, they are still innovating.

And it's really impressive. You know, if you look at OpenAI, they are clearly the leader, I think, in this industry right now. They are defining, you know, the next steps. And I think it's part of Sam's vision to say we want to get to AGI in a human-scale time frame. And, you know, every time they release a new product, it's like, wow, this is possible. I think they are still driving the industry and everybody else is a follower. So that's really impressive. Yeah, I love that.

Yeah, I think that was one big reflection on the blog post: this guy who's running this company seems himself kind of surprised about how fast things are moving. He's like, oh yeah, wow, we're doing this thing. It's only been two years, and it's very fun to see that even he is continually confounded by how things are happening. Well, great. Well, thanks for joining us. Shobhit as always, Volkmar as always, and Skyler as always. It's a pleasure to have you all on the show. And thanks for joining us, all you listeners out there. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify and podcast platforms everywhere. And we will see you next week on Mixture of Experts.

ARTIFICIAL INTELLIGENCE, TECHNOLOGY, INNOVATION, NVIDIA DIGITS, AI TOOLS TRUST, APPLE INTELLIGENCE, IBM TECHNOLOGY