ENSPIRING.ai: Fireworks Founder Lin Qiao on the Power of Small Models to Democratize AI Use Cases

The discussion with Lin Qiao, founder and CEO of Fireworks, dives into the intricacies of building AI infrastructure on top of PyTorch, stemming from her experience at Meta. Lin lays out the challenges faced during the multi-year effort to support Meta's AI workload, emphasizing the need to rebuild the AI stack from scratch on PyTorch to handle efficient data loading, distributed inference, and training. Her journey illustrates how badly the timeline and complexity of the framework transition were underestimated, and the massive scale PyTorch implementations ultimately achieved.

Exploring the motivations behind her latest venture, Fireworks, Lin reveals its mission to significantly shorten the time to market for AI systems. By leveraging PyTorch's simplicity and robust community, Fireworks aims to capitalize on the funnel effect by which research models naturally transition into production applications. Lin passionately discusses the platform's focus on reducing complexity for its users, achieving lower latency, and providing tailored, high-quality AI model customization through automation.

Main takeaways from the video:

💡
Transitioning AI systems onto PyTorch proved to be a long and complex task necessitating a complete rebuild from scratch.
💡
Fireworks aims to accelerate AI market readiness by reducing complexity and time to market, emphasizing customized, automated solutions.
💡
The simplicity of PyTorch is highlighted as a core reason for its dominance, allowing easy model creation and seamless transition to production.

Key Vocabularies and Common Phrases:

1. inference [ˈɪnfərəns] - (noun) - The process of making predictions or decisions based on a model or algorithm, often used in AI to mean drawing conclusions from data. - Synonyms: (reasoning, deduction, conclusion)

When we left, it was sustaining more than 5 trillion inferences per day.

2. scalability [ˌskeɪləˈbɪlɪti] - (noun) - The capability of a system to handle a growing amount of work or its potential to be enlarged to accommodate that growth. - Synonyms: (expandability, flexibility, adaptability)

We thought it was just a six-month project. It turned out to be a five-year project for us to support Meta's entire AI workload building on top of PyTorch.

3. agility [əˈʤɪləti] - (noun) - The ability of an organization or system to rapidly adapt to changes. - Synonyms: (nimbleness, quickness, dexterity)

It's smaller and it's easier to tune, easier to improve quality, easier to focus on specific problem space.

4. customization [ˌkʌstəˈmɪzeɪʃən] - (noun) - The action of modifying something to suit a particular individual or task. - Synonyms: (personalization, adaptation, alteration)

We can get to very low latency for real-time applications, very low cost for sustainable business growth, and automated customization for tailored high quality for enterprises.

5. latency [ˈleɪtənsi] - (noun) - The delay before a transfer of data begins following an instruction for its transfer. - Synonyms: (delay, lag, wait time)

We can get to very low latency for real-time applications, very low cost for sustainable business growth, and automated customization for tailored high quality for enterprises.

6. hypothesis [haɪˈpɑːθɪsɪs] - (noun) - A proposed explanation for a phenomenon, used as a starting point for further investigation. - Synonyms: (theory, supposition, assumption)

They take those research models, test them out in a production setting, try to validate the hypothesis, and then feed into production.

7. relentless [rɪˈlɛntləs] - (adjective) - Unyielding or inflexible in maintaining a course of action. - Synonyms: (persistent, unremitting, ceaseless)

It is a relentless journey to focus on simplicity.

8. viability [ˌvaɪəˈbɪlɪti] - (noun) - The ability to work successfully or to be practical and achievable. - Synonyms: (feasibility, practicability, sustainability)

Low latency is a critical part of product viability; without that, it's not a viable product.

9. asymptote [ˈæsɪmpˌtoʊt] - (noun) - A line that continually approaches a given curve but does not meet it at any finite distance. - Synonyms: (limit, boundary, terminal point)

Do you think that we're going to go into a phase where capabilities have started to mature or asymptote, and the race is more about the optimization, tuning, and application of those capabilities?

10. embrace [ɪmˈbreɪs] - (verb) - To accept or support something willingly and enthusiastically. - Synonyms: (adopt, welcome, accept)

And it feels like yours is a business where you have embraced an enormous amount of complexity to make life simple for your customers.

Fireworks Founder Lin Qiao on the Power of Small Models to Democratize AI Use Cases

We thought replacing other frameworks' libraries with PyTorch was simple. Just swap the library, how hard can that be? But we thought it was just a six-month project, and it turned out to be a five-year project for us to support Meta's entire AI workload building on top of PyTorch, because we had to rebuild the entire stack from the ground up. We had to think about how to load data efficiently, how to do distributed inference in PyTorch efficiently, how to scale training efficiently, and we ended up rebuilding the entire inference and training stack on top of PyTorch. When we left, it was sustaining more than 5 trillion inferences per day. So that's the kind of massive scale PyTorch reached. And Fireworks' mission is to significantly accelerate time to market for the whole industry, compressing it from five years to five weeks or even five days. So that's our mission.

Joining us today is Lin Qiao, founder and CEO of Fireworks. Lin is an AI infrastructure heavyweight who previously led PyTorch at Meta, the backbone of the entire global machine learning ecosystem. She has taken her experience with PyTorch and built Fireworks, an inference platform for generative AI. We're excited to ask Lin about the market trends behind AI inference and how she plans to support, and even accelerate, the market shift to compound AI systems at Fireworks.

We're thrilled to have Lin, CEO and founder of Fireworks, with us today. Thanks for joining us, Lin. Thanks for having me. We're really excited to talk about a lot of things with you today, from PyTorch to the small model stack you're building to what you're seeing in enterprise production deployments. But before we get there, can you say a sentence or two on what you're building at Fireworks?

Yeah. So we started Fireworks in 2022. Fireworks is, first and foremost, a SaaS platform for generative AI inference and high-quality tuning, especially using our small model stack. We can get to very low latency for real-time applications, very low cost for sustainable business growth, and automated customization for tailored high quality for enterprises. So that is Fireworks.

Wonderful. I want to maybe start with the PyTorch story. PyTorch is kind of the foundation upon which the entire AI industry runs today, and you, Dima, and some of your other co-founders were integral leaders of that project at Meta. So, think about PyTorch as a programming language for digital brains, okay? It's designed for researchers to very easily create those digital brains and experiment with them. But the challenge of PyTorch is that while it's very fast for people to create various deep learning models, the digital brains, the brains don't think fast enough. So that's the challenge I took on to address while I was at PyTorch.

And you've mentioned before that most of the companies trying to build something similar to Fireworks have chosen to be framework agnostic, whereas you very much made a big bet on PyTorch. Can you say why you made that bet and what benefits it brings to your customers?

That is really based on what I saw operating PyTorch at Meta and also across the industry. I clearly see a funnel effect: because PyTorch started as a tool for researchers, it came to dominate the top of the funnel, model creation. The next stage of the funnel is people doing applied production work. They take those research models, test them out in a production setting, try to validate the hypothesis, and then feed into production. So there's a clear funnel effect happening. As PyTorch won over the researchers, it took over the top of the funnel, and it's really hard for people to rewrite models into other frameworks for production, so they naturally flow down toward the bottom of the funnel. That's how PyTorch became dominant. I'm seeing more and more models, especially the more nascent models, all built in PyTorch and run in PyTorch in production, including the generative AI models. That's why we bet only on PyTorch and don't want to distract ourselves by supporting other things.

So researchers like it and it flows downstream from there. What do researchers like so much about PyTorch? Simplicity. Simplicity scales. That's the lesson learned through the journey of PyTorch at Meta and of building out the community. It is a relentless journey to focus on simplicity. We were constantly seeking ways to make the user experience simpler and simpler, hiding more and more complexity in the backend.

For example, when I started this journey at Meta, there were three different frameworks: Caffe2 for mobile, ONNX for server-side production, and PyTorch for researchers. Too complicated. The mission was to reduce three frameworks into one, to simplify. But it was actually mission impossible: after I consolidated all three teams, there was no consensus on how to simplify and build this one stack. We took a very idealistic approach: take the PyTorch frontend and the Caffe2 backend, and we said, we're going to zip them together. It sounds simple, but it's very hard to do, because these two frameworks were never designed to work together, and the integration complexity was even higher than building a framework from scratch. So, too complex.

And then we said, forget about it, we're going all in on PyTorch: keep its beautiful, simple frontend and rebuild the backend. So we built TorchScript. That's PyTorch 1.0. That's really the key lesson: focus on simplicity and it wins over time. The other interesting thing is that we thought replacing other frameworks' libraries with PyTorch was simple. Just swap the library, how hard can that be? We thought it was just a six-month project. It turned out to be a five-year project for us to support Meta's entire AI workload building on top of PyTorch, because we had to rebuild the entire stack from the ground up. We had to think about how to load data efficiently, how to do distributed inference in PyTorch efficiently, how to scale training efficiently, and we ended up rebuilding the entire inference and training stack on top of PyTorch. When we left, it was sustaining more than 5 trillion inferences per day. That's the kind of massive scale PyTorch reached over those five years.

And the Fireworks mission is to significantly accelerate time to market for the whole industry, compressing it from five years to five weeks or even five days. So that's our mission.

Maybe when you look at open source, there are a lot of people trying to do this using vLLM or TensorRT-LLM. How do you think Fireworks compares to what's in the open source? I really like both projects, and my heart is in open source, given the PyTorch experience; I would say both are great projects for the community. I think our biggest differentiation is, first of all, that Fireworks off the shelf is faster than both offerings. And second, we're building a system, not just a library. Our system can auto-tune toward our developers' or enterprises' workloads to be much, much faster and much, much higher quality, and that cannot be achieved by a library alone. Going back again to our PyTorch journey, we provide a very simple API while hiding a lot of automation, the complexity of auto-tuning, behind the scenes.

For example, when we deliver inference with high performance, and high performance here means low latency and low cost, we handwrite CUDA kernels. We implemented distributed inference across nodes and disaggregated inference across GPUs, where we chop models into pieces and scale the pieces differently. We also implemented semantic caching, where, given the content, we don't have to recompute, and we capture application workload patterns specifically and build them into our inference stack. We have many other optimizations designed for specific use cases rather than general-purpose, horizontal ones, and all of that is encapsulated. We also have complex optimization around quantization. You might think quantization is just one technology, how hard can that be? But you can quantize so many different things: you can quantize the KV cache, you can quantize weights, you can quantize communication across GPUs and across nodes, and each yields different performance gains and quality trade-offs.
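To make one of those knobs concrete, here is a minimal sketch of weight quantization using PyTorch's built-in dynamic quantization. The toy layer sizes are arbitrary, and this only illustrates the general technique, not Fireworks' stack, which hand-tunes kernels and also quantizes the KV cache and cross-GPU communication.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer MLP block; the sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Dynamic INT8 quantization: weights are stored as int8 and activations
# are quantized on the fly at inference time. This is just one of the
# quantization targets mentioned above -- KV caches and inter-GPU
# communication can be quantized separately, each with its own
# speed/quality trade-off.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
print(quantized(x).shape)  # same interface, roughly 4x smaller weights
```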

We also automate quality optimization. There are many things we do behind the scenes to deliver a very simple experience to app developers, so they can concentrate their cognitive bandwidth on innovating on the application side. I liked your comment earlier that simplicity scales. As you talk through everything you've built to make this such a simple and delightful experience for your customers, it reminds me of the idea of conservation of complexity: the amount of complexity required to deliver any given task can be neither created nor destroyed. It's just a question of who takes the complexity. That's right. And it feels like yours is a business where you have embraced an enormous amount of complexity to make life simple for your customers. And actually my question is about your customers: where in their AI journey do they say, wait a minute, we need something better, and what brings them to you?

Yeah, so we've seen a pretty consistent pattern. Last year, people all started with OpenAI because they were in heavy experimentation and exploration mode. Many startups have creative application and product ideas and want to explore product-market fit, so they start with the most powerful model, which OpenAI provides. Then, when they feel confident they've hit product-market fit, they want to scale their business. And that's where the problem comes in, because, as I mentioned, most of the GenAI applications are B2C, consumer-, prosumer-, or developer-facing, and they require very high responsiveness. Low latency is a critical part of product viability; without that, it's not a viable product. People are not patient enough to wait half a minute for a response. That's not going to work. So they are actively seeking low latency. The other key factor is that they want to build a sustainable, viable business. They cannot afford to go bankrupt quickly.

And the weird thing is, in this market, they can: if they have a viable product, that means they can scale quickly, and if they're losing money at small scale, they're going to go bankrupt quickly. So bringing down the total cost of ownership is critical for them. That's why they come to us.

So, I remember you had this insight a year or so ago when we spoke: training tends to scale in proportion to the number of researchers you have, whereas inference tends to scale in proportion to the number of customers you have, and in the long term there are probably going to be more customers of AI products than AI researchers, so inference is the place to be. It sounds like the customer journey begins as people go from training into inference. What sorts of applications and companies are at the point where they're starting to really go into production? There are so many ways to answer this question.

It's a very interesting question. First of all, my hypothesis when I started the company was that we'd go after startups first, because they are the most tech-advanced, and there would be tons of startups built on top of GenAI. Then we'd go to digital-native enterprises, because they are tech-forward, and then to traditional enterprises, because they are tech-conservative: they want to observe and adopt when the technology and the product ideas are mature.

That was my hypothesis, and what's happening right now totally blew my mind, because we have a lot of inbound from startups, we're working with digital-native enterprises, and we're simultaneously working with traditional enterprises, including health insurance companies, healthcare companies, and banks. Especially for those traditional enterprises, I usually adjust my pitch to be very business-oriented, maybe that's my bias, wanting to strike up a meaningful conversation with them, but they quickly dive into very low-level technical details with me, and it's very, very engaging.

Who are the people you're engaging with at a traditional enterprise? Is it an innovation person, an AI person, or more of a business-line leader, somebody who owns a production application? Yeah, I think it's starting to shift. We engage most with CTOs. I feel this business is shifting toward innovation-driven business transformation, and that's why we encounter more CTOs than CEOs or CIOs. That's an interesting shift, but it's happening across the board. I think there are multiple fundamental reasons why; that's my hypothesis.

One is that all the leaders realize the current GenAI wave, much like the cloud-first shift or the mobile-first shift, is going to really remap the landscape of the industry. Startups are growing really fast, and the incumbents feel threatened: if they are not innovating fast enough, they will become obsolete, irrelevant. The incumbents are also competing heavily with each other, competing on how fast they can transition their business to create more revenue and become more efficient using GenAI. So that's one phenomenon.

The second phenomenon is that generative AI is very different from traditional AI. Traditional AI gave a lot of power to the hyperscalers, because with traditional AI you always had to train from scratch; there was no concept of a foundation model to build on top of. That meant you had to go off and curate all the data, and the data-rich companies are usually the hyperscalers, and you needed a lot of resource investment to train your own models, and so on.

That was before GenAI: less affordable, concentrated in the hyperscalers. Post-GenAI, because of this concept of foundation models, people build on top of foundation models and don't train from scratch. It's not meaningful to: it's all the same data, all the Internet data you can crawl, and the model architectures are more or less similar, so it's a waste of resources to train from scratch. Instead, you fine-tune based on your small, high-quality data set. So it becomes a small-data, small-model problem, which makes this technology so much more affordable for everyone to access, and that's why everyone is jumping in to embrace it.

How many of your customers use you for fine-tuning versus just using base models? And what do you think goes into building a great fine-tuning product? It really depends on the problem they're trying to solve. We actually see open source models becoming better and better. The quality difference between open source and closed source models is shrinking, and my prediction is that they're going to converge within the same model buckets: at the same model size, open and closed will converge.

The open and the closed will converge. Do you think there will be a time lag where closed is always six months ahead, or do you think they'll just be neck and neck? For the same model size, especially between 7 and 70 billion parameters, or even within 100 billion, the quality will converge. That's my prediction. We'll see in a couple of years, and we'll come back to this podcast and see how it goes.

So the key here is customization. If this trend holds, then the key differentiation is how we customize those models toward an individual's use case and an individual's workload. And is it easier to customize an open source model than a closed source model? How does that go? I would say it's easier, if only because open source models tend to have a much richer community, and there are a lot more people building on top of those models.

For example, the Llama 3 model is a very, very good base model. It is very strong at instruction following, so it's very easy to align the model to solve a specific problem really well. For example, we have been investing in function calling strategically as a direction, we can talk more about that, it's a topic of its own, but we find that fine-tuning a function calling model on top of Llama 3 is so much easier compared with fine-tuning based on Mixtral models or the previous Llama 2 and other models. The open source base models are becoming very, very strong in instruction following, in logical reasoning, and in many other base capabilities.

So it's very easy to morph one into a high-performance model for solving specific business tasks. That's the power of small models. If we think about open source infrastructure software: 20 years ago, open source was thought of as a fast follower, Red Hat being the canonical example. More recently, open source is not the fast follower but the innovator; think about MongoDB or Confluent or some of the other great open source businesses that have been created. Do you think there are areas in the world of models where open source is actually going to lead, and be ahead of, the proprietary models?

So I think the dynamics are very interesting right now, because the proprietary or closed source model providers are betting on very few models. OpenAI's LLMs are maybe three models, or you can think of them as one model, because what is a model? The model architecture and the training data are what define a model. I'm pretty sure that across all their models the training data is more or less similar, the model architecture is more or less similar, and it's mostly scaled in parameters and so on.

It's not just OpenAI; I think Anthropic, Mistral, and all these model builders have to concentrate their effort on the specific model segment where they're the best. All right, that's a business model. But open source pushes a different dynamic, because it enables so many researchers to build on top of it. That's the small model phenomenon: smaller models are easier to tune, easier to improve in quality, easier to focus on a specific problem space. It enables a thousand flowers to blossom.

So that's the direction we believe in for solving enterprise problems. A thousand flowers blossoming is much better for enterprises, because they have so many problems, and I'd bet that for any given problem space there is a solution for you, which we then further customize toward your use case and your workload. What you get is better quality, much lower latency for real-time applications, and much lower cost for business sustainability and growth. So we believe in that direction. Maybe to that point: have you seen that your customers so far are able to match the quality they got from OpenAI when they move over to the Fireworks stack?

And how are you enabling what I call the small-but-mighty stack to compete? Yeah, it really depends on the domain. In some domains people don't even fine-tune; they use an off-the-shelf model as is, and it's already very, very good. For example, in the domains of coding copilots and code generation, transcription, translation, and OCR, it's just phenomenal. Those models are really, really good.

So that's off the shelf and ready to go. But some areas require business logic, because every company defines what is good differently, and then of course an off-the-shelf model won't work off the shelf, because it doesn't understand the business logic. Take classification: different companies want to classify different things. Some marketplace wants to classify whether an item is furniture or a dish or something else; that depends completely on their domain. Or summarization.

You'd think summarization is a very general task, but an insurance company, for example, wants to summarize into a very specific template. There are specific business tasks like that in many other areas; we work across the board on various problems, and those require fine-tuning. And I want to call out that fine-tuning sounds simple, but it's actually not simple at all. End to end, it requires enterprises or developers to collect data: first to trace, then to label the traces, and after labeling, to pick and choose which fine-tuning algorithm to use.

There's supervised fine-tuning, there's DPO, there's a slew of preference-based fine-tuning, where instead of labeling an absolutely good result they basically say, I prefer this over that. They need to pick whether they want parameter-efficient fine-tuning like LoRA or full-model fine-tuning, and for some tasks they need to tune hyperparameters, not just the model weights themselves. Among these many technologies, they have to figure out when to use what, and so on. It's very deep, and usually those app developers haven't even touched AI yet, so there's a lot for them to pick up.
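As one illustration of the parameter-efficient option, here is a minimal LoRA-style layer in plain PyTorch: the pretrained weights are frozen, and only a small low-rank update is trained. The rank, scaling, and dimensions are illustrative defaults, not anything from Fireworks' tuning stack.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # Low-rank factors: only these small matrices receive gradients.
        # B starts at zero so the initial update is a no-op.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # ~65K trainable params vs ~16.8M frozen in the base layer
```

Because only the two small factors are trained, the same frozen base model can host many task-specific adapters, which is part of why this option fits the small-model customization Lin describes.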

And then once they test it, it may improve on some dimensions while still falling short in other cases, so they need to capture those failure cases and analyze: should I collect more data and go through this cycle again? Or is it actually product design? It's very interesting: some failure cases are not really failure cases; the team just hasn't designed how the product should react. For example, people are building assistants that auto-generate content as people type. If you're in a table and your cursor is in a cell, what does auto-generate mean? Do you auto-extend what you typed in the cell? Do you generate more rows? Or do you do nothing? That's actual product design.

So that requires a PM to be in the loop to think about the failure cases. With all this complexity, what we want to do is take away the rudimentary stuff: the complexity of figuring out which tuning approach to use, how to automatically label data, how to automatically collect data from production. We want to take all of that away and keep a simple API for people to use, but leave the design part to our end users. For example, how the product should respond is completely within their realm to figure out and solve. We want to create that separation. We've started working in this direction, and hopefully we'll announce our product there soon.

I love that you're liberating people from having to think from the tech out, so they can think from the customer back, use all the stuff you've built to deal with the underlying technology, and really focus, to your point, on the design patterns and the usability, making sure they're actually solving an important problem end to end in a compelling way. What is your vision for the Fireworks platform? To Pat's point earlier on conservation of complexity, we started this podcast talking about how you're conserving complexity for your customers

on the inference stack, and you just talked about how you're doing the same in the fine-tuning workflows. What other pieces have to come together, and what is your ultimate vision for Fireworks the product? If everything works, five years from now, what will you have built?

The North Star for Fireworks is simple API access to the totality of knowledge. Right now we're building toward that. We already provide more than 100 models across large language models, image generation models, audio generation models, video generation models, embedding models, and multimodal models that take images as input to extract information. So that's one side, the foundation model coverage.

But even putting all the foundation models together, they still have limited knowledge, because their training data is limited. The training data has a start time and an end time, and all the information they can crawl on the Internet is still limited, because a lot of knowledge is hidden behind public APIs that you don't have access to, or you just cannot get real-time information. And there are a ton of private APIs hosted within enterprises that nobody outside those companies will ever have access to.

So how do we get access to the totality of knowledge for enterprises? By having a layer that blends across many different models and public and private APIs. That's the vision, and the vehicle to get there is function calling, the function calling model. Basically, this model is capable of understanding which APIs you want to access and for what reason. It can automatically be the router that calls out to those APIs, whether they are models or non-model APIs, in the most precise and accurate way.
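Mechanically, that routing can be pictured as a thin dispatch layer: the function calling model emits a structured call, typically JSON, naming an API and its arguments, and the layer executes it. The tool names and JSON schema below are invented for illustration; they are not Fireworks' actual API.

```python
import json

# Hypothetical tool registry; names and signatures are illustrative only.
TOOLS = {
    "get_weather": lambda city: f"72F and sunny in {city}",
    "search_docs": lambda query: f"top result for '{query}'",
}

def dispatch(model_output: str) -> str:
    """Route a model's structured function call to the matching API."""
    call = json.loads(model_output)  # e.g. {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Instead of free text, a function calling model emits something like:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Palo Alto"}}'))
```

The hard part, and the reason a model has to be tuned for it, is emitting the right call with the right arguments, precisely, across hundreds of possible APIs.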

Strategically, that's extremely important for building this simplified user experience, because then our customers don't need to scratch their heads and figure out, oh, I need to fine-tune to be able to access those APIs, and how do I even do that myself? That's a tall order. You can think about it this way, since many people are familiar with the notion of a mixture of experts.

OpenAI provides mixture-of-experts models, and mixture of experts has become a very popular model architecture. The concept is a router sitting on top of a few very big experts, each specialized in doing its own thing. Our vision is to build a mixture of experts that accesses hundreds of experts, where each expert is much smaller and more agile, but solves its specific problem with high quality, real quick.
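For intuition, here is a tiny top-1 router in PyTorch in the spirit of that architecture. In a real MoE model, the experts are sub-networks inside one network and routing happens per token; in the vision described here, each "expert" would instead be a whole small specialized model or API. All sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRouter(nn.Module):
    """Sketch of an MoE-style gate: score the experts, pick one, dispatch."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # learns the routing scores
        # Each "expert" here is a small MLP standing in for a specialized
        # small model.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        scores = F.softmax(self.gate(x), dim=-1)  # routing distribution
        best = scores.argmax(dim=-1)              # top-1 expert per input
        return torch.stack(
            [self.experts[int(i)](row) for row, i in zip(x, best)]
        )

router = TinyRouter(dim=64, num_experts=4)
print(router(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```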

Do those experts live in Fireworks, in AWS, in Hugging Face? Where do the experts come from that get put together with Fireworks as the overarching framework? Yeah, our ambition is for those experts to live in Fireworks. We want to curate the models we serve toward that, and that's why we already have more than 100 models today. It will take some time to build this layer in a very solid way, but we're going to release our next-generation function calling model, and it's really, really good. A little preview: it has multiple layers of breakthroughs, and we're going to announce it together with demos and examples that people can leverage and build on top of. Very cool.

Do you see any viable competition for Nvidia on the horizon? That's a very interesting question. First of all, I think Nvidia is operating in a very lucrative market, and any lucrative market invites competition; that's just the economics. Also, from the whole industry's point of view, the industry in general doesn't like a monopoly, so that's another pressure coming from the industry.

So it's not a question of whether there will be competition for Nvidia, just a question of when. Do you think it's coming soon? I think it's coming soon. We can look at Nvidia's competition in multiple segments. In the general-purpose GPU segment, AMD is coming up; that's interesting. And in specific AI segments where the model space has stabilized, where there's no more innovation, the problem is well defined, and this is the model, customization will have its own role. I look at the market that way, and I do think there will be competition coming soon.

Can I ask you about that, by the way? You're in a part of the market where you're model agnostic to some degree, and it's really about the optimization of those models when it comes to putting them into production. Do you think the returns to scale on the frontier, for the models out on the bleeding edge, are starting to slow down? Do you think we're going to go into a phase where capabilities have started to mature or asymptote, and the race is more about the optimization, tuning, and application of those capabilities?

I think both will happen at the same time. One is that it will start to stabilize, in part, from the model applicability point of view, and we will heavily customize; our strategy is heavy customization toward the use cases and workloads. That's one direction. The second is that I want to caution against that assumption, because at Meta we also thought, for a certain period of time, that there was the model for ranking and recommendation and that we should heavily index on that assumption. After a few years, that wasn't the case: there was a significant amount of model innovation in a seemingly stabilized modeling space, and that pushed the curve for Meta. I think the same phenomenon will happen in the GenAI space: a new model architecture will emerge, and we're kind of overdue.

We've talked about competition from other vendors, direct competitors. What about OpenAI? Does OpenAI keep you up at night? They drop prices on their APIs all the time, and they're also trying to win the better-faster-cheaper race. How do you think about them, and about what you're ultimately going to build that's different from where they are going?

Right. So again, they are actually going smaller and cheaper. I think for the same model size, the same model bucket, whether it's closed source or open source, the quality is going to converge; again, that's my prediction. The really meaningful thing is to push the boundary here: heavy customization, automated customization, tailored toward individual use cases and individual workloads. I'm not sure OpenAI has the appetite to do that, because their mission is AGI. If they hold to their mission, which is a great mission, actually, it's solving a different problem than the enterprise problem, where there are a lot of specific problems that small models are really good at customizing toward. That's where we want to focus our energy, building on top of open source models, assuming the trend holds and they converge in quality.

Love that. Our partner Roelof, last time you were here, made the point that in prior technology waves, Internet, mobile, it was the people who did all the hard work of driving down the marginal cost of running this stuff who actually enabled all the application development on top, and all the end use cases we get to enjoy every day. And I love that you're taking that exact approach with AI, where it's still so cost-prohibitive for most people to run in production. By dramatically bringing down that cost curve, you're helping the whole industry blossom. It's really wonderful. Should we close out with some rapid-fire questions?

Yeah, let's do it. Okay, first: favorite AI app. We do a lot of video conferencing, and the note taker for video conferencing is a game changer for us, whichever one it is. There are so many different varieties, but I just love them. Which one do you use? I think we use Fathom. Our sales team uses it. It's really good for training and also for summarizing in a short time. Nice.

What will be the best-performing models in 2024? My prediction is there will be many, given the rate at which new models come up, every week, on the arena leaderboards, and they keep competing with each other. This is all good news for the whole industry. It's really hard to predict which one, but the one prediction I'm pretty confident in is that model quality will keep improving and increasing.

In the world of AI, who do you admire most? I would say Meta. It's not one person, but Meta's commitment to open source. I think Meta has been the most brilliant in the GenAI journey by continuously open-sourcing the series of Llama models, continuing to push the boundary, continuing to shrink the quality differences. What Meta is doing is basically decentralizing power from the hyperscalers to everybody who has a dream to innovate on foundation models, on GenAI models. I think that's really brilliant. Love that.

Okay, will agents perform or disappoint this year? I'm very bullish on agents. I think it's going to blossom. That's all we got.

All right, thank you. It's really fun to have this conversation. Thanks for having me. Thank you for joining us.

AI Infrastructure, Innovation, Technology, PyTorch, Generative AI, Performance Optimization, Sequoia Capital