ENSPIRING.ai: Trust, reliability, and safety in AI ft. Daniela Amodei of Anthropic and Sonya Huang
This engaging discussion with Daniela, the co-founder and president of Anthropic, presents an overview of how the company focuses on creating powerful generative AI tools centered around human values like trustworthiness and reliability. Daniela shared insight into Anthropic's differentiation in a saturated market of foundation models, highlighting unique practices such as the use of constitutional AI to align models with human values and the effort to make its products approachable for large enterprises.
Anthropic's model family, Claude, offers a range of AI tools suited to varied uses across industries. From cutting-edge applications in scientific research and complex code development to customer support and enterprise tasks, Anthropic adapts its model capabilities to fit a wide array of needs. Daniela discussed Anthropic's focus on enterprise clients but acknowledged the innovative uses emerging from startups, pointing to the company's continuous growth in the market as it responds to industry challenges and customer needs.
Main takeaways from the conversation include:
- Anthropic builds generative AI tools with trustworthiness and reliability at the center, using techniques such as constitutional AI to align models with human values.
- The Claude 3 model family (Opus, Sonnet, and Haiku) lets customers trade off capability, speed, and cost across use cases ranging from scientific research and coding to customer support.
- Anthropic focuses primarily on enterprise customers while watching startups for novel use cases, and treats many safety challenges, such as hallucination, as business challenges as well.
- The company publishes much of its safety and policy research, including work on mechanistic interpretability and its responsible scaling policy.
Key Vocabulary and Common Phrases:
1. generative AI [ˈdʒɛnəreɪtɪv ˌeɪ.aɪ] - (noun) - A type of artificial intelligence that can generate new content, such as text or code, based on patterns learned from existing data. - Synonyms: (creative AI, model-based AI, AI generation)
We are a generative AI company that is really working to build powerful, transformative, generative AI tools that really have humans at the center of them.
2. constitutional AI [ˌkɒn.stɪˈtuː.ʃən.əl eɪˌaɪ] - (noun) - A method used in AI design whereby AI models are trained or guided by sets of documented ethical guidelines or standards. - Synonyms: (AI ethics, AI alignment, AI governance)
We pioneered a technique called constitutional AI, which really enables the models to incorporate documents like the UN Declaration of Human Rights.
3. enterprise [ˈɛn.tə.praɪz] - (noun) - A large company or business, especially one involved in commercial activities. - Synonyms: (corporation, business, company)
Large businesses in particular, I think, have really resonated with our approach.
4. product market fit [ˈprɒd.ʌkt ˈmɑːkɪt fɪt] - (noun) - The degree to which a product satisfies strong market demand. - Synonyms: (market alignment, commercial success, market need fulfilment)
What are the use cases that you see that are already reaching real product market fit?
5. neuroscience [ˈnjʊərəʊˌsaɪəns] - (noun) - The scientific study of the nervous system, particularly the brain. - Synonyms: (brain science, neurology, cognitive science)
I think interpretability is to me, like the coolest and most exciting area of AI research today because it's fundamentally trying to figure out, like, what are these models actually doing? It's like the neuroscience of large models.
6. red teaming [rɛd ˈtiːmɪŋ] - (noun) - A practice where a group simulates an attack or tests for vulnerabilities to improve organizational security or performance. - Synonyms: (penetration testing, vulnerability assessment, security audit)
We do a lot of work in kind of the policy sphere and try to publish research results and papers, you know, red teaming results as well.
7. mechanistic interpretability [ˌmɛkəˈnɪstɪk ˌɪntəˌprɛtəˈbɪlɪti] - (noun) - The field of AI research focused on understanding and explaining how complex AI models make decisions. - Synonyms: (AI interpretability, model transparency, explainable AI)
We have a team that focuses on something called mechanistic interpretability, which is essentially the art of trying to figure out what is happening inside the black box.
8. hallucination [həˌluːsɪˈneɪʃn] - (noun) - In AI, refers to instances where AI models generate incorrect or nonsensical content not rooted in the provided data. - Synonyms: (error output, AI misprediction, data anomaly)
The kind of prototypical one that's talked about is this hallucination problem.
9. accountability [əˌkaʊntəˈbɪləti] - (noun) - The obligation to accept responsibility for one's actions. - Synonyms: (responsibility, liability, answerability)
How do you balance innovation and accountability?
10. regulatory landscape [ˈrɛɡjʊlətəri ˈlænd.skeɪp] - (noun) - The structure and dynamics of regulations and policies affecting a particular field. - Synonyms: (regulatory environment, policy framework, governance structure)
Could you comment on how you see the regulatory landscape evolving?
Trust, reliability, and safety in AI ft. Daniela Amodei of Anthropic and Sonya Huang
We are thrilled to have our next speaker with us. Daniela is the president and co-founder of Anthropic, which recently launched the really impressive Claude 3 models. Please welcome Daniela in conversation.
Thank you so much for being here, Daniela.
You're welcome. Do I need a mic? Yes, you do. Here, take this.
Oh, that's so nice of you. Thank you. I think everybody in the audience is familiar with Anthropic, probably as a customer of yours, but can you just do a quick refresher for everyone in the audience about Anthropic, the company? What is your mission? What's the future you imagine, and how are you building towards that future?
Sure thing. So, first of all, thanks so much for having me. Great to be with all of you today. So, I'm Daniela. I am a co-founder and president at Anthropic. We are a generative AI company that is really working to build powerful, transformative, generative AI tools that really have humans at the center of them. So we have a huge focus on building this tech in a way that is trustworthy and reliable. And we've been around for just about three years, a little over three years, and in that time have been able to advance the state of the art across generative AI on a number of dimensions.
Wonderful. And what are the unique approaches that you're taking now that the foundation model space is getting very crowded? What are the things that make you uniquely Anthropic?
I love that question. So, first of all, I would say there's a few different ways that I kind of, like, think about or interpret that question. One is really, how do we kind of differentiate ourselves at the model level? Right? What do we do when we're training the models, or how do we want people to feel when they use them? And here, what I would say is we really, again, are thinking about this kind of commitment to the trustworthiness and reliability of our models. We implement a number of different sort of technical safety approaches to help make the models more aligned with what humans want them to be doing. So we pioneered a technique called constitutional AI, which really enables the models to incorporate documents like the UN Declaration of Human Rights or the Apple terms of service, to really make them more aligned with the values of, sort of, the human race.
From a sort of business perspective, we really have tried to make Claude as approachable as possible, in particular for enterprise businesses. And large businesses in particular, I think, have really resonated with our approach because they also value models that are helpful and honest and harmless. Right. In general, very large enterprise businesses tend to be concerned about models that will hallucinate or say something very, very offensive.
Wonderful. Let's talk about use cases. I think one of the major questions people in the audience have today is where companies are finding the most product market fit, and I think you have a unique vantage point on that from Anthropic. What are the use cases that you see that are already reaching real product market fit? And what are the use cases that you think are on the come, that are about to reach product market fit?
So I think it varies a little bit. First of all, just kind of depending on industry. So there's kind of some industries that I think are kind of quite advanced in generative AI. Unsurprisingly, the technology industry has been an early adopter. That's often how it goes. But I think something that has sort of been interesting for us to see is we just released this new sort of suite of models, Claude 3. We call it the model family. And so the kind of biggest model, Claude 3 Opus, is the kind of state of the art. We sort of joke it's like the Rolls Royce of the models. It's incredibly capable and powerful.
And really, what we've seen is not everybody needs the kind of top-tier, state-of-the-art model for all of their use cases. But the times when you do need it is when you need a model that is just incredibly intelligent, capable, and powerful. So things like if you're doing scientific research, or you're trying to have a model write very complex code for you at a fast pace, or do complex macroeconomic policy analysis, Claude 3 Opus is a great fit for that. Claude 3 Haiku, which is the smallest model, is like the Ducati, the racing motorcycle. It's amazing for things like customer support. So really what we've seen in the industry is that speed and cost are very important for anything that kind of requires real-time response rates.
And then Claude 3 Sonnet, which is sort of that middle model, a lot of enterprise businesses are using for things like day-to-day retrieval and summary of information, if they have unstructured data that they need to pull together and analyze. And so I would say it varies by industry, but it also sort of varies by use case and just how much ability customers have to kind of choose between what's available for them.
Wonderful. Can you share one or two of your favorite use cases that people have built on Anthropic?
Yeah, for sure. I would say I'm like a do-gooder at heart. So one of my favorite use cases is the Dana-Farber Cancer Institute uses Claude to help with genetic analysis, so looking for sort of cancer markers. I think there's also a much more kind of, sort of, boring application, but there's a lot of kind of financial services firms, like Bridgewater and Jane Street, that are really using Claude to help them analyze financial information in real time. I think I like both of those because they really just sort of represent such a wide spectrum. Right. I think it illustrates how truly general purpose these models are. Right. It's a model that can help you to literally try and cure cancer faster, but also to do sort of the day-to-day bread and butter of legal services or financial services firms' work.
Wonderful. Are you seeing more success in your customers finding product market fit from startups or from enterprises right now?
So I would say for Anthropic in particular, we have really focused on kind of the enterprise use case. And again, this is really because we have felt such a resonance in approach with businesses that are interested in building in ways that are trustworthy and reliable, all of the things we've sort of been talking about. That being said, I think there's a ton of innovation that is always happening in the startup space. And so something that I think is really interesting to watch is sometimes we'll have kind of a startup sort of prototype something, and we'll say, like, wow, that's, you know, that's a really fascinating use case. Like, we wouldn't have thought that, you know, you could use Claude that way. And then that will become something that, like, enterprise businesses sort of later learn about because they know someone who works at that startup or they've kind of seen it in production. So my sense is, for us personally, we're much more sort of, you know, building for and pivoted towards the enterprise. But I think there's really a wide, wide ecosystem of development that's happening in the business space.
Wonderful. On the spectrum from prototyping to experimentation, all the way to production. Where do you think most of your customers are today on that journey?
Yeah, I think for this, I'll talk about enterprise and then startups because they're a little bit different. I think for enterprises, it actually ranges pretty widely. There's some businesses that I would even say have multiple production use cases, right, where they might be using Claude internally to analyze health records or help doctors or nurses analyze notes and save themselves administrative time so they can be with patients more.
But if they're a big company, they might also be using it for a chat interface. Right. So depending on the business use case, sometimes they have multiple use cases in production, but it's a little spiky. Right. There might be times where one of those use cases is, like, quite far along. They've already been in production for, like, a year. They really, like, know the question, right. They come to us and they're like, we really, really want to optimize, like, this metric, or we really care about price, or we really care about latency.
And then there's businesses all the way on the other end of the spectrum who come to us and are like, I've been hearing about generative AI, like, from my board. Can you help us understand? Is there a solution here? Right. And so I think it does vary a lot, but I will say, on industries, I have personally been surprised that some industries that are not necessarily historically known for being early adopters, like insurance companies or financial services or healthcare, I think are actually great candidates for incorporating this technology, and many of them have.
Wonderful. Let's move on to Claude 3 and research. You just launched Claude 3; maybe tell us a little bit about what went into it and how the reception has been so far.
So, yes, we just launched Claude 3 a couple of weeks ago. As I mentioned, it's this sort of model family, right? So there's different models kind of available for different use cases, again, for businesses. And really, I think what has been so interesting is we've gotten great positive feedback about Claude. Of course, there's always things that we're improving and wanting to do better.
But something that I have found really just interesting is customers have sort of simultaneously commented on how kind of capable and powerful the models are. Right. They're the most intelligent, state of the art models available on the market today. But people have also commented, hey, it's way harder to jailbreak these, or the hallucination rates have kind of gone down a lot.
And so there has been this kind of dual language around both capability and safety. And then the last piece, which I always find really interesting, is many customers have told us part of the appeal of Claude is that Claude feels more human. And so when people kind of interact with or talk to Claude, we've sometimes heard folks say it really feels like talking to a trusted person versus talking to a robot that was kind of trained to sound like a human.
I love that. And I think everyone here has seen all the eval charts. I think one of the areas where Claude really spikes is in coding, where I think the performance is just off the charts right now. Maybe can you tell us a little bit about how you made the model so good at coding in particular, and then how you see AI software engineering playing out and Anthropic's role in it?
So I think something that is interesting that I've learned from my research colleagues, so I don't sort of pretend to be an expert on this, is as the models just become generally more performant, they kind of, like, get better at everything. And so I think much of the same training techniques that we used to improve the models' accuracy and reading comprehension and general reasoning were also used to improve their ability to code.
And I think that's something that, again, is kind of a fundamental, interesting sort of research thing, which is that a rising tide sort of lifts all boats. That being said, there's a lot of variety in these models. And something I've always found interesting is that for certain models, people are like, I always use this model for task X at the consumer level, and other times folks will say, this model you absolutely have to use for task Y.
So I think there is a little bit of almost pull-through personality that happens with these models regardless of the improvements, which is kind of a useful caveat. In terms of what people are doing in the software engineering space, and what is the role of these models, I'm not a programmer, so I feel like I probably can't opine on this as well as others. But much of what we have heard from our customers is that Claude is a great tool in helping people who write code. So Claude cannot replace a human engineer yet, but it can be a great kind of copilot in helping.
Love that. Maybe more of a philosophical research question. How do you think about the role of transparency in AI research, especially as it seems like the AI field has become more and more closed?
Anthropic has always felt very strongly about publishing a large portion of our research. So we don't publish everything, but we have published something like two dozen papers. The vast majority of them are actually technical safety or policy research papers. And the reason that we choose to publish those is that, as a public benefit corporation, we really view our job as helping to raise the watermark, really, across the industry in areas like safety.
So we have a team that focuses on something called mechanistic interpretability, which is essentially the art of trying to figure out what is happening inside the black box that is these neural networks. And it's a very emerging field of research. There's two or three teams in the entire world that work on it. And we really feel like there's a lot of opportunity, when sharing that more broadly with the scientific community, to just increase understanding around topics like that, particularly in sort of the elements of safety. So we've shared all of these research papers, and then additionally we do a lot of work in kind of the policy sphere and try to publish research results and papers, you know, red teaming results as well.
Thank you. One of the big themes of today's event is trying to think about what's next. So I was hoping to ask, from your vantage point, what are the biggest challenges that you see your customers facing or your researchers thinking about when they're trying to build with LLMs? Like, where are they, you know, hitting a wall? And how is Anthropic working to address some of those problems?
So I think there's a few kind of classes of ways in which these models are still, sort of, not perfect. Right. I think one big one is there are just fundamental kind of challenges to how these models are developed and trained and used. So the kind of prototypical one that's talked about is this hallucination problem. Right. I'm sure everyone in the room knows this, but models are just trained to predict the next word, and so sometimes they don't know the right answer, and so they just make something up. And we have made a huge amount of progress as an industry in reducing hallucination rates from, like, the GPT-2 era, but they're still not perfect.
I'm not entirely sure, like, what the sort of, like, decrease curve will look like for hallucination rate. Right. We keep getting better at it. I'm not sure if we'll ever be able to get models to zero. That is a fundamental challenge for businesses. Right. If your model is going to even very occasionally hallucinate for some of the highest stakes decisions, you probably wouldn't choose to use a model alone. Right. You would say, hey, we need a human in the loop.
And I do think something that's kind of very interesting is there's a really small set of cases today where LLMs alone can do the majority of the task. Right. Like, they're best, again, I think, in tandem with a human for the majority of use cases.
I also think there's just sort of this interesting question, and it almost feels a little more philosophical, which is just: what are humans actually comfortable with giving to models? Right. I think part of the sort of human-in-the-loop story is also about helping businesses and industries and individuals feel more comfortable with an AI tool making fundamental decisions.
Thank you for sharing that. A few of the folks here spoke about planning and reasoning. Is that something you all are thinking about at anthropic? And could you share a few words on that?
Yeah, definitely. So that can obviously mean a few things. So I think on the kind of dimension of how do you get these models to sort of like, execute sort of multi step instructions? Right. I'm assuming that's kind of what planning means. You know, it's really interesting.
There's a lot of research and kind of work that has gone into this sort of concept of, like, agents. Right? Like, how do you give the models the ability to, like, take control of something and execute multiple actions in a row? And can they plan? Can they sort of think through a set of steps? I do think that Claude three represented for us a leap between the last generation of models in its ability to do that. But I actually think that level of agentic behavior is still really hard. I think the models cannot quite do that reliably.
Yet again, this feels like such a fundamental research question that I don't know how long it will be until that's not the case. But the sort of, you know, dream of, like, can I just ask Claude to book my flight for me, please go book my reservation and hotel, just plan my vacation, I don't actually think that's, like, immediately around the corner. I think there's still some research work and engineering work that needs to go into making that possible.
Yep. Yep.
Okay, so the future is coming, but maybe not as quickly as we think?
The future is coming quickly. It's also coming choppily. It's a little unclear exactly which parts of it are going to come where.
Okay, very cool.
Can we talk about AI safety for a moment? Anthropic really made a name for itself on AI safety, and I think you were the first major research institution to publish your responsible scaling policies. How do you balance innovation and accountability? And how would you encourage other companies in the ecosystem to do that as well?
So something that we kind of get asked a lot is how do you all plan to compete if you're so committed to safety? Something that I think has been really interesting is many fundamental safety challenges are actually business challenges. And rather than sort of thinking of these two as something that, you know, two sides that are kind of opposed to each other, I actually think the path to kind of mainline success in generative ai development runs through many of the safety topics we've been talking about. Right. Most businesses don't want models that are going to, like, spout harmful garbage. Right. Like, that's just not a useful product. The same thing is true, like, if the model refuses to answer your questions, if it's dishonest. Right. If it makes things up. Those are sort of fundamental business challenges in addition to kind of technical safety challenges.
I also think something we have really aimed to do as a business is sort of take the responsibility of developing this very powerful technology quite seriously. Right. We sort of have the benefit of being able to look back on several decades of social media and say, like, wow, much of what social media did for the world was incredibly positive, and there were these externalities that nobody predicted that it created, which I think are sort of now widely believed to be quite negative for people. So I think anthropic has always aimed to say, what if we could try and sort of build this technology in a way that better anticipates what some of those risks are and helps to prevent them? And the responsible scaling policy is basically our first attempt to do that. Right. It might not be perfect. There could be things about it that are sort of laughably wrong later. But really, what we've said are, what are the dimensions on which something can go wrong here?
Our CEO, my brother Dario, testified to Congress about the potential risks for generative AI to develop things like chemical and biological weapons. And what we've said is we actually have to do proactive work to ensure that these models are not able to do that. And the responsible scaling policy is really just a way of sort of saying, hey, we're committing to doing that work.
Thank you for sharing that. Let's see. Any questions from the audience?
Yes. Thanks so much. One of the things that I think was really awesome about the Claude Opus release was that it had really strong specific performance in a few domains of interest. And so I was wondering if you could talk more about, kind of, like, technically, how you view the importance of research versus compute versus data for specific domain outperformance, and what the roadmap looks like for where Claude will continue to get better.
Yeah, that's a great question. I think my real answer is that I think you're probably giving the industry more credit than it deserves for having some perfectly sort of planned structure between, like, we'll sort of, you know, do research area X and increased compute will improve Y. I think there's a way in which training these large models is more a process of discovery by our researchers than kind of intentional, deliberate decisions to improve particular areas. To kind of go back to that rising-tide-lifts-all-boats sort of analogy, making the models just generally more performant tends to just make everything better sort of across the board.
That being said, there is sort of particular targeted work that we did do in some sub-areas with constitutional AI and reinforcement learning from human feedback, where we just saw that performance wasn't quite as good. But it's actually a smaller fraction than you might think compared to just generally improving the models and making them better.
It's a great question.
Yes, Sam.
I've been loving playing with Claude 3 Opus. It's fantastic, and I totally agree, it feels way more human to talk to. One thing I've noticed is that it almost feels like a specific human, like it has a personality. And I'm kind of curious, as you guys continue to work in this domain and make other models, how you see the boundary of, kind of, like, personality development. If people are kind of trying to create specific characters, is there kind of a stance you guys are taking, from the constitutional perspective, on the boundaries of how Claude can actually play a character other than itself?
So something that is really, I think, unusual about Claude is just how seriously Claude will take feedback about its tone. Right. If you're like, Claude, this is too wordy, please just be very factual and talk to me like I am a financial analyst (like, try it out), Claude will absolutely sort of adjust its style to be more kind of in that sort of milieu. Or, hey, I'm writing, you know, a creative writing story, like, please use very flowery language. Or talk to me like you're angry at me, or talk to me like you're sort of, you know, friendly, or whatever.
I think there's sort of an interesting other thing you're asking, though, which is, like, what is the default mode that we should be setting these models' personalities to be? And I don't think we've sort of landed on kind of the perfect spot. But really, what we were aiming for was, like, what is a slightly wiser, better version of us? Kind of, how would they react to questions? Right. With some humility.
Oh, I'm sorry. I missed that. Or, thanks so much for the feedback, I'll try to do that better. I think there's kind of an interesting fundamental question, though, which is, as the kind of marketplace evolves, do people want particular types of chatbots or chat interfaces to treat them differently? You might want to coax a particular form of customer service bot to be particularly obsequious, or, I don't know, there are just other potential use cases. My guess is that's probably going to end up being the province of startups that are built on top of tools like Claude, and I think our stance might vary a little bit there.
But in general we've tried to start from a friendly, humble base and then let people tweak them as they go, within boundaries, of course.
Hey, so the developer experience on Claude and the new generation of Claude 3 models is markedly different than other LLM providers', especially the use of XML as, like, a prompt templating format. How are you thinking about introducing switching costs here, especially in the long term? Do you want it to be an open ecosystem where it's very easy to switch between Anthropic and your various competitors? Or are you thinking about making more of a closed ecosystem where, you know, I'm working directly with Anthropic for all of my model needs?
So I think maybe the best way to answer this is with what we've seen in the market today, which is that most big businesses are interested in trying models out at some point; some of them just use one model, but they like to try them out. And my guess is that developers will likely have that same instinct. I think the more open, hey, whatever, it's easy to download your data and move it over, I think that's the goal that we're trying to eventually aim towards. The one difference I would say is that often developers, particularly when they're just getting started, find the switching costs more laborious. They're like, hey, I'm building on this tool, it's annoying to switch, it's complicated to switch. You have to redo your prompts because all of the models react a little bit differently, and we have great prompt engineering resources, please check them out. And also it just takes some time and effort to understand the new personality of the model that you're using. I think my kind of short answer is yes, we're aiming for that more open ecosystem, but also it's sort of tactically hard to do in kind of a perfect way.
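For readers unfamiliar with the XML-style prompting the question refers to, here is a minimal sketch of what it can look like against the Anthropic Python SDK's Messages API as it existed around the Claude 3 launch. The model ID, tag names, and document text are illustrative placeholders rather than a prescription; Anthropic's own prompt engineering resources, mentioned above, are the authoritative guide.

```python
# Minimal sketch of XML-tagged prompting with the Anthropic Python SDK.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment;
# the tag names and document text below are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

prompt = (
    "<document>\n"
    "Q1 revenue grew 12% year over year, driven by the enterprise segment.\n"
    "</document>\n"
    "<instructions>\n"
    "Summarize the document in two short bullet points for a financial analyst.\n"
    "</instructions>"
)

message = client.messages.create(
    model="claude-3-haiku-20240307",  # any Claude 3 model ID could be used here
    max_tokens=300,
    system="You are a concise, factual assistant.",
    messages=[{"role": "user", "content": prompt}],
)

print(message.content[0].text)
```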
With interpretability research, I'm curious what you think is coming first to the product. What is looking most optimistic, where I could say, turn on a switch and have it only output Arabic or something like that? What do you think is, like, closest to working?
So interpretability is a team that is deeply close to my heart, despite me not being able to contribute anything of value to them other than telling them how great they are. I think interpretability is, to me, like, the coolest and most exciting area of AI research today because it's fundamentally trying to figure out, like, what are these models actually doing? Right. It's like the neuroscience of large models. I actually think we're not impossibly far, but not that close, from being able to sort of productionize something in interpretability today. The kind of neuroscience analogy is a little bit strange, but I actually think it's relevant in one particular way, which is that we can have a neuroscientist look at your brain and be like, well, we know that these two things light up when you think about dogs, but they can't sort of change what you think about dogs, right. It's like you can sort of diagnose and understand and see things, but you can't actually go in and change them yet. And I think that's about where we are at sort of the interpretability level.
Could we offer some insight, like, in the future? I think almost certainly, yes. Probably not even on a crazy long time scale. Right. We could say, hey, if you're playing with sort of, you know, this type of model and it's activating strangely, I think that's the type of thing we could show a sort of visualization of to a customer. I don't actually know how actionable it is, if that makes sense. Right. In sort of the same way you're like, well, these sort of two parts of the model are lighting up, or this set of neurons is activating. But I think it's an interesting area of very basic science or basic research that I think could have incredible potential applications a couple of years from now.
I'll ask a question and maybe give the folks here a taste of what's going to come on the product roadmap. Let's assume that Claude gets smarter and smarter, but what are you all going to add on the developer-facing product? And then what should we expect in terms of first-party products from you?
So, first of all, we are just sort of scrambling day in and day out to try and keep up with the incredible demand that we have. So we are incredibly grateful for everybody's patience. But I think really on the kind of developer side, we really want to just uplevel the tools that are available for developers to be able to make the most use of Claude broadly. I think something that's really interesting, just sort of speaking to the kind of ecosystem point, is there's so much opportunity for knowledge sharing and sort of learning between developers and between people that are kind of using these models and tools. So we're also very interested in just sort of figuring out how to host more information sharing about how to get the most out of these models as well.
Wonderful. Oh, you have the mic. Yes. Go for it.
Given your focus on safety, I was hoping you could comment on how you see the regulatory landscape evolving. Maybe not so much for you specifically, but for the companies that are using your models and others.
So something that I think is just always an unknown is, like, what's going to happen in the regulatory landscape and how is it going to impact, like, how we build and do our work kind of in this space. I mean, first of all, I don't have any amazing prescience to say, like, this set of regulations I expect will happen. But I imagine what we'll see is it will probably start from a place of the consumer, because that's really who governments and regulators are sort of most well positioned to try and defend or protect. And I think a lot of the kind of narrative around data privacy is one that I expect we'll sort of see emerge, right, around just, hey, what are you doing with my data? Right. People put personal things into these sort of interfaces, and they want to know, like, are the companies being responsible with that information? Right. What are they doing to protect it? Are they de-anonymizing it? We don't train on people's data, but if other companies do, like, what does that mean for that person's information? Completely speculative, but that sort of is my guess of where things will start.
I also think there is a lot of interest and activity in sort of the policy space right now around how to develop these models in a way that is safe from a sort of bigger-picture, like, capital-S Safety perspective. Right. Some of the sort of scary things I talked about. But again, regulation is a long process, and I think something we have always aimed to do is work closely with policymakers to give them as much information as possible so that there is thoughtful regulation that will prevent some of the potentially bad outcomes without sort of stifling innovation.
Thank you, Daniela. Thank you.
Do we have time for one more question? Okay, one more.
I'm getting looks from Emma. Sorry.
Hey, Daniela.
Claude 3 is awesome. Thank you.
When you think about the model family and the hierarchy of models, do you have any thoughts on whether it is effective to use prompts, or if you've done any work internally on giving the smaller models insight that larger models are available, to kind of say, hey, this is beyond my knowledge, but this is a good time to use the larger model?
That is such a good idea. Are you looking for a job? That's a great idea. That has not been something we have currently trained the models to do. I actually think it's a great idea. Something we have thought about is just how to kind of make the process of switching between models within a business much more seamless. Right. You can imagine that over time the model should know, like, hey, you're not actually trying to look at macroeconomic trends in the 18th century right now, you're just trying to answer a sort of frontline question. You don't need Opus, you need Haiku. And I think some of that is sort of a research challenge, and some of it is actually just a product and engineering challenge.
Right. How well can we kind of get the models to self-identify the level of difficulty and really sort of price-optimize, right, for customers, to say, you don't actually need Opus to do this task, it's really, really simple, pay, you know, a tiny fraction of the cost for Haiku, and we'll just switch you to Sonnet if it's sort of somewhere in the middle. We're not there yet, but I think that's definitely something we've been thinking about and a request we've been hearing from customers. But I love your idea of adding in the sort of self-knowledge of the models. It's a cool idea, the call-a-friend. Exactly.
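As one illustration of the product-and-engineering side of that routing challenge, a very crude tier router might look like the sketch below. This is purely hypothetical: Anthropic does not describe any such heuristic in the conversation, and the length thresholds and keyword list are invented for the example; the model IDs are the public Claude 3 identifiers from around launch.

```python
# Hypothetical sketch of routing requests across Claude 3 tiers by difficulty.
# The thresholds and keyword signals are invented for illustration only.
MODEL_TIERS = {
    "simple": "claude-3-haiku-20240307",     # fast, cheap: frontline questions
    "standard": "claude-3-sonnet-20240229",  # mid-tier: retrieval, summarization
    "complex": "claude-3-opus-20240229",     # most capable: research, complex code
}

HARD_SIGNALS = ("prove", "derive", "macroeconomic", "multi-step", "write code")

def pick_model(prompt: str) -> str:
    """Route a request to a model tier using a crude length/keyword heuristic."""
    text = prompt.lower()
    if len(prompt) > 4000 or any(signal in text for signal in HARD_SIGNALS):
        return MODEL_TIERS["complex"]
    if len(prompt) > 800:
        return MODEL_TIERS["standard"]
    return MODEL_TIERS["simple"]

if __name__ == "__main__":
    print(pick_model("What are your support hours?"))                   # -> Haiku
    print(pick_model("Derive the macroeconomic impact of a rate cut"))  # -> Opus
```

In practice, as Daniela notes, deciding difficulty well is itself partly a research problem, so a real system would likely lean on the model's own self-assessment rather than fixed keywords.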
Yeah. Wonderful. Thank you so much, Daniela, thank you for sharing with us today. Thanks for having me.
Artificial Intelligence, Technology, Innovation, Enterprise AI, AI Safety, Product Market Fit, Sequoia Capital