The video presents insights into the growing use of AI agents. Andrew Ng, a prominent AI expert, contrasts traditional non-agentic workflows, where a model answers a prompt in a single pass, with iterative, agentic workflows, where the model plans, drafts, reviews, and revises its work. He reports that the iterative approach delivers remarkably better results. In a coding case study on the HumanEval benchmark, zero-shot GPT-3.5 scored 48% and zero-shot GPT-4 scored 67%, yet GPT-3.5 wrapped in an agentic workflow outperformed zero-shot GPT-4. Ng encourages builders to adopt agentic workflows, with practical examples and personal experiences to illustrate their potential impact.
The talk describes four design patterns used in AI agent development: reflection, where the model critiques its own output and revises it; tool use, where the model calls external functions such as web search or code execution; planning, where the model breaks a task into steps and can reroute around failures; and multi-agent collaboration, where several agents, often the same model prompted into different roles, work together or debate one another. Ng describes reflection and tool use as fairly robust, and planning and multi-agent collaboration as more emerging and less reliable. He also notes that users will need patience, since an agentic workflow may take minutes or even hours to respond rather than the half second expected of a web search, and that fast token generation matters because these workflows iterate many times.
Key Vocabulary and Common Phrases:
1. agentic [eɪˈdʒɛntɪk] - (adjective) - Referring to the characteristics or approach of an agent; in AI, involves taking proactive or self-driven actions. - Synonyms: (autonomous, proactive, self-driven)
In contrast with an agentic workflow, this is what it may look like.
2. iterative [ˈɪtərətɪv] - (adjective) - A process that repeatedly executes a set of instructions or operations until a specific condition is met. - Synonyms: (repetitive, recurring, cyclic)
And so this workflow is much more iterative, where you may have the LLM do some thinking.
3. benchmark [ˈbɛn(t)ʃˌmɑrk] - (noun) - A standard or point of reference against which things may be compared or assessed. - Synonyms: (standard, criterion, yardstick)
My team analyzed some data using a coding benchmark called the HumanEval benchmark.
4. criteria / criterion [kraɪˈtɪəriə / kraɪˈtɪəriɒn] - (noun) - A principle or standard by which something may be judged or decided. - Synonyms: (standard, gauge, measure)
Check the code carefully for correctness, sound efficiency, good construction criteria, just write a prompt like that.
5. reflection [rɪˈflɛkʃən] - (noun) - The process of introspection or reviewing one's actions, thoughts, or work for improvement. - Synonyms: (introspection, contemplation, self-examination)
Reflection is a tool that I think many of us should just use.
6. prompt [prɒm(p)t] - (noun / verb) - (n.) A cue or instruction that initiates a response. (v.) To cause an action or encourage someone to say or do something. - Synonyms: (cue, encourage, motivate)
The same LLM that you prompted to write the code may be able to spot problems.
7. synthesize [ˈsɪnθəˌsaɪz] - (verb) - Combine different elements to form a coherent whole. - Synonyms: (integrate, combine, amalgamate)
Then find a pose-to-image model to synthesize a picture of a girl.
8. autonomous [ɔːˈtɒnəməs] - (adjective) - Able to operate or function independently. - Synonyms: (independent, self-governing, self-sufficient)
I can't believe my AI system just did that autonomously.
9. debate [dɪˈbeɪt] - (noun / verb) - (n.) A structured discussion on a particular topic. (v.) Engage in argumentation or discussion on a subject. - Synonyms: (discussion, argument, dialogue)
Multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other.
10. trajectory [trəˈdʒɛktəri] - (noun) - The path followed by an object or process as it evolves. - Synonyms: (course, path, route)
The path to AGI feels like a trajectory rather than a destination.
What's next for AI agentic workflows ft. Andrew Ng of AI Fund
All of you know Andrew Ng as a famous computer science professor at Stanford, who was really early on in the development of neural networks with GPUs, and of course a creator of Coursera and of popular courses like DeepLearning.AI. Also the founder and creator and early lead of Google Brain. But one thing I've always wanted to ask you before I hand it over, Andrew, while you're on stage, is a question I think would be relevant to the whole audience. Ten years ago, on problem set number two of CS 229, you gave me a B, and I looked it over, and I was wondering what you saw that I did incorrectly.
So anyway, Andrew. Thank you, Hansin. Looking forward to sharing with all of you what I'm seeing with AI agents, which I think is an exciting trend that everyone building in AI should pay attention to. And I'm also excited about all the other "what's next" presentations. So, AI agents. Today, the way most of us use large language models is like this, with a non-agentic workflow where you type a prompt and it generates an answer. And that's a bit like if you ask a person to write an essay on a topic and say, please sit down at the keyboard and just type the essay from start to finish without ever using backspace. And despite how hard this is, LLMs do it remarkably well.
In contrast, with an agentic workflow, this is what it may look like. Have an AI, have an LLM, say, write an essay outline. Do you need to do any web research? If so, let's do that. Then write the first draft, and then read your own first draft and think about what parts need revision, and then revise your draft. And you go on and on. And so this workflow is much more iterative, where you may have the LLM do some thinking, and then revise this article, and then do some more thinking, and iterate this through a number of times. And what not many people appreciate is that this delivers remarkably better results. I've actually really surprised myself, working with these agentic workflows, at how well they work.
I'm going to do one case study. My team analyzed some data using a coding benchmark called the HumanEval benchmark, released by OpenAI a few years ago. This has coding problems like: given a non-empty list of integers, return the sum of all the odd elements that are in even positions. And it turns out the answer is a code snippet like that. So today, a lot of us will use zero-shot prompting, meaning we tell the AI, write the code, and have it run on the first pass. Like, who codes like that? No human codes like that. We just type out the code and run it. Maybe you do. I can't do that.
So it turns out that if you use GPT-3.5 with zero-shot prompting, it gets it 48% right. GPT-4, way better: 67% right. But if you take an agentic workflow and wrap it around GPT-3.5, it actually does better than even GPT-4. And if you were to wrap this type of workflow around GPT-4, it also does very well. And you notice that GPT-3.5 with an agentic workflow actually outperforms GPT-4. And I think this has significant consequences for how we all approach building applications.
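As a minimal sketch of the kind of code snippet such a problem expects, assuming the task means "sum the odd elements that sit at even indices (0, 2, 4, ...)", a Python answer might look like the following. The function name `solution` is an assumption for illustration and is not taken from the talk.

```python
def solution(lst):
    """Sum the odd elements found at even indices (0, 2, 4, ...)."""
    return sum(x for i, x in enumerate(lst) if i % 2 == 0 and x % 2 == 1)


# 5 (index 0) and 7 (index 2) are odd values at even indices.
print(solution([5, 8, 7, 1]))
```

A zero-shot prompt asks the model to produce something like this in a single pass, with no chance to run or revise it.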
So "agents" is a term that has been tossed around a lot. There are a lot of consultant reports that talk about agents, the future of AI, blah, blah, blah. I want to be a bit concrete and share with you the broad design patterns I'm seeing in agents. It's a very messy, chaotic space. Tons of research, tons of open source, there's a lot going on. But I try to categorize a bit more concretely what's going on in agents. Reflection is a tool that I think many of us should just use. It just works. Tool use, I think, is more widely appreciated, but it actually works pretty well. I think of these as pretty robust technologies. When I use them, I can almost always get them to work well.
Planning and multi-agent collaboration, I think, are more emerging. When I use them, sometimes my mind is blown by how well they work, but at least at this moment in time, I don't feel like I can always get them to work reliably. So let me walk through these four design patterns in a few slides, and if some of you go back and use these yourselves, or ask your engineers to use these, I think you'll get a productivity boost quite quickly. So, reflection. Here's an example. Let's say I ask a system, please write code for me for a given task. Then we have a coder agent, just an LLM that you prompt to write code, to say, you know, def do_task, write a function like that.
An example of self-reflection would be if you then prompt the LLM with something like this: here's code intended for a task, and just give it back the exact same code that it just generated, and then say, check the code carefully for correctness, sound efficiency, good construction criteria. Just write a prompt like that. It turns out the same LLM that you prompted to write the code may be able to spot problems, like, there's a bug in line 5, you may fix it by blah, blah, blah. And if you now take its own feedback and give it back to it and re-prompt it, it may come up with a version two of the code that could well work better than the first version. Not guaranteed, but it works often enough for this to be worth trying for a lot of applications.
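The reflection loop described here can be sketched in a few lines of Python. This is only an illustrative sketch: `llm` is assumed to be any callable that maps a prompt string to a reply string, the prompt wording is paraphrased, and `fake_llm` is a hypothetical stand-in so the example runs without a real model.

```python
from typing import Callable


def reflect_and_revise(task: str, llm: Callable[[str], str], rounds: int = 2) -> str:
    """Generate code, then let the same model critique and rewrite it."""
    code = llm(f"Write code for this task:\n{task}")
    for _ in range(rounds):
        # Hand the model back its own output and ask for a critique.
        critique = llm(
            f"Here is code intended for this task: {task}\n{code}\n"
            "Check the code carefully for correctness and efficiency."
        )
        # Re-prompt with the critique to obtain the next version.
        code = llm(
            f"Task: {task}\nCode:\n{code}\nFeedback:\n{critique}\n"
            "Rewrite the code to address the feedback."
        )
    return code


def fake_llm(prompt: str) -> str:
    """Hypothetical canned model: first draft, critique, then a fixed draft."""
    if prompt.startswith("Write code"):
        return "def do_task(): return 1"
    if prompt.startswith("Here is code"):
        return "Bug: the function should return 2."
    return "def do_task(): return 2"


print(reflect_and_revise("return two", fake_llm, rounds=1))
```

As the talk says, a second version is not guaranteed to be better; the loop simply gives the model a chance to catch its own mistakes.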
And to foreshadow tool use, if you let it run unit tests and it fails a unit test, then you ask, why did you fail the unit test? Have that conversation, and maybe it can figure out why it failed the unit test, so it should try changing something and come up with v3. By the way, for those of you that want to learn more about these technologies, I'm very excited about them. For each of the four sections, I have a little recommended reading section at the bottom that hopefully gives more references.
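A rough sketch of that unit-test feedback loop follows, under several assumptions: tests are written as Python expressions that should evaluate to true, `llm` is any callable from prompt to code, and `fake_llm` is a hypothetical stand-in that always returns a corrected function. Running model-written code with `exec` is only acceptable here because the code is hard-coded; real use would need a sandbox.

```python
from typing import Callable, List


def failing_tests(code: str, tests: List[str]) -> List[str]:
    """Load the candidate code and return the test expressions that do not hold."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # trusted demo code only; sandbox anything real
    except Exception as err:
        return [f"code did not load: {err}"]
    failures = []
    for test in tests:
        try:
            ok = bool(eval(test, namespace))
        except Exception:
            ok = False
        if not ok:
            failures.append(test)
    return failures


def fix_until_passing(code: str, tests: List[str],
                      llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Feed failing tests back to the model until they pass or rounds run out."""
    for _ in range(max_rounds):
        failures = failing_tests(code, tests)
        if not failures:
            return code
        code = llm(f"This code:\n{code}\nfails these tests: {failures}\nFix it.")
    return code  # may still fail; the caller should re-check


def fake_llm(prompt: str) -> str:
    """Hypothetical model that always proposes the corrected function."""
    return "def add(a, b): return a + b"


buggy = "def add(a, b): return a - b"
print(fix_until_passing(buggy, ["add(2, 3) == 5"], fake_llm))
```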
And again, just to foreshadow multi-agent systems, what I've described is a single coder agent that you prompt to have this conversation with itself. One natural evolution of this idea is that instead of a single coder agent, you can have two agents, where one is a coder agent and the second is a critic agent. And these could be the same base LLM, but prompted in different ways, where you say to one, you're an expert coder, write code, and to the other one, you're an expert code reviewer, review this code. And this type of workflow is actually pretty easy to implement. I think it's a very general-purpose technology for a lot of workflows. This will give you a significant boost in the performance of LLMs.
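The coder and critic pairing can be sketched as two role prompts over one shared base model. This is a minimal sketch under assumptions: the base model is taken to be a callable of the form `(system_prompt, message) -> reply`, and `fake_model` is a hypothetical stand-in that returns canned replies.

```python
from typing import Callable, List, Tuple

BaseModel = Callable[[str, str], str]  # (system_prompt, message) -> reply


def make_agent(role_prompt: str, model: BaseModel) -> Callable[[str], str]:
    """Bind a role prompt to the shared base model."""
    return lambda message: model(role_prompt, message)


def coder_critic(task: str, model: BaseModel,
                 rounds: int = 1) -> Tuple[str, List[Tuple[str, str]]]:
    """Alternate between a coder agent and a critic agent on the same task."""
    coder = make_agent("You are an expert coder. Write code.", model)
    critic = make_agent("You are an expert code reviewer. Review this code.", model)
    code = coder(task)
    transcript = [("coder", code)]
    for _ in range(rounds):
        review = critic(code)
        transcript.append(("critic", review))
        code = coder(f"{task}\nPrevious code:\n{code}\nReview:\n{review}")
        transcript.append(("coder", code))
    return code, transcript


def fake_model(system_prompt: str, message: str) -> str:
    """Hypothetical base model with canned replies for each role."""
    if "reviewer" in system_prompt:
        return "Add a docstring."
    return "v2" if "Review:" in message else "v1"


final_code, log = coder_critic("Write do_task()", fake_model)
print(final_code, [role for role, _ in log])
```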
The second design pattern is tool use. Many of you will already have seen LLM-based systems using tools. On the left is a screenshot from Copilot. On the right is something that I kind of extracted from GPT-4. But LLMs today, if you ask one what's the best coffee maker, can do a web search, and for some problems will generate code and run code. And it turns out that there are a lot of different tools that many different people are using for analysis, for gathering information, for taking actions, for personal productivity.
It turns out a lot of the early work in tool use was in the computer vision community, because before, large language models, LLMs, couldn't do anything with images. So the only option was for the LLM to generate a function call that could manipulate an image, like generate an image or do object detection or whatever. So if you actually look at the literature, it's been interesting how much of the work in tool use seems like it originated from vision, because LLMs were blind to images before GPT-4V and LLaVA and so on. So that's tool use, and it expands what an LLM can do.
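One way to picture a model-generated function call is as a small piece of structured text that the application parses and dispatches to real code. The sketch below assumes a JSON shape of `{"name": ..., "arguments": {...}}` and a hypothetical `TOOLS` registry; neither is taken from the talk or from any specific vendor's API.

```python
import json

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS = {
    "add": lambda a, b: a + b,
    "word_count": lambda text: len(text.split()),
}


def run_tool_call(model_output: str):
    """Parse a JSON function call emitted by the model and execute it."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# The string stands in for what a model might emit when it decides to use a tool.
print(run_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
```

The result would normally be passed back to the model as context for its next step.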
And then planning. For those of you that have not yet played a lot with planning algorithms, I feel like a lot of people talk about the ChatGPT moment where you're like, wow, never seen anything like this. I think if you've not used planning algorithms, many people will have a kind of AI agent "wow" moment: I couldn't imagine an AI agent doing this. So I've run live demos where something failed and the AI agent rerouted around the failures. I've actually had quite a few of those moments: wow, I can't believe my AI system just did that autonomously.
But here's one example that I adapted from the HuggingGPT paper. You say, please generate an image where a girl is reading a book and the pose is the same as the boy in the image example.jpeg, and please describe the new image with your voice. So given an example like this, today with AI agents, you can kind of decide: first thing I need to do is determine the pose of the boy, then find the right model, maybe on Hugging Face, to extract the pose. Next, I need to find a pose-to-image model to synthesize a picture of a girl following the instructions, then use image-to-text, and then finally use text-to-speech.
And today we actually have agents that, I don't want to say they work reliably, they're kind of finicky, they don't always work, but when it works, it's actually pretty amazing. And with agentic loops, sometimes you can recover from earlier failures as well. So I find myself already using research agents in some of my work, where I want a piece of research, but I don't feel like googling it myself and spending a long time. I'll send it to the research agent, come back in a few minutes, and see what it's come up with. And it sometimes works, sometimes doesn't, right? But that's already a part of my personal workflow.
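The four-step plan in this example, together with the idea of rerouting around a failed step, can be sketched as a pipeline in which each step has one or more candidate tools. Everything named below (`run_plan`, the stub tools, the registry) is hypothetical; the stubs just wrap strings so the flow is visible without any real models.

```python
from typing import Callable, Dict, List, Tuple


def run_plan(plan: List[str], registry: Dict[str, List[Callable[[str], str]]],
             data: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Run each step in order, falling back to the next tool if one fails."""
    log = []
    for step in plan:
        for tool in registry[step]:
            try:
                data = tool(data)
                log.append((step, tool.__name__))
                break
            except Exception:
                continue  # reroute: try the next candidate tool for this step
        else:
            raise RuntimeError(f"no working tool for step {step!r}")
    return data, log


# Hypothetical stub tools standing in for real models.
def flaky_pose(x): raise RuntimeError("model unavailable")
def backup_pose(x): return f"pose({x})"
def pose_to_image(x): return f"image({x})"
def image_to_text(x): return f"text({x})"
def text_to_speech(x): return f"speech({x})"


PLAN = ["pose_detection", "pose_to_image", "image_to_text", "text_to_speech"]
REGISTRY = {
    "pose_detection": [flaky_pose, backup_pose],
    "pose_to_image": [pose_to_image],
    "image_to_text": [image_to_text],
    "text_to_speech": [text_to_speech],
}

result, log = run_plan(PLAN, REGISTRY, "example.jpeg")
print(result)
print(log[0])
```

In a real agent the plan itself would be produced by the model rather than hard-coded as it is here.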
The final design pattern is multi-agent collaboration. This is one of those funny things, but it works much better than you might think. On the left is a screenshot from a paper called ChatDev, which is actually open source. Many of you saw the flashy social media announcements of the demo of Devin. ChatDev is open source; it runs on my laptop. And ChatDev is an example of a multi-agent system where you prompt one LLM to sometimes act like the CEO of a software engineering company, sometimes act like a designer, sometimes a product manager, sometimes a tester. And this flock of agents, which you build by prompting an LLM to tell it you are now a CEO, you are now a software engineer, will collaborate and have an extended conversation, so that if you tell it, please develop a game, develop a Gomoku game,
they will actually spend a few minutes writing code, testing, iterating, and then generate surprisingly complex programs. Doesn't always work. I've used it; sometimes it doesn't work, sometimes it's amazing. But this technology is really getting better. And just one other design pattern: it turns out that multi-agent debate, where you have different agents, for example ChatGPT and Gemini, debate each other, actually results in better performance as well. So having multiple similar AI agents work together has been a powerful design pattern as well.
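A toy sketch of multi-agent debate follows, under the assumption that each agent is a callable taking the question and the other agents' latest answers, and that the final answer is picked by simple majority. The two stub agents are hypothetical and only illustrate one agent being swayed by its peer.

```python
from collections import Counter
from typing import Callable, List

Agent = Callable[[str, List[str]], str]  # (question, peers' answers) -> answer


def debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    """Let agents answer, read each other's answers, revise, then take a majority."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees every answer from the previous round except its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return Counter(answers).most_common(1)[0][0]


def confident_agent(question: str, peers: List[str]) -> str:
    """Hypothetical agent that always answers '4'."""
    return "4"


def swayable_agent(question: str, peers: List[str]) -> str:
    """Hypothetical agent that starts wrong, then adopts its peers' majority."""
    if not peers:
        return "5"
    return Counter(peers).most_common(1)[0][0]


print(debate("What is 2 + 2?", [confident_agent, swayable_agent]))
```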
So just to summarize, I think these are the patterns I've seen, and I think that if we were to use these patterns in our work, a lot of us can get a productivity boost quite quickly. And I think that agentic reasoning design patterns are going to be important. This is my last slide. I expect that the set of tasks AI can do will expand dramatically this year because of agentic workflows. And one thing that is actually difficult for people to get used to is that when we prompt an LLM, we want a response right away. In fact, a decade ago, I was having discussions at Google on what we called big box search, where you type a long prompt.
One of the reasons I failed to push successfully for that was because when you do a web search, you want a response back in half a second. That's just human nature. We like that instant gratification, instant feedback. But for a lot of the agentic workflows, I think we'll need to learn to delegate a task to an AI agent and patiently wait minutes, maybe even hours, for a response. It's just like how I've seen a lot of novice managers delegate something to someone and then check in five minutes later, right? And that's not productive. It's really difficult, but I think we need to learn that patience with some of our AI agents as well.
I saw, I heard some loss. And then one other important trend: fast token generation is important, because with these agentic workflows we're iterating over and over. So the LLM is generating tokens for the LLM to read. So being able to generate tokens way faster than any human can read is fantastic. And I think that generating more tokens really quickly from even a slightly lower quality LLM might give good results compared to slower tokens from a better LLM. Maybe that's a little bit controversial, but it may let you go around this loop a lot more times,
kind of like the results I showed with GPT-3.5 and an agentic architecture on the first slide. And candidly, I'm really looking forward to Claude 5 and Claude 4 and GPT-5 and Gemini 2.0 and all these other wonderful models that many of you are building. And part of me feels like if you're looking forward to running your thing on GPT-5 zero-shot, you may be able to get closer to that level of performance on some applications than you might think with agentic reasoning, but on an earlier model. I think this is an important trend. And honestly, the path to AGI feels like a journey rather than a destination. But I think this type of agentic workflow could help us take a small step forward on this very long journey. Thank you.
Artificial Intelligence, Innovation, Technology, Agentic Workflow, Machine Learning, Andrew Ng, Sequoia Capital