ENSPIRING.ai: RAG vs. Fine Tuning

The video explores two powerful techniques used to enhance the capabilities of large language models in artificial intelligence: retrieval augmented generation (RAG) and fine tuning. It discusses how RAG retrieves external, up-to-date information to provide accurate responses without altering the base model, while fine tuning tailors a model to specialize in specific domains through labeled data.

It outlines the strengths and practical applications of each method, urging viewers to consider their AI application's priorities when choosing between them. RAG is highlighted for dynamic, frequently updated data, making it useful in applications demanding transparency and trust. Fine tuning excels in sectors with specific vocabulary and nuances, such as the legal or financial industries.

Main takeaways from the video:

💡 RAG is effective for applications requiring continuously updated data and context.
💡 Fine tuning is essential for industries with specific terminology and styles.
💡 Combining RAG and fine tuning yields robust AI systems capable of precision and accuracy in specialized domains.

Key Vocabularies and Common Phrases:

1. retrieval augmented generation [rɪˈtriːvəl ɔːɡˈmɛntɪd dʒɛnəˈreɪʃən] - (noun phrase) - A technique that enhances a model by retrieving and utilizing external, up-to-date information when generating responses. - Synonyms: (RAG, information retrieval, augmented generation)

So let's begin with retrieval augmented generation, which is a way to increase the capabilities of a model by retrieving external and up-to-date information.

2. generative AI [ˈdʒɛnərətɪv ˌeɪˈaɪ] - (noun phrase) - Artificial intelligence that can generate text, images, or other media from a set of inputs. - Synonyms: (creative AI, AI generation, intelligent creation)

So one of the biggest issues with generative AI right now is, one, enhancing the models, but also, two, dealing with their limitations.

3. hallucination [həˌluːsəˈneɪʃən] - (noun) - In AI, it's when a model generates an incorrect or misleading output not grounded in the input data. - Synonyms: (misinterpretation, error, anomaly)

And at the same time, because we're working with this retriever system and passing in the information as context in the prompt, well, that really helps with hallucinations.

4. corpus [ˈkɔːrpəs] - (noun) - A collection of written or spoken material stored for language research purposes. - Synonyms: (collection, database, archive)

Because now, instead of having an incorrect or possibly hallucinated answer, we're able to work with what's known as a corpus of information.

5. proprietary [prəˈpraɪətɛri] - (adjective) - Owned by a private individual or corporation under a trademark or patent. - Synonyms: (exclusive, copyrighted, private)

Because we can get better responses back from a model with our proprietary and confidential information without needing to do any retraining on the model.

6. inference [ˈɪnfərəns] - (noun) - The process of drawing conclusions and applying reasoning, often in AI when model processes data. - Synonyms: (deduction, reasoning, conclusion)

And at the same time, because that is baked into the model's weights itself, well, that's really great for speed, inference cost, and a variety of other factors that come with running models.

7. nuance [ˈnuːɑːns] - (noun) - A subtle difference in or shade of meaning, expression, or sound. - Synonyms: (subtlety, distinction, variation)

Now, fine tuning is really powerful for specific industries that have nuances in their writing styles, terminology, and vocabulary.

8. transparency [trænˈspɛrənsi] - (noun) - The quality of being done in an open way without secrets. - Synonyms: (openness, clarity, honesty)

Providing the sources for this information is really important in systems where we need trust and transparency when we're using AI.

9. contextual [kənˈtɛkstʃuəl] - (adjective) - Related to or determined by the context of the information or environment. - Synonyms: (relevant, situational, related)

And then pass that knowledge, as well as the original prompt, to a large language model. And with its intuition and pre-trained data, it's able to give us a response back based on that contextualized information.

10. specialize [ˈspɛʃəˌlaɪz] - (verb) - To focus on a specific area of knowledge or expertise. - Synonyms: (concentrate, focus, narrow down)

So this could be data, this could be PDFs, documents, spreadsheets, things that are relevant to our specific organization or knowledge that we need to specialize in.

RAG vs. Fine Tuning

Let's talk about RAG versus fine tuning. They're both powerful ways to enhance the capabilities of large language models, but today you're going to learn about their strengths, their use cases, and how you can choose between them. So one of the biggest issues with generative AI right now is, one, enhancing the models, but also, two, dealing with their limitations. For example, I recently asked my favorite LLM a simple question: who won the Euro 2024 championship? And while this might seem like a simple query for my model, there's a slight issue. Because the model wasn't trained on that specific information, it can't give me an accurate or up-to-date answer.

At the same time, these popular models are very general-purpose, so how do we think about specializing them for specific use cases and adapting them for enterprise applications? Your data is one of the most important things you can work with, and in the field of AI, techniques such as RAG and fine tuning will allow you to supercharge the capabilities your application delivers. So in the next few minutes, we're going to learn about both of these techniques, the differences between them, and where you can start using them. Let's get started.

So let's begin with retrieval augmented generation, which is a way to increase the capabilities of a model by retrieving external and up-to-date information, augmenting the original prompt that was given to the model, and then generating a response using that context and information. This is really powerful, because if we think back to that Euro 2024 example, the model didn't have the information in context to provide an answer. That's one of the big limitations of LLMs, but it's mitigated with RAG, because now, instead of having an incorrect or possibly hallucinated answer, we're able to work with what's known as a corpus of information.

So this could be data, this could be PDFs, documents, spreadsheets, things that are relevant to our specific organization or knowledge that we need to specialize in. When the query comes in this time, we're working with what's known as a retriever, which is able to pull the correct documents and relevant context for the question and then pass that knowledge, as well as the original prompt, to a large language model. And with its intuition and pre-trained data, it's able to give us a response back based on that contextualized information. This is really powerful, because we can get better responses back from a model with our proprietary and confidential information without needing to do any retraining on the model. It's a great and popular way to enhance the capabilities of a model without having to do any fine tuning.
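
To make that retrieve-augment-generate flow concrete, here is a minimal Python sketch. The corpus, the word-overlap retriever, and the generate() stub are all illustrative assumptions, not code from the video: a real system would use a vector database for retrieval and an actual LLM API for generation.

```python
# Minimal sketch of the retrieve-augment-generate loop described above.
# The corpus, the overlap-based retriever, and the generate() stub are
# hypothetical stand-ins for a vector store and a real LLM API.

CORPUS = [
    "Spain won Euro 2024, beating England 2-1 in the final in Berlin.",
    "Euro 2024 was hosted by Germany in June and July 2024.",
    "Our return policy allows refunds within 30 days of purchase.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query
    (a stand-in for embedding-based similarity search)."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def augment(query: str, docs: list[str]) -> str:
    """Build the augmented prompt: retrieved context first, question last."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Placeholder for the LLM call (an API request in practice)."""
    return f"[LLM response grounded in a {len(prompt)}-character prompt]"

query = "Who won Euro 2024?"
print(generate(augment(query, retrieve(query, CORPUS))))
```

Note that the base model is never modified here; fresh knowledge enters only through the prompt, which is exactly why RAG suits fast-moving data.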

Now let's move on to fine tuning. As the name implies, this involves taking a large language foundation model, but this time we're going to specialize it in a certain domain or area. We work with labeled and targeted data that's provided to the model, and after some processing, we'll have a specialized model for a specific use case: to talk in a certain style, to have a certain tone that could represent our organization or company. Then, when the model is queried by a user or in any other way, we'll get a response with the correct tone, output, and specialty in the domain we'd like. And this is really important, because what we're doing is essentially baking that context and intuition into the model.
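
As a rough illustration of what "baking it in" means mechanically, here is a hedged sketch of a supervised fine-tuning loop using PyTorch and the Hugging Face transformers library. The GPT-2 checkpoint, the tiny legal-style examples, and the hyperparameters are all assumptions chosen for brevity, not settings from the video.

```python
# Conceptual sketch of supervised fine-tuning: gradient updates on labeled,
# domain-specific examples push the specialization into the model's weights.
# Model choice, data, and hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any small causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Labeled, targeted data: toy domain prompt/completion pairs.
examples = [
    "Summarize: The lessee shall vacate... -> Tenant must leave by June 30.",
    "Summarize: Liability is limited to... -> Damages are capped at fees paid.",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LMs, the inputs double as labels (next-token prediction);
        # the loss nudges the weights toward the domain's style and terms.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("legal-summarizer")   # the specialized model artifact
```

Unlike the RAG sketch, the knowledge here ends up inside the saved weights, so no retriever or extra context is needed at query time.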

And it's really important because this is now part of the model's weights, versus being supplemented on top with a technique like RAG. Okay, so we understand how both of these techniques can enhance a model's accuracy, output, and performance, but let's take a look at their strengths and weaknesses and some common use cases, because the direction you go in can greatly affect a model's performance, accuracy, outputs, compute, cost, and much more.

So let's begin with retrieval augmented generation. Something I want to point out here is that because we're working with a corpus of information and data, this is perfect for dynamic data sources such as databases and other data repositories, where we want to continuously pull information and keep it up to date for the model to use and understand. And at the same time, because we're working with this retriever system and passing in the information as context in the prompt, well, that really helps with hallucinations.

And providing the sources for this information is really important in systems where we need trust and transparency when we're using AI. So this is fantastic. But let's also think about the whole system, because an efficient retrieval system is essential to how we select the data we provide within that limited context window, and maintaining it is also something you need to think about. At the same time, what we're doing in this system is effectively supplementing information on top of the model: we're not enhancing the base model itself, we're just giving it the relevant and contextual information it needs. Fine tuning is a little bit different, because we're actually baking that context and intuition into the model, and we have greater influence over how the model behaves and reacts in different situations.

Is it an insurance adjuster? Can it summarize documents? Whatever we want the model to do, we can use fine tuning to help with that process. And at the same time, because that is baked into the model's weights itself, well, that's really great for speed, inference cost, and a variety of other factors that come with running models. For example, we could use smaller prompt context windows to get the responses we want from the model. And as we begin to specialize these models, they can get smaller and smaller for our specific use case.

So it's really great for running these specialized models in a variety of use cases, but at the same time, we have the issue of the knowledge cutoff: up until the point where the model is trained, it has information, but after that, there's no additional information we can give it. It's the same issue we had with the Euro 2024 example. So both of these have their strengths and weaknesses, but let's see them in some examples and use cases.

So when you're thinking about choosing between RAG and fine tuning, it's really important to consider your AI-enabled application's priorities and requirements. Namely, this starts with the data. Is the data you're working with slow moving, or is it fast? For example, if we need up-to-date external information ready in context every time we use a model, then this could be a great use case for RAG: say, a product documentation chatbot, where we can continually update the responses with current information.

Now, at the same time, let's think about the industry you might be in. Fine tuning is really powerful for specific industries that have nuances in their writing styles, terminology, and vocabulary. So, for example, if we have a legal document summarizer, this could be a perfect use case for fine tuning. Now, let's think about sources. Having transparency behind our models is really important right now, and with RAG, being able to provide the context and where the information came from is really, really valuable.

And so this could be a great use case, again, for that chatbot in retail, insurance, and a variety of other specialties where having the source and information in the context of the prompt is very important. But at the same time, we may have things such as past data in our organization that we can use to train a model, so it becomes accustomed to the data it will be working with. For example, that legal summarizer could be fed past data on different legal cases and documents so that it understands the situation it's working in and gives us better, more desirable outputs.

So this is cool, but I think the best situation is a combination of both of these methods. Let's say we have a financial news reporting service. We could fine tune it to be native to the finance industry and understand all the lingo there. We could also give it past financial records and let it learn how we work in that specific industry, while also being able to provide the most up-to-date sources for news and data, with a level of confidence, transparency, and trust for the end user who's making a decision and needs to know the source.

And this is really where a combination of fine tuning and RAG is so awesome, because we can build amazing applications, taking advantage of RAG as a way to retrieve information and keep it up to date, and fine tuning to specialize our model in a certain domain. They're both wonderful techniques with their own strengths, but the choice to use one or a combination of both is up to you and your specific use case and data.
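
As a closing illustration, here is a sketch of that financial news service, combining a fine-tuned model with a RAG-style retrieval step that supplies fresh, attributed context. Every name here (fetch_latest_market_news, FinetunedFinanceLLM) is invented for the example; the point is the shape of the pipeline, not a specific API.

```python
# Sketch of the combined approach: a fine-tuned, domain-specialized model
# answers prompts that RAG has augmented with fresh external data.
# fetch_latest_market_news() and FinetunedFinanceLLM are hypothetical names.

def fetch_latest_market_news(query: str) -> list[dict]:
    """Stand-in for a live news retriever; returns documents with sources."""
    return [{"source": "newswire.example.com",
             "text": "The central bank held rates steady today."}]

class FinetunedFinanceLLM:
    """Stand-in for a model fine-tuned on past financial records."""
    def generate(self, prompt: str) -> str:
        return f"[finance-tuned answer based on: {prompt[:60]}...]"

def report(query: str, model: FinetunedFinanceLLM) -> str:
    docs = fetch_latest_market_news(query)               # RAG: fresh data
    context = "\n".join(f"{d['text']} (source: {d['source']})" for d in docs)
    answer = model.generate(f"Context:\n{context}\n\nQuestion: {query}")
    sources = ", ".join(d["source"] for d in docs)       # transparency
    return f"{answer}\nSources: {sources}"

print(report("What moved bond yields today?", FinetunedFinanceLLM()))
```

The fine-tuned model supplies the domain fluency, the retrieval step supplies the up-to-date facts, and surfacing the sources gives the end user the transparency the video emphasizes.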

Artificial Intelligence, Technology, Innovation, Data Customization, Fine Tuning AI, Retrieval Augmented Generation, IBM Technology