In the video, Erica demonstrates how to use LangChain in Python for a Retrieval-Augmented Generation (RAG) workflow with large language models (LLMs). She highlights a limitation of LLMs, such as outdated training data, through an example with the IBM Granite model, and introduces RAG as a method to provide LLMs with the most current information by creating a knowledge base, retrieving relevant content, and feeding it to the model.

Viewers learn how to establish a RAG workflow step by step. Erica explains setting up a knowledge base, using a retriever to fetch updated content, configuring the LLM, and creating a prompt that instructs the LLM on how to use the provided information. She demonstrates the process using the IBM Slate model for vectorizing content and showcases the successful retrieval and response process through practical queries about recent IBM announcements.

Main takeaways from the video:

💡 Retrieval-Augmented Generation (RAG) enhances LLMs by providing them with up-to-date knowledge, enabling them to respond accurately to current events or recent data.
💡 An effective RAG workflow involves setting up a knowledge base, a retriever, a generative LLM, and detailed prompts.
💡 Experimenting with the RAG framework affords flexibility in querying different topics and efficiently harnesses tools like LangChain and models like IBM's Slate for data management.

Key Vocabularies and Common Phrases:

1. granite [ˈɡrænɪt] - (noun) - In this context, it refers to a model from IBM used for natural language processing tasks. - Synonyms: (n/a)

But when I asked the IBM Granite model to tell me about the UFC announcement from November 14, 2024, it didn't know what I was talking about and mentioned it was trained on a limited data set up to only 2021.

2. rag [ræɡ] - (noun) - An abbreviation for Retrieval-Augmented Generation, a process that enhances machine learning models by integrating external information sources. - Synonyms: (information retrieval process, augmented generation, data supplementation)

The answer is RAG: Retrieval-Augmented Generation.

3. retriever [rɪˈtriːvər] - (noun) - In this context, it’s a component used in rag to fetch the relevant content from the knowledge base. - Synonyms: (fetcher, collector, obtainer)

Second, we'll set up a retriever to fetch the content from the knowledge base.

4. knowledge base ['nɑlɪdʒ beɪs] - (noun) - A structured set of data or content that the LLM can utilize for generating accurate responses. - Synonyms: (content repository, information bank, data archive)

First, we'll add a knowledge base to include the content we want the LLM to read.

5. vector store ['vɛktər stɔːr] - (noun) - A storage system that organizes information as vectors to facilitate quick retrieval and search processes. - Synonyms: (vector repository, indexed storage, data vectorization)

Our query is searched for in our knowledge base vector store.

6. chroma [ˈkroʊmə] - (noun) - In this context, it refers to software used for embedding and storing documents locally as vectors for retrieval. - Synonyms: (n/a)

And to finish off this step, let's load our content into a local instance of a vector database using Chroma.

7. instantiate [ɪnˈstænʃɪeɪt] - (verb) - To create an instance or specific realization of an object based on its definition in programming. - Synonyms: (initialize, create, establish)

And finally, in this step, we instantiate the LLM using WatsonX.

8. embedding model [ɛm'bɛdɪŋ ˈmɒdəl] - (noun) - A type of machine learning model that converts information into vectors to enable efficient search and retrieval processes. - Synonyms: (vector model, data encoder, textual representation model)

Next, we need to instantiate an embedding model to vectorize our content.

9. recursive [rɪˈkɜːrsɪv] - (adjective) - Characterized by repetition or recurrence; often used to denote processes that refer back to themselves. - Synonyms: (iterative, repetitious, cyclical)

LangChain's recursive character text splitter takes a large text and splits it based on a specified chunk size.

10. augmented [ɔːɡˈmɛntɪd] - (adjective) - Enhanced or increased in value or quality. - Synonyms: (enriched, improved, strengthened)

The generative model will process the augmented context along with the user's question to produce a response.

LangChain RAG - Optimizing AI Models for Accurate Responses

Hi, my name is Erica and I'm going to show you how to use LangChain for a simple RAG example in Python. Large language models (LLMs) can be great for answering lots of questions, but sometimes the models don't have the most up-to-date information and can't answer questions about recent events. For example, I was reading this recent announcement about the UFC and IBM partnership on IBM.com and wanted to ask an LLM about it. But when I asked the IBM Granite model to tell me about the UFC announcement from November 14, 2024, it didn't know what I was talking about and mentioned it was trained on a limited data set up to only 2021.

How do I give this LLM the most up-to-date information so it can answer my question? The answer is RAG: Retrieval-Augmented Generation. Let me show you how it works. Typically, we have our user asking a question to the LLM, which generates a response. But as you just saw, the LLM didn't have the right information or context to answer my question. So we need to add something in the middle, between the question and the LLM. First, we'll add a knowledge base to include the content we want the LLM to read. In this case, it'll be the most up-to-date content from IBM.com pages about some IBM products and announcements. Second, we'll set up a retriever to fetch the content from the knowledge base. Third, we'll set up the LLM to be fed the content. Fourth, we'll establish a prompt with instructions to be able to ask the LLM questions. The top results from search and retrieval will also be gathered here.

Once we've completed these four steps, we can start asking questions about the content in our knowledge base. Our query is searched for in our knowledge base vector store, the top results are returned as context for the LLM, and finally the LLM generates a response. I'll walk through all these steps again in the Jupyter notebook linked in the description of this video. Before we can begin, we need to fetch an API key and project ID for our notebook. You can get these credentials by following the steps in the video linked in the description below.

We also have a few libraries to use for this tutorial. If you don't have these packages installed yet, you can solve this with a quick pip install, and here we can import the packages. Next, save your WatsonX project ID and WatsonX API key in a separate .env file. Make sure it's in the same directory as this notebook. I have my credentials saved already, so I'll import those over from my .env file and save them in a dictionary called credentials.
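A minimal sketch of this setup step, assuming common package names (langchain, langchain-ibm, langchain-community, langchain-chroma, python-dotenv) and assumed environment variable names; the exact packages and variable names in the notebook may differ:

```python
# Install the packages if needed (exact package list is an assumption):
# pip install langchain langchain-ibm langchain-community langchain-chroma python-dotenv

import os
from dotenv import load_dotenv

# Load credentials from a .env file in the same directory as the notebook.
# Variable names below are assumptions for illustration.
load_dotenv()

credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",  # example watsonx endpoint
    "apikey": os.getenv("WATSONX_APIKEY"),
    "project_id": os.getenv("WATSONX_PROJECT_ID"),
}
```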

Okay, now we can get started with the RAG workflow. First, we'll gather the information from some IBM.com URLs to create a knowledge base as a vector store. Let's establish a URLs dictionary. It's a Python dictionary that helps us map the 25 URLs from which we will be getting the content. You can see at the top here I have the article about the UFC and IBM partnership I asked about before. Let's also set up a name for our collection: Ask IBM 2024.
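A sketch of what that mapping might look like; the URLs below are placeholders, not the actual article links from the notebook:

```python
# Placeholder entries -- the real notebook maps 25 IBM.com article URLs
urls = {
    "ufc_announcement": "https://www.ibm.com/<path-to-ufc-announcement>",
    "watsonx_data": "https://www.ibm.com/<path-to-watsonx-data-page>",
}

# Collection name for the vector store ("Ask IBM 2024");
# Chroma collection names can't contain spaces, so a slug is used here
collection_name = "askibm_2024"
```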

Next, let's load our documents using LangChain's web-based loader for the list of URLs we have. Loaders load in data from a source and return a list of documents. We'll print the page content of a sample document at the end to see how it's been loaded. It can take a little while for it to finish loading. And here's a sample document. Based on the sample document, it looks like there's a lot of white space and newline characters that we can get rid of. Let's clean that up with this code. Let's see how our sample document looks now, after we've cleaned it up.
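A sketch of the loading and cleanup steps, assuming the "web based loader" refers to LangChain's WebBaseLoader; the cleanup regex is one simple way to strip the extra whitespace, not necessarily the exact code from the notebook:

```python
import re
from langchain_community.document_loaders import WebBaseLoader

# Load every URL into a list of LangChain Document objects
loader = WebBaseLoader(list(urls.values()))
docs = loader.load()
print(docs[0].page_content[:500])  # inspect a sample document

# Collapse runs of whitespace and newline characters into single spaces
for doc in docs:
    doc.page_content = re.sub(r"\s+", " ", doc.page_content).strip()
```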

Great, we've removed the white space successfully. Before we vectorize our content, we need to split it up into smaller, more manageable pieces known as chunks. LangChain's recursive character text splitter takes a large text and splits it based on a specified chunk size, meaning the number of characters. In our case, we're going to go with a chunk size of 512. Next, we need to instantiate an embedding model to vectorize our content. In our case, we'll use IBM's Slate model.
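A sketch of the chunking and embedding setup; the Slate model ID and the chunk overlap value are assumptions (IBM's retriever-tuned Slate models on watsonx have IDs along the lines of ibm/slate-125m-english-rtrvr):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ibm import WatsonxEmbeddings

# Split the cleaned documents into chunks of up to 512 characters
# (overlap is not specified in the video; 0 is used here for illustration)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=0)
chunked_docs = text_splitter.split_documents(docs)

# Embedding model that will vectorize each chunk (model ID is an assumption)
embeddings = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=credentials["project_id"],
)
```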

And to finish off this step, let's load our content into a local instance of a vector database using Chroma. We'll call it vectorstore. The documents in the vector store will be made up of the docs we just chunked, and they'll be embedded using the IBM Slate model. For step two, we'll set up our vector store as a retriever. The retrieved information from the vector store, the content from the URLs, serves as additional context that the LLM will use to generate a response later in step four. Code-wise, all we need to do is set up our vector store as a retriever.
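A sketch of both pieces using Chroma's LangChain integration (the import path may be langchain_chroma or langchain_community.vectorstores depending on your LangChain version):

```python
from langchain_chroma import Chroma

# Build a local Chroma vector store from the chunked documents,
# embedded with the IBM Slate model instantiated above
vectorstore = Chroma.from_documents(
    documents=chunked_docs,
    embedding=embeddings,
    collection_name=collection_name,
)

# Step two: expose the vector store as a retriever for similarity search
retriever = vectorstore.as_retriever()
```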

For step three, we'll set up our generative LLM. The generative model will use the retrieved information from step two to produce a relevant response to our questions. First, we'll establish which LLM we're going to use to generate the response. For this tutorial, we'll use an IBM Granite model. Next, we'll set up the model parameters. The model parameters available and what they mean can be found in the description of this video. And finally, in this step, we instantiate the LLM using WatsonX.
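A sketch of the LLM setup with WatsonxLLM from langchain_ibm; the Granite model ID and the parameter values shown are assumptions, not the exact settings from the notebook:

```python
from langchain_ibm import WatsonxLLM

# Generation parameters (example values -- tune as needed)
parameters = {
    "decoding_method": "greedy",
    "max_new_tokens": 500,
    "min_new_tokens": 1,
    "repetition_penalty": 1.0,
}

# Instantiate the Granite model on watsonx (model ID is an assumption)
llm = WatsonxLLM(
    model_id="ibm/granite-13b-chat-v2",
    url=credentials["url"],
    apikey=credentials["apikey"],
    project_id=credentials["project_id"],
    params=parameters,
)
```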

In step four, we'll set up our prompt, which will combine our instructions, the search results from step two, and our question to provide context to the LLM we just instantiated in step three. First, let's set up instructions for the LLM. We'll call it template, because we'll also set up our prompt using a prompt template and our instructions. Let's also set up a helper function to format our docs to differentiate between individual page content.
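A sketch of the prompt setup; the instruction text below is illustrative, not the exact wording from the notebook:

```python
from langchain_core.prompts import PromptTemplate

# Illustrative instructions -- the notebook's actual template text may differ
template = """Use the following context to answer the question.
If you don't know the answer from the context, say so.

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Helper to join retrieved documents into one context string,
# separating the individual page contents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
```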

Finally, as part of this step, we can set up a RAG chain with the search results from our retriever, our prompt, our helper function, and our LLM. Then, in steps five and six, we can ask the LLM questions about our knowledge base. The generative model will process the augmented context along with the user's question to produce a response. First, let's ask our initial question: tell me about the UFC announcement from November 14, 2024. On November 14, 2024, IBM and UFC announced a groundbreaking partnership. And it looks like the model was able to answer our question this time, since it received the context from the UFC article we fed it. Next, let's ask about WatsonX Data.
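A sketch of the chain assembly and the first query using LangChain's runnable (LCEL) syntax, which is one common way to wire these pieces together:

```python
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Retrieved docs are formatted into {context}; the raw question fills {question}
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Steps five and six: query the knowledge base through the chain
response = rag_chain.invoke(
    "Tell me about the UFC announcement from November 14, 2024"
)
print(response)
```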

What is WatsonX Data? WatsonX Data is a service offered by IBM that enables users to connect to various data sources and manage metadata for creating data products. Looks good. And finally, let's ask about WatsonX AI. What does WatsonX AI do? WatsonX AI is a comprehensive AI platform that enables users to build, deploy and manage AI applications. The model was also able to respond to our WatsonX AI question. Feel free to experiment with even more questions about the IBM offerings and technologies discussed in the 25 articles you loaded into the knowledge base.

ARTIFICIAL INTELLIGENCE, TECHNOLOGY, INNOVATION, LANGCHAIN, IBM, RETRIEVAL AUGMENTED GENERATION, IBM TECHNOLOGY