RAG - Retrieval Augmented Generation

LLMs are capable, but you may sometimes run into situations where an LLM gives irrelevant or outdated information in its response.

To address this issue, you could fine-tune your pre-trained (base) model on relevant, valid information. However, this process is similar to training a model from scratch and is time-consuming.

This is where RAG comes in.

RAG

RAG, or Retrieval Augmented Generation, retrieves the needed information from valid external sources such as webpages, documents, and images, instead of relying solely on fine-tuned models. This helps you get more comprehensive and accurate outputs.

How does it work?

Here is a breakdown of how it works:

  • Your external information is broken into many chunks, and each chunk is converted into a context-aware embedding.

  • These embeddings are stored in a database called a vector database. This database is organised so that it allows efficient retrieval of relevant information for a given query.

  • When a user enters a query, RAG searches the vector database to find passages that are contextually relevant to the user's query.

  • Once these passages are identified, they are provided to your LLM, which generates an answer from that context.
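
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup, and it rests on loud assumptions: the `embed` function is a toy bag-of-words counter standing in for a real context-aware embedding model, `InMemoryVectorStore` is a hypothetical in-memory list standing in for a real vector database, and the final prompt is only printed rather than sent to an LLM. All function and class names are made up for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=12):
    """Step 1a: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Step 1b: toy embedding (word counts). A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Step 2: hypothetical stand-in for a vector database."""

    def __init__(self):
        self.entries = []  # list of (embedding, chunk) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk), chunk))

    def search(self, query, top_k=1):
        """Step 3: return the top_k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine_similarity(q, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(query, passages):
    """Step 4: combine the retrieved passages and the query into one LLM prompt."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
]

store = InMemoryVectorStore()
for doc in documents:
    for chunk in chunk_text(doc):
        store.add(chunk)

query = "Who created the Python language?"
passages = store.search(query, top_k=1)
prompt = build_prompt(query, passages)
print(prompt)  # in a real pipeline, this prompt would be sent to the LLM
```

In a real pipeline you would likely swap `embed` for an embedding model and `InMemoryVectorStore` for a dedicated vector database, while the overall flow (chunk, embed, store, search, prompt) stays the same.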