RAG - an old idea in a new coat
While LLMs in their current form are novel, the patterns around them are not.
The term “Retrieval-Augmented Generation” (RAG) was coined in the 2020 paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks”. The idea: augment the LLM’s generation with better context by retrieving relevant data before letting the language model reason over it.
The retrieval side of this idea, however, is much older.
If you look at how recommender systems work, you will find three steps:
Candidate Generation
From the corpus of all potentially relevant data, which can span billions of documents, this step quickly filters down to a more manageable set.
Scoring
Because scoring functions are usually more expensive than candidate generation, scoring runs only on this reduced subset of candidates.
Re-ranking
From here, we re-rank and clean up the results. In YouTube’s case, that might mean mixing a few different channels and categories rather than showing results from a single channel for a search.
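The three steps above can be sketched in a few lines. This is a minimal, illustrative pipeline, not any production system: the corpus, the cheap precomputed feature, and the per-channel diversity rule are all invented here to make the stages concrete.

```python
# Hypothetical corpus: (doc_id, channel, cheap_precomputed_score) tuples.
CORPUS = [
    ("v1", "cooking", 0.9),
    ("v2", "cooking", 0.8),
    ("v3", "travel", 0.7),
    ("v4", "travel", 0.2),
    ("v5", "music", 0.6),
]

def candidate_generation(corpus, k):
    """Cheap filter: keep the top-k items by a precomputed feature.
    In practice this is an ANN index or inverted index over billions of docs."""
    return sorted(corpus, key=lambda d: d[2], reverse=True)[:k]

def score(doc):
    """Stand-in for an expensive model (e.g. a cross-encoder);
    here it just reuses the cheap feature."""
    return doc[2]

def rerank(scored, per_channel=1):
    """Diversify: keep at most `per_channel` results from each channel."""
    seen, out = {}, []
    for doc, s in sorted(scored, key=lambda x: x[1], reverse=True):
        if seen.get(doc[1], 0) < per_channel:
            out.append(doc)
            seen[doc[1]] = seen.get(doc[1], 0) + 1
    return out

candidates = candidate_generation(CORPUS, k=4)   # billions -> a manageable set
scored = [(d, score(d)) for d in candidates]     # expensive model, few docs
results = rerank(scored, per_channel=1)          # diversity / cleanup
```

The point of the shape, not the toy logic: each stage trades recall for cost, so the expensive work only ever touches a small slice of the corpus.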
All these patterns are very domain-specific and can vary a lot depending on the use case.
What can we learn from recommender systems for building RAG systems? What worked well in the past, and what didn’t?