In this video we take a cursory look at the various techniques you can employ to get better results from large language models.
Today, I want to talk about a plot that I found online. It was published back in May on Twitter by Andrej Karpathy, one of the founding engineers of OpenAI. The plot lays out various techniques for improving the performance of a large language model (LLM). I'll go over those techniques and see which ones are easily applicable today and which ones may need more work.
The plot, which you can find on Andrej's Twitter account, shows the effort required to apply each technique on the X-axis and the resulting task accuracy on the Y-axis. The red line represents a small base model, while the blue line represents a big base model, typically above a billion parameters.
The easiest way to improve model performance is prompt engineering: iterating on the wording and structure of your prompts to get the model to respond better. For big base models, this alone can already yield good results.
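To make this concrete, here is a minimal sketch of what prompt iteration can look like in code. The prompt variants and the idea of a "structured" template are my own illustration, not something from the plot; the actual model call is omitted, since it depends on whichever LLM API you use.

```python
# A minimal sketch of prompt engineering: the same question phrased as
# different prompt variants. The variants here are invented examples.

def build_prompt(question: str, variant: str = "plain") -> str:
    """Return one of several prompt variants for the same question."""
    if variant == "plain":
        return question
    if variant == "structured":
        # Adding a role, explicit instructions, and an output format
        # often improves answers, especially from larger base models.
        return (
            "You are a precise technical assistant.\n"
            f"Question: {question}\n"
            "Answer in at most two sentences."
        )
    raise ValueError(f"unknown variant: {variant}")

prompt = build_prompt("What is gradient descent?", variant="structured")
```

In practice you would send each variant to the model, compare the answers on a handful of test questions, and keep the template that performs best.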
Another technique is few-shot prompting, where you show the model what a good answer looks like by including worked examples directly in the prompt. This can guide the model to respond more accurately.
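A few-shot prompt is just a prompt with examples prepended, so it can be sketched in a few lines. The sentiment-classification task and the example pairs below are invented for illustration.

```python
# A minimal sketch of a few-shot prompt: worked examples are prepended so
# the model can infer the task and output format. Examples are invented.

EXAMPLES = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]

def few_shot_prompt(text: str) -> str:
    # Render each example in the same format the model should follow,
    # then leave the final label blank for the model to fill in.
    shots = "\n".join(
        f"Review: {review}\nSentiment: {label}" for review, label in EXAMPLES
    )
    return f"{shots}\nReview: {text}\nSentiment:"

prompt = few_shot_prompt("Great service, would order again!")
```

Because the examples establish the pattern, the model is nudged to answer with a single label rather than free-form prose.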
Retrieval augmented generation (RAG) is a more complex technique that involves building a system around the model, including a query engine and a vector database. RAG allows the model to respond to questions with data it has not seen before, such as from a private knowledge base.
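The core of RAG is the retrieval step: embed the query, find the most similar documents, and paste them into the prompt as context. Real systems use learned embeddings and a vector database; in this self-contained sketch a bag-of-words vector and cosine similarity stand in for both, and the documents are invented.

```python
import math
import re
from collections import Counter

# Toy "knowledge base" of private documents the model has never seen.
DOCS = [
    "Our refund policy allows returns within 30 days.",
    "The warehouse ships orders every weekday morning.",
    "Support is available by email around the clock.",
]

def embed(text: str) -> Counter:
    # Bag-of-words counts stand in for a learned embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list:
    # Rank documents by similarity to the query; a vector database
    # would do this lookup at scale.
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )

prompt = rag_prompt("How many days do I have to return an order?")
```

The prompt now carries the relevant document, so the model can answer from data that was never in its training set.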
Finally, fine-tuning is the most effort-intensive and complex technique, requiring training pipelines and the skills to retrain the model on your own data. It's not recommended as a first step toward improving model performance.
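Even before the training pipeline, fine-tuning requires your data in a format the pipeline can consume. A common choice is JSONL with one prompt/completion pair per line; the field names vary by provider and are an assumption here, as are the example pairs.

```python
import json

# A minimal sketch of preparing fine-tuning data as JSONL. Field names
# ("prompt"/"completion") and the example pairs are illustrative only.

pairs = [
    ("Summarize: the meeting covered Q3 results.", "Q3 results were discussed."),
    ("Summarize: the team shipped version 2.0.", "Version 2.0 was released."),
]

def to_jsonl(pairs) -> str:
    # One JSON object per line, the usual interchange format for
    # fine-tuning datasets.
    return "\n".join(
        json.dumps({"prompt": p, "completion": c}) for p, c in pairs
    )

dataset = to_jsonl(pairs)
```

This file would then be fed to whatever training pipeline you use; the pipeline itself, GPUs included, is where most of the effort in the plot comes from.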
In conclusion, prompt engineering and few-shot prompting can significantly improve model performance without any additional infrastructure or tooling, while RAG and fine-tuning require more effort and resources. The plot is a useful map of that trade-off.