Large Language Models (LLMs) are beginning to revolutionize how users search for, interact with, and generate content. A number of stacks and toolkits built around Retrieval-Augmented Generation (RAG) have recently emerged, enabling users to build applications such as chatbots that use LLMs over their own private data.
However, while setting up a naive RAG stack is straightforward, making these applications production-ready requires software engineers to address a long tail of evaluation and scalability challenges.
This hands-on training will guide you through using LlamaIndex and Ray to implement reliable evaluation methods for LLMs. You'll learn how to design experiments that optimize key application components and use scalable evaluation workflows to quantitatively compare them.
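To give a flavor of what such an evaluation workflow can look like, here is a minimal sketch that fans several RAG configurations (different chunk sizes) out as parallel Ray tasks and scores each one with a LlamaIndex evaluator. The data directory, evaluation questions, and chunk sizes are placeholders, and the import paths assume LlamaIndex 0.10+ with a default LLM configured; the training itself goes deeper into metrics and larger-scale runs.

```python
# Sketch: compare RAG configurations in parallel with Ray, scoring each with a
# LlamaIndex evaluator. Paths, questions, and chunk sizes below are illustrative.
import ray
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.evaluation import FaithfulnessEvaluator

# Hypothetical evaluation questions over your private data.
EVAL_QUESTIONS = [
    "What does the service-level agreement cover?",
    "How is data retention configured?",
]


@ray.remote
def evaluate_config(chunk_size: int) -> dict:
    """Build an index with the given chunk size and score it on the eval questions."""
    documents = SimpleDirectoryReader("./data").load_data()
    splitter = SentenceSplitter(chunk_size=chunk_size)
    index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
    query_engine = index.as_query_engine()

    # Uses the default LLM (e.g., OpenAI via OPENAI_API_KEY) unless one is passed in.
    evaluator = FaithfulnessEvaluator()
    passing = 0
    for question in EVAL_QUESTIONS:
        response = query_engine.query(question)
        result = evaluator.evaluate_response(query=question, response=response)
        passing += int(bool(result.passing))
    return {"chunk_size": chunk_size, "pass_rate": passing / len(EVAL_QUESTIONS)}


if __name__ == "__main__":
    ray.init()
    # Fan the experiments out across the Ray cluster and compare the results.
    futures = [evaluate_config.remote(size) for size in (256, 512, 1024)]
    for scores in ray.get(futures):
        print(scores)
```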
To conclude the training, you'll learn how to take the best-performing configuration into production, and we'll discuss the path toward continuous learning.