There are a lot of different moving pieces when it comes to developing and serving LLM applications. This talk will provide a comprehensive guide for developing retrieval augmented generation (RAG) based LLM applications — with a focus on scale (embed, index, serve, etc.), evaluation (component-wise and overall) and production workflows. We’ll also explore more advanced topics such as hybrid routing to close the gap between OSS and closed LLMs.
Takeaways:
• Evaluating RAG-based LLM applications is crucial for identifying and productionizing the best configuration.
• Scaling your LLM application's workloads requires minimal changes to existing code.
• Mixture of Experts (MoE) routing allows you to close the gap between OSS and closed LLMs.
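As a rough illustration of the routing takeaway above — this is a sketch, not code from the talk; the `score_query_difficulty` heuristic, the threshold, and the backend names are all hypothetical stand-ins for a trained router:

```python
# Hypothetical sketch of hybrid routing between an OSS and a closed LLM.
# A lightweight scorer estimates how well the OSS model is likely to handle
# a query; harder queries fall through to the closed model.

def score_query_difficulty(query: str) -> float:
    """Toy difficulty heuristic: longer, multi-part queries score higher.
    In practice this would be a trained classifier, not word counting."""
    parts = query.count("?") + query.count(";") + 1
    return min(1.0, len(query.split()) / 50 + 0.2 * (parts - 1))

def route_query(query: str, threshold: float = 0.5) -> str:
    """Return which backend should serve the query."""
    if score_query_difficulty(query) < threshold:
        return "oss-llm"    # e.g. an open-weights model served in-house
    return "closed-llm"     # e.g. a hosted proprietary API
```

The design point is that the router, not the application, decides per query which model to use, so the cheaper OSS model absorbs easy traffic while the closed model backstops quality.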
Philipp Moritz is one of the creators of Ray, an open source system for scaling AI. He is also co-founder and CTO of Anyscale, the company behind Ray. He is passionate about machine learning, artificial intelligence and computing in general and strives to create the best open source tools for developers to build and scale their AI applications.
Goku works on education, engineering and product at Anyscale Inc. He was the founder of Made With ML, a platform that educates data scientists and MLEs on MLOps first principles and production-grade implementation. He has worked as a machine learning engineer at Apple and was an ML lead at Ciitizen (a16z health) prior to that.
Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.