The popularity of machine learning (ML) in the real world has exploded recently, with offline batch inference emerging as one of the primary workloads. Yet existing production ML systems fall short for this workload in both scale and simplicity. To address this, the Ray community has built Ray Data, an open-source library for large-scale data processing for ML applications. In this talk we'll discuss:
• How to use Ray Data to run efficient inference with a pretrained model over terabytes of data
• Why traditional data processing tools are difficult to use, expensive, and inefficient, particularly for modern deep learning applications
• How Ray Data makes it easy to leverage modern ML models, several times faster and cheaper than other common solutions like Spark or SageMaker
• How and why offline inference is useful with LLMs and when building LLM applications
• A demonstration of an end-to-end offline batch inference use case with Ray Data
Takeaways:
• Ray Data is a best-in-class solution for offline batch inference and processing, particularly when working with unstructured data and deep learning models
• Why a performant batch inference solution is important for LLM workloads, and how Ray Data can help
• User success stories of running batch inference with Ray Data
Amog Kamsetty is a software engineer at Anyscale where he works on building distributed ML libraries and integrations on top of Ray. He is one of the lead developers of Ray's distributed training and offline batch inference libraries.
Balaji Veeramani is a software engineer at Anyscale, where he works on open-source libraries built on Ray. Before joining Anyscale, he was a student at UC Berkeley.
Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.