Some of the most demanding ML use cases involve pipelines that span both CPU and GPU devices in distributed environments. This situation arises most frequently in batch inference, where a CPU-intensive preprocessing stage (e.g., video decoding or image resizing) precedes prediction with a GPU-intensive model. It also arises in distributed training, where similar CPU-heavy transformations are needed to prepare or augment the dataset before GPU training. In this talk, we examine how Ray Data streaming works and how to use it in your own machine learning pipelines to address these common workloads, utilizing all your compute resources (CPUs and GPUs) at scale.
Takeaways
• Ray Data streaming is the new execution strategy for Ray Data in Ray 2.6
• Ray Data streaming scales data preprocessing for training and batch inference to heterogeneous CPU/GPU clusters
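As a rough sketch of the kind of pipeline the talk covers, the snippet below chains a CPU preprocessing stage and a GPU inference stage with Ray Data's map_batches. The bucket path and ResNet-50 model are placeholders, and the num_gpus/ActorPoolStrategy settings reflect the Ray 2.6-era API; treat it as an illustration under those assumptions, not the talk's exact code.

import numpy as np
import ray

# Placeholder input path; read_images' size= option resizes on load (CPU work).
ds = ray.data.read_images("s3://my-bucket/images/", size=(224, 224))

# Stage 1: CPU-intensive preprocessing, run as stateless tasks across CPU cores.
def normalize(batch: dict) -> dict:
    batch["image"] = batch["image"].astype(np.float32) / 255.0
    return batch

ds = ds.map_batches(normalize, batch_size=64)

# Stage 2: GPU-intensive inference, run on a pool of long-lived GPU actors.
class Predictor:
    def __init__(self):
        from torchvision.models import resnet50
        # Placeholder model, loaded once per actor and pinned to its GPU.
        self.model = resnet50().cuda().eval()

    def __call__(self, batch: dict) -> dict:
        import torch
        with torch.inference_mode():
            # Convert NHWC image batches to NCHW tensors on the GPU.
            images = torch.as_tensor(batch["image"]).permute(0, 3, 1, 2).cuda()
            batch["preds"] = self.model(images).cpu().numpy()
        return batch

ds = ds.map_batches(
    Predictor,
    batch_size=64,
    num_gpus=1,                                  # one GPU per actor
    compute=ray.data.ActorPoolStrategy(size=4),  # Ray 2.6-era actor-pool syntax
)

# Consuming the dataset drives the streaming executor: batches flow through the
# CPU and GPU stages concurrently instead of materializing each stage in full.
for batch in ds.iter_batches(batch_size=64):
    pass

Note the division of labor: the stateless function runs as tasks that fan out across CPU cores, while the class-based UDF runs as a pool of GPU actors, which is what lets the two stages scale independently on a heterogeneous cluster.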
Eric Liang is a software engineer at Anyscale and tech lead for the Ray open source project. He is interested in building reliable and performant distributed systems. Before joining Anyscale, Eric was a staff engineer at Databricks and received his PhD from UC Berkeley.