Serving many models is now essential, driven by diverse business needs and customized use cases. This raises the challenge of deploying and managing those models efficiently while balancing ease of use and cost-effectiveness. This talk surveys common patterns for serving many models with Ray Serve. We will delve into how three Ray Serve features (model composition, multi-application, and model multiplexing) enable seamless deployment of numerous models while optimizing resource utilization.
Takeaways:
• Discuss common industry patterns for serving many models.
• Learn how to simplify management and enhance performance of many-model serving through Ray Serve's model composition, multi-application, and model multiplexing features.
• Deep dive into case studies of Ray Serve users running many-model applications in production.
Sihan is a software engineer at Anyscale and a contributor to Ray Serve. Before joining Anyscale, he was a software engineer at Pinterest, working on the ML inference service.
Jon Park is a Principal ML Engineer at Clari, where he leads ML Platform. Previously, he was Director of API Platform Engineering at TIBCO. He holds a BA in Computer Science and an MBA from UC Berkeley, and a Master of Computer Science in Data Science from UIUC.
Cindy Zhang is a software engineer focusing on Ray Serve and Ray infrastructure at Anyscale.