KServe is an open-source, production-ready model inference framework on Kubernetes that leverages many Knative features, such as canary traffic routing and payload logging. However, the one-model-per-container paradigm limits concurrency and throughput when serving multiple inference requests. With Ray Serve integration, a model can be deployed as individual Python workers, allowing concurrent inference requests to be processed in parallel and improving overall throughput. In this talk, we will share how you can configure, run, and scale machine learning models on Kubernetes using KServe and Ray.
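The parallelism idea can be sketched with plain Python. This is a toy stand-in for Ray Serve replicas, not the actual KServe or Ray Serve API: a pool of workers handles requests concurrently instead of a single serving loop handling them one at a time.

```python
from concurrent.futures import ThreadPoolExecutor

def predict(x):
    # Stand-in for a model's inference call on one request.
    return x * 2

requests = list(range(4))

# Single-worker paradigm: requests are handled one at a time.
sequential = [predict(x) for x in requests]

# Worker pool: requests are dispatched to parallel workers,
# analogous to multiple Ray Serve replicas behind one endpoint.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(predict, requests))

print(parallel)  # → [0, 2, 4, 6]
```

In a real deployment, the pool of workers would be Ray Serve replicas scaled across the cluster, with KServe handling routing, autoscaling, and the inference protocol.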
Ted Chang is a software engineer in the IBM Cognitive Open Technologies Group, focusing on software development in the MLOps and data/AI space. Lately, he has been focusing on Kubeflow, KServe, and Flink.
James Busche is a senior software engineer in the IBM Open Technologies Group, currently focused on the open-source CodeFlare project. Previously, he was a DevOps cloud engineer for IBM Watson and the worldwide Watson Kubernetes deployments.
Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.