We demonstrate the integration of Ray with CodeFlare and Red Hat OpenShift Data Science Pipelines (RHODS Pipelines) to automatically scale the execution of end-to-end workflows that train and validate foundation models on OpenShift Container Platform (OCP). Workflow pipelines in foundation model development typically involve several preprocessing steps: deduplicating data sources, filtering out biased and low-quality data, and removing hateful and profane content. The cleaned data are then tokenized and used to further train or fine-tune existing generative pre-trained models. Autoscaling is critical when executing this workflow because these steps are usually very compute-intensive, and some of them are iterated several times. RHODS Pipelines is a tool for specifying workflow pipelines as directed acyclic graphs (DAGs). It uses Tekton as the workflow engine to deploy pods that execute the workflow DAG in a Kubernetes cluster. However, RHODS Pipelines with Tekton gives the user no way to automatically scale up a task in the DAG across parallel pods. CodeFlare is a tool that creates the configurations needed to deploy a Ray cluster on OCP and to submit a task programmed with Ray for parallel execution. We explore the integration of Ray with CodeFlare and RHODS Pipelines so that individual developers can easily specify, independently manage, and automatically scale up the entire end-to-end workflow DAG, or any subset of it. We will show foundation model use cases that benefit from a simple interface for providing specific parameters and specifying the DAG in RHODS Pipelines. The tool then generates all the configurations and artifacts needed to run foundation model workflows effectively in parallel with Ray on OpenShift.
Dr. Yuan-Chi Chang is a Distinguished Research Staff Member at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York. He has had a prolific career spanning patents, technical publications, and contributions to IBM software products, including the first-of-a-kind real-time glucose monitoring analytics solution. More recently, he contributed a new feature to Ray Workflows for receiving and processing asynchronous events. His areas of interest include distributed processing, data management, and workflow pipelines. Dr. Chang received his Ph.D. from the University of California, Berkeley.
As a software engineering manager at Red Hat, Alex leads the development of data science experimentation and distributed computation in the Red Hat OpenShift Data Science product and in Open Data Hub, its upstream open source community. Alex also co-leads the development of the CodeFlare project, a suite of tools that minimizes the effort needed to scale AI and ML on Kubernetes.