At JPMorgan Chase, we need robust forecasts of market prices and other time-series to provide high-quality financial services to our clients. However, forecasting financial time-series is difficult because of the richness of the systems involved and their low signal-to-noise ratio. Furthermore, the underlying drivers (e.g., inflation) are often non-stationary. In this work, we present the concept of probabilistic forecasting at scale powered by Ray, and we apply this technique to improve time-series models for multiple use cases relevant to the finance industry.
When forecasting time-series, one must ask: "Is the future likely to reflect this exact version of history?" Or can we instead take random subsets of the past and build a probabilistic model that forecasts the future from multiple slices of past data (i.e., a multi-verse approach)? This kind of sampling, called back-testing, reduces the influence of outlier data points while remaining representative of history, but it requires far more computation. This is why we scale our project with Ray: probabilistic forecasting and non-stationarity both demand large-scale compute and distributed ML model development.
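The multi-verse idea can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' platform: `slice_forecast` and `multiverse_forecast` are illustrative names, each random slice of history is fit with a deliberately naive drift model, and the standard library's `ThreadPoolExecutor` stands in for Ray so the sketch runs anywhere.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def slice_forecast(series, start, window, horizon):
    """Forecast `horizon` steps past the last observation using the average
    step change seen in one historical slice (hypothetical drift model)."""
    train = series[start:start + window]
    steps = [b - a for a, b in zip(train, train[1:])]
    drift = statistics.fmean(steps)
    # Project the most recent value forward with this slice's drift.
    return series[-1] + drift * horizon


def multiverse_forecast(series, n_slices=200, window=50, horizon=5, seed=0):
    """Draw random historical slices, forecast from each one in parallel, and
    summarize the resulting empirical forecast distribution."""
    if window < 2 or window > len(series):
        raise ValueError("window must be between 2 and len(series)")
    rng = random.Random(seed)
    starts = [rng.randrange(len(series) - window + 1) for _ in range(n_slices)]
    # Each slice is independent, so the work parallelizes trivially.
    with ThreadPoolExecutor() as pool:
        forecasts = list(pool.map(
            lambda s: slice_forecast(series, s, window, horizon), starts))
    cuts = statistics.quantiles(forecasts, n=20)  # 19 cut points in 5% steps
    return {
        "p05": cuts[0],
        "median": statistics.median(forecasts),
        "p95": cuts[-1],
        "samples": forecasts,
    }


# Usage on a synthetic random walk with a small upward drift.
gen = random.Random(42)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] + gen.gauss(0.05, 1.0))

result = multiverse_forecast(prices)
print(f"5%: {result['p05']:.2f}  median: {result['median']:.2f}  95%: {result['p95']:.2f}")
```

Because every slice is independent, the same pattern maps onto Ray: decorate `slice_forecast` with `@ray.remote`, place the series in the object store once with `ray.put`, launch one task per slice with `slice_forecast.remote(...)`, and gather the results with `ray.get(...)`. The spread between the 5% and 95% values is what turns a single point forecast into a probabilistic one.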
Potential use cases for this work include forecasting stock, commodity, and energy prices, interest rates, exchange rates, and a large variety of other tradable assets. The forecasts could help traders and investors make better-informed decisions about buying and selling assets, about modeling other assets that depend on these time-series, and about managing risk. These use cases can be extended to different forecast horizons. A short-term model might be appropriate for day trading, a medium-term model might inform asset selection in a portfolio, and a long-term forecast might help with stock valuation or asset construction decisions (e.g., wind farms needed by 2050).
Our research team has created a platform for large-scale, ML-based time-series forecasting for the above use cases. The backbone of the platform is Ray and its ability to efficiently distribute large-scale computations in the cloud on Kubernetes. This JPMorgan Chase platform is used internally for probabilistic regression, feature engineering, feature selection, hyper-parameter optimization, and the computation of probabilistic analysis metrics, with scaling powered by Ray.
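As an illustration of what one probabilistic analysis metric can look like, here is a minimal sketch of the pinball (quantile) loss, a standard score for quantile forecasts. The abstract does not say which metrics the platform computes; `pinball_loss` is a hypothetical helper that takes plain Python lists.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss of forecasts `y_pred` for quantile level `tau`.

    Under-forecasts are penalized with weight `tau` and over-forecasts with
    weight `1 - tau`, so a well-calibrated 90% quantile forecast should sit
    above roughly 90% of the observations.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q  # positive when the forecast is too low
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


# Score a 90% quantile forecast against three observations.
print(pinball_loss([10.0, 12.0, 9.0], [11.0, 11.0, 11.0], tau=0.9))
```

At `tau = 0.5` the loss equals half the mean absolute error; averaging it over several quantile levels scores a whole forecast distribution rather than a single point.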
Distributed time-series forecasting, scaled with Ray, has the potential to drive significant improvements in efficiency, profitability, and sustainability.
Authors:
Peyman Tavallali and Savinay Narendra are Vice Presidents and Applied AI ML Leads at JPMorgan Chase's Machine Learning Center of Excellence.
Berowne D Hlavaty is an Executive Director in JPMorgan Chase's Big Data & AI Strategies Research team.
Peyman Tavallali joined JPMC in May 2022 and has since focused on probabilistic regression and forecasting, time-series analysis, ML uncertainty quantification, explainable ML, and distributed computation. More broadly, Peyman uses applied ML and applied mathematics to integrate into and provide value to JPMC.
Before joining JPMC, Peyman worked at NASA's Jet Propulsion Laboratory (JPL), where he led and contributed to many of JPL's and NASA's cutting-edge technology projects using ML and applied mathematics. He also led numerous ML uncertainty quantification efforts for a range of use cases at JPL.