Community Blog

Scaling Financial Machine Learning with MLOps

Written by William Arias | 7/29/24 4:18 PM

In the financial services landscape, where payments, transfers, trades, and countless other transactions take place every day, data sits at the financial system’s core and enables its operation. And where there is data, there is room to leverage models that learn from it and help optimize operations. Machine learning models have become critical tools for tasks such as fraud detection, customer segmentation, sentiment analysis, and risk assessment, to name a few. Creating these models has become easier in recent years. However, the same can’t be said about deploying and maintaining them in production environments. This is where MLOps (Machine Learning Operations) comes into play, offering a set of practices and, hopefully, standards that combine machine learning, DevSecOps, and data engineering to develop, deploy, and maintain ML models more reliably and efficiently.

Responding to changes in financial machine learning systems

Consider a typical use case in finance: a credit card fraud detection model. The initial development and deployment of such a model can be straightforward, as the simplified workflow below illustrates:

Simplified Workflow

Reality hits after the model has been deployed. After all, the fun part of ML starts when the model faces real-world scenarios, and these, far more often than not, introduce challenges such as:

  • Finance domain knowledge experts might identify inaccuracies in probability estimates due to feature interactions. Therefore, more feature engineering is needed.
  • New credit card transaction data was annotated and made available, and the model may need retraining.
  • The model is not performing as expected, and different teams may need to iterate on its testing and development or try different models quickly.

The scenarios above raise new requirements and questions:

  • How can ML systems respond to changes quickly?
  • Is there a way to speed up new model development, experimentation, and deployment?
  • How can compliance be ensured throughout the process?

To address these challenges, financial institutions need a robust MLOps strategy. Let's break down those challenges into use cases:

  • Enhancing experiment reproducibility

A baseline step to enhance reproducibility is having a common and standard experiment environment for all data scientists and machine learning engineers to run their experiments. A standard data science environment ensures all team members use the same software dependencies. A way to achieve this is by building a container image with all the respective dependencies under version control and re-pulling it every time a new version of the code or data is available. This process is illustrated in the figure below:

Introducing code or data changes to the model project triggers GitLab CI to build the container image the team has agreed on, where those changes can be tested and evaluated automatically. Team members can then reproduce experiments reliably by pulling the same image in which the testing and evaluation were done.
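As a rough sketch of that process, a build job in .gitlab-ci.yml could rebuild and push the shared experiment image whenever code, dependencies, or tracked data change. The image name, the Dockerfile, and the paths listed under rules:changes are assumptions that would vary per project:

build-experiment-image:
  stage: build
  image: docker:24.0
  services:
    - docker:24.0-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  rules:
    # Rebuild only when dependencies, code, or tracked data change (paths are illustrative)
    - changes:
        - Dockerfile
        - requirements.txt
        - src/**/*
        - data/**/*
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    # Tag with the commit SHA so an experiment can pin an exact environment, plus a moving latest tag
    - docker build -t "$CI_REGISTRY_IMAGE/experiment-env:$CI_COMMIT_SHORT_SHA" -t "$CI_REGISTRY_IMAGE/experiment-env:latest" .
    - docker push "$CI_REGISTRY_IMAGE/experiment-env:$CI_COMMIT_SHORT_SHA"
    - docker push "$CI_REGISTRY_IMAGE/experiment-env:latest"

Team members then pull the same image locally, or in later pipeline jobs, to run their experiments against identical dependencies.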



  • Iterating faster on machine learning experimentation

ML models require a decent amount of experimentation. Embracing developer workflows in the development of learning algorithms can help improve model traceability and collaboration. Consider the following scenario:

A machine learning engineer is working on an issue that requires them to split the credit card transactions dataset into training and testing sets to re-evaluate the model's performance:

As illustrated in the figure above, iterating on model experimentation using developer workflows means that changes to the modeling scripts are committed on a separate branch and tested automatically using CI principles with GitLab CI, as the sketch below shows.
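As a rough illustration (the job names, the pytest test suite, and the train_and_evaluate.py script are placeholders), the branch pipeline for the modeling code might contain jobs like these, reusing the shared experiment image:

stages:
  - test
  - evaluate

unit-test-model-code:
  stage: test
  # The shared experiment image keeps dependencies identical for every team member
  image: $CI_REGISTRY_IMAGE/experiment-env:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - pytest tests/

evaluate-model:
  stage: evaluate
  image: $CI_REGISTRY_IMAGE/experiment-env:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    # Placeholder modeling script: retrains on the new split and reports evaluation metrics
    - python train_and_evaluate.py --data data/credit_card_transactions.csv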

 

Any required changes to the model configuration are introduced by creating a merge request (https://docs.gitlab.com/ee/user/project/merge_requests/). Not only can the changes be automatically tested and evaluated using GitLab CI, but these workflows also allow for code and model performance review:

People who have worked long enough in DevSecOps know that a green pipeline doesn’t necessarily mean total success. That is even more true for machine learning models: the model code can comfortably pass unit testing and still not be good enough for the task at hand. To reduce the probability of deploying underperforming models, the testing job can automatically publish model metrics that fellow data scientists or domain knowledge experts can review directly on the merge request page, as shown in the testing report figure, to decide whether the model meets the expected metrics.
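One way to surface those numbers on the merge request page is GitLab's metrics report artifact. A hedged sketch, assuming the evaluation script can write its metrics to a metrics.txt file in OpenMetrics text format, extends the evaluation job above like this:

evaluate-model:
  stage: evaluate
  image: $CI_REGISTRY_IMAGE/experiment-env:latest
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    # The script writes metrics such as AUC, precision, and recall to metrics.txt
    - python train_and_evaluate.py --metrics-out metrics.txt
  artifacts:
    reports:
      # Rendered as a metrics comparison widget on the merge request
      metrics: metrics.txt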

  • Evaluating models using ephemeral environments

Another advantage of using decades of lessons learned from DevSecOps is the automatic spin-up of testing environments. 

This comes in handy when evaluating models: successful unit testing alone won’t ensure correct model functionality, and testing a model is not the same as evaluating it. Using CI/CD pipelines with GitLab CI, it is possible to deploy a work-in-progress version of the model to an ephemeral environment and interact with it, much like an end user would (depending on the use case).

Being able to keep humans in the loop by evaluating models during the development and deployment process can help us uncover undesired predictions or unfair behaviors. 
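A sketch of such a pair of jobs, borrowing the review-apps pattern, is below; deploy_review.sh, stop_review.sh, and the model URL are placeholders for whatever serving setup the team uses:

deploy-review-model:
  stage: review
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    # Placeholder deployment script, e.g. pushing the candidate model to a temporary namespace
    - ./deploy_review.sh "$CI_ENVIRONMENT_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_ENVIRONMENT_SLUG.models.example.com
    on_stop: stop-review-model
    # Clean up automatically if nobody stops the environment manually
    auto_stop_in: 2 days

stop-review-model:
  stage: review
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: manual
  script:
    - ./stop_review.sh "$CI_ENVIRONMENT_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop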

  • Automatically allocating GPU hardware for training stages

Implementing MLOps principles using GitLab offers the ability to leverage GPU hardware and, even better, to have this hardware automatically provisioned to run jobs declared in the .gitlab-ci.yml file. Using GPU-enabled GitLab Runners requires adding these lines to the CI/CD configuration file:

tags:
  - saas-linux-medium-amd64-gpu-standard

Teams working on modeling can take advantage of this capability to train ML models faster without spending time setting up or configuring graphics card drivers.
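In context, a training job that requests one of those GPU-enabled SaaS runners might look like the sketch below; the CUDA base image and train.py are assumptions:

train-model-gpu:
  stage: train
  # Any CUDA-enabled image works; the exact one depends on the framework in use
  image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
  tags:
    - saas-linux-medium-amd64-gpu-standard
  script:
    # Confirm the GPU is visible to the job before training starts
    - nvidia-smi
    - python train.py --epochs 20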

  • Keeping track of the model experiments and iterations

Each model training execution triggered using GitLab CI is an experiment that needs tracking. Using experiment tracking in GitLab helps record metadata that comes in handy for comparing model performance and collaborating with others, by making experiment results available to everyone and providing a detailed history of the model's development:

Automatically registering different model candidates and experiment metadata is another step toward reproducibility, collaboration, and a clear path of model traceability from pipeline run to candidate details:

The figure above shows details about a model experiment and its provenance: in this case, the model comes from merge request !9, "Experiments with splits in the training dataset".
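GitLab's experiment tracking exposes an MLflow-compatible API, so a training job can log runs to it with the standard MLflow client. A sketch, assuming a CI/CD variable MLFLOW_GITLAB_TOKEN holds a token with API access and that train.py makes the usual mlflow.log_param/log_metric calls:

train-and-track:
  stage: train
  image: $CI_REGISTRY_IMAGE/experiment-env:latest
  variables:
    # Point the MLflow client at the project's MLflow-compatible endpoint in GitLab
    MLFLOW_TRACKING_URI: "$CI_API_V4_URL/projects/$CI_PROJECT_ID/ml/mlflow"
    MLFLOW_TRACKING_TOKEN: "$MLFLOW_GITLAB_TOKEN"
  script:
    - python train.py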

Once all the experimentation is done, the best model candidate can be stored and versioned in the model registry:

 

The model registry then allows machine learning teams to keep track of models and the metadata associated with them, such as logs, model metrics, dependencies, and more.
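Because the model registry is reachable through the same MLflow-compatible client, registration can be gated so it only happens from the default branch once experimentation settles; register_best_model.py below is a hypothetical script that picks the best run and registers it as a model version:

register-model:
  stage: register
  image: $CI_REGISTRY_IMAGE/experiment-env:latest
  rules:
    # Register model versions only from the default branch, after review and merge
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  variables:
    MLFLOW_TRACKING_URI: "$CI_API_V4_URL/projects/$CI_PROJECT_ID/ml/mlflow"
    MLFLOW_TRACKING_TOKEN: "$MLFLOW_GITLAB_TOKEN"
  script:
    - python register_best_model.py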

  • Keeping the machine learning box transparent

Having a repository dedicated to the development of a machine learning model, and leveraging GitLab dependency and security scanners in the CI/CD process previously described, results in visibility into the whole software inventory that produced the model:

As illustrated in the figure above, this view from the compliance side of the project provides information about the libraries used to develop, build, and deploy the model. Thus, it enhances transparency into both first-party developed code and third-party adopted software and, of course, facilitates the auditing process.
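Enabling those scanners is largely a matter of including GitLab's maintained CI templates; a minimal addition to .gitlab-ci.yml could be the following (which templates apply depends on the project and license tier):

include:
  # Scans dependency manifests (e.g. requirements.txt) for known vulnerabilities
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  # Static analysis of the first-party model code
  - template: Security/SAST.gitlab-ci.yml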

Putting it all together 

By implementing these MLOps practices, financial institutions can:

  1. Respond quickly to changing conditions, whether it's new data, code improvements, or changing regulatory requirements
  2. Maintain high standards of code quality and compliance
  3. Facilitate collaboration between data scientists and domain experts
  4. Provide a framework for continuous improvement of models using decades of lessons learned in DevSecOps

To successfully extract value from data, financial institutions need to make sure they have not only the right financial domain knowledge and the right quantity and quality of financial data, but also a reliable data infrastructure and ML-oriented pipelines orchestrated by MLOps using CI/CD principles. With a proper MLOps strategy, domain knowledge teams can collaborate closely, properly test and evaluate models, quickly respond to new data sources, and train and retrain ML models at scale while remaining compliant.

 

Author: William Arias, Senior Developer Advocate, GitLab