Introduction to MLOps for Beginners — Machine Learning Life Cycle & Tools for MLOps
In machine learning, we often train, test and deploy a model iteratively until its performance meets the business requirements. Frequently, we need to go back, make changes in design and development, and re-deploy the model to production. It is very challenging to track all the processes from development to deployment without following a standard set of practices. So, what are the possible ways to handle these procedures in a standard way? How can we develop, deploy and maintain a machine learning model in production? Difficult, right? This is where MLOps comes into play.
MLOps stands for Machine Learning Operations. It is a standard set of practices to design, develop, deploy and maintain machine learning models in production continuously, reliably and efficiently. MLOps speeds up the machine learning life cycle through proper development practices and improved collaboration. The following figure illustrates the machine learning life cycle.
In the design phase, the context of the problem is defined. Business requirements such as end-user needs, budget and compliance constraints are also defined here, with business personnel involved in this role. The key metrics are identified as well, which helps to design a machine learning system that is accurate, satisfies customer requirements and generates revenue for business stakeholders.
Data are collected from multiple input sources and processed. Data processing involves extracting data from multiple input sources, transforming it from raw form into a standard form, and storing it in a database, data lake or data warehouse. The quality of the data is measured by consistency, completeness, timeliness and accuracy metrics.
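To make these data-quality metrics concrete, here is a minimal pure-Python sketch of completeness and accuracy-style range checks. The records, field names and thresholds are illustrative assumptions, not part of any real pipeline:

```python
# Illustrative data-quality checks on a small batch of records.
# "completeness" = fraction of records with the field filled in;
# "validity" = fraction of non-missing values in a plausible range.

records = [
    {"user_id": 1, "age": 34, "country": "NP"},
    {"user_id": 2, "age": None, "country": "US"},   # missing age
    {"user_id": 3, "age": 250, "country": "US"},    # implausible age
]

def completeness(records, field):
    """Fraction of records where `field` is present and not None."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def validity(records, field, low, high):
    """Fraction of non-missing values that fall inside [low, high]."""
    values = [r[field] for r in records if r.get(field) is not None]
    ok = sum(1 for v in values if low <= v <= high)
    return ok / len(values)

print(completeness(records, "age"))      # 2 of 3 records have an age
print(validity(records, "age", 0, 120))  # 1 of 2 ages is plausible
```

In practice, a tool such as Great Expectations would express these checks declaratively, but the metrics being computed are the same.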
In the development phase, feature engineering is done, where raw data are converted into more robust features.
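As a sketch of what "converting raw data into features" can look like, here is a small example; the record layout and the derived features are my own illustrative assumptions:

```python
# Sketch of feature engineering: turning raw fields into numeric,
# model-ready features. The input record and feature choices are
# illustrative.

import math
from datetime import date

def make_features(raw, today=date(2024, 1, 1)):
    """Derive numeric features from one raw user record."""
    signup = date.fromisoformat(raw["signup_date"])
    return {
        "tenure_days": (today - signup).days,         # raw date -> number
        "is_premium": int(raw["plan"] == "premium"),  # category -> binary
        "log_spend": math.log1p(raw["spend"]),        # compress heavy tail
    }

features = make_features({"signup_date": "2023-01-01",
                          "plan": "premium",
                          "spend": 99.0})
print(features["tenure_days"], features["is_premium"])  # 365 1
```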
If multiple machine learning processes use the same feature, or use the same feature multiple times, then a feature store is created.
Model training and testing are tracked using experiment tracking. In experiment tracking, ML models, hyper-parameters, data versions, environment configurations and execution scripts are recorded for each run. Experiment tracking helps to compare and evaluate experiments, report results to stakeholders and reproduce results.
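The following toy tracker shows what is being recorded per run; real tools (MLflow, ClearML, W&B) persist this to a tracking server, whereas this sketch just keeps it in memory:

```python
# Minimal illustration of experiment tracking: each run records its
# parameters, metrics and environment so runs can be compared and
# reproduced later. Names and values are illustrative.

import platform
import time

class Run:
    def __init__(self, name):
        self.name = name
        self.params, self.metrics = {}, {}
        self.env = {"python": platform.python_version()}  # environment info
        self.started = time.time()

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics.setdefault(key, []).append(value)

run = Run("baseline-logreg")
run.log_param("learning_rate", 0.01)   # hyper-parameter
run.log_param("data_version", "v3")    # which dataset snapshot was used
for acc in [0.71, 0.78, 0.81]:
    run.log_metric("val_accuracy", acc)  # one point per epoch

best = max(run.metrics["val_accuracy"])
print(best)  # runs can now be compared by their best metric
```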
In the deployment phase, runtime environments are created for machine learning applications in production using containerization. Containers are abstractions at the app layer that package code and dependencies together. Multiple containers can run on the same machine as isolated processes, sharing the host OS kernel rather than bundling a full operating system of their own.
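As a hedged sketch, a containerized model-serving app might be described by a Dockerfile like the one below. The base image, file names (`requirements.txt`, `serve.py`, `model.pkl`) and port are assumptions for illustration:

```dockerfile
# Hypothetical Dockerfile for a model-serving app.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY serve.py model.pkl ./
EXPOSE 8080
CMD ["python", "serve.py"]
```

Because the image packages the code, the model artifact and the pinned dependencies together, the same container runs identically on a laptop and in production.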
Microservice architecture is used to integrate small, loosely coupled, fine-grained services that are complete in themselves and can be reused in other systems.
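For example, model inference is often exposed as its own tiny service that other systems call over HTTP. Here is a standard-library-only sketch; the threshold "model" is a stand-in for a real trained model:

```python
# Sketch of an inference microservice using only the standard library.
# The service does one job (prediction) and talks JSON over HTTP, so
# other systems can reuse it without sharing code. The "model" here is
# a hard-coded rule, standing in for a real trained model.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in model: classify by a simple threshold rule."""
    return {"label": "high" if features.get("score", 0) > 0.5 else "low"}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        features = json.loads(self.rfile.read(length))
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# Uncomment to actually serve requests on port 8000:
# HTTPServer(("localhost", 8000), PredictHandler).serve_forever()

print(predict({"score": 0.9}))  # {'label': 'high'}
```

A real deployment would use a framework such as FastAPI or Flask, but the shape of the service is the same.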
The deployment process is fully automated using a CI/CD pipeline. In this process, continuous monitoring and retraining are also performed.
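A minimal pipeline for such a project might look like the following GitLab CI sketch. The stage names, script commands and `deploy.sh` helper are assumptions about the project layout, not a prescription:

```yaml
# Hypothetical .gitlab-ci.yml: every push runs the tests, then builds
# and deploys the container image. Commands are illustrative.
stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - pip install -r requirements.txt
    - pytest tests/

build:
  stage: build
  script:
    - docker build -t my-ml-service:$CI_COMMIT_SHORT_SHA .

deploy:
  stage: deploy
  script:
    - ./deploy.sh my-ml-service:$CI_COMMIT_SHORT_SHA
  only:
    - main
```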
The deployed machine learning application is monitored in the production environment. Monitoring focuses not only on the accuracy of the model in production but also on technical metrics of the system such as CPU usage, number of inferences, latency, etc.
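A simple way to picture this is a rolling-window monitor over per-request metrics; the window size and latency threshold below are illustrative assumptions:

```python
# Sketch of production monitoring: track a technical metric (latency)
# over a rolling window and flag when it degrades. Thresholds are
# illustrative.

from collections import deque

class Monitor:
    def __init__(self, window=100, max_p95_ms=200.0):
        self.latencies = deque(maxlen=window)  # rolling window of latencies
        self.max_p95_ms = max_p95_ms
        self.inference_count = 0               # total number of inferences

    def record(self, latency_ms):
        self.inference_count += 1
        self.latencies.append(latency_ms)

    def p95_latency(self):
        """Approximate 95th-percentile latency over the window."""
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def healthy(self):
        return self.p95_latency() <= self.max_p95_ms

mon = Monitor()
for ms in [12, 15, 11, 14, 300, 13]:  # one slow outlier request
    mon.record(ms)
print(mon.inference_count, mon.p95_latency(), mon.healthy())
```

A production system would also track model-quality signals (e.g. prediction drift) the same way, alerting when a metric crosses its threshold.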
When new data are collected that the model has not been trained on, we need to develop a new version of the model that learns from both the old and the new data. How often the model is retrained depends on business requirements and how often new data are generated. After retraining, the old model is generally replaced by the new one.
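One common policy is to trigger retraining once enough new data has accumulated. Here is a sketch of that idea; the threshold and the stand-in "training" step are illustrative assumptions:

```python
# Sketch of a retraining trigger: retrain when enough unseen data has
# accumulated, training the new model version on old + new data
# combined. The threshold and "training" step are stand-ins.

def should_retrain(new_rows, threshold=1000):
    """Retraining policy: enough new data has arrived."""
    return new_rows >= threshold

def retrain(old_data, new_data):
    """Stand-in for training: a real system would fit a model here."""
    combined = old_data + new_data  # learn from both old and new data
    return {"version": 2, "trained_on": len(combined)}

old_data = list(range(5000))   # data the current model was trained on
new_data = list(range(1200))   # freshly collected, unseen data

if should_retrain(len(new_data)):
    model = retrain(old_data, new_data)
    print(model)  # the new version replaces the old model
```

Other policies trigger on a schedule or on detected drift instead of row counts; which one fits depends on the business requirements mentioned above.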
There are different tools available to automate the machine learning life cycle. I have listed them below.
- Feature Store: A feature store is a central place to store frequently used features for machine learning pipelines. Tools used to create feature stores include Feast, Hopsworks, etc.
- Experiment Tracking: In experiment tracking, the components associated with each experiment, such as parameters, metrics, models and other artifacts, are organized to compare the performance of each machine learning model. Tools used for experiment tracking include MLflow, ClearML, Weights & Biases (W&B), etc.
- Containerization: Containerization is application-level virtualization over multiple network resources, so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment. Tools used for containerization and container orchestration include Docker, Kubernetes, Amazon EKS, Azure Kubernetes Service (AKS), Google Kubernetes Engine, etc.
- CI/CD: CI/CD refers to continuous integration and continuous deployment. It is the process of automating system development, integration and deployment. Commonly used CI/CD tools include Jenkins, GitLab CI, etc.
- Monitoring: After the deployment of the machine learning system in production, it must be monitored continuously to evaluate whether the system is working as expected. Tools used to monitor machine learning systems include Fiddler, Great Expectations, etc.
Finally, to build a complete MLOps pipeline, we need to integrate one tool from each of the five steps mentioned. But there are also end-to-end MLOps platforms that cover everything from the feature store to monitoring, such as AWS SageMaker, Azure Machine Learning, Google Cloud AI Platform, etc.
I hope the above blog is helpful in understanding MLOps basics and the tools available to automate the machine learning life cycle. If you have any queries, I am happy to answer them if possible. If it was helpful to you, please don’t forget to clap and share it with your friends. See you in part 2…