MLOps/LLMOps
How to roll out machine learning models in three stages to ensure that the model works properly in production
What to do when you want the model in production as fast as possible. Overengineering is fun, but right now, you need results. Fast.
Root causes of non-determinism and how to fix them
Don't write your own tools. Everything you need at the beginning has already been written by someone else. Invest effort in automation. It would be...
The role of MLOps is to support the whole flow of training, serving, rollout, and monitoring, not only deployment and testing. The entire workflow is...
Data Storage
An explanation that focuses on the technical building blocks of a feature store and the separation of responsibilities between data engineers and data scientists.
An explanation of why we need data versioning and what kinds of data versioning tools exist.
We transform the raw data into vector embeddings to train/use an ML model (for example, in language processing). Vector databases store such embeddings and offer...
The author goes far beyond the basic SQL tutorials that got stuck with the SQL-92 standard.
Data Management
Monitoring the data quality and ensuring that the feature store always contains valuable data. Hints of the kinds of data quality checks that we can...
Experiment Tracking
Using experiment tracking to compare experiments, analyze results, debug the model training code, and improve team collaboration by sharing experiment results.
What is experiment tracking, and why do you need it? Ideas for implementing experiment tracking that work better than a shared Excel file
Observability
Monitoring data drift to get an early warning about incoming model performance problems
A practical guide to finding common problems with ML models and fixing them
An explanation of the most common problems related to ML models deployed in production
Preventing ML model degradation over time by monitoring the model's performance KPIs
An introduction to monitoring ML pipelines in production. The article covers monitoring your infrastructure, the input data, and the ML training process.