AI in Data

Documenting the Data Engineering side of AI
Curated by Bartosz Mikulski

Observability

Q&A: Do I need to monitor data drift if I can measure the ML model quality?

Monitoring data drift to get an early warning about incoming model performance problems

Elena Samuylova
Emeli Dral

Observability

ML Model Monitoring – 9 Tips From the Trenches

A practical guide to finding common problems with ML models and fixing them

Felipe Almeida

Observability

Concept Drift and Model Decay in Machine Learning

An explanation of the most common problems related to ML models deployed in production

Ashok Chilakapati

Observability

Model Monitoring: What it is & why does it matter?

Preventing ML model degradation over time by monitoring the model's performance KPIs

Görkem Gençer

Observability

Monitoring ML pipelines

An introduction to monitoring ML pipelines in production. The article covers monitoring your infrastructure, the input data, and the ML training process.

Kristina Georgieva

MLOps/LLMOps

Shadow deployment vs. canary release of machine learning models

How to roll out machine learning models in three stages to ensure that the model works properly in production

Bartosz Mikulski

Written by me

MLOps/LLMOps

Deploying your first ML model in production

What to do when you want the model in production as fast as possible. Overengineering is fun, but right now, you need results. Fast.

Bartosz Mikulski

Written by me

MLOps/LLMOps

Reproducibility in ML: why it matters and how to achieve it

Root Causes of Non-Determinism and how to fix those issues

Jennifer Villa
Yoav Zimmerman

Experiment Tracking

ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

Using experiment tracking to compare experiments, analyze results, debug the model training code, and improve team collaboration by sharing experiment results.

Jakub Czakon

Experiment Tracking

Experiment Tracking: What it is, Best Practices & Tools

What is experiment tracking, and why do you need it? Ideas for implementing experiment tracking that work better than a shared Excel file

Cem Dilmegani

Data Storage

What is a Feature Store?

An explanation that focuses on the technical building blocks of a feature store and the separation of responsibilities between data engineers and data scientists.

Mike Del Balso

Data Management

Unit Testing Data: What Is It and How Do You Do It?

Monitoring the data quality and ensuring that the feature store always contains valuable data. Hints of the kinds of data quality checks that we can...

Data Storage

Data Versioning: What is it & why is it important?

An explanation of why we need data versioning and what kinds of data versioning tools exist.

Görkem Gençer

Data Storage

What is a Vector Database?

We transform the raw data into vector embeddings to train/use an ML model (for example, in language processing). Vector databases store such embeddings and offer...

Bryan Turriff

Data Storage

Modern SQL

The author goes far beyond the basic SQL tutorials that are still stuck at the SQL-92 standard.

Markus Winand

MLOps/LLMOps

Scaling An ML Team (0–10 People)

Don't write your own tools. Everything you need at the beginning has already been written by someone else. Invest effort in automation. It would be...

Peter Gao

MLOps/LLMOps

Why is DevOps for Machine Learning so Different?

The role of MLOps is to support the whole flow of training, serving, rollout, and monitoring, not only deployment and testing. The entire workflow is...

Ryan Dawson