AI in Data

Documenting the Data Engineering side of AI
Curated by Bartosz Mikulski

Q&A: Do I need to monitor data drift if I can measure the ML model quality?

Monitoring data drift to get an early warning about incoming model performance problems

ML Model Monitoring – 9 Tips From the Trenches

A practical guide to finding common problems with ML models and fixing them

Concept Drift and Model Decay in Machine Learning

An explanation of the most common problems related to ML models deployed in production

Model Monitoring: What it is & why does it matter?

Preventing ML model degradation over time by monitoring the model's performance KPIs

Monitoring ML pipelines

An introduction to monitoring ML pipelines in production. The article covers monitoring your infrastructure, the input data, and the ML training process.

Shadow deployment vs. canary release of machine learning models

How to roll out machine learning models in three stages to ensure that the model works properly in production

Deploying your first ML model in production

What to do when you want the model in production as fast as possible. Overengineering is fun, but right now, you need results. Fast.

Reproducibility in ML: why it matters and how to achieve it

Root causes of non-determinism and how to fix them

ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

Using experiment tracking to compare experiments, analyze results, debug the model training code, and improve team collaboration by sharing experiment results.

Experiment Tracking: What it is, Best Practices & Tools

What is experiment tracking, and why do you need it? Ideas for implementing experiment tracking that work better than a shared Excel file

What is a Feature Store?

An explanation that focuses on the technical building blocks of a feature store and the separation of responsibilities between data engineers and data scientists.

Unit Testing Data: What Is It and How Do You Do It?

Monitoring data quality and ensuring that the feature store always contains valuable data. Hints about the kinds of data quality checks that we can...

Data Versioning: What is it & why is it important?

An explanation of why we need data versioning and what kinds of data versioning tools exist.

What is a Vector Database?

We transform the raw data into vector embeddings to train/use an ML model (for example, in language processing). Vector databases store such embeddings and offer...

Modern SQL

The author goes far beyond the basic SQL tutorials that are still stuck on the SQL-92 standard.

Scaling An ML Team (0–10 People)

Don't write your own tools. Everything you need at the beginning has already been written by someone else. Invest effort in automation. It would be...

Why is DevOps for Machine Learning so Different?

The role of MLOps is to support the whole flow of training, serving, rollout, and monitoring, not only deployment and testing. The entire workflow is...