ML Production Pipeline

The problem

I learned the hard way that training a model is a milestone, not a finish line. After that comes serving predictions, logging what the model actually saw, and catching data drift before bad inputs quietly degrade results. That loop is what turns a proof of concept into a system a team can run and trust in production.

What I built

ML Production Pipeline is an end-to-end demo you run locally with Docker Compose. It trains a fraud classifier with scikit-learn and tracks each run in MLflow so you can compare settings, scores, and saved model files.

A FastAPI service serves predictions. Each request gets logged to Redis so monitoring can run on its own schedule without slowing the API. A separate Go checker reads those logs and watches for data drift (when live inputs stop looking like training data). When feature values drift enough from the training baseline, webhook alerts fire.
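
To make the logging step concrete, here is a minimal Python sketch of how a prediction could be queued for the drift checker. It is an illustration, not the repo's code: the key name `prediction_log`, the record fields, and the `log_prediction` helper are all assumptions. An in-memory stand-in replaces Redis so the snippet runs without a server. The stand-in mirrors the `rpush(name, *values)` method of the redis-py client, so a real client could be passed in instead.

```python
import json
import time
from typing import Dict, List, Protocol


class ListSink(Protocol):
    """Anything with a Redis-style rpush method."""

    def rpush(self, name: str, *values: str) -> int: ...


class InMemorySink:
    """Stand-in for Redis so this sketch runs without a server."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}

    def rpush(self, name: str, *values: str) -> int:
        bucket = self.lists.setdefault(name, [])
        bucket.extend(values)
        return len(bucket)  # like Redis, return the new list length


LOG_KEY = "prediction_log"  # hypothetical key name


def log_prediction(sink: ListSink, features: Dict[str, float], score: float) -> str:
    """Serialize what the model saw and append it to the log list."""
    record = json.dumps(
        {"ts": time.time(), "features": features, "score": score},
        sort_keys=True,
    )
    sink.rpush(LOG_KEY, record)
    return record


sink = InMemorySink()
log_prediction(sink, {"amount": 129.5, "hour": 23.0}, 0.91)
print(sink.lists[LOG_KEY])
```

The point of the design is that the API only appends a small JSON record and returns. Reading, windowing, and scoring those records happens elsewhere, on the checker's schedule.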

That is the full loop: train, serve, log what the model saw, monitor for drift, and know when it is time to retrain.

Why I built this

I come from platform and reliability engineering, and I’m learning ML by choice to extend that background into model operations. The same questions apply: what breaks silently, what should you measure before users notice, and who gets notified when behavior changes?

What I learned

A model only knows the world it trained on. Data drift is when live inputs slowly stop matching that world. Accuracy can drop while the API still returns healthy responses, which is why you log what the model actually saw in production. You cannot catch drift from training metrics alone.

MLflow matters for the same reason: training is iterative. Log each run’s settings and scores so you know which model to deploy and what baseline the drift checker should compare against.

Python fits training and serving; Go fits drift detection, where you keep recalculating statistics on a stream of logged predictions. That work is CPU-heavy, and in standard CPython the Global Interpreter Lock (GIL) keeps threads from running pure-Python math in parallel. Go's goroutines can spread the same calculations across cores without extra process management.

The drift checker uses Population Stability Index (PSI) for gradual shifts and a Kolmogorov–Smirnov (KS) test for sharper changes. Those scores only help if you pick a sensible baseline and window size: too small a window makes the statistics jumpy, and a stale baseline flags changes that no longer matter. Otherwise you alert on noise.
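
The two statistics are short enough to sketch. The repo's checker is written in Go; the version below is Python purely for readability and is not taken from the source. The function names, the choice of ten equal-width bins, and the `eps` floor for empty bins are assumptions made for this example. PSI here is the sum over bins of (actual - expected) * ln(actual / expected), and the KS statistic is the largest gap between the two empirical cumulative distributions.

```python
import bisect
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], live: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins of the baseline range."""
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread, cannot build bins")
    width = (hi - lo) / bins
    # Interior edges only: values outside the baseline range fall into
    # the first or last bin instead of being dropped.
    edges = [lo + width * i for i in range(1, bins)]

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    expected = proportions(baseline)
    actual = proportions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def ks_statistic(baseline: Sequence[float], live: Sequence[float]) -> float:
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a, b = sorted(baseline), sorted(live)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


baseline = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in baseline]
print("same data:", psi(baseline, baseline), ks_statistic(baseline, baseline))
print("shifted:  ", psi(baseline, shifted), ks_statistic(baseline, shifted))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but those cutoffs are conventions, not guarantees. Any alert threshold is worth tuning against the window size you actually use.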

The loop is train, serve, monitor, and retrain when the data says the world has changed.

Repo

Full source and design notes are on GitHub.