A production-grade CI/CD pipeline for ML models needs three things that a pipeline for ordinary application code does not: dataset versioning, experiment tracking, and a rollback path.
1. Code in GitHub, dataset versions in DVC. Reproducibility starts with versioning both.
2. Train and log to MLflow on every PR, with an automatic comparison against the production baseline.
3. Block the merge on any eval-metric regression. The pipeline fails if accuracy on the holdout set drops by more than 1%.
4. Promote to SageMaker via blue/green endpoints. The new endpoint receives shadow traffic for 24 hours before cutover.
5. Monitor input/output drift and roll back automatically when a threshold is crossed, using SageMaker Model Monitor plus a Lambda rollback hook.
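Step 5 pairs a drift score with a rollback action. SageMaker Model Monitor computes its own baseline statistics and constraint violations; the sketch below swaps in a self-contained Population Stability Index (PSI) so the idea is visible without AWS. The rollback uses the real boto3 SageMaker call `update_endpoint`, which points an endpoint at a different endpoint config. `psi`, `maybe_rollback`, the 0.2 threshold, and the endpoint/config names are all hypothetical placeholders, not the pipeline's actual code.

```python
import math
from typing import Any, Sequence

PSI_THRESHOLD = 0.2  # assumed alert threshold; tune per feature


def psi(baseline: Sequence[float], live: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `live` against `baseline` for one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant baseline

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        # A small floor keeps log() finite when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    expected, actual = bin_fractions(baseline), bin_fractions(live)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


def maybe_rollback(client: Any, endpoint: str, previous_config: str, score: float,
                   threshold: float = PSI_THRESHOLD) -> bool:
    """Point `endpoint` back at `previous_config` if the drift score exceeds `threshold`.

    `client` is assumed to be a boto3 SageMaker client, i.e. boto3.client("sagemaker").
    """
    if score <= threshold:
        return False
    client.update_endpoint(EndpointName=endpoint, EndpointConfigName=previous_config)
    return True
```

Inside a Lambda hook, `client` would be `boto3.client("sagemaker")` and `previous_config` would be whichever endpoint config served traffic before the last cutover; passing the client in keeps the decision logic testable without AWS credentials.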
GitHub Actions Workflow
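```yaml
name: mlops
on:
  pull_request:        # train + evaluate every PR
  push:
    branches: [main]   # re-run on merge to main so the promote step can fire
jobs:
  train:
    runs-on: ubuntu-latest-large  # custom larger-runner label; must exist in your org
    env:
      # Without a tracking URI, MLflow logs to a local ./mlruns that is lost with the runner.
      # The secret name is an assumption.
      MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - uses: iterative/setup-dvc@v1
      - run: pip install -r requirements.txt
      - run: dvc pull  # needs credentials for your DVC remote
      - run: python -m src.train  # logs to MLflow
      - name: Compare vs prod baseline
        run: python -m src.eval --baseline prod  # a non-zero exit fails the job
      - name: Promote on main
        if: github.ref == 'refs/heads/main'
        run: python -m src.promote --to sagemaker --strategy blue-green
```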
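The "Compare vs prod baseline" step only blocks a merge if it exits non-zero on a regression and the job is marked as a required status check in branch protection. The `src.eval` module itself is not shown, so here is a minimal stand-in sketch. It assumes candidate and baseline metrics are JSON files with an `accuracy` key on a 0 to 1 scale and reads ">1%" as one percentage point; `regression_detected`, `MAX_ACCURACY_DROP`, and the two-file CLI are hypothetical.

```python
import json
import sys
from pathlib import Path

MAX_ACCURACY_DROP = 0.01  # assumed: 1 percentage point on a 0-1 accuracy scale


def regression_detected(candidate: dict, baseline: dict,
                        max_drop: float = MAX_ACCURACY_DROP) -> bool:
    """True if the candidate's holdout accuracy is more than `max_drop` below the baseline's."""
    return baseline["accuracy"] - candidate["accuracy"] > max_drop


def main(candidate_path: str, baseline_path: str) -> int:
    candidate = json.loads(Path(candidate_path).read_text())
    baseline = json.loads(Path(baseline_path).read_text())
    if regression_detected(candidate, baseline):
        print(f"FAIL: accuracy {candidate['accuracy']:.4f} "
              f"vs baseline {baseline['accuracy']:.4f}")
        return 1  # non-zero exit fails the GitHub Actions step
    print("OK: no regression against baseline")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

If "1%" is meant as a relative drop instead, compare `candidate["accuracy"] < baseline["accuracy"] * (1 - max_drop)`.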
Need help with CI/CD for ML models?
Ohveda runs free 30-minute architecture reviews and will identify your top opportunities, in writing, within 48 hours.