The model had all the right ingredients: clean training data, carefully selected features, and performance metrics that looked solid on paper. It was designed to predict user retention, and in a controlled evaluation setting, it performed well. But in production, cracks started to appear. Power users were being misclassified as likely to churn. Casual users […]