Predicting Credit Card Defaults
Risk Assessment using Gradient Boosted Decision Trees and SHAP Interpretability

Overview
Financial institutions rely on accurate risk assessment to maintain stability. Using the UCI Default of Credit Card Clients Dataset, this project aims to predict the likelihood of a client defaulting on their payments. After comparing multiple machine learning architectures, our Gradient Boosted Decision Tree emerged as the best model with a test accuracy of 82%.
Feature Engineering & Selection
To better capture user behavior, I engineered a feature called Credit Utilization Ratio, calculated as the bill statement amount for a given month divided by the client’s credit limit (\(\frac{BILL\_AMT}{LIMIT\_BAL}\)). This metric represents the proportion of a client’s credit limit currently in use, which can serve as an indicator of financial stress.
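As a rough illustration of the ratio above, here is a minimal pandas sketch. It assumes the UCI column names (LIMIT_BAL, BILL_AMT1, BILL_AMT2, ...) and computes one ratio per billing month. The toy rows and the UTIL_RATIO column names are hypothetical and not taken from the project code.

```python
import pandas as pd

# Toy rows shaped like the UCI dataset (values are illustrative only).
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 50000],
    "BILL_AMT1": [3913, 2682, 46990],
    "BILL_AMT2": [3102, 1725, 48233],
})

# One utilization ratio per billing month: BILL_AMTk / LIMIT_BAL.
# UTIL_RATIOk is a hypothetical naming scheme chosen for this sketch.
bill_cols = ["BILL_AMT1", "BILL_AMT2"]
for col in bill_cols:
    month = col.replace("BILL_AMT", "")
    df[f"UTIL_RATIO{month}"] = df[col] / df["LIMIT_BAL"]
```

Because a bill amount can be negative (a credit balance) or larger than the limit, the ratio is not guaranteed to fall between 0 and 1, so it may be worth inspecting its range before modeling.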
To optimize the model’s performance and reduce noise, I experimented with two feature selection methods to identify the most predictive subset of features:
- Recursive feature elimination with cross-validation (RFECV)
- Sequential Feature Selection (SFS), run in the backward direction
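The two selection methods can be sketched with scikit-learn as follows. This is only an illustration under assumptions: it uses synthetic data from make_classification, a LogisticRegression base estimator, and scikit-learn's SequentialFeatureSelector with direction="backward". The project may have used a different estimator, library, or target feature count.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the credit data (assumption for this sketch).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

estimator = LogisticRegression(max_iter=1000)

# RFECV: repeatedly drop the weakest feature and let cross-validation
# choose how many features to keep.
rfecv = RFECV(estimator, step=1, cv=5, scoring="accuracy")
rfecv.fit(X, y)
rfecv_mask = rfecv.support_  # boolean mask of the selected features

# Backward sequential selection: start from all features and remove one
# at a time until n_features_to_select remain (4 is an arbitrary choice).
sfs = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="backward", cv=5
)
sfs.fit(X, y)
sfs_mask = sfs.get_support()
```

Comparing rfecv_mask and sfs_mask shows where the two methods agree, which is one simple way to settle on a final feature subset.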
Model Implementation
We benchmarked four distinct classification algorithms to find the optimal balance between bias and variance:
Logistic Regression
from sklearn.linear_model import LogisticRegression
Decision Tree
from sklearn.tree import DecisionTreeClassifier
Random Forest
from sklearn.ensemble import RandomForestClassifier
Gradient Boosted Decision Tree
from xgboost import XGBClassifier
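A benchmark of this kind might look like the sketch below. To keep the example dependent on scikit-learn alone, GradientBoostingClassifier is swapped in for XGBClassifier, so it is not the exact model from the project. The data is synthetic and the default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    # Stand-in for xgboost.XGBClassifier, which exposes the same fit/score API.
    "Gradient Boosted Decision Tree": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and record held-out accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
```

Holding the split and random seeds fixed keeps the comparison between the four models like-for-like.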
Hyperparameter Optimization
To maximize performance, I used RandomizedSearchCV for hyperparameter optimization, focusing on tree depth, learning rate, and estimator counts for the ensemble models.
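A randomized search over those three hyperparameters could be set up roughly as follows. The search ranges, n_iter, and cv values are illustrative assumptions, and GradientBoostingClassifier again stands in for XGBClassifier, which accepts the same max_depth, learning_rate, and n_estimators parameter names.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the credit data (assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search space; the project's actual ranges are not stated.
param_distributions = {
    "max_depth": randint(2, 6),            # integers 2..5
    "learning_rate": uniform(0.01, 0.29),  # floats in [0.01, 0.30]
    "n_estimators": randint(50, 151),      # integers 50..150
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=5,            # number of sampled configurations
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

Unlike an exhaustive grid, the random search samples a fixed number of configurations, so its cost is controlled by n_iter rather than by the size of the search space.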
Model Interpretation (SHAP)
Beyond just predicting “yes” or “no,” we focused on explainability. Using TreeExplainer from the SHAP library, we generated Waterfall and Beeswarm plots to visualize exactly how features like payment history and credit utilization influenced the model’s final decision.
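Both plot types rest on the additivity property of SHAP values, written out below. This is a general statement about SHAP rather than a result from this project, and the symbols are introduced here for illustration only.

\[
f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x), \qquad \phi_0 = \mathbb{E}[f(X)]
\]

Here \(f(x)\) is the model output for one client, \(\phi_0\) is the average model output over the background data, and \(\phi_i(x)\) is the contribution of feature \(i\) (for example, a payment history or credit utilization feature) for that client. A Waterfall plot walks from \(\phi_0\) to \(f(x)\) one feature at a time for a single client, while a Beeswarm plot shows the distribution of each \(\phi_i\) across many clients. For tree-based classifiers, TreeExplainer typically explains the raw log-odds output by default rather than the predicted probability, so the plotted values are usually read on that scale.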
Reflection
This was a fun project for exploring and comparing the performance of different machine learning models.
Credit
Collaborator: Built with Nicole Link.