Predicting Patient Readmission with Machine Learning

1. Why Predict Patient Readmission?
Patient readmission refers to when a patient who has been discharged from a hospital is admitted again within a short period—typically 30 days. Frequent readmissions can indicate poor care quality, ineffective discharge planning, or underlying chronic conditions. Using ML to predict readmissions can save hospitals costs and improve patient outcomes.
2. Importing Essential Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
3. Loading and Understanding the Data
df = pd.read_csv('diabetic_data.csv')
print(df.shape)
print(df.columns)
The dataset contains over 100,000 hospital encounters with more than 50 features related to diagnosis, procedures, medications, etc.
4. Data Cleaning
Remove irrelevant or high-missing columns like 'weight', 'payer_code', 'medical_specialty'.
df = df.drop(['weight', 'payer_code', 'medical_specialty'], axis=1)
df.replace('?', np.nan, inplace=True)
df = df.dropna()
5. Target Variable
We will transform the 'readmitted' column to a binary label.
df['readmitted'] = df['readmitted'].apply(lambda x: 1 if x == '<30' else 0)
6. Feature Encoding
Convert categorical variables into numerical format.
df = pd.get_dummies(df, drop_first=True)
7. Train-Test Split
X = df.drop('readmitted', axis=1)
y = df['readmitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
8. Model Building
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
9. Evaluation
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
10. ROC & AUC
from sklearn.metrics import roc_auc_score, roc_curve
y_prob = model.predict_proba(X_test)[:,1]
roc_auc = roc_auc_score(y_test, y_prob)
print("AUC: ", roc_auc)
11. Model Explainability with SHAP
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values[1], X_test)
12. Deployment (Optional)
You can deploy this model using Flask or FastAPI on a cloud service like Heroku or Render.
13. Conclusion
This project demonstrated how to build an end-to-end ML pipeline to predict hospital readmissions. It covered data preprocessing, model training, evaluation, and interpretation. Accurate predictions can assist healthcare providers in planning better post-discharge strategies and reducing costs.
Comments
Post a Comment