🎓 Advanced Machine Learning Series – Part 3: Stacking and Voting Ensembles

Welcome to Part 3 of the Advanced Machine Learning series on Darchumstech. In this tutorial, we explore Stacking and Voting—two advanced ensemble techniques that combine multiple models to boost accuracy and generalization.

Stacking (or stacked generalization) trains multiple base models and a meta-model. The base models learn the task, and the meta-model learns to combine their outputs.

  • Base learners can be any models: SVM, decision trees, KNN, etc.
  • The meta-learner is often a simple model such as logistic regression, or a stronger one like gradient boosting.
  • Cross-validation is used to generate the base models' out-of-fold predictions for the meta-learner, which prevents data leakage (scikit-learn's StackingClassifier handles this internally via its cv parameter).

Here's an example using StackingClassifier from scikit-learn:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y  # fixed seed for reproducibility
)

base_learners = [
    ('svc', SVC(probability=True)),
    ('tree', DecisionTreeClassifier())
]

meta_learner = LogisticRegression()
clf = StackingClassifier(estimators=base_learners, final_estimator=meta_learner)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
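Whether stacking actually helps depends on the data, so it's worth checking. Here's an illustrative sketch that cross-validates each base learner and the stacked model side by side on the same iris data used above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

base_learners = [
    ('svc', SVC(probability=True)),
    ('tree', DecisionTreeClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
)

# 5-fold CV accuracy for each base learner alone, then for the stack
for name, model in base_learners + [('stack', stack)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

If the stacked score isn't higher than the best base learner's, the meta-model has little to add on this dataset.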

Voting combines predictions from multiple models and selects the final result by majority (classification) or averaging (regression).

  • Hard Voting: predicts the class that receives the most votes.
  • Soft Voting: averages the predicted class probabilities and picks the class with the highest average (requires classifiers that expose predict_proba).
  • Simple to implement, and often more robust than any single model.
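To make the hard/soft distinction concrete, here's a tiny NumPy sketch using made-up probabilities for one sample from three hypothetical classifiers:

```python
import numpy as np

# Hypothetical predicted probabilities for one sample
# (columns = class 0, class 1):
probs = np.array([
    [0.45, 0.55],   # model A leans toward class 1
    [0.48, 0.52],   # model B leans toward class 1
    [0.90, 0.10],   # model C is very confident in class 0
])

# Hard voting: each model casts one vote for its argmax class
votes = probs.argmax(axis=1)                 # [1, 1, 0]
hard_winner = np.bincount(votes).argmax()    # class 1 wins 2-1

# Soft voting: average the probabilities, then take the argmax
avg = probs.mean(axis=0)                     # [0.61, 0.39]
soft_winner = avg.argmax()                   # class 0 wins on averaged confidence

print("hard:", hard_winner, "soft:", soft_winner)  # hard: 1 soft: 0
```

The two schemes disagree here: soft voting lets model C's high confidence outweigh the two weak votes for class 1.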

Here's a basic voting ensemble example using VotingClassifier (reusing the iris train/test split from the stacking example):

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression()),
        ('svc', SVC(probability=True)),
        ('tree', DecisionTreeClassifier())
    ],
    voting='soft'  # or 'hard'
)

voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
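After fitting, scikit-learn exposes the trained sub-models through the named_estimators_ attribute, which makes it easy to compare each one against the combined vote. The snippet below is a self-contained sketch (the split parameters are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=1000)),
        ('svc', SVC(probability=True)),
        ('tree', DecisionTreeClassifier(random_state=0)),
    ],
    voting='soft',
)
voting_clf.fit(X_train, y_train)

# Test accuracy of each fitted sub-model, then of the ensemble
for name, fitted in voting_clf.named_estimators_.items():
    print(name, accuracy_score(y_test, fitted.predict(X_test)))
print("ensemble", accuracy_score(y_test, voting_clf.predict(X_test)))
```

This kind of breakdown shows whether the ensemble is actually outperforming its parts, or whether one sub-model is doing all the work.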
