Decision Tree

Algorithm #3 — Phase 1: Supervised Learning

🌳 What Is It?

A Decision Tree is like playing "20 Questions" with your data. It makes decisions using a flowchart of if-then rules, where each node asks a question about a feature.

Mental Model: Imagine you're trying to guess which animal someone is thinking of. You'd ask questions like "Does it have fur?" or "Can it fly?" — each answer narrows down the possibilities until you reach the answer!
[Diagram: example tree. The root "All Data" splits on "Feature A > 5?" into Group 1 (Yes ✅) and Group 2 (No ❌). A second split on "Feature B < 10?" leads to leaves labeled Class A and Class B.]
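
The flowchart idea can be sketched as plain nested if-then rules. The two questions come from the mental model above; the function name guess_animal and the animals it returns are hypothetical, chosen only for illustration.

```python
def guess_animal(has_fur: bool, can_fly: bool) -> str:
    """Walk a tiny hand-written decision tree, one question per node."""
    if has_fur:              # root node: "Does it have fur?"
        if can_fly:          # internal node: "Can it fly?"
            return "bat"     # leaf
        return "dog"         # leaf
    if can_fly:              # internal node on the "no fur" branch
        return "eagle"       # leaf
    return "lizard"          # leaf


print(guess_animal(has_fur=True, can_fly=False))  # prints: dog
```

A learned tree has the same shape; the difference is that the algorithm chooses the questions and thresholds from data instead of a person writing them by hand.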

🔢 The Math Behind It

1. Entropy (Disorder Measure)

H(S) = -Σ pᵢ × log₂(pᵢ)

where pᵢ is the fraction of samples in S that belong to class i

What it means: How mixed up is this group?
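
As a minimal sketch, the entropy formula can be computed directly from a list of class labels. The function name entropy and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes that actually occur in the group.
    return sum(-(c / n) * log2(c / n) for c in counts.values())


print(entropy(["A", "A", "A", "A"]))  # pure group: zero entropy
print(entropy(["A", "A", "B", "B"]))  # evenly mixed, two classes: 1 bit
```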

2. Information Gain

IG = H(parent) - weighted_avg[H(children)]

What it means: How much uncertainty did this split remove?

The algorithm picks the split with the highest information gain!
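
The selection step can be sketched for a single numeric feature as follows. This is an illustrative toy, not how any particular library implements it: the helper names (information_gain, best_threshold) and the data are made up, and candidate thresholds are assumed to be midpoints between adjacent distinct values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """H(parent) minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try a threshold between each pair of adjacent distinct values; keep the best."""
    best_gain, best_t = 0.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain


# Toy data (made up): one numeric feature, two classes.
feature = [1, 2, 3, 6, 7, 8]
classes = ["A", "A", "A", "B", "B", "B"]
print(best_threshold(feature, classes))  # → (4.5, 1.0)
```

Splitting at 4.5 separates the two classes perfectly, so both children have zero entropy and the gain equals the parent's full 1 bit.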

3. Gini Impurity (Alternative)

Gini = 1 - Σ pᵢ²

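What it means: The probability of mislabeling a randomly chosen sample if you assigned it a class at random according to the class proportions in the group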
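
A short sketch of the Gini formula, using a hypothetical helper named gini and toy labels:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


print(gini(["A", "A", "A", "A"]))  # 0.0, a pure group
print(gini(["A", "A", "B", "B"]))  # 0.5, the maximum for two classes
```

Gini avoids the logarithm, so it is slightly cheaper to compute than entropy; in practice the two criteria usually lead to similar trees.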

🎯 Key Concepts

1. Tree Depth vs Overfitting

Shallow Trees 🌱

High bias, low variance

Risk: Underfit (too simple)

Deep Trees 🌲

Low bias, high variance

Risk: Overfit (memorizes noise)

Sweet spot: Use cross-validation to find optimal max_depth
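
One way to run that search, sketched with scikit-learn. The iris dataset, the depth range 1 to 10, and 5-fold cross-validation are assumptions chosen to keep the example small, not recommendations from this lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for depth in range(1, 11):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Mean accuracy across 5 cross-validation folds for this depth.
    scores[depth] = cross_val_score(model, X, y, cv=5).mean()

# Pick the depth with the highest mean validation accuracy.
best_depth = max(scores, key=scores.get)
print(best_depth, round(scores[best_depth], 3))
```

When several depths score about the same, the shallower tree is usually the safer choice because it has lower variance.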

2. Pruning Strategies

3. Feature Importance

Measures how much each feature reduces impurity across all splits

Great for feature selection — tells you which features actually matter!
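
As a hedged example, scikit-learn exposes these scores on a fitted tree through the feature_importances_ attribute. The iris dataset and max_depth=3 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(data.data, data.target)

# feature_importances_ holds each feature's share of the total impurity
# reduction, normalized so the values sum to 1.
ranked = sorted(
    zip(data.feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor features with many distinct values, so it is worth cross-checking them (for example with permutation importance) before dropping features.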

📊 When to Use Decision Trees

✅ Great For:

❌ Not Ideal For:

🆚 Comparison with Other Algorithms

vs Linear Regression

✅ Handles non-linearity

✅ No feature scaling needed

vs k-NN

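✅ Faster prediction: roughly O(log n) comparisons for a balanced tree (one per level), versus scanning all n training points in brute-force k-NN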

✅ More interpretable

🛠️ Hyperparameters to Tune

Pro tip: Start with max_depth=3, then gradually increase until validation performance plateaus

🎓 Checkpoint Questions

Question 1: What is entropy?

Think before you peek at the answer below...

💡 Answer

Entropy is a measure of disorder/uncertainty in a dataset. 0 = perfectly pure (all same class), higher values = mixed classes. The algorithm uses entropy to decide which splits reduce uncertainty the most.

Question 2: How does a decision tree choose which feature to split on?

💡 Answer

At each node, the tree evaluates every candidate split (each feature and threshold), calculates the gain (the reduction in entropy or Gini impurity) for each, and picks the split with the highest gain. This greedy choice maximizes how much we learn from each question.

Question 3: Why do deep trees overfit?

💡 Answer

Deep trees memorize noise in the training data by creating overly specific rules. They perform great on training data but fail on new data. Limiting depth or pruning branches keeps the rules general and improves generalization.

🚀 Next Steps

After mastering Decision Trees, you'll learn:

Practice suggestion: Try the Kaggle Titanic dataset — a classic decision tree problem!

✨ Phase 1: Supervised Learning — Algorithm 3 of 7 ✨

Hafs Ibrahim
𝕏 @hafs_darwish
30 AI Algorithms Curriculum • Sensei System
© 2026 Hafs Ibrahim. All rights reserved.