Decision Tree & Random Forest: The "20 Questions" Game Is an ML Algorithm!
Did you play the "20 Questions" game as a kid?
"Is it an animal?" — Yes. "Does it have 4 legs?" — Yes. "Is it kept at home?" — Yes. "Does it bark?" — Yes. "Dog!" 🐕
Congratulations — you just ran a Decision Tree!
In the ML world, a Decision Tree works on exactly this game's logic — Questions → Answers → Final Decision.
Decision Tree — the "Question Tree"
Decision Tree = ask questions based on the data → branch → final answer.
Structure:
          [Will the person get the Loan?]
                       |
        _______________________________
       |                               |
[Income > 50k?]                 [Income ≤ 50k?]
       |                               |
  _____|_____                     _____|_____
 |           |                   |           |
[Credit    [Credit         [Job Stable?]  [Reject]
Score>700] Score≤700]            |
 |           |              _____|_____
[Approve]  [Reject]        |           |
                        [Approve]   [Reject]
3 parts:
- Root Node — the first question (the most important feature)
- Branch — the direction of an answer (Yes/No, >/<)
- Leaf Node — the final decision (Approve/Reject, Cat/Dog)
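Conceptually, these three parts map onto a tiny data structure. A toy sketch of the loan tree above (this is an illustration for intuition, not sklearn's internal representation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    question: Optional[str] = None   # e.g. "Income > 50k?" (internal node)
    yes: Optional["Node"] = None     # branch taken on "yes"
    no: Optional["Node"] = None      # branch taken on "no"
    decision: Optional[str] = None   # set only on leaf nodes

# Root node = first question; leaves hold the final decision
tree = Node(
    question="Income > 50k?",
    yes=Node(question="Credit Score > 700?",
             yes=Node(decision="Approve"), no=Node(decision="Reject")),
    no=Node(decision="Reject"),
)

def predict(node: Node, answers: dict) -> str:
    """Walk from the root to a leaf by answering each question."""
    while node.decision is None:
        node = node.yes if answers[node.question] else node.no
    return node.decision

print(predict(tree, {"Income > 50k?": True, "Credit Score > 700?": True}))  # Approve
```

Training a real tree is simply the process of choosing which question goes at each node, which is what the Gini section below explains.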
Real Example — Titanic Survival
     [Male or Female?]
            |
       ___________
      |           |
   [Male]     [Female]
      |           |
 [Age > 9?]   [Survive ✅]
      |
  _________
 |         |
[Yes]   [No, ≤ 9]
 |         |
[Siblings  [Survive ✅]
 > 2?]
 |
 ______
|      |
[No]  [Yes]
 |      |
[✅]    [❌]
Just 3 questions — and Titanic survival is predicted!
When does a Decision Tree split?
How does the model pick the "best question"? — Information Gain / Gini Impurity.
Gini Impurity (in simple terms):
- Pure node = only one class = Gini = 0 (best)
- Mixed node = 50-50 mix = Gini = 0.5 (worst)
The model selects the feature whose split produces the purest child nodes.
💡 Analogy: sorting books in a library — by color? author? genre? Sorting by genre is the most useful split — the reader finds a book fastest. A Decision Tree uses the same logic.
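These two Gini values are easy to verify in a few lines; a minimal sketch (the `gini` helper is my own illustration, not a sklearn API):

```python
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["dog"] * 10))               # pure node → 0.0 (best)
print(gini(["dog"] * 5 + ["cat"] * 5))  # 50-50 mix → 0.5 (worst)
```

At each node the tree tries candidate questions and keeps the one that lowers the weighted Gini of the child nodes the most.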
Decision Tree — Python Code
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Data Load
iris = load_iris()
X, y = iris.data, iris.target
# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Model Train
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
# Predict & Evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Accuracy: 0.9667
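To see the actual "questions" the trained tree learned, sklearn's `export_text` prints the tree as nested rules. A small self-contained sketch, retraining the same iris model:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(iris.data, iris.target)

# Each indented line is one "question" (a split on a feature threshold)
print(export_text(model, feature_names=iris.feature_names))
```

The printout shows the root question at the top and `class:` labels at the leaves, exactly matching the Root/Branch/Leaf picture above.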
Problems with Decision Trees
Problem 1 — Overfitting:
A very deep tree memorizes every exception in the training data:
# Too deep = overfit
model = DecisionTreeClassifier(max_depth=None) # ❌
# Better — limit the depth
model = DecisionTreeClassifier(max_depth=5) # ✅
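The overfit gap is small on a clean dataset like iris, so here is a sketch on a noisy synthetic dataset (the `make_classification` parameters, including the 20% label noise via `flip_y`, are illustrative choices of mine):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise, so memorizing the training set hurts
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

deep = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

# The unlimited tree scores 1.0 on training data but drops sharply on test data
print("deep:    train=%.2f test=%.2f" % (deep.score(X_train, y_train), deep.score(X_test, y_test)))
print("shallow: train=%.2f test=%.2f" % (shallow.score(X_train, y_train), shallow.score(X_test, y_test)))
```

The train/test gap of the unlimited tree is the overfitting; limiting `max_depth` shrinks it.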
Problem 2 — Unstable:
A small change in the data → a completely different tree!
Original data: Income > 50k → root node
+1 data point: Age > 30 → root node changed!
Problem 3 — Biased:
Features with many possible values (Age: 1-100) get an unfair advantage over simple features (Gender: M/F).
Random Forest — "Many Trees, One Decision"
The solution to the Decision Tree's problems = Random Forest.
Random Forest = combine many Decision Trees → majority vote → final answer.
🌳🌳🌳🌳🌳 → "Loan Approve?" → 3 trees: Yes, 2 trees: No → Majority: Yes ✅
How is a Random Forest built?
Step 1 — Bootstrap Sampling (Bagging):
1000 rows of data → randomly select 800 rows → Tree 1
1000 rows of data → randomly select 800 rows → Tree 2
...
1000 rows of data → randomly select 800 rows → Tree 100
Step 2 — Random Feature Selection: 10 features total → at each split, consider 3 random features → different trees, different perspectives.
Step 3 — Voting:
Tree 1: Spam ✅
Tree 2: Not Spam ❌
Tree 3: Spam ✅
Tree 4: Spam ✅
Tree 5: Not Spam ❌
──────────────────
Final: Spam ✅ (3 vs 2 — Majority)
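These three steps can be hand-rolled in a few lines. A toy sketch with 5 trees instead of 100 (in sklearn, `max_features="sqrt"` handles the per-split random feature subset from Step 2):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

trees = []
for i in range(5):
    # Step 1: bootstrap sample — draw rows with replacement
    rows = rng.integers(0, len(X), size=len(X))
    # Step 2: random feature subset at each split, via max_features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Step 3: majority vote across the 5 trees for one sample
votes = [int(t.predict(X[:1])[0]) for t in trees]
majority = max(set(votes), key=votes.count)
print("votes:", votes, "→ majority class:", majority)
```

This is essentially what `RandomForestClassifier` does internally, just with 100 trees and a few optimizations.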
💡 Analogy — Doctor's Second Opinion: 1 doctor's opinion = Decision Tree. 5 doctors' opinions → majority = Random Forest. More experts = a better decision!
Random Forest — Python Code
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
# Random Forest — 100 Trees
rf_model = RandomForestClassifier(
n_estimators=100, # number of trees
max_depth=5, # limit tree depth
random_state=42
)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))
Output:
precision recall f1-score
setosa 1.00 1.00 1.00
versicolor 1.00 1.00 1.00
virginica 1.00 1.00 1.00
accuracy 1.00
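The "vote" is visible through `predict_proba`: sklearn's Random Forest averages the per-tree class probabilities (soft voting rather than a strict hard-vote count), so the output reads as the fraction of the forest leaning toward each class. A small self-contained sketch:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(iris.data, iris.target)

# First flower is a clear setosa, so essentially the whole forest agrees
print(rf.predict_proba(iris.data[:1]))
```

For an ambiguous sample the probabilities spread out (e.g. 0.6 vs 0.4), which is exactly the doctors-disagreeing picture from the analogy above.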
Feature Importance — "Which Feature Matters?"
Random Forest's bonus power — Feature Importance — which column influences the model's decisions the most?
import pandas as pd
# Feature Importance
importances = rf_model.feature_importances_
feature_names = iris.feature_names
fi_df = pd.DataFrame({
'Feature': feature_names,
'Importance': importances
}).sort_values('Importance', ascending=False)
print(fi_df)
# Feature Importance
# petal length (cm) 0.4423
# petal width (cm) 0.4187
# sepal length (cm) 0.0954
# sepal width (cm) 0.0436
💡 Business use: predicting customer churn — "Price, Service, Competitor" — which factor matters most? Random Forest's feature importance answers instantly!
Decision Tree vs Random Forest
| | Decision Tree | Random Forest |
|---|---|---|
| Trees | 1 | 100+ |
| Overfitting | High risk | Low (averaging) |
| Accuracy | Moderate | High |
| Speed | Fast | Slower (many trees) |
| Interpretable | ✅ Visual | ❌ Black box |
| Feature Importance | Basic | ✅ Reliable |
| Small data | ✅ | Moderate |
| Large data | ❌ Overfits | ✅ |
Which Algorithm When?
| Situation | Use |
|---|---|
| Need to explain decisions (Bank, Medical) | Decision Tree |
| Maximum Accuracy | Random Forest |
| Fast Training | Decision Tree |
| Feature Importance | Random Forest |
| Visual / Presentation | Decision Tree |
| Production Model | Random Forest |
Real World Applications
🏦 Banking — Loan Approval: Features: Income, Credit Score, Age, Job Stability. Random Forest → Approve/Reject + why (feature importance)
🏥 Medical — Disease Prediction: Features: Symptoms, Age, Test Results. Decision Tree → logic the doctor can explain
📧 Spam Detection: Features: Words, Links, Sender. Random Forest → high-accuracy spam filter
🛒 E-commerce — Customer Churn: Features: Purchase History, Activity, Complaints. Random Forest → "This customer is about to leave — send an offer!"
💳 Fraud Detection: Features: Amount, Location, Time, Merchant. Random Forest → real-time transaction fraud alerts
Conclusion
🌱 Decision Tree = 1 tree, simple, explainable, overfit risk. 🌲🌲🌲 Random Forest = many trees, accurate, robust, black box.
Simple problem + need to explain → Decision Tree
Complex problem + need accuracy → Random Forest
From the "20 Questions" game → Decision Tree → Random Forest — your first powerful ML algorithms — zero math, pure logic! 🎯