Decision Tree & Random Forest: The "20 Questions" Game Is an ML Algorithm!
Did you play the "20 Questions" game as a kid?
"Is it an animal?" — Yes. "Does it have 4 legs?" — Yes. "Is it kept at home?" — Yes. "Does it bark?" — Yes. "Dog!" 🐕
Congratulations — you just ran a Decision Tree!
In the ML world, a Decision Tree works on exactly this game's logic — Questions → Answers → Final Decision.
Decision Tree — the "Question Tree"
Decision Tree = ask questions based on the data → branch → final answer.
Structure:
          [Will the person get the Loan?]
                       |
        _______________________________
       |                               |
[Income > 50k?]                 [Income ≤ 50k?]
       |                               |
  _____|_____                     _____|_____
 |           |                   |           |
[Credit    [Credit         [Job Stable?]  [Reject]
Score>700] Score≤700]            |
 |           |              _____|_____
[Approve]  [Reject]        |           |
                        [Approve]   [Reject]
3 parts:
- Root Node — the first question (the most important feature)
- Branch — the direction of an answer (Yes/No, >/<)
- Leaf Node — the final decision (Approve/Reject, Cat/Dog)
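Conceptually, these three parts map onto a tiny data structure. A toy sketch of the loan tree above (this is an illustration for intuition, not sklearn's internal representation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    question: Optional[str] = None   # e.g. "Income > 50k?" (internal node)
    yes: Optional["Node"] = None     # branch taken on "yes"
    no: Optional["Node"] = None      # branch taken on "no"
    decision: Optional[str] = None   # set only on leaf nodes

# Root node = first question; leaves hold the final decision
tree = Node(
    question="Income > 50k?",
    yes=Node(question="Credit Score > 700?",
             yes=Node(decision="Approve"), no=Node(decision="Reject")),
    no=Node(decision="Reject"),
)

def predict(node: Node, answers: dict) -> str:
    """Walk from the root to a leaf by answering each question."""
    while node.decision is None:
        node = node.yes if answers[node.question] else node.no
    return node.decision

print(predict(tree, {"Income > 50k?": True, "Credit Score > 700?": True}))  # Approve
```

Training a real tree is simply the process of choosing which question goes at each node, which is what the Gini section below explains.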
Real Example — Titanic Survival
     [Male or Female?]
            |
       ___________
      |           |
   [Male]     [Female]
      |           |
 [Age > 9?]   [Survive ✅]
      |
  _________
 |         |
[Yes]   [No, ≤ 9]
 |         |
[Siblings  [Survive ✅]
 > 2?]
 |
 ______
|      |
[No]  [Yes]
 |      |
[✅]    [❌]
Just 3 questions — and Titanic survival is predicted!
When does a Decision Tree split?
How does the model pick the "best question"? — Information Gain / Gini Impurity.
Gini Impurity (in simple terms):
- Pure node = only one class = Gini = 0 (best)
- Mixed node = 50-50 mix = Gini = 0.5 (worst)
The model selects the feature whose split produces the purest child nodes.
💡 Analogy: sorting books in a library — by color? author? genre? Sorting by genre is the most useful split — the reader finds a book fastest. A Decision Tree uses the same logic.
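These two Gini values are easy to verify in a few lines; a minimal sketch (the `gini` helper is my own illustration, not a sklearn API):

```python
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["dog"] * 10))               # pure node → 0.0 (best)
print(gini(["dog"] * 5 + ["cat"] * 5))  # 50-50 mix → 0.5 (worst)
```

At each node the tree tries candidate questions and keeps the one that lowers the weighted Gini of the child nodes the most.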
Decision Tree — Python Code
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Data Load
iris = load_iris()
X, y = iris.data, iris.target
# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Model Train
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
# Predict & Evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Accuracy: 0.9667
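To see the actual "questions" the trained tree learned, sklearn's `export_text` prints the tree as nested rules. A small self-contained sketch, retraining the same iris model:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(iris.data, iris.target)

# Each indented line is one "question" (a split on a feature threshold)
print(export_text(model, feature_names=iris.feature_names))
```

The printout shows the root question at the top and `class:` labels at the leaves, exactly matching the Root/Branch/Leaf picture above.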
Problems with Decision Trees
Problem 1 — Overfitting:
A very deep tree memorizes every exception in the training data:
# Too deep = overfit
model = DecisionTreeClassifier(max_depth=None) # ❌
# Better — limit the depth
model = DecisionTreeClassifier(max_depth=5) # ✅
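The overfit gap is small on a clean dataset like iris, so here is a sketch on a noisy synthetic dataset (the `make_classification` parameters, including the 20% label noise via `flip_y`, are illustrative choices of mine):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise, so memorizing the training set hurts
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

deep = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

# The unlimited tree scores 1.0 on training data but drops sharply on test data
print("deep:    train=%.2f test=%.2f" % (deep.score(X_train, y_train), deep.score(X_test, y_test)))
print("shallow: train=%.2f test=%.2f" % (shallow.score(X_train, y_train), shallow.score(X_test, y_test)))
```

The train/test gap of the unlimited tree is the overfitting; limiting `max_depth` shrinks it.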
Problem 2 — Unstable:
A small change in the data → a completely different tree!
Original data: Income > 50k → root node
+1 data point: Age > 30 → root node changed!
Problem 3 — Biased:
Features with many possible values (Age: 1-100) get an unfair advantage over simple features (Gender: M/F).
Random Forest — "Many Trees, One Decision"
The solution to the Decision Tree's problems = Random Forest.
Random Forest = combine many Decision Trees → majority vote → final answer.
🌳🌳🌳🌳🌳 → "Loan Approve?" → 3 trees: Yes, 2 trees: No → Majority: Yes ✅
How is a Random Forest built?
Step 1 — Bootstrap Sampling (Bagging):
1000 rows of data → randomly select 800 rows → Tree 1
1000 rows of data → randomly select 800 rows → Tree 2
...
1000 rows of data → randomly select 800 rows → Tree 100
Step 2 — Random Feature Selection: 10 features total → at each split, consider 3 random features → different trees, different perspectives.
Step 3 — Voting:
Tree 1: Spam ✅
Tree 2: Not Spam ❌
Tree 3: Spam ✅
Tree 4: Spam ✅
Tree 5: Not Spam ❌
──────────────────
Final: Spam ✅ (3 vs 2 — Majority)
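These three steps can be hand-rolled in a few lines. A toy sketch with 5 trees instead of 100 (in sklearn, `max_features="sqrt"` handles the per-split random feature subset from Step 2):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

trees = []
for i in range(5):
    # Step 1: bootstrap sample — draw rows with replacement
    rows = rng.integers(0, len(X), size=len(X))
    # Step 2: random feature subset at each split, via max_features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Step 3: majority vote across the 5 trees for one sample
votes = [int(t.predict(X[:1])[0]) for t in trees]
majority = max(set(votes), key=votes.count)
print("votes:", votes, "→ majority class:", majority)
```

This is essentially what `RandomForestClassifier` does internally, just with 100 trees and a few optimizations.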
💡 Analogy — Doctor's Second Opinion: 1 doctor's opinion = Decision Tree. 5 doctors' opinions → majority = Random Forest. More experts = a better decision!
Random Forest — Python Code
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# Load data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
# Random Forest — 100 Trees
rf_model = RandomForestClassifier(
n_estimators=100, # number of trees
max_depth=5, # limit tree depth
random_state=42
)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))
Output:
precision recall f1-score
setosa 1.00 1.00 1.00
versicolor 1.00 1.00 1.00
virginica 1.00 1.00 1.00
accuracy 1.00
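The "vote" is visible through `predict_proba`: sklearn's Random Forest averages the per-tree class probabilities (soft voting rather than a strict hard-vote count), so the output reads as the fraction of the forest leaning toward each class. A small self-contained sketch:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
rf.fit(iris.data, iris.target)

# First flower is a clear setosa, so essentially the whole forest agrees
print(rf.predict_proba(iris.data[:1]))
```

For an ambiguous sample the probabilities spread out (e.g. 0.6 vs 0.4), which is exactly the doctors-disagreeing picture from the analogy above.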
Feature Importance — "Which Feature Matters?"
Random Forest's bonus power — Feature Importance — which column influences the model's decisions the most?
import pandas as pd
# Feature Importance
importances = rf_model.feature_importances_
feature_names = iris.feature_names
fi_df = pd.DataFrame({
'Feature': feature_names,
'Importance': importances
}).sort_values('Importance', ascending=False)
print(fi_df)
# Feature Importance
# petal length (cm) 0.4423
# petal width (cm) 0.4187
# sepal length (cm) 0.0954
# sepal width (cm) 0.0436
💡 Business use: predicting customer churn — "Price, Service, Competitor" — which factor matters most? Random Forest's feature importance answers instantly!
Decision Tree vs Random Forest
| | Decision Tree | Random Forest |
|---|---|---|
| Trees | 1 | 100+ |
| Overfitting | High risk | Low (averaging) |
| Accuracy | Moderate | High |
| Speed | Fast | Slower (many trees) |
| Interpretable | ✅ Visual | ❌ Black box |
| Feature Importance | Basic | ✅ Reliable |
| Small data | ✅ | Moderate |
| Large data | ❌ Overfits | ✅ |
Which Algorithm When?
| Situation | Use |
|---|---|
| Need to explain decisions (Bank, Medical) | Decision Tree |
| Maximum Accuracy | Random Forest |
| Fast Training | Decision Tree |
| Feature Importance | Random Forest |
| Visual / Presentation | Decision Tree |
| Production Model | Random Forest |
Real World Applications
🏦 Banking — Loan Approval: Features: Income, Credit Score, Age, Job Stability. Random Forest → Approve/Reject + why (feature importance)
🏥 Medical — Disease Prediction: Features: Symptoms, Age, Test Results. Decision Tree → logic the doctor can explain
📧 Spam Detection: Features: Words, Links, Sender. Random Forest → high-accuracy spam filter
🛒 E-commerce — Customer Churn: Features: Purchase History, Activity, Complaints. Random Forest → "This customer is about to leave — send an offer!"
💳 Fraud Detection: Features: Amount, Location, Time, Merchant. Random Forest → real-time transaction fraud alerts
Conclusion
🌱 Decision Tree = 1 tree, simple, explainable, overfit risk. 🌲🌲🌲 Random Forest = many trees, accurate, robust, black box.
Simple problem + need to explain → Decision Tree
Complex problem + need accuracy → Random Forest
From the "20 Questions" game → Decision Tree → Random Forest — your first powerful ML algorithms — zero math, pure logic! 🎯