Basics · Mar 28, 2026 · 9 min read

Model Evaluation: Accuracy, Precision, Recall, and F1 Score — How to Decide Whether an ML Model Is Good or Bad?

An ML model with roughly 95% accuracy is not necessarily a good one. For example, in cancer detection, 95% accuracy can mean 5 missed patients! A simple explanation, with Python code, of what the Confusion Matrix, Precision, Recall, F1 Score, and AUC-ROC are, and when to use which metric.

Model Evaluation: Accuracy, Precision, Recall, F1 — Is the Model Good or Bad?

The ML model is trained. Now what?

"The model is good" — how do you know? "95% accuracy" — is that enough?

Accuracy often misleads. One example:

🏥 Cancer Detection Model — 100 patients:

  • 95 healthy, 5 with cancer
  • Model predicts: everyone is healthy
  • Accuracy: 95% — "Wow!"
  • But all 5 cancer patients are missed — the model is useless!

High accuracy = good model — that assumption is wrong. Real evaluation needs Precision, Recall, and F1.


Confusion Matrix — Where Evaluation Starts

Confusion Matrix = the model's prediction report card — 4 boxes:

                    ACTUAL
                 Positive  Negative
PREDICTED  Positive |  TP  |  FP  |
           Negative |  FN  |  TN  |

The 4 terms:

Term | Full Form      | Meaning          | Example
TP   | True Positive  | Correct positive | Has cancer, model says cancer ✅
TN   | True Negative  | Correct negative | Healthy, model says healthy ✅
FP   | False Positive | Wrong positive   | Healthy, model says cancer ❌
FN   | False Negative | Wrong negative   | Has cancer, model says healthy ❌

💡 Simple rule: the first word = whether the prediction was correct (True) or wrong (False). The second word = what the model predicted (Positive/Negative).
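The four boxes can be tallied by hand — a minimal sketch with made-up labels (1 = Positive, 0 = Negative):

```python
# Hypothetical actual labels vs. model predictions
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# Count each box of the confusion matrix
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # Positive, predicted Positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # Negative, predicted Negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Negative, predicted Positive
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Positive, predicted Negative

print(tp, tn, fp, fn)  # 2 2 1 1
```

In practice sklearn's confusion_matrix() does this counting for you, as shown later in the article.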


Accuracy — The "Overall Score"

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example — email spam filter (100 emails):

  • 90 normal, 10 spam
  • Model: 88 normal correct (TN), 8 spam correct (TP), 2 normal flagged as spam (FP), 2 spam missed (FN)
Accuracy = (88 + 8) / 100 = 96%

When is accuracy enough?

  • Classes are balanced — e.g. spam 50%, normal 50%
  • FP and FN cost the same

When does accuracy mislead?

  • Imbalanced data — 95% healthy, 5% cancer
  • FP and FN costs differ
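Both cases above can be checked with the formula — a small sketch using the spam-filter counts, plus the misleading "predict everyone healthy" cancer model:

```python
# Spam-filter counts from the example above
tp, tn, fp, fn = 8, 88, 2, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.96

# The cancer model that predicts "Healthy" for all 100 patients:
# it never produces a positive, so TP = 0 and all 5 cancers become FN
tp, tn, fp, fn = 0, 95, 0, 5
print((tp + tn) / (tp + tn + fp + fn))  # 0.95 — high accuracy, useless model
```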

Precision — "Of the Positives Predicted, How Many Were Right?"

Precision = TP / (TP + FP)

Meaning: when the model predicts "Positive" — what % are actually positive?

Example — spam filter: the model flags 15 emails as spam:

  • 12 are actually spam (TP)
  • 3 are normal emails wrongly flagged (FP)
Precision = 12 / (12 + 3) = 12/15 = 80%

When the model predicts spam, it is right 80% of the time.

💡 When is precision important? — When FP (false alarms) are costly.

  • Spam filter — deleting a normal email = problem
  • Court system — convicting the innocent = problem
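The spam-filter numbers above, plugged into the formula:

```python
# Precision = TP / (TP + FP), with the spam-filter counts from the example
tp, fp = 12, 3
precision = tp / (tp + fp)
print(precision)  # 0.8
```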

Recall — "Of the Actual Positives, How Many Were Caught?"

Recall = TP / (TP + FN)

Meaning: of the truly positive cases — what % did the model catch?

Example — cancer detection: 100 patients, 10 with cancer:

  • Model detects 8 cancers (TP)
  • 2 cancers missed (FN)
Recall = 8 / (8 + 2) = 8/10 = 80%

The model detects 80% of cancer patients — and misses 20%.

💡 When is recall important? — When FN (misses) are costly.

  • Cancer detection — a miss = life-threatening
  • Fraud detection — a miss = money lost
  • COVID test — a miss = spread
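And the cancer-detection numbers, plugged into the recall formula:

```python
# Recall = TP / (TP + FN), with the cancer-detection counts from the example
tp, fn = 8, 2
recall = tp / (tp + fn)
print(recall)  # 0.8
```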

Precision vs Recall — The Tradeoff

Higher precision often means lower recall — much of the time it is a trade-off!

Example — spam filter:

High precision (conservative model):

  • Flag only 100%-sure spam → fewer FP → high precision
  • Many spam emails missed → more FN → lower recall

High recall (aggressive model):

  • Flag many emails as spam → fewer spam missed → high recall
  • Normal emails also flagged → more FP → lower precision

🎯 Analogy — casting a net:

  • Wide net (high recall): lots of fish caught — lots of garbage too
  • Small net (high precision): little garbage — many fish escape
  • Smart net (F1 balance): a good catch, little garbage
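The trade-off can be seen by moving the decision threshold. A sketch with hypothetical spam probabilities (the y_prob values are made up for illustration): raising the threshold makes the model conservative — precision goes up, recall drops.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels (1 = spam) and model-assigned spam probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10, 0.05, 0.45]

for threshold in (0.5, 0.9):
    # Flag as spam only when probability clears the threshold
    y_pred = [1 if prob >= threshold else 0 for prob in y_prob]
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

At threshold 0.5 this toy model gets precision 0.75 and recall 0.75; at 0.9 precision rises to 1.00 while recall collapses to 0.25 — the conservative-vs-aggressive trade-off in numbers.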

F1 Score — Balancing Precision and Recall

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

F1 = the harmonic mean of precision and recall — a balance measure.

Example:

Precision = 80%,  Recall = 80%
F1 = 2 × (0.80 × 0.80) / (0.80 + 0.80) = 0.80 = 80%

Precision = 90%,  Recall = 50%
F1 = 2 × (0.90 × 0.50) / (0.90 + 0.50) = 0.64 = 64%

When to use F1?

  • Imbalanced classes
  • Both precision and recall matter
  • You need a single score for comparison
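The two worked examples above, as a quick check — note how the harmonic mean punishes the imbalanced pair, where the arithmetic mean (0.70) would hide the weak recall:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.80, 0.80), 2))  # 0.8  — balanced pair
print(round(f1(0.90, 0.50), 2))  # 0.64 — imbalance pulls the score down
```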

The Four Metrics — When to Use Which?

Metric    | When to use?                        | Example
Accuracy  | Balanced classes, FP cost = FN cost | General classification
Precision | FP costly                           | Spam filter, legal
Recall    | FN costly                           | Cancer, fraud, COVID
F1 Score  | Imbalanced + both matter            | Medical, NLP, most ML

Python Code — Evaluation with sklearn

from sklearn.metrics import (accuracy_score, precision_score,
                              recall_score, f1_score,
                              confusion_matrix, classification_report)

# Actual vs Predicted
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Individual Metrics
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 Score: ", f1_score(y_true, y_pred))

# Confusion Matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred))

# Full Report (Best!)
print("\nClassification Report:")
print(classification_report(y_true, y_pred))

Output:

Accuracy:  0.80
Precision: 0.80
Recall:    0.80
F1 Score:  0.80

Confusion Matrix:
[[4 1]
 [1 4]]

Classification Report:
              precision  recall  f1-score  support
           0       0.80    0.80      0.80        5
           1       0.80    0.80      0.80        5
    accuracy                         0.80       10

💡 classification_report() = one line, full picture — use this when evaluating production models!


Real-World Scenarios

Scenario 1 — COVID test:

  • FN = COVID positive, test says negative = dangerous — spread!
  • Maximize recall — don't miss cases

Scenario 2 — YouTube recommendation:

  • FP = wrong video recommended = user annoyed
  • FN = good video missed = acceptable
  • Precision is important

Scenario 3 — bank loan approval:

  • FP = loan to a bad customer = NPA (loss)
  • FN = good customer rejected = lost business
  • Balance — F1 score

Scenario 4 — fire alarm:

  • FN = there is a fire, the alarm doesn't ring = disaster!
  • FP = no fire, the alarm rings = inconvenience
  • Maximize recall

AUC-ROC — Advanced Evaluation (Bonus)

ROC Curve = a plot of the True Positive Rate (Recall) against the False Positive Rate at different thresholds.

AUC (Area Under the Curve):

  • AUC = 1.0 → perfect model
  • AUC = 0.5 → random guessing (useless)
  • AUC = 0.85+ → good model

from sklearn.metrics import roc_auc_score

# y_prob = predicted probabilities for the positive class,
# e.g. model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_true, y_prob)
print(f"AUC-ROC: {auc:.2f}")

💡 AUC-ROC does not depend on any single threshold — it measures the model's overall power.


Summary — 1-Minute Cheat Sheet

Accuracy  = Overall Correct / Total
Precision = TP / (TP + FP)     → want fewer FP
Recall    = TP / (TP + FN)     → want fewer FN
F1 Score  = Balance of P & R   → both important

Imbalanced data?  → F1 / Recall
FP costly?        → Precision
FN costly?        → Recall
Balanced data?    → Accuracy is OK

Conclusion

Accuracy alone = an incomplete picture. Precision + Recall + F1 = the full story.

Cancer detection — recall. Spam filter — precision. Most cases — F1. The right metric = the right decision.

An ML model's "report card" = confusion matrix + classification report — 2 lines of sklearn, a full evaluation! 🎯

