Basics · Mar 28, 2026 · 9 min read

Model Evaluation: Accuracy, Precision, Recall, and F1 Score — How to Decide Whether an ML Model Is Good or Bad?

An ML model with roughly 95% accuracy is not necessarily a good one. For example, in cancer detection, 95% accuracy can mean 5 missed patients! A simple explanation, with Python code, of what the Confusion Matrix, Precision, Recall, F1 Score, and AUC-ROC are, and when to use which metric.

Model Evaluation: Accuracy, Precision, Recall, F1 — Is the Model Good or Bad?

The ML model is trained. Now what?

"The model is good" — how do you know? "95% accuracy" — is that enough?

Accuracy often misleads. One example:

🏥 Cancer Detection Model — 100 patients:

  • 95 healthy, 5 with cancer
  • Model predicts: everyone is healthy
  • Accuracy: 95% — "Wow!"
  • But all 5 cancer patients are missed — the model is useless!

High accuracy = good model — that assumption is wrong. Real evaluation needs Precision, Recall, and F1.


Confusion Matrix — Where Evaluation Starts

Confusion Matrix = the model's prediction report card — 4 boxes:

                    ACTUAL
                 Positive  Negative
PREDICTED  Positive |  TP  |  FP  |
           Negative |  FN  |  TN  |

The 4 terms:

Term | Full Form      | Meaning          | Example
TP   | True Positive  | Correct positive | Has cancer, model says cancer ✅
TN   | True Negative  | Correct negative | Healthy, model says healthy ✅
FP   | False Positive | Wrong positive   | Healthy, model says cancer ❌
FN   | False Negative | Wrong negative   | Has cancer, model says healthy ❌

💡 Simple rule: the first word = whether the prediction was correct (True) or wrong (False). The second word = what the model predicted (Positive/Negative).
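The four boxes can be tallied by hand — a minimal sketch with made-up labels (1 = Positive, 0 = Negative):

```python
# Hypothetical actual labels vs. model predictions
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

# Count each box of the confusion matrix
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # Positive, predicted Positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # Negative, predicted Negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Negative, predicted Positive
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Positive, predicted Negative

print(tp, tn, fp, fn)  # 2 2 1 1
```

In practice sklearn's confusion_matrix() does this counting for you, as shown later in the article.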


Accuracy — The "Overall Score"

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Example — email spam filter (100 emails):

  • 90 normal, 10 spam
  • Model: 88 normal correct (TN), 8 spam correct (TP), 2 normal flagged as spam (FP), 2 spam missed (FN)
Accuracy = (88 + 8) / 100 = 96%

When is accuracy enough?

  • Classes are balanced — e.g. spam 50%, normal 50%
  • FP and FN cost the same

When does accuracy mislead?

  • Imbalanced data — 95% healthy, 5% cancer
  • FP and FN costs differ
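Both cases above can be checked with the formula — a small sketch using the spam-filter counts, plus the misleading "predict everyone healthy" cancer model:

```python
# Spam-filter counts from the example above
tp, tn, fp, fn = 8, 88, 2, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.96

# The cancer model that predicts "Healthy" for all 100 patients:
# it never produces a positive, so TP = 0 and all 5 cancers become FN
tp, tn, fp, fn = 0, 95, 0, 5
print((tp + tn) / (tp + tn + fp + fn))  # 0.95 — high accuracy, useless model
```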

Precision — "Of the Positives Predicted, How Many Were Right?"

Precision = TP / (TP + FP)

Meaning: when the model predicts "Positive" — what % are actually positive?

Example — spam filter: the model flags 15 emails as spam:

  • 12 are actually spam (TP)
  • 3 are normal emails wrongly flagged (FP)
Precision = 12 / (12 + 3) = 12/15 = 80%

When the model predicts spam, it is right 80% of the time.

💡 When is precision important? — When FP (false alarms) are costly.

  • Spam filter — deleting a normal email = problem
  • Court system — convicting the innocent = problem
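The spam-filter numbers above, plugged into the formula:

```python
# Precision = TP / (TP + FP), with the spam-filter counts from the example
tp, fp = 12, 3
precision = tp / (tp + fp)
print(precision)  # 0.8
```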

Recall — "Of the Actual Positives, How Many Were Caught?"

Recall = TP / (TP + FN)

Meaning: of the truly positive cases — what % did the model catch?

Example — cancer detection: 100 patients, 10 with cancer:

  • Model detects 8 cancers (TP)
  • 2 cancers missed (FN)
Recall = 8 / (8 + 2) = 8/10 = 80%

The model detects 80% of cancer patients — and misses 20%.

💡 When is recall important? — When FN (misses) are costly.

  • Cancer detection — a miss = life-threatening
  • Fraud detection — a miss = money lost
  • COVID test — a miss = spread
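And the cancer-detection numbers, plugged into the recall formula:

```python
# Recall = TP / (TP + FN), with the cancer-detection counts from the example
tp, fn = 8, 2
recall = tp / (tp + fn)
print(recall)  # 0.8
```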

Precision vs Recall — The Tradeoff

Higher precision often means lower recall — much of the time it is a trade-off!

Example — spam filter:

High precision (conservative model):

  • Flag only 100%-sure spam → fewer FP → high precision
  • Many spam emails missed → more FN → lower recall

High recall (aggressive model):

  • Flag many emails as spam → fewer spam missed → high recall
  • Normal emails also flagged → more FP → lower precision

🎯 Analogy — casting a net:

  • Wide net (high recall): lots of fish caught — lots of garbage too
  • Small net (high precision): little garbage — many fish escape
  • Smart net (F1 balance): a good catch, little garbage
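The trade-off can be seen by moving the decision threshold. A sketch with hypothetical spam probabilities (the y_prob values are made up for illustration): raising the threshold makes the model conservative — precision goes up, recall drops.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels (1 = spam) and model-assigned spam probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10, 0.05, 0.45]

for threshold in (0.5, 0.9):
    # Flag as spam only when probability clears the threshold
    y_pred = [1 if prob >= threshold else 0 for prob in y_prob]
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

At threshold 0.5 this toy model gets precision 0.75 and recall 0.75; at 0.9 precision rises to 1.00 while recall collapses to 0.25 — the conservative-vs-aggressive trade-off in numbers.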

F1 Score — Balancing Precision and Recall

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

F1 = the harmonic mean of precision and recall — a balance measure.

Example:

Precision = 80%,  Recall = 80%
F1 = 2 × (0.80 × 0.80) / (0.80 + 0.80) = 0.80 = 80%

Precision = 90%,  Recall = 50%
F1 = 2 × (0.90 × 0.50) / (0.90 + 0.50) = 0.64 = 64%

When to use F1?

  • Imbalanced classes
  • Both precision and recall matter
  • You need a single score for comparison
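The two worked examples above, as a quick check — note how the harmonic mean punishes the imbalanced pair, where the arithmetic mean (0.70) would hide the weak recall:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.80, 0.80), 2))  # 0.8  — balanced pair
print(round(f1(0.90, 0.50), 2))  # 0.64 — imbalance pulls the score down
```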

The Four Metrics — When to Use Which?

Metric    | When to use?                        | Example
Accuracy  | Balanced classes, FP cost = FN cost | General classification
Precision | FP costly                           | Spam filter, legal
Recall    | FN costly                           | Cancer, fraud, COVID
F1 Score  | Imbalanced + both matter            | Medical, NLP, most ML

Python Code — Evaluation with sklearn

from sklearn.metrics import (accuracy_score, precision_score,
                              recall_score, f1_score,
                              confusion_matrix, classification_report)

# Actual vs Predicted
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Individual Metrics
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 Score: ", f1_score(y_true, y_pred))

# Confusion Matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred))

# Full Report (Best!)
print("\nClassification Report:")
print(classification_report(y_true, y_pred))

Output:

Accuracy:  0.80
Precision: 0.80
Recall:    0.80
F1 Score:  0.80

Confusion Matrix:
[[4 1]
 [1 4]]

Classification Report:
              precision  recall  f1-score  support
           0       0.80    0.80      0.80        5
           1       0.80    0.80      0.80        5
    accuracy                         0.80       10

💡 classification_report() = one line, full picture — use this when evaluating production models!


Real-World Scenarios

Scenario 1 — COVID test:

  • FN = COVID positive, test says negative = dangerous — spread!
  • Maximize recall — don't miss cases

Scenario 2 — YouTube recommendation:

  • FP = wrong video recommended = user annoyed
  • FN = good video missed = acceptable
  • Precision is important

Scenario 3 — bank loan approval:

  • FP = loan to a bad customer = NPA (loss)
  • FN = good customer rejected = lost business
  • Balance — F1 score

Scenario 4 — fire alarm:

  • FN = there is a fire, the alarm doesn't ring = disaster!
  • FP = no fire, the alarm rings = inconvenience
  • Maximize recall

AUC-ROC — Advanced Evaluation (Bonus)

ROC Curve = a plot of the True Positive Rate (Recall) against the False Positive Rate at different thresholds.

AUC (Area Under the Curve):

  • AUC = 1.0 → perfect model
  • AUC = 0.5 → random guessing (useless)
  • AUC = 0.85+ → good model

from sklearn.metrics import roc_auc_score

# y_prob = predicted probabilities for the positive class,
# e.g. model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_true, y_prob)
print(f"AUC-ROC: {auc:.2f}")

💡 AUC-ROC does not depend on any single threshold — it measures the model's overall power.


Summary — 1-Minute Cheat Sheet

Accuracy  = Overall Correct / Total
Precision = TP / (TP + FP)     → want fewer FP
Recall    = TP / (TP + FN)     → want fewer FN
F1 Score  = Balance of P & R   → both important

Imbalanced data?  → F1 / Recall
FP costly?        → Precision
FN costly?        → Recall
Balanced data?    → Accuracy is OK

Conclusion

Accuracy alone = an incomplete picture. Precision + Recall + F1 = the full story.

Cancer detection — recall. Spam filter — precision. Most cases — F1. The right metric = the right decision.

An ML model's "report card" = confusion matrix + classification report — 2 lines of sklearn, a full evaluation! 🎯

