f1score and auc of roc

March 24, 2024

© 2024 borui. All rights reserved. This content may be freely reproduced, displayed, modified, or distributed with proper attribution to borui and a link to the article: borui(2024-03-24 09:39:26 +0000). f1score and auc of roc. https://borui/blog/2024-03-24-en-f1score-auc-roc.
@misc{
  borui2024,
  author = {borui},
  title = {f1score and auc of roc},
  year = {2024},
  publisher = {borui's blog},
  journal = {borui's blog},
  url={https://borui/blog/2024-03-24-en-f1score-auc-roc}
}

Types of problems in machine learning

There are two broad types of problems in Machine Learning: Classification and Regression. The first deals with discrete values, and the second deals with continuous values.

Classification can be subdivided into two smaller types:

  1. Binary Classification: there are two target labels and, most of the time, one class is the normal state while the other is an abnormal state. Think of a fraud detection model that predicts whether a transaction is fraudulent or not. The abnormal state (a fraudulent transaction) is often underrepresented in the data, yet detecting it is critical, which means you might need more sophisticated metrics.

  2. Multiclass Classification: there are three or more classes. Multiclass classification problems can be solved with a binary classifier plus a decomposition strategy such as One-vs-Rest or One-vs-One (see the sketch after this list).
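For a concrete picture of these strategies, here is a minimal sketch, assuming scikit-learn and a toy dataset made up for the demo (none of these names or numbers come from the article), that wraps the same binary base classifier with One-vs-Rest and One-vs-One:

```python
# A sketch of One-vs-Rest and One-vs-One, assuming scikit-learn and a
# toy dataset made up for the demo (nothing here comes from the article).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Toy 3-class dataset; any multiclass data would do.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Both strategies reuse the same binary base classifier under the hood.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print("One-vs-Rest training accuracy:", ovr.score(X, y))
print("One-vs-One  training accuracy:", ovo.score(X, y))
```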

Confusion Matrix

A confusion matrix is a table showing the distribution of a classifier’s predictions on the data. It’s an N x N matrix used for evaluating the performance of a classification model. It shows how well the model is performing, what needs to be improved, and what errors it is making.

Where: TP – true positive (a positive example the model correctly predicts as positive), TN – true negative (a negative example the model correctly predicts as negative), FP – false positive (a negative example the model incorrectly predicts as positive), FN – false negative (a positive example the model incorrectly predicts as negative).

Now let’s move on to metrics, starting with accuracy.

Accuracy = (TP + TN) / (TP + FN + FP + TN).

Recall = TP / (TP + FN). In binary classification, recall is also called sensitivity.

Precision = TP / (TP + FP).

Specificity = TN / (TN + FP).
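As a quick illustration, here is a minimal sketch (with made-up toy labels, not data from this article) that derives these four metrics from a 2x2 confusion matrix using scikit-learn:

```python
# Deriving accuracy, recall, precision, and specificity from a 2x2
# confusion matrix; the labels below are made up for the demo.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

# For binary labels, ravel() unpacks the matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + fn + fp + tn)
recall      = tp / (tp + fn)    # a.k.a. sensitivity / true positive rate
precision   = tp / (tp + fp)
specificity = tn / (tn + fp)    # a.k.a. true negative rate

print(accuracy, recall, precision, specificity)
```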

F1 Score

The F1-score keeps the balance between precision and recall. It’s often used when the class distribution is uneven, but it can also be seen as a statistical measure of the accuracy of an individual test.

F1 = 2 * ([precision * recall] / [precision + recall])
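A small sketch of the same formula, again with made-up labels: the hand-computed harmonic mean should match scikit-learn’s f1_score.

```python
# F1 as the harmonic mean of precision and recall, checked against
# scikit-learn's f1_score; the labels are made up for the demo.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

f1_by_hand = 2 * (p * r) / (p + r)
print(f1_by_hand, f1_score(y_true, y_pred))   # the two values agree
```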

ROC Curve and AUC

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: the True Positive Rate and the False Positive Rate.

True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows: TPR = TP / (TP + FN).

False Positive Rate (FPR) is defined as follows: FPR = FP / (TN + FP).

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. AUC (Area Under the ROC Curve) condenses the whole curve into a single number, the area beneath it, which can be read as the probability that the model ranks a randomly chosen positive example higher than a randomly chosen negative one.

📓 Note: The classification threshold is a cut-off point in statistical classification models such as Logistic Regression, Random Forest, and Neural Networks. It is the value that separates the class labels in a binary (or multi-class) classification problem: the model outputs a numeric score, such as a probability, and the threshold decides which label that score is mapped to.
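To make the threshold sweep concrete, here is a minimal sketch on a synthetic, imbalanced dataset (all names and numbers are assumptions for the demo): scikit-learn’s roc_curve returns one (FPR, TPR) pair per candidate threshold, and roc_auc_score summarises the curve as the area under it.

```python
# ROC curve and AUC on a synthetic, imbalanced dataset;
# everything here is made up for the demo.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# ~10% positives, to mimic an imbalanced problem.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]   # the numeric score the threshold cuts

# One (FPR, TPR) point per candidate threshold; AUC is the area under that curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("number of thresholds:", len(thresholds))
print("AUC:", roc_auc_score(y_test, scores))
```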

Balanced Accuracy

Balanced Accuracy is used in both binary and multi-class classification. It’s the arithmetic mean of sensitivity and specificity, and its main use case is imbalanced data, i.e. when one of the target classes appears much more often than the other.

Balanced Accuracy = (Sensitivity + Specificity) / 2

Sensitivity: also known as the true positive rate or recall, it measures the proportion of actual positives that the model correctly identifies.

Sensitivity = TP / (TP + FN)

Specificity: also known as the true negative rate, it measures the proportion of actual negatives that the model correctly identifies.

Specificity = TN / (TN + FP)

In anomaly detection, such as a fraudulent-transaction dataset, most transactions are legal, i.e. the ratio of fraudulent to legal transactions is small. Balanced accuracy is a good performance metric for imbalanced data like this.

Assume we have a binary classifier with the following confusion matrix: TP = 20, FN = 70, FP = 30, TN = 5000.

Accuracy = (TP + TN) / (TP + FN + FP + TN) = (20 + 5000) / (20 + 70 + 30 + 5000) ≈ 98.05%. This score looks impressive, but it hides how poorly the model handles the positive class.

So, let’s consider balanced accuracy, which will account for the imbalance in the classes. Below is the balanced accuracy computation for our classifier:

Sensitivity = TP / (TP + FN) = 20 / (20 + 70) ≈ 22.2%

Specificity = TN / (TN + FP) = 5000 / (5000 + 30) ≈ 99.4%

Balanced Accuracy = (Sensitivity + Specificity) / 2 = (22.2% + 99.4%) / 2 ≈ 60.8%

Balanced Accuracy does a much better job here because we care about identifying the positives. The score is lower than plain accuracy because balanced accuracy gives the same weight to both classes.
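The same numbers, checked with a few lines of Python (the counts are taken straight from the worked example above):

```python
# The worked binary example: TP = 20, FN = 70, FP = 30, TN = 5000.
tp, fn, fp, tn = 20, 70, 30, 5000

accuracy = (tp + tn) / (tp + fn + fp + tn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
balanced_accuracy = (sensitivity + specificity) / 2

print(f"accuracy          = {accuracy:.4f}")            # ~0.9805
print(f"balanced accuracy = {balanced_accuracy:.4f}")   # ~0.6081
```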

As in the binary case, Balanced Accuracy is also useful for multiclass classification. Here it is the average of the recall obtained on each class, i.e. the macro average of the per-class recall scores. For a balanced dataset, the score therefore tends to be the same as Accuracy.

Let’s use an example to illustrate how balanced accuracy is a better performance metric for imbalanced data. Assume we have a classifier over four classes P, Q, R, and S. Treating each class in turn as the positive class and the other three as negative, the per-class TP, FP, TN, and FN counts add up as shown below.

Let’s compute the Accuracy:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

TP = 10 + 545 + 11 + 3 = 569
FP = 175 + 104 + 39 + 50 = 368
TN = 695 + 248 + 626 + 874 = 2443
FN = 57 + 40 + 261 + 10 = 368

Accuracy = (569 + 2443) / (569 + 368 + 368 + 2443) ≈ 0.803

The score looks great, but there’s a problem: classes P and S are heavily underrepresented, and the model does a poor job predicting them.

Let’s consider Balanced Accuracy.

Balanced Accuracy = (Recall_P + Recall_Q + Recall_R + Recall_S) / 4.

The recall is calculated for each class present in the data (just as in binary classification), and then the arithmetic mean of these recalls is taken.
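Here is a quick sketch that finishes the computation, assuming the per-class TP and FN values above are listed in class order P, Q, R, S. Under that reading, balanced accuracy comes out around 0.34, far below the 0.803 accuracy, which exposes how poorly the minority classes are handled.

```python
# Balanced accuracy as the macro average of per-class recalls, assuming
# the TP and FN sums above are listed in class order P, Q, R, S.
per_class = {          # class: (TP, FN)
    "P": (10, 57),
    "Q": (545, 40),
    "R": (11, 261),
    "S": (3, 10),
}

recalls = {c: tp / (tp + fn) for c, (tp, fn) in per_class.items()}
balanced_accuracy = sum(recalls.values()) / len(recalls)

print(recalls)             # P ~0.15, Q ~0.93, R ~0.04, S ~0.23
print(balanced_accuracy)   # ~0.34, far below the 0.803 accuracy
```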

Balanced Accuracy vs F1 Score

You might be wondering what the difference is between Balanced Accuracy and the F1-score, since both are used for imbalanced classification. Let’s compare them.

F1 keeps the balance between precision and recall:

F1 = 2 * ([precision * recall] / [precision + recall])

Balanced Accuracy = (specificity + recall) / 2

The F1 score does not take true negatives into account at all. When working on an imbalanced dataset where the negatives also demand attention, Balanced Accuracy does better than F1. In cases where the positives are as important as the negatives, balanced accuracy is the better metric; F1 remains a great metric for imbalanced data when more attention is needed on the positives.
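A small sketch of that difference, with confusion-matrix counts made up for the demo: both settings below have identical TP, FP, and FN and differ only in the number of true negatives, so F1 stays the same while balanced accuracy moves.

```python
# F1 ignores true negatives; balanced accuracy rewards them.
from sklearn.metrics import balanced_accuracy_score, f1_score

def labels_from_counts(tp, fp, fn, tn):
    """Build y_true / y_pred lists realising a given confusion matrix."""
    y_true = [1] * tp + [0] * fp + [1] * fn + [0] * tn
    y_pred = [1] * tp + [1] * fp + [0] * fn + [0] * tn
    return y_true, y_pred

for tn in (10, 1000):      # few vs. many true negatives
    y_true, y_pred = labels_from_counts(tp=30, fp=20, fn=10, tn=tn)
    print(f"TN={tn:4d}  F1={f1_score(y_true, y_pred):.3f}  "
          f"balanced accuracy={balanced_accuracy_score(y_true, y_pred):.3f}")
```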

References

  1. Motunrayo Olugbenga. (August 10, 2023). Balanced Accuracy: When Should You Use It? [Blog post]. Retrieved from https://neptune.ai/blog/balanced-accuracy

  2. Classification: ROC Curve and AUC. (July 18, 2022). Google Developers. Retrieved from https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

  3. 林雨洲. (September 22, 2022). A deep dive into the F1 score (F-measure, F-score) [深挖一下F1 score (F-measure, F-score)]. Zhihu column [Blog post]. Retrieved from https://zhuanlan.zhihu.com/p/161703182