Motivation
Evaluating your machine learning algorithms is an essential part of any project: the choice of metrics influences how the performance of machine learning algorithms is measured and compared. For Regression models we usually use R-squared and MSE, but for Classification models we can use precision, recall and accuracy. Evaluating a classifier is often much more difficult than evaluating a regression algorithm. Usually, after we have obtained predictions from our models, we can use confusionMatrix() from the caret package to evaluate classification models. In this article we'll discover how to evaluate Machine Learning performance for Classification Algorithms using the yardstick package.
Installation
library(tidyverse)
library(dplyr)
To install the yardstick package, you can run:
install.packages("yardstick")
# Development version:
devtools::install_github("tidymodels/yardstick")
How to use
After installation, we load the library before using it:
library(yardstick)
We will use the two_class_example data to demonstrate the evaluation. Take a look at the data:
head(two_class_example)
#> truth Class1 Class2 predicted
#> 1 Class2 0.003589243 0.9964107574 Class2
#> 2 Class1 0.678621054 0.3213789460 Class1
#> 3 Class2 0.110893522 0.8891064779 Class2
#> 4 Class1 0.735161703 0.2648382969 Class1
#> 5 Class2 0.016239960 0.9837600397 Class2
#> 6 Class1 0.999275071 0.0007249286 Class1
Check the structure of the data:
str(two_class_example)
#> 'data.frame': 500 obs. of 4 variables:
#> $ truth : Factor w/ 2 levels "Class1","Class2": 2 1 2 1 2 1 1 1 2 2 ...
#> $ Class1 : num 0.00359 0.67862 0.11089 0.73516 0.01624 ...
#> $ Class2 : num 0.996 0.321 0.889 0.265 0.984 ...
#> $ predicted: Factor w/ 2 levels "Class1","Class2": 2 1 2 1 2 1 1 1 2 2 ...
These data are a test set from a model built on two classes (Class1 and Class2). There are columns for the true class (truth), the predicted class (predicted), and a column with the predicted probability of each class.
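Presumably the predicted column is simply the class with the larger predicted probability. As a quick, optional sanity check (assuming the predictions were made with a 0.5 cutoff on the Class1 probability, which is not stated in the data's documentation), we could cross-tabulate:
# does the hard prediction agree with a 0.5 cutoff on the Class1 probability?
with(two_class_example,
     table(predicted, cutoff_class = ifelse(Class1 > 0.5, "Class1", "Class2")))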
To evaluate the prediction, we can use Accuracy, Recall and Precision (recall the course materials Classification in Machine Learning 1 & 2). Remember that when we do this with the caret package, we can use:
# data is the prediction value, and reference is the truth value
caret::confusionMatrix(data = two_class_example$predicted, reference = two_class_example$truth)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction Class1 Class2
#> Class1 227 50
#> Class2 31 192
#>
#> Accuracy : 0.838
#> 95% CI : (0.8027, 0.8692)
#> No Information Rate : 0.516
#> P-Value [Acc > NIR] : <2e-16
#>
#> Kappa : 0.6749
#>
#> Mcnemar's Test P-Value : 0.0455
#>
#> Sensitivity : 0.8798
#> Specificity : 0.7934
#> Pos Pred Value : 0.8195
#> Neg Pred Value : 0.8610
#> Prevalence : 0.5160
#> Detection Rate : 0.4540
#> Detection Prevalence : 0.5540
#> Balanced Accuracy : 0.8366
#>
#> 'Positive' Class : Class1
#>
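To connect these numbers with the table, here is how the headline metrics fall out of the four counts above, with Class1 as the positive class (plain arithmetic, nothing package-specific):
# counts taken from the confusion matrix above
TP <- 227  # predicted Class1, truly Class1
FP <- 50   # predicted Class1, truly Class2
FN <- 31   # predicted Class2, truly Class1
TN <- 192  # predicted Class2, truly Class2

(TP + TN) / (TP + FP + FN + TN)  # accuracy    = 0.838
TP / (TP + FP)                   # precision   = 0.8195 (Pos Pred Value)
TP / (TP + FN)                   # recall      = 0.8798 (Sensitivity)
TN / (TN + FP)                   # specificity = 0.7934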
In the yardstick package, we can customize which metrics we want to compute. By default, if we don't customize the metrics, it will give us the accuracy and kap (Cohen's kappa) metrics:
# 2 class only
metrics(data = two_class_example, truth = truth, estimate = predicted)
#> # A tibble: 2 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.838
#> 2 kap binary 0.675
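If we also want the confusion matrix table itself from yardstick, there is conf_mat(); a minimal sketch (its counts should match the caret table shown earlier):
# cross-tabulate the predictions against the truth with yardstick
conf_mat(two_class_example, truth = truth, estimate = predicted)
# expect 227 and 192 on the diagonal, 50 and 31 off the diagonal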
If we want to customize the metrics, we can create the set of metrics we want with metric_set() and then apply it to our data:
# set metrics
multi_met <- metric_set(accuracy, precision, recall, spec)
two_class_example %>%
multi_met(truth = truth, estimate = predicted)
#> # A tibble: 4 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy binary 0.838
#> 2 precision binary 0.819
#> 3 recall binary 0.880
#> 4 spec binary 0.793
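Every metric in the set is also available as a standalone function, and probability-based metrics such as the ROC AUC take a class probability column instead of the hard prediction. A small sketch (output not shown):
# single metrics called directly; these should reproduce the 0.819 and 0.880 above
precision(two_class_example, truth = truth, estimate = predicted)
recall(two_class_example, truth = truth, estimate = predicted)

# probability-based metric: pass the probability column of the first level (Class1)
roc_auc(two_class_example, truth = truth, Class1)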
This is especially helpful when we have multi-class data, because we can get accuracy, precision, recall and other metrics from our predictions in a single call. We'll use the hpc_cv data to demonstrate: the data has columns for the true class (obs), the class prediction (pred) and columns for each class probability (columns VF, F, M, and L). Additionally, a column for the resample indicator is included.
head(hpc_cv)
#> obs pred VF F M L Resample
#> 1 VF VF 0.9136340 0.07786694 0.008479147 1.991225e-05 Fold01
#> 2 VF VF 0.9380672 0.05710623 0.004816447 1.011557e-05 Fold01
#> 3 VF VF 0.9473710 0.04946767 0.003156287 4.999849e-06 Fold01
#> 4 VF VF 0.9289077 0.06528949 0.005787179 1.564496e-05 Fold01
#> 5 VF VF 0.9418764 0.05430830 0.003808013 7.294581e-06 Fold01
#> 6 VF VF 0.9510978 0.04618223 0.002716177 3.841455e-06 Fold01
Let's check the factor levels of obs and pred:
levels(hpc_cv$obs)
#> [1] "VF" "F" "M" "L"
levels(hpc_cv$pred)
#> [1] "VF" "F" "M" "L"
To get the accuracy and kappa, we can use:
# multi-class
metrics(data = hpc_cv, truth = obs, estimate = pred)
#> # A tibble: 2 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy multiclass 0.709
#> 2 kap multiclass 0.508
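Note that the .estimator column now says multiclass. For metrics such as precision and recall, yardstick averages the per-class results, and the standalone metric functions expose an estimator argument if we want something other than the default macro average. A sketch:
# macro averaging is the default for multi-class precision
precision(hpc_cv, truth = obs, estimate = pred)

# weight each class by its frequency instead
precision(hpc_cv, truth = obs, estimate = pred, estimator = "macro_weighted")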
We can also use the multi_met metric set that we created above, defining the truth column and the prediction column:
hpc_cv %>%
multi_met(truth = obs, estimate = pred)
#> # A tibble: 4 x 3
#> .metric .estimator .estimate
#> <chr> <chr> <dbl>
#> 1 accuracy multiclass 0.709
#> 2 precision macro 0.631
#> 3 recall macro 0.560
#> 4 spec macro 0.879
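Because the Resample column identifies the cross-validation fold, we can also get per-fold metrics by grouping the data first; yardstick metric functions and metric sets respect dplyr groups. A sketch (output not shown):
# one set of metrics per resample
hpc_cv %>%
  group_by(Resample) %>%
  multi_met(truth = obs, estimate = pred)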