Metrics Evaluation using `yardstick`

Motivation

Evaluating your machine learning algorithms is an important part of any project. The choice of metrics influences how the performance of machine learning algorithms is measured and compared. For regression models we usually use R-squared and MSE, while for classification models we can use precision, recall and accuracy. Evaluating a classifier is often much more difficult than evaluating a regression algorithm. Usually, after we have obtained predictions from our models, we can use confusionMatrix() from the caret package to evaluate classification models. In this article we'll discover how to evaluate machine learning performance for classification algorithms using the yardstick package.

Installation

library(tidyverse)
library(dplyr)

To install the yardstick package, you can run:

install.packages("yardstick")

# Development version:
devtools::install_github("tidymodels/yardstick")

How to use

After installation, we need to load the library before using it:

library(yardstick)

We will use the two_class_example dataset to demonstrate the evaluation. Take a look at the data:

head(two_class_example)
#>    truth      Class1       Class2 predicted
#> 1 Class2 0.003589243 0.9964107574    Class2
#> 2 Class1 0.678621054 0.3213789460    Class1
#> 3 Class2 0.110893522 0.8891064779    Class2
#> 4 Class1 0.735161703 0.2648382969    Class1
#> 5 Class2 0.016239960 0.9837600397    Class2
#> 6 Class1 0.999275071 0.0007249286    Class1

Check the structure of the data:

str(two_class_example)
#> 'data.frame':    500 obs. of  4 variables:
#>  $ truth    : Factor w/ 2 levels "Class1","Class2": 2 1 2 1 2 1 1 1 2 2 ...
#>  $ Class1   : num  0.00359 0.67862 0.11089 0.73516 0.01624 ...
#>  $ Class2   : num  0.996 0.321 0.889 0.265 0.984 ...
#>  $ predicted: Factor w/ 2 levels "Class1","Class2": 2 1 2 1 2 1 1 1 2 2 ...

These data are a test set from a model built on two classes (Class1 and Class2). There are columns for the true class (truth), the predicted class (predicted), and a column with the predicted probability for each class.

To evaluate the prediction, we can use accuracy, recall and precision (recall the course materials Classification in Machine Learning 1 & 2). Remember that when we do this with the caret package, we can use:

# data is the prediction value, and reference is the truth value
caret::confusionMatrix(data = two_class_example$predicted, reference = two_class_example$truth)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Class1 Class2
#>     Class1    227     50
#>     Class2     31    192
#>                                           
#>                Accuracy : 0.838           
#>                  95% CI : (0.8027, 0.8692)
#>     No Information Rate : 0.516           
#>     P-Value [Acc > NIR] : <2e-16          
#>                                           
#>                   Kappa : 0.6749          
#>                                           
#>  Mcnemar's Test P-Value : 0.0455          
#>                                           
#>             Sensitivity : 0.8798          
#>             Specificity : 0.7934          
#>          Pos Pred Value : 0.8195          
#>          Neg Pred Value : 0.8610          
#>              Prevalence : 0.5160          
#>          Detection Rate : 0.4540          
#>    Detection Prevalence : 0.5540          
#>       Balanced Accuracy : 0.8366          
#>                                           
#>        'Positive' Class : Class1          
#> 
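
To connect these numbers to the formulas, here is a minimal sketch that recomputes accuracy, precision, recall and specificity by hand from the confusion matrix counts above (positive class: Class1):

# counts taken from the confusion matrix above (positive class: Class1)
TP <- 227   # predicted Class1, truly Class1
FP <- 50    # predicted Class1, truly Class2
FN <- 31    # predicted Class2, truly Class1
TN <- 192   # predicted Class2, truly Class2

(TP + TN) / (TP + FP + FN + TN)   # accuracy:    0.838
TP / (TP + FP)                    # precision:   0.8195 (Pos Pred Value)
TP / (TP + FN)                    # recall:      0.8798 (Sensitivity)
TN / (TN + FP)                    # specificity: 0.7934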

In the yardstick package, we can customize which metrics we want to compute. If we don't customize them, by default metrics() gives us the accuracy and kap (Cohen's kappa) metrics:

# 2 class only
metrics(data = two_class_example, truth = truth, estimate = predicted)
#> # A tibble: 2 x 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy binary         0.838
#> 2 kap      binary         0.675
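
yardstick also provides its own confusion matrix helper, conf_mat(), and calling summary() on its result returns a tibble with the common classification metrics; a short sketch:

# yardstick's confusion matrix for the two-class example
cm <- conf_mat(two_class_example, truth = truth, estimate = predicted)
cm

# summary() turns the confusion matrix into a tibble of metrics
summary(cm)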

If we want to customize the metrics, we can create a set of metrics that we want to show and then apply it to our data:

# set metrics 
multi_met <- metric_set(accuracy, precision, recall, spec)

two_class_example %>% 
  multi_met(truth = truth, estimate = predicted)
#> # A tibble: 4 x 3
#>   .metric   .estimator .estimate
#>   <chr>     <chr>          <dbl>
#> 1 accuracy  binary         0.838
#> 2 precision binary         0.819
#> 3 recall    binary         0.880
#> 4 spec      binary         0.793
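
Since two_class_example also contains the class probability columns, we can add threshold-free metrics as well. A minimal sketch using roc_auc(), assuming the default event level is the first factor level (Class1):

# area under the ROC curve, based on the Class1 probability column
roc_auc(two_class_example, truth = truth, Class1)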

If we have multi-class data, this is very helpful because we can get accuracy, precision, recall and other metrics for our predictions. We'll use the hpc_cv data to demonstrate; the data has columns for the true class (obs), the class prediction (pred) and columns for each class probability (VF, F, M, and L). Additionally, a column for the resample indicator (Resample) is included.

head(hpc_cv)
#>   obs pred        VF          F           M            L Resample
#> 1  VF   VF 0.9136340 0.07786694 0.008479147 1.991225e-05   Fold01
#> 2  VF   VF 0.9380672 0.05710623 0.004816447 1.011557e-05   Fold01
#> 3  VF   VF 0.9473710 0.04946767 0.003156287 4.999849e-06   Fold01
#> 4  VF   VF 0.9289077 0.06528949 0.005787179 1.564496e-05   Fold01
#> 5  VF   VF 0.9418764 0.05430830 0.003808013 7.294581e-06   Fold01
#> 6  VF   VF 0.9510978 0.04618223 0.002716177 3.841455e-06   Fold01

Let's check the factor levels of obs and pred:

levels(hpc_cv$obs)
#> [1] "VF" "F"  "M"  "L"
levels(hpc_cv$pred)
#> [1] "VF" "F"  "M"  "L"

To get the default metrics (accuracy and kappa), we can use:

# multi-class
metrics(data = hpc_cv, truth = obs, estimate = pred)
#> # A tibble: 2 x 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <dbl>
#> 1 accuracy multiclass     0.709
#> 2 kap      multiclass     0.508
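
hpc_cv also carries the Resample fold indicator. Assuming yardstick's group-aware behaviour with dplyr, the same call can be run per fold; a short sketch:

# per-fold accuracy and kappa, grouped by the Resample column
hpc_cv %>% 
  group_by(Resample) %>% 
  metrics(truth = obs, estimate = pred)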

We can also use the multi_met set that we created above, defining the truth column and the prediction column:

hpc_cv %>% 
  multi_met(truth = obs, estimate = pred)
#> # A tibble: 4 x 3
#>   .metric   .estimator .estimate
#>   <chr>     <chr>          <dbl>
#> 1 accuracy  multiclass     0.709
#> 2 precision macro          0.631
#> 3 recall    macro          0.560
#> 4 spec      macro          0.879
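
Note that the .estimator column now says macro: each metric is computed one class at a time and the per-class values are then averaged. If a different averaging scheme is preferred, the estimator argument can be set explicitly; a hedged sketch, assuming the current yardstick argument names:

# macro: unweighted average of the per-class precision values (the default)
precision(hpc_cv, truth = obs, estimate = pred, estimator = "macro")

# macro_weighted: average weighted by the frequency of each class
precision(hpc_cv, truth = obs, estimate = pred, estimator = "macro_weighted")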

Annotation

  • More metrics you can access here
  • How the estimators are calculated: here
  • How to create your own custom metrics: here