Hello,

When evaluating different learning methods for a classification problem with the (really useful!) caret package, I'm getting confusing results from the Kappa computation. The data set has about 20,000 rows and a few dozen columns, and the two classes are quite unbalanced: 4.1% in one category and 95.9% in the other. When I train a ctree model as:
model <- train(dat.dts, dat.dts.class, method = 'ctree', tuneLength = 8,
               trControl = trainControl(number = 5, workers = 1), metric = 'Kappa')

I get the following puzzling numbers:

  mincriterion  Accuracy  Kappa   Accuracy SD  Kappa SD
  0.01          0.961     0.0609  0.00151      0.0264
  0.15          0.962     0.049   0.00116      0.0248
  0.29          0.963     0.0405  0.00227      0.035
  0.43          0.964     0.0349  0.00257      0.0247
  0.57          0.964     0.0382  0.0022       0.0199
  0.71          0.964     0.0354  0.00255      0.0257
  0.85          0.964     0.036   0.00224      0.024
  0.99          0.965     0.0091  0.00173      0.0203

(mincriterion determines the likelihood of accepting a split in the tree.)

The Accuracy numbers look sorta reasonable, if not great; the model overfits and barely beats the base rate if it builds a complicated tree. But the Kappa numbers go in the opposite direction, and here's where I'm not sure what's going on. The examples in the vignette show Accuracy and Kappa being positively correlated. I thought Kappa was just (Accuracy - baserate) / (1 - baserate), but the reported Kappa is definitely not that.

Suggestions? Aside from looking for a better model, which would be good advice here, what metric would you recommend?

Thank you!

-Harlan
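P.S. In case a concrete example helps, here is a minimal sketch of how I would check Kappa by hand on a single hold-out. The 2x2 confusion matrix below is made up (not from my real data); it just has roughly the same 4.1% / 95.9% class split. It compares the standard Cohen's Kappa, which uses expected agreement computed from the table margins, with the simplified (Accuracy - baserate) / (1 - baserate) formula I had in mind:

  ## Made-up confusion matrix: predictions in rows, reference classes in columns
  tab <- matrix(c(19000, 180, 640, 180), nrow = 2,
                dimnames = list(pred = c("no", "yes"), obs = c("no", "yes")))

  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement (Accuracy)
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # expected agreement from the margins

  (p_o - p_e) / (1 - p_e)                        # Cohen's Kappa

  baserate <- max(colSums(tab)) / n              # majority-class base rate
  (p_o - baserate) / (1 - baserate)              # the simplified formula I had in mind

On toy numbers like these the two come out quite different, so I assume caret is reporting the full Cohen's definition; but that still doesn't explain to me why Kappa falls as Accuracy rises across the mincriterion values.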