Hello,

When evaluating different learning methods for a classification problem with the (really useful!) caret package, I'm getting confusing results from the Kappa computation. The data set has about 20,000 rows and a few dozen columns, and the two classes are quite unbalanced: 4.1% in one category and 95.9% in the other. When I train a ctree model as:
model <- train(dat.dts, dat.dts.class, method = 'ctree', tuneLength = 8,
               trControl = trainControl(number = 5, workers = 1), metric = 'Kappa')

I get the following puzzling numbers:

  mincriterion  Accuracy  Kappa   Accuracy SD  Kappa SD
  0.01          0.961     0.0609  0.00151      0.0264
  0.15          0.962     0.049   0.00116      0.0248
  0.29          0.963     0.0405  0.00227      0.035
  0.43          0.964     0.0349  0.00257      0.0247
  0.57          0.964     0.0382  0.0022       0.0199
  0.71          0.964     0.0354  0.00255      0.0257
  0.85          0.964     0.036   0.00224      0.024
  0.99          0.965     0.0091  0.00173      0.0203

(mincriterion determines the likelihood of accepting a split in the tree.)

The Accuracy numbers look sorta reasonable, if not great; the model overfits and barely beats the base rate if it builds a complicated tree. But the Kappa numbers go in the opposite direction, and here's where I'm not sure what's going on. The examples in the vignette show Accuracy and Kappa being positively correlated. I thought Kappa was just (Accuracy - baserate) / (1 - baserate), but the reported Kappa is definitely not that.

Suggestions? Aside from looking for a better model, which would be good advice here, what metric would you recommend?

Thank you!

-Harlan
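P.S. In case a concrete example helps, here is a minimal sketch of how I would check Kappa by hand on a single hold-out. The 2x2 confusion matrix below is made up (not from my real data); it just has roughly the same 4.1% / 95.9% class split. It compares the standard Cohen's Kappa, which uses expected agreement computed from the table margins, with the simplified (Accuracy - baserate) / (1 - baserate) formula I had in mind:

  ## Made-up confusion matrix: predictions in rows, reference classes in columns
  tab <- matrix(c(19000, 180, 640, 180), nrow = 2,
                dimnames = list(pred = c("no", "yes"), obs = c("no", "yes")))

  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement (Accuracy)
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # expected agreement from the margins

  (p_o - p_e) / (1 - p_e)                        # Cohen's Kappa

  baserate <- max(colSums(tab)) / n              # majority-class base rate
  (p_o - baserate) / (1 - baserate)              # the simplified formula I had in mind

On toy numbers like these the two come out quite different, so I assume caret is reporting the full Cohen's definition; but that still doesn't explain to me why Kappa falls as Accuracy rises across the mincriterion values.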