Hello Dimitris, hello Gabor,

absolutely incredible! I can't tell you how happy I am about your code, which worked out of the box and saved me from days of boring, mind-numbing Excel handwork. Thank you a thousand times!
Just for other newbies who might face a similar problem, I'd like to add a few closing remarks on the way I calculate things now. The read.table command is not necessary in my case, because I already have ready-to-use data.frames that I created with the "reshape" package. So I started with the line:

pairs <- data.frame(pred = factor(unlist(input.frame[2:21])),
                    ref  = factor(input.frame[, 22]))
# Explanation for other newbies: this creates a data.frame named "pairs" with two
# columns. The column pred(iction) holds the values from columns 2-21 of the
# original input data.frame "input.frame", i.e. all the observations the medical
# doctors made in my specific case. The column ref(erence) holds the observations
# of the gold standard, which are assumed to be the truth. data.frame() recycles
# the 42 gold-standard values once for each of the 20 raters, so the columns line up.

pred <- pairs$pred   # saves column "pred" of the "pairs" data.frame as a vector named "pred"
lab  <- pairs$ref    # saves column "ref" of the "pairs" data.frame as a vector named "lab"

library(caret)       # loads the "caret" package

confusionMatrix(pred, lab, positive = "1")
# creates a confusion matrix with sensitivity, specificity, accuracy, kappa and
# much more; please see the documentation (?confusionMatrix) for details. Note
# that the second argument is the vector "lab" defined above, and that "positive"
# names the factor level to be treated as the positive class.

Example output for the data.frame I sent with my original question:

Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 656 122
         1  24  38

               Accuracy : 0.8262
                 95% CI : (0.7988, 0.8512)
    No Information Rate : 0.8095
    P-Value [Acc > NIR] : 0.117
                  Kappa : 0.264

            Sensitivity : 0.2375
            Specificity : 0.9647
         Pos Pred Value : 0.6129
         Neg Pred Value : 0.8432

This works not only for input data with two result classes like true/false, but for data with multiple categories/result classes as well! See this example output:

Confusion Matrix and Statistics

          Reference
Prediction   0   1  10  11 100 101 110
       0   349  31  60  40  66   1  15
       1    25  80   1  22   3  17   3
       10    0   1  24   8   3   1  10
       11    1   6   5   3   1   0   2
       100   3   1   6   7  24   0   5
       101   0   0   0   0   0   1   0
       110   2   1   4   0   3   0   5

Overall Statistics
               Accuracy : 0.5786
                 95% CI : (0.5444, 0.6122)
    No Information Rate : 0.4524
    P-Value [Acc > NIR] : 1.506e-13
                  Kappa : 0.3571

Statistics by Class:
            Sensitivity Specificity Pos Pred Value Neg Pred Value
Class: 0         0.9184      0.5370         0.6210         0.8885
Class: 1         0.6667      0.9014         0.5298         0.9419
Class: 10        0.2400      0.9689         0.5106         0.9042
Class: 11        0.0375      0.9803         0.1667         0.9063
Class: 100       0.2400      0.9703         0.5217         0.9043
Class: 101       0.0500      1.0000         1.0000         0.9774
Class: 110       0.1250      0.9875         0.3333         0.9576

This is much more than I had ever expected! (Thank you to Max Kuhn, the creator of the "caret" package!)

The code from Dimitris (see below) perfectly reproduces the calculation I did in Excel by hand. Wow! This is very instructive for me. I had never thought about real programming, because I always believed it was far above my head. But now that I try to understand the code that solves "my problem", I'll rethink this. It is still "magic" to me, but magic one can learn ;-). I definitely would like to become a so(u)rcerer's apprentice :-).

So again, thank you for your quick and efficient help! Great software, great community. I am really happy that I decided "against all odds" and against the advice of colleagues not to use SPSS or SAS, but to learn R. I never thought I might succeed in evaluating the results of our small study on my own, in just a few weeks, using R.

Cheers, Felix
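P.S. for other newbies like me: the numbers caret reports can easily be checked by hand with the standard definitions. The following lines are just my own cross-check of the 2x2 example output above (nothing caret-specific; the four cell counts are copied from the printed confusion matrix, with "1" as the positive class):

TN <- 656   # predicted 0, gold standard 0
FN <- 122   # predicted 0, gold standard 1
FP <-  24   # predicted 1, gold standard 0
TP <-  38   # predicted 1, gold standard 1

TP / (TP + FN)                     # sensitivity     -> 0.2375
TN / (TN + FP)                     # specificity     -> 0.9647
(TP + TN) / (TP + TN + FP + FN)    # accuracy        -> 0.8262
TP / (TP + FP)                     # pos pred value  -> 0.6129
TN / (TN + FN)                     # neg pred value  -> 0.8432

These match the Sensitivity, Specificity, Accuracy, Pos Pred Value and Neg Pred Value lines that confusionMatrix() printed.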
Dimitris Rizopoulos wrote:
> try something like this:
>
> dat <- read.table(textConnection("video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
> 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
> 9 9 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 0 1 0
> 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 12 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 13 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 14 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 15 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 16 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 17 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 18 18 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> 19 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 20 20 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 21 21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
> 22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 23 23 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0
> 24 24 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1
> 25 25 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
> 26 26 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> 27 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 28 28 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 29 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 31 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 32 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 33 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 34 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 35 35 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 36 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 37 37 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1
> 38 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 39 39 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 1 1
> 40 40 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1
> 41 41 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
> 42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"),
> header = TRUE)
> closeAllConnections()
>
> goldstand <- dat$X21                 # column 21 is the gold standard
> prev <- sum(goldstand)               # number of positive gold-standard cases
> cprev <- sum(!goldstand)             # number of negative gold-standard cases
> n <- prev + cprev
> lapply(dat[-1], function(x) {
>     tab <- table(x, goldstand)       # one rater cross-tabulated against the gold standard
>     cS <- colSums(tab)
>     if (nrow(tab) > 1 && ncol(tab) > 1) {
>         out <- c(sp = tab[1, 1], sn = tab[2, 2]) / cS
>         c(out, ac = (out[1] * cprev + out[2] * prev) / n)
>     }
> })
>
> I hope it helps.
>
> Best,
> Dimitris
>
> Quoting drflxms <[EMAIL PROTECTED]>:
>
>> Dear R-colleagues,
>>
>> this is a question from an R-newbie medical doctor:
>>
>> I am evaluating data on inter-observer reliability in endoscopy. 20
>> medical doctors judged 42 videos, filling out a multiple-choice survey
>> for each video. The overall data is organized in a classical way:
>> observations (items from the multiple-choice survey) as columns, each
>> case (identified by the two columns "number of medical doctor" and
>> "number of video") in a row. In addition there is a medical doctor
>> number 21 who is assumed to be the gold standard.
>>
>> As a measure of inter-observer agreement I calculated kappa according to
>> Fleiss and simple agreement in percent, using the routines
>> "kappam.fleiss" and "agree" from the irr package. Everything worked fine
>> so far.
>>
>> Now I'd like to calculate specificity, sensitivity and accuracy for each
>> item (compared to the gold standard), as these are well-known and
>> easy-to-understand quantities for medical doctors.
>>
>> Unfortunately I haven't found a feasible way to do this in R so far. All
>> the solutions I found describe the calculation of specificity, sensitivity
>> and accuracy from a contingency table / confusion matrix only. For me it
>> is very difficult to create such contingency tables / confusion matrices
>> from the raw data I have.
>>
>> So I started to do it in Excel by hand - a lot of work! If I keep on
>> doing this, I'll miss the deadline. So maybe someone can help me out:
>>
>> It would be very convenient if there were a way to calculate specificity,
>> sensitivity and accuracy from the very same data.frames I created for
>> the calculation of kappa and agreement. In these data.frames, which were
>> generated from the overall data table described above using the
>> "reshape" package, we have the judging medical doctors in the columns and
>> the videos in the rows. The cells contain the coded answer options from
>> the multiple-choice survey. Please see a simple example with answer
>> options 0/1 (copied from the R console) below:
>>
>> video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
>> 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
>> 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
>> 6 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>> 7 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>> 9 9 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 0 0 0 1 0
>> 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 11 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 12 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 13 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 14 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 15 15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 16 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 17 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 18 18 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
>> 19 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 20 20 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 21 21 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
>> 22 22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 23 23 0 1 0 0 1 0 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0
>> 24 24 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 0 0 1
>> 25 25 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
>> 26 26 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
>> 27 27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 28 28 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 29 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 31 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 32 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 33 33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 34 34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 35 35 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 36 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 37 37 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1 1
>> 38 38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 39 39 0 1 0 0 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 1 1
>> 40 40 1 1 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 1 0 0 1
>> 41 41 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
>> 42 42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> What I did in Excel is: I created the very same tables using
>> pivot charts.
>> Then I compared columns 1-20 to column 21 (the gold standard), summing up
>> the count of values identical to column 21. I repeated this for each
>> answer option. From these results one can easily calculate specificity,
>> sensitivity and accuracy.
>>
>> How can I do this, or something similar leading to the same results, in R?
>> I'd appreciate any kind of help very much!
>>
>> Greetings from Munich,
>> Felix
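One more note for the archives, now that I understand Dimitris' code a bit better: his lapply() call above returns specificity (sp), sensitivity (sn) and accuracy (ac) for every rater column in one go. If you would rather get the same three numbers per rater out of "caret", something along the following lines should work. This is only an untested sketch of mine, assuming the data.frame "dat" created by the read.table() call above, where read.table names the rater columns X1 ... X20 and the gold standard X21:

library(caret)
gold <- factor(dat$X21, levels = c(0, 1))        # gold standard (doctor 21) as a factor

per.rater <- t(sapply(dat[paste("X", 1:20, sep = "")], function(x) {
    cm <- confusionMatrix(factor(x, levels = c(0, 1)), gold, positive = "1")
    c(cm$byClass[c("Sensitivity", "Specificity")], cm$overall["Accuracy"])
}))
round(per.rater, 4)                              # one row of statistics per rater

Fixing both vectors to the levels c(0, 1) is meant to avoid errors for raters who never used one of the two answer options.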