results=c(0.31,0.36,0.33) names(results)=c("y=good","y=better","y=best") results
names(results)[results==max(results)] which(names(results)==(names(results)[results==max(results)])) More generally, however, avoid protected operators in your variable names (like the equality sign)! Rather choose something like y.good, y.better, y.best, or whatever you like as variable names. HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Ursprüngliche Nachricht----- Von: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Im Auftrag von Donald Catanzaro, PhD Gesendet: Thursday, August 13, 2009 2:04 AM Cc: r-help@r-project.org Betreff: Re: [R] Proper / Improper scoring Rules Hi All, I have done more background research (including Frank's book) so I feel that my second question is answered. However, as a novice R user I still have the following problem, accessing the output of predict. So simplifying my question, using the example provided in the Design package (http://lib.stat.cmu.edu/S/Harrell/help/Design/html/predict.lrm.html) I might do something like: > # See help for predict.Design for several binary logistic # regression > examples > > # Examples of predictions from ordinal models > set.seed(1) > y <- factor(sample(1:3, 400, TRUE), 1:3, c('good','better','best')) > x1 <- runif(400) > x2 <- runif(400) > f <- lrm(y ~ rcs(x1,4)*x2) > predict(f, type="fitted.ind")[1:10,] #gets Prob(better) and all others y=good y=better y=best 1 0.3124704 0.3631544 0.3243752 2 0.3676075 0.3594685 0.2729240 3 0.2198274 0.3437416 0.4364309 4 0.3063463 0.3629658 0.3306879 5 0.5171323 0.3136088 0.1692590 6 0.3050115 0.3629071 0.3320813 7 0.3532452 0.3612928 0.2854620 8 0.2933928 0.3621220 0.3444852 9 0.3068595 0.3629867 0.3301538 10 0.6214710 0.2612164 0.1173126 > d <- data.frame(x1=.5,x2=.5) > predict(f, d, type="fitted") # Prob(Y>=j) for new observation y>=better y>=best 0.6906593 0.3275849 > predict(f, d, type="fitted.ind") # Prob(Y=j) y=good y=better y=best 0.3093407 0.3630744 0.3275849 So now if I wanted to do > out <- predict(f, d, type="fitted.ind")> > out y=good y=better y=best 0.3093407 0.3630744 0.3275849 > out$"y=better" Error in out$"y=better" : $ operator is invalid for atomic vectors > y=better is the max, so how do I create something that says that ? (which is not exactly what I want to do but close enough to help me figure out what R code I need to accomplish the task) I can push the predictions out to a vector: out.vector <- as.vector(predict(f, d, type="fitted.ind")) > out.vector [1] 0.3093407 0.3630744 0.3275849 which gets me part of the way because I can find out max(out.vector) but I still need to know what column the max is in. I think the problem is that I don't know how to manipulate data frames and vectors in R and need some guidance -Don Don Catanzaro, PhD Landscape Ecologist dgcatanz...@gmail.com 16144 Sigmond Lane Lowell, AR 72745 479-751-3616 Frank E Harrell Jr wrote: > Donald Catanzaro, PhD wrote: >> Hi All, >> >> I am working on some ordinal logistic regresssions using LRM in the >> Design package. My response variable has three categories (1,2,3) >> and after using the creating my model and using a call to predict >> some values and I wanted to use a simple .5 cut-off to classify my >> probabilities into the categories. >> >> I had two questions: >> >> a) first, I am having trouble directly accessing the probabilities >> which may have more to do with my lack of experience with R >> >> For instance, my calls >> >> >ologit.three.NoPerFor <- lrm(Threshold.Three ~ TECI , data=CLD, >> na.action=na.pass) >> >CLD$Threshold.Predict.Three.NoPerFor<- >> predict(ologit.three.NoPerFor, newdata=CLD, type="fitted.ind") >> >> >CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Thre >> e.NoPerFor.Threshold.Three=1 >> > .5] <- 1 >> Error: unexpected '=' in >> "CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Three.NoPer For.Threshold.Three=" >> >> > >> > >> >> produce an error message and it seems as R does not like the equal >> sign at all. So how does one access the probabilities so I can >> classify them into the categories of 1,2,3 so I can look at >> performance of my model ? > > use == to check equality > >> >> b) which leads me to my next question. I thought that simply >> calculating the percent correct off of my predictions would be >> sufficient to look at performance but since my question is very much >> in line with this thread >> http://tolstoy.newcastle.edu.au/R/e4/help/08/04/8987.html I am not so >> sure anymore. I am afraid I did not understand Frank Harrell's last >> suggestion regarding improper scoring rule - can someone point me to >> some internet resources that I might be able to review to see why my >> approach would not be valid ? > > Percent correct will give you misleading answers and is game-able. It > is also ultra-high-variance. Though not a truly proper scoring rule, > Somers' Dxy rank correlation (generalization of ROC area) is helpful. > Better still: use the log-likelihood and related quantities (deviance, > adequacy index as described in my book). > > Frank > >> >> > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.