That is the most common formula, but not the only one. See Kvålseth, T. (1985). Cautionary note about $R^2$. *American Statistician*, *39*(4), 279-285.
Traditionally, the symbol 'R' is used for the Pearson correlation coefficient, and one way to calculate R^2 is... R^2.

Max


On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr <deter...@umn.edu> wrote:

> I was under the impression that in PLS analysis, R2 was calculated by
> 1 - (Residual sum of squares) / (Sum of squares). Is this still what you are
> referring to? I am aware of the linear R2 which is how well two variables
> are correlated but the prior equation seems different to me. Could you
> explain if this is the same concept?
>
> Charles
>
>
> On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn <mxk...@gmail.com> wrote:
>
>> > Is there some literature that you make that statement?
>>
>> No, but there isn't literature on changing a lightbulb with a duck either.
>>
>> > Are these papers incorrect in using these statistics?
>>
>> Definitely, if they convert 3+ categories to integers (but there are
>> specialized R^2 metrics for binary classification models). Otherwise, they
>> are just using an ill-suited "score".
>>
>> How would you explain such an R^2 value to someone? R^2 is
>> a function of correlation between the two random variables. For two
>> classes, one of them is binary. What does it mean?
>>
>> Historically, models rooted in computer science (eg neural networks) used
>> RMSE or SSE to fit models with binary outcomes and that *can* work
>> well.
>>
>> However, I don't think that communicating R^2 is effective. Other metrics
>> (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed to
>> measure the ability of a model to classify and work well. With 3+
>> categories, I tend to use Kappa.
>>
>> Max
>>
>>
>> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr <deter...@umn.edu> wrote:
>>
>>> Thank you for your response Max. Is there some literature that you make
>>> that statement? I am confused as I have seen many publications that
>>> contain R^2 and Q^2 following PLSDA analysis. The analysis usually is to
>>> discriminate groups (ie. classification). Are these papers incorrect in
>>> using these statistics?
>>>
>>> Regards,
>>> Charles
>>>
>>>
>>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>>
>>>> Charles,
>>>>
>>>> You should not be treating the classes as numeric (is virginica really
>>>> three times setosa?). Q^2 and/or R^2 are not appropriate for
>>>> classification.
>>>>
>>>> Max
>>>>
>>>>
>>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>
>>>>> I have discovered one of my errors. The timematrix was unnecessary and an
>>>>> unfortunate habit I brought from another package. The following provides
>>>>> the same R2 values as it should, however, I still don't know how to
>>>>> retrieve Q2 values. Any insight would again be appreciated:
>>>>>
>>>>> library(caret)
>>>>> library(pls)
>>>>>
>>>>> data(iris)
>>>>>
>>>>> #needed to convert to numeric in order to do regression
>>>>> #I don't fully understand this but if I left as a factor I would get an
>>>>> #error following the summary function
>>>>> iris$Species=as.numeric(iris$Species)
>>>>> inTrain1=createDataPartition(y=iris$Species,
>>>>>                              p=.75,
>>>>>                              list=FALSE)
>>>>>
>>>>> training1=iris[inTrain1,]
>>>>> testing1=iris[-inTrain1,]
>>>>>
>>>>> ctrl1=trainControl(method="cv",
>>>>>                    number=10)
>>>>>
>>>>> plsFit2=train(Species~.,
>>>>>               data=training1,
>>>>>               method="pls",
>>>>>               trControl=ctrl1,
>>>>>               metric="Rsquared",
>>>>>               preProc=c("scale"))
>>>>>
>>>>> data(iris)
>>>>> training1=iris[inTrain1,]
>>>>> datvars=training1[,1:4]
>>>>> dat.sc=scale(datvars)
>>>>>
>>>>> pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
>>>>>              ncomp=3, method="oscorespls", data=training1)
>>>>>
>>>>> x=crossval(pls.dat, segments=10)
>>>>>
>>>>> summary(x)
>>>>> summary(plsFit2)
>>>>>
>>>>> Regards,
>>>>> Charles
>>>>>
>>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>>
>>>>> > Greetings,
>>>>> >
>>>>> > I have been exploring the use of the caret package to conduct some plsda
>>>>> > modeling. Previously, I have come across methods that result in a R2 and
>>>>> > Q2 for the model. Using the 'iris' data set, I wanted to see if I could
>>>>> > accomplish this with the caret package. I use the following code:
>>>>> >
>>>>> > library(caret)
>>>>> > data(iris)
>>>>> >
>>>>> > #needed to convert to numeric in order to do regression
>>>>> > #I don't fully understand this but if I left as a factor I would get an
>>>>> > #error following the summary function
>>>>> > iris$Species=as.numeric(iris$Species)
>>>>> > inTrain1=createDataPartition(y=iris$Species,
>>>>> >                              p=.75,
>>>>> >                              list=FALSE)
>>>>> >
>>>>> > training1=iris[inTrain1,]
>>>>> > testing1=iris[-inTrain1,]
>>>>> >
>>>>> > ctrl1=trainControl(method="cv",
>>>>> >                    number=10)
>>>>> >
>>>>> > plsFit2=train(Species~.,
>>>>> >               data=training1,
>>>>> >               method="pls",
>>>>> >               trControl=ctrl1,
>>>>> >               metric="Rsquared",
>>>>> >               preProc=c("scale"))
>>>>> >
>>>>> > data(iris)
>>>>> > training1=iris[inTrain1,]
>>>>> > datvars=training1[,1:4]
>>>>> > dat.sc=scale(datvars)
>>>>> >
>>>>> > n=nrow(dat.sc)
>>>>> > dat.indices=seq(1,n)
>>>>> >
>>>>> > timematrix=with(training1,
>>>>> >                 classvec2classmat(Species[dat.indices]))
>>>>> >
>>>>> > pls.dat=plsr(timematrix ~ dat.sc,
>>>>> >              ncomp=3, method="oscorespls", data=training1)
>>>>> >
>>>>> > x=crossval(pls.dat, segments=10)
>>>>> >
>>>>> > summary(x)
>>>>> > summary(plsFit2)
>>>>> >
>>>>> > I see two different R2 values and I cannot figure out how to get the Q2
>>>>> > value. Any insight as to what my errors may be would be appreciated.
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > --
>>>>> > Charles
>>>>>
>>>>> --
>>>>> Charles Determan
>>>>> Integrated Biosciences PhD Student
>>>>> University of Minnesota

[[alternative HTML version deleted]]
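To make the "not the only formula" point concrete, here is a minimal sketch in base R (toy numbers invented for illustration, not taken from the thread) showing that 1 - RSS/TSS and the squared Pearson correlation can disagree badly on the same predictions. caret's "Rsquared" is, as far as I know, the squared-correlation version by default, while the pls package summaries are based on sums of squares, which may be one reason the two summaries in the thread differ; treat that as an assumption to check against the package documentation.

```r
# Toy data (hypothetical): predictions track the outcome perfectly
# but are shifted upwards by a constant offset of 2.
obs  <- c(1, 2, 3, 4, 5)
pred <- obs + 2

# Definition 1: 1 - RSS / TSS, with TSS taken about the mean of obs.
rss   <- sum((obs - pred)^2)
tss   <- sum((obs - mean(obs))^2)
r2_ss <- 1 - rss / tss

# Definition 2: squared Pearson correlation of observed and predicted.
r2_cor <- cor(obs, pred)^2

print(c(sum_of_squares = r2_ss, squared_correlation = r2_cor))
```

The correlation version ignores the constant bias and reports a perfect fit, while the sum-of-squares version goes negative because the predictions are worse than simply predicting the mean.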
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
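As a small illustration of the classification metrics recommended in the thread, the sketch below computes accuracy and Cohen's Kappa from a confusion table in base R. The observed and predicted labels are made up for the example, and the variable names (`kappa_hat`, `expected`) are my own; this is not caret's implementation, just the textbook formula.

```r
# Hypothetical observed and predicted labels for a three-class problem.
classes <- c("setosa", "versicolor", "virginica")
obs  <- factor(c("setosa", "setosa", "versicolor",
                 "versicolor", "virginica", "virginica"),
               levels = classes)
pred <- factor(c("setosa", "setosa", "versicolor",
                 "virginica", "virginica", "virginica"),
               levels = classes)

# Confusion table: rows are predictions, columns are observations.
tab <- table(pred, obs)
n   <- sum(tab)

# Observed agreement (accuracy) and agreement expected by chance,
# the latter computed from the row and column margins.
accuracy <- sum(diag(tab)) / n
expected <- sum(rowSums(tab) * colSums(tab)) / n^2

# Cohen's Kappa: agreement beyond chance, rescaled so that 1 is perfect.
kappa_hat <- (accuracy - expected) / (1 - expected)

print(c(accuracy = accuracy, kappa = kappa_hat))
```

With caret, the analogous route would presumably be to leave Species as a factor and request metric="Kappa" in train() rather than converting the classes to integers and asking for "Rsquared".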