Re: [R] caret pls model statistics

Max Kuhn Sun, 03 Mar 2013 14:40:06 -0800

That is the most common formula, but not the only one. See

  Kvålseth, T. (1985). Cautionary note about $R^2$. *American Statistician*,
*39*(4), 279-285.


Traditionally, the symbol 'R' is used for the Pearson correlation
coefficient, and one way to calculate R^2 is... to square R.
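
As a rough illustration (a minimal base-R sketch on made-up data; the
helper names r2_traditional and r2_correlation are hypothetical and not
part of caret or pls), the two definitions agree for a least-squares fit
with an intercept evaluated on its own training data, but they can
disagree for other predictions:

```r
# Two common definitions of R^2, compared on simulated data.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Hypothetical helper: 1 - (residual sum of squares) / (total sum of squares)
r2_traditional <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Hypothetical helper: squared Pearson correlation of observed vs. predicted
r2_correlation <- function(obs, pred) {
  cor(obs, pred)^2
}

fit  <- lm(y ~ x)
pred <- fitted(fit)

# Ordinary least squares with an intercept, scored on the training data:
# the two definitions give the same value.
c(traditional = r2_traditional(y, pred), correlation = r2_correlation(y, pred))

# Systematically shifted predictions: the correlation is unaffected by the
# shift, while the 1 - RSS/TSS version decreases.
biased <- pred + 3
c(traditional = r2_traditional(y, biased), correlation = r2_correlation(y, biased))
```

The second comparison is one reason the choice of formula matters when
the predictions come from somewhere other than the training fit, such as
held-out samples.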

Max


On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr <deter...@umn.edu> wrote:

> I was under the impression that in PLS analysis, R2 was calculated as
> 1 - (residual sum of squares) / (total sum of squares).  Is this still what you are
> referring to?  I am aware of the linear R2 which is how well two variables
> are correlated but the prior equation seems different to me.  Could you
> explain if this is the same concept?
>
> Charles
>
>
> On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn <mxk...@gmail.com> wrote:
>
>> > Is there some literature that you make that statement?
>>
>> No, but there isn't literature on changing a lightbulb with a duck either.
>>
>> > Are these papers incorrect in using these statistics?
>>
>> Definitely, if they convert 3+ categories to integers (but there are
>> specialized R^2 metrics for binary classification models). Otherwise, they
>> are just using an ill-suited "score".
>>
>>  How would you explain such an R^2 value to someone? R^2 is
>> a function of correlation between the two random variables. For two
>> classes, one of them is binary. What does it mean?
>>
>> Historically, models rooted in computer science (e.g. neural networks) used
>> RMSE or SSE to fit models with binary outcomes and that *can* work
>> well.
>>
>> However, I don't think that communicating R^2 is effective. Other metrics
>> (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed to
>> measure the ability of a model to classify and work well. With 3+
>> categories, I tend to use Kappa.
>>
>> Max
>>
>>
>>
>>
>> On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr <deter...@umn.edu> wrote:
>>
>>> Thank you for your response Max.  Is there some literature that you make
>>> that statement?  I am confused as I have seen many publications that
>>> contain R^2 and Q^2 following PLSDA analysis.  The analysis usually is to
>>> discriminate groups (ie. classification).  Are these papers incorrect in
>>> using these statistics?
>>>
>>> Regards,
>>> Charles
>>>
>>>
>>> On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn <mxk...@gmail.com> wrote:
>>>
>>>> Charles,
>>>>
>>>> You should not be treating the classes as numeric (is virginica really
>>>> three times setosa?). Q^2 and/or R^2 are not appropriate for 
>>>> classification.
>>>>
>>>> Max
>>>>
>>>>
>>>> On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>
>>>>> I have discovered one of my errors.  The timematrix was unnecessary and an
>>>>> unfortunate habit I brought from another package.  The following provides
>>>>> the same R2 values, as it should; however, I still don't know how to
>>>>> retrieve Q2 values.  Any insight would again be appreciated:
>>>>>
>>>>> library(caret)
>>>>> library(pls)
>>>>>
>>>>> data(iris)
>>>>>
>>>>> #needed to convert to numeric in order to do regression
>>>>> #I don't fully understand this but if I left as a factor I would get an
>>>>> #error following the summary function
>>>>> iris$Species=as.numeric(iris$Species)
>>>>> inTrain1=createDataPartition(y=iris$Species,
>>>>>     p=.75,
>>>>>     list=FALSE)
>>>>>
>>>>> training1=iris[inTrain1,]
>>>>> testing1=iris[-inTrain1,]
>>>>>
>>>>> ctrl1=trainControl(method="cv",
>>>>>     number=10)
>>>>>
>>>>> plsFit2=train(Species~.,
>>>>>     data=training1,
>>>>>     method="pls",
>>>>>     trControl=ctrl1,
>>>>>     metric="Rsquared",
>>>>>     preProc=c("scale"))
>>>>>
>>>>> data(iris)
>>>>> training1=iris[inTrain1,]
>>>>> datvars=training1[,1:4]
>>>>> dat.sc=scale(datvars)
>>>>>
>>>>> pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
>>>>>     ncomp=3, method="oscorespls", data=training1)
>>>>>
>>>>> x=crossval(pls.dat, segments=10)
>>>>>
>>>>> summary(x)
>>>>> summary(plsFit2)
>>>>>
>>>>> Regards,
>>>>> Charles
>>>>>
>>>>> On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr <deter...@umn.edu> wrote:
>>>>>
>>>>> > Greetings,
>>>>> >
>>>>> > I have been exploring the use of the caret package to conduct some plsda
>>>>> > modeling.  Previously, I have come across methods that result in an R2 and
>>>>> > Q2 for the model.  Using the 'iris' data set, I wanted to see if I could
>>>>> > accomplish this with the caret package.  I use the following code:
>>>>> >
>>>>> > library(caret)
>>>>> > data(iris)
>>>>> >
>>>>> > #needed to convert to numeric in order to do regression
>>>>> > #I don't fully understand this but if I left as a factor I would get an
>>>>> > #error following the summary function
>>>>> > iris$Species=as.numeric(iris$Species)
>>>>> > inTrain1=createDataPartition(y=iris$Species,
>>>>> >     p=.75,
>>>>> >     list=FALSE)
>>>>> >
>>>>> > training1=iris[inTrain1,]
>>>>> > testing1=iris[-inTrain1,]
>>>>> >
>>>>> > ctrl1=trainControl(method="cv",
>>>>> >     number=10)
>>>>> >
>>>>> > plsFit2=train(Species~.,
>>>>> >     data=training1,
>>>>> >     method="pls",
>>>>> >     trControl=ctrl1,
>>>>> >     metric="Rsquared",
>>>>> >     preProc=c("scale"))
>>>>> >
>>>>> > data(iris)
>>>>> > training1=iris[inTrain1,]
>>>>> > datvars=training1[,1:4]
>>>>> > dat.sc=scale(datvars)
>>>>> >
>>>>> > n=nrow(dat.sc)
>>>>> > dat.indices=seq(1,n)
>>>>> >
>>>>> > timematrix=with(training1,
>>>>> >         classvec2classmat(Species[dat.indices]))
>>>>> >
>>>>> > pls.dat=plsr(timematrix ~ dat.sc,
>>>>> >     ncomp=3, method="oscorespls", data=training1)
>>>>> >
>>>>> > x=crossval(pls.dat, segments=10)
>>>>> >
>>>>> > summary(x)
>>>>> > summary(plsFit2)
>>>>> >
>>>>> > I see two different R2 values and I cannot figure out how to get the Q2
>>>>> > value.  Any insight as to what my errors may be would be appreciated.
>>>>> >
>>>>> > Regards,
>>>>> >
>>>>> > --
>>>>> > Charles
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Charles Determan
>>>>> Integrated Biosciences PhD Student
>>>>> University of Minnesota
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Max
>>>>
>>>
>>>
>>>
>>> --
>>> Charles Determan
>>> Integrated Biosciences PhD Student
>>> University of Minnesota
>>>
>>
>>
>>
>> --
>>
>> Max
>>
>
>
>
> --
> Charles Determan
> Integrated Biosciences PhD Student
> University of Minnesota
>



-- 

Max
