Re: [R] sdev value returned by princomp function (used for PCA)

peter dalgaard Thu, 30 Jun 2011 14:03:59 -0700

On Jun 30, 2011, at 21:48 , Coghlan, Avril wrote:

> 
> Dear all,
> 
> I have a question about the 'sdev' value returned by the princomp function 
> (which does principal components analysis).
> 
> On the help page for princomp it says 'sdev' is 'the standard deviations of 
> the principal components'.
> 
> However, when I calculate the principal components for the USArrests data 
> set, I don't find this to be the case:
> 
> Here is how I calculated the principal components and got the 'sdev' values:
>> scaledUSArrests <- scale(USArrests) # standardise the variables to have 
>> variance 1 and mean 0
>> myPCA <- princomp(scaledUSArrests)  # do the PCA
>> myPCA$sdev
>   Comp.1    Comp.2    Comp.3    Comp.4 
> 1.5590500 0.9848705 0.5911277 0.4122639 
> 
> As far as I understand, the principal components themselves are stored in the 
> 'scores' value returned by princomp, so I calculated the standard deviations 
> of those values to see if they agree with 'sdev':
>> sd(myPCA$scores)
>   Comp.1    Comp.2    Comp.3    Comp.4 
> 1.5748783 0.9948694 0.5971291 0.4164494 
> 
> I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why 
> is this?
> 
> As there are 4 standardised variables, and the variance of each standardised 
> variable is 1, I would expect the variances of the principal components to 
> add to 1. I find that this is true for the variances calculated using 
> sd(myPCA$scores) but not myPCA$sdev:
>> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2)
>  3.92
>> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2)
>  4.00
> 
> I am wondering why are the values in 'sdev' not equal to the standard 
> deviations of the principal components (as stored in 'scores'), and why is 
> the total variance calculated by summing the squared 'sdev' values not equal 
> to 4?
> 
> Sorry if I have misunderstood something obvious. I would be grateful for help 
> as I'm a bit confused..


Rather, you overlooked something not-quite obvious:

"Note that the default calculation uses divisor N for the covariance matrix."

You'll have to ask the authors for why this convention has been used, but it is 
of course easy to convert to N-1 divisor once you know what is happening 
(typically you don't really care about scaling anyway). 

When your data have been scaled with their sd, using N-1 for divisor, the 
variance using N as divisor will be (1 - 1/N) and true enough:

4 * (1 - 1/50) = 3.92

> 
> Kind regards,
> Avril
> 
> Avril Coghlan
> Cork, Ireland
> 
> 
>       [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sdev value returned by princomp function (used for PCA)

Reply via email to