On Jun 30, 2011, at 21:48 , Coghlan, Avril wrote: > > Dear all, > > I have a question about the 'sdev' value returned by the princomp function > (which does principal components analysis). > > On the help page for princomp it says 'sdev' is 'the standard deviations of > the principal components'. > > However, when I calculate the principal components for the USArrests data > set, I don't find this to be the case: > > Here is how I calculated the principal components and got the 'sdev' values: >> scaledUSArrests <- scale(USArrests) # standardise the variables to have >> variance 1 and mean 0 >> myPCA <- princomp(scaledUSArrests) # do the PCA >> myPCA$sdev > Comp.1 Comp.2 Comp.3 Comp.4 > 1.5590500 0.9848705 0.5911277 0.4122639 > > As far as I understand, the principal components themselves are stored in the > 'scores' value returned by princomp, so I calculated the standard deviations > of those values to see if they agree with 'sdev': >> sd(myPCA$scores) > Comp.1 Comp.2 Comp.3 Comp.4 > 1.5748783 0.9948694 0.5971291 0.4164494 > > I get different values than I got using sd(myPCA$scores) and myPCA$sdev, why > is this? > > As there are 4 standardised variables, and the variance of each standardised > variable is 1, I would expect the variances of the principal components to > add to 1. I find that this is true for the variances calculated using > sd(myPCA$scores) but not myPCA$sdev: >> (1.5590500^2) + (0.9848705^2) + (0.5911277^2) + (0.4122639^2) > 3.92 >> (1.5748783^2) + (0.9948694^2) + (0.5971291^2) + (0.4164494^2) > 4.00 > > I am wondering why are the values in 'sdev' not equal to the standard > deviations of the principal components (as stored in 'scores'), and why is > the total variance calculated by summing the squared 'sdev' values not equal > to 4? > > Sorry if I have misunderstood something obvious. I would be grateful for help > as I'm a bit confused..
Rather, you overlooked something not-quite obvious: "Note that the default calculation uses divisor N for the covariance matrix." You'll have to ask the authors for why this convention has been used, but it is of course easy to convert to N-1 divisor once you know what is happening (typically you don't really care about scaling anyway). When your data have been scaled with their sd, using N-1 for divisor, the variance using N as divisor will be (1 - 1/N) and true enough: 4 * (1 - 1/50) = 3.92 > > Kind regards, > Avril > > Avril Coghlan > Cork, Ireland > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.