Two additional issues might be considered:

1.  Correlated variables remain correlated whether you run a PCA or toss one of 
the variables, so neither approach teases apart the separate effects of the two 
variables (nor can that necessarily be done with the particular dataset at 
hand).

2.  The purpose for using PCA should be clear and should follow from your 
objectives.  Just because you can do a PCA doesn't mean you should.  For 
example, if PCA is performed to obtain "uncorrelated" variables for a 
regression, consider that the component explaining the most variation will 
not necessarily be a good predictor.  The component explaining the least 
variation might be the best predictor.  Performing PCA for a regression has 
always puzzled me: why would one think that doing something in complete 
isolation from the dependent variable would make for better predictors?  
(Orthogonal and more numerically stable estimators of the coefficients, yes, 
but not necessarily coefficients of interest.)
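
The point about the smallest component can be illustrated with a minimal 
sketch in Python/NumPy. Everything in it is an assumption made for 
illustration: the data are simulated, and the response is deliberately built 
to depend on the difference between two correlated predictors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two highly correlated predictors: a shared signal plus small independent noise.
shared = rng.normal(size=n)
x1 = shared + 0.1 * rng.normal(size=n)
x2 = shared + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Hypothetical response driven by the DIFFERENCE between the predictors,
# i.e. by the direction in which the predictors vary least.
y = (x1 - x2) + 0.05 * rng.normal(size=n)

# PCA by eigendecomposition of the covariance matrix of the centred predictors.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]          # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Xc @ eigvecs                      # component scores

# Share of predictor variance per component, and squared correlation of each
# component with the response (the R^2 of a one-component regression).
explained = eigvals / eigvals.sum()
r2 = np.array([np.corrcoef(scores[:, k], y)[0, 1] ** 2 for k in range(2)])

print("variance explained:", np.round(explained, 3))
print("R^2 with response: ", np.round(r2, 3))
```

In this constructed case the first component carries nearly all of the 
predictor variance but is almost useless for predicting y, while the second 
component does nearly all of the predicting.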

Jim

-----Original Message-----
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Chris Howden
Sent: Tuesday, March 05, 2013 10:45 PM
To: 张勇; r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Should one remove highly correlated variables before 
doing PCA??

Hi Yong,

PCA is a way to deal with highly correlated variables, so there is no need to 
remove them.

If N variables are highly correlated, then they will all load on the SAME 
principal component (eigenvector), not different ones. This is how you identify 
them as being highly correlated. For further analysis you can then either:

1) Use the principal components, and interpret each one according to which 
variables load on it
2) Choose one variable from each set of highly correlated variables (those that 
all load on the same component) and analyse only it.

Most people using PCA would choose option 1).
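
As a rough illustration of variables loading on the same component, here is a 
small sketch in Python/NumPy. The data are simulated and the variable names 
(a, b, c, d) are hypothetical: three variables share one latent driver and a 
fourth is independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# a, b and c share one latent driver, so they are highly correlated;
# d is independent of all of them.
latent = rng.normal(size=n)
a = latent + 0.2 * rng.normal(size=n)
b = latent + 0.2 * rng.normal(size=n)
c = latent + 0.2 * rng.normal(size=n)
d = rng.normal(size=n)
X = np.column_stack([a, b, c, d])

# PCA on the correlation matrix (equivalent to PCA on standardised variables).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]   # loadings of a, b, c, d on the first component
pc2 = eigvecs[:, 1]   # loadings on the second component

for name, l1, l2 in zip("abcd", pc1, pc2):
    print(f"{name}: PC1 loading {l1:+.2f}, PC2 loading {l2:+.2f}")
```

The signs of the loadings are arbitrary, so only their magnitudes matter: a, b 
and c load heavily on the first component with almost no contribution from d, 
and d dominates the second.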

A bit more detail.

Many methods have a hard time dealing with multicollinearity, which is when 
a number of variables are highly correlated (I suggest you Google it). Before 
analysis this is usually dealt with in one of 2 ways:
1) Use PCA to get a set of orthogonal, i.e. uncorrelated, variables and 
analyse them
2) Use correlation coefficients to determine which variables are highly 
correlated and use only 1 of them in the analysis. A cut-off for highly 
correlated is often 0.8 (in absolute value).
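
Option 2 could be sketched as below in Python/NumPy. The function name 
select_uncorrelated, the variable names and the simulated data are all 
hypothetical; the greedy rule (keep a variable only if it is not highly 
correlated with one already kept) is just one simple way to apply a cut-off.

```python
import numpy as np

def select_uncorrelated(X, names, cutoff=0.8):
    """Greedily keep columns whose absolute correlation with every
    previously kept column is below the cut-off."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < cutoff for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(2)
n = 300

# Simulated example: elev is strongly (negatively) correlated with temp,
# rain is independent of both.
temp = rng.normal(size=n)
elev = -temp + 0.2 * rng.normal(size=n)
rain = rng.normal(size=n)
X = np.column_stack([temp, elev, rain])

kept = select_uncorrelated(X, ["temp", "elev", "rain"])
print(kept)
```

Which member of a correlated pair survives depends only on column order here, 
so in practice the columns would be ordered (or the choice made by hand) 
according to which variable is more meaningful for the study.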

Variance Inflation Factors are also used. Personally I don't like them since 
they don't tell me which variables a given variable is correlated with. They 
are also clumsy to use. You can't simply remove all variables with a high VIF 
or you will likely remove some useful variables, e.g. if 4 variables all have a 
high VIF you don't know whether all 4 are correlated with each other or whether 
there are 2 sets of highly correlated variables. So which do you remove? If you 
must use them it's IMPERATIVE that you remove only 1 variable at a time and 
then rerun to get new VIFs, remove 1, get new VIFs, remove 1, etc. This 
prevents you from removing too many variables.
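
The one-at-a-time procedure might look like the following sketch in 
Python/NumPy. The function names, the simulated data and the threshold of 10 
are assumptions for illustration (10 is a commonly quoted rule of thumb, not 
something stated above). The VIFs are computed as the diagonal of the inverse 
correlation matrix, which is equivalent to 1 / (1 - R^2) from regressing each 
variable on the others.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: the diagonal of the inverse of the
    correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def prune_by_vif(X, names, threshold=10.0):
    """Remove the single highest-VIF variable, recompute the VIFs, and
    repeat until every remaining VIF is at or below the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names

rng = np.random.default_rng(3)
n = 400

# Two separate sets of highly correlated variables: (a1, a2) and (b1, b2).
za, zb = rng.normal(size=n), rng.normal(size=n)
a1 = za + 0.1 * rng.normal(size=n)
a2 = za + 0.1 * rng.normal(size=n)
b1 = zb + 0.1 * rng.normal(size=n)
b2 = zb + 0.1 * rng.normal(size=n)
X = np.column_stack([a1, a2, b1, b2])
names = ["a1", "a2", "b1", "b2"]

initial_vif = vif(X)
kept = prune_by_vif(X, names)
print("initial VIFs:", np.round(initial_vif, 1))
print("kept:", kept)
```

All four variables start with a high VIF, so dropping every high-VIF variable 
at once would discard everything. Removing one at a time leaves one variable 
from each correlated set.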


Chris Howden B.Sc. (Hons) GStat.
Founding Partner
Evidence Based Strategic Development, IP Commercialisation and Innovation, Data 
Analysis, Modelling and Training
(mobile) 0410 689 945
(fax) +612 4782 9023
ch...@trickysolutions.com.au





-----Original Message-----
From: r-sig-ecology-boun...@r-project.org
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of 张勇
Sent: Wednesday, 6 March 2013 4:33 PM
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Should one remove highly correlated variables before doing 
PCA??

Hi list,

Maybe this is not an "R" question; however, it has bothered me for a long time.

Some people think that a set of correlated variables might "load" onto several 
principal components (eigenvectors), so including many variables from such a 
set will differentially weight several eigenvectors and thereby change the 
directions of all eigenvectors, too.  So, according to these considerations, we 
should discard some highly correlated variables before doing PCA.

On the other hand, some people think that correlated variables are ok, because 
PCA outputs vectors that are orthogonal.  So we do not need to remove highly 
correlated variables before doing PCA.

As for myself, I choose the first method (removing highly correlated 
variables). But, based on practical ecological knowledge, I retain as many of 
the ecologically meaningful variables as I can.

What's your suggestion for this issue? Any hint will be greatly appreciated!
Thanks a lot in advance.

Best regards,

Yong

_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
