[R] Subsetting a data frame by dropping correlated variables

Rita Carreira Tue, 19 Apr 2011 12:11:16 -0700

Hello R Users!
I have a data frame that has many variables, some with missing observations, 
and some that are correlated with each other. I would like to subset the data 
by dropping one variable from each pair of correlated variables while keeping 
the other one in the data frame. Alternatively, I could also drop both of the 
variables that are correlated with each other. Worry not! I am not deleting 
data, I am just finding a subset of the data that I can use to impute some 
missing observations. 
I have tried the following statement:
dfQuc <- dfQ[ , sapply(dfQ, function(x) cor(dfQ, use = "pairwise.complete.obs", 
method ="pearson")<0.8)]
but it gives me the following error:
Error in `[.data.frame`(dfQ, , sapply(dfQ, function(x) cor(dfQ, use = 
"pairwise.complete.obs",  : 
  undefined columns selected
Since I have several dozen data frames, it is impractical for me to manually 
inspect the correlation matrices and select which variables to drop, so I am 
trying to have R make the selection for me. Does anyone have any idea how to 
accomplish this? 
Thank you very much!
Rita
=====================================
"If you think education is expensive, try ignorance." --Derek Bok
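
A note on the attempt above, with one possible way forward. The function passed
to sapply() ignores its argument x and returns the full logical matrix
cor(dfQ, ...) < 0.8 for every column, so the result is a large logical matrix
rather than one TRUE/FALSE per column, which is most likely why `[.data.frame`
reports undefined columns. What follows is only a minimal sketch, not a
definitive solution: drop_correlated() is a hypothetical helper name, the 0.8
cutoff is taken from the post, and the choices to compare absolute
correlations, to keep the first variable of each correlated pair, and to treat
NA correlations as "not correlated" are all assumptions.

```r
# Hypothetical helper (not from the original post): walk through the numeric
# columns in order and keep a column only if its absolute Pearson correlation
# with every column already kept is below `cutoff`.
drop_correlated <- function(df, cutoff = 0.8) {
  is_num <- vapply(df, is.numeric, logical(1))
  cm <- abs(cor(df[, is_num, drop = FALSE],
                use = "pairwise.complete.obs", method = "pearson"))
  keep <- character(0)
  for (v in colnames(cm)) {
    r <- cm[v, keep]
    # Assumption: an NA correlation (too few complete pairs, or a constant
    # column) is ignored rather than treated as a reason to drop `v`.
    if (!any(r >= cutoff, na.rm = TRUE)) keep <- c(keep, v)
  }
  # Non-numeric columns are passed through unchanged.
  df[, names(df) %in% c(keep, names(df)[!is_num]), drop = FALSE]
}

# Small made-up example: `b` is almost a multiple of `a`, `c` is unrelated.
dfQ <- data.frame(a = 1:10,
                  b = 2 * (1:10) + rep(c(0.1, -0.1), 5),
                  c = c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6))
dfQ$a[c(3, 7)] <- NA            # some missing observations
dfQuc <- drop_correlated(dfQ)
names(dfQuc)                    # "a" "c"
```

For several dozen data frames stored in a list, the same helper could be
applied with lapply(list_of_dfs, drop_correlated). If dropping both members of
a correlated pair is preferred, the loop would need to be changed accordingly.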



                                          

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
