The 'findCorrelation' function in the caret package may be helpful.
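
A minimal sketch of that suggestion, assuming caret is installed and using a small made-up data frame in place of the real dfQ. findCorrelation() takes a correlation matrix and a cutoff, and returns the indices of the columns it suggests removing. It works on absolute correlations, so strongly negative pairs are flagged as well. The toy data and the helper name drop_correlated are illustrative only and do not come from the original post.

```r
library(caret)

# Toy stand-in for dfQ (assumption): 'b' is nearly a copy of 'a',
# 'c' is unrelated and has a few missing observations.
set.seed(1)
n <- 100
a <- rnorm(n)
dfQ <- data.frame(a = a, b = a + rnorm(n, sd = 0.1), c = rnorm(n))
dfQ$c[c(3, 17)] <- NA

# Hypothetical helper: drop one variable from each highly correlated pair.
drop_correlated <- function(df, cutoff = 0.8) {
  # Correlation matrix computed on pairwise complete observations
  cor_mat <- cor(df, use = "pairwise.complete.obs", method = "pearson")
  # Column indices that findCorrelation() suggests removing
  drop_idx <- findCorrelation(cor_mat, cutoff = cutoff)
  # Negative indexing with an empty vector would drop every column,
  # so return the data frame unchanged when nothing is flagged
  if (length(drop_idx) == 0) return(df)
  df[, -drop_idx, drop = FALSE]
}

dfQuc <- drop_correlated(dfQ)
names(dfQuc)
```

If the several dozen data frames are held in a list, something like lapply(list_of_dfs, drop_correlated) should apply the same rule to each one. The sketch assumes every column is numeric and that each pair of columns has enough complete observations for cor() to return a non-missing value.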

On Tue, Apr 19, 2011 at 3:10 PM, Rita Carreira <ritacarre...@hotmail.com> wrote:
>
> Hello R Users!
> I have a data frame that has many variables, some with missing observations, 
> and some that are correlated with each other. I would like to subset the data 
> by dropping one of the variables that is correlated with another variable 
> that I will keep in the data frame. Alternatively, I could also drop both the 
> variables that are correlated with each other. Worry not! I am not deleting 
> data, I am just finding a subset of the data that I can use to impute some 
> missing observations.
> I have tried the following statement:
> dfQuc <- dfQ[ , sapply(dfQ, function(x) cor(dfQ, use = 
> "pairwise.complete.obs", method ="pearson")<0.8)]
> but it gives me the following error:
> Error in `[.data.frame`(dfQ, , sapply(dfQ, function(x) cor(dfQ, use = 
> "pairwise.complete.obs",  :
>  undefined columns selected
> Since I have several dozen data frames, it is impractical for me to manually 
> inspect the correlation matrices and select which variables to drop, so I am 
> trying to have R make the selection for me. Does anyone have any idea how 
> to accomplish this?
> Thank you very much!
> Rita
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

