Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-16 Thread Ana Marija
Hi Peter, Thank you so much!!! I will use complete linkage clustering because the Mendelian Randomization function (https://cran.r-project.org/web/packages/MendelianRandomization/vignettes/Vignette_MR.pdf) I plan to use allows for correlations, but not ones as high as 0.9 or more. I got 40 SNPs out of 246 s

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-15 Thread Peter Langfelder
Try hclust(as.dist(1-calc.rho), method = "average").
Peter
On Fri, Nov 15, 2019 at 10:02 AM Ana Marija wrote:
>
> Hi Peter,
>
> Thank you for getting back to me and shedding light on this. I see
> your point, doing Jim's method:
>
> > keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
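A minimal sketch of how the hclust() suggestion could be turned into a list of representative variables, assuming the correlation matrix is symmetric with a unit diagonal. The toy matrix rho, the cut height of 0.2 (that is, a correlation of 0.8) and the rule of keeping the first member of each cluster are illustrative assumptions, not part of the message above.

```r
# Toy symmetric correlation matrix (hypothetical values, not the LDlink data)
vars <- c("a", "b", "c", "d")
rho <- matrix(c(1.00, 0.95, 0.10, 0.20,
                0.95, 1.00, 0.15, 0.25,
                0.10, 0.15, 1.00, 0.90,
                0.20, 0.25, 0.90, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Dissimilarity = 1 - correlation, then average-linkage clustering
tree <- hclust(as.dist(1 - rho), method = "average")

# Cut the tree at height 0.2, so clusters joined above that height stay separate
clusters <- cutree(tree, h = 0.2)

# Keep one representative per cluster (simply the first member; an assumption)
representatives <- names(clusters)[!duplicated(clusters)]
representatives
```

With average or complete linkage, representatives drawn from different clusters can still be correlated above the threshold, so it is probably worth checking the largest off-diagonal value of the reduced matrix afterwards. Only method = "single" guarantees that every pair of variables from different clusters lies below the cut.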

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-15 Thread Jim Lemon
While the remedy for your dissatisfaction with my previous solution should be obvious, I will make it explicit.
# that is rows containing at most one value > 0.8
# ignoring the diagonal
keeprows<-apply(ro246,1,function(x) return(sum(x>0.8)<2))
ro246.lt.8<-ro246[keeprows,keeprows]
Jim
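To make the effect of the cutoff in sum(x>0.8)<2 concrete, here is a small sketch on a made-up matrix (the values and the names toy, keep_strict and keep_loose are assumptions for illustration). When the diagonal is 1 it counts as one value above 0.8, so <2 keeps only variables with no off-diagonal correlation above 0.8, while <3 tolerates one.

```r
# Toy symmetric correlation matrix with unit diagonal (hypothetical values)
vars <- c("v1", "v2", "v3", "v4")
toy <- matrix(c(1.00, 0.90, 0.85, 0.10,
                0.90, 1.00, 0.30, 0.20,
                0.85, 0.30, 1.00, 0.40,
                0.10, 0.20, 0.40, 1.00),
              nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Count values above 0.8 in each row; the diagonal always contributes 1
keep_strict <- apply(toy, 1, function(x) sum(x > 0.8) < 2)  # no high partner
keep_loose  <- apply(toy, 1, function(x) sum(x > 0.8) < 3)  # at most one high partner

# Subset rows and columns together; drop = FALSE keeps the matrix shape
toy_strict <- toy[keep_strict, keep_strict, drop = FALSE]

names(which(keep_strict))
names(which(keep_loose))
```

In this toy case the strict filter discards v2 and v3 as well as v1, although v2 and v3 are only weakly correlated with each other once v1 is gone. A row-count filter of this kind removes every member of a correlated group instead of keeping one representative.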

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-15 Thread Ana Marija
If it is of any help, my correlation matrix (calc.rho) was computed here, under the LDmatrix tab: https://ldlink.nci.nih.gov/?tab=ldmatrix and the dataset of 246 is below: rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506 rs2089177 rs12325677 rs62064624 rs62064631 rs2349295 rs2174369 rs7218554 rs6

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-15 Thread Ana Marija
Hi Peter,
Thank you for getting back to me and shedding light on this. I see your point, doing Jim's method:
> keeprows<-apply(calc.rho,1,function(x) return(sum(x>0.8)<3))
> ro246.lt.8<-calc.rho[keeprows,keeprows]
> ro246.lt.8[ro246.lt.8 == 1] <- NA
> (mmax <- max(abs(ro246.lt.8), na.rm=TRUE))
[1

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Peter Langfelder
I suspect that you want to identify which variables are highly correlated, and then keep only "representative" variables, i.e., remove redundant ones. This is a somewhat risky procedure, but I have sometimes done such things myself to simplify large sets of highly related variables. If your

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
Hi Jim,
This:
colnames(calc.jim)[colSums(abs(calc.jim)>0.8)<3]
was the master take! Thank you so much!!!
On Thu, Nov 14, 2019 at 3:39 PM Jim Lemon wrote:
>
> I thought you were going to trick us. What I think you are asking now
> is how to get the variable names in the columns that have at mos

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Jim Lemon
I thought you were going to trick us. What I think you are asking now is how to get the variable names in the columns that have at most one _absolute_ value greater than 0.8. OK:
# I'm not going to try to recreate your correlation matrix
calc.jim<-matrix(runif(100,min=-1,max=1),nrow=10)
for(i in 1
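A small deterministic sketch of the column-name selection described above, showing why the absolute value matters when some correlations are negative. The matrix m and its values are made up for illustration; the cutoff of 3 assumes a unit diagonal, which always contributes one value above 0.8.

```r
# Toy symmetric matrix with one strong negative correlation (hypothetical values)
vars <- c("v1", "v2", "v3", "v4")
m <- matrix(c( 1.00, -0.90, 0.10, 0.20,
              -0.90,  1.00, 0.85, 0.30,
               0.10,  0.85, 1.00, 0.40,
               0.20,  0.30, 0.40, 1.00),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Columns with at most one off-diagonal |r| > 0.8 (the diagonal adds one more)
kept <- colnames(m)[colSums(abs(m) > 0.8) < 3]

# Without abs() the strong negative correlation between v1 and v2 is missed
kept_no_abs <- colnames(m)[colSums(m > 0.8) < 3]

kept
kept_no_abs
```

Here v2 is dropped only when abs() is used, because one of its two strong correlations is negative.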

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Jim Lemon
Hi Ana,
Rather than addressing the question of why you want to do this, let's make the question easier to answer:
calc.rho<-matrix(c(0.903,0.268,0.327,0.327,0.327,0.582,
 0.928,0.276,0.336,0.336,0.336,0.598,
 0.975,0.309,0.371,0.371,0.371,0.638,
 0.975,0.309,0.371,0.371,0.371,0.638,
 0.975,0.309,0

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
What would be the approach to remove a variable that has at least 2 correlation coefficients > 0.8? This is the whole output of head():
> head(calc.rho)
           rs56192520 rs3764410 rs145984817 rs1807401 rs1807402 rs35350506
rs56192520      1.000     0.976       0.927     0.927     0.927

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
That's assuming your data was returned by head(). __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide c

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
> I basically want to remove all entries for pairs which have value in
> between them (correlation calculated not in R, but it is correlation,
> r2)
> so for example I would not keep: rs883504 because it has r2>0.8 for
> all those rs...
I'm still not sure what "remove all entries" means. In your e

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Abby Spurdle
Sorry, but I don't understand your question. When I first looked at this, I thought it was a correlation (or covariance) matrix, e.g.
> cor (quakes)
> cov (quakes)
However, your row and column variables are different, implying two different data sets. Also, some of the (correlation?) coefficien

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
I don't understand. I have to keep only pairs of variables with correlation less than 0.8 in order to proceed with some calculations.
On Thu, Nov 14, 2019 at 2:09 PM Bert Gunter wrote:
>
> Obvious advice:
>
> DON'T DO THIS!
>
> Bert Gunter
>
> "The trouble with having an open mind is that people k

Re: [R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Bert Gunter
Obvious advice: DON'T DO THIS!
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Thu, Nov 14, 2019 at 10:50 AM Ana Marija wrote:
> Hello,
>
> I have a data fra

[R] Remove highly correlated variables from a data frame or matrix

2019-11-14 Thread Ana Marija
Hello,
I have a data frame like this (a matrix):
head(calc.rho)
            rs9900318 rs8069906 rs9908521 rs9908336 rs9908870 rs9895995
rs56192520      0.903     0.268     0.327     0.327     0.327     0.582
rs3764410       0.928     0.276     0.336     0.336     0.336     0.598
rs145984817     0.