Dear all,

Sorry to post my query to the list once again, but my previous mail did not get a reply from anyone. To put it simply: please give me code to find the columns of a data frame whose correlation coefficient is below a pre-determined threshold. (For the detailed query, please see my previous message to this list, pasted below.)

Thanks and regards,
B.Nataraj

Following is my previous message to this list, to which I did not get any reply.

Dear all,

To remove correlated columns from a data frame, df, I found code written in R on Mr. Rajarshi Guha's page, http://cheminfo.informatics.indiana.edu/~rguha/code/R/. The code is

#################
r2test <- function(d, cutoff=0.8) {
  if (cutoff > 1 || cutoff <= 0) {
    stop(" 0 <= cutoff < 1")
  }
  if (!is.matrix(d) && !is.data.frame(d)) {
    stop("Must supply a data.frame or matrix")
  }
  r2cut = sqrt(cutoff);
  cormat <- cor(d);
  bad.idx <- which(abs(cormat)>r2cut,arr.ind=T);
  bad.idx <- matrix( bad.idx[bad.idx[,1] > bad.idx[,2]], ncol=2);
  drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1], bad.idx[,2]);
  if (length(drop.idx) == 0) {
    1:ncol(d)
  } else {
    (1:ncol(d))[-unique(drop.idx)]
  }
}
############################################

The problem is that the code returns different output (i.e. different column numbers) on different calls. I could not work out from the code why this happens. I can follow the logic of the code except for the line

********************************************
drop.idx <- ifelse(runif(nrow(bad.idx)) > .5, bad.idx[,1], bad.idx[,2]);
****************************************

I do not understand what comparing runif(nrow(bad.idx)) with 0.5 means. So I am looking for someone to help me understand why the output differs between function calls, as well as the meaning of the line mentioned above.

Thanks!
B.Nataraj

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
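
The line asked about can be tried in isolation. What follows is only a minimal sketch, not part of the original function: the index pairs in bad.idx are made up for illustration rather than taken from a real correlation matrix, and the set.seed(42) call is an assumption added here to show how the random choice can be made repeatable.

```r
# Hypothetical index pairs standing in for bad.idx: each row is one pair of
# highly correlated columns (row and column position in the correlation matrix).
bad.idx <- matrix(c(2, 1,
                    3, 1,
                    4, 3), ncol = 2, byrow = TRUE)

# runif(n) draws n uniform random numbers between 0 and 1; comparing them
# with 0.5 acts like one coin flip per pair.
set.seed(42)  # assumption: fixing the seed so the flips can be repeated
coin <- runif(nrow(bad.idx)) > 0.5

# ifelse() takes the first member of the pair where the flip is TRUE
# and the second member otherwise.
drop.idx <- ifelse(coin, bad.idx[, 1], bad.idx[, 2])

# Resetting the same seed before drawing again reproduces the same choice;
# without set.seed() the choice can change from one call to the next.
set.seed(42)
drop.again <- ifelse(runif(nrow(bad.idx)) > 0.5, bad.idx[, 1], bad.idx[, 2])
print(identical(drop.idx, drop.again))  # TRUE
```

Under these assumptions, the varying output would come from the random draw in runif(): for each correlated pair, one of the two columns is picked at random to be dropped, so the surviving columns can differ between calls unless the seed is fixed beforehand.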