Hi guys,

I want to apply a clustering algorithm to my dataset in order to find regions of points (X, Y) that have similar values of percent_GC and mean_phred_quality. Details below.
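To make the goal concrete, here is a rough sketch of one simple way to look for local association. It is grid binning rather than a real clustering algorithm. Everything in it is an assumption for illustration: the data are synthetic with one planted correlated corner, the 0 to 20000 coordinate range is made up, and the tile size (tile) and minimum tile count (min_points) are arbitrary. Only the four column names come from my data.

```r
# Synthetic stand-in for the real data (all values are made up for illustration).
set.seed(1)
n <- 5000
pts <- data.frame(
  X = runif(n, 0, 20000),
  Y = runif(n, 0, 20000),
  percent_GC = runif(n, 0.2, 0.7)
)

# Quality is pure noise everywhere except one planted corner, where it tracks GC.
pts$mean_phred_quality <- rnorm(n, mean = 25, sd = 5)
planted <- pts$X < 5000 & pts$Y < 5000
pts$mean_phred_quality[planted] <- 10 + 40 * pts$percent_GC[planted] +
  rnorm(sum(planted), sd = 1)

# Assumption: a 5000 x 5000 tile is a sensible "local region"; tune for real data.
tile <- 5000
pts$cell <- paste(floor(pts$X / tile), floor(pts$Y / tile), sep = "_")

# Per-tile Pearson correlation; tiles with too few points are skipped as unreliable.
min_points <- 30
local_cor <- do.call(rbind, lapply(split(pts, pts$cell), function(d) {
  if (nrow(d) < min_points) return(NULL)
  ct <- cor.test(d$percent_GC, d$mean_phred_quality)
  data.frame(cell = d$cell[1], n = nrow(d),
             r = unname(ct$estimate), p = ct$p.value)
}))

# Adjust for testing many tiles at once, then list the strongest tiles first.
local_cor$p_adj <- p.adjust(local_cor$p, method = "BH")
local_cor <- local_cor[order(-abs(local_cor$r)), ]
head(local_cor)
```

On the real data, the tiles with large |r| and small p_adj would be the candidate regions. The obvious weakness is that a fixed grid can split a true region across tile boundaries, which is why a proper spatial clustering method may be a better fit.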
I have sampled 1% of the points from my main data set of 85 million points. The result is still somewhat large (about 800K points) and looks like the following:

         X   Y percent_GC mean_phred_quality
    1  4286 930       0.50               0.13
    2  4825 947       0.50              20.33
    3  8207 932       0.32              26.50
    4  8451 940       0.48              24.81
    5  9331 931       0.38              16.93
    6 11501 949       0.49              31.28

What I want to do is find local regions in which there are associations between these four values, i.e. regions where the points (X, Y) show a close correlation with percent_GC and mean_phred_quality.

PS: I did calculate the overall Pearson correlation coefficient between percent_GC and mean_phred_quality, and it is not statistically significant, which got me interested in finding local regions where it may be.

I would really appreciate your help, as I am still a rookie in applying clustering algorithms.

Thanks!
-Abhi

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.