On Mon, Jan 07, 2008 at 05:25:44PM +0100, Lorenzo Isella wrote: > Thanks for both the replies. > I am now giving a try to the suggestion by Gabor since it looks easier > (for me) to implement. > I am testing it, but so far it does what I have in mind. > I am going now through the documentation of the igraph package. I can > count the cluster number, but I also want to make sure that I can > retrieve the info about which components are in which cluster i.e. > find out the cluster composition.
That's the 'clusters' function. > And second question: at the moment, my simulations involve a > relatively low particle number (of the order of the hundred), but this > may increase by 1-2 orders of magnitude. > Which alternative methods to the distance matrix can I use then to > find the spacing between my particles? Of course you still need to find out all the distances, but not necessarily for all the particles. One way is to sort the particles according to (say) the first coordinate and then only check pairs which are close enough according to that coordinate, as this is required to be close enough in Euclidean distance. E.g. if the sorted coordinates are x1 x2 x3 x4 then for finding the pairs of particle 1 you only need to check particle 2 if x2-x1<delta, etc. the idea is that as soon as the difference is larger than delta you stop. How well the method works depends of the distribution of the particles. For 1000 particle i would guess that the regular matrix method works, but probably not for 10000 particles. Btw. igraph contains C code for doing the sorted distance check trick, in the igraph_grg_game function in games.c, it can be trivially modified to use supplied x and y vectors instead of uniform randomly generated ones. If you are happy with particles in 2D. Gabor > Cheers > > Lorenzo > > On 07/01/2008, Gabor Csardi <[EMAIL PROTECTED]> wrote: > > Lorenzo, why can't you actually generate the graph to find the > > connection components? With the 'igraph' package this is something like: > > > > g <- graph.adjacency( DIST < 0.5, mode="undirected" ) > > g <- simplify(g) > > no.clusters(g) > > > > assuming you have your distance matrix in 'DIST'. If N is too big > > then you don't really want to create the distance matrix, but use > > a more sophisticated approach to find the points that are close > > to each other than your threshold; then create the graph. > > So why using clustering algorithms? > > > > Btw. from your vector of points you can create the distance matrix > > by using the 'outer' function. > > > > Btw2. your algorithm for graph generation is similar to geometric > > random graphs, see function 'grg.game' in 'igraph'. > > > > Gabor > > > > On Mon, Jan 07, 2008 at 03:26:57PM +0100, Lorenzo Isella wrote: > > > Dear All, > > > I hope I am not asking a FAQ. I am dealing with a problem of graph > > > theory [connected components in a non-directed graph] and I do not > > > want to rediscover the wheel. > > > I saw a large number of R packages dealing for instance with the > > > k-means method or hierarchical clustering for spatially distributed > > > data and I am basically facing a similar problem. > > > I am given a set of data which are the positions of particles in 3 > > > dimensions; I define two particles A and B to be directly connected if > > > their Euclidean distance is below a certain threshold d. If A and B > > > are directly connected and B and C are directly connected, then A,B > > > and C are connected components (physically it means that they are > > > members of the same cluster). > > > All my N particles then split into k disjointed clusters, each with a > > > certain number of connected components, and this is what I want to > > > investigate. > > > I do not know a priori how many clusters I have (this is my problem > > > with e.g. k-means since k is an output for me); the only input is the > > > set of 3-dimensional particle positions and a threshold distance. > > > The algorithm/package I am looking should return the number of > > > clusters and the composition of each cluster, e.g. the fact that the > > > second cluster is made up of particles {R,T,L}. > > > Consider for instance: > > > > > > # a 2-dimensional example > > > x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2), > > > matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2)) > > > colnames(x) <- c("x", "y") > > > > > > How can I then find out how many connected components I have when my > > > threshold distance is d=0.5? > > > > > > Many thanks > > > > > > Lorenzo > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > > Csardi Gabor <[EMAIL PROTECTED]> MTA RMKI, ELTE TTK > > -- Csardi Gabor <[EMAIL PROTECTED]> MTA RMKI, ELTE TTK ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.