On 27.11.2010 08:39, James Davidson wrote:
> For completeness - this result was with the Hierarchical
> Clustering(DistMatrix) node set with 'Tanimoto' similarity and 'Complete
> Linkage' for cluster comparison. Changing the comparison to 'Single
> Linkage' did not reduce the time.

That is expected. The "linkage" only controls which of the distances between two clusters is used for the comparison (the maximum, minimum or average), but you need to look at all distances in any case.

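To illustrate the point about linkage, here is a minimal sketch of naive agglomerative clustering on a precomputed distance matrix. It is not the KNIME node's code; the names `LINKAGE` and `naive_agglomerative` are made up for this example. The linkage only swaps the function that reduces the pairwise distances, while the loops over all distances stay the same, which is where the roughly cubic cost comes from.

```python
from itertools import combinations

# Linkage name -> how the pairwise distances between two clusters are reduced.
LINKAGE = {
    "single": min,                            # smallest pairwise distance
    "complete": max,                          # largest pairwise distance
    "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
}


def naive_agglomerative(dist, linkage="complete"):
    """Merge clusters until one is left and return the merge history.

    dist is a symmetric n x n matrix (list of lists) of pairwise distances,
    e.g. 1 - Tanimoto similarity. Illustrative sketch only.
    """
    reduce_fn = LINKAGE[linkage]
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Every pair of clusters is examined, whatever the linkage is.
        for a, b in combinations(range(len(clusters)), 2):
            ds = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = reduce_fn(ds)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # b > a always holds here, so deleting b does not shift a.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

Each of the n - 1 merge steps touches up to n^2 / 2 pairwise distances, so switching from "complete" to "single" changes the result but not the amount of work.
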
> Interestingly, the documentation for the 'standard' Hierarchical
> Clustering' (ie non-distance matrix) node states that it operates with
> "n-squared complexity".

Ooops. That is certainly wrong. It is the same algorithm. n^3 would be right.

> I guess other clustering algorithms available
> in knime must scale better than cubicly as well (k-means, fuzzy
> c-means?) - but as far as I can see they don't currently operate on
> distance matrices (or directly on bit vectors).

There is a k-medoids implementation that should work on distance matrices. The problem with k-means (and fuzzy c-means) is that you need the full coordinates in order to set the prototypes in each iteration. That doesn't work if you only have pairwise distances.

> If they could, then
> this may be a solution; or implementing the Murtagh algorithm (I am
> guessing the scaling is below cubic from my recollection of the speeds
> observed in rdkit).

Greg sent me a link to the publication. If I find some time, I will have a look at it.

Cheers,

Thorsten

-- 
Dr.-Ing. Thorsten Meinl                  room: Z815
Nycomed Chair for Bioinformatics         fax: +49 (0)7531 88-5132
and Information Mining                   phone: +49 (0)7531 88-5016
Box 712, 78457 Konstanz, Germany
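
On the k-medoids point above: the sketch below shows why it can run on pairwise distances alone. The prototype of each cluster is always one of the data points, so it is chosen by comparing rows of the distance matrix; no coordinates are averaged, which is the step k-means cannot do without them. This is a simple alternating scheme written for illustration, not the KNIME implementation; the function names and the naive choice of the first k points as initial medoids are assumptions.

```python
def assign(dist, medoids):
    """Return, for every point, the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, max_iter=100):
    """Cluster using only a symmetric n x n distance matrix.

    Returns (medoids, labels): the point indices chosen as medoids and,
    for each point, the position of its medoid in that list.
    """
    medoids = list(range(k))  # naive initialisation (assumption)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Can happen with duplicate points; keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            # The new prototype is the member with the smallest total
            # distance to the other members: a lookup, not an average.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

With fingerprints, `dist` could for instance hold 1 - Tanimoto similarity for every pair of molecules.
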

