> > I have two lists of molecules. The first list contains testing molecules
> > (test.sdf) and the second contains training molecules (train.sdf).
> >
> > I would like to compare each test molecule to all the training molecule and
> > calculate a corresponding Tanimoto similarity score. (I implemented it in my
> > code and it is super slow as it is O(n^2) ). I use GetFingerprint() and
> > Tanimoto() functions for such purpose. After the comparison, I picked up the
> > kth most similar molecules to the test molecule and predict something for
> > the test molecule. I am trying to make things a bit faster.
>
> There is no way to speed this up - if it's O(N^2) it's O(N^2).
>
There are clever ways to pre-screen your set if you are looking for molecules above a certain Tanimoto threshold from the other set of molecules. See for example:

An Intersection Inequality Sharper than the Tanimoto Triangle Inequality for Efficiently Searching Large Databases
Pierre Baldi and Daniel S. Hirschberg
J. Chem. Inf. Model., 2009, 49 (8), pp 1866–1870
Publication Date (Web): July 14, 2009 (Article)
DOI: 10.1021/ci900133j

If all you want is a list of all Tanimoto coefficients between two sets, then it is indeed O(N^2); I don't see any way around it. Why would you want such an all-encompassing list, though? It all depends on how you formulate the problem...

Best regards,
Igor
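
To make the pre-screening idea concrete for the "k most similar" case, here is a minimal sketch in Python. It does not implement the intersection inequality from the paper cited above; it uses the simpler bit-count bound T(A,B) <= min(|A|,|B|) / max(|A|,|B|), where |A| is the number of bits set in fingerprint A. Fingerprints are modelled as plain Python integers used as bit sets, which is an assumption for illustration only. The names popcount, tanimoto and top_k_similar are hypothetical helpers, not Open Babel functions, and getting real fingerprints into this form is left out.

```python
import heapq


def popcount(x: int) -> int:
    """Number of bits set in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-set fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def top_k_similar(query: int, training: list, k: int) -> list:
    """Return up to k (score, index) pairs, best first.

    Candidates are visited in order of decreasing upper bound
    min(|A|,|B|) / max(|A|,|B|), so the loop can stop as soon as no
    remaining candidate could beat the current k-th best score.
    """
    if k <= 0:
        return []
    qa = popcount(query)

    def bound(nb: int) -> float:
        # Upper bound on the Tanimoto score from the bit counts alone.
        if qa == 0 or nb == 0:
            return 0.0
        return min(qa, nb) / max(qa, nb)

    # In practice the bit counts of the training set would be computed once
    # and reused for every query; they are recomputed here for brevity.
    counts = [(popcount(fp), i) for i, fp in enumerate(training)]
    counts.sort(key=lambda c: bound(c[0]), reverse=True)

    heap = []  # min-heap holding the k best (score, index) pairs so far
    for nb, i in counts:
        if len(heap) == k and bound(nb) <= heap[0][0]:
            break  # every remaining bound is at most this one: safe to stop
        score = tanimoto(query, training[i])
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    train = [0b1111, 0b0111, 0b11110000, 0b1, 0b11111111]
    print(top_k_similar(0b1111, train, 2))
```

The worst case is still one full pass over the training set per query, so the overall complexity remains O(N^2). The bound only helps when the k-th best score becomes high early on, which lets the loop skip training molecules whose bit counts differ too much from the query's.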