> > I have two lists of molecules. The first list contains testing molecules
> > (test.sdf) and the second contains training molecules (train.sdf).
> >
> > I would like to compare each test molecule to all the training molecules
> > and calculate the corresponding Tanimoto similarity scores. (I implemented
> > this in my code and it is very slow, as it is O(n^2).) I use the
> > GetFingerprint() and Tanimoto() functions for this purpose. After the
> > comparison, I pick the k most similar molecules to the test molecule and
> > predict something for the test molecule. I am trying to make things a bit
> > faster.
> 
> There is no way to speed this up - if it's O(N^2) it's O(N^2).
> 

There are clever ways to pre-screen your set if you are only looking for
molecules in the other set that score above a certain Tanimoto threshold.
See for example:

  Pierre Baldi and Daniel S. Hirschberg,
  "An Intersection Inequality Sharper than the Tanimoto Triangle Inequality
  for Efficiently Searching Large Databases",
  J. Chem. Inf. Model., 2009, 49 (8), pp 1866–1870
  (published on the web July 14, 2009).
  DOI: 10.1021/ci900133j
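
To make the pre-screening idea concrete, here is a minimal sketch in Python.
Note that it uses the simple bit-count bound (the Tanimoto coefficient of two
fingerprints with a and b bits set can never exceed min(a, b) / max(a, b)),
not the sharper intersection inequality from the paper above. The function
names are hypothetical, and fingerprints are assumed to be plain Python sets
of on-bit indices, however your toolkit exposes them.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def search_above_threshold(query, database, threshold):
    """Return (index, score) pairs whose Tanimoto score is >= threshold.

    The full comparison is skipped whenever the bit-count bound
    min(|A|, |B|) / max(|A|, |B|) already falls below the threshold.
    """
    hits = []
    na = len(query)
    for i, fp in enumerate(database):
        nb = len(fp)
        hi = max(na, nb)
        bound = min(na, nb) / hi if hi else 0.0
        if bound < threshold:
            continue  # this fingerprint cannot reach the threshold
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((i, score))
    return hits
```

The worst case is still one comparison per pair, but the bound only needs the
bit counts, which can be computed once per molecule, so fingerprints that are
much denser or sparser than the query are rejected cheaply.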


If all you want is a list of all Tanimoto coefficients between two sets,
then it is indeed O(N^2) and I don't see any way around it.
But why would you want such an all-encompassing list?
It all depends on how you formulate the problem: if you only need the
most similar molecules, many of those coefficients never have to be
computed.
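
As an illustration of reformulating the problem, here is a rough sketch that
keeps only the k best hits for one test molecule and uses the weakest kept
score as a moving threshold. It relies on the simple bit-count bound
min(a, b) / max(a, b), not on the inequality from the paper cited above. The
function names are hypothetical, and fingerprints are assumed to be Python
sets of on-bit indices.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_k_similar(query, training, k):
    """Return the k best (score, index) pairs, highest score first."""
    if k <= 0:
        return []
    heap = []  # min-heap; heap[0] is the weakest hit currently kept
    na = len(query)
    for i, fp in enumerate(training):
        if len(heap) == k:
            nb = len(fp)
            hi = max(na, nb)
            bound = min(na, nb) / hi if hi else 0.0
            if bound <= heap[0][0]:
                continue  # cannot beat the weakest kept hit
        score = tanimoto(query, fp)
        if len(heap) < k:
            heapq.heappush(heap, (score, i))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i))
    return sorted(heap, reverse=True)
```

Calling top_k_similar() once per test molecule still visits every training
fingerprint, but the full comparison is skipped whenever the bit counts alone
rule a candidate out, and memory stays at k entries per query.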

Best regards,
Igor


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
