Re: [Open Babel] Need help in calculating Tanimoto coefficient

chakravarthy Mon, 14 Feb 2011 01:38:27 -0800

On 01/02/2011 09:49, Chris Morley wrote:

 On 01/02/2011 07:12, Andrew Dalke wrote:


  On Jan 31, 2011, at 8:15 PM, cha...@ncbs.res.in wrote:
    I noticed that molecules such as myristic acid and palmitic acid have
a similarity score of 1, ... I am thinking of switching from the Tanimoto
score to other coefficients like the Kulczynski index or the Russell index.

 The only way to get a Tanimoto score of 1 is if the two fingerprints are
identical. In that case no scoring method can tell the
difference between the two, because they are identical. To get what you
want, you'll need to come up with a new fingerprinting scheme, not a new
scoring method.
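Andrew's argument can be checked directly: every similarity coefficient is a function of the two fingerprints alone, so if the fingerprints are identical, the pair scores exactly what one molecule scores against itself. The Python sketch below assumes fingerprints are represented as sets of on-bit positions; the bit positions and the 1024-bit length are made up for illustration and are not real Open Babel FP2 output. Kulczynski is taken in its mean-of-proportions form and "Russell" as the Russell-Rao index; all three functions assume each fingerprint has at least one bit set.

```python
def bit_counts(fp_a, fp_b, n_bits):
    """Return (a, b, c, d) for two fingerprints given as sets of on-bit positions:
    a = on in both, b = on only in A, c = on only in B, d = off in both."""
    a = len(fp_a & fp_b)
    b = len(fp_a - fp_b)
    c = len(fp_b - fp_a)
    return a, b, c, n_bits - a - b - c


def tanimoto(fp_a, fp_b, n_bits):
    # Shared on-bits divided by on-bits present in either fingerprint.
    a, b, c, _ = bit_counts(fp_a, fp_b, n_bits)
    return a / (a + b + c)


def kulczynski(fp_a, fp_b, n_bits):
    # Mean of a/(a+b) and a/(a+c), the form often called "Kulczynski 2".
    a, b, c, _ = bit_counts(fp_a, fp_b, n_bits)
    return 0.5 * (a / (a + b) + a / (a + c))


def russell_rao(fp_a, fp_b, n_bits):
    # Shared on-bits divided by the fingerprint length.
    a, _, _, _ = bit_counts(fp_a, fp_b, n_bits)
    return a / n_bits


# Hypothetical bit positions standing in for two identical fingerprints;
# these are NOT real FP2 bits for myristic or palmitic acid.
N_BITS = 1024
fp_myristic = frozenset({3, 17, 42, 101, 517})
fp_palmitic = frozenset({3, 17, 42, 101, 517})

for name, coeff in [("Tanimoto", tanimoto),
                    ("Kulczynski", kulczynski),
                    ("Russell-Rao", russell_rao)]:
    pair = coeff(fp_myristic, fp_palmitic, N_BITS)
    self_sim = coeff(fp_myristic, fp_myristic, N_BITS)
    print(f"{name}: pair={pair:.4f} self={self_sim:.4f}")
```

Tanimoto and Kulczynski come out as 1 for the pair. Russell-Rao does not, but it gives the pair the same value as either fingerprint compared with itself, so none of the three can separate the two acids.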

None of OpenBabel's fingerprints provide a complete description of a
molecule. They are really intended as part of a fast screening method to
exclude molecules that, compared with a target molecule, are too
dissimilar (or too similar) or that are not a superstructure of it.
None of the current fingerprint types include stereochemistry, and the
FP2 fingerprint has a built-in lack of certainty because different
fragments can be assigned to the same bit. It also indexes only the
presence or absence of fragments of up to 7 atoms, so it does not
discriminate well between long chains of carbon atoms, such as fatty acids or
normal hydrocarbons. It is possible to make specialized FP3 fingerprints to
handle this type of structure by including the number of times a
substructure occurs. Further description is in the code, although
recompilation is not necessary to make a new fingerprint type. However, I
guess this is probably further than you want to go.
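Chris's point about presence-only fragments versus fragment counts can be illustrated with a toy sketch in plain Python (no Open Babel). It enumerates linear paths of up to 7 atoms over a hand-built graph of a saturated straight-chain fatty acid. The graph builder, the fragment encoding such as "C-C=O" and the count-based Tanimoto are hypothetical choices for illustration; this is not how FP2 or FP3 are actually implemented (for one thing, no hashing into a fixed number of bits is done here).

```python
from collections import Counter


def fatty_acid_graph(n_carbons):
    """Hypothetical toy graph of a saturated straight-chain fatty acid,
    CH3(CH2)mCOOH with m = n_carbons - 2, hydrogens implicit."""
    labels = ["C"] * n_carbons + ["O", "O"]
    bonds = {i: {} for i in range(len(labels))}

    def add_bond(i, j, symbol):
        bonds[i][j] = symbol
        bonds[j][i] = symbol

    for i in range(n_carbons - 1):
        add_bond(i, i + 1, "-")               # C-C backbone
    carboxyl = n_carbons - 1
    add_bond(carboxyl, n_carbons, "=")        # C=O
    add_bond(carboxyl, n_carbons + 1, "-")    # C-OH
    return labels, bonds


def linear_fragments(labels, bonds, max_atoms=7):
    """Count every simple linear path of 1..max_atoms atoms, written as a
    string such as 'C-C=O' and stored in a direction-independent form."""
    fragments = Counter()

    def extend(path):
        # Count each undirected path once: single atoms always, longer
        # paths only in the direction where the first index is smaller.
        if len(path) == 1 or path[0] < path[-1]:
            text = labels[path[0]]
            for i, j in zip(path, path[1:]):
                text += bonds[i][j] + labels[j]
            fragments[min(text, text[::-1])] += 1
        if len(path) < max_atoms:
            for neighbour in bonds[path[-1]]:
                if neighbour not in path:
                    extend(path + [neighbour])

    for start in range(len(labels)):
        extend([start])
    return fragments


def set_tanimoto(a, b):
    """Tanimoto on presence/absence only, as a bit fingerprint sees it."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def count_tanimoto(a, b):
    """Tanimoto generalised to counts: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    return (sum(min(a[k], b[k]) for k in keys)
            / sum(max(a[k], b[k]) for k in keys))


myristic = linear_fragments(*fatty_acid_graph(14))
palmitic = linear_fragments(*fatty_acid_graph(16))
print(set_tanimoto(myristic, palmitic))
print(count_tanimoto(myristic, palmitic))
```

With presence only, the two fragment sets are identical and the score is 1. Keeping counts (for example, 14 versus 16 occurrences of the single-carbon fragment) is enough to separate the two acids.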

Thank you, Chris, Andrew and Noel, for your input.

My data set contains lipids/fatty acids (in SMILES format). Sometimes two
molecules differ by only a single oxygen atom "O", a single
carbon atom "C" or a single double bond "=".

I tried patching up the fingerprint similarity by adding weights for the number of
carbon atoms, the number of double bonds, etc., but this turns out to be a dirty
job, as it can easily overfit or misfit. I would like a general
solution that is applicable to any given set of SMILES.

Can anyone point me to existing 1D (SMILES string as input) similarity
methods that can distinguish single-atom or single-bond differences?

Thanks
Chak
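For the question above about 1D, SMILES-only similarity, one simple string-based idea is to count overlapping length-q substrings of each SMILES string and compare the counts with a Tanimoto-style ratio, which is the general idea behind LINGO-type methods. The sketch below is not an Open Babel feature; q = 4 is an arbitrary choice, and it assumes both strings were written consistently (for example, both as canonical SMILES from the same toolkit, such as Open Babel's `can` output format), because the same molecule can be written as many different SMILES strings.

```python
from collections import Counter


def smiles_qgrams(smiles, q=4):
    """Count overlapping substrings of length q in a SMILES string.
    Strings shorter than q are kept whole as a single substring."""
    return Counter(smiles[i:i + q]
                   for i in range(max(len(smiles) - q + 1, 1)))


def qgram_similarity(smiles_a, smiles_b, q=4):
    """Tanimoto-style similarity on substring counts:
    sum of minima divided by sum of maxima."""
    a, b = smiles_qgrams(smiles_a, q), smiles_qgrams(smiles_b, q)
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


# Saturated fatty acids written as a carbon chain ending in a carboxyl group.
myristic_smiles = "C" * 14 + "(=O)O"   # 14 carbons
palmitic_smiles = "C" * 16 + "(=O)O"   # 16 carbons
print(qgram_similarity(myristic_smiles, palmitic_smiles))
print(qgram_similarity(myristic_smiles, myristic_smiles))
```

Here the two extra carbons add two extra "CCCC" substrings, so the pair scores 16/18 rather than 1, while a string compared with itself still scores 1.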


_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
