On 01/02/2011 09:49, Chris Morley wrote:
> On 01/02/2011 07:12, Andrew Dalke wrote:
>> On Jan 31, 2011, at 8:15 PM, cha...@ncbs.res.in wrote:
>>> I noticed that molecules such as Myristic acid and Palmitic acid
>>> have same similarity score of 1, ... I am thinking of modifying
>>> Tanimoto score to other coefficients like Kulczynski index or
>>> Russell index.
>>
>> The only way to get a Tanimoto score of 1 is if the two fingerprints
>> are identical. In that case no scoring method can tell the difference
>> between the two, because they are identical. To get what you want
>> you'll need to come up with a new fingerprinting scheme, not a new
>> scoring method.
>
> None of OpenBabel's fingerprints provide a complete description of a
> molecule. They are really intended as part of a fast screening method
> to exclude molecules that, compared with a target molecule, are too
> dissimilar (or too similar) or which are not a superstructure of it.
>
> None of the current fingerprint types include stereochemistry, and the
> FP2 fingerprint has a built-in lack of certainty because different
> fragments can be assigned to the same bit. It also indexes by the
> presence or absence of fragments of up to 7 atoms, so it does not
> discriminate well for long chains of carbon atoms, like fatty acids or
> normal hydrocarbons.
>
> It is possible to make specialized FP3 fingerprints to handle this
> type of structure, by including the number of times a substructure
> occurs. Further description is in the code, although recompilation is
> not necessary to make a new fingerprint type. However, I guess this is
> probably further than you want to go.

Thank you Chris, Andrew, Noel for your inputs.

My data set contains lipids/fatty acids (in SMILES format). Often two
molecules differ by only a single oxygen atom "O", a single carbon atom
"C", or a single double bond "=". I tried patching up the fingerprint
similarity by adding weights for the number of carbon atoms, the number
of double bonds, etc., but this turns out to be a dirty job, as it can
easily overfit or misfit.
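To make the chain-length point from the quoted reply concrete, here is a
toy sketch in plain Python. It is not Open Babel's FP2 algorithm: it
assumes an unbranched chain in which every atom is the same carbon, it
ignores the carboxyl group and bit hashing, and the function names are
made up for illustration. Under those assumptions, a presence/absence
fingerprint of fragments up to 7 atoms is identical for a 14-carbon and
a 16-carbon chain (the chain lengths of myristic and palmitic acid),
while a count-based variant tells them apart.

```python
from collections import Counter

MAX_PATH_ATOMS = 7  # the quoted reply says FP2 uses fragments of up to 7 atoms


def chain_fragments(n_carbons: int) -> Counter:
    """Count linear fragments of 1..7 atoms in an unbranched carbon chain.

    Toy model (an assumption, not Open Babel code): all atoms are identical,
    so a fragment is identified only by its length.
    """
    counts = Counter()
    for length in range(1, min(MAX_PATH_ATOMS, n_carbons) + 1):
        # A chain of n atoms contains n - length + 1 paths of this length.
        counts[f"C{length}"] = n_carbons - length + 1
    return counts


def tanimoto_binary(a: Counter, b: Counter) -> float:
    """Tanimoto on presence/absence only: |A and B| / |A or B|."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Count-based (min/max) generalisation of the Tanimoto coefficient."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)  # missing keys count as 0
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total


c14 = chain_fragments(14)  # chain length of myristic acid
c16 = chain_fragments(16)  # chain length of palmitic acid

print(tanimoto_binary(c14, c16))  # 1.0: same set of fragment types
print(tanimoto_counts(c14, c16))  # 77/91, about 0.846: counts differ
```

The binary score is 1 because both chains contain every fragment length
from 1 to 7; only the number of occurrences changes, which is exactly
the information a count-based fingerprint keeps.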
I would like to see a permanent solution that is applicable to any given
set of SMILES. Can anyone point out existing 1D (input as SMILES string)
similarity methods that can differentiate single atom/bond differences?

Thanks,
Chak

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss