On 12/01/2011 16:05, Floriane Montanari wrote:
> Hi all,
>
> I am doing a similarity search between a query molecule given as a
> smiles string and a database of molecules. This database has been
> properly indexed with the command line. I am using MACCS fingerprints
> and programming in Python. But actually I am seeing the same thing
> with the command line:
> So I have noticed that, if my query is in the database, the output is
> correct and the first compound of the list of similar compounds is my
> query itself.
> But when I give as -s option a smiles string that corresponds to a
> molecule that is not present in the database, the first molecule of
> the output file is not my query molecule anymore, but apparently one
> close molecule from my database.
>
> I have read here <http://openbabel.org/wiki/Tutorial:Fingerprints> this:
>
>     *note:* if the query molecule does not match the SMARTS string
>     this will not work as expected, as the first molecule in the
>     database that matches the SMARTS string will instead be used as
>     the query
The note in the documentation applies when you are using the fpt output
format. This calculates the Tanimoto coefficient from the first molecule
it is given to all the rest. So you could provide the query molecule
first:

  obabel -:"CCO" data.xxx -ofpt -xfMACCS

This will give an output line for every molecule, which is maybe not
what you want. If you try to filter using -s or --filter you will get
the behaviour you observe.

A more robust way, which I guess you are using, is to index the database
first and then do one or more similarity searches:

  obabel data.xxx -ofs -xfMACCS
  obabel data.fs -O out.smi -at10 -aa -sSMILES

This will output the ten most similar molecules with the Tanimoto
coefficient attached. It seems to work OK for me (detailed output
below). If you are still having difficulty, perhaps you could post the
Python or command line you are using.

> Is it what is happening to me? Is there a way to force the fingerprint
> comparison between my /real query/ and the database?
>
> In case this is not possible, I was planning to use more programming,
> and doing that I have a second question:
> is it possible to get a Fingerprint object from a list of "on" bits?
> Using SetBit() for example?

I'm not clear what you are wanting here. It is possible to define a new
type of fingerprint that has each of its bits defined by a SMARTS string
in a data file, without any programming. The MACCS fingerprint is done
in this way. I'm not sure where it is described, but I'll look it out
if you need it.

Chris

>type sim4.smi
CC
CCCC
COC
COCCC

>obabel sim4.smi -ofs -xfMACCS
This will prepare an index of sim4.smi and may take some time...
It contains 4 molecules
It took 0 seconds
4 molecules converted

>obabel sim4.fs -osmi -aa -at5 -sCC
CC      1
CCCC    0.333333
COC     0.285714
COCCC   0.142857
4 molecules converted

>obabel sim4.fs -osmi -aa -at5 -sCCC
CCCC    0.571429
CC      0.4
COC     0.333333
COCCC   0.266667
4 molecules converted

>obabel -:CC sim4.smi -ofpt -xfMACCS
>
>   Tanimoto from first mol = 1  Possible superstructure of first mol
>   Tanimoto from first mol = 0.333333  Possible superstructure of first mol
>   Tanimoto from first mol = 0.285714  Possible superstructure of first mol
>   Tanimoto from first mol = 0.142857  Possible superstructure of first mol
5 molecules converted

>obabel -:CCC sim4.smi -ofpt -xfMACCS
>
>   Tanimoto from first mol = 0.4
>   Tanimoto from first mol = 0.571429
>   Tanimoto from first mol = 0.333333
>   Tanimoto from first mol = 0.266667
5 molecules converted

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
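As a rough illustration of what the fpt format and the indexed similarity search compute, here is a minimal pure-Python sketch. It assumes each fingerprint is given as a list of "on" bit indices, which also shows one way to work from a list of on bits directly. All function names and bit numbers are hypothetical, this is not Open Babel's API, and the example bits are not real MACCS keys for any molecule. The "possible superstructure" test mirrors, as I understand it, the note the fpt format prints when every bit of the first molecule is also set in another one.

```python
"""Sketch: Tanimoto similarity on fingerprints stored as sets of on-bit indices.

Hypothetical helper names; this is not Open Babel's API.
"""
from typing import Iterable, List, Sequence, Tuple


def tanimoto(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto coefficient = |A & B| / |A | B| over the on-bit indices."""
    sa, sb = set(a), set(b)
    union = len(sa | sb)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return len(sa & sb) / union


def possible_superstructure(query: Iterable[int], other: Iterable[int]) -> bool:
    """True when every bit set in the query is also set in the other fingerprint."""
    return set(query) <= set(other)


def fpt_style(fps: Sequence[Sequence[int]]) -> List[Tuple[float, bool]]:
    """Compare the first fingerprint with each of the rest, like the fpt format."""
    query = set(fps[0])
    return [(tanimoto(query, fp), possible_superstructure(query, fp))
            for fp in fps[1:]]


def top_n(query: Iterable[int],
          database: Sequence[Tuple[str, Sequence[int]]],
          n: int) -> List[Tuple[str, float]]:
    """Return the n (name, score) pairs most similar to the query, best first."""
    q = set(query)  # the real query, whether or not it is in the database
    scored = [(name, tanimoto(q, bits)) for name, bits in database]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical on-bit lists, NOT real MACCS keys for any molecule.
    query = [3, 7, 12, 40]
    database = [("mol_a", [3, 7, 12, 40, 55]),
                ("mol_b", [3, 9]),
                ("mol_c", [7, 12, 40])]

    print(fpt_style([query] + [bits for _, bits in database]))
    # [(0.8, True), (0.2, False), (0.75, False)]
    print(top_n(query, database, 2))
    # [('mol_a', 0.8), ('mol_c', 0.75)]
```

Representing a fingerprint as a set of on-bit indices gives the same Tanimoto value as comparing fixed-width bit vectors, so the lists could come from any fingerprint you already have (Pybel's Fingerprint objects expose their on bits through the .bits attribute). The query is always compared explicitly, so it never depends on the query being present in the database.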