Thank you, Chris and Noel, but I am afraid I cannot use your advice. What I am
trying to do is search each molecule in the DB as a substructure against all
molecules in the same DB, one by one, using a script that applies all possible
permutations. For this reason I cannot decide beforehand what the atom-count
limit or the SMARTS string should be. I guess my best bet is to increase the
-al limit. Or perhaps there is a more intelligent way to do this;
I'll have to think.
Best regards,
Vis
________________________________
From: Chris Morley <c.mor...@gaseq.co.uk>
To: openbabel-discuss@lists.sourceforge.net
Sent: Tuesday, November 20, 2012 11:42 AM
Subject: Re: [Open Babel] dbase filtering, then substructure search (in that order)
On 20/11/2012 08:03, Visvaldas K. wrote:
> Dear all,
>
> I am trying to find molecules in an sdf/fs database that contain
> certain fragments as substructures. The problem is that my fragments
> can be small (I am running a script to do multiple searches), so I can
> get "too many candidates in the fingerprint search phase". The simple
> solution is to filter out the big molecules before doing the
> substructure search, but openbabel does the substructure search first, i.e.
>
> obabel dbase.sdf -d --filter "atoms < 20" -O results.sdf -s trial.smi -al 9000 -ifs
>
> gives the identical "too many candidates" message. Of course, I can keep
> increasing "-al 9000" but that's a workaround and not the real solution.
>
It is possible to do the filtering while making the .fs file, so that only a
subset of the molecules in the dataset is indexed.
obabel dbase.sdf -O filtered.fs -d --filter "atoms<20"
This will help to reduce the number of hits in the fingerprint search.
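
A minimal sketch of that two-step workflow, reusing the file names from this thread. The OBABEL variable and the function name are assumptions added for illustration: by default the script is a dry run that only prints the commands, so set OBABEL=obabel to actually run them.

```shell
#!/bin/sh
# Dry run by default: prints the obabel commands instead of executing them.
# Run with OBABEL=obabel to execute them for real.
OBABEL=${OBABEL:-echo obabel}

# index_and_search is an illustrative name, not part of Open Babel.
index_and_search() {
    # 1. Build a fastsearch index that covers only molecules with < 20 atoms.
    $OBABEL dbase.sdf -O filtered.fs -d --filter "atoms<20"

    # 2. Run the substructure search against that smaller index.
    $OBABEL filtered.fs -O results.sdf -s trial.smi -al 9000
}

index_and_search
```

Because the index stores positions into dbase.sdf, that data file has to stay where it is for the search step to find the molecules.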
> For some reason, piping
>
> obabel dbase.fs -d --filter "atoms < 20" -ocopy | obabel -ifs -O results.sdf -s trial.smi -al 9000
>
> stalls even at the first step --- it seems that one cannot filter an "fs"
> database, but I am not sure if I am using the correct syntax.
The .fs file is a list of fingerprints and the position of the
corresponding molecule in the datafile. So a fastsearch needs to be on a
data file which has been indexed. This piping attempt does not work
because the molecules you are hoping to search are in a stream not a
file and they have not been indexed.
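
One way to get the effect the pipe was aiming for is to replace the stream with a real file: write the filtered molecules out, index that file, then search the new index. This is only a sketch under that assumption. The names small.sdf and small.fs, the function name and the OBABEL dry-run variable are made up for illustration.

```shell
#!/bin/sh
# Dry run by default: prints the obabel commands instead of executing them.
# Run with OBABEL=obabel to execute them for real.
OBABEL=${OBABEL:-echo obabel}

# search_small_subset, small.sdf and small.fs are illustrative names.
search_small_subset() {
    # 1. Write the filtered molecules to an ordinary data file.
    $OBABEL dbase.sdf -O small.sdf -d --filter "atoms<20"

    # 2. Index that file, so the fingerprints point at positions in small.sdf.
    $OBABEL small.sdf -O small.fs

    # 3. Fastsearch the new index with the query molecule.
    $OBABEL small.fs -O results.sdf -s trial.smi -al 9000
}

search_small_subset
```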
Incidentally, the FastsearchFormat in the development code now allows
substructure searching with multiple molecules in the target file (your
trial.smi), which may save some scripting.
Chris
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss