On Thu, Jul 11, 2013 at 7:24 PM, James Swetnam <[email protected]> wrote:
> How about
>
> select * from mols where tanimoto_sml(morganbv_fp(':query_smiles'::mol),
> morganbv_fp(mols.mol)) > :cutoff ?
> where 'query_smiles' is your desired query molecule, 'cutoff' is the
> minimum similarity and mols.mol is the serialized RDkit::ROMol extension
>
The problem with this is that the index will not be used to speed up the
query, so PostgreSQL falls back to a sequential scan over the whole table.
Here's a demonstration of that:
chembl_16=# explain analyze select * from rdk.tfps where tanimoto_sml(morganbv_fp('c1ccccc1O'::mol,0),mfp0)>0.5;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on tfps  (cost=0.00..64356.66 rows=430370 width=69) (actual time=5.851..4003.025 rows=6331 loops=1)
   Filter: (tanimoto_sml('\x00000000000000000000000000000000000000000000000000000000000000000000000080000000000002001000000000000000000000000000000000000000'::bfp, mfp0) > 0.5::double precision)
   Rows Removed by Filter: 1284780
 Total runtime: 4003.652 ms
(4 rows)

Time: 4004.881 ms
Whereas using the % operator, which can use the index, gives:
chembl_16=# explain analyze select * from rdk.tfps where morganbv_fp('c1ccccc1O'::mol,0)%mfp0;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on tfps  (cost=107.22..4077.60 rows=1291 width=69) (actual time=1356.758..1394.294 rows=22282 loops=1)
   Recheck Cond: ('\x00000000000000000000000000000000000000000000000000000000000000000000000080000000000002001000000000000000000000000000000000000000'::bfp % mfp0)
   ->  Bitmap Index Scan on tfps_mfp0_idx  (cost=0.00..106.90 rows=1291 width=0) (actual time=1354.748..1354.748 rows=22282 loops=1)
         Index Cond: ('\x00000000000000000000000000000000000000000000000000000000000000000000000080000000000002001000000000000000000000000000000000000000'::bfp % mfp0)
 Total runtime: 1395.577 ms
(5 rows)

Time: 1396.819 ms
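
If you want both a custom cutoff and the similarity values themselves, something like the following sketch should work. It assumes the cutoff for % comes from the cartridge's rdkit.tanimoto_threshold setting (0.5 below is just an example value) and reuses the rdk.tfps table and mfp0 column from the queries above. I have not timed this particular query.

```sql
-- Cutoff used by the % operator for this session (example value).
set rdkit.tanimoto_threshold=0.5;

-- The % operator does the index-backed filtering; tanimoto_sml() is then
-- only evaluated on the rows that pass, to report the similarity value.
select *, tanimoto_sml(morganbv_fp('c1ccccc1O'::mol,0),mfp0) as similarity
  from rdk.tfps
 where morganbv_fp('c1ccccc1O'::mol,0)%mfp0
 order by similarity desc;
```

The point of the design is that the WHERE clause stays in a form the index supports, while the function call is confined to the select list.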
Best,
-greg