Hi JP,
I'm between short trips, so this reply is delayed and will be somewhat
short.
On Mon, Jul 14, 2014 at 5:19 PM, JP <[email protected]> wrote:
> Hi there,
>
> TL;DR Some queries using the index are slower than queries not using it.
>
<snip>
> I wonder if there is anyone who has looked at this/has similar experience
> and has some answers to the following questions:
>
> - Am I doing anything obviously wrong? / Is there an easy fix to speed
> these queries up? (without using limit)
>
not based on the skim I just did.
> - Is there a way for RDKit to suggest to Postgresql to NOT use the index
> in certain cases?
>
You can certainly do it yourself on a query-by-query basis, but I'm not
aware of any way to do it automatically.
> - Which RDKit code should I be looking at to try to optimize this? More
> specifically, even looking for molecules with a carbon atom 'C' triggers a
> search using the index, even if all my molecules in the database have a
> carbon atom (so a sequential search would make more sense here).
> - I have tried to play around with the table statistics - alter table
> molecule alter mol set statistics 1000; - this should be the thing which
> suggests to the query planner whether to use the index or not (based on a
> sample of the table) - but I haven't noticed any difference. Does the
> RDKit indexing technology support this?
>
To restate the problem that you are encountering : the index isn't actually
screening much of anything out, so using it ends up being a waste of time.
What would be required is some way for the postgresql optimizer to
recognize either that the index is not going to be effective for a
particular query or that the query is going to return a large percentage of
the database so that it can switch strategies. I don't know how to do this
in an automated way. One manual approach is to have a small (1K-10K)
structure table that is representative of your main structure table and do
a "pre-query" on that one. If that pre-query returns a sizable fraction of
the rows, then you know that you should disable the index scan for that
query. This requires some coding on your part, but should certainly help
here.
-greg
------------------------------------------------------------------------------
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss