Re: Slow "not in array" operation

Marco Colli Tue, 12 Nov 2019 13:41:33 -0800

I am not a PostgreSQL expert, however I think that the following
algorithm should be possible and fast:


1. find the bitmap of all subscriptions in a project that are not trashed
(it can use the index and takes only ~500ms)
2. find the bitmap of all subscriptions that match the above condition and
HAVE the tag (~7s)
3. calculate [bitmap 1] - [bitmap 2] to find the subscriptions of the
project that DON'T HAVE the tag



On Tue, Nov 12, 2019 at 9:50 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> Marco Colli <collimarc...@gmail.com> writes:
> > 3) Here's the query plan that I get after disabling the seq scan:
>
> >  Finalize Aggregate  (cost=2183938.89..2183938.90 rows=1 width=8) (actual
> > time=94972.253..94972.254 rows=1 loops=1)
>
> So, this is slower than the seqscan, which means the planner made the
> right choice.
>
> You seem to be imagining that there's some way the index could be used
> with the NOT clause, but there isn't.  Indexable clauses are of the form
>         indexed_column indexable_operator constant
> and there's no provision for a NOT in that.  If we had a "not contained
> in" array operator, the NOT could be folded to be of this syntactic form,
> but it's highly unlikely that any index operator class would consider such
> an operator to be a supported indexable operator.  It doesn't lend itself
> to searching an index.
>
> So the planner is doing the best it can, which in this case is a
> full-table scan.
>
> A conceivable solution, if the tags array is a lot smaller than
> the table as a whole and the table is fairly static, is that you could
> make a btree index on the tags array and let the planner fall back
> to an index-only scan that is just using the index as a cheaper
> source of the array data.  (This doesn't work for your existing GIST
> index because GIST can't reconstruct the original arrays on-demand.)
> I suspect though that this wouldn't win much, even if you disregard
> the maintenance costs for the extra index.  The really fundamental
> problem here is that a large fraction of the table satisfies the
> NOT-in condition, and no index is going to beat a seqscan by all that
> much when that's true.  Indexes are good at retrieving small portions
> of tables.
>
>                         regards, tom lane
>

Re: Slow "not in array" operation

Reply via email to