Re: Slow "not in array" operation

2019-11-12 Thread Jeff Janes
> > > 3) Here's the query plan that I get after disabling the seq scan: > > > QUERY PLAN > > > > ---

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
I am not a PostgreSQL expert, however I think that the following algorithm should be possible and fast: 1. find the bitmap of all subscriptions in a project that are not trashed (it can use the index and takes only ~500ms) 2. find the bitmap of all subscriptions that match the above condition and

Re: Slow "not in array" operation

2019-11-12 Thread Tom Lane
Marco Colli writes: > 3) Here's the query plan that I get after disabling the seq scan: > Finalize Aggregate (cost=2183938.89..2183938.90 rows=1 width=8) (actual > time=94972.253..94972.254 rows=1 loops=1) So, this is slower than the seqscan, which means the planner made the right choice. You

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
Odd index choice by the optimizer given what is available. The bitmap being lossy means more work_mem is needed if I remember properly. It is interesting that skipping the where condition on the array is only half a second. Is the array being toasted or is it small and being stored in the same fil

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
1) It is running on a DigitalOcean CPU-optimized droplet with dedicated hyperthreads (16 cores) and SSD. SHOW random_page_cost; => 2 2) What config names should I check exactly? I used some suggestions from the online PGTune, when I first configured the db some months ago: max_worker_processes = 1

Re: Slow "not in array" operation

2019-11-12 Thread Justin Pryzby
On Tue, Nov 12, 2019 at 12:20:10PM -0700, Michael Lewis wrote: > It is very interesting to me that the optimizer chose a parallel sequential > scan rather than an index scan on either of your indexes that start > with project_id that also reference trashed_at. Maybe because of low correlation on a

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
It is very interesting to me that the optimizer chose a parallel sequential scan rather than an index scan on either of your indexes that start with project_id that also reference trashed_at. 1) Are you running on SSD type storage? Has random_page_cost been lowered to 1-1.5 or so (close to 1 assum

Re: Slow "not in array" operation

2019-11-12 Thread Marco Colli
To be honest, I have simplified the question above. In order to show you the plan, I must show you the actual query, which is this: === QUERY === SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" = 123 AND "subscriptions"."trashed_at" IS NULL AND NOT (tags @> ARRAY['en']::va

Re: Slow "not in array" operation

2019-11-12 Thread Michael Lewis
What's the plan for the slow one? What's the time to just count all rows? >

Slow "not in array" operation

2019-11-12 Thread Marco Colli
I have a large table with millions of rows. Each row has an array field "tags". I also have the proper GIN index on tags. Counting the rows that have a tag is fast (~7s): SELECT COUNT(*) FROM "subscriptions" WHERE (tags @> ARRAY['t1']::varchar[]); However counting the rows that don't have a tag i