2017-04-05 10:13 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 9:28 GMT+02:00 Daniel Westermann <daniel.westermann@dbi-
> services.com>:
>
>> >>what is result of EXPLAIN statement for slow and fast cases?
>> >>
>> >>regards
>> >>
>> >>Pavel
>>
>> For work_mem=32MB
>>
>> explain (analyze,verbose,buffers)  select count(user_id) from users where
>> user_id not in ( select id from ids);
>>                                                               QUERY
>> PLAN
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> --------------
>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>> time=7945.592..7945.593 rows=1 loops=1)
>>    Output: count(users.user_id)
>>    Buffers: shared read=29425
>>    ->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=1500000
>> width=4) (actual time=1928.665..5888.645 rows=2499999 loops=1)
>>          Output: users.user_id, users.username
>>          Filter: (NOT (hashed SubPlan 1))
>>          Rows Removed by Filter: 500001
>>          Buffers: shared read=29425
>>          SubPlan 1
>>            ->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=1000001
>> width=4) (actual time=11.111..868.382 rows=1000001 loops=1)
>>                  Output: ids.id
>>                  Buffers: shared read=4425
>>  Planning time: 187.396 ms
>>  Execution time: 7948.108 ms
>> (14 rows)
>>
>> Time: 8244.493 ms
>>
>> For work_mem='16MB' it does not complete with analyze in on hour. For
>> explain only:
>>
>> explain (verbose)  select count(user_id) from users where user_id not in
>> ( select id from ids);
>>                                        QUERY
>> PLAN
>> ------------------------------------------------------------
>> ----------------------------
>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>    Output: count(users.user_id)
>>    ->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=1500000
>> width=4)
>>          Output: users.user_id, users.username
>>          Filter: (NOT (SubPlan 1))
>>          SubPlan 1
>>            ->  Materialize  (cost=0.00..23332.01 rows=1000001 width=4)
>>                  Output: ids.id
>>                  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>> rows=1000001 width=4)
>>                        Output: ids.id
>> (10 rows)
>>
>
> >There is a materialize op more
> >
> >do you have a index on ids.id?
>
> Yes:
>
> \d ids
>       Table "public.ids"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  id     | integer |
> Indexes:
>     "i_ids" UNIQUE, btree (id)
>
>

hmm .. NOT IN is just bad :(

The second is slow becase table ids is stored in temp file. and it is
repeatedly read from file. In first case, ids table is stored in memory.

SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from
ids where id = user_id)

Regards

Pavel

Reply via email to