2017-04-05 10:33 GMT+02:00 Daniel Westermann <
daniel.westerm...@dbi-services.com>:

> 2017-04-05 10:13 GMT+02:00 Daniel Westermann <daniel.westermann@dbi-
> services.com>:
>
>> 2017-04-05 9:28 GMT+02:00 Daniel Westermann <daniel.westermann@dbi-
>> services.com>:
>>
>>> >>what is result of EXPLAIN statement for slow and fast cases?
>>> >>
>>> >>regards
>>> >>
>>> >>Pavel
>>>
>>> For work_mem=32MB
>>>
>>> explain (analyze,verbose,buffers)  select count(user_id) from users
>>> where user_id not in ( select id from ids);
>>>                                                               QUERY
>>> PLAN
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> --------------
>>>  Aggregate  (cost=83175.01..83175.02 rows=1 width=8) (actual
>>> time=7945.592..7945.593 rows=1 loops=1)
>>>    Output: count(users.user_id)
>>>    Buffers: shared read=29425
>>>    ->  Seq Scan on public.users  (cost=16925.01..79425.01 rows=1500000
>>> width=4) (actual time=1928.665..5888.645 rows=2499999 loops=1)
>>>          Output: users.user_id, users.username
>>>          Filter: (NOT (hashed SubPlan 1))
>>>          Rows Removed by Filter: 500001
>>>          Buffers: shared read=29425
>>>          SubPlan 1
>>>            ->  Seq Scan on public.ids  (cost=0.00..14425.01 rows=1000001
>>> width=4) (actual time=11.111..868.382 rows=1000001 loops=1)
>>>                  Output: ids.id
>>>                  Buffers: shared read=4425
>>>  Planning time: 187.396 ms
>>>  Execution time: 7948.108 ms
>>> (14 rows)
>>>
>>> Time: 8244.493 ms
>>>
>>> For work_mem='16MB' it does not complete with analyze in on hour. For
>>> explain only:
>>>
>>> explain (verbose)  select count(user_id) from users where user_id not in
>>> ( select id from ids);
>>>                                        QUERY
>>> PLAN
>>> ------------------------------------------------------------
>>> ----------------------------
>>>  Aggregate  (cost=38748092500.00..38748092500.01 rows=1 width=8)
>>>    Output: count(users.user_id)
>>>    ->  Seq Scan on public.users  (cost=0.00..38748088750.00 rows=1500000
>>> width=4)
>>>          Output: users.user_id, users.username
>>>          Filter: (NOT (SubPlan 1))
>>>          SubPlan 1
>>>            ->  Materialize  (cost=0.00..23332.01 rows=1000001 width=4)
>>>                  Output: ids.id
>>>                  ->  Seq Scan on public.ids  (cost=0.00..14425.01
>>> rows=1000001 width=4)
>>>                        Output: ids.id
>>> (10 rows)
>>>
>>
>> >There is a materialize op more
>> >
>> >do you have a index on ids.id?
>>
>> Yes:
>>
>> \d ids
>>       Table "public.ids"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  id     | integer |
>> Indexes:
>>     "i_ids" UNIQUE, btree (id)
>>
>>
>
> >>hmm .. NOT IN is just bad :(
> >>
> >>The second is slow becase table ids is stored in temp file. and it is
> repeatedly read from file. In first case, ids table is stored in memory.
> >>
> >>SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from
> ids where id = user_id)
> >>
>
> Yes, really bad :) ... and I still do not get it. Even when reading from
> the tempfile all the time it should at least complete within one hour,
> shouldn't it? The tables are not so big:
>
> select * from pg_size_pretty ( pg_relation_size ('ids' ));
>  pg_size_pretty
> ----------------
>  35 MB
> (1 row)
> select * from pg_size_pretty ( pg_relation_size ('users' ));
>  pg_size_pretty
> ----------------
>  195 MB
> (1 row)
>
>

1500000 * few ms ~ big time

Reply via email to