2017-04-05 10:33 GMT+02:00 Daniel Westermann < daniel.westerm...@dbi-services.com>:
> 2017-04-05 10:13 GMT+02:00 Daniel Westermann <daniel.westermann@dbi- > services.com>: > >> 2017-04-05 9:28 GMT+02:00 Daniel Westermann <daniel.westermann@dbi- >> services.com>: >> >>> >>what is result of EXPLAIN statement for slow and fast cases? >>> >> >>> >>regards >>> >> >>> >>Pavel >>> >>> For work_mem=32MB >>> >>> explain (analyze,verbose,buffers) select count(user_id) from users >>> where user_id not in ( select id from ids); >>> QUERY >>> PLAN >>> ------------------------------------------------------------ >>> ------------------------------------------------------------ >>> -------------- >>> Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual >>> time=7945.592..7945.593 rows=1 loops=1) >>> Output: count(users.user_id) >>> Buffers: shared read=29425 >>> -> Seq Scan on public.users (cost=16925.01..79425.01 rows=1500000 >>> width=4) (actual time=1928.665..5888.645 rows=2499999 loops=1) >>> Output: users.user_id, users.username >>> Filter: (NOT (hashed SubPlan 1)) >>> Rows Removed by Filter: 500001 >>> Buffers: shared read=29425 >>> SubPlan 1 >>> -> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 >>> width=4) (actual time=11.111..868.382 rows=1000001 loops=1) >>> Output: ids.id >>> Buffers: shared read=4425 >>> Planning time: 187.396 ms >>> Execution time: 7948.108 ms >>> (14 rows) >>> >>> Time: 8244.493 ms >>> >>> For work_mem='16MB' it does not complete with analyze in on hour. For >>> explain only: >>> >>> explain (verbose) select count(user_id) from users where user_id not in >>> ( select id from ids); >>> QUERY >>> PLAN >>> ------------------------------------------------------------ >>> ---------------------------- >>> Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) >>> Output: count(users.user_id) >>> -> Seq Scan on public.users (cost=0.00..38748088750.00 rows=1500000 >>> width=4) >>> Output: users.user_id, users.username >>> Filter: (NOT (SubPlan 1)) >>> SubPlan 1 >>> -> Materialize (cost=0.00..23332.01 rows=1000001 width=4) >>> Output: ids.id >>> -> Seq Scan on public.ids (cost=0.00..14425.01 >>> rows=1000001 width=4) >>> Output: ids.id >>> (10 rows) >>> >> >> >There is a materialize op more >> > >> >do you have a index on ids.id? >> >> Yes: >> >> \d ids >> Table "public.ids" >> Column | Type | Modifiers >> --------+---------+----------- >> id | integer | >> Indexes: >> "i_ids" UNIQUE, btree (id) >> >> > > >>hmm .. NOT IN is just bad :( > >> > >>The second is slow becase table ids is stored in temp file. and it is > repeatedly read from file. In first case, ids table is stored in memory. > >> > >>SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from > ids where id = user_id) > >> > > Yes, really bad :) ... and I still do not get it. Even when reading from > the tempfile all the time it should at least complete within one hour, > shouldn't it? The tables are not so big: > > select * from pg_size_pretty ( pg_relation_size ('ids' )); > pg_size_pretty > ---------------- > 35 MB > (1 row) > select * from pg_size_pretty ( pg_relation_size ('users' )); > pg_size_pretty > ---------------- > 195 MB > (1 row) > > 1500000 * few ms ~ big time