> > > postgres=# explain verbose insert into testscan select a from x > > > where > > > a<80000 or (a%2=0 and a>199900000); > > > QUERY PLAN > > > > -------------------------------------------------------------------- > > > -- > > > ----------------------------- > > > Gather (cost=4346.89..1281204.64 rows=81372 width=0) > > > Workers Planned: 4 > > > -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 > > > width=0) > > > -> Parallel Bitmap Heap Scan on public.x1 > > > (cost=3346.89..1272067.44 rows=20343 width=8) > > > Output: x1.a, NULL::integer > > > Recheck Cond: ((x1.a < 80000) OR (x1.a > 199900000)) > > > Filter: ((x1.a < 80000) OR (((x1.a % 2) = 0) AND > > > (x1.a > > > > 199900000))) > > > -> BitmapOr (cost=3346.89..3346.89 rows=178808 > > > width=0) > > > -> Bitmap Index Scan on x1_a_idx > > > (cost=0.00..1495.19 rows=80883 width=0) > > > Index Cond: (x1.a < 80000) > > > -> Bitmap Index Scan on x1_a_idx > > > (cost=0.00..1811.01 rows=97925 width=0) > > > Index Cond: (x1.a > 199900000) > > > > > > PSA is my postgresql.conf file, maybe you can have a look. Besides, > > > I didn't do any parameters tuning in my test session. > > > > I reproduced this on my machine. > > > > I think we'd better do "analyze" before insert which helps reproduce this > easier. > > Like: > > > > ----- > > analyze; > > explain analyze verbose insert into testscan select a from x where > > a<80000 or (a%2=0 and a>199900000); > > ----- > > OK then. > Can you check if just the underlying SELECTs are run (without INSERT), is > there any performance degradation when compared to a non-parallel scan?
It seems there is no performance degradation without insert. Till now, what I found is that: With tang's conf, when doing parallel insert, the walrecord is more than serial insert (IMO, this is the main reason why it has performance degradation) See the attatchment for the plan info. I have tried alter the target table to unlogged and then the performance degradation will not happen any more. And the additional walrecord seems related to the index on the target table. If the target table does not have any index, the wal record is the same between parallel plan and serial plan. Also, it does not have performance degradation without index. I am still looking at this problem, if someone think of something about it, It's very grateful to share the knowledge with me. Best regards, houzj
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1) Buffers: shared hit=392569 read=3 dirtied=934 written=933 WAL: records=260354 bytes=16259841 -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=129999 loops=1) Output: x.a, NULL::integer Recheck Cond: ((x.a < 80000) OR (x.a > 199900000)) Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 50000 Heap Blocks: exact=975 Buffers: shared hit=1475 -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=7.975..7.976 rows=0 loops=1) Buffers: shared hit=500 -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.469..3.470 rows=79999 loops=1) Index Cond: (x.a < 80000) Buffers: shared hit=222 -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.499..4.499 rows=100000 loops=1) Index Cond: (x.a > 199900000) Buffers: shared hit=278 Planning: Buffers: shared hit=10 Planning Time: 0.126 ms Execution Time: 360.547 ms (22 rows)
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<80000 or (a%2=0 and a>199900000); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=4272.20..1269111.15 rows=79918 width=0) (actual time=381.764..382.715 rows=0 loops=1) Workers Planned: 4 Workers Launched: 4 Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 WAL: records=260498 bytes=17019359 -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5) Buffers: shared hit=407094 read=4 dirtied=1085 written=1158 WAL: records=260498 bytes=17019359 Worker 0: actual time=376.638..376.640 rows=0 loops=1 Buffers: shared hit=88281 dirtied=236 written=337 WAL: records=56167 bytes=3674994 Worker 1: actual time=377.889..377.892 rows=0 loops=1 Buffers: shared hit=81231 dirtied=213 written=99 WAL: records=52175 bytes=3386885 Worker 2: actual time=377.388..377.389 rows=0 loops=1 Buffers: shared hit=82544 dirtied=232 written=279 WAL: records=52469 bytes=3443843 Worker 3: actual time=377.733..377.734 rows=0 loops=1 Buffers: shared hit=87728 dirtied=225 written=182 WAL: records=56307 bytes=3660758 -> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5) Output: x.a, NULL::integer Recheck Cond: ((x.a < 80000) OR (x.a > 199900000)) Filter: ((x.a < 80000) OR (((x.a % 2) = 0) AND (x.a > 199900000))) Rows Removed by Filter: 10000 Heap Blocks: exact=167 Buffers: shared hit=1475 Worker 0: actual time=5.203..14.921 rows=28028 loops=1 Buffers: shared hit=209 Worker 1: actual time=5.165..14.403 rows=26039 loops=1 Buffers: shared hit=196 Worker 2: actual time=5.188..14.284 rows=26177 loops=1 Buffers: shared hit=195 Worker 3: actual time=5.151..14.760 rows=28104 loops=1 Buffers: shared hit=208 -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=8.288..8.290 rows=0 loops=1) Buffers: shared hit=500 -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.681..3.681 rows=79999 loops=1) Index Cond: (x.a < 80000) Buffers: shared hit=222 -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.605..4.605 rows=100000 loops=1) Index Cond: (x.a > 199900000) Buffers: shared hit=278 Planning: Buffers: shared hit=19 Planning Time: 0.173 ms Execution Time: 382.776 ms (47 rows)