> > Till now, what I found is that:
> > With tang's conf, when doing parallel insert, the walrecord is more
> > than serial insert (IMO, this is the main reason why it has
> > performance degradation) See the attatchment for the plan info.
> >
> > I have tried alter the target table to unlogged and then the
> > performance degradation will not happen any more.
> >
> > And the additional walrecord seems related to the index on the target
> table.
> > If the target table does not have any index, the wal record is the
> > same between parallel plan and serial plan.
> > Also, it does not have performance degradation without index.
>
>
> [serial]
> Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0)
> (actual time=360.474..360.476 rows=0 loops=1)
> Buffers: shared hit=392569 read=3 dirtied=934 written=933
> WAL: records=260354 bytes=16259841
>
> [parallel]
> -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0
> width=0) (actual time=378.227..378.229 rows=0 loops=5)
> Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
> WAL: records=260498 bytes=17019359
>
>
> More pages are dirtied and written in the parallel execution. Aren't the
> index and possibly the target table bigger with parallel execution than
> with serial execution? That may be due to the difference of inserts of
> index keys.
Yes, the table size and index size is bigger with parallel execution.
table and index's size after parallel insert
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
pg_size_pretty
----------------
4048 kB
(1 row)
postgres=#
postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
pg_size_pretty
----------------
4768 kB
(1 row)
----------------------
table and index's size after serial insert
----------------------
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
pg_size_pretty
----------------
2864 kB
(1 row)
postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
pg_size_pretty
----------------
4608 kB
----------------------
To Amit:
> I think you might want to see which exact WAL records are extra by using
> pg_waldump?
Yes, thanks for the hint, I was doing that and the result is as follow:
Heap wal record is the same between parallel and serial: (129999 which is the
number count of the query result).
parallel Btree walrecord(130500 record):
----------------------
INSERT_LEAF:129500
INSERT_UPPER:497
SPLIT_L:172
SPLIT_R:328
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:3
REUSE_PAGE:0
META_CLEANUP:0
----------------------
serial Btree walrecord(130355 record):
----------------------
INSERT_LEAF:129644
INSERT_UPPER:354
SPLIT_L:0
SPLIT_R:355
INSERT_POST:0
DEDUP:0
VACUUM:0
DELETE:0
MARK_PAGE_HALFDEAD:0
UNLINK_PAGE:0
UNLINK_PAGE_META:0
NEWROOT:2
REUSE_PAGE:0
META_CLEANUP:0
----------------------
IMO, due to the difference of inserts with parallel execution,
the btree insert's cost is more than serial.
At the same time, the parallel does not have a huge performance gain with
bitmapscan,
So the extra cost of btree index will result in performance degradation.
Does it make sense ?
Best regards,
Houzj