Bharath-san, all,

Hmm, I didn't see any performance degradation on my poor-man's Linux VM (4 CPUs, 4 GB RAM, HDD)...

[benchmark preparation]
autovacuum = off
shared_buffers = 1GB
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 8GB
(other settings to enable parallelism)

CREATE UNLOGGED TABLE a (c char(1100));
INSERT INTO a SELECT i FROM generate_series(1, 300000) i;
(the table size is 335 MB)

[benchmark]
CREATE TABLE b AS SELECT * FROM a;
DROP TABLE a;
CHECKPOINT;
(measure only CTAS)

[results]
parallel_leader_participation = off
  workers  time(ms)
  0        3921
  2        3290
  4        3132

parallel_leader_participation = on
  workers  time(ms)
  2        3266
  4        3247

Although this may be a controversial and perhaps crazy idea, the following change brought a 4-11% speedup. I tried it because I thought parallel workers might contend for WAL flush: each of them uses the limited ring buffer and has to flush dirty buffers whenever the ring buffer fills up. Can we take advantage of this?

[GetBulkInsertState]
/* bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
bistate->strategy = NULL;

[results]
parallel_leader_participation = off
  workers  time(ms)
  0        3695  (5% reduction)
  2        3135  (4% reduction)
  4        2767  (11% reduction)

Regards
Takayuki Tsunakawa