Hi all,
I’ve been experimenting with some performance tuning on a particular query, and 
I observed a result that I don’t understand. 

I’ve been setting max_parallel_workers_per_gather to values in the range 1-6 and 
then running EXPLAIN ANALYZE to see how much benefit we get from more 
parallelization. My data is organized by year, so the year is a parameter in 
the query’s WHERE clause.
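
For reference, each run looks roughly like the following; the table name, 
column, and aggregate are simplified placeholders, not my real schema:

    SET max_parallel_workers_per_gather = 4;   -- varied from 1 to 6 between runs
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)                            -- stand-in for the real aggregate
    FROM   big_table                           -- stand-in for the real table
    WHERE  year = 2022;                        -- 2018 or 2022 in the tests below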

For my 2018 data, Postgres launches as many workers as 
max_parallel_workers_per_gather permits, and the execution time decreases 
nicely, from 280 seconds with 1 worker all the way down to 141 seconds with 6 workers. 
So far, so good.

When I run the same query for our 2022 data, I see the same improvement for 
max_parallel_workers_per_gather values of 1-4. But with 
max_parallel_workers_per_gather set to 5 or 6, Postgres only uses 1 worker, and 
the execution time increases dramatically, ending up even worse than when I 
deliberately limit the number of workers to 1:

- max_parallel_workers_per_gather=1, runtime = 1061s
- max_parallel_workers_per_gather=2, runtime = 770s
- max_parallel_workers_per_gather=3, runtime = 637s
- max_parallel_workers_per_gather=4, runtime = 573s
- max_parallel_workers_per_gather=5, runtime = 1468s
- max_parallel_workers_per_gather=6, runtime = 1469s
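
(In case it matters: I’m reading the worker count off the Gather node in the 
EXPLAIN ANALYZE output. The shape is the usual one, sketched below with made-up 
numbers; the lines I’m comparing are Workers Planned vs. Workers Launched.)

    Gather  (actual time=... rows=... loops=1)
      Workers Planned: 5
      Workers Launched: 1
      ->  Parallel Seq Scan on big_table ...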

Our 2022 data set is several times larger than our 2018 data, so I suspect some 
resource is getting exhausted, but I’m not sure which. So far, this result has 
been 100% reproducible. I’m on a dedicated test server with 16 virtual CPUs and 
128 GB of RAM; no one else is competing with me for Postgres processes. 
max_worker_processes and max_parallel_workers are both set to 12.
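
(For anyone who wants to double-check my configuration, I’m reading those 
values with a plain pg_settings query like this one:)

    SELECT name, setting
    FROM   pg_settings
    WHERE  name IN ('max_worker_processes',
                    'max_parallel_workers',
                    'max_parallel_workers_per_gather');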

Can anyone help me understand why this happens, or where I might look for 
clues? 

Thanks,
Philip
