Re: increased max_parallel_workers_per_gather results in fewer workers?

Philip Semanchuk Thu, 04 Jun 2020 14:31:12 -0700

> On Jun 4, 2020, at 3:03 PM, Sebastian Dressler <sebast...@swarm64.com> wrote:
> 
> Hi Philip,
> 
>> On 4. Jun 2020, at 20:37, Philip Semanchuk <phi...@americanefficient.com> 
>> wrote:
>> 
>> [...]
>>> 
>>>> This brings up a couple of questions —
>>>> 1) I’ve read that this is Postgres’ formula for the max # of workers it 
>>>> will consider for a table —
>>>> 
>>>>  max_workers = log3(table size / min_parallel_table_scan_size)
>>>> 
>>>> Does that use the raw table size, or does the planner use statistics to 
>>>> estimate the size of the subset of the table that will be read before 
>>>> allocating workers?
>>> 
>>> "table size" is the number of PSQL pages, i.e. relation-size / 8 kB. This 
>>> comes from statistics.
>> 
>> OK, so it sounds like the planner does *not* use the values in pg_stats when 
>> planning workers, true?
> 
> Full disclosure: I am not too deep into these internals, likely others on the 
> list know much more about it. But with respect to the relation size, I think 
> this is tracked elsewhere, it might be affected by other parameters though 
> like vacuuming and probably, the estimated amount of how much of the table is 
> scanned also plays a role.

I’m not too familiar with the internals either, but if I interpret this line of 
code correctly, it’s seems that pg_stats is not involved, and the worker 
allocation is based strictly on pages in the relation --
https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/allpaths.c#L800

That means I still don’t have a reason for why this query gets a different 
number of workers depending on the WHERE clause, but I can experiment with that 
more on my own. 


>> I’m still trying to understand one thing I’ve observed. I can run the query 
>> that produced the plan in the gist I linked to above with 
>> max_parallel_workers_per_gather=6 and the year param = 2018, and I get 6 
>> workers. When I set the year param=2022 I get only one worker. Same tables, 
>> same query, different parameter. That suggests to me that the planner is 
>> using pg_stats when allocating workers, but I can imagine there might be 
>> other things going on that I don’t understand. (I haven’t ruled out that 
>> this might be an AWS-specific quirk, either.)
> 
> I think it would be helpful, if you could post again both plans. The ideal 
> would be to use https://explain.dalibo.com/ and share the links. You will 
> have to generate them with JSON format, but still can anonymize them.

I really appreciate all the help you and others have already given. I think I’m 
good for now. 

Thank you so much,
Philip
Re: increased max_parallel_workers_per_gather results in fewer workers?

Reply via email to