On Tue, Oct 20, 2020 at 10:50 AM Pavel Stehule <pavel.steh...@gmail.com> wrote:
> On Tue, 20 Oct 2020 at 11:59, Mats Julian Olsen <m...@duneanalytics.com> wrote:
>
>> On Tue, Oct 20, 2020 at 9:50 AM David Rowley <dgrowle...@gmail.com> wrote:
>>
>>> On Tue, 20 Oct 2020 at 22:38, Mats Julian Olsen <m...@duneanalytics.com> wrote:
>>> >
>>> > The crux of our issue is that the query planner chooses a nested loop
>>> > join for this query, essentially making this query (and other queries)
>>> > take a very long time to complete. In contrast, by toggling
>>> > `enable_nestloop` and `enable_seqscan` off we can take the total
>>> > runtime down from 16 minutes to 2 minutes.
>>> >
>>> > 1) Vanilla plan (16 min): https://explain.depesz.com/s/NvDR
>>> > 2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK
>>> > 3) enable_nestloop=off; enable_seqscan=off (2 min): https://explain.depesz.com/s/0WXx
>>> >
>>> > How can I get Postgres not to loop over 12M rows?
>>>
>>> You'll likely want to look at what random_page_cost is set to. If the
>>> planner is preferring nested loops then it may be too low. You'll also
>>> want to see if effective_cache_size is set to something realistic.
>>> Higher values of that will prefer nested loops like this.
>>
>> random_page_cost is 1.1 and effective_cache_size is '60GB' (listed in
>> the gist). random_page_cost may be too low?
>
> random_page_cost 2 is safer - the value 1.5 is a little bit aggressive
> for me.

Thanks Pavel. I tried changing random_page_cost from 1.1 to 2, to 3... all
the way up to 10. All values resulted in the same query plan, except for
10, which executed a parallel hash join (though with sequential scans):
https://explain.depesz.com/s/Srcb. 10 seems like far too high a value for
random_page_cost, though?

>>> You may also want to reduce max_parallel_workers_per_gather. It looks
>>> like you're not getting your parallel workers as often as you'd like.
>>> If the planner chooses a plan thinking it's going to get some workers
>>> and gets none, then that plan may be inferior to the one that the
>>> planner would have chosen if it had known the workers would be
>>> unavailable.
>>
>> Interesting, here are the values for those:
>> max_parallel_workers = 8
>> max_parallel_workers_per_gather = 4
>>
>>> > Let me know if there is anything I left out here that would be
>>> > useful for further debugging.
>>>
>>> select name, setting from pg_settings where category like 'Query
>>> Tuning%' and source <> 'default';
>>> select version();
>>>
>>> would be useful.
>>>
>>> David
>>
>> default_statistics_target = 500
>> effective_cache_size = 7864320
>> random_page_cost = 1.1
>>
>> PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
>> compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit
>>
>> Thanks David, see above for more information.
>>
>> --
>> Mats
>> CTO @ Dune Analytics
>> We're hiring: https://careers.duneanalytics.com

--
Mats
CTO @ Dune Analytics
We're hiring: https://careers.duneanalytics.com
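P.S. For anyone following along and wanting to run the same experiment: the planner settings discussed above can be toggled per-session, so nothing needs to change in postgresql.conf while testing. A minimal sketch (the SELECT is a placeholder for the actual query under test):

```sql
-- Session-local only: none of these persist beyond this connection.
SET random_page_cost = 2.0;        -- try values between 1.1 and 4.0
SET enable_nestloop = off;         -- diagnostic only, not for production
SET max_parallel_workers_per_gather = 2;

-- Compare "Workers Planned" vs "Workers Launched" in the output:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the query under test

RESET ALL;                         -- restore this session to server defaults
```

If "Workers Launched" is consistently lower than "Workers Planned", that matches David's point about the planner betting on workers it never gets.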