Re: [PATCH] Add extra statistics to explain for Nested Loop

Julien Rouhaud Fri, 16 Oct 2020 21:26:58 -0700

On Sat, Oct 17, 2020 at 12:15 PM Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> On Sat, Oct 17, 2020 at 0:11, Anastasia Lubennikova 
> <a.lubennik...@postgrespro.ru> wrote:
>>
>> On 16.10.2020 12:07, Julien Rouhaud wrote:
>>
>> On Fri, Oct 16, 2020 at 16:12, Pavel Stehule <pavel.steh...@gmail.com> 
>> wrote:
>>>
>>>
>>>
>>> On Fri, Oct 16, 2020 at 9:43, <e.sokol...@postgrespro.ru> wrote:
>>>>
>>>> Hi, hackers.
>>>> For some distributions of data in tables, different loops in nested loop
>>>> joins can take different amounts of time and process different numbers of
>>>> entries. This makes the average statistics returned by EXPLAIN ANALYZE not
>>>> very useful for a DBA.
>>>> To fix this, here is a patch that adds printing of min and max statistics
>>>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYZE.
>>>> Please don't hesitate to share any thoughts on this topic!
>>>
>>>
>>> +1
>>>
>>> This is a great feature - sometimes the current limited format can be 
>>> pretty messy.
>>
>>
>> +1, this can be very handy!
>>
>> Cool.
>> I have added your patch to the commitfest, so it won't get lost.


Thanks!  I'll also try to review it next week.

>> https://commitfest.postgresql.org/30/2765/
>>
>> I will review the code next week.  Unfortunately, I cannot give any feedback 
>> about usability of this feature.
>>
>> User visible change is:
>>
>> -               ->  Nested Loop (actual rows=N loops=N)
>> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>
>
> This interface is ok - there is not too much space for creativity.

Yes, I also think it's ok. We should also consider usability for tools
like explain.depesz.com; I don't know if the current output is best.
I'm adding Depesz and Pierre, who are both working on this kind of
tool, for additional input.
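
To make the tooling concern concrete, here is a minimal sketch of how a plan
parser might read the proposed text format. It assumes only the
timing-off/costs-off form quoted above ("actual min_rows=... rows=...
max_rows=... loops=...") and falls back to the existing "actual rows=...
loops=..." form. The NodeRows struct and parse_actual_rows() are hypothetical
names for illustration, not code from the patch or from any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the per-node row counters of one plan line. */
typedef struct NodeRows
{
    double  min_rows;       /* only meaningful when has_minmax is 1 */
    double  rows;           /* average rows per loop, as printed today */
    double  max_rows;       /* only meaningful when has_minmax is 1 */
    long    loops;
    int     has_minmax;     /* 1 if the line used the proposed format */
} NodeRows;

/*
 * Parse the "(actual ...)" part of one text-format plan line.
 * Returns 1 on success and 0 if the line has no recognizable counters.
 */
static int
parse_actual_rows(const char *line, NodeRows *out)
{
    const char *p;

    assert(line != NULL && out != NULL);

    p = strstr(line, "(actual ");
    if (p == NULL)
        return 0;
    p += strlen("(actual ");

    memset(out, 0, sizeof(*out));

    /* Proposed format: min_rows=N rows=N max_rows=N loops=N */
    if (sscanf(p, "min_rows=%lf rows=%lf max_rows=%lf loops=%ld",
               &out->min_rows, &out->rows, &out->max_rows, &out->loops) == 4)
    {
        out->has_minmax = 1;
        return 1;
    }

    /* Existing format: rows=N loops=N */
    memset(out, 0, sizeof(*out));
    if (sscanf(p, "rows=%lf loops=%ld", &out->rows, &out->loops) == 2)
        return 1;

    return 0;
}
```

A parser written this way has to try the new field order before the old one,
which is one argument for checking the ordering with tool authors before it is
committed.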

> I can imagine displaying variance or average - but I am afraid of a very bad 
> performance impact.

The original counter (rows here) is already an average, right?
Variance could be nice too.  Instrumentation will already spam
gettimeofday() calls for nested loops, so I don't think that computing
variance would add that much overhead.
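
As a rough illustration of why the extra cost should be small, here is a
minimal sketch of a per-loop accumulator that tracks min, max, mean and
variance in one pass using Welford's online algorithm. This is not the code
from the patch: the LoopStats struct and the loop_stats_* functions are
hypothetical names, and the variance computed is the population variance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node accumulator, updated once at the end of each loop. */
typedef struct LoopStats
{
    long    nloops;     /* number of loops seen so far */
    double  min;        /* smallest per-loop value */
    double  max;        /* largest per-loop value */
    double  mean;       /* running mean */
    double  m2;         /* running sum of squared deviations from the mean */
} LoopStats;

static void
loop_stats_init(LoopStats *s)
{
    assert(s != NULL);
    s->nloops = 0;
    s->min = 0.0;
    s->max = 0.0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one per-loop measurement (rows or elapsed time) into the stats. */
static void
loop_stats_add(LoopStats *s, double x)
{
    double  delta;

    assert(s != NULL);

    if (s->nloops == 0 || x < s->min)
        s->min = x;
    if (s->nloops == 0 || x > s->max)
        s->max = x;

    /* Welford update: constant work per loop, no per-loop storage. */
    s->nloops += 1;
    delta = x - s->mean;
    s->mean += delta / (double) s->nloops;
    s->m2 += delta * (x - s->mean);
}

/* Population variance of the values seen so far (0 when there are none). */
static double
loop_stats_variance(const LoopStats *s)
{
    assert(s != NULL);
    return (s->nloops > 0) ? s->m2 / (double) s->nloops : 0.0;
}
```

Each update is two comparisons and a handful of floating-point operations, so
it is plausible that this is cheap next to the timing calls, but that would
need to be measured rather than assumed.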
