On 18.02.2025 23:55, Andrei Lepikhov wrote:
> On 17/2/2025 15:19, Robert Haas wrote:
>> On Mon, Feb 17, 2025 at 3:08 AM Ilia Evdokimov
>> if (nloops > 1)
>> Instead of:
>> if (nloops > 1 && rows_is_fractional)
>> I don't think it's really safe to just cast a double back to int64. In
>> practice, the number of tuples should never be large enough to
>> overflow int64, but if it did, this result would be nonsense. Also, if
>> the double ever lost precision, the result would be nonsense. If we
>> want to have an exact count of tuples, we ought to change ntuples and
>> ntuples2 to be uint64. But I don't think we should do that in this
>> patch, because that adds a whole bunch of new problems to worry about
>> and might cause us to get nothing committed. Instead, I think we
>> should just always show two decimal digits if there's more than one
>> loop.
>> That's simpler than what the patch currently does and avoids this
>> problem. Perhaps it's objectionable for some other reason, but if so,
>> can somebody please spell out what that reason is so we can talk about
>> it?
> I can understand two decimal places. You might be concerned about
> potential issues with some code that parses PostgreSQL EXPLAIN output.
> However, I believe it would be beneficial to display fractional parts
> only when iterations yield different numbers of tuples. Given that I
> often work with enormous EXPLAIN outputs, I think this approach would
> enhance their readability and comprehension. Frequently, I may see
> only part of the EXPLAIN on the screen. A floating-point row number
> format may immediately give an idea about parameterisation (or another
> reason for the subtree's variability) and help trace it down to the
> source.
The idea of indicating to the user that different iterations produced
varying numbers of rows is quite reasonable. Most likely, this would
require adding a new boolean field to the Instrumentation structure,
which would track this information by comparing the rows value from the
current and previous iterations.
However, there is a major issue: this case would be quite difficult to
document clearly. Even with an example and explanatory text, users may
still be confused about why rows=100 means the same number of rows on
all iterations, while rows=100.00 indicates variation. Even if we
describe this in the documentation, a user seeing rows=100.00 will most
likely assume it represents an average of 100 rows per iteration and may
still not realize that the actual number of rows varied.
If we want to convey this information more clearly, we should consider a
more explicit approach. For example, instead of using a fractional
value, we could display the minimum and maximum row counts observed
during execution (e.g., rows=10..20; the formatting details could be
discussed). However, in my opinion, this discussion is beyond the scope
of this thread.
Any thoughts?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.