On 18.02.2025 23:55, Andrei Lepikhov wrote:
> On 17/2/2025 15:19, Robert Haas wrote:
>> On Mon, Feb 17, 2025 at 3:08 AM Ilia Evdokimov
>> if (nloops > 1)
>> Instead of:
>> if (nloops > 1 && rows_is_fractional)
>> I don't think it's really safe to just cast a double back to int64. In
>> practice, the number of tuples should never be large enough to
>> overflow int64, but if it did, this result would be nonsense. Also, if
>> the double ever lost precision, the result would be nonsense. If we
>> want to have an exact count of tuples, we ought to change ntuples and
>> ntuples2 to be uint64. But I don't think we should do that in this
>> patch, because that adds a whole bunch of new problems to worry about
>> and might cause us to get nothing committed. Instead, I think we
>> should just always show two decimal digits if there's more than one
>> loop.
>> That's simpler than what the patch currently does and avoids this
>> problem. Perhaps it's objectionable for some other reason, but if so,
>> can somebody please spell out what that reason is so we can talk about
>> it?
> I can understand two decimal places. You might be concerned about
> potential issues with some code that parses PostgreSQL EXPLAIN output.
> However, I believe it would be beneficial to display fractional parts
> only when iterations yield different numbers of tuples. Given that I
> often work with enormous EXPLAIN outputs, I think this approach would
> enhance their readability and comprehension. Frequently, I may see
> only part of the EXPLAIN on the screen. A floating-point row number
> format may immediately give an idea about parameterisation (or another
> reason for the subtree's variability) and help trace it down to the
> source.
The idea of indicating to the user that different iterations produced
varying numbers of rows is quite reasonable. Most likely, this would
require adding a new boolean field to the Instrumentation structure,
which would track this information by comparing the rows value from the
current and previous iterations.
However, there is a major issue: this case would be quite difficult to
document clearly. Even with an example and explanatory text, users may
still be confused about why rows=100 means the same number of rows on
all iterations, while rows=100.00 indicates variation. Even if we
describe this in the documentation, a user seeing rows=100.00 will most
likely assume it represents an average of 100 rows per iteration and may
still not realize that the actual number of rows varied.
If we want to convey this information more clearly, we should consider a
more explicit approach. For example, instead of using a fractional
value, we could display the minimum and maximum row counts observed
during execution (e.g., rows=10..20; the formatting details could be
discussed). However, in my opinion, this discussion is beyond the scope
of this thread.
Any thoughts?
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.