On 18.02.2025 23:55, Andrei Lepikhov wrote:
On 17/2/2025 15:19, Robert Haas wrote:
On Mon, Feb 17, 2025 at 3:08 AM Ilia Evdokimov
if (nloops > 1)

Instead of:

if (nloops > 1 && rows_is_fractional)

I don't think it's really safe to just cast a double back to int64. In
practice, the number of tuples should never be large enough to
overflow int64, but if it did, this result would be nonsense. Also, if
the double ever lost precision, the result would be nonsense. If we
want to have an exact count of tuples, we ought to change ntuples and
ntuples2 to be uint64. But I don't think we should do that in this
patch, because that adds a whole bunch of new problems to worry about
and might cause us to get nothing committed. Instead, I think we
should just always show two decimal digits if there's more than one
loop.

That's simpler than what the patch currently does and avoids this
problem. Perhaps it's objectionable for some other reason, but if so,
can somebody please spell out what that reason is so we can talk about
it?
I can understand two decimal places. You might be concerned about potential issues with code that parses PostgreSQL EXPLAIN output. However, I believe it would be beneficial to display fractional parts only when iterations yield different numbers of tuples. Given that I often work with enormous EXPLAIN outputs, I think this approach would enhance the readability and comprehension of the output. Frequently, I may see only part of the EXPLAIN on the screen. A floating-point row number may immediately give an idea about parameterisation (or another source of the subtree's variability) and let me trace it down to the source.


The idea of indicating to the user that different iterations produced varying numbers of rows is quite reasonable. Most likely, this would require adding a new boolean field to the Instrumentation structure, which would track this information by comparing the rows value from the current and previous iterations.

However, there is a major issue: this case would be quite difficult to document clearly. Even with an example and explanatory text, users may still be confused about why rows=100 means the same number of rows on all iterations, while rows=100.00 indicates variation. Even if we describe this in the documentation, a user seeing rows=100.00 will most likely assume it represents an average of 100 rows per iteration and may still not realize that the actual number of rows varied.

If we want to convey this information more clearly, we should consider a more explicit approach. For example, instead of using a fractional value, we could display the minimum and maximum row counts observed during execution (e.g., rows=10..20; the formatting details could be discussed). However, in my opinion, this discussion is beyond the scope of this thread.

Any thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
