On Sat, Mar 8, 2025 at 6:15 PM David Rowley <dgrowle...@gmail.com> wrote:

> On Sun, 9 Mar 2025 at 10:39, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > One thing that puzzled me a bit is that many of the outputs
> > show "ROWS UNBOUNDED PRECEDING" in window functions where that
> > definitely wasn't in the source query.  Eventually I realized
> > that that comes from window_row_number_support() and cohorts
> > optimizing the query.  While this isn't wrong, I suspect it
> > will cause a lot of confusion and questions.  I wonder if we
> > should do something to hide the change?
>
> I suspect it might be more confusing if we were to show the user the
> original frame options. Isn't EXPLAIN meant to be a window into the
> plan that's been or would be executed? I think it would be misleading
> to display something different to what will be executed.
>
>
Looking at this example:

SELECT
    empno,
    depname,
    row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn,
    rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN
                 UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk,
    count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN
                   CURRENT ROW AND CURRENT ROW) cnt
FROM empsalary;

The new output is:

 WindowAgg
   Output: empno, depname, (row_number() OVER (PARTITION BY depname ORDER
BY enroll_date ROWS UNBOUNDED PRECEDING)), (rank() OVER (PARTITION BY
depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)), count(*) OVER
(PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND
CURRENT ROW), enroll_date
   ->  WindowAgg
         Output: depname, enroll_date, empno, row_number() OVER (PARTITION
BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING), rank() OVER
(PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)
         ->  Sort
               Output: depname, enroll_date, empno
               Sort Key: empsalary.depname, empsalary.enroll_date
               ->  Seq Scan on pg_temp.empsalary
                     Output: depname, enroll_date, empno


It is kinda annoying that row_number and rank have their entire expression
output twice when the computation only happens once.  But that is outside
the scope; just making an observation.  It just becomes even worse when we
fill in the details.

As for the optimization, any reason to not just show that it was done?  In
optimize_window_clauses arrange to save the existing_wc somewhere on the
relevant window functions then, in explain, output something like:

-> WindowAgg
Output: depname, enroll_date, empno, row_number() OVER (...), rank() OVER
(...)
Reframed: row_number() from (default) RANGE => ROWS
(I'm unsure whether we can write "default" here though, it isn't critical.)
Reframed: rank() from UNBOUNDED FOLLOWING => CURRENT ROW

(I initially put the entire frame clause, without omitting default
frame_end, there but then figured it defeated the point.  We should only
show those elements (type, start, end) that actually are different between
the parsed query and what gets executed.)

Which does bring up the point, to what extent should the explain output
rely on defaults versus being explicit?  We are omitting frame_end of
CURRENT ROW generally here.

David J.

Reply via email to