On Sat, Mar 8, 2025 at 6:15 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Sun, 9 Mar 2025 at 10:39, Tom Lane <t...@sss.pgh.pa.us> wrote: > > One thing that puzzled me a bit is that many of the outputs > > show "ROWS UNBOUNDED PRECEDING" in window functions where that > > definitely wasn't in the source query. Eventually I realized > > that that comes from window_row_number_support() and cohorts > > optimizing the query. While this isn't wrong, I suspect it > > will cause a lot of confusion and questions. I wonder if we > > should do something to hide the change? > > I suspect it might be more confusing if we were to show the user the > original frame options. Isn't EXPLAIN meant to be a window into the > plan that's been or would be executed? I think it would be misleading > to display something different to what will be executed. > > Looking at this example: SELECT empno, depname, row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn, rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk, count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW) cnt FROM empsalary; The new output is: WindowAgg Output: empno, depname, (row_number() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)), (rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)), count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW), enroll_date -> WindowAgg Output: depname, enroll_date, empno, row_number() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING), rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING) -> Sort Output: depname, enroll_date, empno Sort Key: empsalary.depname, empsalary.enroll_date -> Seq Scan on pg_temp.empsalary Output: depname, enroll_date, empno It is kinda annoying that row_number and rank have their entire expression output twice when the computation only happens once. But that is outside the scope; just making an observation. It just becomes even worse when we fill in the details. As for the optimization, any reason to not just show that it was done? In optimize_window_clauses arrange to save the existing_wc somewhere on the relevant window functions then, in explain, output something like: -> WindowAgg Output: depname, enroll_date, empno, row_number() OVER (...), rank() OVER (...) Reframed: row_number() from (default) RANGE => ROWS (I'm unsure whether we can write "default" here though, it isn't critical.) Reframed: rank() from UNBOUNDED FOLLOWING => CURRENT ROW (I initially put the entire frame clause, without omitting default frame_end, there but then figured it defeated the point. We should only show those elements (type, start, end) that actually are different between the parsed query and what gets executed.) Which does bring up the point, to what extent should the explain output rely on defaults versus being explicit? We are omitting frame_end of CURRENT ROW generally here. David J.