On 12/12/24 10:09, David Rowley wrote:
On Mon, 2 Dec 2024 at 17:18, Andrei Lepikhov <lepi...@gmail.com> wrote:
Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
been found.

I did end up adding another fast path there, but I felt like checking
relid > 0 wasn't as good as it could be as that would have only
short-circuited when we don't see any Vars of level 0 in the GROUP BY.
It seemed cheap enough to short-circuit when none of the relations
mentioned in the GROUP BY have multiple columns mentioned.
Your solution seems much better than my proposal. Thanks for applying it!

when how do you decide if the GROUP BY should become t1.a,t1.b or
t2.x,t2.y? It's not clear to me that using t1's columns is always
better than using t2's. I imagine using a mix is never better, but I'm
unsure how you'd decide which ones to use.
It depends on how 'better' is calculated. Right now, GROUP BY uses two strategies to reduce path cost: 1) match the ORDER BY clause (to avoid a final sort); 2) fit the pathkeys of the incoming subtree (to avoid presorting for grouping). My idea is close to [1], where the cost depends on the estimated number of groups in the first grouping column, because cost_sort predicts the number of comparison operator calls based on statistics. In this case, the choice between (x,y) and (a,b) will depend on the ndistinct of 'x' and 'a'. In general, this was an idea for debate, meant more for further development than for the patch in this thread.
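
To make the ndistinct argument concrete, here is a toy sketch in Python. It is not PostgreSQL code and not the cost model from [1]. It assumes uniformly distributed, independent columns, so two random tuples tie on a column with probability 1/ndistinct. The function names, the N*log2(N) comparison count, and the sample ndistinct values are all hypothetical, chosen only for illustration.

```python
import math


def expected_columns_compared(ndistinct_by_column):
    """Expected number of columns examined when comparing two random tuples.

    Assumption (illustrative only): columns are uniform and independent,
    so a tie on a column with nd distinct values happens with probability 1/nd.
    """
    expected = 0.0
    p_reach = 1.0  # probability that the comparison reaches this column
    for nd in ndistinct_by_column:
        expected += p_reach
        p_reach *= 1.0 / nd  # only a tie moves on to the next column
    return expected


def toy_sort_cost(ntuples, ndistinct_by_column):
    """Toy cost: N * log2(N) comparisons, each weighted by columns examined."""
    if ntuples < 2:
        return 0.0
    per_comparison = expected_columns_compared(ndistinct_by_column)
    return ntuples * math.log2(ntuples) * per_comparison


def pick_group_keys(ntuples, candidates):
    """Return the name of the candidate key list with the lowest toy cost."""
    return min(candidates, key=lambda name: toy_sort_cost(ntuples, candidates[name]))


# Hypothetical ndistinct estimates for the two equivalent GROUP BY lists.
candidates = {
    "t1.a,t1.b": [10, 1000],   # few groups in the leading column 'a'
    "t2.x,t2.y": [5000, 2],    # many groups in the leading column 'x'
}
best = pick_group_keys(100_000, candidates)
```

Under these assumptions the list whose leading column has the higher ndistinct wins, because fewer comparisons fall through to the second column. A real model would also have to account for column width, comparison operator cost, and skewed distributions.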

[1] Consider the number of columns in the sort cost model
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

--
regards, Andrei Lepikhov

