Re: Remove useless GROUP BY columns considering unique index

Andrei Lepikhov Wed, 11 Dec 2024 19:39:17 -0800

On 12/12/24 10:09, David Rowley wrote:

On Mon, 2 Dec 2024 at 17:18, Andrei Lepikhov <lepi...@gmail.com> wrote:

Patch 0002 looks helpful and performant. I propose to check 'relid > 0'
to avoid diving into 'foreach(lc, parse->rtable)' at all if nothing has
been found.


I did end up adding another fast path there, but I felt like checking
relid > 0 wasn't as good as it could be as that would have only
short-circuited when we don't see any Vars of level 0 in the GROUP BY.
It seemed cheap enough to short-circuit when none of the relations
mentioned in the GROUP BY have multiple columns mentioned.

Your solution seems much better than my proposal. Thanks for applying it!
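
As I read it, the fast path described above amounts to something like the following minimal Python sketch. The function name and the (relid, attno, levelsup) tuple shape are hypothetical stand-ins for illustration only; the real check is C code in the planner operating on Var nodes, and this is not a copy of it.

```python
from collections import defaultdict


def worth_trying_removal(group_by_vars):
    """Return True only if some relation has more than one distinct
    column among the level-0 Vars of the GROUP BY clause.

    group_by_vars: iterable of (relid, attno, levelsup) tuples, a
    hypothetical simplification of planner Var nodes.
    """
    cols_per_rel = defaultdict(set)
    for relid, attno, levelsup in group_by_vars:
        if levelsup != 0:
            # Vars belonging to an outer query level are ignored.
            continue
        cols_per_rel[relid].add(attno)
        if len(cols_per_rel[relid]) > 1:
            # At least one relation contributes several columns, so
            # looking up its unique indexes might pay off.
            return True
    # No relation has multiple columns: nothing can be removed, so the
    # loop over the range table can be skipped entirely.
    return False
```

Note that a plain 'relid > 0' test would return early only when the first branch skips every Var, whereas this check also returns early for, say, GROUP BY t1.a, t2.x.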

How do you decide if the GROUP BY should become t1.a,t1.b or
t2.x,t2.y? It's not clear to me that using t1's columns is always
better than using t2's. I imagine using a mix is never better, but I'm
unsure how you'd decide which ones to use.

That depends on how 'better' is calculated. Right now, GROUP BY employs
two strategies to reduce path cost: 1) match the ORDER BY clause (to
avoid the final sort); 2) match the pathkeys of the incoming subtree (to
avoid presorting before grouping). My idea is close to [1], where the
cost depends on the estimated number of groups in the first grouping
column, because cost_sort predicts the number of comparison operator
calls based on statistics. In this case, the choice between (x,y) and
(a,b) would depend on the ndistinct of 'x' and 'a'.

In general, this was an idea for debate, more for further development
than for the patch in this thread.
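
To make the ndistinct argument concrete, here is a toy sketch in Python. It is neither the existing cost_sort formula nor the model proposed in [1]. It assumes uniformly distributed values, so two tuples tie on the leading column with probability 1/ndistinct, and only then are the remaining columns compared. The function names and the example numbers are made up; the default of 0.0025 simply mirrors the default cpu_operator_cost setting.

```python
import math


def toy_sort_cost(ntuples, ndistinct_first, ncols=2, cpu_operator_cost=0.0025):
    """Toy estimate of comparison cost for sorting on ncols columns.

    Assumption (not the planner's formula): every comparison inspects
    the first column, and the remaining columns are inspected only when
    the first column ties.
    """
    if ntuples < 2:
        return 0.0
    p_tie = 1.0 / max(ndistinct_first, 1.0)
    ops_per_comparison = 1.0 + p_tie * (ncols - 1)
    return cpu_operator_cost * ops_per_comparison * ntuples * math.log2(ntuples)


def pick_group_keys(ntuples, candidates):
    """Pick the candidate column list with the lowest toy sort cost.

    candidates maps a tuple of column names to the estimated ndistinct
    of its leading column.
    """
    return min(
        candidates,
        key=lambda cols: toy_sort_cost(ntuples, candidates[cols], len(cols)),
    )


# Hypothetical statistics: 'a' has 10 distinct values, 'x' has 100000.
choice = pick_group_keys(
    1_000_000,
    {("t1.a", "t1.b"): 10, ("t2.x", "t2.y"): 100_000},
)
```

Under these assumptions the (x,y) list wins, because comparisons are almost always resolved on 'x' alone. With a different cost model the preference could easily go the other way.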


[1] Consider the number of columns in the sort cost model
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com

--
regards, Andrei Lepikhov
