On 10/15/24 12:15, David Rowley wrote:
As for your patch, I'm suspicious that the general idea you're
proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate patch. The main purpose here is to debate the way & introduce considering of number of columns. Conservative development approach has been preferred before.

it seems you're just charging cpu_operator_cost * <number of columns
to sort by>. It seems like it won't be very hard to fool that into
doing the wrong thing when the first column to sort by is distinct or
almost distinct. There's going to be far fewer or no tiebreaker
comparisons for that case.
As I've written above, it is for starters. It allow to analyse how the balance between different types of orderings can be changed in extreme cases. I can join this patch with the following, implementing differentiation by distincts, but the effect on regression tests will be smoothed out. The primary idea is 1) to elaborate GROUP-BY optimisation and 2) give the optimiser a tool to choose a more optimal sort, comparing MergeAppend/Sort/IncrementalSort costs. The whole idea is implemented in the branch [1] and described in the post [2]. Of course, we differentiate sortings by distinct of the first column (only one trustworthy statistic). It is not so difficult (but I doubt the real necessity) to use extended statistics and reflect in the cost values of different combinations of columns.


As mentioned by Kirill, I also don't understand the cost_sort
signature change. Why would you do that over just doing
list_length(pathkeys) within cost_sort()? Your throwing away a
parameter that might  be the most useful one of the bunch for allowing
better sort cost estimates.
Ok, may be it is too much for an intermediate patch. We can change the interface later, if necessary.
Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov



Reply via email to