Re: Consider the number of columns in the sort cost model

Andrei Lepikhov Mon, 14 Oct 2024 23:21:31 -0700

On 10/15/24 12:15, David Rowley wrote:

As for your patch, I'm suspicious that the general idea you're
proposing is an actual improvement.

I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the approach and to introduce
consideration of the number of columns. A conservative development
approach has been preferred before.


it seems you're just charging cpu_operator_cost * <number of columns
to sort by>. It seems like it won't be very hard to fool that into
doing the wrong thing when the first column to sort by is distinct or
almost distinct. There's going to be far fewer or no tiebreaker
comparisons for that case.

As I've written above, this is just for starters. It allows us to
analyse how the balance between different types of orderings can change
in extreme cases. I can merge this patch with the following one, which
implements differentiation by the number of distinct values, but then
the effect on the regression tests will be smoothed out.

The primary idea is 1) to elaborate the GROUP BY optimisation and 2) to
give the optimiser a tool to choose a better sort by comparing
MergeAppend/Sort/IncrementalSort costs.

The whole idea is implemented in the branch [1] and described in the
post [2]. Of course, there we differentiate sorts by the number of
distinct values in the first column (the only trustworthy statistic).
It would not be so difficult (though I doubt it is really necessary) to
use extended statistics and reflect different combinations of columns
in the cost values.
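
To make the difference between the two approaches concrete, here is a
minimal Python sketch. It is not the code from the patch or from the
branch [1]: the function names are hypothetical, the N * log2(N)
comparison count is a simplification, and the tie probability of
1/ndistinct assumes uniformly distributed values in the first column.
It also pessimistically assumes that every remaining column is compared
whenever the first column ties.

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost


def flat_sort_cost(tuples, ncols, op_cost=CPU_OPERATOR_COST):
    """Charge one operator call per sort column for every comparison."""
    if tuples < 2:
        return 0.0
    return ncols * op_cost * tuples * math.log2(tuples)


def distinct_aware_sort_cost(tuples, ncols, first_col_ndistinct,
                             op_cost=CPU_OPERATOR_COST):
    """Charge the tiebreaker columns only when the first column ties.

    Hypothetical model: two tuples tie on the first column with
    probability 1/ndistinct (uniform values assumed), and all remaining
    columns are compared on a tie.
    """
    if tuples < 2:
        return 0.0
    tie_prob = 1.0 / max(first_col_ndistinct, 1.0)
    expected_cols = 1.0 + (ncols - 1) * tie_prob
    return expected_cols * op_cost * tuples * math.log2(tuples)
```

Under these assumptions, the flat model charges three times the
single-column cost for a three-column sort regardless of the data,
whereas the distinct-aware variant falls back to almost the
single-column cost when the first column is unique and matches the flat
model only when the first column holds a single value.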


As mentioned by Kirill, I also don't understand the cost_sort
signature change. Why would you do that over just doing
list_length(pathkeys) within cost_sort()? You're throwing away a
parameter that might be the most useful one of the bunch for allowing
better sort cost estimates.

OK, maybe it is too much for an intermediate patch. We can change the
interface later, if necessary.

Perhaps that could be done within the EquivalenceClass.

Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

--
regards, Andrei Lepikhov
