On 12/9/2024 12:12, David Rowley wrote:
On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepi...@gmail.com> wrote:
The initial problem leads to a wrong cost_sort() estimate. Right now I'm
thinking about passing cost_sort() the sort clauses instead of (or in
addition to) the pathkeys.
I'm not quite sure why the sort clauses matter any more than the
EquivalenceClass. If the EquivalenceClass defines that all members
will have the same value for any given row, then, if we had to choose
any single member to drive the n_distinct estimate from, isn't the
most accurate distinct estimate from the member with the smallest
n_distinct estimate? (That assumes the less distinct member has every
value the more distinct member has, which might not be true.)
Thanks for your efforts! Your idea looks more stable and applicable than
my patch.
BTW, it could still produce wrong ndistinct estimates if we pick a
sort operator based on the clauses mentioned in the EquivalenceClass.
However, this thread's primary intention is to stabilize query plans, so
I'll try to implement your idea.
The second reason was to distinguish sorts by cost (see proposal [1]),
because sometimes that could save CPU cycles on comparisons.
Having seen a lot of sort/grouping queries with only sporadic joins, I can
see how profitable it could sometimes be: text or numeric grouping over a
mostly Cartesian join may be painful without fine-tuned sorting.
[1]
https://www.postgresql.org/message-id/8742aaa8-9519-4a1f-91bd-364aec65f...@gmail.com
--
regards, Andrei Lepikhov