On Sun, Sep 15, 2024 at 6:01 AM Tomas Vondra <to...@vondra.me> wrote: > Hmmm ... I wasn't involved in that discussion and I haven't studied the > thread now, but this seems a bit weird to me. If the presorted keys have > low cardinality, can't the full sort be faster? > > Maybe it's not possible with the costing changes (removing the > penalization), in which case it would be fine to not consider the full > sort, because we'd just throw it away immediately. But if the full sort > could be costed as cheaper, I'd say that might be wrong.
IIUC, we initially applied a 1.5x pessimism factor to the cost estimates of incremental sort in an effort to avoid performance regressions in cases with a large skew in the number of rows within the presorted groups. OTOH, this also restricted our ability to leverage incremental sort when it would be preferable. I agree with you that sometimes the definition of 'regression' can depend on when the alternative plans are introduced. Imagine if we initially did not have the 1.5x pessimism factor and introduced it later, it would be treated as a 'regression' because some queries that could benefit from incremental sort would then have to resort to full sort. In commit 4a29eabd1, we removed the 1.5x pessimism factor to allow incremental sort to have a fairer chance at being chosen against a full sort. With this change, the cost model now tends to favor incremental sort as being cheaper than full sort in the presence of presorted keys, making it unnecessary to consider full sort in such cases, because, as you mentioned, we'd just throw the full sort path away immediately. So 4a29eabd1 also modified the logic so that if there are presorted keys, we only create an incremental sort path. As for the potential performance regressions caused by these changes, 4a29eabd1 opted to use enable_incremental_sort as an escape hatch. I think the same theory applies to mergejoins too. We can leverage incremental sort if it is enabled and there are presorted keys, assuming that it is always faster than full sort, and use the GUC as an escape hatch in the worst case. So here is the v2 patch, which is almost the same as v1, but includes changes in test cases and comments, and also includes a commit message. I'll put it in the commitfest. Thanks Richard
v2-0001-Consider-explicit-incremental-sort-for-mergejoins.patch
Description: Binary data