On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <to...@vondra.me> wrote: > I think we intentionally added incremental sort ... incrementally ;-)
Haha, right. > I think one challenge with this case is that create_mergejoin_plan > creates these Sort plans explicitly - it's not "pathified" so it doesn't > go through the usual cost comparison etc. And it certainly is not the > case that incremental sort would win always, so we'd need to replicate > the cost comparison logic too. > > I have not thought about this particular case too much, but how likely > is it that mergejoin will win for this plan in practice? If we consider > a query that only needs a fraction of rows (say, thanks to LIMIT), > aren't we likely to pick a nested loop (which can do incremental sort > for the outer plan)? For joins that need to run to completion it might > be a win, but there's also the higher risk of a poor costing. Yeah, currently mergejoin path always assumes that full sort would be used on top of the outer path and inner path if they are not already ordered appropriately. So initial_cost_mergejoin directly charges the cost of full sort into outer/inner path's cost, without going through the usual cost comparison with incremental sort. It seems to me that some parts of our code assume that, for a given input path that is partially ordered, an incremental sort is always preferable to a full sort (see commit 4a29eabd1). Am I getting this correctly? If this is the case, then I think using the following outer path for the merge join: -> Incremental Sort Sort Key: t1.a, t1.b Presorted Key: t1.a -> Index Scan using t1_a_idx on t1 ... would be an immediate improvement over the current path, which is: -> Sort Sort Key: t1.a, t1.b -> Index Scan using t1_a_idx on t1 The shaky cost estimates for incremental sort that you mentioned are indeed a concern. Fortunately we have the enable_incremental_sort GUC already. As in may other parts of the code (such as in the function add_paths_to_grouping_rel), we can always disable incremental sort to fall back to a full sort if needed. > I'm not saying it's not worth exploring, I'm just trying recall reasons > why it might not work. Also I don't think it's fundamentally impossible > to do mark/restore for incremental sort - I just haven't tried doing it > because it wasn't necessary. In the worst case we could simply add a > Materialize node on top, no? Yeah, perhaps we could support mark/restore for incremental sort someday. This would allow us to apply incremental sort to the inner path of a merge join too. But if we apply a Material node on top of the inner due to the lack of mark/restore support for incremental sort, then we will need to compare two paths: partially sorted path -> incremental sort -> material VS. partially sorted path -> full sort I think it's hard to tell which is cheaper without a cost comparison, which we do not have for now. To help with the discussion, I drafted a WIP patch as attached, which chooses an incremental sort over a full sort on the given outer path of a mergejoin whenever possible. There is one ensuing plan diff in the regression tests (not included in the patch). It seems that some query in the tests begins to use incremental sort for mergejoin. Thanks Richard
v1-0001-Consider-explicit-incremental-sort-for-mergejoins.patch
Description: Binary data