On 9/13/24 13:18, Richard Guo wrote: > On Mon, Sep 9, 2024 at 6:22 PM Tomas Vondra <to...@vondra.me> wrote: >> I have not thought about this particular case too much, but how likely >> is it that mergejoin will win for this plan in practice? If we consider >> a query that only needs a fraction of rows (say, thanks to LIMIT), >> aren't we likely to pick a nested loop (which can do incremental sort >> for the outer plan)? For joins that need to run to completion it might >> be a win, but there's also the higher risk of a poor costing. > > I think one situation where mergejoin tends to outperform hashjoin and > nestloop is when ORDER BY clauses are present. For example, for the > query below, mergejoin runs much faster than hashjoin and nestloop, > and mergejoin with incremental sort is even faster than mergejoin with > full sort. >
Sure, an incremental sort can improve things if things go well, no doubt about that. But how significant can the improvement be, especially if we need to sort everything? In your example it's ~15%, which is nice, and maybe the benefits can be much larger if the incremental sort can do everything in memory, without flushing to disk. But what about the risks? What will happen if the groups are not this uniformly and independently sized, and stuff like that? Maybe it'll be costed well enough, I haven't tried. I don't recall the exact reasoning for why we didn't add incremental sort in various places, I'd have to dig into the old threads, or something. But I believe thinking about these risks was part of it - trying to handle places where the potential benefits are much larger than the risks. As I wrote earlier, it's not my intent to discourage you from working on this. Please do, I'm sure it can be improved. regards -- Tomas Vondra