On Tue, Mar 31, 2020 at 7:56 PM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
> On Tue, Mar 31, 2020 at 07:09:04PM -0400, James Coleman wrote:
> >On Tue, Mar 31, 2020 at 6:54 PM Tomas Vondra
> ><tomas.von...@2ndquadrant.com> wrote:
> >>
> >> On Tue, Mar 31, 2020 at 06:35:32PM -0400, Tom Lane wrote:
> >> >Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> >> >> In general, I think it'd be naive to expect we can make the planner
> >> >> smarter with no extra overhead spent on planning, and that we can
> >> >> never accept patches adding even tiny overhead. With that approach
> >> >> we'd probably end up with a trivial planner that generates just a
> >> >> single query plan, because that's going to be the fastest planner.
> >> >> A realistic approach needs to consider both the planning and
> >> >> execution phase, and the benefits of this patch seem to be clear -
> >> >> if you have queries that do benefit from it.
> >> >
> >> >I think that's kind of attacking a straw man, though. The thing that
> >> >people push back on, or should push back on IMO, is when a proposed
> >> >patch adds significant slowdown to queries that it has no or very little
> >> >hope of improving. The trick is to do expensive stuff only when
> >> >there's a good chance of getting a better plan out of it.
> >> >
> >>
> >> Yeah, I agree with that. I think the main issue is that we don't really
> >> know what the "expensive stuff" is in this case, so it's not really
> >> clear how to be smarter :-(
> >
> >To add to this: I agree that ideally you'd check cheaply to know
> >you're in a situation that might help, and then do more work. But here
> >the question is always going to be simply "would we benefit from an
> >ordering, and, if so, do we have it already partially sorted". It's
> >hard to imagine that reducing much conceptually, so we're left with
> >optimizations of that check.
> >
>
> I think it depends on what exactly the expensive part is. For example, if
> it's the construction of IncrementalSort paths, then maybe we could do a
> quick/cheap check of whether the path can even be useful by estimating
> the cost, and only then build the path.
>
> That's what we do for join algorithms, for example - we first compute
> initial_cost_nestloop and only when that seems cheap enough do we do the
> more expensive stuff.
>
> But I'm not sure the path construction is the expensive part, as it
> should be disabled by enable_incrementalsort=off. Yet the regression
> does not seem to disappear, at least not entirely.
>
>
> >> One possibility is that it's just one of those regressions due to a
> >> change in binary layout, but I'm not sure how to verify that.
> >
> >If we are testing with a case that can't actually add more paths (due
> >to it checking the guc before building them), doesn't that effectively
> >leave one of these two options:
> >
> >1. Binary layout/cache/other untraceable change, or
> >2. Changes due to refactored function calls.
> >
>
> Hmm, but in case of (1) the overhead should be there even with tests
> that don't really have any additional paths to consider, right? I've
> tried such a test (single table with no indexes) and I don't quite
> see any regression (maybe ~1%).
Not necessarily, if the cost is in sort costing or the useful-pathkeys
checking, right? We run that code even without incremental sort, but it
has changed from master.

> (2) might have impact, but I don't see any immediate suspects. Did you
> have some functions in mind?

I guess this is where the lines blur: I didn't see anything obvious
either, but the changes to sort costing...should probably not have real
impact...but...

> BTW I see the patch adds pathkeys_common but it's not called from
> anywhere. It's probably a leftover from an earlier patch version.
>
> >There's not anything obvious in point (2) that would be a big cost,
> >but there are definitely changes there. I was surprised that just
> >eliminating the loop through the pathkeys on the query and the index
> >was enough to save us ~4%.
> >
> >Tomas: Earlier you'd wondered whether we should try to shortcut the
> >changes in costing...I was skeptical of that originally, but maybe
> >it's worth looking into? I'm going to try backing that out and see
> >what the numbers look like.

BTW, I did this test, and it looks like we can get back something close
to 1% by reverting that initial fix on partial path costing. But we
can't get rid of it entirely, at the very least. *Maybe* we could
condition it on incremental sort, but I'm not convinced that's the only
place it's needed as a fix.

> I've described the idea about something like initial_cost_nestloop and
> so on. But I'm a bit skeptical about it, considering that the GUC only
> has limited effect.
>
>
> A somewhat related note is that the number of indexes has a pretty
> significant impact on planning time, even on master. This is the timing
> of the same explain script (similar to the one shown before) with
> different numbers of indexes on master:
>
>     0 indexes    7 indexes    49 indexes
>    ---------------------------------------
>        10.85        12.56         27.83
>
> The way I look at incremental sort is that it allows using indexes for
> queries that couldn't use them before, possibly requiring a separate
> index. So incremental sort might easily reduce the number of indexes
> needed, compensating for the overhead we're discussing here. Of course,
> that may or may not be true.

One small idea, though I'm not yet sure it helps us a whole lot: if the
query pathkeys list is only length 1, then we could skip the additional
path creation (rough sketch in the P.S. below).

James
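P.S. To make that last idea concrete, here's a rough sketch of the kind
of check I mean. The helper name and exactly where it would get called
from are placeholders (not something in the current patch); the point is
just that with fewer than two query pathkeys there's no "presorted
prefix plus remaining keys" split for an incremental sort to exploit, so
we could bail out before building any of the extra paths:

#include "postgres.h"

#include "nodes/pathnodes.h"
#include "nodes/pg_list.h"

/*
 * Hypothetical helper: return true only if an incremental sort could
 * possibly be useful for this query's required ordering.
 */
static bool
incremental_sort_possibly_useful(PlannerInfo *root)
{
	/*
	 * Incremental sort needs a presorted prefix plus at least one more
	 * key to sort; with fewer than two query pathkeys that split can't
	 * exist, so a plain Sort (or no sort at all) covers every case.
	 */
	return list_length(root->query_pathkeys) >= 2;
}

and then, at the top of the path-building code, something like:

	if (!incremental_sort_possibly_useful(root))
		return;		/* skip the additional path creation entirely */

That obviously wouldn't help with the sort-costing changes discussed
above, only with the path-construction side.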