On Wed, Aug 28, 2024 at 9:46 AM Jakub Wartak <jakub.war...@enterprisedb.com> wrote: > As a somewhat tiny culprit of the self-flaming done by Robert (due to > nagging him about this in the past on various occasions), I'm of > course obligated to +1 to any work related to giving end-users/DBA the > ability to cage the plans generated by the optimizer.
Thanks. > When dealing with issues like those, I have a feeling we have 2 > classes of most frequent issues being reported (that's my subjective > experience): > a. cardinality misestimate leading usually to nest loop plans (e.g. > JOIN estimates thread [1] could also somehow help and it also has nice > reproducers) > b. issues after upgrades > > So the "selectivity estimation hook(s)" mentioned by Andrei seems to > be a must, but also the ability not to just guess & tweak (shape) the > plan, but a way to export all SQL plans before upgrade with capability > to import and override(lock) specific SQL query to specific plan from > before upgrade. I'm not against some kind of selectivity estimation hook in principle, but I don't know what the proposal is specifically, and I think it's separate from the topic of this thread. On the other hand, being able to force the same plans after an upgrade that you were getting before an upgrade is the kind of thing that I'd like to enable with the infrastructure proposed here. I do not propose to put something like that in core, at least not any time soon, but I'd like to have the infrastructure be good enough that people can try to do it in an extension module and learn from how it turns out. Ever since I read https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf I have believed that the cardinality misestimate leading to nested loop plans is just because we're doing something dumb. They write: "When looking at the queries that did not finish in a reasonable time using the estimates, we found that most have one thing in common: PostgreSQL’s optimizer decides to introduce a nestedloop join (without an index lookup) because of a very low cardinality estimate, whereas in reality the true cardinality is larger. As we saw in the previous section, systematic underestimation happens very frequently, which occasionally results in the introduction of nested-loop joins. [...] if the cost estimate is 1,000,000 with the nested-loop join algorithm and 1,000,001 with a hash join, PostgreSQL will always prefer the nested-loop algorithm even if there is a equality join predicate, which allows one to use hashing. [...] given the fact that underestimates are quite frequent, this decision is extremely risky. And even if the estimates happen to be correct, any potential performance advantage of a nested-loop join in comparison with a hash join is very small, so taking this high risk can only result in a very small payoff. Therefore, we disabled nested-loop joins (but not index-nestedloop joins) in all following experiments." We don't even have an option to turn off that kind of join, and we could probably avoid a lot of pain if we did. This, too, is mostly separate from the topic of this thread, but I just can't believe we've chosen to do literally nothing about this given that we all know this specific thing hoses everybody, everywhere, all the time. > I'm not into the internals of optimizer at all, but here are other > random thoughts/questions: > - I do think that "hints" words have bad connotations and should not > be used. It might be because of embedding them in SQL query text of > the application itself. On one front they are localized to the SQL > (good), but the PG operator has no realistic way of altering that once > it's embedded in binary (bad), as the application team is usually > separate if not from an external company (very bad situation, but > happens almost always). I haven't quite figured out whether the problem is that hints are actually bad or whether it's more that we just hate saying the word hints. The reason I'm talking about hints here is mostly because that's how other systems let users control the query planner. If we want to let extensions control the query planner, we need to know in what ways it needs to be controllable, and looking to hints in other systems is one way to understand what would be useful. As far as having hints in PostgreSQL, which admittedly is not really the topic of this thread either, one objection is that we should just make the query planner instead, and I used to believe that, but I no longer do, because I've been doing this PostgreSQL thing for 15+ years and we're not much closer to a perfect query planner that never makes any mistakes than we were when I started. It's not really clear that perfection is possible, but it's extremely clear that we're not getting there any time soon. Another objection to hints is that they require modifying the query text, which does indeed suck but it doesn't mean they're useless either. There are also schemes that put them out of line, including pg_hint_plan's optional use of a hint table. Yet another objection is that you should fix cardinalities instead of controlling the plan manually, and I agree that's often a better solution, but it again does not mean that using a hint is never defensible in any situation. I think we've become so negative about hints that we rarely have a rational discussion about them. I'm no more keen to see every PostgreSQL query in the universe decorated with a bunch of hints than anyone else here, but I also don't enjoy telling a customer "hey, I know this query started misbehaving in the middle of the night on Christmas, but hints are bad and we shouldn't ever have them so you'd better get started on redesigning your schema or alternatively you can just have your web site be down for the next 20 years while we try to improve the optimizer." I don't know what the right solution(s) are exactly but it's insane not to have some kind of pressure relief valve that can be used in case of emergency. > - Would stacking of such extensions, each overriding the same planner > hooks, be allowed or not in the long run ? Technically there's nothing > preventing it and I think I could imagine someone attempting to run > multiple planner hook hotfixes for multiple issues, all at once? I suspect this would tend to work poorly in practice, but there might be specific cases where it works OK. It's usually best if only one person is steering a given vehicle at a time, and so here. But there's no intrinsic reason you couldn't use multiple extensions at once if you happen to have multiple extensions that use the hooks in mutually compatible ways. > - Shouldn't EXPLAIN contain additional information that for that > specific plan the optimizer hooks changed at least 1 thing ? (e.g. > "Plan was tainted" or something like that). Maybe extension could mark > it politely that it did by setting a certain flag, or maybe there > should be routines exposed by the core to do that ? This could be really useful when things go wrong and someone is trying to figure out from an EXPLAIN ANALYZE output what in the world happened. I'm not sure exactly what makes sense to do here but I think we should come back to this topic after we've settled some of the basics. -- Robert Haas EDB: http://www.enterprisedb.com