> I think that it is worth paying more than nothing to avoid plans that are
> so far from optimal that they might as well take forever to execute
I recently came across this article from 2016 that describes a bad plan of exactly the sort discussed in this thread: https://heap.io/blog/when-to-avoid-jsonb-in-a-postgresql-schema (The proximate cause in that case was PostgreSQL not collecting statistics for fields in a JSONB column, estimating a row count of 1, and thus producing a pathological slowdown.) A minimal sketch of that failure mode is at the end of this message, below the quoted text.

–Mike

On Tue, Jun 22, 2021 at 7:37 PM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Tue, Jun 22, 2021 at 2:53 AM Tomas Vondra
> <tomas.von...@enterprisedb.com> wrote:
> > Yeah, I like the insurance analogy - it gets to the crux of the problem,
> > because insurance is pretty much exactly about managing risk.
>
> The user's exposure to harm is what truly matters. I admit that that's
> very hard to quantify, but we should at least try to do so.
>
> We sometimes think about a plan that is 10x slower as if it's infinitely
> slow, or might as well be. But it's usually not like that -- it is
> generally meaningfully better than the plan being 100x slower, which is
> itself sometimes appreciably better than 1000x slower. And besides, users
> often don't get anything like the optimal plan, even on what they would
> consider to be a good day (which is most days). So maybe 10x slower is
> actually the baseline good case already, without anybody knowing it. Most
> individual queries are not executed very often, even on the busiest
> databases. The extremes really do matter a lot.
>
> If a web app or OLTP query is ~10x slower than optimal then it might be
> the practical equivalent of an outage that affects that query alone
> (i.e. "infinitely slow") -- but probably not. I think that it is worth
> paying more than nothing to avoid plans that are so far from optimal that
> they might as well take forever to execute. Such a plan is not
> meaningfully different from a database outage affecting one particular
> query. It is in a category of its own, beyond merely "slow plan", albeit
> one that is very difficult or impossible to define formally.
>
> There may be a huge amount of variation in risk tolerance among basically
> reasonable people. For example, if somebody chooses to engage in some kind
> of extreme sport, to me it seems understandable. It's just not my cup of
> tea. Whereas if somebody chooses to never wear a seatbelt while driving,
> then to me they're simply behaving foolishly. They're not willing to incur
> the tiniest inconvenience in order to get a huge reduction in potential
> harm -- including a very real risk of approximately the worst thing that
> can happen to you. Sure, refusing to wear a seatbelt can theoretically be
> classified as just another point on the risk tolerance spectrum, but that
> seems utterly contrived to me. Some things (maybe not that many) really
> are like that, or can at least be assumed to work that way as a practical
> matter.
>
> > But making everything slower will be a hard sell, because the vast
> > majority of workloads already running on Postgres don't have this issue
> > at all, so for them it's not worth the expense.
>
> I think that we're accepting too much risk here. But I bet it's also true
> that we're not taking enough risk in other areas. That was really my point
> with the insurance analogy -- we can afford to take lots of individual
> risks as long as they don't increase our exposure to truly disastrous
> outcomes -- by which I mean queries that might as well take forever to
> execute as far as the user is concerned. (Easier said than done, of
> course.)
> A simple trade-off between fast and robust doesn't seem like a universally
> helpful thing. Sometimes it's a very unhelpful way of looking at the
> situation. If you make something more robust to extreme bad outcomes, then
> you may have simultaneously made it *faster* (not slower) for all
> practical purposes. This can happen when the increase in robustness allows
> the user to tune the system aggressively, taking on only those new risks
> that they can truly live with (which wouldn't have been possible without
> the increase in robustness). For example, I imagine that the failsafe
> mechanism added to VACUUM will actually make it possible to tune VACUUM
> much more aggressively -- it might end up significantly improving
> performance for all practical purposes, even though technically it has
> nothing to do with performance.
>
> Having your indexes a little more bloated because the failsafe kicked in
> is a survivable event -- the DBA lives to fight another day, and *learns*
> to tune VACUUM/the app so it doesn't happen again and again. An
> anti-wraparound failure is perhaps not a survivable event -- the DBA gets
> fired. That really does seem like a fundamental difference to me.
>
> > Following the insurance analogy, selling tornado insurance in Europe is
> > mostly pointless.
>
> Principled skepticism of this kind of thing is of course necessary and
> welcome. It *could* be taken too far.
>
> > And the lack of data also plays a role - the insurance company will ask
> > for higher rates when it does not have enough accurate data about the
> > phenomenon, or when there are a lot of unknowns. Maybe this would allow
> > some basic measure of uncertainty, based on the number and type of
> > restrictions, joins, etc.
>
> I don't think that you can really model uncertainty. But you can have true
> certainty (or close to it) about a trade-off that makes the system
> fundamentally more robust over time. You can largely be certain about both
> the cost of the insurance and how it ameliorates the problem in at least
> some cases.
>
> > So maybe some fairly rough measure of uncertainty might work, and the
> > user might specify how much risk they're willing to tolerate.
>
> I think that most or all of the interesting stuff is where you have this
> extreme asymmetry -- places where it's much more likely to be true that
> basically everybody wants that. Kind of like wearing seatbelts -- things
> that we really can claim are a universal good without too much
> controversy. There might be as few as one or two things in the optimizer
> that this could be said of. But they matter.
>
> --
> Peter Geoghegan
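
P.S. Here is the minimal sketch promised above. The table, key name, and
row counts are invented for illustration; the point is only that ANALYZE
gathers no per-key statistics for a JSONB column, so predicates on JSONB
fields fall back to hardcoded default selectivities:

    CREATE TABLE events (
        id      bigserial PRIMARY KEY,
        user_id bigint NOT NULL,
        payload jsonb  NOT NULL
    );

    -- Every row has the same "type", so the predicate below actually
    -- matches the whole table.
    INSERT INTO events (user_id, payload)
    SELECT i % 1000, '{"type": "click"}'::jsonb
    FROM generate_series(1, 1000000) AS i;

    ANALYZE events;

    -- ANALYZE collected no statistics for keys inside "payload", so the
    -- planner uses a small hardcoded selectivity and estimates only a
    -- tiny fraction of the true row count. Underestimates like this are
    -- what push the planner into the pathological plans discussed above.
    EXPLAIN SELECT count(*) FROM events
    WHERE payload @> '{"type": "click"}';

The article's fix was to promote frequently queried JSONB fields into
ordinary columns, which do get real statistics.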
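
P.P.S. On Peter's failsafe point: one sketch of what "tuning VACUUM more
aggressively, with the failsafe as a backstop" could look like on
PostgreSQL 14, where the failsafe threshold is exposed as the
vacuum_failsafe_age GUC. The values here are purely illustrative, not
recommendations:

    -- Remove autovacuum throttling and wake it frequently (illustrative
    -- values only).
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
    ALTER SYSTEM SET autovacuum_naptime = '10s';
    -- If a table's relfrozenxid age ever exceeds this, VACUUM abandons
    -- cost-based delays and skips index vacuuming to finish as fast as
    -- possible -- bloated indexes instead of a wraparound outage.
    ALTER SYSTEM SET vacuum_failsafe_age = 1600000000;
    SELECT pg_reload_conf();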