On Sun, Aug 6, 2023 at 10:23 AM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > > The index AM is entitled to make certain assumptions of opclass > > members -- assumptions that cannot be made during expression > > evaluation.
> Thanks for reminding me, I keep forgetting about this. I was almost certain that you already knew that, actually. It's easy to forget such details in a discussion like this one, where the focus zooms out and then zooms back in, again and again. > Would it be possible to inspect the expression and determine if it's > "safe" to be treated almost like an index qual? Say, given a complex > expression, we might check if it consists only of expressions that could > be treated as an index qual. So a bit like leak-proof, which may > actually be relevant here too I guess (e.g. int4div is not leak-proof, > for example, so e.g. the division-by-zero would not allow index-qual > treatment). Clearly you're talking about a distinct set of guarantees to the ones that B-Tree opclasses make about not throwing errors when scanning maybe-not-visible index tuples. The B-Tree opclass guarantees might not even be written down anywhere -- they seem like common sense, almost. What you've described definitely seems like it could be very useful, but I don't think that it solves the fundamental problem with cases like the BitmapOr plan. Even if you do things in a way that precludes the possibility of extra heap page accesses (when the VM bit isn't set), you still have the problem of "access predicates vs index filter predicates". Which is a big problem, in and of itself. > Anyway, I think there are always going to be clauses that would be safe > to evaluate on the index, but the index AM does not know to handle them > for some reason. For example it might require extending the AM to handle > generic expressions, which doesn't seem quite desirable. Actually, I mostly don't think we'd need to teach nbtree or other index AMs anything about simplifying expressions. Structurally, we should try to make things like ScalarArrayOpExpr into "just another indexable operator", which has little to no difference with any other indexable operator at runtime. There probably are several good reasons why "normalizing to CNF" in the planner is a good idea (to some extent we do this already). Alena Rybakina's OR-to-SAOP transformation patch was written well before anybody knew about the MDAM/SAOP patch I'm working on. The original motivation was to lower expression evaluation overhead. You could probably find a third of even a fourth reason to do that specific transformation, if you thought about it for a while. Top-down planner designs such as Cascades really have to spend a lot of time on this kind of normalization process. For very general reasons -- many of which are bound to apply in our own bottom-up planner design. > So I think I see three points where we could evaluate expressions: > > 1) in the AM, as access preditates / index quals (doing this more often > is kinda what the SAOP patches aim to do) > > 2) in the index scan, before checking VM / visibility (if the qual is > safe to be evaluated early) > > 3) in the index scan, after checking VM / visibility (if the expression > does unsafe things) Agreed. Presumably there would also be a class of expressions that the patch should make into index filters rather than table filters, despite being unable to determine whether they're safe to evaluate early. Even if there is only a small chance of it helping at runtime, there is no reason (or infinitesimally small reason) to not to just do prefer index filters where possible -- so it's desirable to always prefer index filters, regardless of opclass/type restrictions on early evaluation. Right? Assuming the answer is yes, then I think that you still need all of the index-only scan stuff that can "fallback to heap access", just to be able to cover this other class of expression. I don't think that this class that I've described will be rarely encountered, or anything. > I agree. That seems like a discussion relevant to the general topic of > "upgrading" clauses. If anything, the patch may need to worry about > moving table filters to index filters, that's the only thing it does. Obviously that will have indirect consequences due to the changes in the costing. > > It is always better to make what could be an "index filter" into an > > index qual. Of course, the current practical problem for you is > > figuring out how to deal with that in cases like the BitmapOr case. > > Since it's not as if the planner is wrong, exactly...it really is the > > cheapest plan, so the planner is at least right on its own terms. I am > > interested in finding a solution to that problem. > > > > Well, if the planner is not wrong, what solution are we looking for? ;-) I imagine that you really don't want to have to rely on some wishy-washy philosophical argument about the planner's expectation being the only reasonable basis for choosing a plan. Just as I don't want to have to rely on some similarly hand-wavy argument about risk. The ideal outcome is one that doesn't require any of that, from either of us. I believe that the patch that I'm working on can allow us to totally avoid it. I hesitate to say this, since it might sound like I'm imposing conditions in a self-interested way. AFAICT it really does provide us with a practical way of just not having to go down the road that nobody wants to go down. So I am just about ready to say that I believe that that will end up being the solution we use. It just seems to make sense. By normalizing to CNF, the planner is given the ability to work with higher-level index paths, that abstract-away inessential "physical plan" differences. Suppose, for example, we're building index paths for a scan that comes from an SAOP that was generated through OR transformation. But, it's an index AM that lacks native support for SAOPs -- not nbtree. That index path will still end up using a BitmapOr, in the end, since it'll ultimately have to compensate for the lack of index AM infrastructure. So the final "physical plan" will be exactly the same as today -- the OR transformation will actually have changed nothing about the physical plan in these sorts of cases. The CNF transformation process just puts what was already true ("these two plans are logically equivalent") on a more formal footing -- and so implicitly avoids "risky plans" like the one we've discussed. We'll only be relying on the nbtree work to get those small efficiencies that come from avoiding duplicative primitive index scans. Since that was never actually a goal of your patch to begin with, limiting that benefit to nbtree scans (where we can do it without novel risks) seems more than acceptable. Since you're not relying on the nbtree work at all here, really (just on the transformation process itself), the strategic risk that this adds to your project isn't too great. It's not like this ties the success of your patch to the success of my own patch. At most it ties the success of your patch to something like Alena Rybakina's OR-to-SAOP transformation patch, which seems manageable to me. (To be clear, I'm not relying on that work in the same way myself -- for my patch the transformation process is just a nice-to-have.) > FWIW if the problem is the patch may make the new plan look cheaper than > some "actually better" plan (e.g. the BitmapOr one). In that case, we > could just keep the old costing (kinda assuming the worst case, but as > you said, the benefits are limited, while the risks are arbitrary). > That's the only idea I have. That's almost ideal for my patch. I should be able to pursue my patch without concern about your patch interfering too much -- presumably my patch will be able to generate the cheapest plan in cases like the BitmapOr case. It'll at least be slightly cheaper in physical reality, and now you'll artificially penalize the paths that your patch generates -- so cases like the BitmapOr case should do the right thing by seeing the SAOP path as cheaper during planning. But is that approach really ideal for your patch? I doubt it. Why wouldn't we want to give the index paths with index filters credit for being cheaper in almost all cases? That's why this doesn't feel like a costing issue to me (it's more of a "just don't do that" issue, I think). Your patch seems too important to nerf like this, even if it's convenient in some ways. -- Peter Geoghegan