On Wed, Aug 9, 2023 at 9:05 AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> But in the example you shared yesterday, the problem is not really about
> visibility checks. In fact, the index scan costing completely ignores
> the VM checks - it didn't matter before, and the patch did not change
> this. It's about the number of rows the index scan is expected to
> produce - and those will always do a random I/O, we can't skip those.
I wasn't really talking about that example here. As I see it, the problem from my example is that plain index scans had an "unnatural" advantage over bitmap index scans. There was no actual reason why the system couldn't just deal with the inequality on the second column uniformly, so that index scans and bitmap index scans both filtered out all non-matches inexpensively, without heap access. Then the costing could have been quite off, and it still wouldn't have mattered at runtime, because the index scan and the bitmap index scan would do approximately the same thing in any case.

As I've said, it's obviously also true that there are many other cases where there really will be a "natural" advantage for index scans, one that bitmap index scans just cannot hope to offer. Those are the cases where the mechanism from your patch is best placed to be the thing that avoids heap accesses, or maybe even avoids all visibility checks (despite not using true index quals).

> Understood. I think this whole discussion is about figuring out these
> trade offs and also how to divide the various improvements into "minimum
> viable" changes.

That's exactly how I see it myself.

Obviously, there is still plenty of gray area here -- cases where it's not at all clear whether we should rely on the mechanism from your patch, or provide some alternative, more specialized mechanism instead. For example, I've made a lot out of simple != inequalities recently, but it's natural to wonder what that might mean for NOT IN ( ... ) SAOP inequalities. Am I also going to add specialized code that passes those down to the index AM? Where do you draw the line?

I totally accept that there is a significant amount of gray area, and that that's likely to remain true for the foreseeable future. But I also believe that there is a small but important number of things that are either exactly black or exactly white. If we can firmly agree on what those things are within days or weeks (which seems quite doable), then we'll have the right framework for figuring everything else out over time (possibly over multiple releases). We'll at least have the right shared vocabulary for discussing the problems, which is a very good start. I want a general structure that has the right concepts in place from the start -- that's all.

I also suspect that we'll discover that the many gray-area items tend to be far less important than the "exactly black" and "exactly white" items. So even if we can only agree that a small handful of things belong in either category, that small handful will likely be heavily overrepresented in real-world queries. For example, simple inequalities are very common -- it's surprising that nbtree can't already handle them directly. I should have thought of this myself, long ago, but it took your patch to force me to think about it. The problem with simple inequalities was "hiding in plain sight" for a very long time. Could there be anything else like that?

--
Peter Geoghegan
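
[For concreteness, a minimal sketch of the query shape discussed in the first paragraph of the reply. The table, index, and constants below are hypothetical, not the actual example from earlier in the thread; it only illustrates an equality on the leading index column combined with an inequality on the second column, plus the related NOT IN ( ... ) SAOP case.]

    -- Hypothetical schema, not the actual test case from the thread.
    CREATE TABLE t (a int, b int, payload text);
    CREATE INDEX t_a_b_idx ON t (a, b);

    -- "b <> 42" is not a native nbtree index qual, so whether the
    -- non-matching rows get filtered out cheaply (against index tuples)
    -- or only after a heap access depends on the chosen plan -- the
    -- asymmetry between plain index scans and bitmap index scans that
    -- the reply calls an "unnatural" advantage.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE a = 1 AND b <> 42;

    -- The related gray-area case: NOT IN ( ... ), which PostgreSQL
    -- treats as the SAOP form "b <> ALL (ARRAY[...])".
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE a = 1 AND b NOT IN (41, 42, 43);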