On Wed, Aug 9, 2023 at 9:05 AM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
> But in the example you shared yesterday, the problem is not really about
> visibility checks. In fact, the index scan costing completely ignores
> the VM checks - it didn't matter before, and the patch did not change
> this. It's about the number of rows the index scan is expected to
> produce - and those will always do a random I/O, we can't skip those.
I wasn't really talking about that example here. As I see it, the problem from my example is that plain index scans had an "unnatural" advantage over bitmap index scans. There was no actual reason why the system couldn't just deal with the inequality on the second column uniformly, so that index scans and bitmap index scans both filtered out all non-matches inexpensively, without heap access. Then the costing could have been quite off, and it still wouldn't have mattered at runtime, because the index scan and the bitmap index scan would do approximately the same thing in any case.

As I've said, it's obviously also true that there are many other cases where there really will be a "natural" advantage for index scans, one that bitmap index scans just cannot hope to offer. Those are the cases where the mechanism from your patch is best placed to be the thing that avoids heap accesses, or maybe even avoids all visibility checks (despite not using true index quals).

> Understood. I think this whole discussion is about figuring out these
> trade offs and also how to divide the various improvements into "minimum
> viable" changes.

That's exactly how I see it myself.

Obviously, there is still plenty of gray area here -- cases where it's not at all clear whether we should rely on the mechanism from your patch, or provide some alternative, more specialized mechanism instead. For example, I've made a lot out of simple != inequalities recently, but it's natural to wonder what that might mean for NOT IN ( ... ) SAOP inequalities. Am I also going to add specialized code that passes those down to the index AM? Where do you draw the line?

I totally accept that there is a significant amount of gray area, and that that's likely to remain true for the foreseeable future. But I also believe that there is a small but important number of things that are either exactly black or exactly white. If we can firmly agree on what those things are within days or weeks (which seems quite doable), then we'll have the right framework for figuring everything else out over time (possibly over multiple releases). We'll at least have the right shared vocabulary for discussing the problems, which is a very good start. I want a general structure that has the right concepts in place from the start -- that's all.

I also suspect that we'll discover that the many gray-area items tend to be far less important than the "exactly black" and "exactly white" items. So even if we can only agree that a small handful of things belong in either category, that small handful will likely be heavily overrepresented in real-world queries. For example, simple inequalities are very common -- it's surprising that nbtree can't already handle them directly. I should have thought of this myself, long ago, but it took your patch to force me to think about it. The problem with simple inequalities was "hiding in plain sight" for a very long time. Could there be anything else like that?

--
Peter Geoghegan
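
[For concreteness, a minimal sketch of the query shape discussed in the first paragraph of the reply. The table, index, and constants below are hypothetical, not the actual example from earlier in the thread; it only illustrates an equality on the leading index column combined with an inequality on the second column, plus the related NOT IN ( ... ) SAOP case.]

    -- Hypothetical schema, not the actual test case from the thread.
    CREATE TABLE t (a int, b int, payload text);
    CREATE INDEX t_a_b_idx ON t (a, b);

    -- "b <> 42" is not a native nbtree index qual, so whether the
    -- non-matching rows get filtered out cheaply (against index tuples)
    -- or only after a heap access depends on the chosen plan -- the
    -- asymmetry between plain index scans and bitmap index scans that
    -- the reply calls an "unnatural" advantage.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE a = 1 AND b <> 42;

    -- The related gray-area case: NOT IN ( ... ), which PostgreSQL
    -- treats as the SAOP form "b <> ALL (ARRAY[...])".
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM t WHERE a = 1 AND b NOT IN (41, 42, 43);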