On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra <to...@vondra.me> wrote:
> On 7/16/25 16:45, Peter Geoghegan wrote:
> > I get that index characteristics could be the limiting factor,
> > especially in a world where we're not yet eagerly reading leaf pages.
> > But that in no way justifies just forgetting about prefetch distance
> > like this.
>
> True. I think it's simply a matter of "no one really needed that yet",
> so the read stream does not have a way to do that. I suspect Thomas
> might have a WIP patch for that somewhere ...
This seems really important. I don't fully understand why this appears
to be less of a problem with the complex patch. Can you help me to
confirm my understanding?

I think that this "complex" patch code is relevant:

static bool
index_batch_getnext(IndexScanDesc scan)
{
    ...

    /*
     * If we already used the maximum number of batch slots available, it's
     * pointless to try loading another one. This can happen for various
     * reasons, e.g. for index-only scans on all-visible table, or skipping
     * duplicate blocks on perfectly correlated indexes, etc.
     *
     * We could enlarge the array to allow more batches, but that's futile, we
     * can always construct a case using more memory. Not only it would risk
     * OOM, it'd also be inefficient because this happens early in the scan
     * (so it'd interfere with LIMIT queries).
     *
     * XXX For now we just error out, but the correct solution is to pause the
     * stream by returning InvalidBlockNumber and then unpause it by doing
     * read_stream_reset.
     */
    if (INDEX_SCAN_BATCH_FULL(scan))
    {
        DEBUG_LOG("index_batch_getnext: ran out of space for batches");
        scan->xs_batches->reset = true;
    }

It looks like we're able to fill up quite a few batches/pages before
having to give anything to the read stream. Is that all this is?

We do still need to reset the read stream with the "complex" patch -- I
see that. But it's just a much less frequent thing, presumably
contributing to the performance advantages that we see for the
"complex" patch over the "simple" patch in your testing. Does that seem
like a fair summary?

BTW, I don't think that we actually error out here? Is that XXX comment
block obsolete?

> So that's the other thing this probably needs to consider - some concept
> of how much effort to invest into finding the next prefetchable block.

I agree, of course. That's the main argument in favor of the "complex"
design. Every possible cost/benefit is relevant (or may be), so one
centralized decision that weighs all of those factors seems like the
way to go.

We don't need to start with a very sophisticated approach, but I do
think that we need a design that is oriented around this view of things
from the start. The "simple" patch basically has all the same problems,
but doesn't even try to address them.

The INDEX_SCAN_BATCH_FULL thing is probably still pretty far from
optimal, but at least all the pieces are there in one place. At least
we're not leaving it up to incidental index AM implementation details
(i.e. leaf page boundaries) that have very little to do with
heapam-related costs/what really matters.

--
Peter Geoghegan
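
For concreteness, here is a rough sketch (not taken from either patch;
index_scan_stream_read_next, index_batch_next_block, and the
xs_batches->rs field are made-up names for illustration) of what the
pause/unpause cycle described in that XXX comment could look like with
the existing read stream API: the block number callback returns
InvalidBlockNumber once INDEX_SCAN_BATCH_FULL, which the stream treats
as end-of-stream, and the scan later calls read_stream_reset() to
resume once batch slots have been freed up:

#include "postgres.h"

#include "access/relscan.h"
#include "storage/read_stream.h"

/*
 * Block number callback for the index scan's read stream.  Returning
 * InvalidBlockNumber makes the stream stop issuing new reads, which is
 * effectively a "pause" until the stream is reset.
 *
 * INDEX_SCAN_BATCH_FULL and xs_batches->reset come from the complex
 * patch; index_batch_next_block is a hypothetical helper that returns
 * the next heap block for the scan (or InvalidBlockNumber at the end).
 */
static BlockNumber
index_scan_stream_read_next(ReadStream *stream,
                            void *callback_private_data,
                            void *per_buffer_data)
{
    IndexScanDesc scan = (IndexScanDesc) callback_private_data;

    if (INDEX_SCAN_BATCH_FULL(scan))
    {
        /* remember that the stream must be reset before it's used again */
        scan->xs_batches->reset = true;
        return InvalidBlockNumber;
    }

    return index_batch_next_block(scan);
}

/*
 * Called once the scan has consumed enough batches to free up slots:
 * "unpause" the stream so prefetching can continue from where the
 * callback left off.  xs_batches->rs is a hypothetical field holding
 * the scan's ReadStream.
 */
static void
index_scan_stream_resume(IndexScanDesc scan)
{
    if (scan->xs_batches->reset)
    {
        scan->xs_batches->reset = false;
        read_stream_reset(scan->xs_batches->rs);
    }
}

After the reset, the consumer just keeps calling
read_stream_next_buffer() as before, and the callback picks up from
wherever the scan's batch state says the next heap block is.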