Re: index prefetching

Tomas Vondra Wed, 16 Jul 2025 12:00:53 -0700

On 7/16/25 20:18, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra <to...@vondra.me> wrote:
>> On 7/16/25 16:45, Peter Geoghegan wrote:
>>> I get that index characteristics could be the limiting factor,
>>> especially in a world where we're not yet eagerly reading leaf pages.
>>> But that in no way justifies just forgetting about prefetch distance
>>> like this.
>>>
>>
>> True. I think it's simply a matter of "no one really needed that yet",
>> so the read stream does not have a way to do that. I suspect Thomas
>> might have a WIP patch for that somewhere ...
> 
> This seems really important.
> 
> I don't fully understand why this appears to be less of a problem with
> the complex patch. Can you help me to confirm my understanding?
> 
> I think that this "complex" patch code is relevant:
> 
> static bool
> index_batch_getnext(IndexScanDesc scan)
> {
>     ...
>     /*
>      * If we already used the maximum number of batch slots available, it's
>      * pointless to try loading another one. This can happen for various
>      * reasons, e.g. for index-only scans on all-visible table, or skipping
>      * duplicate blocks on perfectly correlated indexes, etc.
>      *
>      * We could enlarge the array to allow more batches, but that's futile, we
>      * can always construct a case using more memory. Not only it would risk
>      * OOM, it'd also be inefficient because this happens early in the scan
>      * (so it'd interfere with LIMIT queries).
>      *
>      * XXX For now we just error out, but the correct solution is to pause the
>      * stream by returning InvalidBlockNumber and then unpause it by doing
>      * read_stream_reset.
>      */
>     if (INDEX_SCAN_BATCH_FULL(scan))
>     {
>         DEBUG_LOG("index_batch_getnext: ran out of space for batches");
>         scan->xs_batches->reset = true;
>     }
> 
> It looks like we're able to fill up quite a few batches/pages before
> having to give anything to the read stream. Is that all this is?
> 
> We do still need to reset the read stream with the "complex" patch --
> I see that. But it's just much less of a frequent thing, presumably
> contributing to the performance advantages that we see for the
> "complex" patch over the "simple" patch from your testing. Does that
> seem like a fair summary?
> 

Yes, sounds like a fair summary.

> BTW, don't think that we actually error-out here? Is that XXX comment
> block obsolete?
> 

Right, obsolete comment.

>> So that's the other thing this probably needs to consider - some concept
>> of how much effort to invest into finding the next prefetchable block.
> 
> I agree, of course. That's the main argument in favor of the "complex"
> design. Every possible cost/benefit is relevant (or may be), so one
> centralized decision that weighs all those factors seems like the way
> to go. We don't need to start with a very sophisticated approach, but
> I do think that we need a design that is orientated around this view
> of things from the start.
> 
> The "simple" patch basically has all the same problems, but doesn't
> even try to address them. The INDEX_SCAN_BATCH_FULL thing is probably
> still pretty far from optimal, but at least all the pieces are there
> in one place. At least we're not leaving it up to chance index AM
> implementation details (i.e. leaf page boundaries) that have very
> little to do with heapam related costs/what really matters.
> 

Perhaps, although I don't quite see why the simpler patch couldn't
address some of those problems (within the limit of a single leaf page,
of course). I don't think there's anything that's prevent collecting the
"details" somewhere (e.g. in the IndexScanDesc), and querying it from
the callbacks. Or something like that.

I understand you may see the "one leaf page" as a limitation of various
optimizations, and that's perfectly correct, ofc. I also saw it as a
crude limitation of how "bad" the things can go.


regards

-- 
Tomas Vondra
Re: index prefetching

Reply via email to