On 7/16/25 20:18, Peter Geoghegan wrote:
> On Wed, Jul 16, 2025 at 1:42 PM Tomas Vondra <to...@vondra.me> wrote:
>> On 7/16/25 16:45, Peter Geoghegan wrote:
>>> I get that index characteristics could be the limiting factor,
>>> especially in a world where we're not yet eagerly reading leaf pages.
>>> But that in no way justifies just forgetting about prefetch distance
>>> like this.
>>>
>>
>> True. I think it's simply a matter of "no one really needed that yet",
>> so the read stream does not have a way to do that. I suspect Thomas
>> might have a WIP patch for that somewhere ...
>
> This seems really important.
>
> I don't fully understand why this appears to be less of a problem with
> the complex patch. Can you help me to confirm my understanding?
>
> I think that this "complex" patch code is relevant:
>
> static bool
> index_batch_getnext(IndexScanDesc scan)
> {
> ...
> /*
> * If we already used the maximum number of batch slots available, it's
> * pointless to try loading another one. This can happen for various
> * reasons, e.g. for index-only scans on all-visible table, or skipping
> * duplicate blocks on perfectly correlated indexes, etc.
> *
> * We could enlarge the array to allow more batches, but that's futile, we
> * can always construct a case using more memory. Not only it would risk
> * OOM, it'd also be inefficient because this happens early in the scan
> * (so it'd interfere with LIMIT queries).
> *
> * XXX For now we just error out, but the correct solution is to pause the
> * stream by returning InvalidBlockNumber and then unpause it by doing
> * read_stream_reset.
> */
> if (INDEX_SCAN_BATCH_FULL(scan))
> {
> DEBUG_LOG("index_batch_getnext: ran out of space for batches");
> scan->xs_batches->reset = true;
> }
>
> It looks like we're able to fill up quite a few batches/pages before
> having to give anything to the read stream. Is that all this is?
>
> We do still need to reset the read stream with the "complex" patch --
> I see that. But it's just much less of a frequent thing, presumably
> contributing to the performance advantages that we see for the
> "complex" patch over the "simple" patch from your testing. Does that
> seem like a fair summary?
>
Yes, sounds like a fair summary.
> BTW, don't think that we actually error-out here? Is that XXX comment
> block obsolete?
>
Right, obsolete comment.
>> So that's the other thing this probably needs to consider - some concept
>> of how much effort to invest into finding the next prefetchable block.
>
> I agree, of course. That's the main argument in favor of the "complex"
> design. Every possible cost/benefit is relevant (or may be), so one
> centralized decision that weighs all those factors seems like the way
> to go. We don't need to start with a very sophisticated approach, but
> I do think that we need a design that is orientated around this view
> of things from the start.
>
> The "simple" patch basically has all the same problems, but doesn't
> even try to address them. The INDEX_SCAN_BATCH_FULL thing is probably
> still pretty far from optimal, but at least all the pieces are there
> in one place. At least we're not leaving it up to chance index AM
> implementation details (i.e. leaf page boundaries) that have very
> little to do with heapam related costs/what really matters.
>
Perhaps, although I don't quite see why the simpler patch couldn't
address some of those problems (within the limit of a single leaf page,
of course). I don't think there's anything that's prevent collecting the
"details" somewhere (e.g. in the IndexScanDesc), and querying it from
the callbacks. Or something like that.
I understand you may see the "one leaf page" as a limitation of various
optimizations, and that's perfectly correct, ofc. I also saw it as a
crude limitation of how "bad" the things can go.
regards
--
Tomas Vondra