On Tue, Aug 27, 2024 at 6:44 PM Tomas Vondra <to...@vondra.me> wrote:
> > One reason to do it this way is because it cuts down on index descent
> > costs, and other executor overheads. But it is likely that it will
> > also make prefetching itself more effective, too -- just because
> > prefetching will naturally end up with fewer, larger batches of
> > logically related work.
> >
>
> Perhaps.

I expect this to be particularly effective whenever there is naturally
occurring locality. I think that's fairly common. We'll sort the SAOP
array on the nbtree side, as we always do.

> So nestloop would pass down multiple values, the inner subplan
> would do whatever it wants (including prefetching), and then return the
> matching rows, somehow?

Right.

> It's not very clear to me how would we return
> the tuples for many matches, but it seems to shift the prefetching
> closer to the "normal" index prefetching discussed elsewhere.

It'll be necessary to keep track of which outer side rows relate to
which inner-side array values (within a given batch/block). Some new
data structure will be needed to manage that bookkeeping.

Currently, we deduplicate arrays for SAOP scans. I suppose that it works
that way because it's not really clear what it would mean for the scan
to have duplicate array keys. I don't see any need to change that for
block nested loop join/whatever this is.

We would have to use the new data structure to "pair up" outer side
tuples with their associated inner side result sets, at the end of
processing each batch/block. That way we avoid repeating the same inner
index scan within a given block/batch -- a little like with a memoize
node.
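To make that a bit more concrete, here is a minimal standalone sketch
(plain C, with ints standing in for array keys and outer row
identifiers, and with made-up names throughout -- a sketch of the idea,
not proposed patch code). Each deduplicated inner-side key remembers
which outer-side rows within the current batch supplied it, so a single
inner index scan's result set can be fanned back out to all of those
rows once the batch is processed:

/*
 * Simplified model of per-batch bookkeeping for a "block nested loop"
 * style index scan.  Keys and outer row identifiers are plain ints
 * here; in a real implementation they'd be Datums/tuple identifiers.
 * All names are hypothetical.
 */
#include <stdio.h>
#include <stdlib.h>

#define MAX_BATCH   64

typedef struct BatchEntry
{
    int     key;                    /* deduplicated inner-side array key */
    int     outer_rows[MAX_BATCH];  /* outer rows that supplied this key */
    int     nouter;
} BatchEntry;

typedef struct Batch
{
    BatchEntry  entries[MAX_BATCH];
    int         nentries;
} Batch;

/* Add one outer row's join key, deduplicating against earlier keys */
static void
batch_add(Batch *batch, int outer_row, int key)
{
    for (int i = 0; i < batch->nentries; i++)
    {
        if (batch->entries[i].key == key)
        {
            BatchEntry *e = &batch->entries[i];

            e->outer_rows[e->nouter++] = outer_row;
            return;                 /* duplicate key: no extra inner scan */
        }
    }

    batch->entries[batch->nentries].key = key;
    batch->entries[batch->nentries].outer_rows[0] = outer_row;
    batch->entries[batch->nentries].nouter = 1;
    batch->nentries++;
}

static int
cmp_entries(const void *a, const void *b)
{
    return ((const BatchEntry *) a)->key - ((const BatchEntry *) b)->key;
}

int
main(void)
{
    Batch   batch = {0};
    int     outer_keys[] = {42, 7, 42, 19, 7, 42};  /* duplicates on purpose */

    for (int row = 0; row < 6; row++)
        batch_add(&batch, row, outer_keys[row]);

    /* Sort the deduplicated keys, as we'd sort any SAOP array */
    qsort(batch.entries, batch.nentries, sizeof(BatchEntry), cmp_entries);

    /*
     * One inner index scan per distinct key; afterwards, "pair up" each
     * key's result set with every outer row recorded against it.
     */
    for (int i = 0; i < batch.nentries; i++)
    {
        BatchEntry *e = &batch.entries[i];

        printf("inner scan for key %d feeds %d outer row(s):", e->key, e->nouter);
        for (int j = 0; j < e->nouter; j++)
            printf(" %d", e->outer_rows[j]);
        printf("\n");
    }

    return 0;
}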
Obviously, that's the case where we can exploit naturally occurring
locality most effectively -- the case where multiple duplicate inner
index scans are literally combined into only one. But, as I already
touched on, locality will be important in a variety of cases, not just
this one.

--
Peter Geoghegan