On Sun, Mar 17, 2024 at 3:21 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> On 3/14/24 22:39, Melanie Plageman wrote:
> > On Thu, Mar 14, 2024 at 5:26 PM Tomas Vondra
> > <tomas.von...@enterprisedb.com> wrote:
> >>
> >> On 3/14/24 19:16, Melanie Plageman wrote:
> >>> On Thu, Mar 14, 2024 at 03:32:04PM +0200, Heikki Linnakangas wrote:
> >>>> ...
> >>>>
> >>>> Ok, committed that for now. Thanks for looking!
> >>>
> >>> Attached v6 is rebased over your new commit. It also has the "fix" in
> >>> 0010 which moves BitmapAdjustPrefetchIterator() back above
> >>> table_scan_bitmap_next_block(). I've also updated the Streaming Read
> >>> API commit (0013) to Thomas' v7 version from [1]. This has the update
> >>> that we theorize should address some of the regressions in the
> >>> bitmapheapscan streaming read user in 0014.
> >>>
> >>
> >> Should I rerun the benchmarks with these new patches, to see if it
> >> really helps with the regressions?
> >
> > That would be awesome!
> >
>
> OK, here's a couple charts comparing the effect of v6 patches to master.
> These are from 1M and 10M data sets, same as the runs presented earlier
> in this thread (the 10M is still running, but should be good enough for
> this kind of visual comparison).
Thanks for doing this!

> What is even more obvious is that 0014 behaves *VERY* differently. I'm
> not sure if this is a good thing or a problem is debatable/unclear. I'm
> sure we don't want to cause regressions, but perhaps those are due to
> the prefetch issue discussed elsewhere in this thread (identified by
> Andres and Melanie). There are also many cases that got much faster, but
> the question is whether this is due to better efficiency or maybe the
> new code being more aggressive in some way (not sure).

Are these with the default effective_io_concurrency (1)? If so, the
"effective" prefetch distance in many cases will be higher with the
streaming read code applied. With effective_io_concurrency 1,
"max_ios" will always be 1, but the number of blocks prefetched may
exceed this (up to MAX_BUFFERS_PER_TRANSFER) because the streaming
read code is always trying to build bigger IOs. And, if prefetching,
it will prefetch IOs not yet in shared buffers before reading them.

It's hard to tell without going into a specific repro why this would
cause some queries to be much slower. In the forced bitmapheapscan, it
would make sense that more prefetching is worse -- which is why a
bitmapheapscan plan wouldn't have been chosen. But in the optimal
cases, it is unclear why it would be worse.

I don't think there is any way it could be the issue Andres identified,
because there is only one iterator -- nothing to get out of sync. It
could be that the fadvises are being issued too close to the reads and
aren't effective enough at covering up read latency on slower, older
hardware. But that doesn't explain why master would sometimes be
faster.

Probably the only thing we can do is get into a repro. It would, of
course, be easiest to do this with a serial query. I can dig into the
scripts you shared earlier and try to find a good repro.
Because the regressions may have shifted with Thomas' new version, it
would help if you shared a category (cyclic/uniform/etc, parallel or
serial, eic value, work mem, etc) where you now see the most
regressions.

- Melanie