On Wed, May 21, 2025 at 10:09 AM Melanie Plageman <melanieplage...@gmail.com> wrote:
>
> On Tue, May 20, 2025 at 6:59 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>>
>> While I haven't identified how exactly the read stream is related to
>> this issue, what I've observed through debugging is that, during a
>> single read_stream_next_buffer() call, heap_vac_scan_next_block() is
>> invoked multiple times to return blocks to the read stream, and that
>> we continued processing eagerly scanned pages even after the success
>> counter reached zero while processing the previous block. Even when I
>> set both io_combine_limit and maintenance_io_concurrency to 1, the
>> debug logs showed that vacuum still performs eager scanning two blocks
>> ahead of the current block being processed.
>
> I spent some more time looking into this today. This assert no longer
> works with streaming vacuum.
>
> Even with the minimums, io_combine_limit set to 1 and
> maintenance_io_concurrency set to 0, we will still invoke
> read_stream_get_block() one extra time at the start of the read stream.
>
> To your point about eagerly scanning beyond the success and failure
> caps, it is possible, but it is hard to get into a situation where you
> far exceed them.
>
> Though max_pinned_buffers is derived from io_combine_limit *
> maintenance_io_concurrency, it is limited by the size of the buffer
> access strategy ring (by default 2 MB for vacuum) and the size of
> shared buffers. And because we won't prefetch if sequential access is
> detected (and the prefetch distance has to first ramp up), it is
> unlikely we will both prefetch the maximum distance of randomly located
> blocks and combine blocks to the maximum IO size before returning to
> vacuum code from read stream code. That is to say, the maximum number
> of extra all-visible blocks scanned should not be very large.
> We ran into a similar concern with the autoprewarm read stream user and
> decided the overrun would not be large enough to merit special handling.
>
> In this case, because we don't know whether we will successfully freeze
> eagerly scanned blocks until after they have been yielded by the read
> stream API, there is no way to use the caps to limit the prefetch
> distance or read combine size. Doing so would entail mixing logic into
> vacuum code about how the read stream API works -- at least as far as I
> can tell.
I concur with this analysis. Even in an extreme scenario where we set
high values for both io_combine_limit and maintenance_io_concurrency,
use a large shared buffer, and disable the ring buffer, the potential
overrun would likely not be huge.

> I think the best course of action is to either change the assert to a
> guard
>
>     if (vacrel->eager_scan_remaining_successes > 0)
>         vacrel->eager_scan_remaining_successes--;

I've attached a patch that uses this idea. Feedback is very welcome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
v1-0001-Fix-assertion-when-decrementing-eager_scan_remain.patch
Description: Binary data