On Tue, May 20, 2025 at 6:59 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> While I haven't identified how exactly read stream is related to this
> issue, what I've observed through debugging this issue is, during a
> single read_stream_next_buffer() call, I observed that
> heap_vac_scan_next_block() is invoked multiple times to return blocks
> to the read stream and that we continued processing eagerly scanned
> pages even after the success counter reaches zero while processing the
> previous block. Even when I set both io_combine_limit and
> maintenance_io_concurrency to 1, the debug logs showed that vacuum
> still performs eager scanning two blocks ahead of the current block
> being processed.

I spent some more time looking into this today. This assert no longer
works with streaming vacuum. Even with the minimums (io_combine_limit
set to 1 and maintenance_io_concurrency set to 0), we will still invoke
read_stream_get_block() one extra time at the start of the read stream.

To your point about eagerly scanning beyond the success and failure
caps, it is possible, but it is hard to get into a situation where you
far exceed them. Though max_pinned_buffers is derived from
io_combine_limit * maintenance_io_concurrency, it is limited by the
size of the buffer access strategy ring (by default 2 MB for vacuum)
and the size of shared buffers. And because we won't prefetch if
sequential access is detected (and the prefetch distance has to ramp up
first), it is unlikely we will both prefetch the maximum distance of
randomly located blocks and combine blocks into the maximum IO size
before returning to vacuum code from read stream code. That is to say,
the maximum number of extra all-visible blocks scanned should not be
very large. We ran into a similar concern with the autoprewarm read
stream user and decided the overrun would not be large enough to merit
special handling.

In this case, because we don't know whether we will successfully freeze
eagerly scanned blocks until after they have been yielded by the read
stream API, there is no way to use the caps to limit the prefetch
distance or read combine size. Doing so would entail mixing logic about
how the read stream API works into vacuum code -- at least as far as I
can tell.

I think the best course of action is to either change the assert to a
guard

    if (vacrel->eager_scan_remaining_successes > 0)
        vacrel->eager_scan_remaining_successes--;

or make the counter signed and allow it to go negative (as you
mentioned).

- Melanie
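
For illustration only, a tiny standalone sketch (not from any patch;
the variable names below are made-up stand-ins for
vacrel->eager_scan_remaining_successes) of how the two options behave
when the read stream yields a couple of eagerly scanned blocks after
the cap is reached:

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        /*
         * Simulate the success cap reaching zero while the read stream has
         * already queued two more eagerly scanned blocks.  Decrementing an
         * unsigned counter unconditionally would wrap around to UINT32_MAX,
         * which is what the Assert was protecting against.
         */
        uint32_t    remaining_guarded = 1;  /* option 1: unsigned + guard */
        int64_t     remaining_signed = 1;   /* option 2: signed, may go negative */

        for (int block = 0; block < 3; block++)
        {
            /* Option 1: only decrement while the counter is still positive. */
            if (remaining_guarded > 0)
                remaining_guarded--;

            /*
             * Option 2: decrement unconditionally; a small negative value
             * just records how far the read stream overran the cap.
             */
            remaining_signed--;
        }

        printf("guarded: %u, signed: %lld\n",
               remaining_guarded, (long long) remaining_signed);
        return 0;
    }

Run as-is this prints "guarded: 0, signed: -2", which lines up with the
overrun of two blocks observed in the debug logs quoted above.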