On Wed, May 21, 2025 at 10:09 AM Melanie Plageman
<melanieplage...@gmail.com> wrote:
>
>
> On Tue, May 20, 2025 at 6:59 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>>
>> While I haven't identified exactly how the read stream is related to this
>> issue, what I observed while debugging is that, during a single
>> read_stream_next_buffer() call, heap_vac_scan_next_block() is invoked
>> multiple times to return blocks to the read stream, and that we continue
>> processing eagerly scanned pages even after the success counter reaches
>> zero while processing the previous block. Even when I set both
>> io_combine_limit and maintenance_io_concurrency to 1, the debug logs
>> showed that vacuum still performs eager scanning two blocks ahead of the
>> current block being processed.
>
>
> I spent some more time looking into this today. This assert no longer works
> with streaming vacuum.
> Even with the minimums (io_combine_limit = 1 and maintenance_io_concurrency
> = 0), we will still invoke read_stream_get_block() one extra time at the
> start of the read stream.
>
> To your point about eagerly scanning beyond the success and failure caps,
> it is possible to exceed them, but it is hard to get into a situation where
> you far exceed them.
>
> Though max_pinned_buffers is derived from io_combine_limit * 
> maintenance_io_concurrency, it is limited by the size of the buffer access 
> strategy ring (by default 2 MB for vacuum) and the size of shared buffers. 
> And because we won't prefetch if sequential access is detected (and the 
> prefetch distance has to first ramp up), it is unlikely we will both prefetch 
> the maximum distance of randomly located blocks and combine blocks to the 
> maximum IO size before returning to vacuum code from read stream code. That 
> is to say, the maximum number of extra all-visible blocks scanned should not 
> be very large.
>
> We ran into a similar concern with the autoprewarm read stream user and 
> decided the overrun would not be large enough to merit special handling.
>
> In this case, because we don't know if we will successfully freeze eagerly 
> scanned blocks until after they have been yielded by the read stream API, 
> there is no way to use the caps to limit the prefetch distance or read 
> combine size. Doing so would entail mixing logic into vacuum code about how 
> the read stream API works -- at least as far as I can tell.

I concur with this analysis. Even in an extreme scenario where we set
high values for both io_combine_limit and maintenance_io_concurrency,
use large shared_buffers, and disable the ring buffer, the potential
overrun would likely not be huge.

>
> I think the best course of action is to either change the assert to a guard
>
> if (vacrel->eager_scan_remaining_successes > 0)
>      vacrel->eager_scan_remaining_successes--;

I've attached a patch that uses this idea. Feedback is very welcome.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: v1-0001-Fix-assertion-when-decrementing-eager_scan_remain.patch
