On Wed, Mar 6, 2024 at 6:47 PM Melanie Plageman <melanieplage...@gmail.com> wrote:
>
> Performance results:
>
> The TL;DR of my performance results is that streaming read vacuum is
> faster. However there is an issue with the interaction of the streaming
> read code and the vacuum buffer access strategy which must be addressed.
I have investigated the interaction between maintenance_io_concurrency,
streaming reads, and the vacuum buffer access strategy (BAS_VACUUM).

The streaming read API limits max_pinned_buffers to a pinned buffer
multiplier (currently 4) * maintenance_io_concurrency buffers, with the
goal of constructing reads of at least MAX_BUFFERS_PER_TRANSFER size.
Since the BAS_VACUUM ring buffer is 256 kB, or 32 buffers at the default
block size, that means that for a fully uncached vacuum in which all
blocks must be vacuumed and will be dirtied, you'd have to set
maintenance_io_concurrency to 8 or lower to see the same number of
buffer reuses (and the same shared buffer consumption) as master.

Given that we allow users to specify BUFFER_USAGE_LIMIT for vacuum, it
seems like we should cap max_pinned_buffers at a value that guarantees
the expected shared buffer usage by vacuum. But then
maintenance_io_concurrency would no longer have a predictable impact on
streaming read vacuum. What is the right thing to do here?

At the least, the default size of the BAS_VACUUM ring buffer should be
BLCKSZ * pinned_buffer_multiplier * default maintenance_io_concurrency
bytes (probably rounded up to the next power of two).

- Melanie