Hi, I’ve updated and rebased Maxim's v4 patch with the optimizations Heikki proposed, namely the ring buffer and the capped max_requests. Here is a summary of the changes in v5:
• Replaced the linear array model in AbsorbSyncRequests() with a bounded ring buffer to avoid large memmove() operations when processing sync requests.
• Introduced a tunable batch size (CKPT_REQ_BATCH_SIZE, default: 10,000) to incrementally absorb requests while respecting MaxAllocSize.
• Capped max_requests with a hard upper bound (MAX_CHECKPOINT_REQUESTS = 10,000,000) to avoid pathological memory growth when shared_buffers is very large.
• Updated CompactCheckpointerRequestQueue() to support the ring buffer layout.
• Retained the existing compacting logic but modified it to operate in-place over the ring.

> >> The patch itself looks ok to me. I'm curious about the trade-offs
> >> between this incremental approach and the alternative of
> >> using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach
> >> of splitting the requests into fixed-size slices avoids OOM
> >> failures or process termination by the OOM killer, which is good.
>
> +1
>
> >> However, it does add some overhead with additional lock acquisition/
> >> release cycles and memory movement operations via memmove(). The
> >> natural question is whether the security justify the cost.
>
> Hmm, if you turned the array into a ring buffer, you could absorb part
> of the queue without the memmove().
>
> With that, I'd actually suggest using a much smaller allocation, maybe
> 10000 entries or even less. That should be enough to keep the lock
> acquire/release overhead acceptable.
>
> >> Regarding the slice size of 1 GB, is this derived from
> >> MaxAllocSize limit, or was it chosen for other performance
> >> reasons? whether a different size might offer better performance
> >> under typical workloads?
> >
> > I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a
> > relatively old one, and no one expected the number of requests to exceed
> > 1 GB.
> > Now we have the ability to set the shared_buffers to a huge number
> > (without discussing now whether this makes any real sense), thus this
> > limit for palloc becomes a problem.
>
> Yeah. Frankly even 1 GB seems excessive for this. We set max_requests =
> NBuffers, which makes some sense so that you can fit the worst case
> scenario of quickly evicting all pages from the buffer cache. But even
> then I'd expect the checkpointer to be able to absorb the changes before
> the queue fills up. And we have the compaction logic now too.
>
> So I wonder if we should cap max_requests at, say, 10 million requests now?
>
> CompactCheckpointerRequestQueue() makes a pretty large allocation too,
> BTW, although much smaller than the one in AbsorbSyncRequests(). The
> hash table it builds could grow quite large though.

I'm not certain what the optimal cap for max_requests should be: 10 million (which would result in a peak temporary allocation of roughly 700 MB within CompactCheckpointerRequestQueue()), 1 million, or some other value. Introducing an upper bound seems important in any case; without one, max_requests scales with NBuffers and can lead to excessive temporary memory allocations. As this is my first patch submission, it may be somewhat naive and experimental; I appreciate your patience and feedback.
v5-0001-Process-sync-requests-incrementally-in-AbsorbSyncReq.patch
Description: Binary data