Hi, Xuneng!

On Thu, Jun 26, 2025 at 4:31 PM Xuneng Zhou <xunengz...@gmail.com> wrote:
> Patch v7 modifies CompactCheckpointerRequestQueue() to process requests
> incrementally in batches of CKPT_REQ_BATCH_SIZE (10,000), similar to the
> approach used in AbsorbSyncRequests(). This limits memory usage from
> O(num_requests) to O(batch_size) for both hash tables and skip arrays.
>
> - Hash table memory is bounded by the batch size regardless of total queue size
> - Skip array allocation is limited to the batch size instead of max_requests
> - Prevents potential OOM conditions with very large request queues
>
> Trade-offs
>
> Cross-batch duplicate detection: the incremental approach won't detect
> duplicates spanning batch boundaries. This limitation seems acceptable since:
>
> - The main issue that needs to be addressed is preventing memory allocation
>   failures
> - Remaining duplicates are still handled by RememberSyncRequest() in the
>   sync subsystem
> - The purpose of this function is to make some room for new requests, not
>   to remove all duplicates
>
> Lock holding duration
>
> Andres pointed out [1] that compacting a very large queue takes considerable
> time, and holding the exclusive lock for an extended period makes it much
> more likely that backends will have to perform syncs themselves - which is
> exactly what CompactCheckpointerRequestQueue() is trying to avoid in the
> first place. However, releasing the lock between batches would introduce
> race conditions that would make the design much more complex. Given that
> the primary goal of this patch is to avoid large memory allocations, I keep
> the lock held for the whole function for simplicity for now.
>
> [1] https://postgrespro.com/list/id/bjno37ickfafixkqmd2lcyopsajnuig5mm4rg6tn2ackpqyiba@w3sjfo3usuos
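For readers following the thread, the batching described above boils down to
something like the standalone sketch below. It is only an illustration, not
the actual v7 patch: the names (Request, compact_in_batches, BATCH_SIZE) are
made up here, and the real CompactCheckpointerRequestQueue() works on the
shared-memory queue under the exclusive lock and uses a hash table plus skip
array rather than a linear "seen" scan. What the sketch does show is that the
auxiliary memory is bounded by the batch size, while duplicates spanning batch
boundaries survive.

/*
 * Standalone toy model of per-batch duplicate removal over a request queue.
 * NOT PostgreSQL code -- names are hypothetical.  BATCH_SIZE stands in for
 * CKPT_REQ_BATCH_SIZE (10,000) from the quoted mail; it is tiny here so the
 * cross-batch limitation is visible in the example output.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BATCH_SIZE 4

typedef struct Request
{
    uint32_t    key;            /* stands in for the (ftag, type) of a sync request */
} Request;

/* Drop duplicates within each batch, compact survivors in place, return new length. */
static int
compact_in_batches(Request *queue, int n)
{
    int         write = 0;

    for (int start = 0; start < n; start += BATCH_SIZE)
    {
        int         end = (start + BATCH_SIZE < n) ? start + BATCH_SIZE : n;
        bool        skip[BATCH_SIZE] = {false};   /* skip array bounded by batch size */
        Request     seen[BATCH_SIZE];             /* stand-in for a per-batch hash table */
        int         nseen = 0;

        /* mark duplicates within this batch only (keep the first occurrence here,
         * which is a simplification of what the real function does) */
        for (int i = start; i < end; i++)
        {
            bool        dup = false;

            for (int j = 0; j < nseen; j++)
            {
                if (seen[j].key == queue[i].key)
                {
                    dup = true;
                    break;
                }
            }
            if (dup)
                skip[i - start] = true;
            else
                seen[nseen++] = queue[i];
        }

        /* slide this batch's survivors down over the holes */
        for (int i = start; i < end; i++)
        {
            if (!skip[i - start])
                queue[write++] = queue[i];
        }
    }
    return write;               /* duplicates crossing batch boundaries remain */
}

int
main(void)
{
    Request     q[] = {{1}, {2}, {1}, {3}, {2}, {2}, {4}, {1}};
    int         n = compact_in_batches(q, (int) (sizeof(q) / sizeof(q[0])));

    for (int i = 0; i < n; i++)
        printf("%u ", (unsigned) q[i].key);
    printf("\n");               /* prints "1 2 3 2 4 1": cross-batch dups of 1 and 2 survive */
    return 0;
}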
I've reviewed v7 of the patch. I can note the following.

1) The patch makes CompactCheckpointerRequestQueue() process the queue in
batches, but doesn't release the lock between batches. The algorithm becomes
more complex, and it holds the lock for the same (or probably longer) time.
We trade the possibility of missing some duplicates for lower memory
consumption. However, with the MAX_CHECKPOINT_REQUESTS limit, the maximal
memory consumption shouldn't be too high anyway.
2) Even if we managed to release the lock between batches, we would still
need some additional coordination to prevent two processes from doing
CompactCheckpointerRequestQueue() simultaneously.
3) Another problem with releasing the lock is that, unlike in
AbsorbSyncRequests(), there is no work to do while not holding the lock.
Releasing the lock and immediately taking it back is of little use: other
processes would have almost no chance to grab the lock.

In light of all of the above, I think it's not reasonable to do
CompactCheckpointerRequestQueue() in batches. Sorry for the confusion.

Regarding gap filling, I've gathered from [1] that the order of requests
matters, which makes a gap-filling strategy impossible or too complex. That
point is withdrawn.

I think v6 is basically good enough. The v8 is attached. It's basically the
same as v6, but has a revised commit message and comments. The patch for
stable branches is also attached: it just limits the maximum size of the
checkpointer requests queue.

Links.
1. https://www.postgresql.org/message-id/CABPTF7XSSecQ-k7k9cQJsA3ACHmCVwdoRfv4DxOMom4cNQL%3D5Q%40mail.gmail.com

------
Regards,
Alexander Korotkov
Supabase
v8-0001-Process-sync-requests-incrementally-in-AbsorbSync.patch
Description: Binary data
v1-backpatch-0001-Limit-checkpointer-requests-queue-size.patch
Description: Binary data