Hi, I’ve updated and rebased Maxim's v4 patch with the optimizations Heikki proposed, namely the ring buffer and the capped max_requests. Here is a summary of the changes in v5:
• Replaced the linear array model in AbsorbSyncRequests() with a bounded ring buffer to avoid large memmove() operations when processing sync requests.
• Introduced a tunable batch size (CKPT_REQ_BATCH_SIZE, default: 10,000) to incrementally absorb requests while respecting MaxAllocSize.
• Capped max_requests with a hard upper bound (MAX_CHECKPOINT_REQUESTS = 10,000,000) to avoid pathological memory growth when shared_buffers is very large.
• Updated CompactCheckpointerRequestQueue() to support the ring buffer layout.
• Retained the existing compacting logic but modified it to operate in-place over the ring.

> >> The patch itself looks ok to me. I'm curious about the trade-offs
> >> between this incremental approach and the alternative of
> >> using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach
> >> of splitting the requests into fixed-size slices avoids OOM
> >> failures or process termination by the OOM killer, which is good.
>
> +1
>
> >> However, it does add some overhead with additional lock acquisition/
> >> release cycles and memory movement operations via memmove(). The
> >> natural question is whether the security justify the cost.
>
> Hmm, if you turned the array into a ring buffer, you could absorb part
> of the queue without the memmove().
>
> With that, I'd actually suggest using a much smaller allocation, maybe
> 10000 entries or even less. That should be enough to keep the lock
> acquire/release overhead acceptable.
>
> >> Regarding the slice size of 1 GB, is this derived from
> >> MaxAllocSize limit, or was it chosen for other performance
> >> reasons? whether a different size might offer better performance
> >> under typical workloads?
> >
> > I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a
> > relatively old one, and no one expected the number of requests to exceed
> > 1 GB.
> > Now we have the ability to set the shared_buffers to a huge number
> > (without discussing now whether this makes any real sense), thus this
> > limit for palloc becomes a problem.
>
> Yeah. Frankly even 1 GB seems excessive for this. We set max_requests =
> NBuffers, which makes some sense so that you can fit the worst case
> scenario of quickly evicting all pages from the buffer cache. But even
> then I'd expect the checkpointer to be able to absorb the changes before
> the queue fills up. And we have the compaction logic now too.
>
> So I wonder if we should cap max_requests at, say, 10 million requests now?
>
> CompactCheckpointerRequestQueue() makes a pretty large allocation too,
> BTW, although much smaller than the one in AbsorbSyncRequests(). The
> hash table it builds could grow quite large though.

I'm not certain what the optimal cap for max_requests should be: 10 million (which would result in a peak temporary allocation of roughly 700 MB within CompactCheckpointerRequestQueue()), 1 million, or some other value. Introducing an upper bound seems important in any case; without one, max_requests scales with NBuffers and can lead to excessive temporary memory allocations. As this is my first patch submission, it may be somewhat naive and experimental; I appreciate your patience and feedback.
v5-0001-Process-sync-requests-incrementally-in-AbsorbSyncReq.patch
Description: Binary data