On 18/04/2025 12:26 am, Dmitry Dolgov wrote:
> On Thu, Apr 17, 2025 at 02:21:07PM GMT, Konstantin Knizhnik wrote:
>> 1. Performance of the Postgres CLOCK page-eviction algorithm depends on
>> the number of shared buffers. My first naive attempt just to mark unused
>> buffers as invalid caused significant performance degradation.
> Thanks for sharing!
>
> Right, but this concerns the case when the number of shared buffers is
> high, independently of whether it was changed online or with a restart,
> correct? In that case it's out of scope for this patch.
>> 2. There are several data structures in Postgres whose size depends on
>> the number of buffers.
>> In my patch I used a dynamic shared buffer size in some cases, but if a
>> structure has to be allocated in shared memory, its maximal size still
>> has to be used. We have the buffers themselves (8 kB per buffer), then
>> the main BufferDescriptors array (64 B), the BufferIOCVArray (16 B),
>> checkpoint's CkptBufferIds (20 B), and the hashmap on the buffer cache
>> (24 B + 8 B/entry). 128 bytes per 8 kB buffer does not seem too large an
>> overhead (~1.5%), but it may become quite noticeable when the minimal
>> and maximal sizes differ by more than two orders of magnitude: e.g. to
>> support scaling from 0.5 GB to 128 GB, with 128 bytes/buffer we'd have
>> ~2 GiB of static overhead on only 0.5 GiB of actual buffers.
> Not sure what you mean by using a maximal size, could you elaborate?
> In the current patch those structures are allocated as before, except
> that each goes into a separate segment -- without any extra memory
> overhead as far as I can see.
Thank you for the explanation. I am sorry that I had not investigated
your patch closely before writing: it seemed to me that you were placing
only the contents of shared buffers in a separate segment.
Now I see that I was wrong, and this is actually the main difference
from the memory-ballooning approach I used. Since you allocate the
buffer descriptors and the hash table in the same segment,
there is no extra memory overhead.
The only drawback is that we lose the contents of shared buffers on
resize. That is unfortunate, but there does not seem to be a better
alternative.
But there are still some dependencies on shared buffers size which are
not addressed in this patch.
I am not sure how critical they are, or whether anything can be done
about them, but let me at least enumerate them:
1. Checkpointer: the maximal number of checkpointer requests depends on
NBuffers. So if we start with small shared buffers and then upscale, it
may cause too frequent checkpoints:

    Size
    CheckpointerShmemSize(void)
    {
        ...
        size = add_size(size, mul_size(NBuffers,
                                       sizeof(CheckpointerRequest)));
        ...
    }

    void
    CheckpointerShmemInit(void)
    {
        ...
        CheckpointerShmem->max_requests = NBuffers;
        ...
    }
2. XLOG: the number of xlog buffers is calculated from the number of
shared buffers:

    XLOGChooseNumBuffers(void)
    {
        ...
        xbuffers = NBuffers / 32;

This should not cause any errors, but it may be inefficient if, once
again, we start with tiny shared buffers.
3. AIO: the AIO max concurrency is also calculated from the number of
shared buffers:

    AioChooseMaxConcurrency(void)
    {
        ...
        max_proportional_pins = NBuffers / max_backends;

For small shared buffers (e.g. 1 MB) there will be no concurrency at all.
So none of these issues causes actual errors, only inefficient behavior.
But if we want to start with very small shared buffers and then increase
them on demand, it can be a problem.
In all three cases NBuffers is used not just to calculate some threshold
value, but also to determine the size of a structure in shared memory.
The straightforward solution is to place these structures in the same
segment as shared buffers, but I am not sure how difficult that would be
to implement.