On 4/6/25 01:00, Andres Freund wrote:
> Hi,
> 
> On 2025-04-05 18:29:22 -0400, Andres Freund wrote:
>> I think one thing that the docs should mention is that calling the numa
>> functions/views will force the pages to be allocated, even if they're
>> currently unused.
>>
>> Newly started server, with s_b of 32GB and 2MB huge pages:
>>
>>   grep ^Huge /proc/meminfo
>>   HugePages_Total:   34802
>>   HugePages_Free:    34448
>>   HugePages_Rsvd:    16437
>>   HugePages_Surp:        0
>>   Hugepagesize:       2048 kB
>>   Hugetlb:        76517376 kB
>>
>> run
>>   SELECT node_id, sum(size) FROM pg_shmem_allocations_numa GROUP BY node_id;
>>
>> Now the pages that previously were marked as reserved are actually allocated:
>>
>>   grep ^Huge /proc/meminfo
>>   HugePages_Total:   34802
>>   HugePages_Free:    18012
>>   HugePages_Rsvd:        1
>>   HugePages_Surp:        0
>>   Hugepagesize:       2048 kB
>>   Hugetlb:        76517376 kB
>>
>>
>> I don't see how we can avoid that right now, but at the very least we ought
>> to document it.
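For scale, the two /proc/meminfo snapshots quoted above can be compared directly. This is only a small sketch in Python: the helper name is made up for illustration, and the numbers are copied from the quoted output.

```python
def parse_meminfo_huge(text):
    """Parse 'HugePages_*: N' lines of /proc/meminfo-style text into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key.startswith("HugePages_"):
            counters[key] = int(value.split()[0])
    return counters


# Snapshots copied from the quoted output (before and after the query).
BEFORE = """
HugePages_Total:   34802
HugePages_Free:    34448
HugePages_Rsvd:    16437
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

AFTER = """
HugePages_Total:   34802
HugePages_Free:    18012
HugePages_Rsvd:        1
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

HUGE_PAGE_KB = 2048  # from the Hugepagesize line

before = parse_meminfo_huge(BEFORE)
after = parse_meminfo_huge(AFTER)

# Pages that left the free pool, and reservations that became real allocations.
newly_allocated = before["HugePages_Free"] - after["HugePages_Free"]
reservation_drop = before["HugePages_Rsvd"] - after["HugePages_Rsvd"]
allocated_gib = newly_allocated * HUGE_PAGE_KB / (1024 * 1024)

print(newly_allocated, reservation_drop, round(allocated_gib, 2))
```

Both deltas come out to 16436 pages, about 32.1 GiB, which is in the same ballpark as the 32GB s_b setting.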
> 
> The only allocation where that really matters is shared_buffers. I wonder if
> we could special case the logic for that, by only probing if at least one of
> the buffers in the range is valid.
> 
> Then we could treat a page status of -ENOENT as "page is not mapped" and
> display NULL for the node_id?
> 
> Of course that would mean that we'd always need to
> pg_numa_touch_mem_if_required(), not just the first time round, because we
> previously might not have for a page that is now valid.  But compared to the
> cost of actually allocating pages, the cost for that seems small.
> 
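To make the quoted proposal concrete, here is a rough sketch in Python rather than the actual C code. It assumes per-page status values like those a move_pages(2)-style query reports (a node number, or a negative errno such as -ENOENT for a page that is not present). The function names and the per-buffer validity flags are hypothetical, not taken from the patch.

```python
import errno
import os


def should_touch(buffer_valid_flags):
    """Proposed special case: only touch/probe a page if at least one
    buffer in its range is valid (flags are hypothetical inputs)."""
    return any(buffer_valid_flags)


def report_node_id(status):
    """Map a per-page status to the value shown in node_id.

    None stands in for SQL NULL, i.e. "page is not mapped".
    """
    if status >= 0:
        return status                 # page is resident on this NUMA node
    if status == -errno.ENOENT:
        return None                   # page not present: report NULL
    # Any other negative status is treated as a real error in this sketch.
    raise OSError(-status, os.strerror(-status))


# Example: three pages, the middle one never touched.
statuses = [0, -errno.ENOENT, 1]
print([report_node_id(s) for s in statuses])
```

With this mapping the view would show NULL for untouched pages instead of forcing their allocation, which is the behavior the reply below argues against.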

I don't think this would be a good trade-off. The buffers already have a
NUMA node, and users would be interested in that. It's just that we
don't have the memory mapped in the current backend, so I'd bet people
would not be happy with NULL, and would proceed to force the allocation
in some other way (say, a large query of some sort). Which obviously
causes a lot of other problems.

I can imagine having a flag that makes the allocation optional, but
there's no convenient way to pass that to a view, and I think most
people want the allocation anyway.

That's especially true for monitoring, which usually happens in a new
connection, so the backend has little opportunity to allocate the pages
"naturally."

regards

-- 
Tomas Vondra


