On 6/24/25 13:10, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Jun 24, 2025 at 11:20:15AM +0200, Tomas Vondra wrote:
>> On 6/24/25 10:24, Bertrand Drouvot wrote:
>>> Yeah, same for me with pg_get_shmem_allocations_numa(). It works if
>>> pg_numa_query_pages() is done on chunks <= 16 pages but fails if done on
>>> more than 16 pages.
>>>
>>> It's also confirmed by test_chunk_size.c attached:
>>>
>>> $ gcc-11 -m32 -o test_chunk_size test_chunk_size.c
>>> $ ./test_chunk_size
>>>  1 pages: SUCCESS (0 errors)
>>>  2 pages: SUCCESS (0 errors)
>>>  3 pages: SUCCESS (0 errors)
>>>  4 pages: SUCCESS (0 errors)
>>>  5 pages: SUCCESS (0 errors)
>>>  6 pages: SUCCESS (0 errors)
>>>  7 pages: SUCCESS (0 errors)
>>>  8 pages: SUCCESS (0 errors)
>>>  9 pages: SUCCESS (0 errors)
>>> 10 pages: SUCCESS (0 errors)
>>> 11 pages: SUCCESS (0 errors)
>>> 12 pages: SUCCESS (0 errors)
>>> 13 pages: SUCCESS (0 errors)
>>> 14 pages: SUCCESS (0 errors)
>>> 15 pages: SUCCESS (0 errors)
>>> 16 pages: SUCCESS (0 errors)
>>> 17 pages: 1 errors
>>> Threshold: 17 pages
>>>
>>> No error if -m32 is not used.
>>>
>>> We could work in chunks (of 16?) on 32-bit, but that would probably cause
>>> a performance degradation (we mention it in the doc though). Also, would
>>> 16 always be a correct chunk size?
>>
>> I don't see how this would solve anything?
>>
>> AFAICS the problem is that the two places are confused about how large the
>> array elements are, and end up interpreting that differently.
> 
>> I don't see how using a smaller array makes this correct. That it works is
>> more a matter of luck,
> 
> Not sure it's luck, maybe the wrong pointer arithmetic has no effect if the
> batch size is <= 16.
> 
> So we have kernel_move_pages() -> do_pages_stat() (because nodes is NULL
> here for us, as we call "numa_move_pages(pid, count, pages, NULL, status, 0);").
> 
> So, if we look at do_pages_stat() ([1]), we can see that it uses a hardcoded
> "#define DO_PAGES_STAT_CHUNK_NR 16UL" and that this pointer arithmetic:
> 
> "
>         pages += chunk_nr;
>         status += chunk_nr;
> "
> 
> is done but has no effect, since the loop over nr_pages exits after the
> first chunk if we use a batch size <= 16.
> 
> So if this pointer arithmetic is not correct (it seems it should advance by
> 16 * sizeof(compat_uptr_t) instead), then it has no effect as long as the
> batch size is <= 16.
> 
> Does test_chunk_size also fail at 17 for you?

Yes, it fails for me at 17 too. So you're saying the access within each
chunk of 16 elements is OK, but that maybe advancing to the next chunk
is not quite right? In which case limiting the access to 16 entries
might be a workaround.
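
Something like the sketch below is what I have in mind for that workaround.
It's only an illustration (the helper name and how exactly we'd wire it into
pg_get_shmem_allocations_numa() are made up), but the idea is to never hand
more than 16 pages to a single numa_move_pages() call:

#include <numa.h>               /* numa_move_pages() */

#define QUERY_CHUNK_SIZE 16     /* matches DO_PAGES_STAT_CHUNK_NR in the kernel */

/* illustrative helper, not the actual pg_numa_query_pages() */
static int
query_pages_chunked(int pid, unsigned long count, void **pages, int *status)
{
    unsigned long offset = 0;

    while (offset < count)
    {
        unsigned long chunk = count - offset;

        if (chunk > QUERY_CHUNK_SIZE)
            chunk = QUERY_CHUNK_SIZE;

        /* nodes == NULL means "just report the node each page is on" */
        if (numa_move_pages(pid, chunk, pages + offset, NULL,
                            status + offset, 0) < 0)
            return -1;

        offset += chunk;
    }

    return 0;
}

That still doesn't answer your question whether 16 would always be a correct
chunk size, though, it just happens to match the current
DO_PAGES_STAT_CHUNK_NR.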

In any case, this sounds like a kernel bug, right? I don't have much
experience with the kernel code, so don't want to rely too much on my
interpretation of it.
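
Still, to make that interpretation concrete, here's a tiny stand-alone
user-space program (not kernel code, and the names are made up) that only
prints the offsets involved. A 32-bit caller lays out 4-byte entries, while
advancing an array of 64-bit pointers by the chunk count steps 8 bytes per
entry, so the second chunk would be read from the wrong place:

#include <stdint.h>
#include <stdio.h>

#define CHUNK_NR 16     /* DO_PAGES_STAT_CHUNK_NR */

int
main(void)
{
    /* what a 32-bit caller hands in: 32 entries of 4 bytes each */
    _Alignas(void *) uint32_t compat_array[2 * CHUNK_NR];

    /* the kernel-side variable is typed as an array of 64-bit pointers */
    const void **pages = (const void **) compat_array;

    /* the first chunk is read at the right offsets (entries 0..15) */

    /* advancing by the chunk count moves CHUNK_NR * 8 bytes ... */
    pages += CHUNK_NR;

    /* ... but the caller put the 17th entry only CHUNK_NR * 4 bytes in */
    const uint32_t *expected = compat_array + CHUNK_NR;

    printf("second chunk read at offset %td, but it starts at offset %td\n",
           (char *) pages - (char *) compat_array,
           (char *) expected - (char *) compat_array);
    return 0;
}

That would line up with what we're seeing: everything within the first chunk
of 16 works, and page 17 is the first one to fail.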


regards

-- 
Tomas Vondra


