On 6/23/25 23:47, Tomas Vondra wrote:
> ...
> 
> Or maybe the 32-bit chroot on 64-bit host matters and confuses some
> calculation.
>

I think it's likely something like this. I noticed that if I modify
pg_buffercache_numa_pages() to query the addresses one by one, it works.
When I increase the number of addresses per call, it stops working
somewhere between 16k and 17k items.

It may be a coincidence, but I suspect it's related to sizeof(void *)
being 8 in the kernel but only 4 in the chroot. So userspace passes an
array of 4-byte items, but the kernel interprets it as an array of
8-byte items. That is, we call

long move_pages(int pid, unsigned long count, void *pages[.count],
                const int nodes[.count], int status[.count], int flags);

which (I assume) just passes the parameters to the kernel, and the
kernel then interprets them according to its own pointer size.


If this is what's happening, I'm not sure what to do about it ...


FWIW, while looking into this I also tried running it under valgrind (on
a regular 64-bit system, not in the chroot), and I get this report:

==65065== Invalid read of size 8
==65065==    at 0x113B0EBE: pg_buffercache_numa_pages (pg_buffercache_pages.c:380)
==65065==    by 0x6B539D: ExecMakeTableFunctionResult (execSRF.c:234)
==65065==    by 0x6CEB7E: FunctionNext (nodeFunctionscan.c:94)
==65065==    by 0x6B6ACA: ExecScanFetch (execScan.h:126)
==65065==    by 0x6B6B31: ExecScanExtended (execScan.h:170)
==65065==    by 0x6B6C9D: ExecScan (execScan.c:59)
==65065==    by 0x6CEF0F: ExecFunctionScan (nodeFunctionscan.c:269)
==65065==    by 0x6B29FA: ExecProcNodeFirst (execProcnode.c:469)
==65065==    by 0x6A6F56: ExecProcNode (executor.h:313)
==65065==    by 0x6A9533: ExecutePlan (execMain.c:1679)
==65065==    by 0x6A7422: standard_ExecutorRun (execMain.c:367)
==65065==    by 0x6A7330: ExecutorRun (execMain.c:304)
==65065==    by 0x934EF0: PortalRunSelect (pquery.c:921)
==65065==    by 0x934BD8: PortalRun (pquery.c:765)
==65065==    by 0x92E4CD: exec_simple_query (postgres.c:1273)
==65065==    by 0x93301E: PostgresMain (postgres.c:4766)
==65065==    by 0x92A88B: BackendMain (backend_startup.c:124)
==65065==    by 0x85A7C7: postmaster_child_launch (launch_backend.c:290)
==65065==    by 0x860111: BackendStartup (postmaster.c:3580)
==65065==    by 0x85DE6F: ServerLoop (postmaster.c:1702)
==65065==  Address 0x7b6c000 is in a rw- anonymous segment


The invalid read happens here, in the pg_numa_touch_mem_if_required() call:

    for (char *ptr = startptr; ptr < endptr; ptr += os_page_size)
    {
        os_page_ptrs[idx++] = ptr;

        /* Only need to touch memory once per backend process */
        if (firstNumaTouch)
            pg_numa_touch_mem_if_required(touch, ptr);
    }

Address 0x7b6c000 is the very first pointer, and it's the only one that
triggers this warning. At first I thought something was wrong with how
we align the pointer using TYPEALIGN_DOWN(), but then I noticed it's
actually the pointer returned by BufferGetBlock(1).

So I'm a bit puzzled by this, and I'm not sure it's related to the other
issue at all (it probably is not).

It's a bit late here, so I'll continue investigating this tomorrow.

-- 
Tomas Vondra


