Re: Andres Freund
> How confident are we that this isn't actually because we passed a bogus
> address to the kernel or such? With this patch, are *any* pages recognized as
> valid on the machines that triggered the error?

See upthread - the first 35 pages were ok, then a lot of -14.

> I wonder if we ought to report the failures as a separate "numa node"
> (e.g. NULL as node id) instead ...

Did that now, using N+1 (== 1 here) for errors in this Debian i386
environment (chroot on an amd64 host):

select * from pg_shmem_allocations_numa \crosstabview 

                      name                      │    0     │    1
────────────────────────────────────────────────┼──────────┼──────────
 multixact_offset                               │    69632 │    65536
 subtransaction                                 │   139264 │   131072
 notify                                         │   139264 │        0
 Shared Memory Stats                            │   188416 │   131072
 serializable                                   │   188416 │    86016
 PROCLOCK hash                                  │     4096 │        0
 FinishedSerializableTransactions               │     4096 │        0
 XLOG Ctl                                       │  2117632 │  2097152
 Shared MultiXact State                         │     4096 │        0
 Proc Header                                    │     4096 │        0
 Archiver Data                                  │     4096 │        0
.... more 0s in the last column ...
 AioHandleData                                  │  1429504 │        0
 Buffer Blocks                                  │ 67117056 │ 67108864
 Buffer IO Condition Variables                  │   266240 │        0
 Proc Array                                     │     4096 │        0
.... more 0s
(73 rows)
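
For reference, the bucketing behind that table can be sketched roughly as
below. This is only an illustration, not the code from the patch: the
function name bucket_pages_by_node and its parameters are made up here.
The one thing taken from move_pages() is the status convention, where a
non-negative status is a node id and a negative one is a negated errno
(-14 == -EFAULT).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: sum page sizes per NUMA
 * node from a move_pages()-style status array.
 *
 * status[i] >= 0 is a node id; status[i] < 0 is a negated errno
 * (e.g. -14 for EFAULT). Errors, and any node id outside 0..nnodes-1,
 * are counted in the extra slot bytes[nnodes], i.e. "N+1".
 *
 * bytes must have room for nnodes + 1 entries.
 */
static void
bucket_pages_by_node(const int *status, size_t npages, size_t page_size,
                     int nnodes, uint64_t *bytes)
{
    assert(nnodes > 0);

    /* Clear all node slots plus the trailing error slot. */
    for (int n = 0; n <= nnodes; n++)
        bytes[n] = 0;

    for (size_t i = 0; i < npages; i++)
    {
        int slot = (status[i] >= 0 && status[i] < nnodes) ? status[i] : nnodes;

        bytes[slot] += page_size;
    }
}
```

With one node and 4096-byte pages, 17 pages reported on node 0 followed
by 16 pages with status -14 give 69632 and 65536 bytes, which is the
multixact_offset row above.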


There is something fishy with pg_buffercache. After restarting PG, I'm
getting "Bad address" (errno 14), this time as the return value of
move_pages() itself rather than as a per-page status.

postgres =# select * from pg_buffercache_numa;
DEBUG:  00000: NUMA: NBuffers=16384 os_page_count=32768 os_page_size=4096
LOCATION:  pg_buffercache_numa_pages, pg_buffercache_pages.c:383
2025-06-23 19:41:41.315 UTC [1331894] ERROR:  failed NUMA pages inquiry: Bad address
2025-06-23 19:41:41.315 UTC [1331894] STATEMENT:  select * from pg_buffercache_numa;
ERROR:  XX000: failed NUMA pages inquiry: Bad address
LOCATION:  pg_buffercache_numa_pages, pg_buffercache_pages.c:394

Repeated calls are fine.

Maybe NUMA is just not supported on 32-bit archs, but I'd rather be
sure about that before playing that card.

Christoph

