Re: Adding basic NUMA awareness

Jakub Wartak Fri, 25 Jul 2025 03:27:59 -0700

On Thu, Jul 17, 2025 at 11:15 PM Tomas Vondra <[email protected]> wrote:
>
> On 7/4/25 20:12, Tomas Vondra wrote:
> > On 7/4/25 13:05, Jakub Wartak wrote:
> >> ...
> >>
> >> 8. v1-0005 2x + /* if (numa_procs_interleave) */
> >>
> >>    Ha! it's a TRAP! I've uncommented it because I wanted to try it out
> >> without it (just by setting GUC off) , but "MyProc->sema" is NULL :
> >>
> >>     2025-07-04 12:31:08.103 CEST [28754] LOG:  starting PostgreSQL
> >> 19devel on x86_64-linux, compiled by gcc-12.2.0, 64-bit
> >>     [..]
> >>     2025-07-04 12:31:08.109 CEST [28754] LOG:  io worker (PID 28755)
> >> was terminated by signal 11: Segmentation fault
> >>     2025-07-04 12:31:08.109 CEST [28754] LOG:  terminating any other
> >> active server processes
> >>     2025-07-04 12:31:08.114 CEST [28754] LOG:  shutting down because
> >> "restart_after_crash" is off
> >>     2025-07-04 12:31:08.116 CEST [28754] LOG:  database system is shut down
> >>
> >>     [New LWP 28755]
> >>     [Thread debugging using libthread_db enabled]
> >>     Using host libthread_db library 
> >> "/lib/x86_64-linux-gnu/libthread_db.so.1".
> >>     Core was generated by `postgres: io worker                     '.
> >>     Program terminated with signal SIGSEGV, Segmentation fault.
> >>     #0  __new_sem_wait_fast (definitive_result=1, sem=sem@entry=0x0)
> >> at ./nptl/sem_waitcommon.c:136
> >>     136     ./nptl/sem_waitcommon.c: No such file or directory.
> >>     (gdb) where
> >>     #0  __new_sem_wait_fast (definitive_result=1, sem=sem@entry=0x0)
> >> at ./nptl/sem_waitcommon.c:136
> >>     #1  __new_sem_trywait (sem=sem@entry=0x0) at ./nptl/sem_wait.c:81
> >>     #2  0x00005561918e0cac in PGSemaphoreReset (sema=0x0) at
> >> ../src/backend/port/posix_sema.c:302
> >>     #3  0x0000556191970553 in InitAuxiliaryProcess () at
> >> ../src/backend/storage/lmgr/proc.c:992
> >>     #4  0x00005561918e51a2 in AuxiliaryProcessMainCommon () at
> >> ../src/backend/postmaster/auxprocess.c:65
> >>     #5  0x0000556191940676 in IoWorkerMain (startup_data=<optimized
> >> out>, startup_data_len=<optimized out>) at
> >> ../src/backend/storage/aio/method_worker.c:393
> >>     #6  0x00005561918e8163 in postmaster_child_launch
> >> (child_type=child_type@entry=B_IO_WORKER, child_slot=20086,
> >> startup_data=startup_data@entry=0x0,
> >>         startup_data_len=startup_data_len@entry=0,
> >> client_sock=client_sock@entry=0x0) at
> >> ../src/backend/postmaster/launch_backend.c:290
> >>     #7  0x00005561918ea09a in StartChildProcess
> >> (type=type@entry=B_IO_WORKER) at
> >> ../src/backend/postmaster/postmaster.c:3973
> >>     #8  0x00005561918ea308 in maybe_adjust_io_workers () at
> >> ../src/backend/postmaster/postmaster.c:4404
> >>     [..]
> >>     (gdb) print *MyProc->sem
> >>     Cannot access memory at address 0x0
> >>
> >
> > Yeah, good catch. I'll look into that next week.
> >
>
> I've been unable to reproduce this issue, but I'm not sure what settings
> you actually used for this instance. Can you give me more details how to
> reproduce this?


Better late than never. Feel free to partially ignore me: I missed that
this is a known issue, as per the FIXME there. I would just rip out the
commented-out `if (numa_procs_interleave)` from FastPathLockShmemSize()
and PGProcShmemSize(), unless of course you want to save those memory
pages in the no-NUMA case. If you do want to save those pages, I think
we have a problem:

For the complete picture, the steps:

1. patch -p1 < v2-0001-NUMA-interleaving-buffers.patch
2. patch -p1 < v2-0006-NUMA-interleave-PGPROC-entries.patch

BTW the accidental pgbench indent is still there (part of the v2-0001 patch):
14 out of 14 hunks FAILED -- saving rejects to file
src/bin/pgbench/pgbench.c.rej

3. As I'm applying only 0001 and 0006, I got two simple rejects (caused
by skipping the numa_ freelist patches) and fixed them by hand. That's
intentional on my part, because I wanted to play with just those two.

4. Then I uncomment the two "if (numa_procs_interleave)" checks that make
the extra shared memory sizing optional (the add_size() calls and so on,
which have an XXX comment above them saying this causes bootstrap issues).

5. initdb with numa_procs_interleave=on and huge_pages=on (!), then start: it is OK.

6. Restart with numa_procs_interleave=off, which gets me every bg
worker crashing, e.g.:

(gdb) where
#0  __new_sem_wait_fast (definitive_result=1, sem=sem@entry=0x0) at
./nptl/sem_waitcommon.c:136
#1  __new_sem_trywait (sem=sem@entry=0x0) at ./nptl/sem_wait.c:81
#2  0x0000563e2d6e4d5c in PGSemaphoreReset (sema=0x0) at
../src/backend/port/posix_sema.c:302
#3  0x0000563e2d774d93 in InitAuxiliaryProcess () at
../src/backend/storage/lmgr/proc.c:995
#4  0x0000563e2d6e9252 in AuxiliaryProcessMainCommon () at
../src/backend/postmaster/auxprocess.c:65
#5  0x0000563e2d6eb683 in CheckpointerMain (startup_data=<optimized
out>, startup_data_len=<optimized out>) at
../src/backend/postmaster/checkpointer.c:190
#6  0x0000563e2d6ec363 in postmaster_child_launch
(child_type=child_type@entry=B_CHECKPOINTER, child_slot=249,
startup_data=startup_data@entry=0x0,
    startup_data_len=startup_data_len@entry=0,
client_sock=client_sock@entry=0x0) at
../src/backend/postmaster/launch_backend.c:290
#7  0x0000563e2d6ee29a in StartChildProcess
(type=type@entry=B_CHECKPOINTER) at
../src/backend/postmaster/postmaster.c:3973
#8  0x0000563e2d6f17a6 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x563e377cc0e0) at
../src/backend/postmaster/postmaster.c:1386
#9  0x0000563e2d4948fc in main (argc=3, argv=0x563e377cc0e0) at
../src/backend/main/main.c:231

notice sema=0x0, because:
#3  0x000056050928cd93 in InitAuxiliaryProcess () at
../src/backend/storage/lmgr/proc.c:995
995             PGSemaphoreReset(MyProc->sem);
(gdb) print MyProc
$1 = (PGPROC *) 0x7f09a0c013b0
(gdb) print MyProc->sem
$2 = (PGSemaphore) 0x0

or with printfs:

2025-07-25 11:17:23.683 CEST [21772] LOG:  in InitProcGlobal
PGPROC=0x7f9de827b880 requestSize=148770
// after proc && ptr manipulation:
2025-07-25 11:17:23.683 CEST [21772] LOG:  in InitProcGlobal
PGPROC=0x7f9de827bdf0 requestSize=148770 procs=0x7f9de827b880
ptr=0x7f9de827bdf0
[..initialization of aux PGPROCs i=0.., still from InitProcGlobal();
each gets a proper sem allocated, as one would expect:]
[..for i loop:]
2025-07-25 11:17:23.689 CEST [21772] LOG:  i=136 ,
proc=0x7f9de8600000, proc->sem=0x7f9da4e04438
2025-07-25 11:17:23.689 CEST [21772] LOG:  i=137 ,
proc=0x7f9de8600348, proc->sem=0x7f9da4e044b8
2025-07-25 11:17:23.689 CEST [21772] LOG:  i=138 ,
proc=0x7f9de8600690, proc->sem=0x7f9da4e04538
[..but then in the child code paths, out of the blue in
InitAuxiliaryProcess(), the whole MyProc looks as if it had been memset
to zeros:]
2025-07-25 11:17:23.693 CEST [21784] LOG:  auxiliary process using
MyProc=0x7f9de8600000 auxproc=0x7f9de8600000 proctype=0
MyProcPid=21784 MyProc->sem=(nil)

The process above got PGPROC slot i=136 at address 0x7f9de8600000, and
by the time that auxiliary process is launched something has NULLified
->sem there (according to gdb, everything is zero there).
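To narrow down when the slot gets wiped, one option is a small check that can be dropped next to the existing printfs. This is only a sketch: region_is_zeroed() is a made-up helper name, not something in the tree, and calling it on MyProc with sizeof(PGPROC) is an assumption about how one would use it.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical debugging helper (not part of PostgreSQL): return true if
 * all len bytes starting at ptr are zero.
 */
static bool
region_is_zeroed(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}
```

Logging region_is_zeroed(proc, sizeof(PGPROC)) at the end of InitProcGlobal() and again at the top of InitAuxiliaryProcess() would at least show on which side of the fork the contents disappear.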

7. The original v2-0006 patch (with both "if (numa_procs_interleave)"
checks commented out) behaves OK, so in my case here, with 1 NUMA node,
that gives add_size(.., (1+1) * 2MB), i.e. 4MB extra:

2025-07-25 11:38:54.131 CEST [23939] LOG:  in InitProcGlobal
PGPROC=0x7f25cbe7b880 requestSize=4343074
2025-07-25 11:38:54.132 CEST [23939] LOG:  in InitProcGlobal
PGPROC=0x7f25cbe7bdf0 requestSize=4343074 procs=0x7f25cbe7b880
ptr=0x7f25cbe7bdf0
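The two requestSize values are consistent with the 4MB figure from step 7. Here is a rough sketch of that arithmetic. The helper names are invented for illustration and this is not the patch's code; the "nodes + 1" pages of padding is inferred from the numbers in the logs above.

```c
#include <stddef.h>

/*
 * Hypothetical helper: extra shared memory reserved so that each NUMA
 * node's chunk can start on a huge-page boundary. The "+ 1" is the slack
 * page implied by the 4MB figure for a single node.
 */
static size_t
numa_padding_bytes(size_t numa_nodes, size_t page_size)
{
    return (numa_nodes + 1) * page_size;
}

/* Total request: the unpadded size plus the padding above. */
static size_t
padded_request_size(size_t base_size, size_t numa_nodes, size_t page_size)
{
    return base_size + numa_padding_bytes(numa_nodes, page_size);
}
```

With base_size = 148770 (the requestSize seen with the checks uncommented and the GUC off), one node and 2MB pages, this gives 148770 + 4194304 = 4343074, which matches the requestSize logged by the unmodified patch.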

So apparently something is zeroing out all those MyProc structures on
startup (maybe due to wrong alignment somewhere?). I was thinking about
trapping this single i=136 PGPROC at 0x7f9de8600000 via mprotect() to
see what is resetting it, but oh well, mprotect() works only on whole
pages...
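To put a number on that granularity problem, here is a small sketch of how far a page-sized trap would reach. The function names are made up. The 0x348 stride is taken from the difference between consecutive proc addresses in the log above, and treating it as the PGPROC slot size is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address down to the start of its page (page_size: power of 2). */
static uintptr_t
page_floor(uintptr_t addr, size_t page_size)
{
    return addr & ~((uintptr_t) page_size - 1);
}

/*
 * Bytes that would have to be protected to cover [addr, addr + len):
 * the range widened to whole pages on both ends.
 */
static size_t
protected_span(uintptr_t addr, size_t len, size_t page_size)
{
    uintptr_t start = page_floor(addr, page_size);
    uintptr_t end = page_floor(addr + len + page_size - 1, page_size);

    return (size_t) (end - start);
}

/*
 * Number of array slots of the given stride touched by a span that
 * starts at the (page-aligned) beginning of the array.
 */
static size_t
entries_in_span(size_t span, size_t stride)
{
    return (span + stride - 1) / stride;
}
```

With 4kB pages, protecting the 0x348-byte slot at 0x7f9de8600000 covers 4096 bytes and so touches 5 slots. If the granularity is the 2MB huge page from step 5 (which seems likely but is not verified here), the same trap touches 2497 slots, so any legitimate write to a neighbour would fault as well.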

-J.
