Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
> I took this patch for a quick spin on macOS.  The result was that the 
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't 
> see any mentions of this anywhere in the thread, but that test is newer 
> than the beginning of this thread.  Can anyone confirm or deny this 
> issue?  Is it specific to macOS perhaps?

Yeah, I reproduced the problem on macOS Catalina (10.15.2), using today's
HEAD.  The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
things freeze up with four postmaster child processes stuck in loops that
each consume 100% CPU.  I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19 [opt]
    frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>) at latch.c:1034:7 [opt]
    frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>, user_data=<unavailable>) at latch.c:778:2 [opt]
    frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2 [opt]
    frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt]
    frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt]
    frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769 [opt]
    frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.

                        regards, tom lane

