Hello,
18.02.2023 04:06, Andres Freund wrote:
Hi,

On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
I still have no theory for how this condition was reached despite a
lot of time thinking about it and searching for more clues.  As far as
I can tell, the recent improvements to postmaster's signal and event
handling shouldn't be related: the state management and logic was
unchanged.
Yea, it's all very odd.

If you look at the log:

2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL:  No valid 
identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:  
SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File: 
"../src/backend/storage/ipc/pmsignal.c", Line: 329, PID: 5948
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  server process (PID 5948) was 
terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT:  See C include file 
"ntstatus.h" for a description of the hexadecimal value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  terminating any other active 
server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG:  all server processes 
terminated; reinitializing


and that it's indeed the money test that failed:
      money                        ... FAILED (test process exited with exit 
code 2)     7337 ms

it's very hard to understand how this stack can come to be:

00000085`f03ffa40 00007ff6`fd89faa8     ucrtbased!abort(void)+0x5a 
[minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
00000085`f03ffa80 00007ff6`fd6474dc     postgres!ExceptionalCondition(
                        char * conditionName = 0x00007ff6`fdd03ca8 
"PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED",
                        char * fileName = 0x00007ff6`fdd03c80 
"../src/backend/storage/ipc/pmsignal.c",
                        int lineNumber = 0n329)+0x78 
[c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff     
postgres!MarkPostmasterChildActive(void)+0x7c 
[c:\cirrus\src\backend\storage\ipc\pmsignal.c @ 329]
00000085`f03ffb00 00007ff6`fd59aa3a     postgres!InitProcess(void)+0x2ef 
[c:\cirrus\src\backend\storage\lmgr\proc.c @ 375]
00000085`f03ffb60 00007ff6`fd467689     postgres!SubPostmasterMain(
                        int argc = 0n3,
                        char ** argv = 0x000001c6`f3814e80)+0x33a 
[c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9     postgres!main(
                        int argc = 0n3,
                        char ** argv = 0x000001c6`f3814e80)+0x2f9 
[c:\cirrus\src\backend\main\main.c @ 192]

How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?
Maybe it's just the backend started for the money test has got
the same PID (5948) that the backend for the name test had?
A simple script that I've found [1] shows that the pids reused rather often
(for me, approximately each 300 process starts in Windows 10 H2), buy maybe
under some circumstances (many concurrent processes?) PIDs can coincide even
so often to trigger that behavior.

[1] https://superuser.com/questions/636497/does-windows-7-reuse-process-ids


Reply via email to