Hello,
18.02.2023 04:06, Andres Freund wrote:
Hi,
On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
I still have no theory for how this condition was reached despite a
lot of time thinking about it and searching for more clues. As far as
I can tell, the recent improvements to postmaster's signal and event
handling shouldn't be related: the state management and logic was
unchanged.
Yea, it's all very odd.
If you look at the log:
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL: No valid
identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:
SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), File:
"../src/backend/storage/ipc/pmsignal.c", Line: 329, PID: 5948
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG: server process (PID 5948) was
terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT: See C include file
"ntstatus.h" for a description of the hexadecimal value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG: terminating any other active
server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG: all server processes
terminated; reinitializing
and that it's indeed the money test that failed:
money ... FAILED (test process exited with exit
code 2) 7337 ms
it's very hard to understand how this stack can come to be:
00000085`f03ffa40 00007ff6`fd89faa8 ucrtbased!abort(void)+0x5a
[minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
00000085`f03ffa80 00007ff6`fd6474dc postgres!ExceptionalCondition(
char * conditionName = 0x00007ff6`fdd03ca8
"PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED",
char * fileName = 0x00007ff6`fdd03c80
"../src/backend/storage/ipc/pmsignal.c",
int lineNumber = 0n329)+0x78
[c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff
postgres!MarkPostmasterChildActive(void)+0x7c
[c:\cirrus\src\backend\storage\ipc\pmsignal.c @ 329]
00000085`f03ffb00 00007ff6`fd59aa3a postgres!InitProcess(void)+0x2ef
[c:\cirrus\src\backend\storage\lmgr\proc.c @ 375]
00000085`f03ffb60 00007ff6`fd467689 postgres!SubPostmasterMain(
int argc = 0n3,
char ** argv = 0x000001c6`f3814e80)+0x33a
[c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9 postgres!main(
int argc = 0n3,
char ** argv = 0x000001c6`f3814e80)+0x2f9
[c:\cirrus\src\backend\main\main.c @ 192]
How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?
Maybe it's just the backend started for the money test has got
the same PID (5948) that the backend for the name test had?
A simple script that I've found [1] shows that the pids reused rather often
(for me, approximately each 300 process starts in Windows 10 H2), buy maybe
under some circumstances (many concurrent processes?) PIDs can coincide even
so often to trigger that behavior.
[1] https://superuser.com/questions/636497/does-windows-7-reuse-process-ids