Hi,

On 2023-02-18 13:27:04 +1300, Thomas Munro wrote:
> I still have no theory for how this condition was reached despite a
> lot of time thinking about it and searching for more clues.  As far as
> I can tell, the recent improvements to postmaster's signal and event
> handling shouldn't be related: the state management and logic was
> unchanged.

Yea, it's all very odd.

If you look at the log:

2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name DETAIL:  No 
valid identifier after ".".
2023-02-08 00:53:20.175 GMT client backend[5948] pg_regress/name STATEMENT:  
SELECT parse_ident('xxx.1020');
...
TRAP: failed Assert("PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED"), 
File: "../src/backend/storage/ipc/pmsignal.c", Line: 329, PID: 5948
abort() has been called
...
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  server process (PID 5948) was 
terminated by exception 0xC0000354
2023-02-08 00:53:27.420 GMT postmaster[872] HINT:  See C include file 
"ntstatus.h" for a description of the hexadecimal value.
2023-02-08 00:53:27.420 GMT postmaster[872] LOG:  terminating any other active 
server processes
2023-02-08 00:53:27.434 GMT postmaster[872] LOG:  all server processes 
terminated; reinitializing


and that it's indeed the money test that failed:
     money                        ... FAILED (test process exited with exit 
code 2)     7337 ms

it's very hard to understand how this stack can come to be:

00000085`f03ffa40 00007ff6`fd89faa8     ucrtbased!abort(void)+0x5a 
[minkernel\crts\ucrt\src\appcrt\startup\abort.cpp @ 77]
00000085`f03ffa80 00007ff6`fd6474dc     postgres!ExceptionalCondition(
                        char * conditionName = 0x00007ff6`fdd03ca8 
"PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED", 
                        char * fileName = 0x00007ff6`fdd03c80 
"../src/backend/storage/ipc/pmsignal.c", 
                        int lineNumber = 0n329)+0x78 
[c:\cirrus\src\backend\utils\error\assert.c @ 67]
00000085`f03ffac0 00007ff6`fd676eff     
postgres!MarkPostmasterChildActive(void)+0x7c 
[c:\cirrus\src\backend\storage\ipc\pmsignal.c @ 329]
00000085`f03ffb00 00007ff6`fd59aa3a     postgres!InitProcess(void)+0x2ef 
[c:\cirrus\src\backend\storage\lmgr\proc.c @ 375]
00000085`f03ffb60 00007ff6`fd467689     postgres!SubPostmasterMain(
                        int argc = 0n3, 
                        char ** argv = 0x000001c6`f3814e80)+0x33a 
[c:\cirrus\src\backend\postmaster\postmaster.c @ 4962]
00000085`f03ffd90 00007ff6`fda0e1c9     postgres!main(
                        int argc = 0n3, 
                        char ** argv = 0x000001c6`f3814e80)+0x2f9 
[c:\cirrus\src\backend\main\main.c @ 192]

How can a process that we did notify crashing, that has already executed SQL
statements, end up in MarkPostmasterChildActive()?



> While failing to understand this, I worked[1] on CI log indexing tool
> with public reports that highlight this sort of thing[2], so I'll be
> watching out for more evidence.  Unfortunately I have no data from
> before 1 Feb (cfbot previously wasn't interested in the past at all;
> I'd need to get my hands on the commit IDs for earlier testing but I
> can't figure out how to get those out of Cirrus or Github -- anyone
> know how?).  FWIW I have a thing I call bfbot for slurping up similar
> data from the build farm.  It's not pretty enough for public
> consumption, but I do know that this assertion hasn't failed there,
> except the cases I mentioned earlier, and a load of failures on
> lorikeet which was completely b0rked until recently.

> [1] https://xkcd.com/974/
> [2] http://cfbot.cputube.org/highlights/assertion-90.html

I think this extremely useful.

Greetings,

Andres Freund


Reply via email to