Parallel query vs smart shutdown and Postmaster death

Thomas Munro Sun, 24 Feb 2019 17:14:51 -0800

Hello hackers,

1.  In a nearby thread, I misdiagnosed a problem reported[1] by Justin
Pryzby (though my misdiagnosis is probably still a thing to be fixed;
see next).  I think I just spotted the real problem he saw: if you
execute a parallel query after a smart shutdown has been initiated,
you wait forever in gather_readnext()!  Maybe parallel workers can't
be launched in this state, but we lack code to detect this case?  I
haven't dug into the exact mechanism or figured out what to do about
it yet, and I'm tied up with something else for a bit, but I will come
back to this later if nobody beats me to it.


2.  Commit cfdf4dc4 on the master branch fixed up all known waits that
didn't respond to postmaster death, and added an assertion to that
effect.  One of the cases fixed was in gather_readnext(), and
initially I thought that's what Justin was telling us about (his
report was from 11.x), until I reread his message and saw that it was
SIGTERM and not, e.g., SIGKILL.  I should probably go and back-patch a fix
for that case anyway... but now I'm wondering, was there a reason for
that omission, and likewise for mq_putmessage()?

(Another case of missing PM death detection in the back-branches is
postgres_fdw.)
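
To make the distinction concrete, here is a minimal standalone POSIX C sketch of a wait that also responds to "postmaster death". It is not PostgreSQL code. It assumes a self-pipe standing in for the latch and a separate "postmaster alive" pipe whose write end is held only by the parent. This is loosely in the spirit of the pipe-based check PostgreSQL uses on Unix and of the WL_EXIT_ON_PM_DEATH handling on master. The names wait_latch_or_pm_death and enum wait_result are hypothetical, and the real WaitLatch()/WaitEventSet code differs in detail.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum wait_result { WAIT_LATCH_SET, WAIT_PM_DEATH, WAIT_ERROR };

/*
 * Wait with no timeout until either latch_fd has data (someone "set the
 * latch") or pm_alive_fd reports an event.  Nobody ever writes to the
 * postmaster-alive pipe, so any event on it means its write end has been
 * closed, i.e. the process holding it has exited.
 */
static enum wait_result
wait_latch_or_pm_death(int latch_fd, int pm_alive_fd)
{
    struct pollfd fds[2] = {
        {.fd = latch_fd, .events = POLLIN},
        {.fd = pm_alive_fd, .events = POLLIN},
    };

    for (;;)
    {
        if (poll(fds, 2, -1) < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: wait again */
            return WAIT_ERROR;
        }

        /* Postmaster death takes priority, even if the latch is also set. */
        if (fds[1].revents != 0)
            return WAIT_PM_DEATH;
        if (fds[0].revents & POLLIN)
            return WAIT_LATCH_SET;
        if (fds[0].revents != 0)
            return WAIT_ERROR;  /* POLLHUP/POLLERR/POLLNVAL on the latch fd */
    }
}
```

If only the latch fd were in the poll set, the same loop would block forever once the parent had exited and nobody was left to set the latch. That is the class of hang described above for the back-branch waits.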

[1] 
https://www.postgresql.org/message-id/CAEepm%3D0kMunPC0hhuT0VC-5dfMT3K-xsToJHkTznA6yrSARsPg%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com
