Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera <alvhe...@alvh.no-ip.org> writes:
> > On 2020-Oct-29, Stephen Frost wrote:
> >> I do think it'd be good to find a way to check every once in a while
> >> even when we aren't going to delay though. Not sure what the best
> >> answer there is.
>
> > Maybe instead of thinking specifically in terms of vacuum, we could
> > count buffer accesses (read from kernel) and check the latch once every
> > 1000th such, or something like that. Then a very long query doesn't
> > have to wait until it's run to completion. The cost is one integer
> > addition per syscall, which should be bearable.
>
> I'm kind of unwilling to add any syscalls at all to normal execution
> code paths for this purpose. People shouldn't be sig-kill'ing the
> postmaster, or if they do, cleaning up the mess is their responsibility.
> I'd also suggest that adding nearly-untestable code paths for this
> purpose is a fine way to add bugs we'll never catch.

Not sure if either is at all viable, but I had a couple of thoughts
about other ways to possibly address this.

The first, simplistic idea is this: we have lots of processes that pick
up pretty quickly on the postmaster going away, because they check
whether it's still around while waiting for something else to happen
anyway (like the autovacuum launcher...), and we have CFIs in a lot of
places where it's reasonable to do a CFI but isn't alright to check for
postmaster death. While it'd be better if there were more platforms
where parent death would send a signal to the children, that doesn't
seem to be coming any time soon, so why don't we do it ourselves?

That is, when we discover that the postmaster has died, scan through
the proc array (carefully, since it could be garbage, but all we're
looking for are the PIDs of anything that might still be around) and
try sending a signal to any processes that are left. Those signals
would hopefully get delivered, and the other backends would discover
the signal through CFI and exit reasonably quickly.

The other thought I had was around trying to check for postmaster death
when we're about to do some I/O, which would probably catch a large
number of these cases too, though technically some process might stick
around for a while if it's only dealing with things that are already in
shared buffers, I suppose. It also seems complicated and expensive to
do.

> The if-we're-going-to-delay-anyway path in vacuum_delay_point seems
> OK to add a touch more overhead to, though.

Yeah, this certainly seems reasonable to do too, and on a well-run
system would likely be enough 90+% of the time.

Thanks,

Stephen
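
PS: a very rough sketch of the first idea, just to make the shape of it
concrete. This is not PostgreSQL code. The function name
signal_surviving_backends() and the plain array of PIDs are made up for
illustration; the array stands in for whatever PIDs we'd manage to pull
out of the proc array, and none of the real locking or shared-memory
details are modelled. The one point it tries to show is the "carefully"
part: a garbage PID of 0 or -1 would make kill() hit a whole process
group or everything we're allowed to signal, so those have to be
filtered out before we send anything.

```c
#include <sys/types.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `signo` to every plausible PID in `pids`.
 *
 * `pids` is an illustrative stand-in for PIDs read out of shared memory
 * that may contain garbage.  Entries <= 1 are skipped (0 and negative
 * values address process groups in kill(), and 1 is init), as is `self`.
 *
 * Returns the number of kill() calls that succeeded.  Failures, such as
 * ESRCH for a process that has already exited, are simply ignored.
 */
static size_t
signal_surviving_backends(const pid_t *pids, size_t npids,
                          pid_t self, int signo)
{
    size_t signaled = 0;

    for (size_t i = 0; i < npids; i++)
    {
        pid_t pid = pids[i];

        if (pid <= 1 || pid == self)
            continue;           /* garbage, init, or ourselves */

        if (kill(pid, signo) == 0)
            signaled++;
    }

    return signaled;
}
```

A process that notices the postmaster is gone would call this once with
its own PID as `self` and something like SIGTERM as `signo`, then exit
as usual. Passing 0 as `signo` only checks which PIDs are still around
without delivering anything.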