On 2015-01-12 00:40:50 +0100, Andres Freund wrote:
> Fixed in what I've since pushed (as Heikki basically was ok with the
> patch sent a couple months back, modulo some fixes)...

I'd not actually pushed that patch... I had pushed some patches (barriers, atomics), but had decided to hold off on this one. I've now done so. I've mentioned the portability concerns over select() bugs in the commit message and in a comment. ATM I'm not inclined to add a relatively elaborate test for the bug on pretty fringe platforms.

Thanks for looking at this!

I plan to continue with committing
1) Commonalize process startup code
2) Add a default local latch for use in signal handlers
3) Use a nonblocking socket for FE/BE communication and block using latches
pretty soon.

As we already seem to assume that WaitLatch() is signal safe/reentrant (cf. walsender.c), I'm fine with committing 3) in isolation, without 4). I need a test that properly exercises catchup interrupts before committing 4). Once I have that test I plan to commit
4) Introduce and use infrastructure for interrupt processing during client reads.

I'd like some input from others on the problem that
5) "Process 'die' interrupts while reading/writing from a socket."
can reduce the likelihood of clients getting the error message. I personally think that's more than outweighed by not having backends stuck (save quickdie) for a long time when the client is gone/stuck. I think the middle ground in the patch, to only process die events when actually blocked in writes, reduces the likelihood of this sufficiently.

I have hacks on top of this to get rid of ImmediateInterrupt altogether, although I'm not sure how well this will work for some parts of auth/crypt.c. Everything else, including the deadlock checker, seems quite doable.

Andres