On Tue, Sep 6, 2011 at 6:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Tue, Sep 6, 2011 at 5:34 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> And I doubt
>>> that the goal is worth taking risks for.
>
>> I am unable to count the number of times that I have had a customer
>> come to me and say "well, the backend crashed".  And I go look at
>> their logs and I have no idea what happened.
>
> gdb and print debug_query_string?

Surely you're kidding.  These are customer systems which I frequently
don't even have access to.  They don't always have gdb installed
(sometimes they are Windows systems), and if they do, the customer
isn't likely to know how to use it.  Even if the customer does know,
they don't think the better of us for needing such a tool to
troubleshoot a crash.  Even if none of that were an issue, gdb is only
going to work if you attach it before the crash or have a core dump
available.  Typically you don't know the crash is going to happen, and
core dumps aren't enabled anyway.

> I don't dispute that this would be nice to have.  But I don't think that
> it's sane to compromise the postmaster's reliability in order to print
> information of doubtful accuracy.

In practice, I think very few crashes will clobber it.  A lot of
crashes are going to be caused by a null pointer dereference in some
random part of the program, an assertion failure, the OOM killer, etc.
It's certainly POSSIBLE that it could get clobbered, but it shouldn't
be very likely; and as Marti says, with proper defensive coding, the
worst case scenario if it does happen should be some log garbage.

> If you want to do something that doesn't violate the system's basic
> design goals, think about setting up a SIGSEGV handler that tries to
> print debug_query_string via elog before crashing.  It might well crash
> too, but it won't be risking taking out more of the database with it.

I don't think that's adequate.  You need to trap a lot more than just
SIGSEGV to catch all the crashes - there's also SIGABRT and SIGILL and
a bunch of other ones, including SIGKILL, which can't be caught at
all.  I think you really, really need something that executes outside
the context of the dying process.

TBH, I'm very unclear what could cause the postmaster to go belly-up
copying a bounded amount of data out of shared memory for logging
purposes only.  It's surely possible to make the code safe against any
sequence of bytes that might be found there.
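
To make "safe against any sequence of bytes" a bit more concrete, here
is a minimal sketch of the kind of bounded, defensive copy I have in
mind.  This is not existing PostgreSQL code; the function name
copy_untrusted_query and its parameters are made up for illustration.
It assumes the query text lives in a fixed-size shared memory slot
whose length the reader knows up front, so nothing in the slot itself
(such as a NUL terminator) is trusted to bound the read.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: copy a query string out of
 * a possibly clobbered shared memory slot into a local buffer.
 *
 * - Reads at most srclen bytes, so a missing NUL cannot cause an overrun.
 * - Writes at most dstlen bytes, always NUL-terminating the result.
 * - Replaces anything outside printable ASCII with '?', so the worst case
 *   is garbage in the log rather than control characters or bad encoding.
 *
 * Returns the number of bytes stored in dst, not counting the NUL.
 */
static size_t
copy_untrusted_query(char *dst, size_t dstlen,
                     const volatile char *src, size_t srclen)
{
    size_t      i;

    if (dstlen == 0)
        return 0;

    for (i = 0; i < srclen && i < dstlen - 1; i++)
    {
        /* Read each byte exactly once; the slot may change under us. */
        unsigned char c = (unsigned char) src[i];

        if (c == '\0')
            break;
        dst[i] = (c >= 0x20 && c < 0x7f) ? (char) c : '?';
    }
    dst[i] = '\0';
    return i;
}
```

The caller would pass a stack buffer and the fixed slot size, then log
whatever comes back.  Nothing here allocates memory, follows a pointer
found in shared memory, or loops for longer than the slot size.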

The only real danger seems to be that the memory access itself might
trigger a segmentation fault of some sort - but how is that going to
happen?  The child can't unmap the address space in the parent, can
it?  If it's a real danger, perhaps we could fork off a dedicated
child process just to read the relevant portion of shared memory and
emit a log message - but I'm not seeing what plausible scenario that
would guard against.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company