Re: [HACKERS] Crash dumps

Craig Ringer Mon, 04 Jul 2011 04:58:06 -0700

On 4/07/2011 7:03 PM, Radosław Smogura wrote:

Actually, what I was thinking about was adding dumping of GUCs,
etc. The list of mappings came from when I tried to mmap PostgreSQL;
because of many errors, which sometimes occurred in unexpected places, I
needed to add something that would be useful for me and easy to analyse
(I could simply find the pointer, and then check which region failed). The
idea to try to evolve this came later.

Why not produce a tool that watches the datadir for core files and processes them? Most but not all of the info you listed should be able to be extracted from a core file. Things like GUCs should be extractable with a bit of gdb scripting - and with much less chance of crashing than trying to read them from a possibly corrupt heap within a crashing backend.
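As a rough illustration of the "watch the datadir for core files" idea, here is a minimal sketch in C that scans a directory once and reports anything that looks like a core file. The function names (looks_like_core, scan_for_cores) and the "core" / "core.<suffix>" naming pattern are assumptions made for the example; the real file name depends on how the kernel's core pattern is configured, and a real tool would poll or use a file-change notification API rather than scan once.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the name is "core" or "core.<suffix>".
 * This naming pattern is an assumption, not something PostgreSQL defines. */
static int looks_like_core(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "core") == 0)
        return 1;
    return strncmp(name, "core.", 5) == 0 && name[5] != '\0';
}

/* Scan `dir` once and call `on_core` (if not NULL) with the path of every
 * entry that looks like a core file. Returns the number of matches, or -1
 * if the directory cannot be opened. */
static int scan_for_cores(const char *dir, void (*on_core)(const char *path))
{
    DIR *d;
    struct dirent *de;
    char path[4096];
    int found = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;

    while ((de = readdir(d)) != NULL) {
        if (!looks_like_core(de->d_name))
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        if (on_core != NULL)
            on_core(path);
        found++;
    }
    closedir(d);
    return found;
}
```

The callback is where the processing would go, for example running gdb in batch mode against the postgres executable and the core file to print a backtrace. Pulling GUC values out of the core would need extra gdb scripting that is not shown here.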

To capture any information not available from the core, you can enlist the postmaster's help. It gets notified when a child crashes and should be able to capture things like the memory and disk state. See void reaper(SIGNAL_ARGS) in postmaster.c and HandleChildCrash(...). If nothing else, the postmaster could probably fork a "child crashed" helper to collect data, analyse the core file, email the report to the admin, etc.
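To make the "parent notices the crash from the exit status" step concrete, here is a small standalone sketch using plain fork() and waitpid(). It is not PostgreSQL's reaper() or HandleChildCrash() logic; the names child_fate, classify_status and spawn_and_reap are invented for the example, and the child is a stand-in that simply raises a signal on itself.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum { CHILD_OK, CHILD_FAILED, CHILD_CRASHED } child_fate;

/* Interpret a wait status: death by signal counts as a crash. */
static child_fate classify_status(int status)
{
    if (WIFSIGNALED(status))
        return CHILD_CRASHED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return CHILD_OK;
    return CHILD_FAILED;
}

/* Fork a stand-in child that raises `sig` on itself (or exits cleanly if
 * sig == 0), reap it, and report how it ended. On a crash, *termsig is set
 * to the signal that killed the child; otherwise it is 0. */
static child_fate spawn_and_reap(int sig, int *termsig)
{
    int status = 0;
    pid_t pid, r;

    assert(termsig != NULL);
    *termsig = 0;

    pid = fork();
    if (pid < 0)
        return CHILD_FAILED;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);     /* fatal signals do not return from here */
        _exit(0);
    }

    /* Retry if the wait is interrupted by an unrelated signal. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r != pid)
        return CHILD_FAILED;

    if (WIFSIGNALED(status))
        *termsig = WTERMSIG(status);
    return classify_status(status);
}
```

In the CHILD_CRASHED branch a parent could fork the "child crashed" helper described above. By that point the crashed process has already been reaped, so only the core file and whatever state survives the process remain to inspect.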

About the only issue there is that the postmaster relies on the exit status to trigger the reaper code. Once an exit status is available, the crashed process is gone, so the free memory will reflect the memory state after the backend dies, and shared memory's state will have moved on from how it was when the backend was alive.

For that reason, it'd be handy if a backend could trap SIGSEGV and reliably tell the postmaster "I'm crashing!" so the postmaster could fork a helper to capture any additional info the backend needs to be alive for. Then the helper can gcore the backend, or the backend can just clear the SIGSEGV handler and send itself signal 11 (SIGSEGV) again to keep on crashing and generate a core.

Unfortunately, "reliably" and "segfault" don't go together. You don't want a crashing backend writing to shared memory, so it can't use shm to tell the postmaster it's dying. Signals are ... interesting ... at the best of times, but would probably still be the best bet. The postmaster could install a SIGUSR[whatever] or RT signal handler that takes a siginfo so it knows the pid of the signal sender. The crashing backend could signal the postmaster with an agreed signal to say "I'm crashing" and let the postmaster clean it up. The problem with this is that a lost signal (for any reason) would cause a zombie backend to hang around waiting to be killed by a postmaster that never heard it was crashing.

BTW, the win32 crash dump handler would benefit from being able to use some of the same facilities. In particular, being able to tell the postmaster "Argh, ogod I'm crashing, fork something to dump my core!" rather than trying to self-dump would be great. It'd also allow the addition of extra info like GUC data, last few lines of logs etc to the minidump, something that the win32 crash dump handler cannot currently do safely.


--
Craig Ringer

