On 22/11/2010 7:37 PM, Magnus Hagander wrote:

Finally getting to looking at this one - sorry about the very long delay.

Ditto, I'm afraid.

I agree with Heikki's earlier comment that it's better to have this
included in the backend - but that's obviously not going to happen for
already-released versions. I'd therefor advocate a contrib module for
existing versions, and then in core for 9.1+.

I think that was the conclusion the following discussion came to, as well. It's easy to distribute a binary .dll for older versions and if the EDB folks found this facility really useful they could even bundle it in the installer. There's no point backpatching it given how few win32 users will build from source.

We should then have an option to turn it off (on by default). But we
don't need to pay the overhead on every backend startup - we could
just map the value into the parameter shared memory block, and require
a full postmaster restart in order to change it.

As discussed later, implemented by testing for a 'crashdumps' directory. If the directory exists, a dump is written. If the directory is missing a log message is emitted to explain why no dump was written; as this'll only happen when a backend dies, log noise isn't an issue and it'll help save sysadmin head-scratching given the "magic" behaviour of turning on/off based on directory existence.

Do we want to backpatch it into contrib/? Adding a new module there
seems kind of wrong - probably better to keep the source separate and
just publish the DLL files for people who do debugging?

+1 to just publishing the DLLs

Looking at the code:
* Why do we need to look at differnt versions of dbghelp.dll? Can't we
always use the one with Windows? And in that case, can't we just use
compile-time linking, do we need to bother with DLL loading at all?

Newer versions than those shipped with an OS release contain updated and improved functionality as well as bug fixes. We *can* use the version shipped with the OS, as even in XP dbghelp.dll contains the required functionality, but it'd be better to bundle it. That's microsoft's recommendation when you use dbghelp.dll .

* Per your comment about using elog() - a good idea in that case is to
use write_stderr(). That will send the output to stderr if there is
one, and otherwise the eventlog (when running as service).

Hm, ok. The attached patch doesn't do that yet. Given the following comments, do you think the change still necessary?

* And yes, in the crash handler, we should *definitely* not use elog().

Currently I'm actually using it, but only after the dump has been written or we've decided not to write a dump. At this point, if elog() fails we don't really care, we'd just be returning control to the OS's fatal exception cleanup anyway. It's IMO desirable for output to go to the same log as everything else, so admins can find out what's going on and can associate the crash dump files with events that happened around a certain time.

* On Unix, the core file is dropped in the database directory, we
don't have a separate directory for crashdumps. If we want to be
consistent, we should do that here too. I do think that storing them
in a directory like "crashdumps" is better, but I just wanted to raise
the comment.

Given the recent discussion about using the dir as an on/off switch, that'll be necessary.

* However, when storing it in crashdumps, I think the code would need
to create that directory if it does not exist, doesn't it?

Not now that we're using it an on/off switch, but of course it would've made sense otherwise.

* Right now, we overwrite old crashdumps. It is well known that PIDs
are recycled pretty quickly on Windows - should we perhaps dump as
postgres-<pid>-<sequence>.mdmp when there is a filename conflict?

Good idea. Rather than try to test for existing files I've just taken the simple route (always good in code run during process crash) and appended the system ticks to the dump name. System ticks are a 32-bit quantity that wraps about every 40 days, so they should be easily unique enough in combination with the pid. The logs will report the crashdump filenames unless the backend is too confused to be able to elog(), and all the crash dump files have file system timestamps so there isn't much point trying to format a crash date/time into their names.

I could use wall clock seconds (or ms) instead, but favour system ticks because they can be read with a simple int-returning call that doesn't require any more stack allocation. On Windows, getting wall clock time in seconds seems to involve a call to GetSystemTimeAsFileTime (http://msdn.microsoft.com/en-us/library/ms724397(v=VS.85).aspx) to populate a struct containing a fake-64-bit int as two 32-bit ints. It's not clear that it's free of timezone calculations and generally looks like something better avoided in a crash handler if it's not necessary. As GetSystemTicks seems quite sufficient, I thought I'd stick with it.

--
Craig Ringer
>From 989a833a3d3657e5587372b5fe20740bc129df92 Mon Sep 17 00:00:00 2001
From: unknown <cr...@.(none)>
Date: Mon, 6 Dec 2010 11:57:37 +0800
Subject: [PATCH] Crash dump support for win32

Create a crash dump when a backend fails, so it can be debugged after backend
death or even on another machine.
---
 src/backend/main/main.c            |    7 ++
 src/backend/port/win32/Makefile    |    2 +-
 src/backend/port/win32/crashdump.c |  202 ++++++++++++++++++++++++++++++++++++
 src/include/port.h                 |   13 +++
 src/include/postgres.h             |    4 +
 5 files changed, 227 insertions(+), 1 deletions(-)
 create mode 100644 src/backend/port/win32/crashdump.c

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 6cb70f2..45a25f4 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -79,6 +79,13 @@ main(int argc, char *argv[])
        argv = save_ps_display_args(argc, argv);
 
        /*
+        * If supported on the current platform, set up a handler to be called 
if
+        * the backend/postmaster crashes with a fatal signal or exception.
+        * See port/crashdump.c
+        */
+       installCrashDumpHandler();
+
+       /*
         * Set up locale information from environment.  Note that LC_CTYPE and
         * LC_COLLATE will be overridden later from pg_control if we are in an
         * already-initialized database.  We set them here so that they will be
diff --git a/src/backend/port/win32/Makefile b/src/backend/port/win32/Makefile
index 8bf9f74..d00c334 100644
--- a/src/backend/port/win32/Makefile
+++ b/src/backend/port/win32/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/port/win32
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = timer.o socket.o signal.o security.o mingwcompat.o
+OBJS = timer.o socket.o signal.o security.o mingwcompat.o crashdump.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/port/win32/crashdump.c 
b/src/backend/port/win32/crashdump.c
new file mode 100644
index 0000000..a5629ea
--- /dev/null
+++ b/src/backend/port/win32/crashdump.c
@@ -0,0 +1,202 @@
+/*-------------------------------------------------------------------------
+ *
+ * crashdump.c
+ *       Automatic crash dump creation for PostgreSQL on Windows
+ *
+ * The crashdump module traps unhandled exceptions produced by the backend
+ * the module is loaded in, and tries to produce a Windows MiniDump crash
+ * dump for later debugging and analysis. The machine performing the dump
+ * doesn't need any special debugging tools; the user only needs to send
+ * the dump to somebody who has the same version of PostgreSQL and has 
debugging
+ * tools.
+ *
+ * crashdump module originally by Craig Ringer <ring...@ringerc.id.au>
+ *
+ * LIMITATIONS:
+ * ============
+ * This *won't* work in hard OOM situations or stack overflows.
+ * 
+ * For those, it'd be necessary to take a much more complicated approach where
+ * the handler switches to a new stack (if it can) and forks a helper process
+ * to debug its self. That's in the too hard basket as far as I'm concerned;
+ * this approach will get 90% of the results with 10% of the work.
+ *
+ * POSSIBLE FUTURE WORK:
+ * =====================
+ * For bonus points, the crash dump format permits embedding of user-supplied 
data.
+ * If there's anything else (postgresql.conf? Last few lines of a log file?) 
that
+ * should always be supplied with a crash dump, it could potentially be added,
+ * though at the cost of a greater chance of the crash dump failing. Again, 
+ * I'm not going to tackle that, but thought it worth mentioning in case
+ * someone wants it down the track.
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#define VC_EXTRALEAN
+#include <windows.h>
+#include <string.h>
+#include <dbghelp.h>
+
+/* 
+ * Much of the following code is based on CodeProject and MSDN examples,
+ * particularly 
http://www.codeproject.com/KB/debug/postmortemdebug_standalone1.aspx
+ *
+ * Useful MSDN articles:
+ *
+ * http://msdn.microsoft.com/en-us/library/ff805116(v=VS.85).aspx
+ * http://msdn.microsoft.com/en-us/library/ms679294(VS.85).aspx
+ * http://msdn.microsoft.com/en-us/library/ms679291(VS.85).aspx
+ * http://msdn.microsoft.com/en-us/library/ms680519(v=VS.85).aspx
+ * http://msdn.microsoft.com/en-us/library/ms680360(VS.85).aspx
+ * 
+ * Other useful articles on working with minidumps:
+ * http://www.debuginfo.com/articles/effminidumps.html
+ */
+
+typedef BOOL (WINAPI *MINIDUMPWRITEDUMP)(HANDLE hProcess, DWORD dwPid, HANDLE 
hFile, MINIDUMP_TYPE DumpType,
+                                                                       CONST 
PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
+                                                                       CONST 
PMINIDUMP_USER_STREAM_INFORMATION UserStreamParam,
+                                                                       CONST 
PMINIDUMP_CALLBACK_INFORMATION CallbackParam
+                                                                       );
+
+/*
+ * To perform a crash dump, we need to load dbghelp.dll and find the
+ * MiniDumpWriteDump function. If possible, the copy of dbghelp.dll
+ * shipped with PostgreSQL is loaded; failing that, the system copy will
+ * be used.
+ *
+ * This function is called in crash handler context. It can't trust that much
+ * of the backend is working.
+ *
+ * If NULL is returned, loading failed and the crash dump handler should
+ * not try to continue.
+ */
+static MINIDUMPWRITEDUMP
+loadDbgHelp()
+{
+       // Use the dbghelp.dll shipped with Pg if we can find and load
+       // it, otherwise fall back to the copy installed in Windows.
+       HMODULE hDll = NULL;
+       MINIDUMPWRITEDUMP pDump = NULL;
+       char dbgHelpPath[_MAX_PATH];
+       if (GetModuleFileName( NULL, dbgHelpPath, _MAX_PATH ))
+       {
+               char *slash = strrchr( dbgHelpPath, '\\' );
+               if (slash)
+               {
+                       strcpy( slash+1, "DBGHELP.DLL" );
+                       hDll = LoadLibrary( dbgHelpPath );
+               }
+       }
+       if (hDll==NULL)
+       {
+               /*
+                * Load whichever version of dbghelp.dll we can find.
+                *
+                * It is safe to call LoadLibrary with an unqualified name 
(thus searching
+                * the PATH) only because the postmaster ensures that backends 
run in a sensible
+                * environment with the datadir as the working directory. It's 
usually unsafe to
+                * LoadLibrary("unqualifiedname.dll") because the user can run 
your program with
+                * a CWD containing a malicious copy of "unqualifiedname.dll" 
thanks to the way
+                * windows (rather insecurely) includes the CWD in the PATH by 
default.
+                */
+               hDll = LoadLibrary( "DBGHELP.DLL" );
+       }
+       if (hDll!=NULL)
+       {
+               pDump = (MINIDUMPWRITEDUMP)GetProcAddress( hDll, 
"MiniDumpWriteDump" );
+       }
+       return pDump;
+}
+
+/*
+ * This function is the exception handler passed to 
SetUnhandledExceptionFilter. It's invoked
+ * only if there's an unhandled exception. The handler will use dbghelp.dll to 
generate a crash
+ * dump, then resume the normal unhandled exception process, which will 
generally exit with a
+ * an error message from the runtime.
+ *
+ * This function is run under the unhandled exception handler, effectively
+ * in a crash context, so it should be careful with memory and avoid using
+ * any PostgreSQL API or other functions that use PostgreSQL API. For that 
reason,
+ * logging calls should not be made until after the dump has been written
+ * successfully.
+ * 
+ */
+static LONG WINAPI
+crashDumpHandler(struct _EXCEPTION_POINTERS *pExceptionInfo)
+{
+       /* We only write crash dumps if the 'crashdumps' directory within
+        * the postgres data directory exists. That's the documented way to
+        * turn crash dumps on and off. */
+       DWORD attribs = GetFileAttributesA("crashdumps");
+       if (attribs != INVALID_FILE_ATTRIBUTES && (attribs & 
FILE_ATTRIBUTE_DIRECTORY) )
+       {
+               /* 'crashdumps' exists and is a directory. Try to write a dump' 
*/
+               MINIDUMPWRITEDUMP pDump = NULL;
+               char dumpPath[_MAX_PATH];
+               
+               // Dump pretty much everything except shared memory, code 
segments, and memory mapped files
+               MINIDUMP_TYPE dumpType = MiniDumpNormal|\
+                                        
MiniDumpWithIndirectlyReferencedMemory|\
+                                        MiniDumpWithHandleData|\
+                                        MiniDumpWithThreadInfo|\
+                                        MiniDumpWithPrivateReadWriteMemory|
+                                        MiniDumpWithDataSegs;
+
+               HANDLE selfProcHandle = GetCurrentProcess();
+               DWORD selfPid = GetProcessId(selfProcHandle);
+               HANDLE dumpFile;
+               DWORD systemTicks;
+
+               struct _MINIDUMP_EXCEPTION_INFORMATION ExInfo;
+               ExInfo.ThreadId = GetCurrentThreadId();
+               ExInfo.ExceptionPointers = pExceptionInfo;
+               ExInfo.ClientPointers = FALSE;
+
+               pDump = loadDbgHelp();
+               if (pDump==NULL)
+               {
+                       elog(WARNING, "crashdump: Cannot write dump, failed to 
load dbghelp.dll");
+                       return EXCEPTION_CONTINUE_SEARCH;
+               }
+
+               systemTicks = GetTickCount();
+               snprintf(&dumpPath[0], _MAX_PATH, 
"crashdumps\\postgres-pid%0i-%0i.mdmp", selfPid, systemTicks);
+               dumpPath[_MAX_PATH-1] = '\0';
+               dumpFile = CreateFile( dumpPath, GENERIC_WRITE, 
FILE_SHARE_WRITE,\
+                               NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 
NULL );
+               if (dumpFile==INVALID_HANDLE_VALUE)
+               {
+                       elog(WARNING, "crashdump: Unable to open dump file %s 
for writing (win32 error %i)",
+                                       &dumpPath[0], GetLastError());
+                       elog(DEBUG1, "crashdump: is there a 'crashdump' 
directory within the data dir, and is it writable by the postgres user?");
+                       return EXCEPTION_CONTINUE_SEARCH;
+               }
+
+               if( (*pDump)( selfProcHandle, selfPid, dumpFile, dumpType, 
&ExInfo, NULL, NULL ) )
+               {
+                       elog(WARNING,"crashdump: wrote crash dump to %s", 
&dumpPath[0]);
+               }
+               else
+               {
+                       elog(WARNING,"crashdump: failed to write dump file to 
%s (win32 error %i)",
+                                       &dumpPath[0], GetLastError());
+               }
+               CloseHandle(dumpFile);
+       }
+       else
+       {
+               elog(WARNING, "crashdump: Not writing dump, output directory 
'crashdumps' missing or not writable.");
+       }
+       return EXCEPTION_CONTINUE_SEARCH;
+}
+
+void
+installCrashDumpHandler()
+{
+       // http://msdn.microsoft.com/en-us/library/ms680634(VS.85).aspx
+       SetUnhandledExceptionFilter( crashDumpHandler );
+}
diff --git a/src/include/port.h b/src/include/port.h
index 0cfbed8..b3fc730 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -441,4 +441,17 @@ extern int pg_get_encoding_from_locale(const char *ctype);
 extern char *inet_net_ntop(int af, const void *src, int bits,
                          char *dst, size_t size);
 
+#if defined(WIN32)
+       /* For platforms that have a crashdump handler in 
src/port/$platform/crashdump.c,
+        * reference it. */
+       extern void
+       installCrashDumpHandler();
+#else
+       /* Provide a no-op inline function for platforms with no handler */
+       inline void
+       installCrashDumpHandler()
+       {
+       }
+#endif
+
 #endif   /* PG_PORT_H */
diff --git a/src/include/postgres.h b/src/include/postgres.h
index b8b8037..7897bbb 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -687,4 +687,8 @@ extern int ExceptionalCondition(const char *conditionName,
                                         const char *errorType,
                                         const char *fileName, int lineNumber);
 
+
+/* Install a platform-specific crash dump handler routine from 
port/crashdump.c */
+extern void installCrashDumpHandler();
+
 #endif   /* POSTGRES_H */
-- 
1.7.0.2.msysgit.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to