Hi,

On 2021-10-06 14:11:51 +0800, Craig Ringer wrote:
> On Wed, 6 Oct 2021, 03:30 Andres Freund, <and...@anarazel.de> wrote:
>
> > Hi,
> >
> >
> > My first attempt was to try to use the existing crashdump stuff in
> > pgwin32_install_crashdump_handler(). That's not really quite what I want,
> > because it only handles postmaster rather than any binary, but I thought
> > it'd
> > be a good start. But outside of toy situations it didn't work for me.
> >
>
> Odd. It usually has for me, and definitely not limited to the postmaster.
> But it will fall down for OOM, smashed stack, and other states where
> in-process self-debugging is likely to fail.

I think it's a question of running debug vs optimized builds. At least in
debug builds it doesn't appear to work because the debug c runtime abort
preempts it.


> I patch this out when working on windows because it's a real pain.
>
> I keep meaning to propose that we remove this functionality entirely. It's
> obsolete. It was introduced back in the days where DrWatson.exe "windows
> error reporting") used to launch an interactive prompt asking the user what
> to do when a process crashed. This would block the crashed process from
> exiting, making everything grind to a halt until the user interacted with
> the
> UI. Even for a service process.

> Not fun on a headless or remote server.

Yea, the way we do it right now is definitely not helpful. Especially because
it doesn't actually prevent the "hang" issue - the CRT boxes at least cause
precisely such stalls.

We've had a few CI hangs due to such errors.


> These days Windows handles all this a lot more sensibly, and blocking crash
> reporting is quite obsolete and unhelpful.

>From what I've seen it didn't actually get particularly sensible, just
different and more complicated.

>From what I've seen one needs at least:
- _set_abort_behavior(_CALL_REPORTFAULT | _WRITE_ABORT_MSG)
- _set_error_mode(_OUT_TO_STDERR)
- _CrtSetReportMode(_CRT_ASSERT/ERROR, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG)
- SetErrorMode(SEM_FAILCRITICALERRORS)

There's many things this hocuspocus can be called, but sensible isn't among my
word choices for it.


> I'd like to just remove it.

I think we need to remove the SEM_NOGPFAULTERRORBOX, but I don't think we can
remove the SEM_FAILCRITICALERRORS, and I think we need the rest of the above
to prevent error boxes from happening.


I think we ought to actually apply these to all binaries, not just
postgres. One CI hung was due to psql asserting. But there's currently no easy
hook point for all binaries afaict. If we were to introduce something it
should probably be in pgport? But I'd defer to that to a later step.


I've attached a patch implementing these changes.

I also made one run with that intentionally fail (with an Assert(false)), and
with the changes the debugger is invoked and creates a backtrace etc:
https://cirrus-ci.com/task/5447300246929408
(click on crashlog-> at the top)

Greetings,

Andres Freund
>From 4979447881836c86f0bf24a164d3ca920323f9eb Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Thu, 9 Sep 2021 17:49:39 -0700
Subject: [PATCH v6 1/2] windows: Improve crash / assert / exception handling.

---
 src/backend/main/main.c | 48 +++++++++++++++++++++++++++++++++++++++--
 .cirrus.yml             |  6 +++++-
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 9124060bde7..111e7867cc7 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -22,6 +22,10 @@
 
 #include <unistd.h>
 
+#if defined(WIN32)
+#include <crtdbg.h>
+#endif
+
 #if defined(__NetBSD__)
 #include <sys/param.h>
 #endif
@@ -237,8 +241,48 @@ startup_hacks(const char *progname)
 			exit(1);
 		}
 
-		/* In case of general protection fault, don't show GUI popup box */
-		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
+		/*
+		 * By default abort() only generates a crash-dump in *non* debug
+		 * builds. As our Assert() / ExceptionalCondition() uses abort(),
+		 * leaving the default in place would make debugging harder.
+		 */
+#if !defined(__MINGW32__) && !defined(__MINGW64__)
+		_set_abort_behavior(_CALL_REPORTFAULT | _WRITE_ABORT_MSG,
+							_CALL_REPORTFAULT | _WRITE_ABORT_MSG);
+#endif /* MINGW */
+
+		/*
+		 * SEM_FAILCRITICALERRORS causes more errors to be reported to
+		 * callers.
+		 *
+		 * We used to also specify SEM_NOGPFAULTERRORBOX, but that prevents
+		 * windows crash reporting from working. Which includes registered
+		 * just-in-time debuggers. Now we try to disable sources of popups
+		 * separately below (note that SEM_NOGPFAULTERRORBOX didn't actually
+		 * prevent all sources of such popups).
+		 */
+		SetErrorMode(SEM_FAILCRITICALERRORS);
+
+		/*
+		 * Show errors on stderr instead of popup box (note this doesn't
+		 * affect errors originating in the C runtime, see below).
+		 */
+		_set_error_mode(_OUT_TO_STDERR);
+
+		/*
+		 * In DEBUG builds, errors, including assertions, C runtime errors are
+		 * reported via _CrtDbgReport. By default such errors are displayed
+		 * with a popup (even with NOGPFAULTERRORBOX), preventing forward
+		 * progress. Instead report such errors stderr (and the
+		 * debugger). This is C runtime specific and thus the above
+		 * incantations aren't sufficient to suppress these popups.
+		 */
+		_CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
+		_CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
+		_CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
+		_CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
+		_CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE | _CRTDBG_MODE_DEBUG);
+		_CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
 
 #if defined(_M_AMD64) && _MSC_VER == 1800
 
diff --git a/.cirrus.yml b/.cirrus.yml
index 19b3737fa11..89fbff14abb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -423,7 +423,11 @@ task:
     - cd src/tools/msvc
     - perl vcregress.pl ecpgcheck
 
-  on_failure: *on_failure
+  on_failure:
+    <<: *on_failure
+    crashlog_artifacts:
+      path: "crashlog-**.txt"
+      type: text/plain
 
 
 task:
-- 
2.34.0

Reply via email to