On 18 December 2017 at 10:05, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Dec 14, 2017 at 9:34 PM, Craig Ringer <cr...@2ndquadrant.com>
> wrote:
> > On 15 December 2017 at 09:24, Greg Stark <st...@mit.edu> wrote:
> >> Another simpler option would be to open up a new file in the log
> >> directory
> >
> > ... if we have one.
> >
> > We might be logging to syslog, or whatever else.
> >
> > I'd rather keep it simple(ish).
>
> +1.  I still think just printing it to the log is fine.
>
>
Here we go. Implemented pretty much as outlined above. A new
pg_diag_backend(pid) function sends a new
ProcSignalReason PROCSIG_DIAG_REQUEST. It's handled by
CHECK_FOR_INTERRUPTS() and logs MemoryContextStats() output to elog(...).

I didn't want to mess with the MemoryContextMethods and expose a
printf-wrapper style typedef in memnodes.h, so I went with a hook global.
It's a diagnostic routine so I don't think that's going to be a great
bother. By default it's set to write_stderr. That just writes to vfprintf
on unix so the outcome's the same as our direct use of fprintf now.

On Windows, though, using write_stderr will make Pg attempt to write memory
context dumps to the eventlog with ERROR level if running as a service with
no console. That means we vsnprintf the string into a static buffer first.
I'm not 100% convinced of the wisdom of that because it could flood the
event log, which is usually size limited by # of events and recycled. It'd
be better to write one event than write one per memory context line, but
it's not safe to do that when OOM. I lean toward this being a win: at least
Windows users should have some hope of finding out why Pg OOM'd, which
currently they don't when it runs as a service. If we don't, we should look
at using SetStdHandle to write stderr to a secondary logfile instead.

I'm happy to just add a trivial vfprintf wrapper so we preserve exactly the
same behaviour instead, but I figured I'd start with reusing write_stderr.

I'd really like to preserve the writing to elog(...) not fprintf, because
on some systems simply finding where stderr is written can be a challenge,
if it's not going to /dev/null via some detached session. Is it in
journald? In some separate log? Being captured by the logging collector
(and probably silently eaten)? Who knows!

Using elog(...) provides a log_line_prefix and other useful info to
associate the dump with the process of interest and what it's doing at the
time.

Also, an astonishing number of deployed systems I've worked with actually
don't put the pid or anything useful in log_line_prefix to make grouping up
multi-line output practical. Which is insane. But it's only PGC_SIGHUP so
fixing it is easy enough.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 663e31ffbe471b5dac18ff6322c0ea45a948350a Mon Sep 17 00:00:00 2001
From: Craig Ringer <cr...@2ndquadrant.com>
Date: Tue, 19 Dec 2017 20:47:36 +0800
Subject: [PATCH v1] Add pg_diag_backend to print memory context info

The new pg_diag_backend(pid) function signals a backend to dump
a summary of its memory context state when it next calls
CHECK_FOR_INTERRUPTS(). The memory context information is
output to the server log.

This works on any backend that uses the standard
procsignal_sigusr1_handler or explicitly handles PROCSIG_DIAG_REQUEST in
its own handler. Currently this includes normal user backends, the
walsender and autovacuum workers but not the other system processes.
Background workers must handle it explicitly handle SIGUSR1 with
procsignal_sigusr1_handler.
---
 doc/src/sgml/bgworker.sgml              | 10 ++++-
 doc/src/sgml/func.sgml                  | 13 ++++++
 src/backend/storage/ipc/procsignal.c    |  3 ++
 src/backend/tcop/postgres.c             | 70 +++++++++++++++++++++++++++++++++
 src/backend/utils/adt/misc.c            | 16 ++++++++
 src/backend/utils/init/globals.c        |  1 +
 src/backend/utils/mmgr/aset.c           |  4 +-
 src/backend/utils/mmgr/mcxt.c           | 10 +++--
 src/include/catalog/pg_proc.h           |  2 +
 src/include/miscadmin.h                 |  4 ++
 src/include/storage/procsignal.h        |  5 ++-
 src/include/utils/memutils.h            |  4 ++
 src/test/regress/sql/misc_functions.sql |  9 +++++
 13 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 4bc2b69..50bb54f 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -185,7 +185,8 @@ typedef struct BackgroundWorker
    when the process is started or exits.  It should be 0 for workers registered
    at postmaster startup time, or when the backend registering the worker does
    not wish to wait for the worker to start up.  Otherwise, it should be
-   initialized to <literal>MyProcPid</literal>.
+   initialized to <literal>MyProcPid</literal>. Typically this sets the
+   receiving process's latch via its signal handler.
   </para>
 
   <para>Once running, the process can connect to a database by calling
@@ -208,7 +209,12 @@ typedef struct BackgroundWorker
    allow the process to customize its signal handlers, if necessary.
    Signals can be unblocked in the new process by calling
    <function>BackgroundWorkerUnblockSignals</function> and blocked by calling
-   <function>BackgroundWorkerBlockSignals</function>.
+   <function>BackgroundWorkerBlockSignals</function>. The default
+   <literal>SIGUSR1</literal> handler for background workers just
+   sets the process latch. Background workers may instead adopt
+   the <function>procsignal_sigusr1_handler</function> used by 
+   normal backends and call <function>CHECK_FOR_INTERRUPTS()</function>
+   in their main loops.
   </para>
 
   <para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4dd9d02..be5ed54 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18348,6 +18348,9 @@ SELECT set_config('log_statement_stats', 'off', false);
    <indexterm>
     <primary>pg_terminate_backend</primary>
    </indexterm>
+   <indexterm>
+    <primary>pg_diag_backend</primary>
+   </indexterm>
 
    <indexterm>
     <primary>signal</primary>
@@ -18407,6 +18410,16 @@ SELECT set_config('log_statement_stats', 'off', false);
         however only superusers can terminate superuser backends.
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_diag_backend(<parameter>pid</parameter> <type>int</type>)</function></literal>
+        </entry>
+       <entry><type>boolean</type></entry>
+       <entry>Ask a backend to report diagnostics including memory use details
+        to the server log. Restricted to superusers. The log entry begins
+        with <literal>diagnostic dump</literal> or the localized equivalent.
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b9302ac..450e4f1 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -292,6 +292,9 @@ procsignal_sigusr1_handler(SIGNAL_ARGS)
 	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
 		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
 
+	if (CheckProcSignal(PROCSIG_DIAG_REQUEST))
+		HandleDiagRequestInterrupt(PROCSIG_DIAG_REQUEST);
+
 	SetLatch(MyLatch);
 
 	latch_sigusr1_handler();
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1ae9ac2..d1ca55f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -169,6 +169,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* Reused for printwrapper_stringinfo */
+static StringInfoData diag_request_buf;
+
 /* ----------------------------------------------------------------
  *		decls for routines only used in this file
  * ----------------------------------------------------------------
@@ -193,6 +196,7 @@ static void drop_unnamed_stmt(void);
 static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
+static void HandleDiagRequest(void);
 
 
 /* ----------------------------------------------------------------
@@ -3047,8 +3051,74 @@ ProcessInterrupts(void)
 
 	if (ParallelMessagePending)
 		HandleParallelMessages();
+
+	if (DiagRequestPending)
+		HandleDiagRequest();
+}
+
+/*
+ * Accumulate writes into the buffer in diag_request_buf,
+ * for use with functions that expect a printf-like callback.
+ */
+pg_attribute_printf(1, 2) static void
+printwrapper_stringinfo(const char * fmt, ...)
+{
+	Assert(diag_request_buf.data != NULL);
+	for (;;)
+	{
+		va_list		args;
+		int			needed;
+		va_start(args, fmt);
+		needed = appendStringInfoVA(&diag_request_buf, fmt, args);
+		va_end(args);
+		if (needed == 0)
+			break;
+		enlargeStringInfo(&diag_request_buf, needed);
+	}
 }
 
+/*
+ * The interrupt-handler side of handling a diagnostic request
+ * interrupt. This just saves the request for handling during
+ * CHECK_FOR_INTERRUPTS(). Any prior request that isn't yet actioned is
+ * clobbered.
+ */
+void
+HandleDiagRequestInterrupt(ProcSignalReason request)
+{
+	DiagRequestPending = request;
+	InterruptPending = true;
+}
+
+/*
+ * Dump memory context information to the logs in response to a
+ * PROCSIG_DIAG_REQUEST sent via pg_diag_backend or similar.
+ *
+ * Usually invoked via procsignal_sigusr1_handler and
+ * CHECK_FOR_INTERRUPTS(), so this won't work by default
+ * for background workers and other nonstandard processes.
+ */
+static void
+HandleDiagRequest(void)
+{
+	ProcSignalReason DiagRequest = DiagRequestPending;
+	DiagRequestPending = 0;
+
+	/* So far we only have a generic diagnostics dump */
+	if (DiagRequest == PROCSIG_DIAG_REQUEST)
+	{
+		MemoryContextStats_printf_hook_type	old_printf_hook
+			= MemoryContextStats_printf_hook;
+		initStringInfo(&diag_request_buf);
+		MemoryContextStats_printf_hook = printwrapper_stringinfo;
+		MemoryContextStats(TopMemoryContext);
+		MemoryContextStats_printf_hook = old_printf_hook;
+		elog(LOG, "diagnostic dump requested; memory context info: %s",
+			diag_request_buf.data);
+		pfree(diag_request_buf.data);
+		diag_request_buf.data = NULL;
+	}
+}
 
 /*
  * IA64-specific code to fetch the AR.BSP register for stack depth checks.
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index f53d411..ac0c29f 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -321,6 +321,22 @@ pg_terminate_backend(PG_FUNCTION_ARGS)
 	PG_RETURN_BOOL(r == SIGNAL_BACKEND_SUCCESS);
 }
 
+Datum
+pg_diag_backend(PG_FUNCTION_ARGS)
+{
+	int pid = PG_GETARG_INT32(0);
+	int r;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("pg_diag_backend requires superuser privileges")));
+
+	r = SendProcSignal(pid, PROCSIG_DIAG_REQUEST, InvalidBackendId);
+
+	PG_RETURN_BOOL(r == 0);
+}
+
 /*
  * Signal to reload the database configuration
  *
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 9680a4b..f3e4a4d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -29,6 +29,7 @@ ProtocolVersion FrontendProtocol;
 volatile bool InterruptPending = false;
 volatile bool QueryCancelPending = false;
 volatile bool ProcDiePending = false;
+volatile int DiagRequestPending = false;
 volatile bool ClientConnectionLost = false;
 volatile bool IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 1519da0..76f1cf0 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -1369,8 +1369,8 @@ AllocSetStats(MemoryContext context, int level, bool print,
 		int			i;
 
 		for (i = 0; i < level; i++)
-			fprintf(stderr, "  ");
-		fprintf(stderr,
+			MemoryContextStats_printf_hook("  ");
+		MemoryContextStats_printf_hook(
 				"%s: %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 				set->header.name, totalspace, nblocks, freespace, freechunks,
 				totalspace - freespace);
diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 97382a6..afd120d 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -56,6 +56,10 @@ static void MemoryContextStatsInternal(MemoryContext context, int level,
 						   bool print, int max_children,
 						   MemoryContextCounters *totals);
 
+/* The printf-like destination for MemoryContextStats output */
+MemoryContextStats_printf_hook_type MemoryContextStats_printf_hook
+	= write_stderr;
+
 /*
  * You should not do memory allocations within a critical section, because
  * an out-of-memory error will be escalated to a PANIC. To enforce that
@@ -454,7 +458,7 @@ MemoryContextStatsDetail(MemoryContext context, int max_children)
 
 	MemoryContextStatsInternal(context, 0, true, max_children, &grand_totals);
 
-	fprintf(stderr,
+	MemoryContextStats_printf_hook(
 			"Grand total: %zu bytes in %zd blocks; %zu free (%zd chunks); %zu used\n",
 			grand_totals.totalspace, grand_totals.nblocks,
 			grand_totals.freespace, grand_totals.freechunks,
@@ -510,8 +514,8 @@ MemoryContextStatsInternal(MemoryContext context, int level,
 			int			i;
 
 			for (i = 0; i <= level; i++)
-				fprintf(stderr, "  ");
-			fprintf(stderr,
+				MemoryContextStats_printf_hook("  ");
+			MemoryContextStats_printf_hook(
 					"%d more child contexts containing %zu total in %zd blocks; %zu free (%zd chunks); %zu used\n",
 					ichild - max_children,
 					local_totals.totalspace,
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c969375..2f8fb7b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3256,6 +3256,8 @@ DATA(insert OID = 2171 ( pg_cancel_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v
 DESCR("cancel a server process' current query");
 DATA(insert OID = 2096 ( pg_terminate_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_terminate_backend _null_ _null_ _null_ ));
 DESCR("terminate a server process");
+DATA(insert OID = 3556 ( pg_diag_backend		PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 16 "23" _null_ _null_ _null_ _null_ _null_ pg_diag_backend _null_ _null_ _null_ ));
+DESCR("signal a server process to dump diagnostics to the log");
 DATA(insert OID = 2172 ( pg_start_backup		PGNSP PGUID 12 1 0 0 0 f f f f t f v r 3 0 3220 "25 16 16" _null_ _null_ _null_ _null_ _null_ pg_start_backup _null_ _null_ _null_ ));
 DESCR("prepare for taking an online backup");
 DATA(insert OID = 2173 ( pg_stop_backup			PGNSP PGUID 12 1 0 0 0 f f f f t f v r 0 0 3220 "" _null_ _null_ _null_ _null_ _null_ pg_stop_backup _null_ _null_ _null_ ));
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 59da7a6..9b17173 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -80,6 +80,7 @@
 extern PGDLLIMPORT volatile bool InterruptPending;
 extern PGDLLIMPORT volatile bool QueryCancelPending;
 extern PGDLLIMPORT volatile bool ProcDiePending;
+extern PGDLLIMPORT volatile int DiagRequestPending;
 extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
@@ -275,6 +276,9 @@ extern bool stack_is_too_deep(void);
 
 extern void PostgresSigHupHandler(SIGNAL_ARGS);
 
+enum ProcSignalReason;
+extern void HandleDiagRequestInterrupt(enum ProcSignalReason request);
+
 /* in tcop/utility.c */
 extern void PreventCommandIfReadOnly(const char *cmdname);
 extern void PreventCommandIfParallelMode(const char *cmdname);
diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index 20bb05b..121fb86 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -27,7 +27,7 @@
  * Also, because of race conditions, it's important that all the signals be
  * defined so that no harm is done if a process mistakenly receives one.
  */
-typedef enum
+typedef enum ProcSignalReason
 {
 	PROCSIG_CATCHUP_INTERRUPT,	/* sinval catchup interrupt */
 	PROCSIG_NOTIFY_INTERRUPT,	/* listen/notify interrupt */
@@ -42,6 +42,9 @@ typedef enum
 	PROCSIG_RECOVERY_CONFLICT_BUFFERPIN,
 	PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
 
+	/* Signals used to ask the backend for diagnostic output */
+	PROCSIG_DIAG_REQUEST,		/* Dump memory context info etc */
+
 	NUM_PROCSIGNALS				/* Must be last! */
 } ProcSignalReason;
 
diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h
index 9c30eb7..f5b6d2c 100644
--- a/src/include/utils/memutils.h
+++ b/src/include/utils/memutils.h
@@ -45,6 +45,10 @@
 
 #define AllocHugeSizeIsValid(size)	((Size) (size) <= MaxAllocHugeSize)
 
+typedef void (*MemoryContextStats_printf_hook_type)(const char *fmt, ...)
+	pg_attribute_printf(1, 2);
+
+extern MemoryContextStats_printf_hook_type MemoryContextStats_printf_hook;
 
 /*
  * Standard top-level memory contexts.
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 1a20c1f..fb64b87 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -1,4 +1,13 @@
 --
+-- Misc admin functions
+--
+
+-- pg_diag_backend makes the backend log, so it can't really
+-- be tested within pg_regress except to ensure it doesn't
+-- upset the backend.
+SELECT pg_diag_backend(pg_backend_pid());
+
+--
 -- num_nulls()
 --
 
-- 
2.9.5

Reply via email to