Augmenting the deadlock message with application_name

2024-05-10 Thread Karoline Pauls
As we know, the deadlock error message isn't the most friendly one. All the 
client gets back is process PIDs, transaction IDs, and lock types. You have to 
check the server log to retrieve lock details. This is tedious.

In one of my apps I even added a deadlock exception handler on the app side to 
query pg_stat_activity for processes involved in the deadlock and include their 
application names and queries in the exception message. It is a little racy but 
works well enough.

Ideally I'd like to see that data coming from Postgres upon detecting the 
deadlock. That's why I made this small change.

The change makes the deadlock error look as follows - the new element is the 
application name or "" in its place if the activity 
user doesn't match the current user (and the current use isn't a superuser):

postgres=*> SELECT * FROM q WHERE id = 2 FOR UPDATE;
ERROR: deadlock detected
DETAIL: Process 194520 (application_name: ) waits for 
ShareLock on transaction 776; blocked by process 194521.
Process 194521 (application_name: woof) waits for ShareLock on transaction 775; 
blocked by process 194520.
HINT: See server log for query details.
CONTEXT: while locking tuple (0,2) in relation "q"

I added a new LocalPgBackendCurrentActivity struct combining application name 
and query string pointers and a sameProcess boolean. It is returned by value, 
since it's small. Performance-wise, this is a a part of the deadlock handler, 
if the DB hits it frequently, there are much more serious problems going on.

I could extend it by sending the queries back to the client, with an identical 
security check, but this is a potential information exposure of whatever's in 
the query plaintext. Another extension is to replace "(application_name: 
)" with something better like "(unknown 
application_name)", or even nothing.

Attached patch is for master, 2fb7560c. It doesn't contain any tests.

Let me know if you approve of the patch and if it makes sense to continue 
working on it.

Best,
KarolineFrom f7d3b3854ece45b2ba3b79643e99c1acbb09abae Mon Sep 17 00:00:00 2001
From: Karoline Pauls 
Date: Wed, 8 May 2024 23:49:46 +0100
Subject: [PATCH] call pgstat_get_backend_current_activity once

---
 src/backend/storage/lmgr/deadlock.c | 31 +
 src/backend/utils/activity/backend_status.c | 10 +++
 src/include/utils/backend_status.h  |  4 ++-
 src/tools/pgindent/typedefs.list|  1 +
 4 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index d94f65df..c08f66af 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -1075,17 +1075,22 @@ DeadLockReport(void)
 	StringInfoData logbuf;		/* errdetail for server log */
 	StringInfoData locktagbuf;
 	int			i;
+	bool		isSuperuser = superuser();
+	LocalPgBackendCurrentActivity *activities = \
+		(LocalPgBackendCurrentActivity *) palloc(sizeof(LocalPgBackendCurrentActivity) * nDeadlockDetails);
 
 	initStringInfo(&clientbuf);
 	initStringInfo(&logbuf);
 	initStringInfo(&locktagbuf);
 
-	/* Generate the "waits for" lines sent to the client */
+	/* Generate the "waits for" lines sent to the client and the log */
 	for (i = 0; i < nDeadlockDetails; i++)
 	{
 		DEADLOCK_INFO *info = &deadlockDetails[i];
 		int			nextpid;
-		LocalPgBackendCurrentActivity activity = pgstat_get_backend_current_activity(info->pid, true);
+		const char *lockModeName = GetLockmodeName(info->locktag.locktag_lockmethodid, info->lockmode);
+
+		activities[i] = pgstat_get_backend_current_activity(info->pid);
 
 		/* The last proc waits for the first one... */
 		if (i < nDeadlockDetails - 1)
@@ -1104,29 +1109,33 @@ DeadLockReport(void)
 		appendStringInfo(&clientbuf,
 		 _("Process %d (application_name: %s) waits for %s on %s; blocked by process %d."),
 		 info->pid,
-		 activity.st_appname,
-		 GetLockmodeName(info->locktag.locktag_lockmethodid,
-		 info->lockmode),
+		 (activities[i].sameUser || isSuperuser) ? activities[i].st_appname : "",
+		 lockModeName,
 		 locktagbuf.data,
 		 nextpid);
-	}
 
-	/* Duplicate all the above for the server ... */
-	appendBinaryStringInfo(&logbuf, clientbuf.data, clientbuf.len);
+		/* the difference is that we don't do user checking for the server log */
+		appendStringInfo(&logbuf,
+		 _("Process %d (application_name: %s) waits for %s on %s; blocked by process %d."),
+		 info->pid,
+		 activities[i].st_appname,
+		 lockModeName,
+		 locktagbuf.data,
+		 nextpid);
+	}
 
 	/* ... and add info about query strings */
 	for (i = 0; i < nDeadlockDetails; i++)
 	{
 		DEADLOCK_INFO *info = &deadlockDetails[i];
-		LocalPgBackendCurrentActivity activi

Re: Augmenting the deadlock message with application_name

2024-05-10 Thread Karoline Pauls
On Friday, 10 May 2024 at 20:17, Bruce Momjian  wrote:
>
> log_line_prefix supports application name --- why would you not use
> that?
>

log_line_prefix is effective in the server log. This change is mostly about 
improving the message sent back to the client. While the server log is also 
changed to reflect the client message, it doesn't need to be.

Additionally, with `%a` added to log_line_prefix, the server log would only 
contain the application name of the client affected by the deadlock, not the 
application names of all other clients involved in it.

Example server log with application names (here: a and b) added to the log 
prefix:

2024-05-10 20:39:58.459 BST [197591] (a)ERROR:  deadlock detected
2024-05-10 20:39:58.459 BST [197591] (a)DETAIL:  Process 197591 
(application_name: a) waits for ShareLock on transaction 782; blocked by 
process 197586.
Process 197586 (application_name: b) waits for ShareLock on transaction 
781; blocked by process 197591.
Process 197591, (application_name: a): SELECT * FROM q WHERE id = 2 FOR 
UPDATE;
Process 197586, (application_name: b): SELECT * FROM q WHERE id = 1 FOR 
UPDATE;
2024-05-10 20:39:58.459 BST [197591] (a)HINT:  See server log for query details.
2024-05-10 20:39:58.459 BST [197591] (a)CONTEXT:  while locking tuple (0,2) in 
relation "q"
2024-05-10 20:39:58.459 BST [197591] (a)STATEMENT:  SELECT * FROM q WHERE id = 
2 FOR UPDATE;

All log line prefixes refer to the application a. The message has both a and b.

Anyway, the server log is not the important part here. The crucial UX feature 
is the client getting application names back, so  browsing through server logs 
can be avoided.

Best,
Karoline