At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat 
<ashutosh.bapat....@gmail.com> wrote in 
> On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> > > [17605] LOG:  terminating process 17614 to release replication slot "s1"
> > + [17605] DETAIL:  The slot's restart_lsn 0/2C0000A0 exceeds 
> > max_slot_wal_keep_size.
> > > [17614] FATAL:  terminating connection due to administrator command
> > > [17605] LOG:  invalidating slot "s1" because its restart_lsn 0/2C0000A0 
> > > exceeds max_slot_wal_keep_size
> >
> > Somewhat the second and fourth lines look inconsistent each other but
> > that wouldn't be such a problem.  I don't think we want to concatenate
> > the two lines together as the result is a bit too long.
> >
> > > LOG:  terminating process 17614 to release replication slot "s1" because 
> > > it's restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.
> >
> > What do you think about this?
> 
> Agree. I think we should also specify the restart_lsn value which
> would be within max_slot_wal_keep_size for better understanding.

Thanks!  It seems to me the main message of the "invalidating" log has
no room for further detail.  So I split the reason out to DETAILS line
the same way with the "terminating" message in the attached second
patch. (It is separated from the first patch just for review) I
believe someone can make the DETAIL message simpler or more natural.

The attached patch set emits the following message.

> LOG:  invalidating slot "s1"
> DETAIL:  The slot's restart_lsn 0/10000D68 is behind the limit 0/11000000 
> defined by max_slot_wal_keep_size.

The second line could be changed like the following or anything other.

> DETAIL:  The slot's restart_lsn 0/10000D68 got behind the limit 0/11000000 
> determined by max_slot_wal_keep_size.
.....

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 6909d854a0d48f2504cbb6dfbc7eb571fd32bcd8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Tue, 14 Dec 2021 10:50:00 +0900
Subject: [PATCH v2 1/2] Make an error message about process termination more
 descriptive

If checkpointer kills a process due to a temporary replication slot
exceeding max_slot_wal_keep_size, the messages fails to describe the
cause of the termination.  It is finally shown for persistent slots in
a subsequent message, but that message doesn't show for temporary
slots.  It's better to attach an explainaion to the termination
message directly.

Reported-by: Alex Enachioaie <a...@altmetric.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..cba9a29113 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			{
 				ereport(LOG,
 						(errmsg("terminating process %d to release replication slot \"%s\"",
-								active_pid, NameStr(slotname))));
+								active_pid, NameStr(slotname)),
+						 errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
-- 
2.27.0

>From 265a84fa6a20ee7c5178ace41096df1a4b00465c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota....@gmail.com>
Date: Wed, 15 Dec 2021 13:08:20 +0900
Subject: [PATCH v2 2/2] Make the message further detailed

---
 src/backend/replication/slot.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index cba9a29113..695151e484 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 				ereport(LOG,
 						(errmsg("terminating process %d to release replication slot \"%s\"",
 								active_pid, NameStr(slotname)),
-						 errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn))));
+						 errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.",
+								   LSN_FORMAT_ARGS(restart_lsn),
+								   LSN_FORMAT_ARGS(oldestLSN))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
@@ -1292,9 +1294,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			ReplicationSlotRelease();
 
 			ereport(LOG,
-					(errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
-							NameStr(slotname),
-							LSN_FORMAT_ARGS(restart_lsn))));
+					(errmsg("invalidating slot \"%s\"",
+							NameStr(slotname)),
+					 errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.",
+							LSN_FORMAT_ARGS(restart_lsn),
+							LSN_FORMAT_ARGS(oldestLSN))));
 
 			/* done with this slot for now */
 			break;
-- 
2.27.0

Reply via email to