At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote in > On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > > > [17605] LOG: terminating process 17614 to release replication slot "s1" > > + [17605] DETAIL: The slot's restart_lsn 0/2C0000A0 exceeds > > max_slot_wal_keep_size. > > > [17614] FATAL: terminating connection due to administrator command > > > [17605] LOG: invalidating slot "s1" because its restart_lsn 0/2C0000A0 > > > exceeds max_slot_wal_keep_size > > > > Somewhat the second and fourth lines look inconsistent each other but > > that wouldn't be such a problem. I don't think we want to concatenate > > the two lines together as the result is a bit too long. > > > > > LOG: terminating process 17614 to release replication slot "s1" because > > > it's restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size. > > > > What do you think about this? > > Agree. I think we should also specify the restart_lsn value which > would be within max_slot_wal_keep_size for better understanding.
Thanks! It seems to me the main message of the "invalidating" log has no room for further detail. So I split the reason out to DETAILS line the same way with the "terminating" message in the attached second patch. (It is separated from the first patch just for review) I believe someone can make the DETAIL message simpler or more natural. The attached patch set emits the following message. > LOG: invalidating slot "s1" > DETAIL: The slot's restart_lsn 0/10000D68 is behind the limit 0/11000000 > defined by max_slot_wal_keep_size. The second line could be changed like the following or anything other. > DETAIL: The slot's restart_lsn 0/10000D68 got behind the limit 0/11000000 > determined by max_slot_wal_keep_size. ..... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 6909d854a0d48f2504cbb6dfbc7eb571fd32bcd8 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Tue, 14 Dec 2021 10:50:00 +0900 Subject: [PATCH v2 1/2] Make an error message about process termination more descriptive If checkpointer kills a process due to a temporary replication slot exceeding max_slot_wal_keep_size, the messages fails to describe the cause of the termination. It is finally shown for persistent slots in a subsequent message, but that message doesn't show for temporary slots. It's better to attach an explainaion to the termination message directly. Reported-by: Alex Enachioaie <a...@altmetric.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org --- src/backend/replication/slot.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 90ba9b417d..cba9a29113 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, { ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)))); + active_pid, NameStr(slotname)), + errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn)))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0
>From 265a84fa6a20ee7c5178ace41096df1a4b00465c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Wed, 15 Dec 2021 13:08:20 +0900 Subject: [PATCH v2 2/2] Make the message further detailed --- src/backend/replication/slot.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index cba9a29113..695151e484 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\"", active_pid, NameStr(slotname)), - errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn)))); + errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN)))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; @@ -1292,9 +1294,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ReplicationSlotRelease(); ereport(LOG, - (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", - NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn)))); + (errmsg("invalidating slot \"%s\"", + NameStr(slotname)), + errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(restart_lsn), + LSN_FORMAT_ARGS(oldestLSN)))); /* done with this slot for now */ break; -- 2.27.0