At Thu, 23 Dec 2021 18:08:08 +0530, Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote in > On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > > > LOG: invalidating slot "s1" > > > DETAIL: The slot's restart_lsn 0/10000D68 is behind the limit 0/11000000 > > > defined by max_slot_wal_keep_size. > > > > The second line could be changed like the following or anything other. > > > > > DETAIL: The slot's restart_lsn 0/10000D68 got behind the limit > > > 0/11000000 determined by max_slot_wal_keep_size. > > ..... > > > > The second version looks better as it gives more details. I am fine > with either of the above wordings. > > I would prefer everything in the same message though since > "invalidating slot ..." is too short a LOG message. Not everybody > enabled details always.
Mmm. Right. I have gone too much to the same way with the process-termination message. I rearranged the meesages as follows in the attached version. (at master) > LOG: terminating process %d to release replication slot \"%s\" because its > restart_lsn %X/%X exceeds max_slot_wal_keep_size > DETAIL: The slot got behind the limit %X/%X determined by > max_slot_wal_keep_size. > LOG: invalidating slot \"%s\" because its restart_LSN %X/%X exceeds > max_slot_wal_keep_size c> DETAIL: The slot got behind the limit %X/%X determined by max_slot_wal_keep_size. The messages is actually incomplete even in 13 so I think the change to the errmsg() message of the first message is worth back-patching. - v3-0001-Make-a-message-on-process-termination-more-dscrip.patch Changes only the first main message and it can be back-patched to 14. - v3-0001-Make-a-message-on-process-termination-more-dscrip_13.patch The same to the above but for 13, which doesn't have LSN_FORMAT_ARGS. - v3-0002-Add-detailed-information-to-slot-invalidation-mes.patch Attaches the DETAIL line shown above to both messages, only for the master. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 4bfbb58801e9d2a0e0ab8320dd95bc850a0a397b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 24 Dec 2021 12:52:07 +0900 Subject: [PATCH v3 1/2] Make a message on process termination more dscriptive The message at process termination due to slot limit doesn't provide the reason. In the major scenario the message is followed by another message about slot invalidatation, which shows the detail for the termination. However the second message is missing if the slot is temporary one. Augment the first message with the reason same as the second message. Backpatch through to 13 where the message was introduced. Reported-by: Alex Enachioaie <a...@altmetric.com> Author: Kyotaro Horiguchi <horikyota....@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org Backpatch-through: 13 --- src/backend/replication/slot.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 90ba9b417d..0ceac3a54d 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1253,8 +1253,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { ereport(LOG, - (errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)))); + (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", + active_pid, NameStr(slotname), + LSN_FORMAT_ARGS(restart_lsn)))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0
>From 444b3bb64ee907280606696e0cb90f6b022c9cd6 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 24 Dec 2021 13:23:54 +0900 Subject: [PATCH v3] Make a message on process termination more dscriptive The message at process termination due to slot limit doesn't provide the reason. In the major scenario the message is followed by another message about slot invalidatation, which shows the detail for the termination. However the second message is missing if the slot is temporary one. Augment the first message with the reason same as the second message. Backpatch through to 13 where the message was introduced. Reported-by: Alex Enachioaie <a...@altmetric.com> Author: Kyotaro Horiguchi <horikyota....@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostg...@gmail.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org Backpatch-through: 13 --- src/backend/replication/slot.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 02047ea920..15b8934ae2 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1228,8 +1228,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, if (last_signaled_pid != active_pid) { ereport(LOG, - (errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)))); + (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", + active_pid, NameStr(slotname), + (uint32) (restart_lsn >> 32), + (uint32) restart_lsn))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; -- 2.27.0
>From 4d37e4e6f2af557779d970f5fe4dba98c937ed49 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 24 Dec 2021 12:58:25 +0900 Subject: [PATCH v3 2/2] Add detailed information to slot-invalidation messages The messages says just "exceeds max_slot_wal_keep_size" but doesn't tell the concrete LSN of the limit. That information helps operators' understanding on the issue. Author: Kyotaro Horiguchi <horikyota....@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat....@gmail.com> Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org --- src/backend/replication/slot.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 0ceac3a54d..2b394fe0d7 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ereport(LOG, (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", active_pid, NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn)))); + LSN_FORMAT_ARGS(restart_lsn)), + errdetail("The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(oldestLSN)))); (void) kill(active_pid, SIGTERM); last_signaled_pid = active_pid; @@ -1294,7 +1296,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, ereport(LOG, (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size", NameStr(slotname), - LSN_FORMAT_ARGS(restart_lsn)))); + LSN_FORMAT_ARGS(restart_lsn)), + errdetail("The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.", + LSN_FORMAT_ARGS(oldestLSN)))); /* done with this slot for now */ break; -- 2.27.0