On 2025/03/28 0:13, torikoshia wrote:
On 2025-03-27 11:06, torikoshia wrote:
Hi,
I had another off-list discussion with Fujii-san, and according to the
following manual[1], it seems that a transaction with an overflowed
subtransaction is already considered inconsistent:
Reaching a consistent state can also be delayed in the presence of
both of these conditions:
- A write transaction has more than 64 subtransactions
- Very long-lived write transactions
IIUC, the manual suggests that both conditions must be met -- recovery
reaching at least minRecoveryPoint and no overflowed subtransactions
—- for the standby to be considered consistent.
OTOH, the following log message is emitted even when subtransactions
have overflowed, which appears to contradict the definition of
consistency mentioned above:
LOG: consistent recovery state reached
This log message is triggered when recovery progresses beyond
minRecoveryPoint(according to CheckRecoveryConsistency()).
However, since this state does not satisfy 'consistency' defined in
the manual, I think it would be more accurate to log that it has
merely reached the "minimum recovery point".
Furthermore, it may be better to emit the above log message only when
recovery has progressed beyond minRecoveryPoint and there are no
overflowed subtransactions.
Attached patch does this.
Additionally, renaming variables such as reachedConsistency in
CheckRecoveryConsistency might also be appropriate.
However, in the attached patch, I have left them unchanged for now.
On 2025-03-25 00:55, Fujii Masao wrote:
- case CAC_NOTCONSISTENT:
+ case CAC_NOTCONSISTENT_OR_OVERFLOWED:
This new name seems a bit too long. I'm OK to leave the name as it is.
Or, something like CAC_NOTHOTSTANDBY seems simpler and better to me.
Beyond just the length issue, given the understanding outlined above,
I now think CAC_NOTCONSISTENT does not actually need to be changed.
In high-availability.sgml, the "Administrator's Overview" section already
describes the conditions for accepting hot standby connections.
This section should also be updated accordingly.
Agreed.
I have updated this section to mention that the resolution is to close
the problematic transaction.
OTOH the changes made in v2 patch seem unnecessary, since the concept
of 'consistent' is already explained in the "Administrator's
Overview."
- errdetail("Consistent recovery state has not been yet
reached.")));
+ errdetail("Consistent recovery state has not been
yet
reached, or snappshot is pending because subtransaction is
overflowed."),
Given the above understanding, "or" is not appropriate in this
context, so I left this message unchanged.
Instead, I have added an errhint. The phrasing in the hint message
aligns with the manual, allowing users to search for this hint and
find the newly added resolution instructions.
On second thought, it may not be appropriate to show this output to clients
attempting to connect. This message should be notified not to clients but to
administrators.
From this point of view, it'd be better to output a message indicating the
status inside ProcArrayApplyRecoveryInfo(). However, a straightforward
implementation would result in the same message being logged every time an
XLOG_RUNNING_XACTS WAL is received, making it noisy.
Instead of directly outputting a log indicating that a hot standby connection
cannot be established due to subtransaction overflow, the attached patch
updates the manual so that administrators can determine whether a
subtransaction overflow has occurred based on the modified log output.
What do you think?
I had the same thought during our off-list discussion. However,
after reviewing the recovery code - such as recoveryStopsBefore(),
which checks whether a consistent state is reached - I now believe
the manual’s definition of a consistent state may be incorrect.
A consistent state should be defined as the point where recovery
has reached minRecoveryPoint.
If we were to change the definition to match the manual,
we would also need to update various recovery checks,
which wouldn't be a trivial task.
Given that, I now think it's better to revive your v1 patch,
which introduces a new postmaster signal and improves the error message
when connections are not accepted during hot standby. I've attached
a revised version of the patch based on your v1. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From a21c5fa943c2e372bf6b77fcbb32345751fb7a84 Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Fri, 28 Mar 2025 01:40:52 +0900
Subject: [PATCH v5] Improve error message when standby does accept
connections.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Even after reaching the minimum recovery point, if there are long-lived
write transactions with 64 subtransactions on the primary, the recovery
snapshot may not yet be ready for hot standby, delaying read-only
connections on the standby. Previously, when read-only connections were
not accepted due to this condition, the following error message was logged:
FATAL: the database system is not yet accepting connections
DETAIL: Consistent recovery state has not been yet reached.
This DETAIL message was misleading because the following message was
already logged in this case:
LOG: consistent recovery state reached
This contradiction, i.e., indicating that the recovery state was consistent
while also stating it wasn’t, caused confusion.
This commit improves the error message to better reflect the actual state:
FATAL: the database system is not accepting connections
DETAIL: Recovery snapshot is not yet ready for hot standby.
HINT: To enable hot standby, close write transactions with more than 64
subtransactions on the primary server.
To implement this, the commit introduces a new postmaster signal,
PMSIGNAL_RECOVERY_CONSISTENT. When the startup process reaches
a consistent recovery state, it sends this signal to the postmaster,
allowing it to correctly recognize that state.
Since this is not a clear bug, the change is applied only to the master
branch and is not back-patched.
---
doc/src/sgml/high-availability.sgml | 9 ++++++---
src/backend/access/transam/xlogrecovery.c | 6 ++++++
src/backend/postmaster/postmaster.c | 12 +++++++++---
src/backend/tcop/backend_startup.c | 16 ++++++++++++----
src/include/storage/pmsignal.h | 1 +
src/include/tcop/backend_startup.h | 2 +-
6 files changed, 35 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/high-availability.sgml
b/doc/src/sgml/high-availability.sgml
index acf3ac0601d..0e84ca66901 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1974,9 +1974,12 @@ LOG: database system is ready to accept read-only
connections
Consistency information is recorded once per checkpoint on the primary.
It is not possible to enable hot standby when reading WAL
written during a period when <varname>wal_level</varname> was not set to
- <literal>replica</literal> or <literal>logical</literal> on the primary.
Reaching
- a consistent state can also be delayed in the presence of both of these
- conditions:
+ <literal>replica</literal> or <literal>logical</literal> on the primary.
+ Even after reaching a consistent state, the recovery snapshot may not
+ be ready for hot standby if both of the following conditions are met,
+ delaying accepting read-only connections. To enable hot standby,
+ a long-lived write transaction with more than 64 subtransactions
+ needs to be closed on the primary.
<itemizedlist>
<listitem>
diff --git a/src/backend/access/transam/xlogrecovery.c
b/src/backend/access/transam/xlogrecovery.c
index 0aa3ab59085..6ce979f2d8b 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -291,6 +291,11 @@ static bool backupEndRequired = false;
* Consistent state means that the system is internally consistent, all
* the WAL has been replayed up to a certain point, and importantly, there
* is no trace of later actions on disk.
+ *
+ * This flag is used only by the startup process and postmaster. When
+ * minRecoveryPoint is reached, the startup process sets it to true and
+ * sends a PMSIGNAL_RECOVERY_CONSISTENT signal to the postmaster,
+ * which then sets it to true upon receiving the signal.
*/
bool reachedConsistency = false;
@@ -2248,6 +2253,7 @@ CheckRecoveryConsistency(void)
CheckTablespaceDirectory();
reachedConsistency = true;
+ SendPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT);
ereport(LOG,
(errmsg("consistent recovery state reached at
%X/%X",
LSN_FORMAT_ARGS(lastReplayedEndRecPtr))));
diff --git a/src/backend/postmaster/postmaster.c
b/src/backend/postmaster/postmaster.c
index c966c2e83af..3fe45de5da0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1825,8 +1825,7 @@ canAcceptConnections(BackendType backend_type)
else if (!FatalError && pmState == PM_STARTUP)
return CAC_STARTUP; /* normal startup */
else if (!FatalError && pmState == PM_RECOVERY)
- return CAC_NOTCONSISTENT; /* not yet at
consistent recovery
-
* state */
+ return CAC_NOTHOTSTANDBY; /* not yet ready for
hot standby */
else
return CAC_RECOVERY; /* else must be crash recovery
*/
}
@@ -3699,6 +3698,7 @@ process_pm_pmsignal(void)
/* WAL redo has started. We're out of reinitialization. */
FatalError = false;
AbortStartTime = 0;
+ reachedConsistency = false;
/*
* Start the archiver if we're responsible for (re-)archiving
received
@@ -3724,8 +3724,14 @@ process_pm_pmsignal(void)
UpdatePMState(PM_RECOVERY);
}
- if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
+ if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_CONSISTENT) &&
pmState == PM_RECOVERY && Shutdown == NoShutdown)
+ {
+ reachedConsistency = true;
+ }
+
+ if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
+ (pmState == PM_RECOVERY && Shutdown == NoShutdown))
{
ereport(LOG,
(errmsg("database system is ready to accept
read-only connections")));
diff --git a/src/backend/tcop/backend_startup.c
b/src/backend/tcop/backend_startup.c
index a07c59ece01..f920d9e4e55 100644
--- a/src/backend/tcop/backend_startup.c
+++ b/src/backend/tcop/backend_startup.c
@@ -18,6 +18,7 @@
#include <unistd.h>
#include "access/xlog.h"
+#include "access/xlogrecovery.h"
#include "common/ip.h"
#include "common/string.h"
#include "libpq/libpq.h"
@@ -306,17 +307,24 @@ BackendInitialize(ClientSocket *client_sock, CAC_state
cac)
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
errmsg("the database system is
starting up")));
break;
- case CAC_NOTCONSISTENT:
- if (EnableHotStandby)
+ case CAC_NOTHOTSTANDBY:
+ if (!EnableHotStandby)
ereport(FATAL,
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
errmsg("the database
system is not yet accepting connections"),
- errdetail("Consistent
recovery state has not been yet reached.")));
+ errdetail("Hot standby
mode is disabled.")));
+ else if (reachedConsistency)
+ ereport(FATAL,
+
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
+ errmsg("the database
system is not accepting connections"),
+ errdetail("Recovery
snapshot is not yet ready for hot standby."),
+ errhint("To enable hot
standby, close write transactions with more than %d subtransactions on the
primary server.",
+
PGPROC_MAX_CACHED_SUBXIDS)));
else
ereport(FATAL,
(errcode(ERRCODE_CANNOT_CONNECT_NOW),
errmsg("the database
system is not accepting connections"),
- errdetail("Hot standby
mode is disabled.")));
+ errdetail("Consistent
recovery state has not been yet reached.")));
break;
case CAC_SHUTDOWN:
ereport(FATAL,
diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index d84a383047e..67fa9ac06e1 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -33,6 +33,7 @@
typedef enum
{
PMSIGNAL_RECOVERY_STARTED, /* recovery has started */
+ PMSIGNAL_RECOVERY_CONSISTENT, /* recovery has reached consistent
state */
PMSIGNAL_BEGIN_HOT_STANDBY, /* begin Hot Standby */
PMSIGNAL_ROTATE_LOGFILE, /* send SIGUSR1 to syslogger to rotate
logfile */
PMSIGNAL_START_AUTOVAC_LAUNCHER, /* start an autovacuum launcher
*/
diff --git a/src/include/tcop/backend_startup.h
b/src/include/tcop/backend_startup.h
index 578828c1caf..dcb9d056643 100644
--- a/src/include/tcop/backend_startup.h
+++ b/src/include/tcop/backend_startup.h
@@ -36,7 +36,7 @@ typedef enum CAC_state
CAC_STARTUP,
CAC_SHUTDOWN,
CAC_RECOVERY,
- CAC_NOTCONSISTENT,
+ CAC_NOTHOTSTANDBY,
CAC_TOOMANY,
} CAC_state;
--
2.48.1