On 2021/12/22 21:11, Tomas Vondra wrote:
Interesting idea, but I think it has a couple of issues :-(
Thanks for the review!
1) We'd need to know the LSN of the last WAL record for any given sequence, and
we'd need to communicate that between backends somehow. Which seems rather
tricky to do without affecting performance.
How about using the page lsn for the sequence? nextval_internal() already uses
that to check whether it's less than or equal to checkpoint redo location.
2) SyncRepWaitForLSN() is used only in commit-like situations, and it's a
simple wait, not a decision to write more WAL. Environments without sync
replicas are affected by this too - yes, the data loss issue is not there, but
the amount of WAL is still increased.
How about reusing only a part of code in SyncRepWaitForLSN()? Attached is the
PoC patch that implemented what I'm thinking.
IIRC sync_standby_names can change while a transaction is running, even just
right before commit, at which point we can't just go back in time and generate
WAL for sequences accessed earlier. But we still need to ensure the sequence is
properly replicated.
Yes. In the PoC patch, SyncRepNeedsWait() still checks sync_standbys_defined
and uses SyncRepWaitMode. But they should not be checked nor used because their
values can be changed on the fly, as you pointed out. Probably
SyncRepNeedsWait() will need to be changed so that it doesn't use them.
3) I don't think it'd actually reduce the amount of WAL records in environments
with many sessions (incrementing the same sequence). In those cases the WAL
(generated by in-progress xact from another session) is likely to not be
flushed, so we'd generate the extra WAL record. (And if the other backends
would need flush LSN of this new WAL record, which would make it more likely
they have to generate WAL too.)
With the PoC patch, only when previous transaction that executed nextval() and
caused WAL record is aborted, subsequent nextval() generates additional WAL
record. So this approach can reduce WAL volume than other approach?
In the PoC patch, to reduce WAL volume more, it might be better to make
nextval_internal() update XactLastRecEnd and assign XID rather than emitting
new WAL record, when SyncRepNeedsWait() returns true.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 72bfdc07a4..38cd55b81a 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -37,6 +37,7 @@
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "parser/parse_type.h"
+#include "replication/syncrep.h"
#include "storage/lmgr.h"
#include "storage/proc.h"
#include "storage/smgr.h"
@@ -676,8 +677,9 @@ nextval_internal(Oid relid, bool check_permissions)
else
{
XLogRecPtr redoptr = GetRedoRecPtr();
+ XLogRecPtr pagelsn = PageGetLSN(page);
- if (PageGetLSN(page) <= redoptr)
+ if (pagelsn <= redoptr || SyncRepNeedsWait(pagelsn))
{
/* last update of seq was before checkpoint */
fetch = log = fetch + SEQ_LOG_VALS;
diff --git a/src/backend/replication/syncrep.c
b/src/backend/replication/syncrep.c
index bdbc9ef844..589364cb58 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -130,6 +130,34 @@ static bool SyncRepQueueIsOrderedByLSN(int mode);
* ===========================================================
*/
+bool
+SyncRepNeedsWait(XLogRecPtr lsn)
+{
+ int mode;
+
+ if (!SyncRepRequested() ||
+ !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined)
+ return false;
+
+ mode = Min(SyncRepWaitMode, SYNC_REP_WAIT_FLUSH);
+
+ Assert(SHMQueueIsDetached(&(MyProc->syncRepLinks)));
+ Assert(WalSndCtl != NULL);
+
+ LWLockAcquire(SyncRepLock, LW_SHARED);
+ Assert(MyProc->syncRepState == SYNC_REP_NOT_WAITING);
+
+ if (!WalSndCtl->sync_standbys_defined ||
+ lsn <= WalSndCtl->lsn[mode])
+ {
+ LWLockRelease(SyncRepLock);
+ return false;
+ }
+
+ LWLockRelease(SyncRepLock);
+ return true;
+}
+
/*
* Wait for synchronous replication, if requested by user.
*
diff --git a/src/include/replication/syncrep.h
b/src/include/replication/syncrep.h
index 4266afde8b..b08f3f32d1 100644
--- a/src/include/replication/syncrep.h
+++ b/src/include/replication/syncrep.h
@@ -82,6 +82,7 @@ extern char *syncrep_parse_error_msg;
extern char *SyncRepStandbyNames;
/* called by user backend */
+extern bool SyncRepNeedsWait(XLogRecPtr lsn);
extern void SyncRepWaitForLSN(XLogRecPtr lsn, bool commit);
/* called at backend exit */