On 12/22/21 18:50, Fujii Masao wrote:
On 2021/12/22 21:11, Tomas Vondra wrote:
Interesting idea, but I think it has a couple of issues :-(
Thanks for the review!
1) We'd need to know the LSN of the last WAL record for any given
sequence, and we'd need to communicate that between backends somehow.
Which seems rather tricky to do without affecting performance.
How about using the page lsn for the sequence? nextval_internal()
already uses that to check whether it's less than or equal to checkpoint
redo location.
Hmm, maybe.
2) SyncRepWaitForLSN() is used only in commit-like situations, and
it's a simple wait, not a decision to write more WAL. Environments
without sync replicas are affected by this too - yes, the data loss
issue is not there, but the amount of WAL is still increased.
How about reusing only a part of code in SyncRepWaitForLSN()? Attached
is the PoC patch that implemented what I'm thinking.
IIRC sync_standby_names can change while a transaction is running,
even just right before commit, at which point we can't just go back in
time and generate WAL for sequences accessed earlier. But we still
need to ensure the sequence is properly replicated.
Yes. In the PoC patch, SyncRepNeedsWait() still checks
sync_standbys_defined and uses SyncRepWaitMode. But they should not be
checked nor used because their values can be changed on the fly, as you
pointed out. Probably SyncRepNeedsWait() will need to be changed so that
it doesn't use them.
Right. I think the data loss with sync standby is merely a symptom, not
the root cause. We'd need to deduce the LSN for which to wait at commit.
3) I don't think it'd actually reduce the amount of WAL records in
environments with many sessions (incrementing the same sequence). In
those cases the WAL (generated by in-progress xact from another
session) is likely to not be flushed, so we'd generate the extra WAL
record. (And if the other backends would need flush LSN of this new
WAL record, which would make it more likely they have to generate WAL
too.)
With the PoC patch, only when previous transaction that executed
nextval() and caused WAL record is aborted, subsequent nextval()
generates additional WAL record. So this approach can reduce WAL volume
than other approach?
> In the PoC patch, to reduce WAL volume more, it might be better to make
nextval_internal() update XactLastRecEnd and assign XID rather than
emitting new WAL record, when SyncRepNeedsWait() returns true.
Yes, but I think there are other cases. For example the WAL might have
been generated by another backend, in a transaction that might be still
running. In which case I don't see how updating XactLastRecEnd in
nextval_internal would fix this, right?
I did some experiments with increasing CACHE for the sequence, and that
mostly eliminates the overhead. See the message I sent a couple minutes
ago. IMHO that's a reasonable solution for the tiny number of people
using nextval() in a way that'd be affected by this (i.e. without
writing anything else in the xact).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company