On 12/22/21 18:50, Fujii Masao wrote:


On 2021/12/22 21:11, Tomas Vondra wrote:
Interesting idea, but I think it has a couple of issues :-(

Thanks for the review!

1) We'd need to know the LSN of the last WAL record for any given sequence, and we'd need to communicate that between backends somehow. Which seems rather tricky to do without affecting performance.

How about using the page lsn for the sequence? nextval_internal() already uses that to check whether it's less than or equal to checkpoint redo location.


Hmm, maybe.


2) SyncRepWaitForLSN() is used only in commit-like situations, and it's a simple wait, not a decision to write more WAL. Environments without sync replicas are affected by this too - yes, the data loss issue is not there, but the amount of WAL is still increased.

How about reusing only a part of code in SyncRepWaitForLSN()? Attached is the PoC patch that implemented what I'm thinking.


IIRC sync_standby_names can change while a transaction is running, even just right before commit, at which point we can't just go back in time and generate WAL for sequences accessed earlier. But we still need to ensure the sequence is properly replicated.

Yes. In the PoC patch, SyncRepNeedsWait() still checks sync_standbys_defined and uses SyncRepWaitMode. But they should not be checked nor used because their values can be changed on the fly, as you pointed out. Probably SyncRepNeedsWait() will need to be changed so that it doesn't use them.


Right. I think the data loss with sync standby is merely a symptom, not the root cause. We'd need to deduce the LSN for which to wait at commit.


3) I don't think it'd actually reduce the amount of WAL records in environments with many sessions (incrementing the same sequence). In those cases the WAL (generated by in-progress xact from another session) is likely to not be flushed, so we'd generate the extra WAL record. (And if the other backends would need flush LSN of this new WAL record, which would make it more likely they have to generate WAL too.)

With the PoC patch, only when previous transaction that executed nextval() and caused WAL record is aborted, subsequent nextval() generates additional WAL record. So this approach can reduce WAL volume than other approach?
 > In the PoC patch, to reduce WAL volume more, it might be better to make
nextval_internal() update XactLastRecEnd and assign XID rather than emitting new WAL record, when SyncRepNeedsWait() returns true.


Yes, but I think there are other cases. For example the WAL might have been generated by another backend, in a transaction that might be still running. In which case I don't see how updating XactLastRecEnd in nextval_internal would fix this, right?

I did some experiments with increasing CACHE for the sequence, and that mostly eliminates the overhead. See the message I sent a couple minutes ago. IMHO that's a reasonable solution for the tiny number of people using nextval() in a way that'd be affected by this (i.e. without writing anything else in the xact).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply via email to