On Sat, Jan 2, 2021 at 8:00 AM Michail Nikolaev <michail.nikol...@gmail.com> wrote: > Working on some stuff, I realized I do not understand why > latestRemovedXid|cuteoff_xid (in different types of WAL records) are > sent every time they appear on the primary side. > > latestRemovedXid|cuteoff_xid is used to call > ResolveRecoveryConflictWithSnapshot and cancel conflicting backend on > Standby. In some of the cases, snapshot conflict resolving is the only > work REDO does (heap_xlog_cleanup_info > or btree_xlog_reuse_page, for example).
But you can say the same thing about fields from certain WAL record types, too. It's not that uncommon for code to make a conceptually optional piece of information into a normal WAL record struct field, even though that approach has unnecessary space overhead in the cases that don't need the information. Often this makes hardly any difference due to factors like alignment and the simple fact that we don't expect very many WAL records (with or without the optional information) in practice. Of course, it's possible that the question of whether or not it's worth it has been misjudged for any given case. And maybe these particular WAL records are one such case where somebody got it wrong, affecting a real workload (I am ignoring the complexity of making it work for latestRemovedXid in particular for now). But I tend to doubt that the space saving would be noticeable, from what I've seen with pg_waldump. > Could we try to somehow optimistically advance the latest sent > latestRemovedXid value in shared memory on the primary and skip > sending it if the newer xid was sent already? In such a way we could > reduce the number of ResolveRecoveryConflictWithSnapshot calls on > Standby and even skip some WAL records. > > At least we could do the same optimization on the standby side > (skipping ResolveRecoveryConflictWithSnapshot if it was called with > newer xid already). Maybe it makes sense to add a fast path to ResolveRecoveryConflictWithSnapshot() so that it falls out early without scanning the proc array (in cases where it will still do so today, since of course ResolveRecoveryConflictWithSnapshot() has the obvious InvalidTransactionId fast path already). > Is it a sane idea or I have missed something huge? It seems like two almost distinct ideas to me. Though the important thing in both cases is the savings in real world conditions. -- Peter Geoghegan