On Wed, Sep 11, 2019 at 3:09 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Hmm. So v12 seems to have some problems with the WAL logging for
> posting list splits. With wal_debug = on and
> wal_consistency_checking='all', I can get a replica to fail
> consistency checking very quickly when "make installcheck" is run on
> the primary

I see the bug here. The problem is that we WAL-log a version of the new item that already has its heap TID changed. On the primary, the call to _bt_form_newposting() has a new item with the original heap TID, which is then rewritten before being inserted -- that's correct. But during recovery, we *start out with* a version of the new item that *already* had its heap TID swapped. So we have nowhere to get the original heap TID from during recovery.

Attached patch fixes the problem in a hacky way -- it WAL-logs the original heap TID, just in case. Obviously this fix isn't usable, but it should make the problem clearer.

Can you come up with a proper fix, please? I can think of one way of doing it, but I'll leave the details to you.

The same issue exists in _bt_split(), so the tests will still fail with wal_consistency_checking -- it just takes a lot longer to reach a point where an inconsistent page is found, because posting list splits that occur at the same point that we need to split a page are much rarer than posting list splits that occur when we simply need to insert, without splitting the page.

I suggest using wal_consistency_checking to test the fix that you come up with. As I mentioned, I regularly use it. Also note that there are further subtleties to doing this within _bt_split() -- see the FIXME comments there.

Thanks
--
Peter Geoghegan
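
To make the failure mode above concrete, here is a toy model of the heap TID swap. It is only a sketch under loud assumptions: TIDs are plain ints rather than ItemPointerData, the posting list is a bare sorted array, and ToyTid and toy_posting_split() are hypothetical names that do not exist in nbtree. It is not the patch, and not how _bt_form_newposting() is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap TID; real code uses ItemPointerData. */
typedef int ToyTid;

/*
 * Toy posting list split (illustrative only).
 *
 * Inserts *newtid into the sorted array posting[0..n-1], dropping the old
 * maximum TID from the array and handing it back through *newtid.  The
 * array keeps its length, and the caller goes on to insert the rewritten
 * new item.
 *
 * Precondition: posting[0] < *newtid < posting[n - 1].
 */
static void
toy_posting_split(ToyTid *posting, size_t n, ToyTid *newtid)
{
    ToyTid orig = *newtid;
    ToyTid oldmax = posting[n - 1];
    size_t i = n - 1;

    assert(n >= 2 && posting[0] < orig && orig < oldmax);

    /* Shift larger TIDs right by one slot, overwriting the old maximum. */
    while (i > 0 && posting[i - 1] > orig)
    {
        posting[i] = posting[i - 1];
        i--;
    }

    posting[i] = orig;   /* original new TID now lives in the posting list */
    *newtid = oldmax;    /* new item is rewritten to carry the old maximum */
}
```

With a posting list of {10, 20, 30, 40} and a new TID of 25, the primary ends up with {10, 20, 25, 30} and a new item carrying 40. If replay is handed only the rewritten new item (40) together with the unmodified posting list, the value 25 is not present anywhere in its inputs, so it cannot rebuild the same page. Handing replay the original TID (25) lets it repeat exactly the same swap.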
0001-Save-original-new-heap-TID-in-insert-WAL-record.patch