On Thu, Dec 18, 2025 at 10:46 AM Kirill Reshke <[email protected]> wrote:
>
> On Thu, 18 Dec 2025 at 20:18, Melanie Plageman
> <[email protected]> wrote:
> > > Also, after the whole set is committed, we should then never
> > > experience discrepancy between PD_ALL_VISIBLE and VM bits? Because
> > > they will be set in a single WAL record. The only cases when heap and
> > > VM disagrees on all-visibility then are corruption,
> > > pg_visibilitymap_truncate and old data (data before v19+ upgrade?)
> > > If my understanding is correct, should we add document this?
> >
> > Even on current master, I don't see a scenario other than VM
> > corruption or truncation where PD_ALL_VISIBLE can be set but not the
> > VM (or vice versa). The only way would be if you error out after
> > setting PD_ALL_VISIBLE before setting the VM. Setting PD_ALL_VISIBLE
> > is not in a critical section in lazy_scan_prune(), so it won't panic
> > and dump shared memory, so the buffer with PD_ALL_VISIBLE set may
> > later get written out. But the only obvious way I see to error out of
> > MarkBufferDirty() is if the buffer is not valid -- which would have
> > kept us from doing previous operations on the buffer, I would think.
>
> Well... I may be missing something, but on current HEAD,
> XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_VISIBLE are two different
> record, XLOG_HEAP2_PRUNE_VACUUM_SCAN being always emitted first. So,
> WAL writer may end up kill-9-ed just after
> XLOG_HEAP2_PRUNE_VACUUM_SCAN makes it to the disk, and
> XLOG_HEAP2_VISIBLE never. Crash recovery then, and we have
> discrepancy. This does not happen with a single WAL record.
> Another simple reproducer here: standby streaming, receiving
> XLOG_HEAP2_PRUNE_VACUUM_SCAN from primary, Then network becomes bad,
> and we never get XLOG_HEAP2_VISIBLE from primary. Then we promoted by
> the admin. And again, VM bit vs PD_ALL_VISIBLE discrepancy. Am I
> missing something?

Well, currently XLOG_HEAP2_PRUNE_VACUUM_SCAN doesn't set PD_ALL_VISIBLE.
PD_ALL_VISIBLE is WAL-logged in the XLOG_HEAP2_VISIBLE record: in
lazy_scan_prune() we call PageSetAllVisible() and then
visibilitymap_set() -> log_heap_visible(), which adds the heap buffer to
the WAL chain (with XLogRegisterBuffer()). And notice that when
XLOG_HEAP2_VISIBLE is replayed in heap_xlog_visible(), that is where we
do PageSetAllVisible() on the heap page.

So I think you can end up with PD_ALL_VISIBLE set and the VM bit not set
if you error out precisely between setting PD_ALL_VISIBLE and WAL
logging it, because we don't set it in a critical section. But you can't
end up with one WAL record that sets PD_ALL_VISIBLE and another one that
sets the VM.

Once we have my code changes, you can never end up with PD_ALL_VISIBLE
set and the VM not set, because they are set in the same critical
section, and if we error out there, it will cause a panic, which will
purge shared memory.

- Melanie
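
To make the two orderings concrete, here is a toy Python model. It is
not PostgreSQL code: ToyCluster, both function names, SimulatedError,
and the "SET_ALL_VISIBLE" record name are all made up for illustration.
It assumes the on-disk page starts with neither bit set and that a
panic simply discards shared buffers and replays WAL.

```python
from dataclasses import dataclass, field


class SimulatedError(Exception):
    """Stands in for an ERROR raised partway through the update."""


@dataclass
class ToyCluster:
    # In-memory (shared buffers) state of one heap page and its VM bit.
    pd_all_visible: bool = False
    vm_bit: bool = False
    # Records that reached WAL, in order.
    wal: list = field(default_factory=list)

    def panic_and_recover(self):
        # Assumption: PANIC throws away shared buffers, and the on-disk
        # page had neither bit set, so state is rebuilt from WAL replay.
        self.pd_all_visible = False
        self.vm_bit = False
        for record in self.wal:
            if record == "SET_ALL_VISIBLE":  # hypothetical record name
                # Replay sets both the page flag and the VM bit.
                self.pd_all_visible = True
                self.vm_bit = True


def set_outside_critical_section(cluster, fail_between=False):
    """Page flag is set before the VM bit and the WAL record."""
    try:
        cluster.pd_all_visible = True
        if fail_between:
            raise SimulatedError
        cluster.vm_bit = True
        cluster.wal.append("SET_ALL_VISIBLE")
    except SimulatedError:
        # Plain ERROR: the operation aborts but shared buffers survive,
        # so the page keeps PD_ALL_VISIBLE without the VM bit.
        pass


def set_inside_critical_section(cluster, fail_inside=False):
    """Page flag, VM bit, and WAL record share one critical section."""
    try:
        cluster.pd_all_visible = True
        if fail_inside:
            raise SimulatedError
        cluster.vm_bit = True
        cluster.wal.append("SET_ALL_VISIBLE")
    except SimulatedError:
        # An ERROR inside a critical section is promoted to PANIC.
        cluster.panic_and_recover()
```

In this model, calling set_outside_critical_section() with
fail_between=True leaves pd_all_visible set and vm_bit clear, while
set_inside_critical_section() with fail_inside=True leaves both clear
after recovery, so the two never disagree.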
