Hi Hussein, Apologies for the very delayed response. I'm aware that you've taken an interest in this subject as part of your YouTube channel. Thanks for publicizing the work!
On Tue, Jul 12, 2022 at 7:14 PM PG Doc comments form <nore...@postgresql.org> wrote: > Would be nice to add a note: old tuple versions in the index referencing the > same logical row cannot be deleted by bottom up index deletion process when > older transactions that might require the old state the row are still > running It's really hard to write documentation for something like this, because it's difficult to decide what your audience really needs to know. I agree that it's important to get this specific point across, though. In fact I thought that I already conveyed the same idea at this point: "All indexes will need a successor physical index tuple that points to the latest version in the table. Each new tuple within each index will generally need to coexist with the original “updated” tuple for a short period of time (typically until shortly after the UPDATE transaction commits)." The implication is that we need the old version to coexist until after the updater transaction commits and is seen by every possible MVCC snapshot as having committed -- nobody sees the old version anymore. Maybe we could augment the existing sentences I have highlighted? Could it be more explicit? > That is what the LP_DEAD bit is for I believe? Sort of. You could say that the LP_DEAD bit being set means "this index tuple is definitely useless to everybody, and so is definitely eligible for deletion when the page fills up" -- so in that sense you're right. However, it doesn't work the other way around -- not every index tuple that is deleted (or that is eligible to be deleted) will have its LP_DEAD bit set. Bottom-up index deletion only happens when we have no LP_DEAD bits set on the leaf page (we always prefer to delete them via simple deletion instead). An index tuple's LP_DEAD bit is set by index scans that happen to notice that the tuple is not needed by any possible MVCC snapshot. This allows it to be ignored by other index scans immediately. It also allows simple deletion to physically remove the tuple later on (at the point that the page fills). Of course the LP_DEAD bit can only be set when such an index scan actually happens because some SQL queries happen to be executed by the user app. The nice thing about bottom-up index deletion is that it reliably places the burden of performing deletion on the specific non-HOT updaters that are responsible for bloating the index -- they are required to clean up their own mess, making it much less likely that index bloat will be very destabilizing. So the main difference between bottom-up deletion and simple deletion is the information that drives the whole process. The physical modifications to the index page are actually identical (the LP_DEAD bit is set by index scans, not the deletion process itself). -- Peter Geoghegan