On Thu, Jan 28, 2021 at 10:16 AM Michail Nikolaev <michail.nikol...@gmail.com> wrote:
> > I wonder if it would help to not actually use the LP_DEAD bit for
> > this. Instead, you could use the currently-unused-in-indexes
> > LP_REDIRECT bit.
>
> Hm… Sound very promising - an additional bit is a lot in this situation.

Yeah, it would help a lot. But those bits are precious. So it makes
sense to think about what to do with both of them in index AMs at the
same time. Otherwise we risk missing some important opportunity.

> > Whether or not "recently dead" means "dead to my
> > particular MVCC snapshot" can be determined using some kind of
> > in-memory state that won't survive a crash (or a per-index-page
> > epoch?).
>
> Do you have any additional information about this idea? (maybe some thread).
> What kind of “in-memory state that won't survive a crash” and how to deal
> with flushed bits after the crash?

Honestly, that part wasn't very well thought out. A lot of things might work.

Some kind of "recently dead" bit is easier on the primary. If we have
recently dead bits set on the primary (using a dedicated LP bit for
original execution recently-dead-ness), then we wouldn't even
necessarily have to change anything about index scans/visibility at
all. There would still be a significant benefit if we simply used the
recently dead bits when considering which heap blocks nbtree simple
deletion will visit inside _bt_deadblocks() -- in practice there would
probably be no real downside from assuming that the recently dead bits
are now fully dead (it would sometimes be wrong, but not enough to
matter, probably only when there is a snapshot held for way way too
long).

Deletion in indexes can work well while starting off with only an
*approximate* idea of which index tuples will be safe to delete --
this is a high-level idea behind my recent commit d168b666823. It
seems very possible that that could be pushed even further here on the
primary.

On standbys (which set standby recently dead bits) it will be
different, because you need "index hint bits" set that are attuned to
the workload on the standby, and because you don't ever use the bit to
help with deleting anything on the standby (that all happens during
original execution).

BTW, what happens when the page splits on the primary?
Does your patch "move over" the LP_DEAD bits to each half of the split?

> Hm. What is about this way:
>
> 10 - dead to all on standby (LP_REDIRECT)
> 11 - dead to all on primary (LP_DEAD)
> 01 - future “recently DEAD” on primary (LP_NORMAL)

Not sure.

> Also, looks like both GIST and HASH indexes also do not use LP_REDIRECT.

Right -- if we were to do this, the idea would be that it would apply
to all index AMs that currently have (or will ever have) something
like the LP_DEAD bit stuff. The GiST and hash support for index
deletion is directly based on the original nbtree version, and there
is no reason why we cannot eventually do all this stuff in at least
those three AMs.

There are already some line-pointer level differences in index AMs:
LP_DEAD items have storage in index AMs, but not in heapam. This
all-table-AMs/all-index-AMs divide in how item pointers work would be
preserved.

> Also, btw, do you know any reason to keep minRecoveryPoint at a low value?

Not offhand.

--
Peter Geoghegan
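
For readers following the bit patterns in the quoted proposal, here is a
minimal sketch of how the two-bit line pointer state could be read under
that scheme. The LP_* values are the ones PostgreSQL defines in itemid.h.
The idx_lp_meaning() helper and its strings are hypothetical and only
restate the mapping from the quoted message. They are not taken from any
patch, and the reply above leaves the scheme itself open ("Not sure").

```c
/* Line pointer states: the 2-bit lp_flags values from PostgreSQL's itemid.h. */
#define LP_UNUSED   0   /* 00 */
#define LP_NORMAL   1   /* 01 */
#define LP_REDIRECT 2   /* 10: not currently used by index AMs */
#define LP_DEAD     3   /* 11 */

/*
 * Hypothetical helper (assumption, for illustration only): translate an
 * index line pointer's flag bits into the meaning proposed in the quoted
 * message.  The labels are copied from that proposal.
 */
static const char *
idx_lp_meaning(unsigned int lp_flags)
{
    switch (lp_flags & 0x3)     /* only the low two bits carry the state */
    {
        case LP_REDIRECT:
            return "dead to all on standby";
        case LP_DEAD:
            return "dead to all on primary";
        case LP_NORMAL:
            return "future \"recently DEAD\" on primary";
        default:
            return "unused";
    }
}
```

Only two bits are available per line pointer, so every state assigned
here is one that can no longer be given to something else. That is the
sense in which the bits are precious.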