On Wed, Jan 17, 2024 at 3:12 PM Melanie Plageman <melanieplage...@gmail.com> wrote: > > Reviewing 0001, consider the case where a table has no indexes. > > Pre-patch, PageTruncateLinePointerArray() will happen when > > lazy_vacuum_heap_page() is called; post-patch, it will not occur. > > That's a loss. Should we care? On the plus side, visibility map > > repair, if required, can now take place. That's a gain. > > I thought that this wasn't an issue because heap_page_prune_execute() > calls PageRepairFragmentation() which similarly modifies pd_lower and > sets the hint bit about free line pointers.
Ah, OK, I didn't understand that PageRepairFragmentation() does what is also done by PageTruncateLinePointerArray(). > Yes, I also spent some time thinking about this. In master, we do > always call lazy_scan_new_or_empty() before calling > lazy_scan_noprune(). The code is aiming to ensure we call > lazy_scan_new_or_empty() once before calling either of > lazy_scan_noprune() or lazy_scan_prune(). I think it didn't call > lazy_scan_new_or_empty() unconditionally first because of the > different lock types expected. But, your structure has solved that. > I've used a version of your example code above in attached v9. It is > much nicer. Oh, OK, I see it now. I missed that lazy_scan_new_or_empty() was called either way. Glad that my proposed restructuring managed to be helpful despite that confusion, though. :-) At a quick glance, I also like the way this looks. I'll review it more thoroughly later. Does this patch require 0002 and 0003 or could it equally well go first? I confess that I don't entirely understand why we want 0002 and 0003. > Ah, I realize I was not clear. I am now talking about inconsistencies > in vacuuming the FSM itself. FreeSpaceMapVacuumRange(). Not updating > the freespace map during the course of vacuuming the heap relation. Fair enough, but I'm still not quite sure exactly what the question is. It looks to me like the current code, when there are indexes, vacuums the FSM after each round of index vacuuming. When there are no indexes, doing it after each round of index vacuuming would mean never doing it, so instead we vacuum the FSM every ~8GB. I assume what happened here is that somebody decided doing it after each round of index vacuuming was the "right thing," and then realized that was not going to work if no index vacuuming was happening, and so inserted the 8GB threshold to cover that case. I don't really know what to make of all of this. On a happy PostgreSQL system, doing anything after each round of index vacuuming means doing it once, because multiple rounds of indexing vacuum are extremely expensive and we hope that it won't ever occur. From that point of view, the 8GB threshold is better, because it means that when we vacuum a large relation, space should become visible to the rest of the system incrementally without needing to wait for the entire vacuum to finish. On the other hand, we also have this idea that we want to record free space in the FSM once, after the last time we touch the page. Given that behavior, vacuuming the FSM every 8GB when we haven't yet done index vacuuming wouldn't accomplish much of anything, because we haven't updated it for the pages we just touched. On the third hand, the current behavior seems slightly ridiculous, because pruning the page is where we're mostly going to free up space, so we might be better off just updating the FSM then instead of waiting. That free space could be mighty useful during the long wait between pruning and marking line pointers unused. On the fourth hand, that seems like a significant behavior change that we might not want to undertake without a bunch of research that we might not want to do right now -- and if we did do it, should we then update the FSM a second time after marking line pointers unused? I'm not sure if any of this is answering your actual question, though. -- Robert Haas EDB: http://www.enterprisedb.com