On Tue, Apr 5, 2022 at 6:19 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
> I don't think that saving extra index passes is really a small gain.
> I think this will save a lot of IO if index pages are not in shared
> buffers, because here we are talking about completely avoiding the
> index passes for some of the indexes if that work is already done.
> And if this were the only advantage then it might not be worth adding
> this infrastructure, but what about global indexes?
Sure, I agree that the gain is large when the situation arises -- but
in practice I think it's pretty rare that the dead TID array can't fit
in maintenance_work_mem. In ten years of doing PostgreSQL support,
I've seen only a handful of cases where # of index scans > 1, and
those were solved by just increasing maintenance_work_mem until the
problem went away. AFAICT, there's pretty much nobody who can't fit
the dead TID list in main memory; they just occasionally don't
configure enough memory for it to happen.

It makes sense if you think about the math. Say you run with
maintenance_work_mem=64MB. That's enough for about 10 million dead
TIDs. With default settings, the table becomes eligible for vacuuming
when the number of updates and deletes exceeds 20% of the table, so to
fill up that amount of memory, you need the table to have more than 50
million tuples. If you estimate (somewhat randomly) 10 tuples per
page, that's 5 million pages, or 40GB with 8kB pages. If you have a
40GB table, you don't have a problem with using 64MB of memory to
vacuum it. And similarly, if you have a 640GB table, you don't have a
problem with using 1GB of memory to vacuum it. Practically speaking,
if we made work memory for autovacuum unlimited and allocated as much
as we needed on demand, I bet almost nobody would have an issue.

> Because if we have global indexes then we must need this
> infrastructure to store the dead items for each partition, because,
> for example, after vacuuming 1000 partitions, if we need to vacuum
> the global index while vacuuming the 1001st partition, we don't want
> to rescan all of the previous 1000 partitions to regenerate those
> old dead items, right? So I think this is the actual use case where
> we indirectly skip the heap vacuuming for some of the partitions
> before performing the index vacuum.

Well, I agree. But the problem is what development path we should
pursue to get there. We want to do something that's going to make
sense if and when we eventually get global indexes, but that also
gives us a good amount of benefit in the meanwhile, and that doesn't
involve making too many changes to the code at the same time.

I liked the idea of keeping VACUUM basically as it is today -- two
heap passes with an index pass in the middle, but now with the
conveyor injected -- because it keeps the code changes as simple as
possible. And perhaps we should start by doing just that much. But now
that I've realized how small the benefit of doing only that much is,
I'm a lot less convinced that it's a good first step. Any hope of
getting a more significant benefit out of the conveyor belt stuff
relies on our ability to get more decoupling, so that we, for example,
collect dead TIDs on Tuesday, vacuum the indexes on Wednesday, and set
the dead TIDs unused on Thursday, doing other things in the meanwhile.
And from that point of view I see two problems.

One problem is that I do not think we want to force all vacuuming
through the conveyor belt model. It doesn't really make sense for a
small table with no associated global indexes. And so there is a code
structure issue: how do we set things up so that we can vacuum as we
do today, or alternatively vacuum in completely separate stages,
without filling the code up with a million "if" statements? The other
problem is understanding whether it's really feasible to postpone the
index vacuuming and the second heap pass in realistic scenarios.
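Just to make the arithmetic above easy to replay, here is a
back-of-the-envelope sketch -- not anything from the patch, just the
hand math as a program, assuming the 6-byte sizeof(ItemPointerData),
the default autovacuum_vacuum_scale_factor of 0.2, the default 8kB
BLCKSZ, and the same rough 10-tuples-per-page guess as above:

#include <stdio.h>

int
main(void)
{
    /* All PostgreSQL defaults, except the tuples-per-page guess. */
    double  maint_work_mem = 64.0 * 1024 * 1024; /* maintenance_work_mem */
    double  tid_size = 6;           /* sizeof(ItemPointerData) */
    double  scale_factor = 0.2;     /* autovacuum_vacuum_scale_factor */
    double  tuples_per_page = 10;   /* rough guess, as above */
    double  block_size = 8192;      /* BLCKSZ */

    double  dead_tids = maint_work_mem / tid_size;
    double  table_tuples = dead_tids / scale_factor;
    double  table_pages = table_tuples / tuples_per_page;

    printf("dead TIDs that fit: %.0f\n", dead_tids);
    printf("tuples to fill that memory: %.0f\n", table_tuples);
    printf("table size: %.0f pages = %.0f GB\n", table_pages,
           table_pages * block_size / (1024.0 * 1024 * 1024));
    return 0;
}

That prints about 11 million dead TIDs, 56 million tuples, and a table
of roughly 43GB -- the same ballpark as the hand math -- and scaling
maintenance_work_mem up to 1GB scales the table size by the same
factor of 16, which is where the 640GB figure comes from.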
Postponing index vacuuming and the second heap pass means that dead
line pointers remain in the heap, and that can drive bloat via line
pointer exhaustion: each dead line pointer is only 4 bytes, but a heap
page can never hold more than MaxHeapTuplesPerPage line pointers (291,
with the default 8kB block size), so a page full of dead stubs stops
accepting new tuples no matter how much free space it has. The whole
idea of decoupling table and index vacuum supposes that there are
situations in which it's worth performing the first heap pass, where
we gather the dead line pointers, but where it's not necessary to
follow that up as quickly as possible with a second heap pass to mark
those dead line pointers unused. I think Peter and I are in agreement
that there are situations in which some indexes need to be vacuumed
much more often than others -- but that doesn't matter if the heap
needs to be vacuumed more frequently than anything else, because you
can't do that without first vacuuming all the indexes.

--
Robert Haas
EDB: http://www.enterprisedb.com