[ returning to this thread after a bit of a hiatus ]

On Fri, Feb 25, 2022 at 12:15 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I can only speak for myself, but that sounds correct to me. IMO what
> we really want here is to create lots of options for VACUUM.
I agree. That seems like a good way of thinking about it.

> I don't think we want to, exactly. Maybe it's easier to store
> redundant TIDs than to avoid storing them in the first place (we can
> probably just accept some redundancy). There is a trade-off to be made
> there. I'm not at all sure of what the best trade-off is, though.

My intuition is that storing redundant TIDs will turn out to be a very
bad idea. I think that if we do that, the system will become prone to a
new kind of vicious cycle (to try to put this in words similar to the
ones you've been using to write about similar effects). Imagine that we
vacuum a table which contains N dead tuples a total of K times in
succession, but each time we either deliberately decide against index
vacuuming or get killed before we can complete it. If we don't have
logic to prevent duplicate entries from being added to the conveyor
belt, we will have N*K TIDs on the conveyor belt rather than only N,
and I think it's not hard to imagine situations where K could be big
enough for that to be hugely painful.

Moreover, at some point, we're going to need to deduplicate anyway,
because each of those dead TIDs can only be marked unused once. Letting
the data pile up on the conveyor belt in the hope of sorting it out
later seems almost certain to be a losing proposition. The more data we
collect on the conveyor belt, the harder it's going to be when we
eventually need to deduplicate.

Also, I think it's possible to deduplicate at a very reasonable cost so
long as (1) we enter each batch of TIDs into the conveyor belt as a
distinguishable "run" and (2) we never accumulate so many of these runs
at the same time that we can't do a single merge pass to turn them into
a sorted output stream. We're always going to discover dead TIDs in
sorted order, so as we make our pass over the heap, we can
simultaneously be doing an on-the-fly merge pass over the existing runs
that are on the conveyor belt. That means we never need to store all
the dead TIDs in memory at once. We just fetch enough data from each
"run" to cover the block numbers for which we're performing the heap
scan, and when the heap scan advances we throw away data for blocks
that we've passed and fetch data for the blocks we're now reaching.
(There's a rough sketch of the kind of merge I have in mind at the end
of this email.)

> > but if you are going to rescan the heap
> > again next time before doing any index vacuuming then why we want to
> > store them anyway.
>
> It all depends, of course. The decision needs to be made using a cost
> model. I suspect it will be necessary to try it out, and see.

But having said that, coming back to this with fresh eyes, I think
Dilip has a really good point here. If the first thing we do at the
start of every VACUUM is scan the heap in a way that is guaranteed to
rediscover all of the dead TIDs that we've previously added to the
conveyor belt, plus maybe also new ones, we may as well just forget the
whole idea of having a conveyor belt at all. At that point we're just
talking about a system for deciding when to skip index vacuuming, and
the conveyor belt is a big, complicated piece of machinery that stores
data we don't really need for anything, because if we threw it out, the
next vacuum would reconstruct it anyhow and we'd get exactly the same
results with less code. The only way the conveyor belt system has any
value is if we think that there is some set of circumstances where the
heap scan is separated in time from the index vacuum, such that we
might sometimes do an index vacuum without having done a heap scan just
before.
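To make the on-the-fly merge a bit more concrete, here's a minimal
standalone sketch (plain C, compiles on its own). To be clear, this is
not the conveyor belt code: DeadTid, TidRun, and merge_next are
stand-ins invented for the example, a real version would pull run data
from the conveyor belt in chunks keyed to the heap scan's current
block, and it would use a binary heap once the number of runs gets
large. The point is just that K sorted runs can be merged and
de-duplicated in one streaming pass without ever materializing all of
the dead TIDs in memory:

/*
 * Sketch only: merge K sorted runs of dead TIDs into one sorted,
 * de-duplicated stream, the way a heap scan could consume them on the
 * fly.  The types and helper names here are invented for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    uint32_t    block;          /* heap block number */
    uint16_t    offset;         /* line pointer offset within the block */
} DeadTid;

typedef struct
{
    const DeadTid *tids;        /* the run's TIDs, already sorted */
    int         ntids;
    int         pos;            /* next unread element */
} TidRun;

static int
tid_cmp(const DeadTid *a, const DeadTid *b)
{
    if (a->block != b->block)
        return a->block < b->block ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/*
 * Return the next TID in merged order, skipping any TID equal to the
 * previously emitted one ("prev").  Returns false when all runs are
 * exhausted.  A linear scan over the run array keeps the sketch short;
 * with many runs you'd want a heap.
 */
static bool
merge_next(TidRun *runs, int nruns, const DeadTid *prev, DeadTid *out)
{
    for (;;)
    {
        int         best = -1;

        for (int i = 0; i < nruns; i++)
        {
            if (runs[i].pos >= runs[i].ntids)
                continue;
            if (best < 0 ||
                tid_cmp(&runs[i].tids[runs[i].pos],
                        &runs[best].tids[runs[best].pos]) < 0)
                best = i;
        }
        if (best < 0)
            return false;       /* every run exhausted */

        *out = runs[best].tids[runs[best].pos++];

        /* Duplicate of what we already emitted?  Keep going. */
        if (prev == NULL || tid_cmp(prev, out) != 0)
            return true;
    }
}

int
main(void)
{
    /* Two overlapping runs, as if two interrupted VACUUMs wrote them. */
    static const DeadTid run1[] = {{1, 3}, {1, 7}, {2, 1}};
    static const DeadTid run2[] = {{1, 7}, {2, 1}, {5, 4}};
    TidRun      runs[] = {{run1, 3, 0}, {run2, 3, 0}};
    DeadTid     cur;
    DeadTid     prev;
    bool        have_prev = false;

    while (merge_next(runs, 2, have_prev ? &prev : NULL, &cur))
    {
        printf("(%u,%u)\n", (unsigned) cur.block, (unsigned) cur.offset);
        prev = cur;
        have_prev = true;
    }
    return 0;
}

Because each run is individually sorted, equal TIDs are always adjacent
in the merged stream, so remembering only the last TID we emitted is
enough to suppress duplicates.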
--
Robert Haas
EDB: http://www.enterprisedb.com