On Thu, Mar 24, 2022 at 10:21 AM Robert Haas <robertmh...@gmail.com> wrote: > You're probably not going to love hearing this, but I think you're > still explaining things here in ways that are too baroque and hard to > follow. I do think it's probably better.
There are a lot of dimensions to this work. It's hard to know which to emphasize here. > But, for example, in the > commit message for 0001, I think you could change the subject line to > "Allow non-aggressive vacuums to advance relfrozenxid" and it would be > clearer. But non-aggressive VACUUMs have always been able to do that. How about: "Set relfrozenxid to oldest extant XID seen by VACUUM" > And then I think you could eliminate about half of the first > paragraph, starting with "There is no fixed relationship", and all of > the third paragraph (which starts with "Later work..."), and I think > removing all that material would make it strictly more clear than it > is currently. I don't think it's the place of a commit message to > speculate too much on future directions or to wax eloquent on > theoretical points. If that belongs anywhere, it's in a mailing list > discussion. Okay, I'll do that. > It seems to me that 0002 mixes code movement with functional changes. Believe it or not, I avoided functional changes in 0002 -- at least in one important sense. That's why you had difficulty spotting any. This must sound peculiar, since the commit message very clearly says that the commit avoids a problem seen only in the non-aggressive case. It's really quite subtle. You wrote this comment and code block (which I propose to remove in 0002), so clearly you already understand the race condition that I'm concerned with here: - if (skipping_blocks && blkno < rel_pages - 1) - { - /* - * Tricky, tricky. If this is in aggressive vacuum, the page - * must have been all-frozen at the time we checked whether it - * was skippable, but it might not be any more. We must be - * careful to count it as a skipped all-frozen page in that - * case, or else we'll think we can't update relfrozenxid and - * relminmxid. If it's not an aggressive vacuum, we don't - * know whether it was initially all-frozen, so we have to - * recheck. - */ - if (vacrel->aggressive || - VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer)) - vacrel->frozenskipped_pages++; - continue; - } What you're saying here boils down to this: it doesn't matter what the visibility map would say right this microsecond (in the aggressive case) were we to call VM_ALL_FROZEN(): we know for sure that the VM said that this page was all-frozen *in the recent past*. That's good enough; we will never fail to scan a page that might have an XID < OldestXmin (ditto for XMIDs) this way, which is all that really matters. This is absolutely mandatory in the aggressive case, because otherwise relfrozenxid advancement might be seen as unsafe. My observation is: Why should we accept the same race in the non-aggressive case? Why not do essentially the same thing in every VACUUM? In 0002 we now track if each range that we actually chose to skip had any all-visible (not all-frozen) pages -- if that happens then relfrozenxid advancement becomes unsafe. The existing code uses "vacrel->aggressive" as a proxy for the same condition -- the existing code reasons based on what the visibility map must have said about the page in the recent past. Which makes sense, but only works in the aggressive case. The approach taken in 0002 also makes the code simpler, which is what enabled putting the VM skipping code into its own helper function, but that was just a bonus. And so you could almost say that there is now behavioral change at all. We're skipping pages in the same way, based on the same information (from the visibility map) as before. We're just being a bit more careful than before about how that information is tracked, to avoid this race. A race that we always avoided in the aggressive case is now consistently avoided. > I'm completely on board with moving the code that decides how much to > skip into a function. That seems like a great idea, and probably > overdue. But it is not easy for me to see what has changed > functionally between the old and new code organization, and I bet it > would be possible to split this into two patches, one of which creates > a function, and the other of which fixes the problem, and I think that > would be a useful service to future readers of the code. It seems kinda tricky to split up 0002 like that. It's possible, but I'm not sure if it's possible to split it in a way that highlights the issue that I just described. Because we already avoided the race in the aggressive case. > I also think that the commit message for 0002 is probably longer and > more complex than is really helpful, and that the subject line is too > vague, but since I don't yet understand exactly what's happening here, > I cannot comment on how I think it should be revised at this point, > except to say that the second paragraph of that commit message looks > like the most useful part. I'll work on that. > I would also like to mention a few things that I do like about 0002. > One is that it seems to collapse two different pieces of logic for > page skipping into one. That seems good. As mentioned, it's especially > good because that logic is abstracted into a function. Also, it looks > like it is making a pretty localized change to one (1) aspect of what > VACUUM does -- and I definitely prefer patches that change only one > thing at a time. Totally embracing the idea that we don't necessarily need very recent information from the visibility map (it just has to be after OldestXmin was established) has a lot of advantages, architecturally. It could in principle be hours out of date in the longest VACUUM operations -- that should be fine. This is exactly the same principle that makes it okay to stick with our original rel_pages, even when the table has grown during a VACUUM operation (I documented this in commit 73f6ec3d3c recently). We could build on the approach taken by 0002 to create a totally comprehensive picture of the ranges we're skipping up-front, before we actually scan any pages, even with very large tables. We could in principle cache a very large number of skippable ranges up-front, without ever going back to the visibility map again later (unless we need to set a bit). It really doesn't matter if somebody else unsets a page's VM bit concurrently, at all. I see a lot of advantage to knowing our final scanned_pages almost immediately. Things like prefetching, capping the size of the dead_items array more intelligently (use final scanned_pages instead of rel_pages in dead_items_max_items()), improvements to progress reporting...not to mention more intelligent choices about whether we should try to advance relfrozenxid a bit earlier during non-aggressive VACUUMs. > Hope that's helpful. Very helpful -- thanks! -- Peter Geoghegan