On Fri, Jan 7, 2022 at 12:24 PM Robert Haas <robertmh...@gmail.com> wrote:
> This seems like a weak argument. Sure, you COULD hard-code the limit
> to be autovacuum_freeze_max_age/2 rather than making it a separate
> tunable, but I don't think it's better. I am generally very skeptical
> about the idea of using the same GUC value for multiple purposes,
> because it often turns out that the optimal value for one purpose is
> different than the optimal value for some other purpose.

I thought I was being conservative by suggesting autovacuum_freeze_max_age/2. My first thought was to teach VACUUM to make its FreezeLimit "OldestXmin - autovacuum_freeze_max_age". To me these two concepts really *are* the same thing: vacrel->FreezeLimit becomes a backstop, just as anti-wraparound autovacuum (the autovacuum_freeze_max_age cutoff) becomes a backstop.

Of course, an anti-wraparound VACUUM will do early freezing in the same way as any other VACUUM will (with the patch series). So even when the FreezeLimit backstop XID cutoff actually affects the behavior of a given VACUUM operation, it may well not be the reason why most of the individual tuples that we freeze get frozen. That is, most heap pages will probably have their tuples frozen for some other reason. Though it depends on workload characteristics, the tuples on most heap pages will typically be frozen together as a group, even here. This is a logical consequence of the fact that tuple freezing and advancing relfrozenxid are now only loosely coupled: the coupling is about as loose as the current relfrozenxid invariant will allow.

> I feel generally that a lot of the argument you're making here
> supposes that tables are going to get vacuumed regularly.
> I agree that
> IF tables are being vacuumed on a regular basis, and if as part of
> that we always push relfrozenxid forward as far as we can, we will
> rarely have a situation where aggressive strategies to avoid
> wraparound are required.

It's all relative. We hope that (with the patch) tables that only ever get anti-wraparound VACUUMs are limited to tables where nothing else drives VACUUM, for sensible reasons related to workload characteristics (like the pgbench_accounts example upthread). It's inevitable that some users will misconfigure the system, though; there's no question about that. I don't see why users who misconfigure the system in this way should be any worse off than they would be today.

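A minimal sketch in C of the loose coupling described above, under simplifying assumptions. Every name here (backstop_freeze_limit, process_page, HypotheticalPage, FROZEN_XID, FIRST_NORMAL_XID) is hypothetical; this is not code from PostgreSQL or from the v5 patch series. FreezeLimit only acts as a backstop ("OldestXmin - autovacuum_freeze_max_age", or the more conservative autovacuum_freeze_max_age/2), a page is frozen early only when it would otherwise be set all-visible, and the new relfrozenxid is simply the oldest XID left unfrozen. That last rule stays valid even for a page whose cleanup lock can't be acquired (say, because of a conflicting buffer pin).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of VACUUM's freezing decisions.
 * Not PostgreSQL source: all names are illustrative assumptions.
 */
typedef uint32_t TransactionId;

#define FROZEN_XID       ((TransactionId) 2) /* stand-in for a frozen xmin */
#define FIRST_NORMAL_XID ((TransactionId) 3)
#define MAX_TUPLES       8

/* Circular comparison of two normal XIDs (modulo-2^32 arithmetic). */
static int xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Backstop cutoff: OldestXmin minus max_age, clamped to the first normal XID. */
static TransactionId backstop_freeze_limit(TransactionId oldest_xmin,
                                           uint32_t max_age)
{
    TransactionId limit = oldest_xmin - max_age; /* wraps modulo 2^32 */

    if (limit < FIRST_NORMAL_XID)
        limit = FIRST_NORMAL_XID;
    return limit;
}

typedef struct
{
    TransactionId xmin[MAX_TUPLES]; /* xmin of each tuple on the page */
    int         ntuples;
    int         will_be_all_visible; /* page would be set all-visible in the VM */
    int         got_cleanup_lock;    /* 0 if, e.g., an idle cursor pins the page */
} HypotheticalPage;

/*
 * Freeze what we can on one page, and lower *new_relfrozenxid to the
 * oldest XID that is left unfrozen. Returns the number of tuples frozen.
 */
static int process_page(HypotheticalPage *page, TransactionId freeze_limit,
                        TransactionId *new_relfrozenxid)
{
    int         nfrozen = 0;

    assert(page->ntuples >= 0 && page->ntuples <= MAX_TUPLES);
    for (int i = 0; i < page->ntuples; i++)
    {
        TransactionId xid = page->xmin[i];

        if (xid == FROZEN_XID)
            continue;
        /*
         * Freeze early when the whole page can then be marked all-frozen;
         * otherwise freeze only XIDs older than the backstop cutoff.
         * Without a cleanup lock, freeze nothing.
         */
        if (page->got_cleanup_lock &&
            (page->will_be_all_visible || xid_precedes(xid, freeze_limit)))
        {
            page->xmin[i] = FROZEN_XID;
            nfrozen++;
        }
        else if (xid_precedes(xid, *new_relfrozenxid))
            *new_relfrozenxid = xid; /* unfrozen XID holds relfrozenxid back */
    }
    return nfrozen;
}
```

Start new_relfrozenxid at OldestXmin and call process_page for each page. Passing autovacuum_freeze_max_age / 2 instead of the full value only moves the backstop. Either way, the resulting relfrozenxid depends on which XIDs are left unfrozen, not on the cutoff that forced some of them to be frozen.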
Such users probably won't do substantially less freezing (usually somewhat more), and will advance pg_class.relfrozenxid at least as well as today (usually a bit better, actually). What have I missed?

Admittedly the design of the "Freeze tuples early to advance relfrozenxid" patch (i.e. v5-0005-*patch) is still unsettled; I need to verify that my claims about it are really robust. But as far as I know they are. Reviewers should certainly look at that with a critical eye.

> Now, I agree with you in part: I don't think it's obvious that it's
> useful to tune vacuum_freeze_table_age.

That's definitely the easier argument to make. After all, vacuum_freeze_table_age will do nothing unless VACUUM runs before the anti-wraparound threshold (autovacuum_freeze_max_age) is reached. The patch series should be strictly better than that, primarily because it's "continuous", and so isn't limited to cases where the table age falls within the "vacuum_freeze_table_age - autovacuum_freeze_max_age" goldilocks age range.

> We should be VERY conservative about removing
> existing settings if there's any chance that somebody could use them
> to tune their way out of trouble.

I agree, I suppose, but right now I honestly can't think of a reason why they would be useful. If I am wrong about this, then I'm probably also wrong about some basic facet of the high-level design, in which case I should change course altogether. In other words, removing the GUCs is not incidental. It's possible that I would never have pursued this project if I hadn't first noticed how wrong-headed the GUCs are.

> So, let's see: if we see a page where the tuples are all-visible and
> we seize the opportunity to freeze it, we can spare ourselves the need
> to ever visit that page again (unless it gets modified). But if we
> only mark it all-visible and leave the freezing for later, the next
> aggressive vacuum will have to scan and dirty the page.
> I'm prepared
> to believe that it's worth the cost of freezing the page in that
> scenario.

That's certainly the most compelling reason to perform early freezing. It's not completely free of downsides, but it's pretty close.

> There's another situation in which vacuum_freeze_min_age could apply,
> though: suppose the page isn't all-visible yet. I'd argue that in that
> case we don't want to run around freezing stuff unless it's quite old
> - like older than vacuum_freeze_table_age, say. Because we know we're
> going to have to revisit this page in the next vacuum anyway, and
> expending effort to freeze tuples that may be about to be modified
> again doesn't seem prudent. So, hmm, on further reflection, maybe it's
> OK to remove vacuum_freeze_min_age. But if we do, then I think we had
> better carefully distinguish between the case where the page can
> thereby be marked all-frozen and the case where it cannot. I guess you
> say the same, further down.

I do. Although v5-0005-*patch still freezes early when the page is dirtied by pruning, I have my doubts about that particular "freeze early" criterion. I believe that everything I just said about misconfigured autovacuums doesn't rely on anything more than the "most compelling scenario for early freezing" mechanism, which arranges for us to set the all-frozen bit (not just the all-visible bit).

> I mean, those kinds of pathological cases happen *all the time*. Sure,
> there are plenty of users who don't leave cursors open. But the ones
> who do don't leave them around for short periods of time on randomly
> selected pages of the table. They are disproportionately likely to
> leave them on the same table pages over and over, just like data can't
> in general be assumed to be uniformly accessed. And not uncommonly,
> they leave them around until the snow melts.
> And we need to worry about those kinds of users, actually much more
> than we need to worry about users doing normal things.

I couldn't agree more.
In fact, I was mostly thinking about how to *help* these users. Insisting on waiting for a cleanup lock before it becomes strictly necessary (when the table age is only 50 million/vacuum_freeze_min_age) is actually a big part of the problem for these users. vacuum_freeze_min_age enforces a false dichotomy on aggressive VACUUMs that just isn't helpful. Why should waiting on a cleanup lock fix anything?

Even in the extreme case where we are guaranteed to eventually have a wraparound failure (due to an idle cursor in an unsupervised database), the user is still much better off, I think. We will have at least managed to advance relfrozenxid to the exact oldest XID on the one heap page that somebody holds an idle cursor (conflicting buffer pin) on. And we'll usually have frozen most of the tuples that need to be frozen. Sure, the user may need to use single-user mode to run a manual VACUUM, but at least this process only needs to freeze approximately one tuple to get the system back online again.

If the DBA notices the problem before the database starts to refuse to allocate XIDs, then they'll have a much better chance of avoiding a wraparound failure through simple intervention (like killing the backend with the idle cursor). We can pay down 99.9% of the "freeze debt" independently of this intractable problem of something holding onto an idle cursor.

> Honestly,
> autovacuum on a system where things are mostly "normal" - no
> long-running transactions, adequate resources for autovacuum to do its
> job, reasonable configuration settings - isn't that bad.

Right. Autovacuum is "too big to fail".

> > But the "freeze early" heuristics work a bit like that anyway. We
> > won't freeze all the tuples on a whole heap page early if we won't
> > otherwise set the heap page to all-visible (not all-frozen) in the VM
> > anyway.
>
> Hmm, I didn't realize that we had that. Is that an existing thing or
> something new you're proposing to do? If existing, where is it?

It's part of v5-0005-*patch. It's still in flux to some degree, because it's necessary to balance a few things. That shouldn't undermine the arguments I've made here.

> I agree that it's OK for this to become a purely backstop mechanism
> ... but again, I think that the design of such backstop mechanisms
> should be done as carefully as we know how, because users seem to hit
> the backstop all the time. We want it to be made of, you know, nylon
> twine, rather than, say, sharp nails. :-)

Absolutely. But if autovacuum can only ever run due to age(relfrozenxid) reaching autovacuum_freeze_max_age, then I can't see a downside. Again, v5-0005-*patch needs to meet the standard that I've laid out. If it doesn't, then I've messed up already.

--
Peter Geoghegan