On Tue, Nov 29, 2022 at 9:35 PM Chris Travers <ch...@orioledata.com> wrote:
> My proposal would be to make the threshold configurable and start warning on
> every transaction after that. There are a couple reasons to do that.
>
> The first is that noisy warnings are extremely easy to see. You get them in
> cron emails, from psql, in the db logs etc. Having them every million makes
> them harder to catch.
>
> The point here is not to ensure there are no problems, but to make sure that
> an existing layer in the current swiss cheese model of safety doesn't go
> away. Will it stop all problems? No. But the current warning strategy is
> effective, given how many times we hear of cases of people having to take
> drastic action to avoid impending xid wraparound.
>
> If someone has an insert only database and maybe doesn't want to ever freeze,
> they can set the threshold to -1 or something. I would suggest keeping the
> default at 2 billion to be in line with existing limitations and
> practices. People can then adjust as they see fit.
>
> Warning text might be something like "XID Lag Threshold Exceeded. Is
> autovacuum clearing space and keeping up?"
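For concreteness, the check being proposed could be sketched roughly as below. This is only an illustration under stated assumptions: the struct, the field name, and the function name are hypothetical and do not come from the patch set, and it assumes 64-bit XIDs so that the lag is a plain subtraction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical setting, not an existing GUC: warn once the XID lag
 * exceeds this value; a negative value disables the warning. */
typedef struct XidLagWarnConfig
{
    int64_t threshold;      /* e.g. 2000000000 by default, -1 = never warn */
} XidLagWarnConfig;

/*
 * Return true if a warning should be emitted for this transaction.
 * Assumes 64-bit XIDs with next_xid >= oldest_xid, so no wraparound
 * handling is needed in the subtraction.
 */
static bool
xid_lag_should_warn(const XidLagWarnConfig *cfg,
                    uint64_t next_xid, uint64_t oldest_xid)
{
    uint64_t lag;

    if (cfg->threshold < 0)
        return false;       /* warnings disabled, e.g. insert-only database */

    lag = next_xid - oldest_xid;
    return lag > (uint64_t) cfg->threshold;
}
```

Because the check is evaluated per transaction, every transaction past the threshold would produce the warning, which is the "noisy" behavior described in the quoted text.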
None of this seems unreasonable to me. If we want to allow more
configurability, we could also let you choose the threshold and the
frequency of warnings (every N transactions).

But, I think we might be getting down a little bit in the weeds. It's
not clear that everybody's on board with the proposed page format
changes. I'm not completely opposed, but I'm also not wild about the
approach. It's probably not a good idea to spend all of our energy
debating the details of how to reform xidWrapLimit without having some
consensus on those points. It is, in a word, bikeshedding: on-disk
page format changes are hard, but everyone understands warning
messages.

Lest we miss the forest for the trees, there is an aspect of this
patch that I find to be an extremely good idea and think we should try
to get committed even if the rest of the patch set ends up in the
rubbish bin. Specifically, there are a couple of patches in here that
have to do with making SLRUs indexed by 64-bit integers rather than by
32-bit integers. We've had repeated bugs in the area of handling SLRU
wraparound in the past, some of which have caused data loss. Just by
chance, I ran across a situation just yesterday where an SLRU wrapped
around on disk for reasons that I don't really understand yet and
chaos ensued.

Switching to an indexing system for SLRUs that does not ever wrap
around would probably enable us to get rid of a whole bunch of crufty
code, and would also likely improve the general reliability of the
system in situations where wraparound is threatened. It seems like a
really, really good idea. I haven't checked the patches to see whether
they look correct, and I'm concerned in particular about upgrade
scenarios. But if there's a way we can get that part committed, I
think it would be a clear win.

--
Robert Haas
EDB: http://www.enterprisedb.com
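To illustrate why a non-wrapping index simplifies things, here is a minimal sketch contrasting the two ways of ordering page numbers. The function names are hypothetical and this is not the code from the patches or from slru.c; the 32-bit version uses the usual modular-comparison trick, which I believe is the same style of test used for 32-bit transaction IDs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * 32-bit page numbers live on a circle, so "a precedes b" has to be
 * decided modulo 2^32. The answer is only meaningful while the two
 * values are less than 2^31 apart. (The cast relies on the usual
 * two's-complement conversion, which mainstream compilers provide.)
 */
static bool
page_precedes_32(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * 64-bit page numbers never wrap in practice, so ordinary integer
 * comparison is enough and there is no ambiguous case to reason about.
 */
static bool
page_precedes_64(uint64_t a, uint64_t b)
{
    return a < b;
}
```

With the 32-bit version, page 0xFFFFFFF0 is treated as preceding page 5 because the counter is assumed to have wrapped, and two pages exactly 2^31 apart each appear to precede the other. That ambiguous case is the kind of thing the wraparound-handling code has to guard against, and it simply does not arise with the 64-bit comparison.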