On Fri, Apr 23, 2021 at 04:42:56PM -0700, Andres Freund wrote: > Hi, > > On 2019-06-15 11:37:59 -0700, Noah Misch wrote: > > On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote: > > > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <ja...@reddit.com> wrote: > > > > This week I upgraded one of my large(2.8TB), high-volume databases from > > > > 9 to 11. The upgrade itself went fine. About two days later, we > > > > unexpectedly hit transaction ID wraparound. What was perplexing about > > > > this was that the age of our oldest `datfrozenxid` was only 1.2 billion > > > > - far away from where I'd expect a wraparound. Curiously, the > > > > wraparound error referred to a mysterious database of `OID 0`: > > > > > > > > UPDATE ERROR: database is not accepting commands to avoid wraparound > > > > data loss in database with OID 0 > > > > That's bad. > > Yea. The code triggering it in pg_resetwal is bogus as far as I can > tell. That pg_upgrade triggers it makes this quite bad. > > I just hit issues related to it when writing a wraparound handling > test. Peter remembered this issue (how?)... > > Especially before 13 (inserts triggering autovacuum) it is quite common > to have tables that only ever get vacuumed due to anti-wraparound > vacuums. And it's common for larger databases to increase > autovacuum_freeze_max_age. Which makes it fairly likely for this to > guess an oldestXid value that's *newer* than an accurate one. Since > oldestXid is used in a few important-ish places (like triggering > vacuums, and in 14 also some snapshot related logic) I think that's bad. > > The relevant code: > > if (set_xid != 0) > { > ControlFile.checkPointCopy.nextXid = > > FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid), > set_xid); > > /* > * For the moment, just set oldestXid to a value that will force > * immediate autovacuum-for-wraparound. It's not clear whether adding > * user control of this is useful, so let's just do something that's > * reasonably safe. The magic constant here corresponds to the > * maximum allowed value of autovacuum_freeze_max_age. > */ > ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000; > if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId) > ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId; > ControlFile.checkPointCopy.oldestXidDB = InvalidOid; > } > > Originally from: > > commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5 > Author: Tom Lane <t...@sss.pgh.pa.us> > Date: 2009-08-31 02:23:23 +0000 > > Track the current XID wrap limit (or more accurately, the oldest unfrozen > XID) in checkpoint records. This eliminates the need to recompute the > value > from scratch during database startup, which is one of the two remaining > reasons for the flatfile code to exist. It should also simplify life for > hot-standby operation. > > I think we should remove the oldestXid guessing logic, and expose it as > an explicit option. I think it's important that pg_upgrade sets an > accurate value. Probably not worth caring about oldestXidDB though?
This (combination of) thread(s) seems relevant. Subject: pg_upgrade failing for 200+ million Large Objects https://www.postgresql.org/message-id/flat/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com https://www.postgresql.org/message-id/flat/a9f9376f1c3343a6bb319dce294e20ac%40EX13D05UWC001.ant.amazon.com https://www.postgresql.org/message-id/flat/cc089cc3-fc43-9904-fdba-d830d8222145%40enterprisedb.com#3eec85391c6076a4913e96a86fece75e > Allows the user to provide a constant via pg_upgrade command-line, that >overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the >(window of) Transaction IDs available for pg_upgrade to complete. -- Justin