On Fri, Apr 23, 2021 at 04:42:56PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-06-15 11:37:59 -0700, Noah Misch wrote:
> > On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> > > On Mon, May 20, 2019 at 3:10 AM Jason Harvey <ja...@reddit.com> wrote:
> > > > This week I upgraded one of my large(2.8TB), high-volume databases from 
> > > > 9 to 11. The upgrade itself went fine. About two days later, we 
> > > > unexpectedly hit transaction ID wraparound. What was perplexing about 
> > > > this was that the age of our oldest `datfrozenxid` was only 1.2 billion 
> > > > - far away from where I'd expect a wraparound. Curiously, the 
> > > > wraparound error referred to a mysterious database of `OID 0`:
> > > >
> > > > UPDATE ERROR:  database is not accepting commands to avoid wraparound 
> > > > data loss in database with OID 0
> > 
> > That's bad.
> 
> Yea. The code triggering it in pg_resetwal is bogus as far as I can
> tell. That pg_upgrade triggers it makes this quite bad.
> 
> I just hit issues related to it when writing a wraparound handling
> test. Peter remembered this issue (how?)...
> 
> Especially before 13 (inserts triggering autovacuum) it is quite common
> to have tables that only ever get vacuumed due to anti-wraparound
> vacuums. And it's common for larger databases to increase
> autovacuum_freeze_max_age. Which makes it fairly likely for this to
> guess an oldestXid value that's *newer* than an accurate one. Since
> oldestXid is used in a few important-ish places (like triggering
> vacuums, and in 14 also some snapshot related logic) I think that's bad.
> 
> The relevant code:
> 
>     if (set_xid != 0)
>     {
>         ControlFile.checkPointCopy.nextXid =
>             
> FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
>                                              set_xid);
> 
>         /*
>          * For the moment, just set oldestXid to a value that will force
>          * immediate autovacuum-for-wraparound.  It's not clear whether adding
>          * user control of this is useful, so let's just do something that's
>          * reasonably safe.  The magic constant here corresponds to the
>          * maximum allowed value of autovacuum_freeze_max_age.
>          */
>         ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
>         if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
>             ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
>         ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
>     }
> 
> Originally from:
> 
> commit 25ec228ef760eb91c094cc3b6dea7257cc22ffb5
> Author: Tom Lane <t...@sss.pgh.pa.us>
> Date:   2009-08-31 02:23:23 +0000
> 
>     Track the current XID wrap limit (or more accurately, the oldest unfrozen
>     XID) in checkpoint records.  This eliminates the need to recompute the 
> value
>     from scratch during database startup, which is one of the two remaining
>     reasons for the flatfile code to exist.  It should also simplify life for
>     hot-standby operation.
> 
> I think we should remove the oldestXid guessing logic, and expose it as
> an explicit option. I think it's important that pg_upgrade sets an
> accurate value. Probably not worth caring about oldestXidDB though?

This (combination of) thread(s) seems relevant.

Subject: pg_upgrade failing for 200+ million Large Objects
https://www.postgresql.org/message-id/flat/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com
https://www.postgresql.org/message-id/flat/a9f9376f1c3343a6bb319dce294e20ac%40EX13D05UWC001.ant.amazon.com
https://www.postgresql.org/message-id/flat/cc089cc3-fc43-9904-fdba-d830d8222145%40enterprisedb.com#3eec85391c6076a4913e96a86fece75e
> Allows the user to provide a constant via pg_upgrade command-line, that
>overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
>(window of) Transaction IDs available for pg_upgrade to complete.

-- 
Justin


Reply via email to