Hi,

On 2023-06-21 21:50:39 -0700, Noah Misch wrote:
> On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote:
> > A related issue is that as far as I can tell the determination of what is
> > bogus is bogus.
> > 
> > The relevant cutoffs are determined vac_update_datfrozenxid() using:
> > 
> >     /*
> >      * Identify the latest relfrozenxid and relminmxid values that we could
> >      * validly see during the scan.  These are conservative values, but it's
> >      * not really worth trying to be more exact.
> >      */
> >     lastSaneFrozenXid = ReadNextTransactionId();
> >     lastSaneMinMulti = ReadNextMultiXactId();
> > 
> > but doing checks based on thos is bogus, because:
> > 
> > a) a concurrent create table / truncate / vacuum can update
> >    pg_class.relfrozenxid of some relation in the current database to a newer
> >    value, after lastSaneFrozenXid already has been determined. If that
> >    happens, we skip updating pg_database.datfrozenxid.
> > 
> > b) A concurrent vacuum in another database, ending up in 
> > vac_truncate_clog(),
> >    can compute a newer datfrozenxid. In that case the vac_truncate_clog() 
> > with
> >    the outdated lastSaneFrozenXid will not truncate the clog (and also 
> > forget
> >    to release WrapLimitsVacuumLock currently, as reported upthread) and not
> >    call SetTransactionIdLimit(). The latter is particularly bad, because 
> > that
> >    means we might not come out of "database is not accepting commands" land.
> 
> > I guess we could just recompute the boundaries before actually believing the
> > catalog values are bogus?
> 
> That's how I'd do it.

I was looking at doing that and got confused by the current code. Am I missing
something, or does vac_truncate_clog() have two pretty much identical attempts
at a safety measures?

void
vac_update_datfrozenxid(void)
...
        lastSaneFrozenXid = ReadNextTransactionId();
...
                vac_truncate_clog(newFrozenXid, newMinMulti,
                                                  lastSaneFrozenXid, 
lastSaneMinMulti);
}
...
static void
vac_truncate_clog(TransactionId frozenXID,
                                  MultiXactId minMulti,
                                  TransactionId lastSaneFrozenXid,
                                  MultiXactId lastSaneMinMulti)
{
        TransactionId nextXID = ReadNextTransactionId();
...
                /*
                 * If things are working properly, no database should have a
                 * datfrozenxid or datminmxid that is "in the future".  
However, such
                 * cases have been known to arise due to bugs in pg_upgrade.  
If we
                 * see any entries that are "in the future", chicken out and 
don't do
                 * anything.  This ensures we won't truncate clog before those
                 * databases have been scanned and cleaned up.  (We will issue 
the
                 * "already wrapped" warning if appropriate, though.)
                 */
                if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) ||
                        MultiXactIdPrecedes(lastSaneMinMulti, datminmxid))
                        bogus = true;

                if (TransactionIdPrecedes(nextXID, datfrozenxid))
                        frozenAlreadyWrapped = true;

lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(),
that's the only difference afaict.


I guess this might be caused by 78db307bb23 adding the check, but using
GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed
soon after, in 87f830e0ce03.


Am I missing something?


Greetings,

Andres Freund


Reply via email to