Hi, On 2023-06-21 21:50:39 -0700, Noah Misch wrote: > On Wed, Jun 21, 2023 at 05:46:37PM -0700, Andres Freund wrote: > > A related issue is that as far as I can tell the determination of what is > > bogus is bogus. > > > > The relevant cutoffs are determined vac_update_datfrozenxid() using: > > > > /* > > * Identify the latest relfrozenxid and relminmxid values that we could > > * validly see during the scan. These are conservative values, but it's > > * not really worth trying to be more exact. > > */ > > lastSaneFrozenXid = ReadNextTransactionId(); > > lastSaneMinMulti = ReadNextMultiXactId(); > > > > but doing checks based on thos is bogus, because: > > > > a) a concurrent create table / truncate / vacuum can update > > pg_class.relfrozenxid of some relation in the current database to a newer > > value, after lastSaneFrozenXid already has been determined. If that > > happens, we skip updating pg_database.datfrozenxid. > > > > b) A concurrent vacuum in another database, ending up in > > vac_truncate_clog(), > > can compute a newer datfrozenxid. In that case the vac_truncate_clog() > > with > > the outdated lastSaneFrozenXid will not truncate the clog (and also > > forget > > to release WrapLimitsVacuumLock currently, as reported upthread) and not > > call SetTransactionIdLimit(). The latter is particularly bad, because > > that > > means we might not come out of "database is not accepting commands" land. > > > I guess we could just recompute the boundaries before actually believing the > > catalog values are bogus? > > That's how I'd do it.
I was looking at doing that and got confused by the current code. Am I missing something, or does vac_truncate_clog() have two pretty much identical attempts at a safety measures? void vac_update_datfrozenxid(void) ... lastSaneFrozenXid = ReadNextTransactionId(); ... vac_truncate_clog(newFrozenXid, newMinMulti, lastSaneFrozenXid, lastSaneMinMulti); } ... static void vac_truncate_clog(TransactionId frozenXID, MultiXactId minMulti, TransactionId lastSaneFrozenXid, MultiXactId lastSaneMinMulti) { TransactionId nextXID = ReadNextTransactionId(); ... /* * If things are working properly, no database should have a * datfrozenxid or datminmxid that is "in the future". However, such * cases have been known to arise due to bugs in pg_upgrade. If we * see any entries that are "in the future", chicken out and don't do * anything. This ensures we won't truncate clog before those * databases have been scanned and cleaned up. (We will issue the * "already wrapped" warning if appropriate, though.) */ if (TransactionIdPrecedes(lastSaneFrozenXid, datfrozenxid) || MultiXactIdPrecedes(lastSaneMinMulti, datminmxid)) bogus = true; if (TransactionIdPrecedes(nextXID, datfrozenxid)) frozenAlreadyWrapped = true; lastSaneFrozenXid is a slightly older version of ReadNextTransactionId(), that's the only difference afaict. I guess this might be caused by 78db307bb23 adding the check, but using GetOldestXmin(NULL, true) to determine lastSaneFrozenXid. That was changed soon after, in 87f830e0ce03. Am I missing something? Greetings, Andres Freund