Hi,

On Tue, Feb 1, 2022 at 11:58 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Fri, Jun 11, 2021 at 10:19 AM Andres Freund <and...@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
> > > Cool. Thank you for working on that!
> > > Could you please share a WIP patch for the $subj? I'd be happy to help
> > > with it.
> >
> > I've attached the current WIP state, which hasn't evolved much since
> > this message... I put the test in
> > src/backend/access/heap/t/001_emergency_vacuum.pl
> > but I'm not sure that's the best place. But I didn't think
> > src/test/recovery is great either.
> >
>
> Thank you for sharing the WIP patch.
>
> Regarding point (1) you mentioned (StartupSUBTRANS() takes a long time
> for zeroing out all pages), how about using single-user mode instead
> of preparing the transaction? That is, after pg_resetwal we check the
> ages of datfrozenxid by executing a query in single-user mode. That
> way, we don’t need to worry about autovacuum concurrently running
> while checking the ages of frozenxids. I’ve attached a PoC patch that
> does the scenario like:
>
> 1. start cluster with autovacuum=off and create tables with a few data
> and make garbage on them
> 2. stop cluster and do pg_resetwal
> 3. start cluster in single-user mode
> 4. check age(datfrozenxid)
> 5. stop cluster
> 6. start cluster and wait for autovacuums to increase template0,
> template1, and postgres datfrozenxids

The above steps are wrong. I think we can expose a function in an
extension used only by this test in order to set nextXid to a future
value with zeroing out clog/subtrans pages. We don't need to fill all
clog/subtrans pages between oldestActiveXID and nextXid.

I've attached a PoC patch for adding this regression test and am going
to register it to the next CF.

BTW, while testing the emergency situation, I found there is a race
condition where anti-wraparound vacuum isn't invoked with the settings
autovacuum = off, autovacuum_max_workers = 1. An autovacuum worker
sends a signal to the postmaster after advancing datfrozenxid in
SetTransactionIdLimit(). But with these settings, if the autovacuum
launcher attempts to launch a worker before the autovacuum worker that
signaled the postmaster finishes, the launcher exits without
launching a worker because there are no free workers. A new launcher won’t
be launched until a new XID is generated (and only when new XID % 65536
== 0). Although autovacuum_max_workers = 1 is not mandatory for this
test, it's easier to verify the order of operations.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
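
P.S. To make the "new XID % 65536 == 0" condition above concrete, here is a rough Python sketch of how long the system can wait before another launcher start is requested. It is only a model of the condition as described in this mail, not PostgreSQL source: the function names, the `xid_vac_limit` parameter, and the plain integer comparison (which ignores XID wraparound) are all assumptions made for illustration.

```python
# Illustrative model only, NOT PostgreSQL code. It assumes a launcher start
# is requested only when a newly assigned XID is past some vacuum threshold
# (hypothetical parameter xid_vac_limit) and is a multiple of 65536.
# XID wraparound arithmetic is deliberately ignored here.

SIGNAL_INTERVAL = 65536


def requests_launcher(xid: int, xid_vac_limit: int) -> bool:
    """Return True if assigning `xid` would request a new launcher."""
    return xid >= xid_vac_limit and xid % SIGNAL_INTERVAL == 0


def xids_until_next_request(xid: int) -> int:
    """Number of further XIDs to assign after `xid` before the next
    multiple of 65536 is reached."""
    return SIGNAL_INTERVAL - (xid % SIGNAL_INTERVAL)
```

So if the launcher exits without starting a worker right after such an XID, up to 65536 more XIDs may have to be consumed before the next attempt, which is why the race matters for a test that wants a deterministic order of operations.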
v1-0001-Add-regression-tests-for-emergency-vacuums.patch