On Tue, Jul 23, 2024 at 3:49 AM Andrew Dunstan <and...@dunslane.net> wrote:
>
>
> On 2024-07-22 Mo 9:29 PM, Masahiko Sawada wrote:
>
> On Mon, Jul 22, 2024 at 12:53 PM Andrew Dunstan <and...@dunslane.net> wrote:
>
> On 2024-07-22 Mo 12:46 PM, Tom Lane wrote:
>
> Masahiko Sawada <sawada.m...@gmail.com> writes:
>
> Looking at dodo's failures, it seems that while it passes
> module-xid_wraparound-check, all failures happened only during
> testmodules-install-check-C. Can we check the server logs written
> during xid_wraparound test in testmodules-install-check-C?
>
> Oooh, that is indeed an interesting observation. There are enough
> examples now that it's hard to dismiss it as chance, but why would
> the two runs be different?
>
>
> It's not deterministic.
>
> I tested the theory that it was some other concurrent tests causing the
> issue, but that didn't wash. Here's what I did:
>
> for f in `seq 1 100`
> do echo iteration = $f
> meson test --suite xid_wraparound || break
> done
>
> It took until iteration 6 to get an error. I don't think my Ubuntu instance
> is especially slow. e.g. "meson compile" normally takes a handful of seconds.
> Maybe concurrent tests make it more likely, but they can't be the only cause.
>
> Could you provide server logs in both OK and NG tests? I want to see
> if there's a difference in the rate at which tables are vacuumed.
>
>
> See
> <https://bitbucket.org/adunstan/rotfang-fdw/downloads/xid-wraparound-result.tar.bz2>
>
>
> The failure logs are from a run where both tests 1 and 2 failed.
>

Thank you for sharing the logs.

I think the problem matches what Alexander Lakhin mentioned[1]. We
could probably fix such a race condition somehow, but I'm not sure
it's worth it, since setting autovacuum = off together with
autovacuum_max_workers = 1 (or a low number) is an extremely rare
case.

I think it would be better to stabilize these tests instead. One idea
is to turn the autovacuum GUC parameter on while setting
autovacuum_enabled = off for each table. That way, we can ensure that
autovacuum workers are launched. I think it also aligns better with
real use cases.

Regards,

[1] https://www.postgresql.org/message-id/02373ec3-50c6-df5a-0d65-5b9b1c0c86d6%40gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
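
To illustrate the idea of keeping the autovacuum GUC on while disabling
autovacuum per table, here is a minimal sketch. It is not the actual test
change; the table name wraparound_test_tbl is hypothetical, and a real test
would more likely set the GUC in the test cluster's configuration than via
ALTER SYSTEM.

```sql
-- Sketch only: "wraparound_test_tbl" is a hypothetical table name.

-- Keep the autovacuum launcher running. The autovacuum GUC can be
-- changed with a configuration reload; no server restart is required.
ALTER SYSTEM SET autovacuum = on;
SELECT pg_reload_conf();

-- Opt this table out of routine autovacuum via its storage parameter.
-- Autovacuum still processes the table when needed to prevent
-- transaction ID wraparound.
CREATE TABLE wraparound_test_tbl (id int);
ALTER TABLE wraparound_test_tbl SET (autovacuum_enabled = off);

-- Confirm that the storage parameter was recorded for the table.
SELECT relname, reloptions
FROM pg_class
WHERE relname = 'wraparound_test_tbl';
```

With this arrangement the launcher is always active, so the tests would
not depend on how workers get started when autovacuum = off.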