Hi, Adding Amit, I think this is stuff he worked on...
On 2022-04-17 07:31:04 -0700, Andres Freund wrote: > On 2022-04-16 13:11:59 -0400, Tom Lane wrote: > > My pet dinosaur prairiedog just failed in the contrib/test_decoding > > tests [1]: > > > > diff -U3 > > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out > > > > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out > > --- > > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/stream.out > > 2022-04-15 07:59:17.000000000 -0400 > > +++ > > /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/results/stream.out > > 2022-04-15 09:06:36.000000000 -0400 > > @@ -77,10 +77,12 @@ > > streaming change for transaction > > streaming change for transaction > > streaming change for transaction > > + closing a streamed block for transaction > > + opening a streamed block for transaction > > streaming change for transaction > > closing a streamed block for transaction > > committing streamed transaction > > -(13 rows) > > +(15 rows) > > > > Looking at the postmaster log, it's obvious where this extra transaction > > came from: auto-analyze ran on pg_type concurrently with the test step > > just before this one. That could only happen if the tests ran long enough > > for autovacuum_naptime to elapse, but prairiedog is a pretty slow machine. > > (And I hasten to point out that some other animals, such as those running > > valgrind or CLOBBER_CACHE_ALWAYS, are even slower.) > > > > We've seen this sort of problem before [2], and attempted to fix it [3] > > by making these tests ignore empty transactions. But of course > > auto-analyze's transaction wasn't empty, so that didn't help. > > I don't quite understand this bit - the logic test_decoding uses to > decide if a transaction is "empty" is just whether a tuple was > output. And there shouldn't be any as part of auto-analyze, because we > don't decode catalog changes. I suspect there's something broken in the > streaming logic (potentially just in test_decoding) around > skip_empty_xacts. > > > > I think the most expedient way to prevent this type of failure is to run > > the test_decoding tests with autovacuum_naptime cranked up so far as to > > make it a non-issue, like maybe a day. Since test_decoding already adds > > some custom settings to postgresql.conf, this'll take just a one-line > > addition to test_decoding/logical.conf. > > I'm a bit worried about this approach - we've IIRC had past bugs that > came only to light because of autovacuum starting. I wonder if we rather > should do the opposite and reduce naptime so it'll be seen on fast > machines, rather than very slow ones. Greetings, Andres Freund