David Rowley <dgrowle...@gmail.com> writes:
> On Thu, 2 Apr 2020 at 16:13, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Quite :-(. While it's too early to declare victory, we've seen no
>> more failures of this ilk since 0936d1b6f, so it's sure looking like
>> autovacuum did have something to do with it.

> How about [1]? It seems related to me and also post 0936d1b6f.

That looks much like the first lousyjack failure, which as I said I wasn't trying to account for at that point.

After looking at those failures, though, I believe that the root cause may be the same, i.e. small changes in pg_class.reltuples due to autovacuum not seeing all pages of the tables. The test structure is a bit different, but it is accessing the tables in between EXPLAIN attempts, so it could be preventing a concurrent autovac from seeing all pages.

I see your fix at cefb82d49, but it feels a bit brute-force. Unlike stats_ext.sql, we're not (supposed to be) dependent on exact planner estimates in this test. So I think the real problem here is crappy test case design. Namely, these various sub-tables are exactly the same size, yet the test expects the planner to order them consistently, even though the planning algorithm prefers to put larger tables first in parallel appends (cf. create_append_path). It's not surprising that the result is unstable in the face of small variations in the rowcount estimates.

I'd be inclined to undo what you did in favor of initializing the test tables to contain significantly different numbers of rows, because that would (a) achieve plan stability more directly, and (b) demonstrate that the planner is actually ordering the tables by cost correctly. Maybe somewhere else we have a test that is verifying (b), but these test cases abysmally fail to check that point.

I'm not really on board with disabling autovacuum in the regression tests anywhere we aren't absolutely forced to do so. It's not representative of real world practice (or at least not real world best practice ;-)) and it could help hide actual bugs. We don't seem to have much choice with the stats_ext tests as they are constituted, but those tests look really fragile to me. Let's not adopt that technique where we have other possible ways to stabilize test results.
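
As a rough illustration of why equal-sized sub-tables give an unstable plan, here is a toy Python model (not planner code). It assumes, as a simplification, that subpath cost is proportional to the estimated row count and that exact ties keep their input order; the real create_append_path compares path costs, and its tie-breaking may differ. The table names and row counts are hypothetical.

```python
from typing import Dict, List


def append_order(row_estimates: Dict[str, float]) -> List[str]:
    """Return sub-table names largest-first.

    Toy model of the largest-first ordering of parallel append subpaths.
    Assumption: cost is proportional to the row estimate. Python's sort is
    stable (also with reverse=True), so exact ties keep their input order.
    """
    return sorted(row_estimates, key=lambda name: row_estimates[name], reverse=True)


# Hypothetical sub-tables of identical size: the order is decided purely by
# tie-breaking, so a tiny drift in one reltuples estimate reorders the plan.
same_size = {"t1": 1000.0, "t2": 1000.0, "t3": 1000.0}
same_size_drifted = {"t1": 998.0, "t2": 1000.0, "t3": 1000.0}

# Hypothetical sub-tables of clearly different sizes: the same kind of small
# drift in the estimates cannot change the order.
distinct = {"t1": 1000.0, "t2": 2000.0, "t3": 3000.0}
distinct_drifted = {"t1": 998.0, "t2": 2003.0, "t3": 2990.0}

if __name__ == "__main__":
    print(append_order(same_size))          # ['t1', 't2', 't3']
    print(append_order(same_size_drifted))  # ['t2', 't3', 't1']
    print(append_order(distinct))           # ['t3', 't2', 't1']
    print(append_order(distinct_drifted))   # ['t3', 't2', 't1']
```

With well-separated sizes, the expected order is both stable and meaningful to check, instead of depending on how near-ties happen to break.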
regards, tom lane