Thomas Munro <thomas.mu...@gmail.com> writes:
> On Mon, May 20, 2019 at 4:46 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Note that in the discussion that led up to 624e440a, we never did
>> think that we'd completely explained the original irreproducible
>> failure.
>>
>> I think I've seen a couple of other cases of this same failure
>> in the buildfarm recently, but too tired to go looking right now.
> I think it might be dependent on incidental vacuum/analyze activity
> having updated reltuples.

I got around to excavating in the buildfarm archives, and found a round
dozen of more-or-less-similar incidents.  I went back 18 months, which
by coincidence (i.e., I didn't realize it till just now) is just about
the time since 624e440a:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2018-01-14%2006%3A30%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2018-03-02%2011%3A30%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-11%2023%3A25%3A46
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-03-15%2000%3A02%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2018-04-05%2003%3A22%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2018-04-07%2018%3A32%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2018-04-08%2019%3A55%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2018-04-23%2010%3A00%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-04-19%2001%3A50%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2019-04-23%2021%3A23%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-05-14%2014%3A59%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=aye-aye&dt=2019-05-19%2018%3A30%3A10

There are two really interesting things about this list:

* All the failures are on HEAD.  This implies that the issue was
not there when we forked off v11, else we'd surely have seen an
instance on that branch by now.  The dates above are consistent
with the idea that we eliminated the problem in roughly May 2018,
and then it came back about a month ago.  (Of course, maybe this
just traces to unrelated changes in test timing.)

* All the failures are in the pg_upgrade test (and some are before,
some after, we switched that from serial to parallel schedule).
This makes very little sense; how is that meaningfully different
from the buildfarm's straight-up invocations of "make check" and
"make installcheck"?

Note that I excluded a bunch of cases where we managed to run
select_parallel despite having suffered failures earlier in the
test run, typically failures that caused the sanity_check test
to not run.  These led to diffs in the X_star queries that look
roughly similar to these, but not the same.

			regards, tom lane