Andres Freund <and...@anarazel.de> writes:
> On 2022-05-08 11:28:34 -0400, Tom Lane wrote:
>> Per lapwing's latest results [1], this wasn't enough. I'm again thinking
>> we should pull the whole test from the back branches.
> That failure is different from the earlier failures though. I don't think it's
> a timing issue in the test like the deadlock check one. I rather suspect it's
> indicative of further problems in this area.

Yeah, that was my guess too.

> Potentially the known problem
> with RecoveryConflictInterrupt() running in the signal handler? I think Thomas
> has a patch for that...

Maybe; or given that it's on v10, it could be telling us about some
yet-other problem we perhaps solved since then without realizing it
needed to be back-patched.

> One failure in ~20 runs, on one animal doesn't seem worth disabling the test
> for.

No one is going to thank us for shipping a known-unstable test case.
It does nothing to fix the problem; all it will lead to is possible
failures during package builds. I have no idea whether any packagers
use "make check-world" rather than just "make check" while building.
But if they do, even fairly low-probability failures can be problematic.

(I still carry the scars I acquired while working at Red Hat and being
responsible for packaging mysql: at least back then, their test suite
was full of cases that mostly worked fine, except when getting stressed
in Red Hat's build farm. Dealing with a test suite that fails 50% of
the time under load, while trying to push out an urgent security fix,
is NOT a pleasant situation.)

I'm happy to have this test in the stable branches once we have
committed fixes that address all known problems. Until then, it will
just be a nuisance for anyone who is not a developer working on those
problems.

regards, tom lane