On Fri, Feb 21, 2025 at 5:09 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Fri, Feb 21, 2025 at 7:04 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > A very significant fraction of the buildfarm is now pink.
> > If you don't have a fix pretty nearly ready, please revert.
>
> When we're going to do a release, you want no commits for at least 24
> hours before the release so that we can make sure the buildfarm is
> clean. But when I commit something and the buildfarm fails, you want
> it reverted within a handful of hours before I can even be sure of
> having seen all the failures the commit caused, let alone had time to
> think about what the best fix might be. It doesn't make sense to me
> that we need 24 hours to be sure that the buildfarm is passing, but in
> 3 hours I'm supposed to see all the failures -- including the ones
> that only happen on buildfarm animals that run once a day, I guess? --
> and analyze them -- and decide on a fix -- naturally without any sort
> of discussion because there's no time for that -- and code the fix --
> and push it. I have a really hard time seeing how that's a reasonable
> expectation.
>
> I understand that the buildfarm can't be red all the time or nobody
> can distinguish the problems they caused from preexisting ones.
I think you're right, and the solution has to be something like:

1. If it's > 24 hours until a release, then test failures on the
buildfarm must be allowed.

2. The buildfarm needs to track failures at a fine enough granularity
that I can tell whether my change caused a given failure.

I've worked at a few database companies -- some had workable test
frameworks, others didn't. The biggest mistake some companies made was
to collapse an entire test suite into a single, binary result: "ALL
PASSED" vs. "AT LEAST ONE FAILED." (It's fine to collapse the test
suite *before you merge your commit,* because then you are testing
*only* your commit. After you merge it, however, CI / buildfarm ends
up testing *multiple* new commits at the same time, so the tests need
to report at finer granularity.)

If we can't run CI / buildfarm at the granularity of a single commit
(and, at some point, no organization can -- the test universe becomes
too large!), then we have to design for the possibility that one or
more of several new commits will cause a test failure, while the
remaining commits are innocent. So the test framework has to identify
the (likely) culprit for a particular new test failure, and then give
the author *enough time* to fix that failure. And while one author is
fixing their failure, other authors need to be able to keep merging
their commits, etc.

(Note that actual *compilation* failures are more severe than test
failures, because if PostgreSQL can't compile, then no one can get any
work done after rebasing to TIP. But I think you are discussing tests,
not compilation.)

One company I worked for tracked test failures by individual unit
test. So, the "parallel" schedule would have 220+ tests, each tracked
as SUCCESS, [EXISTING] FAILURE, or NEW FAILURE. On every NEW FAILURE,
the system notified the unit test owner (this is feasible inside a
corporation, but perhaps not for an open source community) and
concurrently started running the equivalent of "git bisect" to
identify the offending commit.
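To make the idea concrete, here is a minimal sketch (in Python; the
test names, the result format, and the classify() helper are all
hypothetical, not anything the buildfarm actually implements) of the
per-test tracking described above -- compare each test's result
against a baseline of already-known failures, so only a NEW FAILURE
triggers notification and bisection:

```python
#!/usr/bin/env python3
# Hypothetical sketch: classify each test's result against a baseline
# of known failures, instead of collapsing the whole run into a single
# pass/fail bit.

import json

def classify(results: dict, baseline: set) -> dict:
    """results maps test name -> True (passed) / False (failed);
    baseline is the set of tests already known to fail on this branch.
    Returns a per-test status string."""
    out = {}
    for test, passed in results.items():
        if passed:
            out[test] = "SUCCESS"
        elif test in baseline:
            # Already failing before the new commits landed: not this
            # batch's fault, no notification needed.
            out[test] = "EXISTING FAILURE"
        else:
            # Newly broken: notify the test owner and start the
            # equivalent of "git bisect" over the new commits.
            out[test] = "NEW FAILURE"
    return out

if __name__ == "__main__":
    # Hypothetical results from one run of the "parallel" schedule.
    results = {"boolean": True, "join": False, "partition_prune": False}
    baseline = {"join"}  # 'join' was already red before this batch
    print(json.dumps(classify(results, baseline), indent=2))
```

With this shape, a run with one pre-existing red test and one freshly
broken test reports "EXISTING FAILURE" and "NEW FAILURE" separately,
so an innocent committer can tell at a glance that the old red test is
not theirs to fix.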
Once it identified (what it thought was) the offending commit, it
notified that committer as well.

> In the case of my commit today, the failures are the result of a
> 2-line regression diff with no functional impact that neither CI nor
> any of the 11 reviewers noticed.

This is exactly the sort of case where tracking NEW FAILURE at the
unit test level is useful. So you made a change that caused a
regression diff, and the new output is better than the old (or at
least neutral). Flag this individual unit test, send it to Robert, and
development continues.

Given enough unit tests, along with reasonable pre-commit CI testing,
the probability of two commits causing regressions / diffs in the same
unit test over a 1-week period (let's say) is very small.

So: +1 to your proposal.

James