Re: Improving tracking/processing of buildfarm test failures

Noah Misch Fri, 24 May 2024 13:00:57 -0700

On Thu, May 23, 2024 at 02:00:00PM +0300, Alexander Lakhin wrote:
> I'd like to discuss ways to improve the buildfarm experience for anyone who
> are interested in using information which buildfarm gives to us.
> 
> Unless I'm missing something, as of now there are no means to determine
> whether some concrete failure is known/investigated or fixed, how
> frequently it occurs and so on... From my experience, it's not that
> unbelievable that some failure occurred two years ago and lost in time was
> an indication of e. g. a race condition still existing in the code/tests
> and thus worth fixing. But without classifying/marking failures it's hard
> to find such or other interesting failure among many others...


I agree that consuming buildfarm results is difficult.  I have an inefficient
template for studying a failure, which your proposals would help:

- grep recent -hackers for animal name
- search the log for ~10 strings (e.g. "was terminated") to find the real
  indicator of where it failed
- search mailing lists for that indicator
- search buildfarm database for that indicator
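
The log-search step of that template could be scripted roughly as below.  This
is only a sketch: "was terminated" is the one string taken from the list above,
and the other indicator strings, the function name and the sample log are
assumptions made for illustration.

```python
# Hypothetical indicator strings. Only "was terminated" comes from the
# message above; the rest are placeholders to be replaced with real ones.
INDICATORS = [
    "was terminated",
    "TRAP:",
    "PANIC:",
    "timed out",
]


def find_indicators(log_text, indicators=INDICATORS):
    """Return (line_number, indicator, line) for each line that contains an indicator."""
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for indicator in indicators:
            if indicator in line:
                hits.append((lineno, indicator, line.strip()))
                break  # report each log line once, under the first indicator found
    return hits


# Made-up log excerpt, for demonstration only.
sample_log = """\
ok 1 - setup
server process (PID 1234) was terminated by signal 11
not ok 2 - query
"""

hits = find_indicators(sample_log)
for lineno, indicator, line in hits:
    print(f"{lineno}: [{indicator}] {line}")
```

The matched line can then be used as the search string for the mailing lists
and the buildfarm database.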

> The first way to improve things I can imagine is to add two fields to the
> buildfarm database: a link to the failure discussion (set when the failure
> is investigated/reproduced and reported in -bugs or -hackers) and a commit
> id/link (set when the failure is fixed). I understand that it requires

I bet the hard part is getting data submissions, so I'd err on the side of
making this as easy as possible for submitters.  For example, accept free-form
text for quick notes, not only URLs and commit IDs.
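
To make that concrete, an annotation record might look something like the
sketch below.  All names here are hypothetical and do not reflect the actual
buildfarm schema; the point is only that every field other than the failure
reference is optional, and that a bare note is enough to mark a failure as
known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureAnnotation:
    """Hypothetical annotation attached to one buildfarm failure."""
    failure_url: str                      # which failure this annotates
    discussion_url: Optional[str] = None  # link to a -bugs or -hackers thread, if any
    fix_commit: Optional[str] = None      # commit ID of the fix, if any
    note: str = ""                        # free-form text for quick notes

    def is_known(self) -> bool:
        # Assumption: any non-empty field counts as "someone has looked at this".
        return bool(self.discussion_url or self.fix_commit or self.note.strip())


# Example values are placeholders, not real failures.
quick = FailureAnnotation("https://example.invalid/failure/1",
                          note="looks like a race in the test")
blank = FailureAnnotation("https://example.invalid/failure/2")
print(quick.is_known(), blank.is_known())
```

A web interface could then filter on `is_known()` to show only failures nobody
has annotated yet.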

> modifying the buildfarm code, and adding some UI to update these fields,
> but it allows to add filters to see only unknown/non-investigated failures
> in the buildfarm web interface later.
> 
> The second way is to create a wiki page, similar to "PostgreSQL 17 Open
> Items", say, "Known buildfarm test failures" and fill it like below:
> <url to failure1>
> <url to failure2>
> ...
> Useful info from the failure logs for reference
> ...
> <link to -hackers thread>
> ---
> This way is less invasive, but it would work well only if most of
> interested people know of it/use it.
> (I could start with the second approach, if you don't mind, and we'll see
> how it works.)

Your doing (2) can certainly only help, though it may help less than (1).


I recommend considering what the buildfarm server could discover and publish
on its own.  Examples:

- N members failed at the same step, in a related commit range.  Those members
  are now mostly green.  Defect probably got fixed quickly.

- Log contains the following lines that are highly correlated with failure.
  The following other reports, if any, also contained them.
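
As a rough sketch of the second example, the server could compare log lines
across reports and keep those seen in several failing runs but in no passing
run.  The function name, the input shape and the threshold are assumptions for
illustration; a real implementation would also need to normalize PIDs,
timestamps and similar noise before comparing lines.

```python
from collections import Counter


def correlated_lines(reports, min_failures=2):
    """Find log lines that appear only in failing reports.

    reports: list of (failed, lines) pairs, where failed is a bool and
    lines is a list of log lines from that report.
    Returns the sorted lines that occur in at least min_failures failing
    reports and in no passing report.
    """
    fail_counts = Counter()
    pass_lines = set()
    for failed, lines in reports:
        unique = set(lines)  # count each line once per report
        if failed:
            fail_counts.update(unique)
        else:
            pass_lines |= unique
    return sorted(
        line for line, n in fail_counts.items()
        if n >= min_failures and line not in pass_lines
    )


# Made-up reports, for demonstration only.
reports = [
    (True, ["starting", "could not open file"]),
    (True, ["starting", "could not open file", "seen only once"]),
    (False, ["starting"]),
]
result = correlated_lines(reports)
print(result)
```

Lines returned this way are candidates for the "highly correlated with failure"
list, and the failing reports that contain them are the "other reports" to link
to.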
