On Thu, May 23, 2024 at 02:00:00PM +0300, Alexander Lakhin wrote:
> I'd like to discuss ways to improve the buildfarm experience for anyone who
> are interested in using information which buildfarm gives to us.
>
> Unless I'm missing something, as of now there are no means to determine
> whether some concrete failure is known/investigated or fixed, how
> frequently it occurs and so on... From my experience, it's not that
> unbelievable that some failure occurred two years ago and lost in time was
> an indication of e. g. a race condition still existing in the code/tests
> and thus worth fixing. But without classifying/marking failures it's hard
> to find such or other interesting failure among many others...

I agree that consuming buildfarm results is difficult.  I have an inefficient
routine for studying a failure, which your proposals would help with:

- grep recent -hackers for animal name
- search the log for ~10 strings (e.g. "was terminated") to find the real
  indicator of where it failed
- search mailing lists for that indicator
- search buildfarm database for that indicator

> The first way to improve things I can imagine is to add two fields to the
> buildfarm database: a link to the failure discussion (set when the failure
> is investigated/reproduced and reported in -bugs or -hackers) and a commit
> id/link (set when the failure is fixed). I understand that it requires

I bet the hard part is getting data submissions, so I'd err on the side of
making this as easy as possible for submitters.  For example, accept free-form
text for quick notes, not only URLs and commit IDs.

> modifying the buildfarm code, and adding some UI to update these fields,
> but it allows to add filters to see only unknown/non-investigated failures
> in the buildfarm web interface later.
>
> The second way is to create a wiki page, similar to "PostgreSQL 17 Open
> Items", say, "Known buildfarm test failures" and fill it like below:
> <url to failure1>
> <url to failure2>
> ...
> Useful info from the failure logs for reference
> ...
> <link to -hackers thread>
> ---
> This way is less invasive, but it would work well only if most of
> interested people know of it/use it.
> (I could start with the second approach, if you don't mind, and we'll see
> how it works.)

Your doing (2) can certainly only help, though it may help less than (1).

I recommend considering what the buildfarm server could discover and publish
on its own.  Examples:

- N members failed at the same step, in a related commit range.  Those members
  are now mostly green.  Defect probably got fixed quickly.

- Log contains the following lines that are highly correlated with failure.
  The following other reports, if any, also contained them.
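
To make the second example concrete, here is a rough Python sketch of grouping
failure reports by the indicator strings found in their logs.  It is only an
illustration under assumptions: the report layout (dicts with "animal", "step"
and "log" keys), the animal names, and every indicator other than
"was terminated" are made up here, and none of this reflects how the buildfarm
server actually stores or queries its data.

```python
from collections import defaultdict

# Hypothetical indicator list; only "was terminated" comes from the text above.
INDICATORS = ["was terminated", "TRAP:", "timed out"]


def find_indicators(log_text, indicators=INDICATORS):
    """Return the stripped log lines that contain any indicator string."""
    hits = []
    for line in log_text.splitlines():
        if any(ind in line for ind in indicators):
            hits.append(line.strip())
    return hits


def group_reports_by_indicator(reports, indicators=INDICATORS):
    """Map each indicator to the (animal, step) pairs whose log contains it."""
    groups = defaultdict(list)
    for report in reports:
        for ind in indicators:
            if ind in report["log"]:
                groups[ind].append((report["animal"], report["step"]))
    return dict(groups)


# Made-up reports; the record shape is an assumption for this sketch.
reports = [
    {"animal": "animal_a", "step": "check",
     "log": "LOG: server process (PID 1) was terminated by signal 6"},
    {"animal": "animal_b", "step": "check",
     "log": "ok\nLOG: server process (PID 2) was terminated by signal 11"},
    {"animal": "animal_c", "step": "install-check",
     "log": "test foo ... timed out"},
]

groups = group_reports_by_indicator(reports)
for indicator, members in groups.items():
    print(indicator, "->", members)
```

Several members sharing both an indicator and a step would be a candidate for
"same failure", which a person could then confirm against the mailing lists.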