The ac_surface_meta_address_test timeout occurs rarely; it happens because the test is computationally demanding. It's also possible the machine got slower for some reason.
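If it keeps flaking, one option would be to raise the per-test timeout in the meson.build that registers the test. A rough sketch of what that could look like (the file location, source list and timeout value below are guesses for illustration, not what's actually in the tree):

  # Hypothetical entry, e.g. somewhere under src/amd/common/;
  # the relevant knob is the 'timeout' kwarg (meson's default is 30s).
  test(
    'ac_surface_meta_address_test',
    executable('ac_surface_meta_address_test',
               'ac_surface_meta_address_test.c'),
    suite : ['amd'],
    timeout : 180,  # give the heavy address computations more headroom
  )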
Marek

On Fri, Jan 7, 2022 at 12:32 PM Emma Anholt <e...@anholt.net> wrote:
> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott <cwabbo...@gmail.com> wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that it's still running the CI pipelines
> > individually after the batch pipeline passes and not merging them
> > right away, which completely defeats the point. See, for example,
> > !14213 which has gone through 8 cycles being batched with earlier MRs,
> > 5 of those passing only to have an earlier job in the batch spuriously
> > fail when actually merging and Marge seemingly giving up on merging it
> > (???). As I type it was "lucky" enough to be the first job in a batch
> > which passed and is currently running its pipeline and is blocked on
> > iris-whl-traces-performance (I have !14453 to disable that broken job,
> > but who knows with the Marge chaos when it's going to get merged...).
> >
> > Stepping back, I think it was a bad idea to push a "I think this might
> > help" type change like this without first carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > perception of general CI instability.
>
> I spent the day watching marge, like I do every day. Looking at the
> logs, we got 0 MRs in during my work hours PST, out of about 14 or so
> marge assignments that day. Leaving marge broken for the night would
> have been indistinguishable from the status quo, was my assessment.
>
> There was definitely some extra spam about trying batches, more than
> there were actual batches attempted. My guess would be gitlab
> connection reliability stuff, but I'm not sure.
>
> Of the 5 batches marge attempted before the change was reverted, three
> fell to https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837, one to
> the git fetch fails, and one to a new timeout I don't think I've seen
> before: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731.
> Of all the sub-MRs involved in those batches, I think two of those
> might have gotten through by dodging the LAVA lab fail. Marge's batch
> backoff did work, and !14436 and maybe !14433 landed during that time.