On Thu, May 21, 2020 at 03:23:11PM -0700, Nick Desaulniers wrote:
> On Thu, May 21, 2020 at 6:00 AM Michael Ellerman <m...@ellerman.id.au> wrote:
> >
> > Nathan Chancellor <natechancel...@gmail.com> writes:
> > > On Tue, May 19, 2020 at 05:56:32PM -0700, 'Nick Desaulniers' via Clang
> > > Built Linux wrote:
> > >> Looks like our CI is still red from this:
> > >>
> > >> https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/builds/166854584
> > >>
> > >> Filing a bug to follow up on:
> > >> https://github.com/ClangBuiltLinux/linux/issues/1031
> > >>
> > >> On Thu, May 7, 2020 at 8:29 PM Michael Ellerman <m...@ellerman.id.au>
> > >> wrote:
> > >> >
> > >> > Nick Desaulniers <ndesaulni...@google.com> writes:
> > >> > > Looks like ppc64le powernv_defconfig is suddenly failing the locking
> > >> > > torture tests, then locks up?
> > >> > > https://travis-ci.com/github/ClangBuiltLinux/continuous-integration/jobs/329211572#L3111-L3167
> > >> > > Any recent changes related here in -next? I believe this is the first
> > >> > > failure, so I'll report back if we see this again.
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >> > There's nothing newly in next-20200507 that seems related.
> > ...
> > >
> > > This is probably still a manifestation of
> > > https://github.com/ClangBuiltLinux/continuous-integration/issues/262
> > > because rekicking the tests usually fixes it.
>
> I thought we had upgraded our version of QEMU in response to this already?
> https://github.com/ClangBuiltLinux/dockerimage/pull/44
> https://github.com/ClangBuiltLinux/dockerimage/pull/46

That was more of a bandaid than an actual fix. The hang happens a lot
less often with QEMU 4.2.0, but I could still reproduce it on rare
occasions with the POWER9 machines. My machines are way more powerful
than the ones on Travis, which I am sure factors into that.

The real solution is to upgrade to QEMU 5.0.0, which we could probably
do via a PPA (or through our Docker image), or wait for QEMU 4.2.1,
which should hopefully have that fix since it was CC'd for QEMU stable.

> >
> > Oh yep.
> >
> > I was looking at the RCU warning, which I still don't understand, but
> > the lockup is presumably the same problem you hit with interrupts being
> > lost.
> >
> > > We should probably just disable the torture tests like we do for x86_64
> > > for CI because we do not have access to QEMU 5.0.0 where this should be
> > > fixed. I believe it is slated for 4.2.1 as well but we still have to
> > > wait for that to be updated and packaged in Ubuntu.
> >
> > You just need to start building Qemu HEAD as part of your CI ;)
>
> LOL
> https://github.com/ClangBuiltLinux/dockerimage/pull/46#pullrequestreview-395639442
> Yeah I think the hard part for all these dependencies is the risk
> of living on the edge of "top of tree" for all of them, and trying to
> control for some by using stable releases. May not always be
> possible.

Unfortunately, we are at the mercy of a bunch of different parties. If
only we had a ClangBuiltLinux build server that we maintained...

Cheers,
Nathan