On 04/09/2019 09:36, Jan Beulich wrote:
> On 03.09.2019 22:00, osstest service owner wrote:
>> flight 140960 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/140960/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>>  test-amd64-amd64-xl-pvshim 18 guest-localmigrate/x10 fail REGR. vs. 139876
> This looks to be recurring, so I've taken another look.

I had a suspicion as well, but fixing the intermittent build problems
was the first priority.

A major change in shim in the range under test is switching from Credit1
to NULL as a scheduler, following Dario's fixing of what we thought was
the final outstanding bug. Perhaps it wasn't the final bug...

> The three
> migrations leave this abbreviated pattern in the log:
>
> Sep 3 14:20:42.446667 (XEN) HVM d1v0 save: CPU_MSR
> ...
> Sep 3 14:20:57.850670 (XEN) HVM2 restore: CPU 0
> ...
> Sep 3 14:21:37.062840 (XEN) HVM d2v0 save: CPU_MSR
> Sep 3 14:21:37.062888 (XEN) HVM3 restore: CPU 0
> ...
> Sep 3 14:21:56.438552 (XEN) HVM d3v0 save: CPU_MSR
> ...
> Sep 3 14:22:11.506508 (XEN) HVM4 restore: CPU 0
>
> Therefore I wonder whether the first one got lucky and finished
> barely ahead of timing out, while the 2nd worked instantly and the
> 3rd then ended up exceeding the timeout. What is curious are the
> intermediate log entries (between the last "save" and the first
> corresponding "restore" log entries): Many ones of the form
>
> (XEN) emul-priv-op.c:1113:d0v2 Domain attempted WRMSR c0011020 from
> 0x0000000000000000 to 0x0040000000000000

This is due to a lack of MSR_VIRT_SPEC_CTRL. It is sshd (or systemd on
its behalf - unclear which) using the SSBD prctl to protect itself, and
Xen, having no support, causes Linux to fall back to native methods and
fall foul of Xen's write/discard policy on MSRs.

> with a 15s gap between the first and many subsequent ones) and
> finally one of the form
>
> [ 451.267669] systemd-logind[2766]: New session 39 of user root.
>
> And finally, at around the time of the failed migration
>
> INIT: Id "T0" respawning too fast: disabled for 5 minutes

Googling around suggests it is an inittab misconfiguration.

>
> While it's not clear that this parallel activity is causing the
> migration to progress too slowly, it looks to be a possibility at
> least. Can anyone explain what these are?
>
>> build-amd64-xsm 6 xen-build fail REGR. vs. 139876
> I take it that this is supposed to be taken care of by a342900d48
> ("tools/shim: Apply more duct tape to the linkfarm logic").

Yes - it should do.

~Andrew
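
For reference, the save-to-restore gaps in the log excerpt quoted above can be checked with a short script. This is only a sketch: the timestamps are copied from the excerpt, while the labels and the helper names (`gap_seconds`, `migration_gaps`) are made up for illustration and are not part of osstest or Xen.

```python
from datetime import datetime

# Timestamps copied from the quoted log excerpt (all on Sep 3).
# Each row: (illustrative label, last "save" entry, first "restore" entry).
MIGRATIONS = [
    ("d1 -> d2", "14:20:42.446667", "14:20:57.850670"),
    ("d2 -> d3", "14:21:37.062840", "14:21:37.062888"),
    ("d3 -> d4", "14:21:56.438552", "14:22:11.506508"),
]

TIME_FORMAT = "%H:%M:%S.%f"


def gap_seconds(save, restore):
    """Return the time in seconds between a save and a restore log entry."""
    start = datetime.strptime(save, TIME_FORMAT)
    end = datetime.strptime(restore, TIME_FORMAT)
    return (end - start).total_seconds()


def migration_gaps(rows=MIGRATIONS):
    """Return (label, gap in seconds) for each migration in the excerpt."""
    return [(label, gap_seconds(save, restore)) for label, save, restore in rows]


if __name__ == "__main__":
    for label, gap in migration_gaps():
        print(f"{label}: {gap:.6f}s")
```

The first and third gaps come out at roughly 15s and the second is effectively instant, which matches the pattern described above.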