On Fri, Apr 08, 2022 at 09:01:11AM +0200, Jan Beulich wrote:
> On 07.04.2022 10:45, osstest service owner wrote:
> > flight 169199 xen-4.12-testing real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/169199/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 12 debian-hvm-install
> >  fail REGR. vs. 168480
>
> While the subsequent flight passed, I thought I'd still look into
> the logs here since the earlier flight had failed too. The state of
> the machine when the debug keys were issued is somewhat odd (and
> similar to the earlier failure's): 11 of the 56 CPUs try to acquire
> (apparently) Dom0's event lock, from evtchn_move_pirqs(). All other
> CPUs are idle. The test failed because the sole guest didn't reboot
> in time. Whether the failure is actually connected to this apparent
> lock contention is unclear, though.
>
> One can further see that really all (about 70) ECS_PIRQ ports are
> bound to vCPU 0 (which makes me wonder about the lack of balancing
> inside Dom0 itself, but that's unrelated). This means that all
> other vCPU-s have nothing at all to do in evtchn_move_pirqs().
> Since this moving of pIRQ-s is an optimization (the value of which
> has been put under question in the past, iirc), I wonder whether we
> shouldn't add a check to the function for the list being empty
> prior to actually acquiring the lock. I guess I'll make a patch and
> post it as RFC.

Seems good to me. I think a better model would be to migrate the
PIRQs when they fire, or even better when the EOI is performed? That
way Xen wouldn't pointlessly migrate PIRQs for vCPUs that aren't
running.
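Something like this is what I'd expect the RFC to look like (an
untested sketch on my part; the function body is as in current
staging, the early check is my guess at the patch):

    /* xen/common/event_channel.c -- sketch, not the actual patch. */
    void evtchn_move_pirqs(struct vcpu *v)
    {
        struct domain *d = v->domain;
        const cpumask_t *mask = cpumask_of(v->processor);
        unsigned int port;
        struct evtchn *chn;

        /*
         * All ECS_PIRQ ports bound to this vCPU hang off
         * v->pirq_evtchn_head; if that list is empty there's nothing
         * to move, so don't contend on the per-domain event lock.
         */
        if ( !v->pirq_evtchn_head )
            return;

        spin_lock(&d->event_lock);
        for ( port = v->pirq_evtchn_head; port;
              port = chn->u.pirq.next_port )
        {
            chn = evtchn_from_port(d, port);
            pirq_set_affinity(d, chn->u.pirq.irq, mask);
        }
        spin_unlock(&d->event_lock);
    }

Reading the list head without holding the lock is racy against
bind/unbind, but since the whole function is an optimization anyway a
stale value seems harmless here.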
> And of course in a mostly idle system the other aspect here (again)
> is: Why are vCPU-s moved across pCPU-s in the first place? I've
> observed (and reported) such seemingly over-aggressive vCPU
> migration before, most recently in the context of putting together
> 'x86: make "dom0_nodes=" work with credit2'. Is there anything that
> can be done about this in credit2?
>
> A final, osstest-related question is: Does it make sense to run
> Dom0 on 56 vCPU-s, one each per pCPU? The bigger a system, the less
> useful it looks to me to actually also have a Dom0 as big, when the
> purpose of the system is to run guests, not meaningful other
> workloads in Dom0. While this is Xen's default (i.e. in the absence
> of command line options restricting Dom0), I don't think it's
> representative of typical use of Xen in the field.

I could add a suitable dom0_max_vcpus parameter to osstest; XenServer
uses 16, for example. Although not having such a parameter is likely
what led you to figure out this issue, so it might not be so bad. I
agree, however, that it's better to test scenarios closer to
real-world usage.
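FWIW, on the hypervisor side this is just a command line option,
along the lines of (16 mirroring what XenServer uses; how best to
plumb it through osstest is a separate question):

    dom0_max_vcpus=16

Thanks, Roger.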