On Wed, Nov 30, 2016 at 07:19:12AM +0000, Peter Maydell wrote:
> On 29 November 2016 at 19:38, Andrew Jones <drjo...@redhat.com> wrote:
> > Thanks for making me look, I was simply assuming we were in the while
> > loops above.
> >
> > I couldn't get the problem to reproduce with access to the monitor,
> > but by adding '-d exec' I was able to see cpu0 was on the wfe in
> > smp_boot_secondary. It should only stay there until cpu1 executes the
> > sev in secondary_cinit, but it looks like TCG doesn't yet implement sev
> >
> > $ grep SEV target-arm/translate.c
> > /* TODO: Implement SEV, SEVL and WFE. May help SMP performance.
>
> Yes, we currently NOP SEV. We only implement WFE as "yield back
> to TCG top level loop", though, so this is fine. The idea is
> that WFE gets used in busy loops so it's a helpful hint to
> try running some other TCG vCPU instead of just spinning in
> the guest on this one. Implementing SEV as a NOP and WFE as
> a more-or-less NOP is architecturally permitted (guest code
> is required to cope with WFE returning "early"). If something
> is not working correctly then it's either buggy guest code
> or a problem with the generic TCG scheduling of CPUs.

The problem is indeed with the scheduling. The way it currently works
is to depend on the iothread to kick a reschedule once in a while, or
on a cpu issuing an instruction that does so (wfe/wfi). However, if
there's no I/O and a cpu never issues a scheduling instruction, then a
reschedule never happens. We either need a sched tick, or the iothread
must never use an infinite ppoll timeout (basically using the ppoll
timeout as a tick).

As for being buggy guest code, I don't think so. Here's another unit
test that illustrates the issue with wfe/sev taken out of the picture.

    #include <libcflat.h>
    #include <asm/smp.h>

    void secondary(void)
    {
        printf("secondary running\n");
        asm("yield");
        /* A "real" guest cpu shouldn't do this, but even if it
         * does, that shouldn't stop other cpus from running. */
        while (1);
    }

    int main(void)
    {
        smp_boot_secondary(1, secondary);
        printf("primary running\n");
        asm("yield");
        return 0;
    }

With that test we get the two print statements, but it never exits.

Now that I understand the problem much better, I think I may be coming
full circle and advocating that the iothread's ppoll never be allowed
to have an infinite timeout again, but now only for tcg. Something like

    if (timeout < 0 && tcg_enabled())
        timeout = TCG_SCHED_TICK;

Thanks,
drew
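
For illustration only, the clamp suggested above could be written as a standalone helper along these lines. This is a rough sketch, not QEMU code: the 10 ms value for TCG_SCHED_TICK, the nanosecond units, the helper name clamp_poll_timeout() and the tcg_enabled_stub() stand-in for tcg_enabled() are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick length (10 ms, in nanoseconds); the value is a guess. */
#define TCG_SCHED_TICK (10 * 1000 * 1000LL)

/* Stand-in for QEMU's tcg_enabled(); here it is just a flag. */
static bool tcg_in_use = true;

static bool tcg_enabled_stub(void)
{
    return tcg_in_use;
}

/*
 * A negative timeout means "block forever" for a ppoll-style wait.
 * With TCG in use, replace it with a finite tick so the poll loop
 * wakes up periodically even when there is no I/O. Any timeout that
 * is already finite is passed through unchanged.
 */
static int64_t clamp_poll_timeout(int64_t timeout_ns)
{
    /* Sanity check: a non-positive tick would defeat the purpose. */
    assert(TCG_SCHED_TICK > 0);

    if (timeout_ns < 0 && tcg_enabled_stub()) {
        return TCG_SCHED_TICK;
    }
    return timeout_ns;
}
```

A caller would pass whatever timeout it computed through clamp_poll_timeout() just before blocking. Whether the wakeup alone is enough to get the vcpus rescheduled depends on what the main loop does when the poll times out, which this sketch does not model.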