Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state

Nikunj A Dadhania Thu, 08 Jun 2017 22:03:30 -0700

David Gibson <da...@gibson.dropbear.id.au> writes:

> On Thu, Jun 08, 2017 at 12:06:08PM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>
> Ouch.  When exactly did this happen?


Broken since long

> I know that smp boot used to work under TCG, albeit very slowly.

SMP boot works, its the reboot issued from the guest doesn't boot and
crashes in SLOF.

>> When reset happens, all the CPUs are in halted state. First CPU is brought 
>> out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>
> Ok.. how is that happening given that the secondary CPUs should have
> MSR[EE] == 0?

Basically, the CPU is in halted condition and has_work() does not check
for MSR_EE in that case. But I am not sure if checking MSR_EE is
sufficient, as the CPU does go to halted state (idle) while running as
well.

static bool cpu_has_work_POWER8(CPUState *cs)
{
    PowerPCCPU *cpu = POWERPC_CPU(cs);
    CPUPPCState *env = &cpu->env;

    if (cs->halted) {
       [ SNIP ]
       /* Does not check for msr_ee */
    } else {
        return msr_ee && (cs->interrupt_request & CPU_INTERRUPT_HARD);
    }
}

>
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> [snip]
>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index d10808d..eb88bcb 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1013,6 +1013,13 @@ struct CPUPPCState {
>>      int access_type; /* when a memory exception occurs, the access
>>                          type is stored here */
>>  
>> +    /* CPU in reset, shouldn't process any interrupts.
>> +     *
>> +     * Decrementer interrupts in TCG can still wake the CPU up. Make sure 
>> that
>> +     * when this variable is set, cpu_has_work_* should return false.
>> +     */
>> +    int in_reset;
>
> So I'd really rather not add another flag to the cpu structure,
> especially since we'd then need to migrate it as well.

I agree, Bharata and I did discuss about the migrate case. This patch
was to highlight the exact issue.

> I'm pretty sure there should be a way to inhibit the unwanted
> interrupts using existing mechanisms.

One of the thing that I had observed was msr had just MSR_SF bit set
during the reset case, we can test for that maybe.

The below works as well:

+        if ((env->msr & ~(1ull << MSR_SF)) == 0) {
+            return false;
+        }

>> +
>>      CPU_COMMON
>>  
>>      /* MMU context - only relevant for full system emulation */
>> diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
>> index 56a0ab2..64f4348 100644
>> --- a/target/ppc/translate_init.c
>> +++ b/target/ppc/translate_init.c
>> @@ -8561,6 +8561,9 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>>      CPUPPCState *env = &cpu->env;
>>  
>>      if (cs->halted) {
>> +        if (env->in_reset) {
>> +            return false;
>> +        }
>>          if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>              return false;
>>          }
>> @@ -8718,6 +8721,9 @@ static bool cpu_has_work_POWER8(CPUState *cs)
>>      CPUPPCState *env = &cpu->env;
>>  
>>      if (cs->halted) {
>> +        if (env->in_reset) {
>> +            return false;
>> +        }
>>          if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>              return false;
>>          }
>> @@ -8899,6 +8905,9 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>>      CPUPPCState *env = &cpu->env;
>>  
>>      if (cs->halted) {
>> +        if (env->in_reset) {
>> +            return false;
>> +        }
>>          if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>>              return false;
>>          }

Regards
Nikunj

Re: [Qemu-devel] [PATCH RFC] spapr: ignore interrupts during reset state

Reply via email to