On Wednesday 05 November 2014 09:16 PM, Tom Musta wrote:
> On 11/5/2014 2:32 AM, Alexander Graf wrote:
>> On 05.11.14 08:13, Aravinda Prasad wrote:
>>> This patch adds FWNMI support in qemu for powerKVM
>>> guests by handling the ibm,nmi-register rtas call.
>>> Whenever OS issues ibm,nmi-register RTAS call, the
>>> machine check notification address is saved and the
>>> machine check interrupt vector 0x200 is patched to
>>> issue a private hcall.
>>> This patch also handles the cases when multi-processors
>>> experience machine check at or about the same time.
>>> As per PAPR, subsequent processors serialize waiting
>>> for the first processor to issue the ibm,nmi-interlock call.
>>> The second processor retries if the first processor which
>>> received a machine check is still reading the error log
>>> and is yet to issue ibm,nmi-interlock call.
>>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com>
>>> ---
>>>  hw/ppc/spapr_hcall.c            |   16 +++++++
>>>  hw/ppc/spapr_rtas.c             |   93 +++++++++++++++++++++++++++++++++++++++
>>>  include/hw/ppc/spapr.h          |   17 +++++++
>>>  pc-bios/spapr-rtas/spapr-rtas.S |   38 ++++++++++++++++
>>>  4 files changed, 163 insertions(+), 1 deletion(-)
>>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
>>> index 8f16160..eceb5e5 100644
>>> --- a/hw/ppc/spapr_hcall.c
>>> +++ b/hw/ppc/spapr_hcall.c
>>> @@ -97,6 +97,9 @@ struct rtas_mc_log {
>>>      struct rtas_error_log err_log;
>>>  };
>>> +/* Whether machine check handling is in progress by any CPU */
>>> +bool mc_in_progress;
>>> +
>>>  static void do_spr_sync(void *arg)
>>>  {
>>>      struct SPRSyncState *s = arg;
>>> @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>      cpu_synchronize_state(CPU(ppc_env_get_cpu(env)));
>>>      /*
>>> +     * Only one VCPU can process machine check NMI at a time. Hence
>>> +     * set the lock mc_in_progress. Once the VCPU finishes processing
>>> +     * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset
>>> +     * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter
>>> +     * NMI we return 0 asking the VCPU to retry h_report_mc_err
>>> +     */
>>> +    if (mc_in_progress == 1) {
>> Please don't depend on bools being numbers. Use true / false. For if()s,
>> just don't use == at all - it makes it more readable.
>>> +        return 0;
>>> +    }
>>> +
>>> +    mc_in_progress = 1;
>>> +
>>> +    /*
>>>       * We save the original r3 register in SPRG2 in 0x200 vector,
>>>       * which is patched during call to ibm,nmi-register. Original
>>>       * r3 is required to be included in error log
>>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
>>> index 2ec2a8e..71c7662 100644
>>> --- a/hw/ppc/spapr_rtas.c
>>> +++ b/hw/ppc/spapr_rtas.c
>>> @@ -36,6 +36,9 @@
>>>  #include <libfdt.h>
>>> +#define BRANCH_INST_MASK  0xFC000000
>>> +extern bool mc_in_progress;
>> Please put this into the spapr struct.
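
A minimal sketch of how the flag could look once both comments above are applied, assuming it becomes a plain bool member of the machine state. McState, mc_try_begin and mc_end are hypothetical names used only for illustration; they are not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of sPAPREnvironment. */
typedef struct {
    bool mc_in_progress;    /* true while one VCPU handles a machine check */
} McState;

/*
 * Would be called from the h_report_mc_err path. Returns false when
 * another VCPU already owns the machine check; the hcall would then
 * return 0 so that the trampoline retries.
 */
static bool mc_try_begin(McState *s)
{
    assert(s != NULL);
    if (s->mc_in_progress) {
        return false;
    }
    s->mc_in_progress = true;
    return true;
}

/* Would be called from the ibm,nmi-interlock handler. */
static void mc_end(McState *s)
{
    assert(s != NULL);
    s->mc_in_progress = false;
}
```

The test and the assignments use true / false only, and there is no file-scope global left to declare extern.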
>>> +
>>>  static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>                                     uint32_t token, uint32_t nargs,
>>>                                     target_ulong args,
>>> @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu,
>>>      rtas_st(rets, 0, ret);
>>>  }
>>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu,
>>> +                                  sPAPREnvironment *spapr,
>>> +                                  uint32_t token, uint32_t nargs,
>>> +                                  target_ulong args,
>>> +                                  uint32_t nret, target_ulong rets)
>>> +{
>>> +    int i;
>>> +    uint32_t ori_inst = 0x60630000;
>>> +    uint32_t branch_inst = 0x48000002;
>>> +    target_ulong guest_machine_check_addr;
>>> +    uint32_t trampoline[TRAMPOLINE_INSTS];
>>> +    int total_inst = sizeof(trampoline) / sizeof(uint32_t);
>> ARRAY_SIZE(trampoline), though I don't quite understand why you need a
>> variable that contains the same value as a constant (TRAMPOLINE_INSTS).
>> But since you're moving all of those bits into variable fields on the
>> rtas blob itself as we discussed in the last version, I guess this code
>> will go away anyways ;).
>>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>>> +
>>> +    /* Store the system reset and machine check address */
>>> +    guest_machine_check_addr = rtas_ld(args, 1);
>> Load or Store? I don't find the comment particularly useful either ;).
>>> +
>>> +    /*
>>> +     * Read the trampoline instructions from RTAS Blob and patch
>>> +     * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest
>>> +     * machine check address before copying to 0x200 vector
>>> +     */
>>> +    cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET,
>>> +                             trampoline, sizeof(trampoline));
>>> +
>>> +    /* Safety Check */
>> Same for this comment.
>>> +    QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE);
>>> +
>>> +    /* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */
>>> +    ori_inst |= KVMPPC_H_REPORT_MC_ERR;
>>> +    memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst,
>>> +            sizeof(ori_inst));
>> Why memcpy a u32 into a u32 array?
> Additionally, I don't see the need for the ori_inst *variable* .... the 
> instruction is known at compile time.
> So why not just do
>   trampoline[TRAMPOLINE_ORI_INST_INDEX] = 0x60630000 | KVMPPC_H_REPORT_MC_ERR;

I can directly do trampoline[TRAMPOLINE_ORI_INST_INDEX] |= KVMPPC_H_REPORT_MC_ERR,
as trampoline[TRAMPOLINE_ORI_INST_INDEX] already contains 0x60630000.
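
A rough sketch of the in-place patching, assuming the trampoline read from the RTAS blob already holds the placeholder instructions "ori 3,3,0" (0x60630000) and "ba 0x0" (0x48000002). The EX_ names, the ori slot index and the hcall number 0xf002 are assumptions made for this example; the real values would come from spapr.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only, not the definitions from spapr.h. */
#define EX_ORI_INST_INDEX    2          /* slot holding "ori 3,3,0" */
#define EX_BR_INST_INDEX     15         /* slot holding "ba 0x0" */
#define EX_H_REPORT_MC_ERR   0xf002u    /* hypothetical hcall number */

/*
 * OR the hcall number and the handler address into the placeholder
 * instructions. No temporaries and no memcpy are needed because the
 * array elements are already uint32_t.
 */
static void patch_trampoline(uint32_t *trampoline, uint32_t mc_addr)
{
    assert(trampoline != NULL);
    /* The ori immediate occupies the low 16 bits of the instruction. */
    trampoline[EX_ORI_INST_INDEX] |= EX_H_REPORT_MC_ERR;
    /* mc_addr is expected to have been checked against BRANCH_INST_MASK. */
    trampoline[EX_BR_INST_INDEX] |= mc_addr;
}
```

This only works as long as the placeholders in spapr-rtas.S keep a zero immediate and a zero branch target.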

> Likewise for the branch_inst variable.
> Also see my comment in the trampoline code below.
>>> +
>>> +    /*
>>> +     * Sanity check guest_machine_check_addr to prevent clobbering
>>> +     * operator value in branch instruction
>>> +     */
>>> +    if (guest_machine_check_addr & BRANCH_INST_MASK) {
>>> +        fprintf(stderr, "Unable to register ibm,nmi-register: "
>>> +                "Invalid machine check handler address\n");
>> In general, printf's in guest triggerable code aren't a great idea,
>> since the guest could flood our host logs with this. I can't say we're
>> doing a great job at it already though, so it probably doesn't matter much.
>>> +        rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED);
> NIT:  Shouldn't this be RTAS_OUT_PARAM_ERR?  That is what SPAPR says (both 
> are implemented to be -3).

Yes, SPAPR says -3 Parameter Error. I think RTAS_OUT_PARAM_ERR is better,
to be consistent with SPAPR.
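
For illustration, the check with the changed return code could look like the sketch below. nmi_register_status is a hypothetical helper, not a function in the patch, and the constants are redefined locally only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define BRANCH_INST_MASK     0xFC000000u    /* opcode bits of the branch */
#define RTAS_OUT_SUCCESS     0
#define RTAS_OUT_PARAM_ERR   (-3)

/*
 * Hypothetical helper: the status that would be stored in rets[0] for
 * a given handler address. An address that overlaps the opcode bits
 * cannot be encoded in the patched "ba" instruction.
 */
static int nmi_register_status(uint32_t mc_addr)
{
    if (mc_addr & BRANCH_INST_MASK) {
        return RTAS_OUT_PARAM_ERR;
    }
    return RTAS_OUT_SUCCESS;
}

/* Usage example: one encodable and one non-encodable address. */
static void nmi_register_status_example(void)
{
    assert(nmi_register_status(0x00004000u) == RTAS_OUT_SUCCESS);
    assert(nmi_register_status(0x80000000u) == RTAS_OUT_PARAM_ERR);
}
```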

>>> +        return;
>>> +    }
>>> +
>>> +    /*
>>> +     * Update the branch instruction in trampoline
>>> +     * with the absolute machine check address requested by OS.
>>> +     */
>>> +    branch_inst |= guest_machine_check_addr;
>>> +    memcpy(&trampoline[TRAMPOLINE_BR_INST_INDEX], &branch_inst,
>>> +            sizeof(branch_inst));
>>> +
>>> +    /* Handle all Host/Guest LE/BE combinations */
>>> +    if ((*pcc->interrupts_big_endian)(cpu)) {
>>> +        for (i = 0; i < total_inst; i++) {
>>> +            trampoline[i] = cpu_to_be32(trampoline[i]);
>>> +        }
>>> +    } else {
>>> +        for (i = 0; i < total_inst; i++) {
>>> +            trampoline[i] = cpu_to_le32(trampoline[i]);
>>> +        }
>>> +    }
>>> +
>>> +    /* Patch 0x200 NMI interrupt vector memory area of guest */
>>> +    cpu_physical_memory_write(MC_INTERRUPT_VECTOR, trampoline,
>>> +                              sizeof(trampoline));
>>> +
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu,
>>> +                                   sPAPREnvironment *spapr,
>>> +                                   uint32_t token, uint32_t nargs,
>>> +                                   target_ulong args,
>>> +                                   uint32_t nret, target_ulong rets)
>>> +{
>>> +    /*
>>> +     * VCPU issuing ibm,nmi-interlock is done with NMI handling,
>>> +     * hence unset mc_in_progress.
>>> +     */
>>> +    mc_in_progress = 0;
>>> +    rtas_st(rets, 0, RTAS_OUT_SUCCESS);
>>> +}
>>> +
>>>  static struct rtas_call {
>>>      const char *name;
>>>      spapr_rtas_fn fn;
>>> @@ -419,6 +506,12 @@ static void core_rtas_register_types(void)
>>>                          rtas_ibm_set_system_parameter);
>>>      spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term",
>>>                          rtas_ibm_os_term);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_REGISTER,
>>> +                        "ibm,nmi-register",
>>> +                        rtas_ibm_nmi_register);
>>> +    spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK,
>>> +                        "ibm,nmi-interlock",
>>> +                        rtas_ibm_nmi_interlock);
>>>  }
>>>  type_init(core_rtas_register_types)
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index a2d67e9..98d0a6c 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -384,8 +384,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>>>  #define RTAS_GET_SENSOR_STATE                   (RTAS_TOKEN_BASE + 0x1D)
>>>  #define RTAS_IBM_OS_TERM                        (RTAS_TOKEN_BASE + 0x1F)
>>> +#define RTAS_IBM_NMI_REGISTER                   (RTAS_TOKEN_BASE + 0x20)
>>> +#define RTAS_IBM_NMI_INTERLOCK                  (RTAS_TOKEN_BASE + 0x21)
>>> -#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x20)
>>> +#define RTAS_TOKEN_MAX                          (RTAS_TOKEN_BASE + 0x22)
>>>  /* RTAS ibm,get-system-parameter token values */
>>> @@ -488,4 +490,17 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>>>  #define RTAS_TRAMPOLINE_OFFSET   0x200
>>>  #define RTAS_ERRLOG_OFFSET       0x800
>>> +/* Machine Check Trampoline related macros
>>> + *
>>> + * These macros should co-relate to the code we
>>> + * have in pc-bios/spapr-rtas/spapr-rtas.S
>>> + */
>>> +#define TRAMPOLINE_INSTS           17
>>> +#define TRAMPOLINE_BR_INST_INDEX   15
>>> +
>>> +/* Machine Check Interrupt related macros */
>>> +#define MC_INTERRUPT_VECTOR           0x200
>>> +#define MC_INTERRUPT_VECTOR_SIZE      0x100
>>> +
>>>  #endif /* !defined (__HW_SPAPR_H__) */
>>> diff --git a/pc-bios/spapr-rtas/spapr-rtas.S b/pc-bios/spapr-rtas/spapr-rtas.S
>>> index 903bec2..c315332 100644
>>> --- a/pc-bios/spapr-rtas/spapr-rtas.S
>>> +++ b/pc-bios/spapr-rtas/spapr-rtas.S
>> Please add #defines at the top of the file for the register names:
>>   #define r0 0
>>   #define r1 1
>>   ...
>> That way the code below will get much more readable :)
>> Also, you want a jump table here as we discussed in the last review
>> round. That table would tell you
>>   a) Entry address for RTAS
>>   b) Offset of the NMI code
>>   c) To-be-patched offsets of the instructions inside the NMI code
>> Then we have all offsets automatically generated inside a single file
>> and don't have to maintain fragile relationships between random headers
>> with offset defines and the .S file.
>> Alex
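
To make the table idea concrete, here is a sketch of how the QEMU side might read such a table, assuming the blob starts with four big-endian 32-bit offsets covering items a) to c). The layout, the RtasBlobTable type and rtas_blob_parse_table are assumptions for illustration; the actual format is still to be defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: the blob starts with four big-endian 32-bit
 * offsets, all relative to the start of the blob.
 */
typedef struct {
    uint32_t rtas_entry;        /* a) entry address for RTAS */
    uint32_t nmi_code;          /* b) start of the NMI code */
    uint32_t nmi_patch_hcall;   /* c) instruction taking the hcall number */
    uint32_t nmi_patch_branch;  /* c) branch to the OS handler */
} RtasBlobTable;

#define RTAS_BLOB_TABLE_SIZE 16

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the blob is too short or an offset is bad. */
static int rtas_blob_parse_table(const uint8_t *blob, size_t len,
                                 RtasBlobTable *t)
{
    assert(blob != NULL && t != NULL);
    if (len < RTAS_BLOB_TABLE_SIZE) {
        return -1;
    }
    t->rtas_entry = read_be32(blob);
    t->nmi_code = read_be32(blob + 4);
    t->nmi_patch_hcall = read_be32(blob + 8);
    t->nmi_patch_branch = read_be32(blob + 12);

    /* Each offset must leave room for one 4-byte instruction. */
    if (t->rtas_entry > len - 4 || t->nmi_code > len - 4 ||
        t->nmi_patch_hcall > len - 4 || t->nmi_patch_branch > len - 4) {
        return -1;
    }
    return 0;
}
```

With offsets taken from the blob, TRAMPOLINE_INSTS and the instruction index macros in spapr.h would no longer need to be kept in sync by hand.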
>>> @@ -35,3 +35,41 @@ _start:
>>>     ori     3,3,KVMPPC_H_RTAS@l
>>>     sc      1
>>>     blr
>>> +   . = 0x200
>>> +   /*
>>> +    * Trampoline saves r3 in sprg2 and issues private hcall
>>> +    * to request qemu to build error log. QEMU builds the
>>> +    * error log, copies to rtas-blob and returns the address.
>>> +    * The initial 16 bytes in return address consist of saved
>>> +    * srr0 and srr1 which we restore and pass on the actual error
>>> +    * log address to OS handled machine check notification
>>> +    * routine
>>> +    *
>>> +    * All the below instructions are copied to interrupt vector
>>> +    * 0x200 at the time of handling ibm,nmi-register rtas call.
>>> +    */
>>> +   mtsprg  2,3
>>> +   li      3,0
>>> +   /*
>>> +    * value is patched below
>>> +    */
>>> +1: ori     3,3,0
> Why do "li 3,0" followed by "ori 3,3,X"?  Isn't this just "li 3,X" ?  (aka 
> "addi 3,0,X")

I remember first trying li r3,X but running into some problem. I cannot
recall exactly what it was; it may have been due to my unfamiliarity
with ppc assembly.

I will fix this.
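
One caveat worth checking before switching to a single li, offered as a guess at the earlier problem and not something verified: li is addi with a sign-extended 16-bit immediate, whereas ori zero-extends its immediate. The two sequences therefore differ for any X at or above 0x8000. The sketch below only models the register result in C; the value 0xf002 is a hypothetical hcall number chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* "li rD,imm" is "addi rD,0,imm": the immediate is sign-extended. */
static uint64_t ppc_li(uint16_t imm)
{
    if (imm & 0x8000u) {
        return 0xffffffffffff0000ull | imm;
    }
    return imm;
}

/* "li rD,0" followed by "ori rD,rD,imm": the immediate is zero-extended. */
static uint64_t ppc_li0_ori(uint16_t imm)
{
    return (uint64_t)0 | imm;
}

/* Usage example: the sequences agree below 0x8000 and differ above. */
static void li_vs_ori_example(void)
{
    assert(ppc_li(0x1234u) == ppc_li0_ori(0x1234u));
    assert(ppc_li(0xf002u) != ppc_li0_ori(0xf002u));
}
```

If the hcall number turns out to be below 0x8000, a single li is equivalent and the extra instruction can go.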

> And, perhaps this was discussed in an earlier patch, but couldn't you just do:
> here and avoid the patching altogether?

The KVMPPC_H_REPORT_MC_ERR definition is not visible in spapr-rtas.S. I can
either define it in spapr-rtas.S, as is already done for KVMPPC_H_RTAS, or
patch it in the ibm,nmi-register call.

It is very unlikely that KVMPPC_H_REPORT_MC_ERR will change, but I prefer
to patch it to avoid maintaining it in both places. What do you think?

>>> +   sc      1               /* Issue H_CALL */
>>> +   cmpdi   cr0,3,0
>>> +   beq     cr0,1b          /* retry KVMPPC_H_REPORT_MC_ERR */
>>> +   mtsprg  2,4
>>> +   ld      4,0(3)
>>> +   mtsrr0  4               /* Restore srr0 */
>>> +   ld      4,8(3)
>>> +   mtsrr1  4               /* Restore srr1 */
>>> +   ld      4,16(3)
>>> +   mtcrf   0,4             /* Restore cr */
>>> +   addi    3,3,24
>>> +   mfsprg  4,2
>>> +   /*
>>> +    * Branch to address registered by OS. The branch address is
>>> +    * patched in the ibm,nmi-register rtas call.
>>> +    */
>>> +   ba      0x0
>>> +   b       .

