On Wednesday 05 November 2014 09:16 PM, Tom Musta wrote: > On 11/5/2014 2:32 AM, Alexander Graf wrote: >> >> >> On 05.11.14 08:13, Aravinda Prasad wrote: >>> This patch adds FWNMI support in qemu for powerKVM >>> guests by handling the ibm,nmi-register rtas call. >>> Whenever OS issues ibm,nmi-register RTAS call, the >>> machine check notification address is saved and the >>> machine check interrupt vector 0x200 is patched to >>> issue a private hcall. >>> >>> This patch also handles the cases when multi-processors >>> experience machine check at or about the same time. >>> As per PAPR, subsequent processors serialize waiting >>> for the first processor to issue the ibm,nmi-interlock call. >>> The second processor retries if the first processor which >>> received a machine check is still reading the error log >>> and is yet to issue ibm,nmi-interlock call. >>> >>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com> >>> --- >>> hw/ppc/spapr_hcall.c | 16 +++++++ >>> hw/ppc/spapr_rtas.c | 93 >>> +++++++++++++++++++++++++++++++++++++++ >>> include/hw/ppc/spapr.h | 17 +++++++ >>> pc-bios/spapr-rtas/spapr-rtas.S | 38 ++++++++++++++++ >>> 4 files changed, 163 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c >>> index 8f16160..eceb5e5 100644 >>> --- a/hw/ppc/spapr_hcall.c >>> +++ b/hw/ppc/spapr_hcall.c >>> @@ -97,6 +97,9 @@ struct rtas_mc_log { >>> struct rtas_error_log err_log; >>> }; >>> >>> +/* Whether machine check handling is in progress by any CPU */ >>> +bool mc_in_progress; >>> + >>> static void do_spr_sync(void *arg) >>> { >>> struct SPRSyncState *s = arg; >>> @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, >>> sPAPREnvironment *spapr, >>> cpu_synchronize_state(CPU(ppc_env_get_cpu(env))); >>> >>> /* >>> + * Only one VCPU can process machine check NMI at a time. Hence >>> + * set the lock mc_in_progress. Once the VCPU finishes processing >>> + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset >>> + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter >>> + * NMI we return 0 asking the VCPU to retry h_report_mc_err >>> + */ >>> + if (mc_in_progress == 1) { >> >> Please don't depend on bools being numbers. Use true / false. For if()s, >> just don't use == at all - it makes it more readable. >> >>> + return 0; >>> + } >>> + >>> + mc_in_progress = 1; >>> + >>> + /* >>> * We save the original r3 register in SPRG2 in 0x200 vector, >>> * which is patched during call to ibm.nmi-register. Original >>> * r3 is required to be included in error log >>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c >>> index 2ec2a8e..71c7662 100644 >>> --- a/hw/ppc/spapr_rtas.c >>> +++ b/hw/ppc/spapr_rtas.c >>> @@ -36,6 +36,9 @@ >>> >>> #include <libfdt.h> >>> >>> +#define BRANCH_INST_MASK 0xFC000000 >>> +extern bool mc_in_progress; >> >> Please put this into the spapr struct. >> >>> + >>> static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment >>> *spapr, >>> uint32_t token, uint32_t nargs, >>> target_ulong args, >>> @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu, >>> rtas_st(rets, 0, ret); >>> } >>> >>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu, >>> + sPAPREnvironment *spapr, >>> + uint32_t token, uint32_t nargs, >>> + target_ulong args, >>> + uint32_t nret, target_ulong rets) >>> +{ >>> + int i; >>> + uint32_t ori_inst = 0x60630000; >>> + uint32_t branch_inst = 0x48000002; >>> + target_ulong guest_machine_check_addr; >>> + uint32_t trampoline[TRAMPOLINE_INSTS]; >>> + int total_inst = sizeof(trampoline) / sizeof(uint32_t); >> >> ARRAY_SIZE(trampoline), though I don't quite understand why you need a >> variable that contains the same value as a constant (TRAMPOLINE_INSTS). >> >> But since you're moving all of those bits into variable fields on the >> rtas blob itself as we discussed in the last version, I guess this code >> will go away anyways ;). >> >>> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); >>> + >>> + /* Store the system reset and machine check address */ >>> + guest_machine_check_addr = rtas_ld(args, 1); >> >> Load or Store? I don't find the comment particularly useful either ;). >> >>> + >>> + /* >>> + * Read the trampoline instructions from RTAS Blob and patch >>> + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest >>> + * machine check address before copying to 0x200 vector >>> + */ >>> + cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET, >>> + trampoline, sizeof(trampoline)); >>> + >>> + /* Safety Check */ >> >> Same for this comment. >> >>> + QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE); >>> + >>> + /* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */ >>> + ori_inst |= KVMPPC_H_REPORT_MC_ERR; >>> + memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst, >>> + sizeof(ori_inst)); >> >> Why memcpy a u32 into a u32 array? > > Additionally, I don't see the need for the ori_inst *variable* .... the > instruction is known at compile time. > So why not just do > > trampoline[TRAMPOLINE_ORI_INST_INDEX] = 0x60630000 | KVMPPC_H_REPORT_MC_ERR;
I can directly do trampoline[TRAMPOLINE_ORI_INST_INDEX] |= KVMPPC_H_REPORT_MC_ERR; as trampoline[TRAMPOLINE_ORI_INST_INDEX] already contains 0x60630000 > > Likewise for the branch_inst variable. > > Also see my comment in the trampoline code below. >> >>> + >>> + /* >>> + * Sanity check guest_machine_check_addr to prevent clobbering >>> + * operator value in branch instruction >>> + */ >>> + if (guest_machine_check_addr & BRANCH_INST_MASK) { >>> + fprintf(stderr, "Unable to register ibm,nmi_register: " >>> + "Invalid machine check handler address\n"); >> >> In general, printf's in guest triggerable code aren't a great idea, >> since the guest could flood our host logs with this. I can't say we're >> doing a great job at it already though, so it probably doesn't matter much. >> >>> + rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED); > > NIT: Shouldn't this be RTAS_OUT_PARAM_ERR? That is what SPAPR says (both > are implemented to be -3). Yes, SPAPR says -3 Parameter Error. I think RTAS_OUT_PARAM_ERR is better to be in consistent with SPAPR. > >>> + return; >>> + } >>> + >>> + /* >>> + * Update the branch instruction in trampoline >>> + * with the absolute machine check address requested by OS. >>> + */ >>> + branch_inst |= guest_machine_check_addr; >>> + memcpy(&trampoline[TRAMPOLINE_BR_INST_INDEX], &branch_inst, >>> + sizeof(branch_inst)); >>> + >>> + /* Handle all Host/Guest LE/BE combinations */ >>> + if ((*pcc->interrupts_big_endian)(cpu)) { >>> + for (i = 0; i < total_inst; i++) { >>> + trampoline[i] = cpu_to_be32(trampoline[i]); >>> + } >>> + } else { >>> + for (i = 0; i < total_inst; i++) { >>> + trampoline[i] = cpu_to_le32(trampoline[i]); >>> + } >>> + } >>> + >>> + /* Patch 0x200 NMI interrupt vector memory area of guest */ >>> + cpu_physical_memory_write(MC_INTERRUPT_VECTOR, trampoline, >>> + sizeof(trampoline)); >>> + >>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >>> +} >>> + >>> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu, >>> + sPAPREnvironment *spapr, >>> + uint32_t token, uint32_t nargs, >>> + target_ulong args, >>> + uint32_t nret, target_ulong rets) >>> +{ >>> + /* >>> + * VCPU issuing ibm,nmi-interlock is done with NMI handling, >>> + * hence unset mc_in_progress. >>> + */ >>> + mc_in_progress = 0; >>> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >>> +} >>> + >>> static struct rtas_call { >>> const char *name; >>> spapr_rtas_fn fn; >>> @@ -419,6 +506,12 @@ static void core_rtas_register_types(void) >>> rtas_ibm_set_system_parameter); >>> spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term", >>> rtas_ibm_os_term); >>> + spapr_rtas_register(RTAS_IBM_NMI_REGISTER, >>> + "ibm,nmi-register", >>> + rtas_ibm_nmi_register); >>> + spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, >>> + "ibm,nmi-interlock", >>> + rtas_ibm_nmi_interlock); >>> } >>> >>> type_init(core_rtas_register_types) >>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >>> index a2d67e9..98d0a6c 100644 >>> --- a/include/hw/ppc/spapr.h >>> +++ b/include/hw/ppc/spapr.h >>> @@ -384,8 +384,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool >>> msi); >>> #define RTAS_GET_SENSOR_STATE (RTAS_TOKEN_BASE + 0x1D) >>> #define RTAS_IBM_CONFIGURE_CONNECTOR (RTAS_TOKEN_BASE + 0x1E) >>> #define RTAS_IBM_OS_TERM (RTAS_TOKEN_BASE + 0x1F) >>> +#define RTAS_IBM_NMI_REGISTER (RTAS_TOKEN_BASE + 0x20) >>> +#define RTAS_IBM_NMI_INTERLOCK (RTAS_TOKEN_BASE + 0x21) >>> >>> -#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x20) >>> +#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x22) >>> >>> /* RTAS ibm,get-system-parameter token values */ >>> #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS 20 >>> @@ -488,4 +490,17 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const >>> char *propname, >>> #define RTAS_TRAMPOLINE_OFFSET 0x200 >>> #define RTAS_ERRLOG_OFFSET 0x800 >>> >>> +/* Machine Check Trampoline related macros >>> + * >>> + * These macros should co-relate to the code we >>> + * have in pc-bios/spapr-rtas/spapr-rtas.S >>> + */ >>> +#define TRAMPOLINE_INSTS 17 >>> +#define TRAMPOLINE_ORI_INST_INDEX 2 >>> +#define TRAMPOLINE_BR_INST_INDEX 15 >>> + >>> +/* Machine Check Interrupt related macros */ >>> +#define MC_INTERRUPT_VECTOR 0x200 >>> +#define MC_INTERRUPT_VECTOR_SIZE 0x100 >>> + >>> #endif /* !defined (__HW_SPAPR_H__) */ >>> diff --git a/pc-bios/spapr-rtas/spapr-rtas.S >>> b/pc-bios/spapr-rtas/spapr-rtas.S >>> index 903bec2..c315332 100644 >>> --- a/pc-bios/spapr-rtas/spapr-rtas.S >>> +++ b/pc-bios/spapr-rtas/spapr-rtas.S >> >> Please add #defines at the top of the file for the register names: >> >> #define r0 0 >> #define r1 1 >> ... >> >> That way the code below will get much more readable :) >> >> Also, you want a jump table here as we discussed in the last review >> round. That table would tell you >> >> a) Entry address for RTAS >> b) Offset of the NMI code >> c) To-be-patched offsets of the instructions inside the NMI code >> >> Then we have all offsets automatically generated inside a single file >> and don't have to maintain fragile relationships between random headers >> with offset defines and the .S file. >> >> >> Alex >> >>> @@ -35,3 +35,41 @@ _start: >>> ori 3,3,KVMPPC_H_RTAS@l >>> sc 1 >>> blr >>> + . = 0x200 >>> + /* >>> + * Trampoline saves r3 in sprg2 and issues private hcall >>> + * to request qemu to build error log. QEMU builds the >>> + * error log, copies to rtas-blob and returns the address. >>> + * The initial 16 bytes in return adress consist of saved >>> + * srr0 and srr1 which we restore and pass on the actual error >>> + * log address to OS handled mcachine check notification >>> + * routine >>> + * >>> + * All the below instructions are copied to interrupt vector >>> + * 0x200 at the time of handling ibm,nmi-register rtas call. >>> + */ >>> + mtsprg 2,3 >>> + li 3,0 >>> + /* >>> + * ori r3,r3,KVMPPC_H_REPORT_MC_ERR. The KVMPPC_H_REPORT_MC_ERR >>> + * value is patched below >>> + */ >>> +1: ori 3,3,0 > > Why do "li 3,0" followed by "ori 3,3,X"? Isn't this just "li 3,X" ? (aka > "addi 3,0,X") I remember I first tried doing li r3,X but faced some problem (but not able to exactly recall what was the problem) may be due to not familiar with ppc assembly. I will fix this. > > And, perhaps this was discussed in an earlier patch, but couldn't you just do: > > li 3,KVMPPC_H_REPORT_MC_ERR > > here and avoid the patching altogether? KVMPPC_H_REPORT_MC_ERR def in not visible in spapr-rtas.S, either I can define it in spapr-rtas.S as already done for KVMPPC_H_RTAS or patch it in ibm,nmi-register call. It is very unlikely that the KVMPPC_H_REPORT_MC_ERR will be changed, but I prefer to patch it to avoid maintaining it in both places. What do you think? > > > >>> + sc 1 /* Issue H_CALL */ >>> + cmpdi cr0,3,0 >>> + beq cr0,1b /* retry KVMPPC_H_REPORT_MC_ERR */ >>> + mtsprg 2,4 >>> + ld 4,0(3) >>> + mtsrr0 4 /* Restore srr0 */ >>> + ld 4,8(3) >>> + mtsrr1 4 /* Restore srr1 */ >>> + ld 4,16(3) >>> + mtcrf 0,4 /* Restore cr */ >>> + addi 3,3,24 >>> + mfsprg 4,2 >>> + /* >>> + * Branch to address registered by OS. The branch address is >>> + * patched in the ibm,nmi-register rtas call. >>> + */ >>> + ba 0x0 >>> + b . >>> >>> >> > -- Regards, Aravinda