On Wednesday 05 November 2014 02:02 PM, Alexander Graf wrote: > > > On 05.11.14 08:13, Aravinda Prasad wrote: >> This patch adds FWNMI support in qemu for powerKVM >> guests by handling the ibm,nmi-register rtas call. >> Whenever OS issues ibm,nmi-register RTAS call, the >> machine check notification address is saved and the >> machine check interrupt vector 0x200 is patched to >> issue a private hcall. >> >> This patch also handles the cases when multi-processors >> experience machine check at or about the same time. >> As per PAPR, subsequent processors serialize waiting >> for the first processor to issue the ibm,nmi-interlock call. >> The second processor retries if the first processor which >> received a machine check is still reading the error log >> and is yet to issue ibm,nmi-interlock call. >> >> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com> >> --- >> hw/ppc/spapr_hcall.c | 16 +++++++ >> hw/ppc/spapr_rtas.c | 93 >> +++++++++++++++++++++++++++++++++++++++ >> include/hw/ppc/spapr.h | 17 +++++++ >> pc-bios/spapr-rtas/spapr-rtas.S | 38 ++++++++++++++++ >> 4 files changed, 163 insertions(+), 1 deletion(-) >> >> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c >> index 8f16160..eceb5e5 100644 >> --- a/hw/ppc/spapr_hcall.c >> +++ b/hw/ppc/spapr_hcall.c >> @@ -97,6 +97,9 @@ struct rtas_mc_log { >> struct rtas_error_log err_log; >> }; >> >> +/* Whether machine check handling is in progress by any CPU */ >> +bool mc_in_progress; >> + >> static void do_spr_sync(void *arg) >> { >> struct SPRSyncState *s = arg; >> @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, >> sPAPREnvironment *spapr, >> cpu_synchronize_state(CPU(ppc_env_get_cpu(env))); >> >> /* >> + * Only one VCPU can process machine check NMI at a time. Hence >> + * set the lock mc_in_progress. Once the VCPU finishes processing >> + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset >> + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter >> + * NMI we return 0 asking the VCPU to retry h_report_mc_err >> + */ >> + if (mc_in_progress == 1) { > > Please don't depend on bools being numbers. Use true / false. For if()s, > just don't use == at all - it makes it more readable.
ok > >> + return 0; >> + } >> + >> + mc_in_progress = 1; >> + >> + /* >> * We save the original r3 register in SPRG2 in 0x200 vector, >> * which is patched during call to ibm.nmi-register. Original >> * r3 is required to be included in error log >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c >> index 2ec2a8e..71c7662 100644 >> --- a/hw/ppc/spapr_rtas.c >> +++ b/hw/ppc/spapr_rtas.c >> @@ -36,6 +36,9 @@ >> >> #include <libfdt.h> >> >> +#define BRANCH_INST_MASK 0xFC000000 >> +extern bool mc_in_progress; > > Please put this into the spapr struct. ok > >> + >> static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment *spapr, >> uint32_t token, uint32_t nargs, >> target_ulong args, >> @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu, >> rtas_st(rets, 0, ret); >> } >> >> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu, >> + sPAPREnvironment *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + int i; >> + uint32_t ori_inst = 0x60630000; >> + uint32_t branch_inst = 0x48000002; >> + target_ulong guest_machine_check_addr; >> + uint32_t trampoline[TRAMPOLINE_INSTS]; >> + int total_inst = sizeof(trampoline) / sizeof(uint32_t); > > ARRAY_SIZE(trampoline), though I don't quite understand why you need a > variable that contains the same value as a constant (TRAMPOLINE_INSTS). > > But since you're moving all of those bits into variable fields on the > rtas blob itself as we discussed in the last version, I guess this code > will go away anyways ;). I think we still need this. We need to patch the KVMPPC_H_REPORT_MC_ERR number and branch address in the trampoline and also, depending on whether the guest running in LE/BE we may need to flip the bits in the trampoline before copying it to 0x200 machine check vector. As rtas-blob is part of the guest memory I do not want to patch these in rtas-blob, hence I copy the trampoline from the rtas-blob to an array, modify accordingly and then move it to 0x200 machine check vector. > >> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); >> + >> + /* Store the system reset and machine check address */ >> + guest_machine_check_addr = rtas_ld(args, 1); > > Load or Store? I don't find the comment particularly useful either ;). will reword it or may delete it. > >> + >> + /* >> + * Read the trampoline instructions from RTAS Blob and patch >> + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest >> + * machine check address before copying to 0x200 vector >> + */ >> + cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET, >> + trampoline, sizeof(trampoline)); >> + >> + /* Safety Check */ > > Same for this comment. we have only 0x100 bytes that can be copied at 0x200. If the trampoline size exceeds then the next interrupt vector code is overwritten. Hence a safety check during compile time to make sure trampoline is within 0x100 bytes. > >> + QEMU_BUILD_BUG_ON(sizeof(trampoline) > MC_INTERRUPT_VECTOR_SIZE); >> + >> + /* Update the KVMPPC_H_REPORT_MC_ERR value in trampoline */ >> + ori_inst |= KVMPPC_H_REPORT_MC_ERR; >> + memcpy(&trampoline[TRAMPOLINE_ORI_INST_INDEX], &ori_inst, >> + sizeof(ori_inst)); > > Why memcpy a u32 into a u32 array? not required. forgot to remove while transitioning from earlier patch where the trampoline was char * > >> + >> + /* >> + * Sanity check guest_machine_check_addr to prevent clobbering >> + * operator value in branch instruction >> + */ >> + if (guest_machine_check_addr & BRANCH_INST_MASK) { >> + fprintf(stderr, "Unable to register ibm,nmi_register: " >> + "Invalid machine check handler address\n"); > > In general, printf's in guest triggerable code aren't a great idea, > since the guest could flood our host logs with this. I can't say we're > doing a great job at it already though, so it probably doesn't matter much. noted > >> + rtas_st(rets, 0, RTAS_OUT_NOT_SUPPORTED); >> + return; >> + } >> + >> + /* >> + * Update the branch instruction in trampoline >> + * with the absolute machine check address requested by OS. >> + */ >> + branch_inst |= guest_machine_check_addr; >> + memcpy(&trampoline[TRAMPOLINE_BR_INST_INDEX], &branch_inst, >> + sizeof(branch_inst)); >> + >> + /* Handle all Host/Guest LE/BE combinations */ >> + if ((*pcc->interrupts_big_endian)(cpu)) { >> + for (i = 0; i < total_inst; i++) { >> + trampoline[i] = cpu_to_be32(trampoline[i]); >> + } >> + } else { >> + for (i = 0; i < total_inst; i++) { >> + trampoline[i] = cpu_to_le32(trampoline[i]); >> + } >> + } >> + >> + /* Patch 0x200 NMI interrupt vector memory area of guest */ >> + cpu_physical_memory_write(MC_INTERRUPT_VECTOR, trampoline, >> + sizeof(trampoline)); >> + >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> +} >> + >> +static void rtas_ibm_nmi_interlock(PowerPCCPU *cpu, >> + sPAPREnvironment *spapr, >> + uint32_t token, uint32_t nargs, >> + target_ulong args, >> + uint32_t nret, target_ulong rets) >> +{ >> + /* >> + * VCPU issuing ibm,nmi-interlock is done with NMI handling, >> + * hence unset mc_in_progress. >> + */ >> + mc_in_progress = 0; >> + rtas_st(rets, 0, RTAS_OUT_SUCCESS); >> +} >> + >> static struct rtas_call { >> const char *name; >> spapr_rtas_fn fn; >> @@ -419,6 +506,12 @@ static void core_rtas_register_types(void) >> rtas_ibm_set_system_parameter); >> spapr_rtas_register(RTAS_IBM_OS_TERM, "ibm,os-term", >> rtas_ibm_os_term); >> + spapr_rtas_register(RTAS_IBM_NMI_REGISTER, >> + "ibm,nmi-register", >> + rtas_ibm_nmi_register); >> + spapr_rtas_register(RTAS_IBM_NMI_INTERLOCK, >> + "ibm,nmi-interlock", >> + rtas_ibm_nmi_interlock); >> } >> >> type_init(core_rtas_register_types) >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >> index a2d67e9..98d0a6c 100644 >> --- a/include/hw/ppc/spapr.h >> +++ b/include/hw/ppc/spapr.h >> @@ -384,8 +384,10 @@ int spapr_allocate_irq_block(int num, bool lsi, bool >> msi); >> #define RTAS_GET_SENSOR_STATE (RTAS_TOKEN_BASE + 0x1D) >> #define RTAS_IBM_CONFIGURE_CONNECTOR (RTAS_TOKEN_BASE + 0x1E) >> #define RTAS_IBM_OS_TERM (RTAS_TOKEN_BASE + 0x1F) >> +#define RTAS_IBM_NMI_REGISTER (RTAS_TOKEN_BASE + 0x20) >> +#define RTAS_IBM_NMI_INTERLOCK (RTAS_TOKEN_BASE + 0x21) >> >> -#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x20) >> +#define RTAS_TOKEN_MAX (RTAS_TOKEN_BASE + 0x22) >> >> /* RTAS ibm,get-system-parameter token values */ >> #define RTAS_SYSPARM_SPLPAR_CHARACTERISTICS 20 >> @@ -488,4 +490,17 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const >> char *propname, >> #define RTAS_TRAMPOLINE_OFFSET 0x200 >> #define RTAS_ERRLOG_OFFSET 0x800 >> >> +/* Machine Check Trampoline related macros >> + * >> + * These macros should co-relate to the code we >> + * have in pc-bios/spapr-rtas/spapr-rtas.S >> + */ >> +#define TRAMPOLINE_INSTS 17 >> +#define TRAMPOLINE_ORI_INST_INDEX 2 >> +#define TRAMPOLINE_BR_INST_INDEX 15 >> + >> +/* Machine Check Interrupt related macros */ >> +#define MC_INTERRUPT_VECTOR 0x200 >> +#define MC_INTERRUPT_VECTOR_SIZE 0x100 >> + >> #endif /* !defined (__HW_SPAPR_H__) */ >> diff --git a/pc-bios/spapr-rtas/spapr-rtas.S >> b/pc-bios/spapr-rtas/spapr-rtas.S >> index 903bec2..c315332 100644 >> --- a/pc-bios/spapr-rtas/spapr-rtas.S >> +++ b/pc-bios/spapr-rtas/spapr-rtas.S > > Please add #defines at the top of the file for the register names: > > #define r0 0 > #define r1 1 > ... > > That way the code below will get much more readable :) hmm will do that > > Also, you want a jump table here as we discussed in the last review > round. That table would tell you > > a) Entry address for RTAS > b) Offset of the NMI code > c) To-be-patched offsets of the instructions inside the NMI code > > Then we have all offsets automatically generated inside a single file > and don't have to maintain fragile relationships between random headers > with offset defines and the .S file. I think I got this wrong last time. I will fix this. > > > Alex > >> @@ -35,3 +35,41 @@ _start: >> ori 3,3,KVMPPC_H_RTAS@l >> sc 1 >> blr >> + . = 0x200 >> + /* >> + * Trampoline saves r3 in sprg2 and issues private hcall >> + * to request qemu to build error log. QEMU builds the >> + * error log, copies to rtas-blob and returns the address. >> + * The initial 16 bytes in return adress consist of saved >> + * srr0 and srr1 which we restore and pass on the actual error >> + * log address to OS handled mcachine check notification >> + * routine >> + * >> + * All the below instructions are copied to interrupt vector >> + * 0x200 at the time of handling ibm,nmi-register rtas call. >> + */ >> + mtsprg 2,3 >> + li 3,0 >> + /* >> + * ori r3,r3,KVMPPC_H_REPORT_MC_ERR. The KVMPPC_H_REPORT_MC_ERR >> + * value is patched below >> + */ >> +1: ori 3,3,0 >> + sc 1 /* Issue H_CALL */ >> + cmpdi cr0,3,0 >> + beq cr0,1b /* retry KVMPPC_H_REPORT_MC_ERR */ >> + mtsprg 2,4 >> + ld 4,0(3) >> + mtsrr0 4 /* Restore srr0 */ >> + ld 4,8(3) >> + mtsrr1 4 /* Restore srr1 */ >> + ld 4,16(3) >> + mtcrf 0,4 /* Restore cr */ >> + addi 3,3,24 >> + mfsprg 4,2 >> + /* >> + * Branch to address registered by OS. The branch address is >> + * patched in the ibm,nmi-register rtas call. >> + */ >> + ba 0x0 >> + b . >> >> > -- Regards, Aravinda