On 05.11.14 11:37, Aravinda Prasad wrote: > > > On Wednesday 05 November 2014 02:02 PM, Alexander Graf wrote: >> >> >> On 05.11.14 08:13, Aravinda Prasad wrote: >>> This patch adds FWNMI support in qemu for powerKVM >>> guests by handling the ibm,nmi-register rtas call. >>> Whenever OS issues ibm,nmi-register RTAS call, the >>> machine check notification address is saved and the >>> machine check interrupt vector 0x200 is patched to >>> issue a private hcall. >>> >>> This patch also handles the cases when multi-processors >>> experience machine check at or about the same time. >>> As per PAPR, subsequent processors serialize waiting >>> for the first processor to issue the ibm,nmi-interlock call. >>> The second processor retries if the first processor which >>> received a machine check is still reading the error log >>> and is yet to issue ibm,nmi-interlock call. >>> >>> Signed-off-by: Aravinda Prasad <aravi...@linux.vnet.ibm.com> >>> --- >>> hw/ppc/spapr_hcall.c | 16 +++++++ >>> hw/ppc/spapr_rtas.c | 93 >>> +++++++++++++++++++++++++++++++++++++++ >>> include/hw/ppc/spapr.h | 17 +++++++ >>> pc-bios/spapr-rtas/spapr-rtas.S | 38 ++++++++++++++++ >>> 4 files changed, 163 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c >>> index 8f16160..eceb5e5 100644 >>> --- a/hw/ppc/spapr_hcall.c >>> +++ b/hw/ppc/spapr_hcall.c >>> @@ -97,6 +97,9 @@ struct rtas_mc_log { >>> struct rtas_error_log err_log; >>> }; >>> >>> +/* Whether machine check handling is in progress by any CPU */ >>> +bool mc_in_progress; >>> + >>> static void do_spr_sync(void *arg) >>> { >>> struct SPRSyncState *s = arg; >>> @@ -678,6 +681,19 @@ static target_ulong h_report_mc_err(PowerPCCPU *cpu, >>> sPAPREnvironment *spapr, >>> cpu_synchronize_state(CPU(ppc_env_get_cpu(env))); >>> >>> /* >>> + * Only one VCPU can process machine check NMI at a time. Hence >>> + * set the lock mc_in_progress. Once the VCPU finishes processing >>> + * NMI, it executes ibm,nmi-interlock and mc_in_progress is unset >>> + * in ibm,nmi-interlock handler. Meanwhile if other VCPUs encounter >>> + * NMI we return 0 asking the VCPU to retry h_report_mc_err >>> + */ >>> + if (mc_in_progress == 1) { >> >> Please don't depend on bools being numbers. Use true / false. For if()s, >> just don't use == at all - it makes it more readable. > > ok > >> >>> + return 0; >>> + } >>> + >>> + mc_in_progress = 1; >>> + >>> + /* >>> * We save the original r3 register in SPRG2 in 0x200 vector, >>> * which is patched during call to ibm.nmi-register. Original >>> * r3 is required to be included in error log >>> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c >>> index 2ec2a8e..71c7662 100644 >>> --- a/hw/ppc/spapr_rtas.c >>> +++ b/hw/ppc/spapr_rtas.c >>> @@ -36,6 +36,9 @@ >>> >>> #include <libfdt.h> >>> >>> +#define BRANCH_INST_MASK 0xFC000000 >>> +extern bool mc_in_progress; >> >> Please put this into the spapr struct. > > ok > >> >>> + >>> static void rtas_display_character(PowerPCCPU *cpu, sPAPREnvironment >>> *spapr, >>> uint32_t token, uint32_t nargs, >>> target_ulong args, >>> @@ -290,6 +293,90 @@ static void rtas_ibm_os_term(PowerPCCPU *cpu, >>> rtas_st(rets, 0, ret); >>> } >>> >>> +static void rtas_ibm_nmi_register(PowerPCCPU *cpu, >>> + sPAPREnvironment *spapr, >>> + uint32_t token, uint32_t nargs, >>> + target_ulong args, >>> + uint32_t nret, target_ulong rets) >>> +{ >>> + int i; >>> + uint32_t ori_inst = 0x60630000; >>> + uint32_t branch_inst = 0x48000002; >>> + target_ulong guest_machine_check_addr; >>> + uint32_t trampoline[TRAMPOLINE_INSTS]; >>> + int total_inst = sizeof(trampoline) / sizeof(uint32_t); >> >> ARRAY_SIZE(trampoline), though I don't quite understand why you need a >> variable that contains the same value as a constant (TRAMPOLINE_INSTS). >> >> But since you're moving all of those bits into variable fields on the >> rtas blob itself as we discussed in the last version, I guess this code >> will go away anyways ;). > > I think we still need this. We need to patch the KVMPPC_H_REPORT_MC_ERR > number and branch address in the trampoline and also, depending on > whether the guest running in LE/BE we may need to flip the bits in the > trampoline before copying it to 0x200 machine check vector. > > As rtas-blob is part of the guest memory I do not want to patch these in > rtas-blob, hence I copy the trampoline from the rtas-blob to an array, > modify accordingly and then move it to 0x200 machine check vector.
Yes, you will still need the array. But the array should be dynamically sized based on spapr->rtas_info->fwnmi_size which you extract from the blob on load. That way you wouldn't need the "total_inst" variable anymore ;). > >> >>> + PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu); >>> + >>> + /* Store the system reset and machine check address */ >>> + guest_machine_check_addr = rtas_ld(args, 1); >> >> Load or Store? I don't find the comment particularly useful either ;). > > will reword it or may delete it. > >> >>> + >>> + /* >>> + * Read the trampoline instructions from RTAS Blob and patch >>> + * the KVMPPC_H_REPORT_MC_ERR hcall number and the guest >>> + * machine check address before copying to 0x200 vector >>> + */ >>> + cpu_physical_memory_read(spapr->rtas_addr + RTAS_TRAMPOLINE_OFFSET, >>> + trampoline, sizeof(trampoline)); >>> + >>> + /* Safety Check */ >> >> Same for this comment. > > we have only 0x100 bytes that can be copied at 0x200. If the trampoline > size exceeds then the next interrupt vector code is overwritten. Hence a > safety check during compile time to make sure trampoline is within 0x100 > bytes. I think the check is fine, the comment is just redundant with QEMU_BUILD_BUG_ON. Either be more verbose in the comment or remove it ;). But something a la /* check for failure */ BUG_ON(foo); is useless redundancy, because everyone already knows that BUG_ON checks for failure. The interesting bit is the why. Also, as a general rule of thumb, if you need a comment explaining the "what" of what you're doing, your function and/or variable names are probably not well chosen ;). But I don't think this is a problem here. Thanks for the patches btw :) Alex