Hello,

The POWER9 processor comes with a new interrupt controller, called
XIVE, which introduces a large number of new features, for
virtualization in particular.

* XIVE interrupt controller

It is composed of three sub-engines :

  - Interrupt Virtualization Source Engine (IVSE). These are in PHBs,
    in the main controller for the IPIs and in the PSI host
    bridge. They are configured to feed the IVRE with events.

  - Interrupt Virtualization Routing Engine (IVRE). Its job is to
    match an event source with a Notification Virtualization Target
    (NVT), a priority and an Event Queue (EQ) to determine if a
    Virtual Processor can handle the event.

  - Interrupt Virtualization Presentation Engine (IVPE). It maintains
    the interrupt state of each hardware thread and presents the
    notification as an external exception.

Each of the engines uses a set of internal tables to redirect
exceptions from event sources to CPU threads. Interrupt sources have a
2-bit state machine, the Event State Buffer (ESB), which controls
whether an event is let through. If the event is let through, the IVRE
looks up the Event Queue Descriptor configured for the source in the
Interrupt Virtualization Entry (IVE) table. Each Event Queue
Descriptor defines a notification path to a CPU and an in-memory queue
in which an event identifier is recorded for the OS to pull.

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode. On a POWER9 PowerNV machine, the XIVE
interrupt controller is a must-have.

* XIVE for sPAPR

Here are the high level ideas of the current design to add support for
XIVE :

  - introduce a persistent sPAPRXive object under the sPAPR machine
    for newer machines and let the CAS negotiation process decide
    whether it should be used or not. Use the 'ov5_cas' attribute for
    this purpose.

  - introduce a persistent XIVE interrupt presenter under the sPAPR
    core and switch ICP after CAS. Each core now has two ICPs, one
    active through the 'intc' pointer and another one among its
    children, ready to be used if the guest requires it.

  - move the XIVE EQs under the cores to simplify the XIVE model

  - allocate the CPU IPIs at the beginning of the IRQ number space to
    be compatible with XICS (which starts at 4096) and also to
    simplify the model. This means that the XIVE model covers the
    whole IRQ number space. There is no offset splitting the IRQ
    number space, as there is in XICS.

* sPAPR patchset layout

The patchset first defines new models for XIVE, which will be shared
between the machines or with KVM for sPAPR :

  - XiveSource holding the PQ bits and the ESB MMIO region used to
    control them.

  - XiveNVT holding the CPU interrupt state and the EQ state. It
    models the XIVE interrupt presenter engine.

  - sPAPRXive modeling the XIVE interrupt controller for sPAPR
    machines, holding the internal routing table, a single XIVE
    source for the IPIs and other interrupts and the TIMA MMIO
    regions used by the XiveNVT to do interrupt management.

We do not model the IVRE, but it would not be a problem to introduce
it if needed, maybe for migration. To be discussed.

Then, the notification process and the interrupt delivery to the CPU
are described.

Support for sPAPR is completed with the integration of the sPAPRXive
object in the machine, the definition of the new XIVE hcalls, the
device tree layout, and the necessary adjustments to support the CAS
negotiation.

Support for KVM follows, with a set of specific XIVE models, very much
like XICS does. But the interrupt mode is still chosen at the init of
the machine and the reset does not change the KVM interrupt device. A
couple of patches try to fix this limitation with a proposal to
support resets of KVM devices. Some issues in the MMU migration still
need to be addressed.

* PowerNV extension

It seemed interesting to include the models for PowerNV as a way to
validate that the concepts are valid.
The patchset finishes with RFCs of models for the XIVE interrupt
controller and for the PSI bridge device for the POWER9 PowerNV. PSI
provides a good example of the usage of the notify() handler of the
XiveFabric interface, linking the PSI XiveSource to its owning device.

* Coverage

At this stage, XIVE support in QEMU covers :

  - TCG & KVM kernel_irqchip=off/on
  - CPU hotplug
  - support for older machines
  - migration under TCG
  - migration under KVM, including
    kernel_irqchip=off <-> kernel_irqchip=on

* Caveats

Migration still needs some care to make sure all HW states are
captured correctly. Extra quiescence points are possibly needed, to
turn off/on the XIVE configuration under KVM.

KVM device reset works well enough but has consequences on MMU
migration. Probably an ordering problem.

* Github

QEMU:

  https://github.com/legoater/qemu/commits/xive

Linux/KVM (to be sent later on):

  https://github.com/legoater/linux/commits/xive

Thanks,

C.

Changes since v2 :

  - added support for Store EOI
  - added support for two page ESB MMIO setting like on KVM
  - introduced the XiveFabric interface
  - introduced spapr_xive_mmio_unmap()
  - KVM support

Cédric Le Goater (35):
  ppc/xive: introduce a XIVE interrupt source model
  ppc/xive: add support for the LSI interrupt sources
  ppc/xive: introduce the XiveFabric interface
  spapr/xive: introduce a XIVE interrupt controller for sPAPR
  spapr/xive: add a single source block to the sPAPR XIVE model
  spapr/xive: introduce a XIVE interrupt presenter model
  spapr/xive: introduce the XIVE Event Queues
  spapr: push the XIVE EQ data in OS event queue
  spapr: notify the CPU when the XIVE interrupt priority is more
    privileged
  spapr: add support for the SET_OS_PENDING command (XIVE)
  spapr: introduce a 'xive_exploitation' option to enable XIVE
  spapr: add a sPAPRXive object to the machine
  spapr: add hcalls support for the XIVE exploitation interrupt mode
  spapr: add device tree support for the XIVE exploitation mode
  sysbus: add a sysbus_mmio_unmap() helper
  spapr: introduce a helper to map the XIVE memory regions
  spapr: add XIVE support to spapr_qirq()
  spapr: introduce a spapr_icp_create() helper
  spapr: toggle the ICP depending on the selected interrupt mode
  spapr: add support to dump XIVE information
  spapr: advertise XIVE exploitation mode in CAS
  spapr: add classes for the XIVE models
  target/ppc/kvm: add Linux KVM definitions for XIVE
  spapr/xive: add common realize routine for KVM
  spapr/xive: add KVM support
  spapr/xive: add a XIVE KVM device to the machine
  migration: discard non-migratable RAMBlocks
  intc: introduce a CPUIntc interface
  spapr/xive,xics: use the CPU_INTC handlers to reset KVM
  spapr/xive,xics: reset KVM at machine reset
  spapr/xive: raise migration priority of the machine
  ppc/pnv: introduce a pnv_icp_create() helper
  ppc: externalize ppc_get_vcpu_by_pir()
  ppc/pnv: add XIVE support
  ppc/pnv: add a PSI bridge model for POWER9 processor

 default-configs/ppc64-softmmu.mak |    3 +
 exec.c                            |   10 +
 hw/core/sysbus.c                  |   10 +
 hw/intc/Makefile.objs             |    5 +-
 hw/intc/intc.c                    |   26 +
 hw/intc/pnv_xive.c                | 1234 +++++++++++++++++++++++++++++++++++++
 hw/intc/pnv_xive_regs.h           |  314 ++++++++++
 hw/intc/spapr_xive.c              |  324 ++++++++++
 hw/intc/spapr_xive_hcall.c        |  923 +++++++++++++++++++++++++++
 hw/intc/spapr_xive_kvm.c          |  655 ++++++++++++++++++++
 hw/intc/xics.c                    |    4 +
 hw/intc/xics_kvm.c                |  108 +++-
 hw/intc/xive.c                    | 1200 ++++++++++++++++++++++++++++++++++++
 hw/ppc/pnv.c                      |   93 +--
 hw/ppc/pnv_core.c                 |    2 +-
 hw/ppc/pnv_psi.c                  |  399 +++++++++++-
 hw/ppc/ppc.c                      |   16 +
 hw/ppc/spapr.c                    |  264 +++++++-
 hw/ppc/spapr_cpu_core.c           |   55 +-
 hw/ppc/spapr_hcall.c              |    6 +
 hw/ppc/spapr_rtas.c               |    2 -
 include/exec/cpu-common.h         |    1 +
 include/hw/intc/intc.h            |   21 +
 include/hw/ppc/pnv.h              |   37 +-
 include/hw/ppc/pnv_psi.h          |   50 +-
 include/hw/ppc/pnv_xive.h         |   89 +++
 include/hw/ppc/pnv_xscom.h        |    5 +
 include/hw/ppc/ppc.h              |    1 +
 include/hw/ppc/spapr.h            |   21 +-
 include/hw/ppc/spapr_cpu_core.h   |    2 +
 include/hw/ppc/spapr_xive.h       |   93 +++
 include/hw/ppc/xics.h             |    1 +
 include/hw/ppc/xive.h             |  269 ++++++++
 include/hw/ppc/xive_regs.h        |  187 ++++++
 include/hw/sysbus.h               |    1 +
 include/migration/vmstate.h       |    2 +
 linux-headers/asm-powerpc/kvm.h   |   18 +
 linux-headers/linux/kvm.h         |    5 +
 migration/ram.c                   |   42 +-
 target/ppc/kvm.c                  |    7 +
 target/ppc/kvm_ppc.h              |    6 +
 41 files changed, 6414 insertions(+), 97 deletions(-)
 create mode 100644 hw/intc/pnv_xive.c
 create mode 100644 hw/intc/pnv_xive_regs.h
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_hcall.c
 create mode 100644 hw/intc/spapr_xive_kvm.c
 create mode 100644 hw/intc/xive.c
 create mode 100644 include/hw/ppc/pnv_xive.h
 create mode 100644 include/hw/ppc/spapr_xive.h
 create mode 100644 include/hw/ppc/xive.h
 create mode 100644 include/hw/ppc/xive_regs.h

--
2.13.6
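
For readers new to XIVE, the source-to-queue path described in the
overview (ESB gating, then the IVE lookup, then the push into the
event queue) can be sketched as a toy model. This is only an
illustration in Python, not code from the patchset, which is written
in C. The names ToyRouter, EventQueue, esb_trigger() and esb_eoi() are
hypothetical, and the PQ transition table reflects my understanding of
the usual XIVE ESB behaviour rather than anything stated in this
letter.

```python
from collections import deque
from dataclasses import dataclass, field

# PQ bit encodings, with P as the high bit and Q as the low bit
# (assumed encoding, see the note above).
ESB_RESET, ESB_OFF, ESB_PENDING, ESB_QUEUED = 0b00, 0b01, 0b10, 0b11


def esb_trigger(pq):
    """Return (new_pq, forward) for an event trigger on a source."""
    if pq == ESB_RESET:
        return ESB_PENDING, True    # first event: let it through
    if pq == ESB_PENDING:
        return ESB_QUEUED, False    # coalesce while one is in flight
    return pq, False                # QUEUED stays QUEUED, OFF stays OFF


def esb_eoi(pq):
    """Return (new_pq, forward) for an EOI on a source."""
    if pq == ESB_PENDING:
        return ESB_RESET, False     # nothing was coalesced
    if pq == ESB_QUEUED:
        return ESB_PENDING, True    # replay the coalesced event
    return pq, False                # RESET and OFF are unchanged


@dataclass
class EventQueue:
    """Hypothetical stand-in for an EQ descriptor and its queue."""
    target_cpu: int
    priority: int
    entries: deque = field(default_factory=deque)


class ToyRouter:
    """Per-source PQ bits plus an IVE-like table: irq -> (EQ, data)."""

    def __init__(self, nr_irqs):
        self.pq = [ESB_RESET] * nr_irqs
        self.ive = {}

    def configure(self, irq, eq, data):
        self.ive[irq] = (eq, data)

    def _route(self, irq):
        entry = self.ive.get(irq)
        if entry is None:
            return False            # source not configured: drop
        eq, data = entry
        eq.entries.append(data)     # event identifier for the OS to pull
        return True

    def trigger(self, irq):
        self.pq[irq], forward = esb_trigger(self.pq[irq])
        return forward and self._route(irq)

    def eoi(self, irq):
        self.pq[irq], forward = esb_eoi(self.pq[irq])
        return forward and self._route(irq)
```

In this sketch, two back-to-back triggers on the same source push a
single entry into the queue, and the second event is only delivered
once the OS performs the EOI. CPU notification and priority handling
are deliberately left out.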