On 05/05/2018 06:27 AM, David Gibson wrote:
> On Fri, May 04, 2018 at 03:11:57PM +0200, Cédric Le Goater wrote:
>> On 05/04/2018 06:51 AM, David Gibson wrote:
>>> On Thu, May 03, 2018 at 06:06:14PM +0200, Cédric Le Goater wrote:
>>>> On 05/03/2018 07:35 AM, David Gibson wrote:
>>>>> On Thu, Apr 26, 2018 at 11:27:21AM +0200, Cédric Le Goater wrote:
>>>>>> On 04/26/2018 09:11 AM, David Gibson wrote:
>>>>>>> On Thu, Apr 19, 2018 at 02:43:02PM +0200, Cédric Le Goater wrote:
>>>>>>>> The XIVE presenter engine uses a set of registers to handle priority
>>>>>>>> management and interrupt acknowledgment, among other things. The most
>>>>>>>> important ones being:
>>>>>>>>
>>>>>>>>   - Interrupt Priority Register (PIPR)
>>>>>>>>   - Interrupt Pending Buffer (IPB)
>>>>>>>>   - Current Processor Priority (CPPR)
>>>>>>>>   - Notification Source Register (NSR)
>>>>>>>>
>>>>>>>> There is one set of registers per level of privilege, four in all:
>>>>>>>> HW, HV pool, OS and User. These are called rings. All registers are
>>>>>>>> accessible through a specific MMIO region called the Thread Interrupt
>>>>>>>> Management Area (TIMA) but, depending on the privilege level of the
>>>>>>>> CPU, the view of the TIMA is filtered. The sPAPR machine runs at the
>>>>>>>> OS privilege level and therefore can only access the OS and the User
>>>>>>>> rings. The others are for hypervisor levels.
>>>>>>>>
>>>>>>>> The CPU interrupt state is modeled with a XiveNVT object which stores
>>>>>>>> the values of the different registers. The different TIMA views are
>>>>>>>> mapped at the same address for each CPU and 'current_cpu' is used to
>>>>>>>> retrieve the XiveNVT holding the ring registers.
>>>>>>>>
>>>>>>>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Changes since v2 :
>>>>>>>>
>>>>>>>>  - introduced the XiveFabric interface
>>>>>>>>
>>>>>>>>  hw/intc/spapr_xive.c        |  25 ++++
>>>>>>>>  hw/intc/xive.c              | 279 ++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  include/hw/ppc/spapr_xive.h |   5 +
>>>>>>>>  include/hw/ppc/xive.h       |  31 +++++
>>>>>>>>  include/hw/ppc/xive_regs.h  |  84 +++++++++++++
>>>>>>>>  5 files changed, 424 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>>>> index 90cde8a4082d..f07832bf0a00 100644
>>>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>>>> @@ -13,6 +13,7 @@
>>>>>>>>  #include "target/ppc/cpu.h"
>>>>>>>>  #include "sysemu/cpus.h"
>>>>>>>>  #include "monitor/monitor.h"
>>>>>>>> +#include "hw/ppc/spapr.h"
>>>>>>>>  #include "hw/ppc/spapr_xive.h"
>>>>>>>>  #include "hw/ppc/xive.h"
>>>>>>>>  #include "hw/ppc/xive_regs.h"
>>>>>>>> @@ -95,6 +96,22 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>>>>>>>
>>>>>>>>      /* Allocate the Interrupt Virtualization Table */
>>>>>>>>      xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
>>>>>>>> +
>>>>>>>> +    /* The Thread Interrupt Management Area has the same address for
>>>>>>>> +     * each chip. On sPAPR, we only need to expose the User and OS
>>>>>>>> +     * level views of the TIMA.
>>>>>>>> +     */
>>>>>>>> +    xive->tm_base = XIVE_TM_BASE;
>>>>>>>
>>>>>>> The constant should probably have PAPR in the name somewhere, since
>>>>>>> it's just for PAPR machines (same for the ESB mappings, actually).
>>>>>>
>>>>>> ok.
>>>>>>
>>>>>> I have also made 'tm_base' a property, like 'vc_base' for ESBs, in
>>>>>> case we want to change the value when the guest is instantiated.
>>>>>> I doubt it but this is an address in the global address space, so
>>>>>> letting the machine have control is better I think.
>>>>>
>>>>> I agree.
>>>>>
>>>>>>>> +
>>>>>>>> +    memory_region_init_io(&xive->tm_mmio_user, OBJECT(xive),
>>>>>>>> +                          &xive_tm_user_ops, xive, "xive.tima.user",
>>>>>>>> +                          1ull << TM_SHIFT);
>>>>>>>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_user);
>>>>>>>> +
>>>>>>>> +    memory_region_init_io(&xive->tm_mmio_os, OBJECT(xive),
>>>>>>>> +                          &xive_tm_os_ops, xive, "xive.tima.os",
>>>>>>>> +                          1ull << TM_SHIFT);
>>>>>>>> +    sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_os);
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  static XiveIVE *spapr_xive_get_ive(XiveFabric *xf, uint32_t lisn)
>>>>>>>> @@ -104,6 +121,13 @@ static XiveIVE *spapr_xive_get_ive(XiveFabric *xf, uint32_t lisn)
>>>>>>>>      return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +static XiveNVT *spapr_xive_get_nvt(XiveFabric *xf, uint32_t server)
>>>>>>>> +{
>>>>>>>> +    PowerPCCPU *cpu = spapr_find_cpu(server);
>>>>>>>> +
>>>>>>>> +    return cpu ? XIVE_NVT(cpu->intc) : NULL;
>>>>>>>> +}
>>>>>>>
>>>>>>> So this is a bit of a tangent, but I've been thinking of implementing
>>>>>>> a scheme where there's an opaque pointer in the cpu structure for the
>>>>>>> use of the machine. I'm planning for that to replace the intc pointer
>>>>>>> (which isn't really used directly by the cpu). That would allow us to
>>>>>>> have spapr put a structure there and have both xics and xive pointers
>>>>>>> which could be useful later on.
>>>>>>
>>>>>> ok. That should simplify the patchset at the end, in which we need to
>>>>>> switch the 'intc' pointer.
>>>>>>
>>>>>>> I think we'd need something similar to correctly handle migration of
>>>>>>> the VPA state, which is currently horribly broken.
>>>>>>>
>>>>>>>> +
>>>>>>>>  static const VMStateDescription vmstate_spapr_xive_ive = {
>>>>>>>>      .name = TYPE_SPAPR_XIVE "/ive",
>>>>>>>>      .version_id = 1,
>>>>>>>> @@ -143,6 +167,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
>>>>>>>>      dc->vmsd = &vmstate_spapr_xive;
>>>>>>>>
>>>>>>>>      xfc->get_ive = spapr_xive_get_ive;
>>>>>>>> +    xfc->get_nvt = spapr_xive_get_nvt;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  static const TypeInfo spapr_xive_info = {
>>>>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>>>>>> index dccad0318834..5691bb9474e4 100644
>>>>>>>> --- a/hw/intc/xive.c
>>>>>>>> +++ b/hw/intc/xive.c
>>>>>>>> @@ -14,7 +14,278 @@
>>>>>>>>  #include "sysemu/cpus.h"
>>>>>>>>  #include "sysemu/dma.h"
>>>>>>>>  #include "monitor/monitor.h"
>>>>>>>> +#include "hw/ppc/xics.h" /* for ICP_PROP_CPU */
>>>>>>>>  #include "hw/ppc/xive.h"
>>>>>>>> +#include "hw/ppc/xive_regs.h"
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * XIVE Interrupt Presenter
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +static uint64_t xive_nvt_accept(XiveNVT *nvt)
>>>>>>>> +{
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void xive_nvt_set_cppr(XiveNVT *nvt, uint8_t cppr)
>>>>>>>> +{
>>>>>>>> +    if (cppr > XIVE_PRIORITY_MAX) {
>>>>>>>> +        cppr = 0xff;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    nvt->ring_os[TM_CPPR] = cppr;
>>>>>>>
>>>>>>> Surely this needs to recheck if we should be interrupting the cpu?
>>>>>>
>>>>>> yes. In patch 9, when we introduce the nvt notify routine.
>>>>>
>>>>> Ok.
>>>>>
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * OS Thread Interrupt Management Area MMIO
>>>>>>>> + */
>>>>>>>> +static uint64_t xive_tm_read_special(XiveNVT *nvt, hwaddr offset,
>>>>>>>> +                                     unsigned size)
>>>>>>>> +{
>>>>>>>> +    uint64_t ret = -1;
>>>>>>>> +
>>>>>>>> +    if (offset == TM_SPC_ACK_OS_REG && size == 2) {
>>>>>>>> +        ret = xive_nvt_accept(nvt);
>>>>>>>> +    } else {
>>>>>>>> +        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
>>>>>>>> +                      HWADDR_PRIx" size %d\n", offset, size);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +#define TM_RING(offset) ((offset) & 0xf0)
>>>>>>>> +
>>>>>>>> +static uint64_t xive_tm_os_read(void *opaque, hwaddr offset,
>>>>>>>> +                                unsigned size)
>>>>>>>> +{
>>>>>>>> +    PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>>>>>>>
>>>>>>> So, as I said on a previous version of this, we can actually correctly
>>>>>>> represent different mappings in different cpu spaces, by exploiting
>>>>>>> cpu->as and not just having them all point to &address_space_memory.
>>>>>>
>>>>>> Yes, you did and I haven't studied the question yet. For the next
>>>>>> version.
>>>>>
>>>>> So, it's possible that using the cpu->as thing will be more trouble
>>>>> than it's worth.
>>>>
>>>> One of the troubles is the number of memory regions to use, one per cpu,
>>>
>>> Well, we're already going to have an NVT object for each cpu, yes? So
>>> a memory region per-cpu doesn't seem like a big stretch.
>>>
>>>> and the KVM support.
>>>
>>> And I really don't see how the memory regions impact KVM.
>>
>> The TIMA is set up when the KVM device is initialized, using a specific
>> ioctl to get an fd on a MMIO region from the host. It is then passed to
>> the guest as a 'ram_device', same for the ESBs.
>
> Ah, good point.
>
>> This is not a common region.
>
> I'm not sure what you mean by that.
I meant by that 'out of the ordinary', 'unusual'. Especially when under KVM.

C.

>>>> Having a single region is much easier.
>>>>
>>>>> I am a little concerned about using current_cpu though.
>>>>> First, will it work with KVM with kernel_irqchip=off - the
>>>>> cpus are running truly concurrently,
>>>>
>>>> FWIW, I didn't see any issue yet while stressing.
>>>
>>> Ok.
>>>
>>>>> but we still need to work out who's poking at the TIMA.
>>>>
>>>> I understand. The registers are accessed by the current cpu to set the
>>>> CPPR and to ack an interrupt. But when we route an event, we also access
>>>> and modify the registers. Do you suggest some locking? I am not sure
>>>> how the TIMA region accesses are protected against the routing, which is
>>>> necessarily initiated by an ESB MMIO though.
>>>
>>> Locking isn't really the issue. I mean, we do need locking, but the
>>> BQL should provide that. The issue is what exactly does "current"
>>> mean in the context of multiple concurrently running cpus. Does it
>>> always mean what we need it to mean in every context we might call
>>> this from?
>>
>> I would say so.
>
> Ok.
>