On 2/13/19 2:33 AM, David Gibson wrote: > On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote: >> On 2/12/19 2:06 AM, David Gibson wrote: >>> On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote: >>>> The IRQ number space of the XIVE and XICS interrupt mode are aligned >>>> when using the dual interrupt mode for the machine. This means that >>>> the ICS offset is set to zero in QEMU and that the KVM XICS device >>>> should be informed of this new value. Unfortunately, there is now way >>>> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset. >>>> >>>> Ignore the lower 4K which are not used under the XICS interrupt >>>> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs. >>>> >>>> Signed-off-by: Cédric Le Goater <c...@kaod.org> >>>> --- >>>> hw/intc/xics_kvm.c | 18 ++++++++++++++++++ >>>> 1 file changed, 18 insertions(+) >>>> >>>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c >>>> index 651bbfdf6966..1d21ff217b82 100644 >>>> --- a/hw/intc/xics_kvm.c >>>> +++ b/hw/intc/xics_kvm.c >>>> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics) >>>> for (i = 0; i < ics->nr_irqs; i++) { >>>> ICSIRQState *irq = &ics->irqs[i]; >>>> >>>> + /* >>>> + * The KVM XICS device considers that the IRQ numbers should >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >>>> + * numbers (only claimed by XIVE for the CPU IPIs). >>>> + */ >>>> + if (i + ics->offset < XICS_IRQ_BASE) { >>>> + continue; >>>> + } >>>> + >>> >>> This seems bogus to me. The guest-visible irq numbers need to line up >>> between xics and xive mode, yes, but that doesn't mean we need to keep >>> around a great big array of unused array of ICS irq states, even in >>> TCG mode. >> >> This is because the qirqs[] array is under the machine and shared between >> both interrupt modes, xics and xive. > > I don't see how that follows. ICSIRQState is indexed in terms of the > ICS source number, not the global irq number, so I don't see why it > has to match up with the qirq array.
The root cause is the use of spapr->irq->nr_irqs to initialize the ICS and sPAPRXive object. In case of the 'dual' backend, it covers the full XIVE IRQ number space (0x2000 today) but XICS only needs 0x1000. I think we can fix the offset issue by using the appropriate nr_irqs which should be for the XICS backend : spapr->irq->nr_irqs - ics->offset I keep in mind the XIVE support for nested guests and I think we will need to extend the IRQ number space in L1 and have the L2 use a portion of it (using an offset). C. >>> >>>> kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES, >>>> i + ics->offset, &state, false, &error_fatal); >>>> >>>> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int >>>> version_id) >>>> ICSIRQState *irq = &ics->irqs[i]; >>>> int ret; >>>> >>>> + /* >>>> + * The KVM XICS device considers that the IRQ numbers should >>>> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K >>>> + * numbers (only claimed by XIVE for the CPU IPIs). >>>> + */ >>>> + if (i + ics->offset < XICS_IRQ_BASE) { >>>> + continue; >>>> + } >>>> + >>>> state = irq->server; >>>> state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK) >>>> << KVM_XICS_PRIORITY_SHIFT; >>> >> >