On Mon, 23 Nov 2020 09:04:42 +0100 Cédric Le Goater <c...@kaod.org> wrote:
> On 11/20/20 6:46 PM, Greg Kurz wrote: > > A regression was recently fixed in the sPAPR XIVE code for QEMU 5.2 > > RC3 [1]. It boiled down to a confusion between IPI numbers and vCPU > > ids, which happen to be numerically equal in general, but are really > > different entities that can diverge in some setups. When this happens, > > we end up misconfiguring XIVE in a way that is almost fatal for the > > guest. > > > > The confusion comes from XICS which has historically assumed equality > > between interrupt server numbers and vCPU ids, as mentionned in a > > comment back from 2011 in the linux kernel icp_native_init_one_node() > > function: > > > > /* This code does the theorically broken assumption that the interrupt > > * server numbers are the same as the hard CPU numbers. > > * This happens to be the case so far but we are playing with fire... > > * should be fixed one of these days. -BenH. > > */ > > > > This assumption crept into QEMU through the "ibm,interrupt-server-ranges" > > property of the "interrupt-controller" node in the DT. This property > > contains ranges of consecutive vCPU ids defined as (first id, # of ids). > > In the case of QEMU, we define a single range starting from 0 up to the > > highest vCPU id, as returned by spapr_max_server_number(). This has > > always been associated to the "nr_servers" wording when naming variables > > or function arguments. When XIVE got added, we introduced an sPAPR IRQ > > abstraction to be able to control several interrupt controller backends. > > The sPAPR IRQ base class provides a dt() handler used to populate the > > "interrupt-controller" node in the DT. This handler takes an "nr_server" > > argument inherited from XICS and we ended up using it to populate the > > "ibm,xive-lisn-ranges" property used with XIVE, which has completely > > different semantics. It contain ranges of interrupt numbers that the > > guest can use for any purpose. Since one obvious purpose is IPI, its > > first range should just be able to accomodate all possible vCPUs. > > To clarify, PAPR says it is a requirement : > > "The first range will contain at least one per possible thread in the > partition." > > The regression showed that we were not initializing correctly this range > in QEMU/KVM. I an not even sure it had the correct size either since > we were anyhow initializing 4096 IPIs. > The bad thing was that each online vCPU would reset it's IPI in KVM using a bogus IPI number (the vCPU id), and thus doesn't reset the interrupt the guest is really going to use for the IPI. > > In the case of QEMU, we define a single range starting from 0 up > > to "nr_server" but we should rather size it to the number of vCPUs > > actually (ie. smp.max_cpus). > > ah. And so spapr_max_server_number(spapr) is crap ? This is starting > to be complex to follow :/ > No. spapr_max_server_number(spapr) gives the highest vCPU id that we end over to KVM in order to optimize VP id allocation in the HW. But it definitely has nothing to do with "ibm,xive-lisn-ranges". David suggested in some other mail that we could maybe pass both spapr_max_server_number(spapr) and smp.max_cpus to the activate() handler. > > This series aims at getting rid of the "nr_server" argument in the > > sPAPR IC handlers. Since both the highest possible vCPU id and the > > maximum number of vCPUs are invariants for XICS and XIVE respectively, > > What XIVE cares about is the number of possible IPIs and the number > of vCPUs since we deduced from that the number of event queue > descriptors, which is another XIVE structure. > > > let's make them device properties to be configured by the machine when > > it creates the interrupt controllers and use them where needed. > > > > This doesn't cause any visible change to setups using the default > > VSMT machine settings. This changes "ibm,xive-lisn-ranges" for > > setups that mess with VSMT, but this is acceptable since linux > > only allocates one interrupt per vCPU and the higher part of the > > range was never used. > > This range is only used for vCPUs. > > C. > > > [1] > > https://git.qemu.org/?p=qemu.git;a=commit;h=6d24795ee7e3199d199d3c415312c93382ad1807 > > > > Greg Kurz (8): > > spapr/xive: Turn some sanity checks into assertions > > spapr/xive: Introduce spapr_xive_nr_ends() > > spapr/xive: Add "nr-servers" property > > spapr/xive: Add "nr-ipis" property > > spapr/xics: Drop unused argument to xics_kvm_has_broken_disconnect() > > spapr/xics: Add "nr-servers" property > > spapr: Drop "nr_servers" argument of the sPAPR IC activate() operation > > spapr: Drop "nr_servers" argument of the sPAPR IC dt() operation > > > > include/hw/ppc/spapr.h | 4 +-- > > include/hw/ppc/spapr_irq.h | 9 ++--- > > include/hw/ppc/spapr_xive.h | 25 ++++++++++++- > > include/hw/ppc/xics_spapr.h | 23 +++++++++--- > > hw/intc/spapr_xive.c | 72 ++++++++++++++++++++++--------------- > > hw/intc/spapr_xive_kvm.c | 4 +-- > > hw/intc/xics_kvm.c | 4 +-- > > hw/intc/xics_spapr.c | 45 ++++++++++++++--------- > > hw/ppc/spapr.c | 7 ++-- > > hw/ppc/spapr_irq.c | 27 +++++++------- > > 10 files changed, 141 insertions(+), 79 deletions(-) > > >