> -----Original Message-----
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, April 09, 2019 11:04 PM
> To: Zhuangyanying <ann.zhuangyany...@huawei.com>
> Cc: marcel.apfelb...@gmail.com; qemu-devel@nongnu.org; Gonglei (Arei)
> <arei.gong...@huawei.com>
> Subject: Re: [PATCH] msix: fix interrupt aggregation problem at the
> passthrough of NVMe SSD
>
> On Tue, Apr 09, 2019 at 02:14:56PM +0000, Zhuangyanying wrote:
> > From: Zhuang Yanying <ann.zhuangyany...@huawei.com>
> >
> > While testing NVMe SSD passthrough performance, I found (via
> > /proc/interrupts) that interrupts were aggregated on vcpu0 (or on the
> > first vcpu of each NUMA node) once the GuestOS was upgraded to
> > sles12sp3 (or redhat7.6). Yet /proc/irq/X/smp_affinity_list shows that
> > the interrupts are spread out, e.g. 0-10, 11-21, and so on.
> > This problem cannot be resolved by "echo X >
> > /proc/irq/X/smp_affinity_list", because the NVMe SSD interrupts are
> > requested via the API pci_alloc_irq_vectors(), so they carry the
> > IRQD_AFFINITY_MANAGED flag.
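
To illustrate the last point, here is a small user-space check I wrote
(not from the patch; the IRQ number and the exact errno are examples -
recent kernels refuse user affinity writes to managed vectors):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* example IRQ number; pick one of the NVMe queue vectors */
    int fd = open("/proc/irq/34/smp_affinity_list", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, "2", 1) < 0)
        /* IRQD_AFFINITY_MANAGED vectors reject user affinity (-EIO) */
        printf("affinity write refused: %s\n", strerror(errno));
    close(fd);
    return 0;
}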
> >
> > GuestOS sles12sp3 backported "automatic interrupt affinity for
> > MSI/MSI-X capable devices", but the implementation of __setup_irq()
> > was not modified accordingly. It still runs irq_startup() before
> > setup_affinity(), i.e. it sends the affinity message while the
> > interrupt is already unmasked.
>
> So does latest upstream still change data/address of an unmasked vector?
>
The latest upstream works fine; the affinity configuration succeeds.
As I understand it, the original order was:
nvme_setup_io_queues()
 |-> pci_alloc_irq_vectors_affinity()
 |     \-> msi_domain_alloc_irqs()
 |           /* if IRQD_AFFINITY_MANAGED, then "mask = affinity" */
 |           \-> ... -> __irq_alloc_descs()
 |                 /* cpumask_copy(desc->irq_common_data.affinity, affinity); */
 |                 \-> ... -> desc_smp_init()
 \-> request_threaded_irq()
       \-> __setup_irq()
             |-> irq_startup() -> msi_domain_activate()
             |     \-> irq_enable() -> pci_msi_unmask_irq()
             \-> setup_affinity()
                   |    if (irqd_affinity_is_managed(&desc->irq_data))
                   |        set = desc->irq_common_data.affinity;
                   |    cpumask_and(mask, cpu_online_mask, set);
                   \-> irq_do_set_affinity()
                         \-> msi_domain_set_affinity()
                               /* actually sets the affinity */
                               \-> __pci_write_msi_msg()
After commit e43b3b58 ("genirq/cpuhotplug: Enforce affinity setting on
startup of managed irqs"):

@@ -265,8 +265,8 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
 		irq_setup_affinity(desc);
 		break;
 	case IRQ_STARTUP_MANAGED:
+		irq_do_set_affinity(d, aff, false);
 		ret = __irq_startup(desc);
-		irq_set_affinity_locked(d, aff, false);
 		break;
Now irq_do_set_affinity() runs first and __irq_startup() afterwards,
i.e. the irq is unmasked only after the affinity message is written.
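
To make the effect of the two orderings concrete, here is a small
user-space model I wrote (not kernel or qemu source; all names and the
message values are mine) of qemu's transition-only mask handling:

#include <stdbool.h>
#include <stdio.h>

static bool masked = true;   /* MSI-X vectors start masked           */
static int  routed_msg;      /* message qemu last pushed to KVM      */
static int  table_msg;       /* message currently in the MSI-X table */

/* models msix_handle_mask_update(): act only on mask transitions */
static void handle_mask_update(bool was_masked)
{
    if (masked == was_masked)
        return;                  /* the check under discussion     */
    if (!masked)
        routed_msg = table_msg;  /* unmask: re-read address/data   */
}

static void unmask(void)
{
    bool was = masked;
    masked = false;
    handle_mask_update(was);
}

static void write_msg(int msg)   /* guest writes address/data */
{
    table_msg = msg;
    handle_mask_update(masked);  /* no transition -> no effect */
}

int main(void)
{
    /* sles12sp3 order: unmask (irq_startup) before the affinity write */
    unmask();
    write_msg(42);
    printf("old order: routed_msg=%d (want 42)\n", routed_msg);  /* 0  */

    /* order after e43b3b58: write while masked, then unmask */
    masked = true; routed_msg = table_msg = 0;
    write_msg(42);
    unmask();
    printf("new order: routed_msg=%d (want 42)\n", routed_msg);  /* 42 */
    return 0;
}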
> > On bare metal the configuration succeeds, but qemu does not trigger
> > the msix update, so the affinity configuration fails.
> > When the affinity is configured via /proc/irq/X/smp_affinity_list,
> > it is applied in apic_ack_edge(): the bitmap is stored in
> > pending_mask, and the sequence is mask -> __pci_write_msi_msg() ->
> > unmask. That ordering is guaranteed, so the configuration takes
> > effect.
> >
> > A GuestOS running linux-master incorporates "genirq/cpuhotplug:
> > Enforce affinity setting on startup of managed irqs", which ensures
> > that, for managed interrupts, the affinity is written first and
> > __irq_startup() runs afterwards. So there the configuration succeeds.
> >
> > It now looks like sles12sp3 (up to sles15sp1, linux-4.12.x) and
> > redhat7.6 (3.10.0-957.10.1) have not backported that patch yet.
>
> Sorry - which patch?
"genirq/cpuhotplug: Enforce affinity setting on startup of managed
irqs", commit e43b3b58.
>
> > "if (is_masked == was_masked) return;" can it be removed at qemu?
> > What is the reason for this check?
> >
> > Signed-off-by: Zhuang Yanying <ann.zhuangyany...@huawei.com>
> > ---
> > hw/pci/msix.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> > index 4e33641..e1ff533 100644
> > --- a/hw/pci/msix.c
> > +++ b/hw/pci/msix.c
> > @@ -119,10 +119,6 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector,
> >                                      bool was_masked)
> >  {
> >      bool is_masked = msix_is_masked(dev, vector);
> >
> > -    if (is_masked == was_masked) {
> > -        return;
> > -    }
> > -
> >      msix_fire_vector_notifier(dev, vector, is_masked);
> >
> >      if (!is_masked && msix_is_pending(dev, vector)) {
>
>
> To add to that, notifiers generally assume that updates come in pairs, unmask
> followed by mask.
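
My reading of that, as a sketch (not qemu code; the notifier bodies and
names are invented for illustration): with the check removed, a table
write that does not change the mask bit fires the notifier again, so a
paired acquire/release - e.g. a KVM resource held while the vector is
unmasked - becomes unbalanced:

#include <stdbool.h>
#include <stdio.h>

static int use_count;   /* resource held while the vector is unmasked */

static void vector_unmask(void) { use_count++; }   /* acquire */
static void vector_mask(void)   { use_count--; }   /* release */

/* msix_handle_mask_update() with the early return removed */
static void handle_mask_update(bool is_masked, bool was_masked)
{
    (void)was_masked;              /* check gone, as in the patch */
    if (is_masked)
        vector_mask();
    else
        vector_unmask();
}

int main(void)
{
    handle_mask_update(false, true);   /* real unmask             */
    handle_mask_update(false, false);  /* rewrite, still unmasked */
    printf("use_count=%d (expected 1)\n", use_count);  /* prints 2 */
    return 0;
}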
>
>
>
> > --
> > 1.8.3.1
> >