Hi Christoph,

We are facing an issue with masked MSI-X vectors when allocating PCI vectors 
with BLK/SCSI-MQ enabled and the number of online CPUs is smaller than the 
number of available MSI-X vectors. For our ISP25xx chipset, the hardware 
supports 32 MSI-X vectors with MQ enabled. We originally hit this on a system 
running the RHEL 8.0 kernel, which is based on 4.19. The failing system has 12 
CPUs, and the maximum number of MSI-X vectors requested was 32.
We observed that with the pci_alloc_irq_vectors_affinity() interface the driver 
gets 32 vectors back even though the system has only 12 CPUs. As far as we 
understand, this call should have returned at most 14 MSI-X vectors (12 for CPU 
affinity + 2 reserved via .pre_vectors in struct irq_affinity). We also see 
that the returned vectors include masked ones. Since the driver received 32 
vectors, we create 30 qpairs (2 fewer, for the reserved vectors). In this 
scenario, some qpairs never get their interrupts processed because the 
corresponding MSI-X vectors are masked at the PCI layer. Looking at the code, 
the 'pre'/'post' vector sets in struct irq_affinity don't appear to help here.
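
For reference, this is roughly how the driver sets up the allocation. The 
snippet below is only a simplified sketch of our usage (identifiers such as 
example_alloc_vectors and msix_count are illustrative), not the exact driver 
source:

#include <linux/pci.h>
#include <linux/interrupt.h>

/*
 * Simplified sketch of how the driver asks for vectors; names are
 * illustrative, not copied from the actual driver.
 */
static int example_alloc_vectors(struct pci_dev *pdev, unsigned int msix_count)
{
        /* Two vectors are reserved by the driver and excluded from spreading. */
        struct irq_affinity desc = {
                .pre_vectors = 2,
        };
        int ret;

        /* msix_count is 32 on ISP25xx with MQ enabled. */
        ret = pci_alloc_irq_vectors_affinity(pdev, 2, msix_count,
                                             PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                             &desc);
        if (ret < 0)
                return ret;

        /*
         * On the failing system (12 CPUs) we expected ret <= 14
         * (12 + .pre_vectors), but we get all 32 back.
         */
        return ret;
}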

From the call below we expect to get back only num_online_cpus() + the 
reserved number of vectors, but instead we get back the full count the driver 
requested.
 
int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
                                   unsigned int max_vecs, unsigned int flags,
                                   const struct irq_affinity *affd)
{
        ...
        if (flags & PCI_IRQ_MSIX) {
                vecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
                                affd);
                if (vecs > 0)
                        return vecs;
        }
        ...
}
 
static int __pci_enable_msix_range(struct pci_dev *dev,
                                   struct msix_entry *entries, int minvec,
                                   int maxvec, const struct irq_affinity *affd)
{
        ...
        for (;;) {
                if (affd) {
                        nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
                        if (nvec < minvec)
                                return -ENOSPC;
                }
                ...
        }
        ...
}

This in turn calls irq_calc_affinity_vectors(), which we would expect to 
return at most num_online_cpus() + resv:
 
/**
 * irq_calc_affinity_vectors - Calculate the optimal number of vectors
 * @minvec:     The minimum number of vectors available
 * @maxvec:     The maximum number of vectors available
 * @affd:       Description of the affinity requirements
 */
int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity *affd)
{
        int resv = affd->pre_vectors + affd->post_vectors;
        int vecs = maxvec - resv;
        int ret;

        if (resv > minvec)
                return 0;

        get_online_cpus();
        ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
        put_online_cpus();
        return ret;
}
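
To make our expectation concrete, here is the arithmetic as we read it for the 
failing configuration above (12 online CPUs, .pre_vectors = 2, 
.post_vectors = 0, maxvec = 32); this is only our interpretation of the 
function, not kernel output:

        resv = pre_vectors + post_vectors             = 2 + 0  = 2
        vecs = maxvec - resv                          = 32 - 2 = 30

        expected: min(num_online_cpus(), vecs) + resv = min(12, 30) + 2 = 14
        observed: min(cpumask_weight(cpu_possible_mask), vecs) + resv   = 32

The observed value suggests cpu_possible_mask reports at least 30 possible 
CPUs even though only 12 are online, which would explain why the clamp never 
takes effect; please correct us if we are misreading this.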

We see the same behavior with the 4.20.0-rc6 kernel. In the table below, we 
experimented by forcing the maxcpus= parameter to expose fewer CPUs than the 
number of vectors requested.

Upstream - 4.20-rc6

                            maxcpus=    Cores    Result
      MQ Enabled  ISP25xx   Unset       48       Pass
      MQ Enabled  ISP25xx   2           24       Failed
      MQ Enabled  ISP25xx   4           30       Failed
      MQ Enabled  ISP27xx   Unset       48       Pass
      MQ Enabled  ISP27xx   2           24       Failed
      MQ Enabled  ISP27xx   4           30       Failed
      
Note that the RHEL 8.0 kernel, which carries the 4.19 code, behaves the same 
way. We have not been able to do extensive testing with SLES.
We want to make sure we are reading this code correctly and that our 
understanding is right. If not, please advise what the right expectation is 
and what changes are needed to address this.

If our understanding is right, is there a known issue in this area in the 4.19 
kernel that was addressed in 4.20-rc6? If yes, could you please point us to 
the relevant commit? If not, what additional data is needed to debug this 
further? We have captured a PCIe trace and ruled out any issues at the 
hardware/firmware level, and we also see that the MSI-X vector associated with 
the queue pair on which we are not getting interrupts is masked.

We want to understand how to calculate the number of IRQ vectors the driver 
can request in such a scenario.

Thanks,
Himanshu
