On 2025-03-04 11:33 p.m., Arinzon, David wrote:
[RE-SEND] I just realized I sent this only to iwl, sorry for spamming.


On 2025-03-03 10:11 a.m., Arinzon, David wrote:
Use the core's rmap notifiers and delete our own.

Acked-by: David Arinzon <darin...@amazon.com>
Signed-off-by: Ahmed Zaki <ahmed.z...@intel.com>
---
   drivers/net/ethernet/amazon/ena/ena_netdev.c | 43 +-------------------
   1 file changed, 1 insertion(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index c1295dfad0d0..6aab85a7c60a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -5,9 +5,6 @@

   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

-#ifdef CONFIG_RFS_ACCEL
-#include <linux/cpu_rmap.h>
-#endif /* CONFIG_RFS_ACCEL */
   #include <linux/ethtool.h>
   #include <linux/kernel.h>
   #include <linux/module.h>
@@ -162,30 +159,6 @@ int ena_xmit_common(struct ena_adapter
*adapter,
          return 0;
   }

-static int ena_init_rx_cpu_rmap(struct ena_adapter *adapter) -{
-#ifdef CONFIG_RFS_ACCEL
-       u32 i;
-       int rc;
-
-       adapter->netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(adapter-
num_io_queues);
-       if (!adapter->netdev->rx_cpu_rmap)
-               return -ENOMEM;
-       for (i = 0; i < adapter->num_io_queues; i++) {
-               int irq_idx = ENA_IO_IRQ_IDX(i);
-
-               rc = irq_cpu_rmap_add(adapter->netdev->rx_cpu_rmap,
-                                     pci_irq_vector(adapter->pdev, irq_idx));
-               if (rc) {
-                       free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
-                       adapter->netdev->rx_cpu_rmap = NULL;
-                       return rc;
-               }
-       }
-#endif /* CONFIG_RFS_ACCEL */
-       return 0;
-}
-
   static void ena_init_io_rings_common(struct ena_adapter *adapter,
                                       struct ena_ring *ring, u16 qid)
{ @@ -1596,7 +1569,7 @@ static int ena_enable_msix(struct ena_adapter
*adapter)
                  adapter->num_io_queues = irq_cnt - ENA_ADMIN_MSIX_VEC;
          }

-       if (ena_init_rx_cpu_rmap(adapter))
+       if (netif_enable_cpu_rmap(adapter->netdev,
+ adapter->num_io_queues))
                  netif_warn(adapter, probe, adapter->netdev,
                             "Failed to map IRQs to CPUs\n");

@@ -1742,13 +1715,6 @@ static void ena_free_io_irq(struct ena_adapter
*adapter)
          struct ena_irq *irq;
          int i;

-#ifdef CONFIG_RFS_ACCEL
-       if (adapter->msix_vecs >= 1) {
-               free_irq_cpu_rmap(adapter->netdev->rx_cpu_rmap);
-               adapter->netdev->rx_cpu_rmap = NULL;
-       }
-#endif /* CONFIG_RFS_ACCEL */
-
          for (i = ENA_IO_IRQ_FIRST_IDX; i <
ENA_MAX_MSIX_VEC(io_queue_count); i++) {
                  irq = &adapter->irq_tbl[i];
                  irq_set_affinity_hint(irq->vector, NULL); @@
-4131,13 +4097,6 @@ static void __ena_shutoff(struct pci_dev *pdev,
bool shutdown)
          ena_dev = adapter->ena_dev;
          netdev = adapter->netdev;

-#ifdef CONFIG_RFS_ACCEL
-       if ((adapter->msix_vecs >= 1) && (netdev->rx_cpu_rmap)) {
-               free_irq_cpu_rmap(netdev->rx_cpu_rmap);
-               netdev->rx_cpu_rmap = NULL;
-       }
-
-#endif /* CONFIG_RFS_ACCEL */
          /* Make sure timer and reset routine won't be called after
           * freeing device resources.
           */
--
2.43.0

Hi Ahmed,

After the merging of this patch, I see the below stack trace when the IRQs
are freed.
It can be reproduced by unloading and loading the driver using
`modprobe -r ena; modprobe ena` (happens during unload)

Based on the patchset and the changes to other drivers, I think
there's a missing call to the function that releases the affinity
notifier (The warn is in
https://web.git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.gi
t/tree/kernel/irq/manage.c#n2031)

I saw in the intel code in the patchset that ` netif_napi_set_irq(<napi>, -1);`
is added?

After adding the code snippet I don't see this anymore, but I want to
understand whether it's the right call by design.

Yes, in ena_down() the IRQs are freed before napis are deleted (where IRQ
notifiers are released). The code below is fine (and is better IMO) but you
can also delete napis then free IRQs.



Thanks for the clarification. Some book-keeping, as this change fixes the issue.
The need to use `netif_napi_set_irq` was introduced in 
https://lore.kernel.org/netdev/20241002001331.65444-2-jdam...@fastly.com/,
But, technically, there was not need to use the call with the -1 until the 
introduction of this patch.
Is my understanding correct?

Correct. The new patch attaches resources (IRQ notifieres) to the napi instance that should be released before freeing IRQs.


If it's correct, then the fix is for this patch.

(Also adding Joe who authored the mentioned patch)


I guess so since there was no need to call set_irq(-1) previoulsy.


Reply via email to