On Tue, May 28, 2024 at 06:03:15PM +0800, Ricky Wu wrote: > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes > that irdma would break and report hardware initialization failed after > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). > > The problem is caused due to the collision between the irq numbers > requested in irdma and the irq numbers requested in other drivers > after suspend/resume. > > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries > which stores mappings between MSI-X index and Linux interrupt number. > It's supposed to be cleaned up when suspend and rebuilt in resume but > it's not, causing irdma using the old irq numbers stored in the old > ice_pf->msix_entries to request_irq() when resume. And eventually > collide with other drivers. > > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by > irdma if we've dynamically allocated them). On Resume, we call > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the > MSI-X vectors if we would like to dynamically allocate them). > > Signed-off-by: Ricky Wu <en-wei...@canonical.com> > --- > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c > b/drivers/net/ethernet/intel/ice/ice_main.c > index f60c022f7960..ec3cbadaa162 100644 > --- a/drivers/net/ethernet/intel/ice/ice_main.c > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev) > */ > disabled = ice_service_task_stop(pf); > > - ice_unplug_aux_dev(pf); > + ice_deinit_rdma(pf); > > /* Already suspended?, then there is nothing to do */ > if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev) > if (ret) > dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); > > + ret = ice_init_rdma(pf); > + if (ret) > + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", > ret); > + > clear_bit(ICE_DOWN, pf->state); > /* Now perform PF reset and rebuild */ > reset_type = ICE_RESET_PFR;
Looks fine, thanks for the fix. You can add fixes tag and target it to a net tree. Reviewed-by: Michal Swiatkowski <michal.swiatkow...@linux.intel.com> > -- > 2.43.0 >