Thanks for your kind and quick reply. > I think this should be called later in the reset path IMO. > You should call ice_deinit_rdma in ice_prepare_for_reset (replace > ice_unplug_aux_dev), I'm afraid this would break the existing code because in ice_deinit_rdma(), it will remove some entries in pf->irq_tracker.entries. And in ice_reinit_interrupt_scheme() (which is called before ice_prepare_for_reset), some entries have been allocated for other irq usage.
> What effect does this have on resume time? When we call ice_init_rdma() at resume time, it will allocate entries at pf->irq_tracker.entries and update pf->msix_entries for later use (request_irq) by irdma. On Tue, 28 May 2024 at 19:12, Paul Menzel <pmen...@molgen.mpg.de> wrote: > > Dear Ricky, > > > Thank you for your patch. Some minor nits. It’d be great if you made the > summary about the action and not an issue description. Maybe: > > Avoid IRQ collision to fix init failure on ACPI S3 resume > > Am 28.05.24 um 12:03 schrieb Ricky Wu: > > A bug in https://bugzilla.kernel.org/show_bug.cgi?id=218906 describes > > that irdma would break and report hardware initialization failed after > > suspend/resume with Intel E810 NIC (tested on 6.9.0-rc5). > > > > The problem is caused due to the collision between the irq numbers > > requested in irdma and the irq numbers requested in other drivers > > after suspend/resume. > > > > The irq numbers used by irdma are derived from ice's ice_pf->msix_entries > > which stores mappings between MSI-X index and Linux interrupt number. > > It's supposed to be cleaned up when suspend and rebuilt in resume but > > it's not, causing irdma using the old irq numbers stored in the old > > ice_pf->msix_entries to request_irq() when resume. And eventually > > collide with other drivers. > > > > This patch fixes this problem. On suspend, we call ice_deinit_rdma() to > > clean up the ice_pf->msix_entries (and free the MSI-X vectors used by > > irdma if we've dynamically allocated them). On Resume, we call > > resume > > > ice_init_rdma() to rebuild the ice_pf->msix_entries (and allocate the > > MSI-X vectors if we would like to dynamically allocate them). > > > > Signed-off-by: Ricky Wu <en-wei...@canonical.com> > > Please add a Link: tag. > > If this was tested by somebody else, please also add a Tested-by: line. > > > --- > > drivers/net/ethernet/intel/ice/ice_main.c | 6 +++++- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c > > b/drivers/net/ethernet/intel/ice/ice_main.c > > index f60c022f7960..ec3cbadaa162 100644 > > --- a/drivers/net/ethernet/intel/ice/ice_main.c > > +++ b/drivers/net/ethernet/intel/ice/ice_main.c > > @@ -5544,7 +5544,7 @@ static int ice_suspend(struct device *dev) > > */ > > disabled = ice_service_task_stop(pf); > > > > - ice_unplug_aux_dev(pf); > > + ice_deinit_rdma(pf); > > > > /* Already suspended?, then there is nothing to do */ > > if (test_and_set_bit(ICE_SUSPENDED, pf->state)) { > > @@ -5624,6 +5624,10 @@ static int ice_resume(struct device *dev) > > if (ret) > > dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret); > > > > + ret = ice_init_rdma(pf); > > + if (ret) > > + dev_err(dev, "Reinitialize RDMA during resume failed: %d\n", > > ret); > > + > > clear_bit(ICE_DOWN, pf->state); > > /* Now perform PF reset and rebuild */ > > reset_type = ICE_RESET_PFR; > > What effect does this have on resume time? > > > Kind regards, > > Paul