Dear Jesse,

Thank you for your patch.

Am 19.09.23 um 23:29 schrieb Jesse Brandeburg:
When booting into the crash dump kernels there are cases where upon
enabling the device, the system under test will panic or machine check.

One such test is to
- load ice driver
$ modprobe ice
- enable SR-IOV (2 VFs)
$ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
- crash
echo c > /proc/sysrq-trigger

Above you prepended a $.

- load ice driver (or happens automatically)
modprobe ice
- crash during pcim_enable_device()

Avoid this problem by issuing a FLR to the device via PCIe config space
on the crash kernel, to clear out any outstanding transactions and stop
all queues and interrupts. Restore config space afterword because the

afterw*a*rd

driver won't load successfully otherwise.

Excuse my ignorance, could you please add, what the crashdump kernel does differently from the “normal” kernel, so this special handling is needed?

Reviewed-by: Przemek Kitszel <przemyslaw.kits...@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeb...@intel.com>
---
  drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
  1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index c8286adae946..6550c46e4e36 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6,6 +6,7 @@
  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <generated/utsrelease.h>
+#include <linux/crash_dump.h>
  #include "ice.h"
  #include "ice_base.h"
  #include "ice_lib.h"
@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct 
pci_device_id __always_unused *ent)
                return -EINVAL;
        }
+ /* when under a kdump kernel initiate a reset before enabling the
+        * device in order to clear out any pending DMA transactions. These
+        * transactions can cause some systems to machine check when doing
+        * the pcim_enable_device() below.
+        */
+       if (is_kdump_kernel()) {
+               pci_save_state(pdev);
+               pci_clear_master(pdev);
+               err = pcie_flr(pdev);
+               if (err)
+                       return err;
+               pci_restore_state(pdev);
+       }
+

Should this be added to the common PCI code? Maybe loop the PCI subsystem folks in?

        /* this driver uses devres, see
         * Documentation/driver-api/driver-model/devres.rst
         */


Kind regards,

Paul
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Reply via email to