On 9/19/2023 10:18 PM, Paul Menzel wrote: > Dear Jesse, > > > Thank you for your patch. > > Am 19.09.23 um 23:29 schrieb Jesse Brandeburg: >> When booting into the crash dump kernels there are cases where upon >> enabling the device, the system under test will panic or machine check. >> >> One such test is to >> - load ice driver >> $ modprobe ice >> - enable SR-IOV (2 VFs) >> $ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs >> - crash >> echo c > /proc/sysrq-trigger > > Above you prepended a $.
Fixed in v2. > >> - load ice driver (or happens automatically) >> modprobe ice >> - crash during pcim_enable_device() >> >> Avoid this problem by issuing a FLR to the device via PCIe config space >> on the crash kernel, to clear out any outstanding transactions and stop >> all queues and interrupts. Restore config space afterword because the > > afterw*a*rd Fixed in v2. > >> driver won't load successfully otherwise. > > Excuse my ignorance, could you please add, what the crashdump kernel > does differently from the “normal” kernel, so this special handling is > needed? I added more description in the v2 commit message, I hope that helps. In summary: the crashdump kernel is starting up on "dirty" state of hardware, due to the surprise crash of the previously running kernel that had running devices when it "panicked" > >> Reviewed-by: Przemek Kitszel <przemyslaw.kits...@intel.com> >> Signed-off-by: Jesse Brandeburg <jesse.brandeb...@intel.com> >> --- >> drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++ >> 1 file changed, 15 insertions(+) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c >> b/drivers/net/ethernet/intel/ice/ice_main.c >> index c8286adae946..6550c46e4e36 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_main.c >> +++ b/drivers/net/ethernet/intel/ice/ice_main.c >> @@ -6,6 +6,7 @@ >> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt >> #include <generated/utsrelease.h> >> +#include <linux/crash_dump.h> >> #include "ice.h" >> #include "ice_base.h" >> #include "ice_lib.h" >> @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct >> pci_device_id __always_unused *ent) >> return -EINVAL; >> } >> + /* when under a kdump kernel initiate a reset before enabling the >> + * device in order to clear out any pending DMA transactions. These >> + * transactions can cause some systems to machine check when doing >> + * the pcim_enable_device() below. >> + */ >> + if (is_kdump_kernel()) { >> + pci_save_state(pdev); >> + pci_clear_master(pdev); >> + err = pcie_flr(pdev); >> + if (err) >> + return err; >> + pci_restore_state(pdev); >> + } >> + > > Should this be added to the common PCI code? Maybe loop the PCI > subsystem folks in? Ok, I'll cc: linux-pci when I send v2. > >> /* this driver uses devres, see >> * Documentation/driver-api/driver-model/devres.rst >> */ > > > Kind regards, > > Paul _______________________________________________ Intel-wired-lan mailing list Intel-wired-lan@osuosl.org https://lists.osuosl.org/mailman/listinfo/intel-wired-lan