On 9/19/2023 10:18 PM, Paul Menzel wrote:
> Dear Jesse,
> 
> 
> Thank you for your patch.
> 
> Am 19.09.23 um 23:29 schrieb Jesse Brandeburg:
>> When booting into the crash dump kernels there are cases where upon
>> enabling the device, the system under test will panic or machine check.
>>
>> One such test is to
>> - load ice driver
>> $ modprobe ice
>> - enable SR-IOV (2 VFs)
>> $ echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
>> - crash
>> echo c > /proc/sysrq-trigger
> 
> Above you prepended a $.

Fixed in v2.

> 
>> - load ice driver (or happens automatically)
>> modprobe ice
>> - crash during pcim_enable_device()
>>
>> Avoid this problem by issuing a FLR to the device via PCIe config space
>> on the crash kernel, to clear out any outstanding transactions and stop
>> all queues and interrupts. Restore config space afterword because the
> 
> afterw*a*rd

Fixed in v2.


> 
>> driver won't load successfully otherwise.
> 
> Excuse my ignorance, could you please add, what the crashdump kernel
> does differently from the “normal” kernel, so this special handling is
> needed?

I added more description in the v2 commit message, I hope that helps.

In summary: the crashdump kernel is starting up on "dirty" state of
hardware, due to the surprise crash of the previously running kernel
that had running devices when it "panicked"

> 
>> Reviewed-by: Przemek Kitszel <przemyslaw.kits...@intel.com>
>> Signed-off-by: Jesse Brandeburg <jesse.brandeb...@intel.com>
>> ---
>>   drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
>> b/drivers/net/ethernet/intel/ice/ice_main.c
>> index c8286adae946..6550c46e4e36 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>> @@ -6,6 +6,7 @@
>>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>     #include <generated/utsrelease.h>
>> +#include <linux/crash_dump.h>
>>   #include "ice.h"
>>   #include "ice_base.h"
>>   #include "ice_lib.h"
>> @@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct
>> pci_device_id __always_unused *ent)
>>           return -EINVAL;
>>       }
>>   +    /* when under a kdump kernel initiate a reset before enabling the
>> +     * device in order to clear out any pending DMA transactions. These
>> +     * transactions can cause some systems to machine check when doing
>> +     * the pcim_enable_device() below.
>> +     */
>> +    if (is_kdump_kernel()) {
>> +        pci_save_state(pdev);
>> +        pci_clear_master(pdev);
>> +        err = pcie_flr(pdev);
>> +        if (err)
>> +            return err;
>> +        pci_restore_state(pdev);
>> +    }
>> +
> 
> Should this be added to the common PCI code? Maybe loop the PCI
> subsystem folks in?

Ok, I'll cc: linux-pci when I send v2.


> 
>>       /* this driver uses devres, see
>>        * Documentation/driver-api/driver-model/devres.rst
>>        */
> 
> 
> Kind regards,
> 
> Paul

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Reply via email to