On Sat, May 09, 2020 at 09:32:51AM +0300, Igor Russkikh wrote: > > > This makes use of the new module_firmware_crashed() to help > > annotate when firmware for device drivers crash. When firmware > > crashes devices can sometimes become unresponsive, and recovery > > sometimes requires a driver unload / reload and in the worst cases > > a reboot. > > > > Using a taint flag allows us to annotate when this happens clearly. > > > > Cc: Ariel Elior <ael...@marvell.com> > > Cc: gr-everest-linux...@marvell.com > > Signed-off-by: Luis Chamberlain <mcg...@kernel.org> > > --- > > drivers/net/ethernet/qlogic/qed/qed_debug.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c > > b/drivers/net/ethernet/qlogic/qed/qed_debug.c > > index f4eebaabb6d0..9cc6287b889b 100644 > > --- a/drivers/net/ethernet/qlogic/qed/qed_debug.c > > +++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c > > @@ -7854,6 +7854,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > REGDUMP_HEADER_SIZE, > > &feature_size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, > > PROTECTION_OVERRIDE, > > cur_engine, > > @@ -7869,6 +7870,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > rc = qed_dbg_fw_asserts(cdev, (u8 *)buffer + offset + > > REGDUMP_HEADER_SIZE, > > &feature_size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, FW_ASSERTS, > > cur_engine, > > feature_size, > > @@ -7906,6 +7908,7 @@ int qed_dbg_all_data(struct qed_dev *cdev, void > > *buffer) > > rc = qed_dbg_grc(cdev, (u8 *)buffer + offset + > > REGDUMP_HEADER_SIZE, &feature_size); > > if (!rc) { > > + module_firmware_crashed(); > > *(u32 *)((u8 *)buffer + offset) = > > qed_calc_regdump_header(cdev, GRC_DUMP, > > cur_engine, > > > Hi Luis, > > qed_dbg_all_data is being used to gather debug dump from device. Failures > inside it may happen due to various reasons, but they normally do not indicate > FW failure. > > So I think its not a good place to insert this call. > Its hard to find exact good place to insert it in qed.
Is there a way to check if what happened was indeed a fw crash? > One more thing is that AFAIU taint flag gets permanent on kernel, but for > example our device can recover itself from some FW crashes, thus it'd be > transparent for user. Similar things are *supposed* to recoverable with other device, however this can also sometimes lead to a situation where devices are not usable anymore, and require a full driver unload / load. > Whats the logical purpose of module_firmware_crashed? Does it mean fatal > unrecoverable error on device? Its just to annotate on the module and kernel that this has happened. I take it you may agree that, firmware crashing *often* is not good design, and these issues should be reported to / fixed by vendors. In cases where driver bugs are reported it is good to see if a firmware crash has happened before, so that during analysis this is ruled out. Luis