From: Paul Menzel <pmen...@molgen.mpg.de> Sent: Monday, February 10, 2025 1:07 PM >Dear Jedrzej, > > >Thank you for the quick reply. > > >Am 10.02.25 um 12:59 schrieb Jagielski, Jedrzej: >> From: Paul Menzel <pmen...@molgen.mpg.de> >> Sent: Monday, February 10, 2025 12:40 PM > >>> Am 10.02.25 um 11:40 schrieb Jedrzej Jagielski: >>>> E610 NICs unlike the previous devices utilising ixgbe driver >>>> are notified in the case of overheatning by the FW ACI event. >>> >>> overheating (without n) >>> >>>> In event of overheat when threshold is exceeded, FW suspends all >>>> traffic and sends overtemp event to the driver. >>> >>> I guess this was already the behavior before your patch, and now it’s >>> only logged, and the device stopped properly? > >> Without this patch driver cannot receive the overtemp info. FW handles >> the event - device stops but user has no clue what has happened. >> FW does the major work but this patch adds this minimal piece of >> overtemp handling done by SW - informing user at the very end. > >Thank you for the confirmation. Maybe add a small note, that it’s not >logged to make it crystal clear for people like myself.
OK, i will extend the commit msg. > >>>> Then driver >>>> logs appropriate message and closes the adapter instance. >>>> The card remains in that state until the platform is rebooted. >>> >>> As a user I’d be interested what the threshold is, and what the measured >>> temperature is. Currently, the log seems to be just generic? >> >> These details are FW internals. >> Driver just gets info that such event has happened. >> There's no additional information. >> >> In that case driver's job is just to inform user that such scenario >> has happened and tell what should be the next steps. > > From a user perspective that is a suboptimal behavior, and shows >another time that the Linux kernel should have all the control, and >stuff like this should be moved *out* of the firmware and not into the >firmware. > >It’d be great if you could talk to the card designers/engineers to take >that into account, and maybe revert this decision for future versions or >future adapters. I will have it on my mind. Thanks, Jedrek > > >Kind regards, > >Paul > > >>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:static const char >>> ixgbe_overheat_msg[] = "Network adapter has been stopped because it has >>> over heated. Restart the computer. If the problem persists, power off the >>> system and replace the adapter"; >>> >>>> Reviewed-by: Przemek Kitszel <przemyslaw.kits...@intel.com> >>>> Reviewed-by: Mateusz Polchlopek <mateusz.polchlo...@intel.com> >>>> Signed-off-by: Jedrzej Jagielski <jedrzej.jagiel...@intel.com> >>>> --- >>>> v2,3 : commit msg tweaks >>>> --- >>>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +++++ >>>> drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h | 3 +++ >>>> 2 files changed, 8 insertions(+) >>> >>> >>> Kind regards, >>> >>> Paul >>> >>> >>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>>> index 7236f20c9a30..5c804948dd1f 100644 >>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c >>>> @@ -3165,6 +3165,7 @@ static void ixgbe_aci_event_cleanup(struct >>>> ixgbe_aci_event *event) >>>> static void ixgbe_handle_fw_event(struct ixgbe_adapter *adapter) >>>> { >>>> struct ixgbe_aci_event event __cleanup(ixgbe_aci_event_cleanup); >>>> + struct net_device *netdev = adapter->netdev; >>>> struct ixgbe_hw *hw = &adapter->hw; >>>> bool pending = false; >>>> int err; >>>> @@ -3185,6 +3186,10 @@ static void ixgbe_handle_fw_event(struct >>>> ixgbe_adapter *adapter) >>>> case ixgbe_aci_opc_get_link_status: >>>> ixgbe_handle_link_status_event(adapter, &event); >>>> break; >>>> + case ixgbe_aci_opc_temp_tca_event: >>>> + e_crit(drv, "%s\n", ixgbe_overheat_msg); >>>> + ixgbe_close(netdev); >>>> + break; >>>> default: >>>> e_warn(hw, "unknown FW async event captured\n"); >>>> break; >>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h >>>> b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h >>>> index 8d06ade3c7cd..617e07878e4f 100644 >>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h >>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type_e610.h >>>> @@ -171,6 +171,9 @@ enum ixgbe_aci_opc { >>>> ixgbe_aci_opc_done_alt_write = 0x0904, >>>> ixgbe_aci_opc_clear_port_alt_write = 0x0906, >>>> >>>> + /* TCA Events */ >>>> + ixgbe_aci_opc_temp_tca_event = 0x0C94, >>>> + >>>> /* debug commands */ >>>> ixgbe_aci_opc_debug_dump_internals = 0xFF08,