There is a very nasty bug in the e1000e network card driver.
I am running Debian 12 (Bookworm).
The kernel logs the message "Detected Hardware Unit Hang" and then
the network card just stops working.
This is a built-in NIC on the computer.
The computer is an HP ProDesk 600 G4 MT
(the MT denotes the mini tower version).
This log comes from my /var/log/syslog:
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] TDH <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] TDT <ed>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_use <ed>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_clean <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] buffer_info[next_to_clean]:
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] time_stamp <1001c6345>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_watch <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] jiffies <1001c6550>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_watch.status <0>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] MAC Status <80083>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY Status <796d>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY 1000BASE-T Status <3800>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY Extended Status <3000>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PCI Status <10>
Apr 15 01:57:13 gateway vmunix: [ 7744.123237] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:13 gateway vmunix: [ 7744.417235] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.412183] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.659234] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] TDH <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] TDT <ed>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_use <ed>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_clean <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] buffer_info[next_to_clean]:
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] time_stamp <1001c6345>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_watch <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] jiffies <1001c6740>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_watch.status <0>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] MAC Status <80083>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY Status <796d>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY 1000BASE-T Status <3800>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY Extended Status <3000>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PCI Status <10>
Apr 15 01:57:15 gateway vmunix: [ 7746.220253] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:15 gateway vmunix: [ 7746.485268] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] TDH <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] TDT <ed>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_use <ed>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_clean <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] buffer_info[next_to_clean]:
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] time_stamp <1001c6345>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_watch <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] jiffies <1001c6938>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_watch.status <0>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] MAC Status <80083>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY Status <796d>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY 1000BASE-T Status <3800>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY Extended Status <3000>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PCI Status <10>
It does this multiple times, and the network interface (eth1 in this
case) becomes unstable and then stops responding. I can't have that,
because this computer is being used as a gateway. Usually the only way
to recover at that point is to reboot the machine.
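
As a rough reading of the three reports above, the time_stamp and jiffies
fields can be turned into an estimate of how long the transmit descriptor
has been stuck. This is only a sketch: HZ=250 is an assumption (it is the
usual setting for Debian amd64 kernels but I have not confirmed it on this
machine), and stall_ms is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# Estimate how long a TX descriptor has been pending, using the hex
# time_stamp and jiffies values from a "Detected Hardware Unit Hang" report.
# ASSUMPTION: the kernel tick rate is 250 Hz; check CONFIG_HZ in
# /boot/config-$(uname -r) before trusting the numbers.
HZ=250

# stall_ms TIME_STAMP_HEX JIFFIES_HEX  (hypothetical helper)
# Prints the elapsed time between the two tick counters in milliseconds.
stall_ms() {
    ticks=$(( 0x$2 - 0x$1 ))
    echo $(( ticks * 1000 / HZ ))
}

stall_ms 1001c6345 1001c6550   # first report
stall_ms 1001c6345 1001c6740   # second report
stall_ms 1001c6345 1001c6938   # third report
```

Under that assumption the three reports work out to roughly 2.1, 4.1 and
6.1 seconds. The time_stamp is identical in all three and TDH stays at b7,
so it looks like the same descriptor never completes between reports.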
uname -a
Linux gateway 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux
This is a gigabit network card. As I said, it is a built-in NIC, and the
lspci output below shows it is an Intel I219-LM.
ethtool --show-eee eth1
EEE settings for eth1:
EEE status: enabled - inactive
Tx LPI: 17 (us)
Supported EEE link modes: 100baseT/Full
1000baseT/Full
Advertised EEE link modes: 100baseT/Full
1000baseT/Full
Link partner advertised EEE link modes: Not reported
ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: n/a
RX Jumbo: n/a
TX: 4096
Current hardware settings:
RX: 256
RX Mini: n/a
RX Jumbo: n/a
TX: 256
RX Buf Len: n/a
CQE Size: n/a
TX Push: off
TCP data split: n/a
lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Cannon Lake PCH CNVi WiFi (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:16.3 Serial controller: Intel Corporation Cannon Lake PCH Active Management Technology - SOL (rev 10)
00:17.0 RAID bus controller: Intel Corporation SATA Controller [RAID mode] (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Q370 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
lspci -vn
00:1f.6 0200: 8086:15bb (rev 10)
DeviceName: Onboard Lan
Subsystem: 103c:83ed
Flags: bus master, fast devsel, latency 0, IRQ 123, IOMMU group 8
Memory at f1180000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: e1000e
Kernel modules: e1000e
This seems to happen when the interface is actually pushing some traffic,
even if it is only a little. It isn't network overload or anything like
that; I am barely doing anything and it will still do this.
I have already tried the following
ethtool -K eth1 tx off rx off
ethtool -K eth1 tso off gso off
ethtool -K eth1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
I have also disabled all power management in the BIOS, including the
setting for ASPM.
I added the following to the kernel command line in GRUB:
pcie_aspm=off e1000e.SmartPowerDownEnable=0
This is in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off e1000e.SmartPowerDownEnable=0"
Then I ran update-grub as well.
None of this has worked in fixing this problem. I am still getting the
same issue.
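
For anyone trying the same workaround, a quick way to confirm that the
GRUB change actually reached the running kernel after a reboot is to look
for the options in /proc/cmdline. The snippet below is only a sketch, and
has_param is a made-up helper name. It shows that the options were passed
to the kernel, not that the driver acted on them.

```shell
#!/bin/sh
# has_param CMDLINE PARAM  (hypothetical helper)
# Succeeds only if PARAM appears as a complete space-separated word
# in the CMDLINE string.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Command line of the currently running kernel (empty if unreadable).
cmdline=$(cat /proc/cmdline 2>/dev/null)

for p in pcie_aspm=off e1000e.SmartPowerDownEnable=0; do
    if has_param "$cmdline" "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done
```

If either option is reported as missing, the machine was probably not
rebooted after update-grub, or a different boot entry was used.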
Can you please fix this issue? This is a really nasty problem with Debian
12 (Bookworm).
I see this was reported back with kernel 5.3.x, but I am not seeing any
reports about this issue for 6.1.x:
Debian Bug report logs - #945912
Kernel 5.3 e100e Detected Hardware Unit Hang
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945912
To reproduce this: if you have the same type of computer with the
same NIC in it and install Debian 12, it will happen.
This should go to your kernel team, I believe, as this is an issue with
the kernel driver module for this NIC.
Please send any response on this bug by email only.
Please also reply to confirm that you got this email and that you are
looking into this problem.
Thank you,
Jamie (she / her)