I’m going to also post this to the Ceph list since it seems to only happen when 
I have a cephfs volume mounted from a cloudstack instance.

Attempting to rsync a large file to the Ceph volume, the instance becomes 
unresponsive at the network level. It eventually returns but it will 
continually drop offline as the file copies. Dmesg shows this:

[ 7144.888744] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
TDH <80>
TDT <d0>
next_to_use <d0>
next_to_clean <7f>
buffer_info[next_to_clean]:
time_stamp <100686d46>
next_to_watch <80>
jiffies <100687140>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 7146.872563] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
TDH <80>
TDT <d0>
next_to_use <d0>
next_to_clean <7f>
buffer_info[next_to_clean]:
time_stamp <100686d46>
next_to_watch <80>
jiffies <100687900>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 7148.856703] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
TDH <80>
TDT <d0>
next_to_use <d0>
next_to_clean <7f>
buffer_info[next_to_clean]:
time_stamp <100686d46>
next_to_watch <80>
jiffies <1006880c0>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 7150.199756] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly

The host machine:

System Information
Manufacturer: Dell Inc.
Product Name: OptiPlex 990

Running CentOS 8.4.

I also see the same error on another host of a different hw type:

Manufacturer: Hewlett-Packard
Product Name: HP Compaq 8200 Elite SFF PC

but both are using e1000 drivers.

I upgraded the kernel to 5.13.x and I thought this fixed the issue, but now I 
see the error again.

Migrating the instance to a bigger server class machine (also e1000e, old 
Rackable system) where I have a bigger pipe via bonding, I don’t seem to have 
the issue.

Just curious if this could be a known bug with e1000e and if there is any kind 
of work around.

Thanks
-jeremy

Attachment: signature.asc
Description: PGP signature

Reply via email to