I’m also going to post this to the Ceph list, since it only seems to happen when I have a CephFS volume mounted from a CloudStack instance.
When I rsync a large file to the Ceph volume, the instance becomes unresponsive at the network level. It eventually comes back, but it keeps dropping offline for as long as the file is copying. dmesg shows this:

[ 7144.888744] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687140>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7146.872563] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687900>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7148.856703] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <1006880c0>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
[ 7150.199756] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly

The host machine:

System Information
    Manufacturer: Dell Inc.
    Product Name: OptiPlex 990

It is running CentOS 8.4. I also see the same error on another host of a different hardware type:

    Manufacturer: Hewlett-Packard
    Product Name: HP Compaq 8200 Elite SFF PC

but both are using the e1000e driver.

I upgraded the kernel to 5.13.x and thought this had fixed the issue, but now I see the error again. After migrating the instance to a bigger server-class machine (also e1000e, an old Rackable system), where I have a bigger pipe via bonding, I don’t seem to have the issue.

Just curious whether this could be a known bug with e1000e and whether there is any kind of workaround.

Thanks
-jeremy
