On Tue, Mar 21, 2017 at 11:08:34PM -0400, Bruce Momjian wrote:
> > The e1000e driver *does* have statistics for pause frames transmitted
> > and received (run: "ethtool -S eth0| grep flow_control").  If you log
> > these every second then it should be possible to see what happens
> > around the time the TX watchdog fires.  That could provide some clues
> > as to whether the NIC is behaving correctly.
>
> OK, I am running this after setting flow control on/default on the
> switch and Debian, and rebooting:
>
> 	daemon -- sh -c "while :; do date;ethtool -S eth0| grep flow_control;
> 	sleep 1;done > /root/ethtool"
>
> I will report back with the relevant logging lines once it hangs again.
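
For anyone reading the per-second log produced by the loop quoted above, here is a minimal sketch of one way to reduce it to the seconds where the pause-frame counter actually moved. It assumes each sample is a `date` line followed by `name: value` counter lines as printed by `ethtool -S`, and that the counters are cumulative. The function name `xoff_deltas` is hypothetical, not part of any tool.

```shell
# xoff_deltas: read the per-second log on stdin and print each timestamp
# at which the cumulative rx_flow_control_xoff counter increased,
# together with the size of the increase.
xoff_deltas() {
    awk '
        # Assumption: any non-empty line without "flow_control" is the
        # date stamp that starts a new one-second sample.
        !/flow_control/ { if (NF) stamp = $0; next }

        $1 == "rx_flow_control_xoff:" {
            cur = $2 + 0
            # Report only increases; a drop (for example after a
            # driver reset) simply becomes the new baseline.
            if (seen && cur > prev) print stamp ": +" (cur - prev)
            prev = cur
            seen = 1
        }
    '
}

# Usage, assuming the gzipped log has been downloaded as ethtool.gz:
#   gzip -dc ethtool.gz | xoff_deltas
```

Printing only the deltas makes bursts of received XOFF frames easy to line up against the dmesg timestamps, which is harder to do by eye across 24 hours of one-second samples.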

OK, I have results of a hang after 24 hours of uptime.  The hangs are
listed here via dmesg -T:

	http://momjian.us/expire/eth0/dmesg.txt

showing the watchdog warning/hang/reset at 23:01 and port hang/reset at
23:10.  I have also produced the ethtool -S output every second for the
entire 24-hour period, gzipped, at:

	http://momjian.us/expire/eth0/ethtool.gz

You will see reception of a large number of rx_flow_control_xoff
messages about 50 minutes before the hangs, and again just before the
hangs.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +