On Mon, 28 Jan 2008 13:43:19 -0500 Tony Battersby <[EMAIL PROTECTED]> wrote:
> I am experiencing network tx hangs on a dual-port SK-9E22 with sky2 in
> 2.6.24. The problem is triggered by both ports transmitting at high
> speed simultaneously. This problem is 100% quickly reproducible. Here
> is the setup:
>
> PC #1 with Intel PRO/1000 NIC:
> e1000 IP address 192.168.1.1
> running iperf -s
>
> PC #2 with Intel PRO/1000 NIC:
> e1000 IP address 192.168.2.1
> running iperf -s
>
> PC #3 with SysKonnect SK-9E22 (dual-port copper PCI-express)
> sky2 IP address 192.168.1.2
> sky2 IP address 192.168.2.2
>
> So basically, I have two PCs with Intel PRO/1000 NICs running "iperf
> -s". Each of these Intel NICs is directly cabled to one of the two
> ports of the SysKonnect NIC.
>
> When I run:
> (PC #3 tty1) iperf -c 192.168.1.1 -t 30
> (wait for a second or two)
> (PC #3 tty2) iperf -c 192.168.2.1 -t 30
>
> "iperf -c 192.168.1.1" never finishes, but "iperf -c 192.168.2.1" does
> finish. Press Ctrl-C to abort the hung iperf. Ping 192.168.1.1 does
> not respond. Ping 192.168.2.1 does respond, but each ping has almost
> exactly 1 second latency (the latency should be < 1 ms).
>
> When I switch the order of the tests, whichever iperf -c was started
> _first_ is the one that locks up with no ping afterward, and whichever
> was started _second_ is the one that finishes, but with a 1-second ping
> latency afterward. So the problem follows the ordering of the tests
> rather than a specific port.
>
> Also, the trigger seems to be transmitting, not receiving. If I run
> "iperf -s" on the SysKonnect PC and "iperf -c" on the two Intel PRO/1000
> PCs, then the tests pass.
>
> When I do "ethtool -K eth0 rx on; ethtool -K eth1 rx on" to turn on rx
> checksumming on both ports of the SysKonnect NIC, both tests pass
> successfully. Commit 8b31cfbcd1b54362ef06c85beb40e65a349169a2 "sky2:
> disable rx checksum on Yukon XL" disabled rx checksumming by default on
> this NIC to get rid of some "hw csum failure" messages
> (http://marc.info/?l=linux-netdev&m=119497815523843&w=4). However, this
> seems to have exposed a different (and arguably worse) bug.
>
> I also tried booting with "maxcpus=1 pci=nomsi", but that didn't affect
> the problem.
>
> As a temporary workaround, I will use ethtool to turn on rx checksumming
> and live with the "hw csum failure" messages, since they are better than
> network lockups.
>
> Let me know if I can be of any further assistance in tracking down this
> problem.
>
> Tony Battersby
> Cybernetics

What bus and chipset is in use on the systems with sky2? I have seen
problems when using PCI-X on AMD systems (documented in AMD errata)
due to multiple outstanding transactions.

--
Stephen Hemminger <[EMAIL PROTECTED]>