On Fri, 22 Sep 2006 13:24:43 +0200 Martin Lucina <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm having problems with my sky2 NIC hanging under heavy load. This
> appears to be an old problem since it happened for me with 2.6.17 as
> well. Upgrading the affected systems to 2.6.18 has not solved the
> problem. It's easily reproducible for me since I'm running some
> application stress testing that easily saturates the link.
>
> I've had a look at the recent traffic on linux-kernel, netdev and the
> relevant bugzilla (http://bugzilla.kernel.org/show_bug.cgi?id=6839) but
> it's not clear to me which patch I should try against a stock 2.6.18
> kernel. If someone could confirm that the "TX pause fix" attached to
> the bugzilla is sufficient, that would be great.

You can turn off TX pause and get the same effect.

> The card in question is a:
>
> Sep 22 12:17:27 dezo kernel: sky2 v1.5 addr 0xf3000000 irq 169 Yukon-XL (0xb3) rev 1
>
> it's a SysKonnect SK-9E21 PCI-E Server Adapter and the driver is using
> PCI-MSI interrupts on my system.
>
> The chip on the card is a Marvell 88E8061.
>
> The actual errors leading up to the latest hang are:
>
> Sep 21 21:47:06 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Sep 21 21:47:06 dezo kernel: sky2 eth1: tx timeout
> Sep 21 21:47:06 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
> Sep 21 21:47:06 dezo kernel: sky2 hardware hung? flushing
> Sep 21 21:59:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Sep 21 21:59:41 dezo kernel: sky2 eth1: tx timeout
> Sep 21 21:59:41 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
> Sep 21 21:59:41 dezo kernel: sky2 status report lost?
> Sep 21 22:00:41 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Sep 21 22:00:41 dezo kernel: sky2 eth1: tx timeout
> Sep 21 22:00:41 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
> Sep 21 22:00:41 dezo kernel: sky2 hardware hung? flushing
> Sep 21 22:13:10 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Sep 21 22:13:10 dezo kernel: sky2 eth1: tx timeout
> Sep 21 22:13:10 dezo kernel: sky2 eth1: transmit ring 179 .. 138 report=220 done=220
> Sep 21 22:13:10 dezo kernel: sky2 status report lost?
> Sep 21 22:14:20 dezo kernel: NETDEV WATCHDOG: eth1: transmit timed out
> Sep 21 22:14:20 dezo kernel: sky2 eth1: tx timeout
> Sep 21 22:14:20 dezo kernel: sky2 eth1: transmit ring 220 .. 179 report=220 done=220
> Sep 21 22:14:20 dezo kernel: sky2 hardware hung? flushing
> Sep 21 22:15:09 dezo kernel: sky2 eth1: disabling interface
> Sep 21 22:15:09 dezo kernel: sky2 eth1: enabling interface
> Sep 21 22:15:12 dezo kernel: sky2 eth1: Link is up at 1000 Mbps, full duplex, flow control both
> Sep 21 22:15:20 dezo kernel: eth1: no IPv6 routers present
>
> While the interface does appear to have been reset, it never actually
> started working again and the system was hung until I rebooted it this
> morning.
>
> I'm also seeing a lot of these under high load:
>
> Sep 21 21:34:24 dezo kernel: eth1: hw csum failure.
> Sep 21 21:34:24 dezo kernel:
> Sep 21 21:34:24 dezo kernel: Call Trace:
> Sep 21 21:34:24 dezo kernel: [dump_stack+16/21] dump_stack+0x10/0x15
> Sep 21 21:34:24 dezo kernel: [__skb_checksum_complete+85/121] __skb_checksum_complete+0x55/0x79
> Sep 21 21:34:24 dezo kernel: [tcp_v4_rcv+218/2405] tcp_v4_rcv+0xda/0x965
> Sep 21 21:34:24 dezo kernel: [ip_local_deliver+433/635] ip_local_deliver+0x1b1/0x27b
> Sep 21 21:34:24 dezo kernel: [ip_rcv+1234/1311] ip_rcv+0x4d2/0x51f
> Sep 21 21:34:24 dezo kernel: [netif_receive_skb+589/621] netif_receive_skb+0x24d/0x26d
> Sep 21 21:34:24 dezo kernel: [__nosave_end+128712870/2129981440] :sky2:sky2_status_intr+0x23b/0x404
> Sep 21 21:34:24 dezo kernel: [__nosave_end+128714646/2129981440] :sky2:sky2_poll+0x100/0x1a1
> Sep 21 21:34:24 dezo kernel: [net_rx_action+132/268] net_rx_action+0x84/0x10c
> Sep 21 21:34:24 dezo kernel: [__do_softirq+107/226] __do_softirq+0x6b/0xe2
> Sep 21 21:34:24 dezo kernel: [call_softirq+28/40] call_softirq+0x1c/0x28
> Sep 21 21:34:24 dezo kernel: [do_softirq+45/129] do_softirq+0x2d/0x81
> Sep 21 21:34:24 dezo kernel: [do_IRQ+112/132] do_IRQ+0x70/0x84
> Sep 21 21:34:24 dezo kernel: [ret_from_intr+0/11] ret_from_intr+0x0/0xb
> Sep 21 21:34:24 dezo kernel: [mwait_idle+58/82] mwait_idle+0x3a/0x52
> Sep 21 21:34:24 dezo kernel: [cpu_idle+105/140] cpu_idle+0x69/0x8c
> Sep 21 21:34:24 dezo kernel: [start_kernel+483/488] start_kernel+0x1e3/0x1e8
> Sep 21 21:34:24 dezo kernel: [x86_64_start_kernel+459/474] x86_64_start_kernel+0x1cb/0x1d
>
> Am happy to help with tracking this down...
>
> Thanks,
>
> -mato

Is this a dual port or single port card?
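
For reference, the "turn off TX pause" workaround mentioned in the reply can be sketched as below. This is only an illustration, not something taken from the thread: it assumes the interface is eth1 (as in the logs), that the standard `ethtool -A` pause-parameter option is available, and that the driver accepts the request. The helper names `pause_off_command` and `disable_tx_pause` are made up for this sketch. It defaults to a dry run that just prints the command.

```python
import shlex
import subprocess


def pause_off_command(iface):
    """Build the ethtool invocation that disables TX pause frames on iface."""
    # `ethtool -A <dev> tx off` changes only the TX pause setting;
    # RX pause and pause autonegotiation are left as they are.
    return ["ethtool", "-A", iface, "tx", "off"]


def disable_tx_pause(iface, dry_run=True):
    """Return the command as a string; run it only when dry_run is False."""
    cmd = pause_off_command(iface)
    if not dry_run:
        # Needs root. Raises CalledProcessError if ethtool reports failure.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # eth1 is the interface name from the logs above (an assumption here).
    print(disable_tx_pause("eth1"))
```

If pause autonegotiation is enabled, the link partner may renegotiate the setting, so `ethtool -a eth1` is worth checking afterwards to confirm what is actually in effect.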