Carl-Daniel Hailfinger wrote:
> Hi,
>
> Carl-Daniel Hailfinger wrote:
>
>>after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
>>card (sky2 says it is a "Yukon-EC (0xb6) rev 1"), the card appears
>>dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
>>
>>sky2 v0.11 addr 0xc9000000 irq 74 Yukon-EC (0xb6) rev 1
>>sky2 eth3: addr 00:00:5a:70:30:fb
>>[...]
>>sky2 eth3: enabling interface
>>[...]
>>sky2 eth3: phy interrupt status 0x1c40 0x7d0c
>>sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
>>[...]
>>NETDEV WATCHDOG: eth3: transmit timed out
>>sky2 eth3: tx timeout
>>NETDEV WATCHDOG: eth3: transmit timed out
>>sky2 eth3: tx timeout
>>
>>
>>switch:~ # ifconfig eth3
>>eth3      Link encap:Ethernet  HWaddr 00:00:5A:70:30:FB
>>          inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>          RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
>>          TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
>>          collisions:0 txqueuelen:1000
>>          RX bytes:25980735946 (24777.1 Mb)  TX bytes:259787058579 (247752.2 Mb)
>>          Interrupt:74
>>
>>switch:~ # cat /proc/interrupts
>>           CPU0
>>  0:   11213627    IO-APIC-edge  timer
>>  1:      24783    IO-APIC-edge  i8042
>>  8:          0    IO-APIC-edge  rtc
>>  9:          0   IO-APIC-level  acpi
>> 15:     401558    IO-APIC-edge  ide1
>> 50:  249384881   IO-APIC-level  eth0
>> 58:  179123938   IO-APIC-level  sky2
>> 66:          3   IO-APIC-level  sky2, ohci1394
>> 74:   98956955   IO-APIC-level  sky2
>> 82:      19952   IO-APIC-level  sky2
>>217:       1865   IO-APIC-level  libata, NVidia CK804
>>225:     263052   IO-APIC-level  libata, ehci_hcd:usb1
>>NMI:      11098
>>LOC:   11214113
>>ERR:          0
>>MIS:          0
>>
>>Not only will the card not transmit anymore, it also doesn't
>>receive any packet at all. "ethtool -r eth3" doesn't change
>>anything, taking the interface down and up again also doesn't
>>help. The interrupt count of interrupt 74 stays constant after
>>failing.
>>
>>modprobe -r sky2; modprobe sky2
>>fixes the problem for me, so maybe resetting the card on TX
>>timeouts will help.
>>
>>The same problem appeared much earlier for another card which
>>shared interrupt 58 with an onboard card driven by skge. After
>>disabling the skge driver and rebooting, that card has been
>>stable so far.
>>
>>The card is connected to a 100 MBit switch.
>>
>>These problems didn't appear with sk98lin v8.14.3.3 (that
>>driver did survive about 10 TB of traffic before I rebooted).
>>
>>Register dumps are available on request (too big for this
>>list).
>>
>>I will now try sky2 0.13 and report back.
>
>
> And it hit the other interface after 200 MB transferred...
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 bridgeext0: tx timeout
> NETDEV WATCHDOG: bridgeext0: transmit timed out
> sky2 transmit interrupt missed? recovered
>
> Although the driver claims to recover, it doesn't recover at all.
> What debug level would be advisable? It is now running with
> "modprobe sky2 debug=2", but I can't see more than the messages
> above.
>
> I have now added a hard reset routine to the tx timeout
> path and hope it won't kill my machine.
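
The quoted report notes that the count for interrupt 74 stops advancing once the card hangs. A rough way to check this without eyeballing /proc/interrupts is sketched below. The function name irq_count and the --watch switch are made up for illustration and are not part of any existing tool; the sketch only assumes the usual /proc/interrupts layout of an "N:" label followed by one counter per CPU.

```shell
#!/bin/bash
# Sketch only: report whether an IRQ counter is still advancing.

# Sum the per-CPU counters for one IRQ from /proc/interrupts-style
# input on stdin. $1 is the IRQ label without the colon, e.g. 74.
irq_count() {
    awk -v irq="$1:" '
        $1 == irq {
            n = 0
            # Counters are the numeric fields right after the label.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                n += $i
            printf "%.0f\n", n
        }'
}

# Hypothetical usage: script --watch IRQ [SECONDS]
if [ "${1:-}" = "--watch" ] && [ -n "${2:-}" ]; then
    before=$(irq_count "$2" < /proc/interrupts)
    sleep "${3:-5}"
    after=$(irq_count "$2" < /proc/interrupts)
    if [ "$before" = "$after" ]; then
        echo "IRQ $2 did not advance (count: $after)"
    else
        echo "IRQ $2 advanced by $((after - before))"
    fi
fi
```

Note that a counter which stays constant on an idle link is not conclusive by itself, so this is best run while some traffic is being pushed at the interface.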

Apologies for mangled whitespace, this is just a rough cut'n'paste.

--- linux-2.6.15/drivers/net/sky2.c.orig	2006-01-21 16:00:15.000000000 +0100
+++ linux-2.6.15/drivers/net/sky2.c	2006-01-21 14:08:28.000000000 +0100
@@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
 	return 0;
 }
+static int sky2_reset(struct sky2_hw *hw);
 /*
  * Interrupt from PHY are handled outside of interrupt context
  * because accessing phy registers requires spin wait which might
@@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
 	if (netif_msg_timer(sky2))
 		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
+	if (0) {
 	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
 	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
@@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
 	sky2_qset(hw, txq);
 	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+	} else {
+		printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
+		sky2_down(dev);
+		sky2_reset(hw);
+		sky2_up(dev);
+	}
 }

Every time the kernel prints this message, I run the following script:

#!/bin/bash
deadinterface=`dmesg|grep HARD|tail -1|sed "s/.*sky2 //;s/:.*//"`
ip l s $deadinterface down
ip l s $deadinterface up

After that, everything continues to work until the next tx timeout
happens, and then the script again saves the day.

More results about the circumstances of this bug:
It seems that it will only trigger under LOW load. As long as I keep
the interface busy, it will have no problems at all.

Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/
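
For anyone who wants to reuse the workaround script above, here is a slightly more defensive sketch of the same idea. It assumes the patched driver logs exactly "sky2 <iface>: recovering the HARD way..." as in the diff; the function names and the --run switch are hypothetical and chosen only for this example.

```shell
#!/bin/bash
# Sketch only: bounce the interface named in the most recent
# "recovering the HARD way" message from the patched sky2 driver.

# Read kernel log lines on stdin and print the interface name from
# the last matching line. Prints nothing if no line matches.
last_dead_interface() {
    grep 'recovering the HARD way' \
        | tail -n 1 \
        | sed -n 's/.*sky2 \([^:]*\): recovering the HARD way.*/\1/p'
}

# Take the interface down and bring it up again (needs root).
bounce_interface() {
    ip link set "$1" down && ip link set "$1" up
}

# Only touch the network when explicitly asked to (hypothetical flag).
if [ "${1:-}" = "--run" ]; then
    iface=$(dmesg | last_dead_interface)
    if [ -n "$iface" ]; then
        bounce_interface "$iface"
    else
        echo "no hard-reset message found in dmesg" >&2
    fi
fi
```

Compared with the original, this matches the full message text rather than just "HARD", quotes the interface name, and does nothing when no matching message is present. Like the original, it acts on the last message in the log, so running it twice bounces the same interface again.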