21.09.2011 20:27, Alexey Loukianov wrote:
> Would try to play a bit more with swconfig/brctl/vconfig as soon as I
> would hit this bug again. Any suggestions about how to hunt this
> nasty down? Anybody else having similar problems on DIR-600/601 or
> DIR-615 E3/E4 boards?
OK, here we go. The box ran fine for about 23 hours with some modifications to the ag71xx sources: I forced the PHY for the CPU port to use 100baseT instead of 1000baseT, instructed the ag71xx instance that serves the CPU port to do the same, and disabled the "feature" of the ar7240 switch driver that brings the CPU port link down when no link is detected on the other switch ports. Then it finally fell into the flip-flopping state. The log messages and behavior were much the same as before, except that the link up messages stated "100Mbps/Full duplex" instead of "1000Mbps/Full duplex" (as expected).

Messing with swconfig was pointless again: nothing I did with it brought the interface back to a working state, including a full switch reset, defining and deleting VLANs, etc. On the other hand, a simple "ifconfig eth0 down && ifconfig eth0 up" was sufficient to bring the interface back to a fully working state. This observation makes me believe that the problem is in the ag71xx ethernet interface driver and is not directly related to the switch hardware and/or the switch driver.

Digging a bit deeper into the ag71xx sources, I located all the places where the logical link state changes. For the most part, this is only done through the ag->restart_work shared worker, which in turn is only scheduled on a tx timeout or when the AR724x DMA gets stuck for some reason.
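As a stopgap until the root cause is found, the manual down/up recovery could be automated. Below is a minimal sketch in Python, assuming the kernel log contains lines with "eth0: link up" as described above. The function names, the threshold of three link-up messages, and the idea of feeding it a recent slice of the system log are all hypothetical choices for illustration; none of this is part of ag71xx or OpenWrt.

```python
import re
import subprocess


def count_link_ups(lines, iface="eth0"):
    """Count kernel 'link up' messages for iface in the given log lines."""
    pattern = re.compile(r"\b%s: link up\b" % re.escape(iface))
    return sum(1 for line in lines if pattern.search(line))


def is_flapping(lines, iface="eth0", threshold=3):
    """Treat `threshold` or more link-up messages in the sample as flip-flopping.

    The threshold is an arbitrary illustrative value, not a measured one.
    """
    return count_link_ups(lines, iface) >= threshold


def bounce(iface="eth0", run=subprocess.call):
    """Same recovery as the manual 'ifconfig eth0 down && ifconfig eth0 up'."""
    if run(["ifconfig", iface, "down"]) != 0:
        return False
    return run(["ifconfig", iface, "up"]) == 0


def check_and_recover(lines, iface="eth0", threshold=3, run=subprocess.call):
    """Bounce iface if the sample looks like flip-flopping.

    Returns True only if the interface was bounced successfully.
    """
    if is_flapping(lines, iface, threshold):
        return bounce(iface, run)
    return False
```

The `run` parameter exists only so that the command runner can be replaced with a stub when trying this out on a machine where bouncing eth0 is undesirable. This only hides the symptom; it does nothing about whatever makes the driver restart the link in the first place.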
After quickly adding the relevant printk() calls at the places of interest, I came up with the following log:

Jan 1 01:03:09 OpenWrt kern.info kernel: eth0: AR724x DMA seems to be stuck, reseting link
Jan 1 01:03:09 OpenWrt kern.info kernel: eth0: link down
Jan 1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan 1 01:03:09 OpenWrt kern.info kernel: eth0: link up (1000Mbps/Full duplex)
Jan 1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan 1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan 1 01:03:11 OpenWrt kern.info kernel: eth0: AR724x DMA seems to be stuck, reseting link
Jan 1 01:03:11 OpenWrt kern.info kernel: eth0: link down
Jan 1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan 1 01:03:11 OpenWrt kern.info kernel: eth0: link up (1000Mbps/Full duplex)
Jan 1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan 1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state

This log was not captured at the moment the link fell into the flip-flopping state, but I'm pretty sure that the constant flip-flopping is just the result of a permanent "DMA stuck" condition that the ag->restart_work shared worker can't recover from on my AR7240 box.

Thus I have a question for the respected driver authors (Gabor Juhos and Imre Kaloz) about this stuck DMA: would you be so kind as to shed some light on the conditions under which this "DMA stuck" state is expected to happen, and why it is only AR7240-specific? Is it a known hardware bug that the driver tries to recover from? Any hints on how to debug this issue further?

P.S. CCing this message to the ag71xx driver authors. Sorry for possible duplicates.

--
Best regards, Alexey Loukianov     mailto:mooro...@mail.ru
System Engineer,                   Mob.:+7(926)218-1320
*nix Specialist
_______________________________________________ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel