21.09.2011 20:27, Alexey Loukianov wrote:
> Would try to play a bit more with swconfig/brctl/vconfig as soon as I
> would hit this bug again. Any suggestions about how to hunt this 
> nasty down? Anybody else having similar problems on DIR-600/601 or 
> DIR-615 E3/E4 boards?

OK, here we go. The box ran fine for about 23 hours with some
modifications to the ag71xx sources: I forced the PHY for the CPU port to
use 100baseT instead of 1000baseT, instructed the ag71xx instance that
serves the CPU port to do the same, and disabled the "feature" of the
ar7240 switch driver that brings the CPU port link down when no link is
detected on the other switch ports. Then it finally stalled into the
flip-flopping state. The log messages and behavior were pretty much the
same, except that the link up messages stated "100Mbps/Full duplex" instead
of "1000Mbps/Full duplex" (as expected).

Messing with swconfig was pointless again: nothing I did with it brought
the interface back to a working state, including a full switch reset,
defining and deleting VLANs, etc. On the other hand, a simple
"ifconfig eth0 down && ifconfig eth0 up" was sufficient to restore the
interface to a fully working state.
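
The recovery step above can be scripted as a stopgap. What follows is only
a minimal sketch, assuming a Python interpreter is available on the box (it
is not part of a default OpenWrt image) and that ifconfig is in PATH. The
helper names bounce_commands and bounce_interface are made up for
illustration and do not belong to any OpenWrt tool.

```python
import subprocess


def bounce_commands(ifname):
    """Return the two commands that take an interface down and back up."""
    return [["ifconfig", ifname, "down"], ["ifconfig", ifname, "up"]]


def bounce_interface(ifname="eth0", run=subprocess.check_call):
    """Run 'ifconfig <if> down' and then 'ifconfig <if> up'.

    'run' is injectable so the sequence can be exercised without touching
    a real interface. With the default subprocess.check_call, a non-zero
    exit status raises CalledProcessError, so "up" is not attempted when
    "down" fails (the same behavior as '&&' in the shell one-liner).
    """
    for cmd in bounce_commands(ifname):
        run(cmd)
```

Calling bounce_interface("eth0") as root should be equivalent to the shell
one-liner. It only works around the symptom and says nothing about why the
link gets stuck in the first place.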

This observation made me believe that the problem is in the ag71xx ethernet
interface driver and is not directly related to the switch hardware and/or
its driver. Digging a bit deeper into the ag71xx sources, I located all the
places where the logical link state changes. For the most part, it can only
be changed through the ag->restart_work shared worker, which in turn is only
scheduled on a tx timeout or when the ar724x DMA gets stuck for some reason.
After quickly adding printk() calls at the places of interest, I came up
with the following log:

Jan  1 01:03:09 OpenWrt kern.info kernel: eth0: AR724x DMA seems to be stuck, reseting link
Jan  1 01:03:09 OpenWrt kern.info kernel: eth0: link down
Jan  1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan  1 01:03:09 OpenWrt kern.info kernel: eth0: link up (1000Mbps/Full duplex)
Jan  1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan  1 01:03:09 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan  1 01:03:11 OpenWrt kern.info kernel: eth0: AR724x DMA seems to be stuck, reseting link
Jan  1 01:03:11 OpenWrt kern.info kernel: eth0: link down
Jan  1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan  1 01:03:11 OpenWrt kern.info kernel: eth0: link up (1000Mbps/Full duplex)
Jan  1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state
Jan  1 01:03:11 OpenWrt kern.info kernel: br-lan: port 1(eth0) entering forwarding state

These lines were not captured at the moment the link fell into the
flip-flopping state, but I'm pretty sure that the constant flip-flopping is
just the result of a permanent "DMA stuck" condition that the
ag->restart_work shared worker can't recover from on my AR7240 box. So I
have a question for the respected driver authors (Gabor Juhos and Imre
Kaloz) about this stuck DMA: would you be so kind as to shed some light on
the conditions under which this "DMA stuck" state is expected to happen,
and why it is specific to the AR7240? Is it a known hardware bug that the
driver tries to recover from? Any hints on how to debug this issue further?
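
To put numbers on how often the "DMA seems to be stuck" reset fires, a log
like the one quoted above can be summarized offline. The following is a
rough sketch, not a finished tool. It assumes the syslog line layout shown
in the log (month, day, time, host, facility, "kernel:", device), and the
names LINE_RE, classify, parse_events and summarize are made up for
illustration. Lines that do not start with a timestamp, such as the
continuation of a wrapped message, are skipped.

```python
import re

# Assumed layout: "Jan  1 01:03:09 OpenWrt kern.info kernel: eth0: <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):(\d{2})\s+\S+\s+\S+\s+kernel:\s+(\S+):\s+(.*)$"
)


def classify(msg):
    """Map a kernel message to an event kind, or None if it is not of interest."""
    if "DMA seems to be stuck" in msg:
        return "dma_stuck"
    if msg.startswith("link down"):
        return "link_down"
    if msg.startswith("link up"):
        return "link_up"
    return None


def parse_events(lines, dev="eth0"):
    """Return a list of (seconds_of_day, kind) tuples for one device."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation line or unrelated text
        hh, mm, ss, name, msg = m.groups()
        kind = classify(msg)
        if name == dev and kind is not None:
            events.append((int(hh) * 3600 + int(mm) * 60 + int(ss), kind))
    return events


def summarize(events):
    """Count events and list the gaps (in seconds) between DMA-stuck resets.

    Only the time of day is used, so a log that crosses midnight would
    produce a negative gap; that case is not handled in this sketch.
    """
    stuck = [t for t, kind in events if kind == "dma_stuck"]
    return {
        "dma_stuck": len(stuck),
        "link_down": sum(1 for _, kind in events if kind == "link_down"),
        "link_up": sum(1 for _, kind in events if kind == "link_up"),
        "gaps_s": [b - a for a, b in zip(stuck, stuck[1:])],
    }
```

Feeding it the output of logread (for example
summarize(parse_events(open("log.txt")))) gives the number of resets and
the spacing between them, which may help tell an occasional stall from the
permanent flip-flopping state.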

P.S. CCing this message to the ag71xx driver authors. Sorry for any
duplicates.

-- 
Best regards,
Alexey Loukianov                          mailto:mooro...@mail.ru
System Engineer,                            Mob.:+7(926)218-1320
*nix Specialist


_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
