On 2020-09-21 18:26, Chuanhong Guo wrote:
> Hi!
>
> On Mon, Sep 21, 2020 at 9:24 PM Felix Fietkau <n...@nbd.name> wrote:
>> > +
>> > +	split_addr((u32) reg, &r1, &r2, &page);
>> > +
>> > ++	local_irq_save(flags);
>> > +	mutex_lock(&bus->mdio_lock);
>> Taking a mutex in an irq-disabled section is basically asking for a
>> deadlock. What router is this issue reproduced on?
>
> Any ar934x/qca953x router using built-in switch, based on
> Flyspray bug reports. I can reproduce it on ar9341/tp-link-wr841n-v8
> As there's an ancient commit about the same issue I guess
> it appears on ar724x/ar933x as well.
> here's the commit message:
> commit 5d77f370d695c9a70f25ffb8367db64efadaaedd
> Author: Gabor Juhos <juh...@openwrt.org>
> Date:   Sun May 8 16:32:53 2011 +0000
>
>     ar71xx: ag71xx: make switch register access atomic
>
>     Reading of the PHY registers occasionally returns with bogus values
>     under heavy load. This misleads the PHY driver and thus causes false
>     link/speed change notifications which leads to performance loss.
> [and some dmesg after this]
>
>> What else is running at the time this happens?
>
> I'm using a minimal wr841n-v8 image. (make menuconfig,
> select the device, save and build the firmware)
> no extra background tasks are running.
> swconfig led trigger is used so the kernel is constantly
> polling port status from switch in the background.

I have another idea what you could try. Many SoCs enable both MDIO
busses, and maybe they have some shared resources/state in the hw.
You could try adding a global spinlock in ag71xx_mdio.c and taking it
with spin_lock_bh(&mdio_lock) in ag71xx_mdio_mii_read and
ag71xx_mdio_mii_write.

- Felix
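
To illustrate the single-global-lock idea, here is a rough userspace model, not driver code. It assumes the two MDIO buses really do share some hardware state, which is only a guess at this point. Every model_* name is hypothetical, and a pthread mutex stands in for the spinlock; in the driver this would be a file-scope DEFINE_SPINLOCK(mdio_lock) taken with spin_lock_bh() and released with spin_unlock_bh() around the register access.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MODEL_NUM_REGS 32

/* One lock shared by every bus instance (stand-in for a global spinlock). */
static pthread_mutex_t mdio_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical hardware state shared by both buses. */
static uint16_t model_regs[MODEL_NUM_REGS];
static int model_in_flight;  /* nonzero while a transaction is running */
static int model_collisions; /* overlapping transactions observed */

static void model_begin(void)
{
	if (model_in_flight)
		model_collisions++;
	model_in_flight = 1;
}

static void model_end(void)
{
	model_in_flight = 0;
}

/* Userspace stand-in for ag71xx_mdio_mii_read(). */
static uint16_t model_mdio_read(int bus_id, int reg)
{
	uint16_t val;

	(void)bus_id; /* both buses end up in the same shared state */
	assert(reg >= 0);
	pthread_mutex_lock(&mdio_lock);
	model_begin();
	val = model_regs[reg % MODEL_NUM_REGS];
	model_end();
	pthread_mutex_unlock(&mdio_lock);
	return val;
}

/* Userspace stand-in for ag71xx_mdio_mii_write(). */
static void model_mdio_write(int bus_id, int reg, uint16_t val)
{
	(void)bus_id;
	assert(reg >= 0);
	pthread_mutex_lock(&mdio_lock);
	model_begin();
	model_regs[reg % MODEL_NUM_REGS] = val;
	model_end();
	pthread_mutex_unlock(&mdio_lock);
}

/* Each thread plays the role of one bus being polled in a loop. */
static void *model_worker(void *arg)
{
	int bus_id = *(int *)arg;
	int i;

	for (i = 0; i < 100000; i++) {
		model_mdio_write(bus_id, i, (uint16_t)i);
		(void)model_mdio_read(bus_id, i);
	}
	return NULL;
}

/* Runs one thread per bus and returns the number of overlaps seen. */
static int model_run(void)
{
	pthread_t threads[2];
	int ids[2] = { 0, 1 };
	int i;

	for (i = 0; i < 2; i++)
		pthread_create(&threads[i], NULL, model_worker, &ids[i]);
	for (i = 0; i < 2; i++)
		pthread_join(threads[i], NULL);
	return model_collisions;
}
```

Because the lock is global rather than per-bus, a transaction on one bus can never overlap with one on the other, which is the property the experiment is meant to check on real hardware.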