On 07-02-21, Jaap Buurman wrote:
> Are we sure disabling TSO is the actual fix though? There are a few
> reasons I am doubting that assessment:
>
> 1. Here is a user that is reporting he has always been running with
> TSO disabled, yet he does experience the bug:
> https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/389?u=mushoz
> 2. TSO seems fine with the master branch according to user reports.
> 3. The user "mrakotiq" suggested a patch to disable TSO in the bug
> report you linked to, but this bug report also disables
> NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX. The reason that was
> given was that he was seeing packets getting tagged that shouldn't
> have (at least that's what I am understanding from his post on the bug
> report). So there's obviously also something wrong with this
> functionality, and it might not surprise me if this change is the
> thing that seems to fix this issue.
I took time to reproduce this using data from mrakotiq. I could reproduce the crash; it takes a few hours. Also, disabling NETIF_F_TSO and NETIF_F_TSO6 indeed makes the crash disappear, at least with this specific test.

With the patch, performance is a bit lower when forwarding small TCP packets, but it's still fast enough to reach around 1 Gbit/s with full-size packets (80 kpps).

Overall, I wouldn't be surprised if the issue is still lurking, but the patch does seem to improve stability. I'm running a few more tests and I will add it to the next 19.07 release.

The master branch is fine, but it's using a different driver. It's an upstream driver, so it probably received more scrutiny than the one used in 19.07. It should be more reliable to run these devices with 21.02 when it's out.

> Having said that, this bug is age-old and is affecting a lot of users,
> me included. So I'd really like to get it fixed. If there are no
> regressions with this approach, the best way forward might be to simply
> adapt the patch he suggested as a workaround until we're on 21.xx with
> the DSA driver. Especially since this user is reporting no more issues
> with 75 (!) mt7621 routers in his production network, which is a
> rather large sample size. Thoughts?
>
> Jaap
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel