On 2017-05-05 10:09, Stefan Agner wrote:
> On 2017-05-04 19:03, Andy Duan wrote:
>> On 2017-05-05 05:36, Stefan Agner wrote:
>>> On 2017-05-03 20:08, Andy Duan wrote:
>>>> From: Stefan Agner <ste...@agner.ch> Sent: Thursday, May 04, 2017 9:22 AM
>>>>> To: Andy Duan <fugang.d...@nxp.com>
>>>>> Cc: fugang.d...@freescale.com; feste...@gmail.com;
>>>>> netdev@vger.kernel.org; netdev-ow...@vger.kernel.org
>>>>> Subject: Re: FEC on i.MX 7 transmit queue timeout
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> On 2017-04-20 19:48, Andy Duan wrote:
>>>>>> On 2017-04-20 07:15, Stefan Agner wrote:
>>>>>>> I tested again with imx6sx-fec compatible string. I could reproduce
>>>>>>> it on a Colibri with i.MX 7Dual. But not always: It really depends
>>>>>>> whether queue 2 is counting up or not. Just after boot, I check
>>>>>>> /proc/interrupts twice, if queue 2 is counting it will happen!
>>>>>>>
>>>>>>> But if only queue 0 is mostly in use, then it seems to work just fine.
>>>>>> If your case is only running best effort traffic like tcp/udp, you can
>>>>>> set "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in the board dts
>>>>>> file. The other two queues are for AVB audio/video; they have higher
>>>>>> priority than queue 0. If running an iperf tcp test on the three
>>>>>> queues, the tcp segments may go out of order, which causes the net
>>>>>> watchdog timeout.
>>>>>>> I also tried i.MX 7Dual SabreSD here, and the same thing. I had to
>>>>>>> reboot 3 times, then queue 2 was counting:
>>>>>>> 57: 8 GIC-0 150 Level 30be0000.ethernet
>>>>>>> 58: 20137 GIC-0 151 Level 30be0000.ethernet
>>>>>>> 59: 9269 GIC-0 152 Level 30be0000.ethernet
>>>>>>>
>>>>>>> It took me about 40 minutes on Sabre until it happened, and I had to
>>>>>>> force it using iperf, but then I got the ring dumps:
>>>>>> My board has run for more than 47 hours with nfs rootfs in 4.11.0-rc6,
>>>>>> but not running iperf.
>>>>>> I am testing with iperf.
>>>>> Any update on this issue?
>>>>>
>>>>> When using iperf (server) on the board with Linux 4.11 the issue appears
>>>>> within a few iperf iterations on a Sabre (TO 1.2, Board Rev C, if that
>>>>> matters)...
>>>>>
>>>> I don't know whether you received my last mail. (It may have failed,
>>>> since I received some rejection mails.)
>>> I think I did not... The last email I received was Fri, 21 Apr 2017
>>> 02:48:23 UTC.
>>>
>>>
>>>> If your case is only running best effort traffic like tcp/udp, you can
>>>> set "fsl,num-tx-queues" and "fsl,num-rx-queues" to 1 in the board dts
>>>> file.
>>> I did test that, and it seems to work fine with those properties set to
>>> 1.
>> So it fixes your problem over a long test run?
> Yes, seems to work fine after more than 2 hours.
>
>>>> The other two queues are for AVB audio/video; they have higher
>>>> priority than queue 0. If running an iperf tcp test on the three
>>>> queues, the tcp segments may go out of order, which causes the net
>>>> watchdog timeout.
>>> Okay. A single event would be understandable, but it seems to enter some
>>> kind of loop after that (continuously printing "fec 30be0000.ethernet
>>> eth0: TX ring dump ...").
>>>
>>> In a quick test I commented out the fec_dump call, with that it seems to
>>> print only once and continues working afterwards (although, speed starts
>>> to decrease, so something is not good at that point).
>> Is that test based on the above change? Does one queue still bring the
>> watchdog timeout?
> No, sorry for the confusion: This was without the fix above. So use
> multiple queues, and disable fec_dump... I was just wondering, because
> disabling the multiple queues seems to me somewhat a workaround for
> now... :-)
>
No, it is not a workaround. As I said, queue1 and queue2 are for the AVB
paths and have higher priority in transmission. That is what causes the
trouble in your case. I will submit a patch to fix it, so that best effort
traffic goes to queue0 and AVB streaming goes to queue1 and queue2.
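To illustrate the mapping described here (best effort on queue0, AVB on queue1 and queue2), below is a minimal sketch in plain C. It is not the fsl kernel tree patch and not FEC driver code. The function name select_tx_queue, the enum names, and the rule that AVB streams are recognised by VLAN priority 3 or 2 are all assumptions made for illustration; the thread does not say how the streams are identified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical queue numbering that mirrors the thread: queue 0 carries
 * best effort traffic, queues 1 and 2 carry AVB streams. */
enum {
    QUEUE_BEST_EFFORT = 0,
    QUEUE_AVB_1 = 1,
    QUEUE_AVB_2 = 2,
};

/* Assumption: AVB streams are VLAN-tagged with 802.1Q priority (PCP)
 * 3 or 2. The thread does not state how AVB traffic is identified. */
#define PCP_AVB_1 3u
#define PCP_AVB_2 2u

/* Pick a transmit queue for one frame (hypothetical helper).
 * vlan_tagged:   non-zero if the frame carries an 802.1Q tag
 * vlan_tci:      the 16-bit tag control field, only used when tagged
 * num_tx_queues: number of configured transmit queues, at least 1 */
static unsigned int select_tx_queue(int vlan_tagged, uint16_t vlan_tci,
                                    unsigned int num_tx_queues)
{
    unsigned int pcp;

    assert(num_tx_queues >= 1);

    /* Untagged traffic (plain tcp/udp) always uses queue 0, and so does
     * everything when fewer than three queues are configured. */
    if (!vlan_tagged || num_tx_queues < 3)
        return QUEUE_BEST_EFFORT;

    /* PCP is the top three bits of the tag control field. */
    pcp = (vlan_tci >> 13) & 0x7u;

    if (pcp == PCP_AVB_1)
        return QUEUE_AVB_1;
    if (pcp == PCP_AVB_2)
        return QUEUE_AVB_2;
    return QUEUE_BEST_EFFORT;
}
```

With num_tx_queues set to 1 the sketch always returns queue 0, which matches the behaviour of the single-queue dts setting discussed above.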
>
>>>> In the fsl kernel tree there is a patch that selects only queue0 for
>>>> best effort traffic like tcp/udp. Please test again on your board; if
>>>> there is no problem I will upstream the patch.
>>> That sounds like a reasonable fix.
>>>
>>> IP, no matter whether TCP/UDP, is the most common use case, so IMHO this
>>> should "just work" by default.
>>>
>>> --
>>> Stefan