Hi,

I checked the code again and found that xmit_more handling can cause the TX timeout. You can reference this:

https://www.mail-archive.com/netdev@vger.kernel.org/msg123334.html

So the patch should look like this (edit mtk_eth_soc.c):
	tx_num = fe_cal_txd_req(skb);
	if (unlikely(fe_empty_txd(ring) <= tx_num)) {
+		if (skb->xmit_more)
+			fe_reg_w32(ring->tx_next_idx, FE_REG_TX_CTX_IDX0);
		netif_stop_queue(dev);
		netif_err(priv, tx_queued, dev,
			  "Tx Ring full when queue awake!\n");

But I am not sure; maybe the pause frames cause the problem.

2017-08-25 22:25 GMT+08:00 Kristian Evensen <kristian.even...@gmail.com>:
> Hi all,
>
> On Sun, Aug 20, 2017 at 12:30 AM, John Crispin <j...@phrozen.org> wrote:
>> correct, in my testing i have been ... with 200 parallel flows ... on
>> MT7623, we'll have to find out what mt7621 can achieve ... this is all
>> using hwnat ...
>> 1) tcp - at 50 byte frames i am able to pass 720 MBit which is > 1M FPS
>> 2) udp - at 128 byte frames i am able to pass ~450k FPS at ~10% packet
>> loss .. at near wirespeed
>
> I have spent the last two days looking into this. My testing was based
> on LEDE master as of yesterday morning, and my initial test setup was
> the following:
>
> Server (Intel NUC) <-> Gbit switch <-> ZBT 2926 <-> Client
>
> The switch was tested and confirmed working at gigabit speeds. I used
> iperf for my tests, with a payload of 100 B, and configured port
> forwarding of UDP port 1203 from the ZBT to the client. I then ran the
> following command on the NUC in a loop:
>
> iperf -u -c 10.1.2.63 -t 3600 -d -p 1203 -l 100B -b 1000M
>
> I left the test running overnight (around 16 hours of pushing data),
> but no error had been triggered as of this morning. Using bwm-ng, I
> saw that the NUC was able to push around 40 Mbit/s, which, based on
> earlier tests where I used the NUC as a traffic generator, seemed a
> bit low. I don't know if it is relevant, but when capturing traffic
> (on both the NUC and the client) I saw pause frames quite frequently.
>
> Since this test did not yield any result, and throughput was low, I
> looked at some of the setups where I have seen this error.
> In all setups, there is always something placed in front of the 2926
> (a router, switch, ...). I therefore modified my test setup to be as
> follows:
>
> Server (Intel NUC) <-> Gbit switch <-> ZBT 2926 #1 <-> ZBT 2926 #2 <-> Client
>
> I forwarded port 1203 on the new ZBT router and repeated the
> experiment. Using this setup, the NUC pushed about 260 Mbit/s, and I am
> reliably able to trigger the error within ~1000 seconds. The error is
> always seen on ZBT #1, and sometimes on ZBT #2. If I see the error on
> #2, it is always at a later time than on #1, so it seems that the two
> routers somehow affect each other. When looking at the RX bandwidth on
> the client (using bwm-ng), I see that it is very bursty: I receive
> data at about 32 Mbit/s, then no data for a while, then back to around
> 32 Mbit/s, and so on, until the error is triggered and the switch (TX)
> on the router(s) dies. Pause frames are also seen on both the server
> and the client in this experiment.
>
> After having found a way to reliably trigger the issue, I tested the
> patch provided by Mingyu. With this patch, the error is triggered much
> faster, usually after around 300 seconds.
>
> Mingyu, do you have any other ideas on what could be wrong or how to
> fix this?
>
> John, would it be possible to get access to your staged commit, so
> that I can repeat the test using your new code?
>
> Thanks for all the help,
> Kristian

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev