Hi all,

On Sun, Aug 20, 2017 at 12:30 AM, John Crispin <j...@phrozen.org> wrote:
> correct, in my testing i have been ... with 200 parallel flows ... on
> MT7623, we'll have to find out what mt7621 can achieve ... this is all using
> hwnat ...
> 1) tcp - at 50 byte frames i am able to pass 720 MBit which is > 1M FPS
> 2) udp - at 128 byte frames i am able to pass ~450k FPS at ~10% packet loss
> .. at near wirespeed

I have spent the last two days looking into this. My testing was based on
LEDE master as of yesterday morning, and my initial test setup was the
following:

    Server (Intel NUC) <-> Gbit Switch <-> ZBT 2926 <-> Client

The switch was tested and confirmed working at gigabit speeds. I used iperf
for my tests, with a payload of 100B, and configured port forwarding of UDP
port 1203 from the ZBT to the client. I then ran the following command on the
NUC in a loop:

    iperf -u -c 10.1.2.63 -t 3600 -d -p 1203 -l 100B -b 1000M

I left the test running overnight (around 16 hours of pushing data), but no
error had been triggered as of this morning. Using bwm-ng, I saw that the NUC
was able to push around 40 Mbit/s, which seemed a bit low compared to earlier
tests where I have used the NUC as a traffic generator. I don't know if it is
relevant, but when capturing traffic (on both the NUC and the client) I saw
pause packets quite frequently.

Since this test did not yield any result, and throughput was low, I looked at
some of the setups where I have seen this error. In all of them, there is
always something placed in front of the 2926 (a router, switch, ...). I
therefore modified my test setup to be as follows:

    Server (Intel NUC) <-> Gbit Switch <-> ZBT 2926 #1 <-> ZBT 2926 #2 <-> Client

I forwarded port 1203 on the new ZBT router and repeated the experiment. With
this setup, the NUC pushed about 260 Mbit/s and I am reliably able to trigger
the error within ~1000 seconds. The error is always seen on ZBT #1, and
sometimes on ZBT #2. If I see the error on #2, it is always at a later time
than on #1, so it seems that the two routers somehow affect each other.

When looking at the RX bandwidth on the client (using bwm-ng), I see that it
is very bursty. I receive data at about 32 Mbit/s, then no data for a while,
then back to around 32 Mbit/s, and so on, until the error is triggered and
the switch (TX) on the router(s) dies. Pause frames are also seen on both
server and client in this experiment.
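
For comparison with the frames-per-second figures quoted at the top, the
bandwidths above can be converted to approximate packet rates. The snippet
below is only a rough sketch. It assumes UDP over IPv4 on plain Ethernet, and
it assumes that the bwm-ng numbers count the Ethernet, IP and UDP headers
(42 bytes in total) but not the FCS, preamble or inter-frame gap. The function
name and the header constant are mine, not something taken from iperf or
bwm-ng.

```python
# Rough conversion from a measured bit rate to a packet rate.
# Assumption: UDP over IPv4 on untagged Ethernet, so each packet carries
# 14 (Ethernet) + 20 (IPv4) + 8 (UDP) = 42 bytes of headers on top of the
# iperf payload, and the measured bit rate includes those headers.
HEADER_BYTES = 14 + 20 + 8


def packets_per_second(bitrate_bps, payload_bytes, header_bytes=HEADER_BYTES):
    """Return the approximate packet rate for a given bit rate and payload size."""
    frame_bits = (payload_bytes + header_bytes) * 8
    return bitrate_bps / frame_bits


if __name__ == "__main__":
    # Bandwidth figures reported in this mail, all with a 100-byte payload.
    for label, mbit in (("single ZBT, NUC TX", 40),
                        ("two ZBTs, NUC TX", 260),
                        ("two ZBTs, client RX bursts", 32)):
        pps = packets_per_second(mbit * 1e6, 100)
        print(f"{label}: {mbit} Mbit/s is about {pps:,.0f} packets/s")
```

Under these assumptions, 40 Mbit/s corresponds to roughly 35k packets/s and
260 Mbit/s to roughly 229k packets/s. If bwm-ng counts bytes differently, the
absolute numbers shift somewhat, but the ratio between the two setups stays
the same.
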

After having found a way to reliably trigger the issue, I tested the patch
provided by Mingyu. With this patch, the error is triggered much faster,
usually after around 300 seconds.

Mingyu, do you have any other ideas on what could be wrong or how to fix
this? John, would it be possible to get access to your staged commit, so that
I can repeat the test using your new code?

Thanks for all the help,
Kristian

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev