On Tue Jun 30, 2020 at 8:27 AM CEST, Andy Duan wrote:
> From: Tobias Waldekranz <tob...@waldekranz.com>
> Sent: Tuesday, June 30, 2020 12:29 AM
>
> > On Sun Jun 28, 2020 at 8:23 AM CEST, Andy Duan wrote:
> > > I have never seen a bandwidth test cause a netdev watchdog trip.
> > > Can you describe the reproduction steps in the commit message, so
> > > that we can reproduce it locally? Thanks.
> >
> > My setup uses an i.MX8M Nano EVK connected to an ethernet switch, but I
> > can get the same results with a direct connection to a PC.
> >
> > On the iMX, configure two VLANs on top of the FEC and enable IPv4
> > forwarding.
> >
> > On the PC, configure two VLANs and put them in different namespaces.
> > From one namespace, use trafgen to generate a flow that the iMX will
> > route from the first VLAN to the second and then back towards the
> > second namespace on the PC.
> >
> > Something like:
> >
> > {
> > 	eth(sa=PC_MAC, da=IMX_MAC),
> > 	ipv4(saddr=10.0.2.2, daddr=10.0.3.2, ttl=2),
> > 	udp(sp=1, dp=2),
> > 	"Hello world"
> > }
> >
> > Wait a couple of seconds and then you'll see the output from fec_dump.
> >
> > In the same setup I also see a weird issue when running a TCP flow
> > using iperf3. Most of the time (~70%) when I start the iperf3 client
> > I'll see ~450Mbps of throughput. In the other case (~30%) I'll see
> > ~790Mbps. The system is "stably bi-modal", i.e. whichever rate is
> > reached in the beginning is then sustained for as long as the session
> > is kept alive.
> >
> > I've inserted some tracepoints in the driver to try to understand
> > what's going on:
> >
> > https://svgshare.com/i/MVp.svg
> >
> > What I can't figure out is why the Tx buffers seem to be collected at
> > a much slower rate in the slow case (top in the picture). If we fall
> > behind in one NAPI poll, we should catch up at the next call (which we
> > can see in the fast case). But in the slow case we keep falling further
> > and further behind until we freeze the queue. Is this something you've
> > ever observed? Any ideas?
>
> Our test cases never reproduced the issue before: the CPU has more
> bandwidth available than the ethernet uDMA, so there is always a chance
> to complete the current NAPI poll. In the next poll, work_tx picks up
> the update, so we never hit the issue.
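In case it helps you reproduce it, this is roughly what my setup boils
down to in commands. The interface names, VLAN IDs and the iMX-side
addresses are just examples (only 10.0.2.2/10.0.3.2 are fixed by the
trafgen config above), so adjust them to your boards:

    # iMX: two VLANs on top of the FEC (eth0 here) plus IPv4 forwarding
    ip link add link eth0 name eth0.2 type vlan id 2
    ip link add link eth0 name eth0.3 type vlan id 3
    ip addr add 10.0.2.1/24 dev eth0.2
    ip addr add 10.0.3.1/24 dev eth0.3
    ip link set eth0.2 up
    ip link set eth0.3 up
    sysctl -w net.ipv4.ip_forward=1

    # PC: the same two VLANs (enp1s0 as an example), each in its own netns
    ip netns add ns2
    ip netns add ns3
    ip link add link enp1s0 name enp1s0.2 type vlan id 2
    ip link add link enp1s0 name enp1s0.3 type vlan id 3
    ip link set enp1s0.2 netns ns2
    ip link set enp1s0.3 netns ns3
    ip -n ns2 addr add 10.0.2.2/24 dev enp1s0.2
    ip -n ns3 addr add 10.0.3.2/24 dev enp1s0.3
    ip -n ns2 link set enp1s0.2 up
    ip -n ns3 link set enp1s0.3 up
    ip -n ns2 route add 10.0.3.0/24 via 10.0.2.1
    ip -n ns3 route add 10.0.2.0/24 via 10.0.3.1

    # generate the flow from the first namespace; flow.cfg holds the
    # packet definition above, and the iMX routes it from VLAN 2 to VLAN 3
    ip netns exec ns2 trafgen --dev enp1s0.2 --conf flow.cfg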
It appears it has nothing to do with routing back out through the same
interface. I get the same bi-modal behavior if I just run the iperf3
server on the iMX and then have it be the transmitting part, i.e. on the
PC I run "iperf3 -c $IMX_IP -R". It would be very interesting to see what
numbers you see in this scenario.
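For completeness, both ends of that test ($IMX_IP being whatever address
the iMX has towards the PC):

    # on the iMX
    iperf3 -s

    # on the PC; -R reverses the test direction, so the iMX is the
    # transmitter and the PC the receiver
    iperf3 -c $IMX_IP -R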