Hello Madalin,

I've been experiencing some issues with the DPAA Ethernet driver, specifically related to frame transmission. Hopefully you can point me in the right direction.

TL;DR: Attempting to transmit faster than a few frames per second causes the TX FQ CGR to enter the congested state and remain there indefinitely, even after transmission stops.

The hardware is a T2080RDB running the tip of net-next, with the standard t2080rdb device tree and the corenet64_smp_defconfig kernel config. No changes were made to any of the files. The issue occurs with 4.16.1 stable as well. In fact, the only kernel on which I've been able to achieve reliable frame transmission is the SDK 4.1 kernel.

For my tests, I'm running iperf3 both with and without the -R option (send/receive). When using a USB Ethernet adapter, there are no issues.

The TX frame queues appear to get "stuck" whenever I attempt to transmit at more than a few frames per second. Ping works fine, but anything that could cause multiple TX frames to be enqueued seems to trigger the problem.

If I run iperf3 in reverse mode (with the T2080RDB receiving), I can achieve ~940 Mbps, although this is also somewhat unreliable. If I run it with the T2080RDB transmitting, the test never completes. Sometimes it transmits for a few seconds and then stops; other times it never starts at all. This also seems to force the interface into a bad state.

The ethtool stats show that the interface has entered congestion a few times and that it is currently congested. The fact that it remains congested after transmission has stopped indicates that the FQ somehow stopped being drained. I've also noticed that whenever this issue occurs, the TX confirmation counters are always lower than the TX packet counters. Once the interface is in this state, I can see memory usage climb until it reaches roughly the CGR threshold (about 100 MB).

Any idea what could prevent the TX FQ from being drained? My first guess was flow control, but it's completely disabled.

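The counter mismatch described above can be checked with a short script. This is only a minimal sketch, not something taken from the driver or the SDK: it assumes `ethtool -S <iface>` prints one `name: value` pair per line, and that the relevant totals are named `tx packets [TOTAL]` and `tx confirm [TOTAL]`. Both counter names are assumptions and should be adjusted to whatever the interface actually reports.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` style output into a {name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # skips the "NIC statistics:" header and non-numeric lines
        stats[name.strip()] = int(value)
    return stats


def unconfirmed_tx_frames(stats,
                          tx_key="tx packets [TOTAL]",       # assumed counter name
                          confirm_key="tx confirm [TOTAL]"):  # assumed counter name
    """Return how many transmitted frames have no TX confirmation yet."""
    return stats[tx_key] - stats[confirm_key]
```

A result that stays above zero after transmission has stopped would match the symptom described above: frames that were enqueued but never confirmed.
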
I tried messing with the egress congestion threshold, workqueue assignments, etc., but nothing seemed to have any effect.

If you need any more information or want me to run any tests, please let me know.

Thanks,
--
Jacob S. Moroni
m...@jakemoroni.com