Hello.
Old subject: "Re: IP fragmentation performance and don't fragment bug
when forwarding
I have now been tracing the kernel, and finding the bug seems difficult.
I think the bug is a combination of several things; the likely cause is
that it only occurs with the rtl8169 and the way it uses the netdev/DQL
API. From my investigations it seems that the following happens:
* A burst of frames is received from the internet (eth0).
* These are accepted and forwarded to the internal network (eth1).
* DQL hits its limit, which causes scheduling for the device to be
stopped (__QUEUE_STATE_STACK_XOFF).
* This, in combination with some timing, causes the performance
degradation: packets are then only scheduled for transmission in
softirq context.
I still do not know why the DQL hits the limit. I think it should not;
maybe something undesirable happens first with the rtl8169.
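For reference, this is the BQL/DQL pattern a driver's TX path follows (a
minimal sketch of the generic mechanism, not the actual r8169 code; the
demo_* names are placeholders):

#include <linux/netdevice.h>

static netdev_tx_t demo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

	/* ...map the skb, fill a TX descriptor, ring the doorbell... */

	/* Account the bytes handed to the hardware.  If the bytes in
	 * flight exceed the DQL limit, this sets __QUEUE_STATE_STACK_XOFF
	 * and the stack stops calling ndo_start_xmit until completions
	 * catch up. */
	netdev_tx_sent_queue(txq, skb->len);

	return NETDEV_TX_OK;
}

static void demo_tx_complete(struct net_device *dev,
			     unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);

	/* Report completed work; once enough bytes have completed this
	 * clears __QUEUE_STATE_STACK_XOFF and reschedules the queue. */
	netdev_tx_completed_queue(txq, pkts, bytes);
}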
I inserted the following trace_printk() calls into the driver:
rtl8169_start_xmit:
	trace_printk("ORP33 %s rtl8169_start_xmit len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n",
		     tp->dev->name, skb->len, entry, tp->cur_tx, frags,
		     tp->dirty_tx, (tp->cur_tx - tp->dirty_tx));
rtl_tx:
	trace_printk("ORP33 %s rtl_tx len: %d entry: %u cur_tx: %u frags: %d dirty_tx: %u tx_left: %u\n",
		     tp->dev->name, tx_skb->skb->len, tp->cur_tx % NUM_TX_DESC,
		     tp->cur_tx, 0, dirty_tx, tx_left);
Here is some grepped output, filtering only the ping packets, in two
different situations:
// Trace while downloading Ubuntu (~20 MB/s); only ~3 seconds fit in the
// trace buffer. Ping is normal the whole time.
// Original: tracing.out3754
117493: <idle>-0 [000] ..s. 5235.407116: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 57 cur_tx: 3747641 frags: 0 dirty_tx:
3747640 tx_left: 1
118704: <idle>-0 [000] ..s. 5235.416057: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 20 cur_tx: 3747668 frags: 0 dirty_tx:
3747667 tx_left: 1
346319: hping3-2871 [000] .... 5236.002456:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 8
cur_tx: 3754312 frags: 0 dirty_tx: 3754312 tx_left: 0
346337: hping3-2871 [000] ..s. 5236.002475: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 9 cur_tx: 3754313 frags: 0 dirty_tx:
3754312 tx_left: 1
1277155: hping3-2871 [001] .... 5237.002627:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 38
cur_tx: 3766502 frags: 0 dirty_tx: 3766502 tx_left: 0
1277173: <idle>-0 [000] ..s. 5237.002646: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 39 cur_tx: 3766503 frags: 0 dirty_tx:
3766502 tx_left: 1
// ICMP messages from the router to the internal network while a TCP
// stream is running. Original: tracing.out3236/eth1_ring.txt
// No idea why the first frame is tripled...
200792: hping3-2871 [001] .... 3239.593384:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 34
cur_tx: 2132770 frags: 0 dirty_tx: 2132770 tx_left: 0
201578: hping3-2871 [001] .... 3240.593625:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 40
cur_tx: 2132776 frags: 0 dirty_tx: 2132776 tx_left: 0
341061: <idle>-0 [000] ..s. 3241.738011:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 52
cur_tx: 2133684 frags: 0 dirty_tx: 2133680 tx_left: 4
342112: <idle>-0 [000] ..s. 3241.745297: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 59 cur_tx: 2133691 frags: 0 dirty_tx:
2133684 tx_left: 7
500126: <idle>-0 [000] ..s. 3242.739500:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 49
cur_tx: 2134513 frags: 0 dirty_tx: 2134505 tx_left: 8
501739: <idle>-0 [000] ..s. 3242.751654: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 58 cur_tx: 2134522 frags: 0 dirty_tx:
2134513 tx_left: 9
646220: <idle>-0 [000] ..s. 3243.745761:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 41
cur_tx: 2135273 frags: 0 dirty_tx: 2135267 tx_left: 6
647633: <idle>-0 [000] ..s. 3243.755486: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 49 cur_tx: 2135281 frags: 0 dirty_tx:
2135273 tx_left: 8
802878: <idle>-0 [000] ..s. 3244.739947:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23
cur_tx: 2136087 frags: 0 dirty_tx: 2136081 tx_left: 6
804298: <idle>-0 [000] ..s. 3244.749677: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 31 cur_tx: 2136095 frags: 0 dirty_tx:
2136087 tx_left: 8
961190: <idle>-0 [000] ..s. 3245.746217:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 16
cur_tx: 2136912 frags: 0 dirty_tx: 2136906 tx_left: 6
962610: <idle>-0 [000] ..s. 3245.755946: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2136920 frags: 0 dirty_tx:
2136912 tx_left: 8
1118044: <idle>-0 [000] ..s. 3246.740336:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 62
cur_tx: 2137726 frags: 0 dirty_tx: 2137720 tx_left: 6
1119443: <idle>-0 [000] ..s. 3246.750047: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 5 cur_tx: 2137733 frags: 0 dirty_tx:
2137726 tx_left: 7
1264047: <idle>-0 [000] ..s. 3247.709202:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 29
cur_tx: 2138525 frags: 0 dirty_tx: 2138518 tx_left: 7
1264740: <idle>-0 [000] ..s. 3247.718918: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 33 cur_tx: 2138529 frags: 0 dirty_tx:
2138525 tx_left: 4
1419958: <idle>-0 [000] ..s. 3248.736436:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 48
cur_tx: 2139312 frags: 0 dirty_tx: 2139305 tx_left: 7
1421551: <idle>-0 [000] ..s. 3248.746189: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 56 cur_tx: 2139320 frags: 0 dirty_tx:
2139312 tx_left: 8
1578746: <idle>-0 [000] ..s. 3249.742702:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 46
cur_tx: 2140142 frags: 0 dirty_tx: 2140135 tx_left: 7
1580194: <idle>-0 [000] ..s. 3249.752457: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 54 cur_tx: 2140150 frags: 0 dirty_tx:
2140142 tx_left: 8
1729597: <idle>-0 [000] ..s. 3250.746839:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 36
cur_tx: 2140964 frags: 0 dirty_tx: 2140957 tx_left: 7
1731015: <idle>-0 [000] ..s. 3250.756594: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 43 cur_tx: 2140971 frags: 0 dirty_tx:
2140964 tx_left: 7
1881028: <idle>-0 [000] ..s. 3251.740991:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 19
cur_tx: 2141779 frags: 0 dirty_tx: 2141772 tx_left: 7
1881381: <idle>-0 [000] ..s. 3251.750718: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 26 cur_tx: 2141786 frags: 0 dirty_tx:
2141779 tx_left: 7
// Something happens and the output queue empties; now it is fast, and
// rtl8169_start_xmit is no longer called from softirq context.
1896178: hping3-2871 [001] .... 3252.595994:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 23
cur_tx: 2141911 frags: 0 dirty_tx: 2141911 tx_left: 0
1896196: <idle>-0 [000] ..s. 3252.596055: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 24 cur_tx: 2141912 frags: 0 dirty_tx:
2141911 tx_left: 1
1923136: hping3-2871 [001] .... 3253.596227:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 50
cur_tx: 2142066 frags: 0 dirty_tx: 2142066 tx_left: 0
1923154: <idle>-0 [000] ..s. 3253.596289: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 51 cur_tx: 2142067 frags: 0 dirty_tx:
2142066 tx_left: 1
1951633: hping3-2871 [001] .... 3254.596459:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 5
cur_tx: 2142213 frags: 0 dirty_tx: 2142213 tx_left: 0
1951651: <idle>-0 [000] ..s. 3254.596521: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 6 cur_tx: 2142214 frags: 0 dirty_tx:
2142213 tx_left: 1
1954472: hping3-2871 [001] .... 3255.596700:
rtl8169_start_xmit: ORP33 eth1 rtl8169_start_xmit len: 54 entry: 15
cur_tx: 2142223 frags: 0 dirty_tx: 2142223 tx_left: 0
1954490: <idle>-0 [000] ..s. 3255.596763: rtl8169_poll:
ORP33 eth1 rtl_tx len: 54 entry: 16 cur_tx: 2142224 frags: 0 dirty_tx:
2142223 tx_left: 1
// ...until soon it is slow again.
BR.
Risto
On 7.12.2018 16:46, Risto Pajula wrote:
Hello.
I have been able to track the poor forwarding latency down to the TCP
window scale option. The Netgem device uses a rather large window scale
(x256), and I have been able to reproduce the router's poor forwarding
latency with a Linux box in the internal network by changing
net.ipv4.tcp_rmem to a large value, which makes it advertise a larger
TCP window scale as well. I still have no clue why this causes
forwarding in the Linux kernel to block. Maybe something in the
connection tracking...?
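For reference, a minimal sketch of raising the receive buffer limits on
the internal host (equivalent to sysctl -w net.ipv4.tcp_rmem="4096 87380
33554432"; the 32 MB maximum is just an example value, not necessarily
the one used here):

#include <stdio.h>

int main(void)
{
	/* min, default and max TCP receive buffer, in bytes */
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "w");

	if (!f) {
		perror("tcp_rmem");
		return 1;
	}
	fprintf(f, "4096 87380 33554432\n");
	fclose(f);
	return 0;
}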
With ICMP timestamp messages I have also been able to pinpoint that the
latency occurs on the eth1 sending side (the following hping3 example is
run on the router toward the internal network):
xxx:/usr/src/linux-4.20-rc2 # hping3 192.168.0.112 --icmp --icmp-ts -V
using eth1, addr: 192.168.0.1, MTU: 1500
HPING 192.168.0.112 (eth1 192.168.0.112): icmp mode set, 28 headers + 0
data bytes
len=46 ip=192.168.0.112 ttl=64 id=49464 tos=0 iplen=40
icmp_seq=0 rtt=7.9 ms
ICMP timestamp: Originate=52294891 Receive=52294895 Transmit=52294895
ICMP timestamp RTT tsrtt=7
len=46 ip=192.168.0.112 ttl=64 id=49795 tos=0 iplen=40
icmp_seq=1 rtt=235.9 ms
ICMP timestamp: Originate=52295891 Receive=52296128 Transmit=52296128
ICMP timestamp RTT tsrtt=235
len=46 ip=192.168.0.112 ttl=64 id=49941 tos=0 iplen=40
icmp_seq=2 rtt=3.8 ms
ICMP timestamp: Originate=52296891 Receive=52296895 Transmit=52296895
ICMP timestamp RTT tsrtt=3
len=46 ip=192.168.0.112 ttl=64 id=50685 tos=0 iplen=40
icmp_seq=3 rtt=47.8 ms
ICMP timestamp: Originate=52297891 Receive=52297940 Transmit=52297940
ICMP timestamp RTT tsrtt=47
len=46 ip=192.168.0.112 ttl=64 id=51266 tos=0 iplen=40
icmp_seq=4 rtt=7.7 ms
ICMP timestamp: Originate=52298891 Receive=52298895 Transmit=52298895
ICMP timestamp RTT tsrtt=7
len=46 ip=192.168.0.112 ttl=64 id=52245 tos=0 iplen=40
icmp_seq=5 rtt=3.7 ms
ICMP timestamp: Originate=52299891 Receive=52299895 Transmit=52299895
ICMP timestamp RTT tsrtt=3
^C
--- 192.168.0.112 hping statistic ---
6 packets tramitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 3.7/51.1/235.9 ms
BR.
Risto
On 2.12.2018 23:32, Risto Pajula wrote:
Hello.
You can most likely ignore the "DF bit, MTU bug when forwarding" case.
There aren't actually big IP packets on the wire; instead there are
bursts of packets on the wire, which are combined by GRO, and thus
dropping them should not happen. Sorry about the invalid bug report.
However, the poor latency from the internal network to the internet
still remains, with GRO both enabled and disabled. I will try to study
it further...
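For reference, GRO can be toggled with "ethtool -K eth0 gro off|on"; the
same can be done programmatically via the legacy ethtool ioctl (a
hypothetical sketch, assuming eth0 is the receiving interface):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
	/* data = 0 disables GRO, data = 1 re-enables it */
	struct ethtool_value ev = { .cmd = ETHTOOL_SGRO, .data = 0 };
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
	ifr.ifr_data = (char *)&ev;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
		perror("SIOCETHTOOL");
	close(fd);
	return 0;
}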
BR.
Risto
On 2.12.2018 14:01, Risto Pajula wrote:
Hello.
I have encountered a weird performance problem in Linux IP fragmentation
when using video streaming services behind NAT. I have also looked into
a possible bug in the DF bit (don't fragment) handling when forwarding
IP packets.
First, a description of the system setup:
[host1]-int lan-(eth1)[linux router](eth0)-ext lan-[fibre router]-internet
where:
host1: a Netgem N7800 "cable box" for the online video streaming
services provided by the local telco (can access Netflix, HBO Nordic,
"live TV", etc.)
linux router: a Linux PC with a dual-core Intel Celeron G1840, currently
running Linux kernel 4.20.0-rc2 on openSUSE Leap 15.0
eth1: the Linux router's internal (NAT) interface, 192.168.0.1/24
network, MTU set to 1500, RTL8169sb/8110sb
eth0: the Linux router's internet-facing interface, public IP address,
MTU set to 1500, RTL8168evl/8111evl
fibre router: an Alcatel Lucent fibre router (I-241G-Q), directly
connected to eth0 of the Linux router.
Now, when the Netgem N7800 is used with the online video services
(Netflix, HBO Nordic, etc.), the Linux router receives very BIG IP
packets on eth0, up to ~20 kB, and this seems to lead to the following
problems in the Linux IP stack.
IP fragmentation performance:
When the Linux router receives these large IP packets on eth0,
everything works, but they seem to cause a very large latency
degradation from the internal network to the internet whenever IP
fragmentation is performed. The ping latency from the internal network
to the internet increases from a stable 15-20 ms up to 700-800 ms, and
so does the ping from the internal network to the Linux router's eth1
(192.168.0.1). However, the uplink works perfectly: the ping from the
Linux router to the internet remains stable while streaming the online
services. It seems that the IP fragmentation is somehow blocking eth1
reception or transmission for a very long time (which it shouldn't). I'm
able to test and debug the issue further, but advice on where to look
would be appreciated.
DF bit, MTU bug when forwarding:
I have started to study the above-mentioned problem and have found a
possible bug in the DF bit and MTU handling in IP forwarding. The BIG
packets received from the streaming services all have the DF bit set,
and the question is whether we should be forwarding them at all, as that
would result in them being fragmented. Apparently we currently are... I
have traced this down to ip_exceeds_mtu() in net/ipv4/ip_forward.c, and
the following patch seems to fix that:
--- net/ipv4/ip_forward.c.orig	2018-12-02 11:09:32.764320780 +0200
+++ net/ipv4/ip_forward.c	2018-12-02 12:53:25.031232347 +0200
@@ -49,7 +49,7 @@ static bool ip_exceeds_mtu(const struct
 		return false;
 
 	/* original fragment exceeds mtu and DF is set */
-	if (unlikely(IPCB(skb)->frag_max_size > mtu))
+	if (unlikely(skb->len > mtu))
 		return true;
 
 	if (skb->ignore_df)
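For context, this is roughly what the whole helper looks like in the
4.20-rc tree with the change applied (reconstructed from memory, so the
surrounding lines may differ slightly). Note that with the change the
skb_is_gso() exception at the bottom is no longer reachable for
oversized DF packets, so GSO/GRO aggregates whose individual segments
would fit the MTU are rejected as well:

static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
{
	if (skb->len <= mtu)
		return false;

	if (unlikely((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0))
		return false;

	/* original fragment exceeds mtu and DF is set */
	if (unlikely(skb->len > mtu))	/* was: IPCB(skb)->frag_max_size > mtu */
		return true;

	if (skb->ignore_df)
		return false;

	if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
		return false;

	return true;
}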
This seems to work, in some ways: after the change, IP packets that are
too large for the internal network get dropped and we send "ICMP
Destination unreachable, the datagram is too big" messages back to the
originator (as we should? the snippet after this paragraph shows where
that ICMP is generated). However, it seems that not all services really
like this... Netflix behaves as expected and the ping from the internal
network to the internet is stable, but for example HBO Nordic does not
work anymore (too little buffering? retransmissions not working?). So it
seems the original issue should also be fixed (and the fragmentation
should be allowed?).
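For reference, the ICMP comes from the caller in ip_forward(); roughly,
from memory of the same tree:

	if (ip_exceeds_mtu(skb, mtu)) {
		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(mtu));
		goto drop;
	}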
Any advice would be appreciated. Thanks!
PS. Watching TV was not this intensive 20 years ago :)