Hi Martin/John, Can you give us an idea where to look. If I use an external modem in bridge mode and run PPPoE on the router then it works fine.
However with PPPoA, TCP and UDP misbehaves, UDP packets over 1300 seems to be dropped silently and the first tcp packet is dropped with no logs in IPTABLES. Leonid has done some hard work to debug this. Can you please help? Thanks Daniel -----Original Message----- From: openwrt-devel [mailto:openwrt-devel-boun...@lists.openwrt.org] On Behalf Of Leonid Evdokimov Sent: 07 August 2016 10:33 To: Martin Blumenstingl <martin.blumensti...@googlemail.com> Cc: ha...@hauke-m.de; lede-...@lists.infradead.org; openwrt-devel@lists.openwrt.org Subject: Re: [OpenWrt-Devel] latency on PPPoA ADSL Annex A on using Lantiq On Wed, Aug 3, 2016 at 10:02 AM, Martin Blumenstingl <martin.blumensti...@googlemail.com> wrote: > I did not have time to look into this yet, but maybe Hauke or John > have an idea what's going on. I have a testcase, maybe it'll be useful to debug the issue. brie.darkk.net.ru (139.162.172.21) is server hosted in London. 2.30.18.250 is ADSL router in UK. Non-empty TCP packets are delayed till TCP RTO timer fires starting with the third datagram. I get following RTT estimates using tcp-echo, different `--delay` between packets does not change the result: brie $ ./mturtt --host 2.30.18.250 --port 4242 --count 10 --tcp --size 1400 --delay 1 sz seq rtt there back 1400 1 0.036 0.006 0.030 1400 2 0.036 0.006 0.030 1400 3 0.261 0.006 0.255 # woops! plus TCP RTO 1400 4 0.263 0.007 0.256 1400 5 0.264 0.006 0.258 1400 6 0.265 0.007 0.258 1400 7 0.266 0.006 0.260 1400 8 0.267 0.006 0.260 1400 9 0.266 0.006 0.260 1400 10 0.265 0.006 0.259 Running busy-loop at the router with `ss -4t 'sport = 4242' -ieom` reports following timer inforation: ESTAB 0 0 2.30.18.250:4242 139.162.172.21:53328 ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 ESTAB 0 0 2.30.18.250:4242 139.162.172.21:53328 ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 ESTAB 0 0 2.30.18.250:4242 139.162.172.21:53328 ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0· ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,230ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,220ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,200ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,190ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,180ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,160ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,150ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,140ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,130ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,110ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,100ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,090ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,080ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,060ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,050ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,040ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,020ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(persist,010ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:16 cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(on,230ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(on,220ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 ESTAB 0 1400 2.30.18.250:4242 139.162.172.21:53328 timer:(on,210ms,0) ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 ESTAB 0 0 2.30.18.250:4242 139.162.172.21:53328 ino:42363 sk:80f19ae0 rto:0.23 ato:0.04 qack:15 bidir cwnd:10 reordering:0 Same behavior is observed with small payload (50 bytes). Looking at tcpdump[1] recorded at the router: 00:00:00.000000 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [S], seq 2422862909, win 29200, options [mss 1460,sackOK,TS val 1141259943 ecr 0,nop,wscale 7], length 0 00:00:00.000314 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [S.], seq 4132184401, ack 2422862910, win 28800, options [mss 1452,sackOK,TS val 3539818 ecr 1141259943,nop,wscale 3], length 0 00:00:00.022541 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [.], ack 1, win 229, options [nop,nop,TS val 1141259950 ecr 3539818], length 0 00:00:00.022999 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [P.], seq 1:51, ack 1, win 229, options [nop,nop,TS val 1141259950 ecr 3539818], length 50 00:00:00.023192 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [.], ack 51, win 3600, options [nop,nop,TS val 3539824 ecr 1141259950], length 0 00:00:00.029527 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [P.], seq 1:51, ack 51, win 3600, options [nop,nop,TS val 3539826 ecr 1141259950], length 50 #### ~6ms to echo 00:00:00.052116 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [.], ack 51, win 229, options [nop,nop,TS val 1141259959 ecr 3539826], length 0 00:00:00.553865 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [P.], seq 51:101, ack 51, win 229, options [nop,nop,TS val 1141260110 ecr 3539826], length 50 00:00:00.556725 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [P.], seq 51:101, ack 101, win 3600, options [nop,nop,TS val 3539957 ecr 1141260110], length 50 #### ~3ms 00:00:00.579130 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [.], ack 101, win 229, options [nop,nop,TS val 1141260117 ecr 3539957], length 0 00:00:01.080645 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [P.], seq 101:151, ack 101, win 229, options [nop,nop,TS val 1141260267 ecr 3539957], length 50 00:00:01.120859 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [.], ack 151, win 3600, options [nop,nop,TS val 3540099 ecr 1141260267], length 0 00:00:01.304887 IP 2.30.18.250.4242 > brie.darkk.net.ru.52566: Flags [P.], seq 101:151, ack 151, win 3600, options [nop,nop,TS val 3540145 ecr 1141260267], length 50 #### ~224ms! 00:00:01.327310 IP brie.darkk.net.ru.52566 > 2.30.18.250.4242: Flags [.], ack 151, win 229, options [nop,nop,TS val 1141260341 ecr 3540145], length 0 ... `strace` shows no evidence[2] that it's userspace delay, the echo server gets the data promptly and responds within couple of milliseconds. TCP_NODELAY is turned on, TCP_CORK guard is added to tcp server code to get clear timestamps in strace. [1] https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf#file-trace-2-pcap-txt [2] https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf#file-trace-2-strace tc qdisc is default one and does not report drops, moreover, changing qdisc to plain fifo does not help: qdisc fq_codel 0: dev pppoa-wan root refcnt 2 limit 1024p flows 1024 quantum 1496 target 5.0ms interval 100.0ms ecn Sent 10859227 bytes 18757 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0 new_flows_len 0 old_flows_len 0 I've also run testcase using the link with different RTT (thus different RTO), got same results — the third and following packets are delayed by the router for one RTO so it does not look like hardcoded Nagle's delay. It looks like TCP retransmission, but the first packet is never logged. It's not captured by `tcpdump` or corresponding `iptables -j LOG` rule. The kernel has no /proc/net/snmp, so I can't tell if the RetransSegs counter is bumped. You can get the tools I use to reproduce the issue at https://gist.github.com/darkk/6b7f3127c1c343a1ccb8e0b59a44f5bf UDP and ICMP are not affected. I've also seen some issues in IP fragmentation/reassembly of over-MTU UDP datagrams, but they were not reproducible with another ISP. Versions: MOTD: DESIGNATED DRIVER (Bleeding Edge, r49162) uname: Linux OpenWrt 4.4.6 #1 Fri May 13 12:56:05 UTC 2016 mips GNU/Linux -- WBRBW, Leonid Evdokimov xmpp:l...@darkk.net.ru && http://darkk.net.ru && tel:+79816800702 _______________________________________________ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel _______________________________________________ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel