On 10/26/21 8:45 PM, Bob McMahon wrote:
> This is Linux. The code flow is: burst writes until the burst size, take a timestamp, call select(), take a second timestamp and insert the time delta into a histogram, then await clock_nanosleep() to schedule the next burst. (Actually, the deltas, the inserts into the histogram and the user I/O are done in another thread, i.e. iperf 2's reporter thread.)
>
> I still must be missing something. Does anything else need to be set to reduce the skb size? Everything seems to indicate 4K writes even when gso_max_size is 2000 (I assume these are units of bytes?). There are ten writes, ten reads and ten RTTs for the bursts. I don't see partial writes at the app level.
>
> [root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000
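The write/select/clock_nanosleep cycle Bob describes can be sketched in C roughly as follows. This is a hypothetical illustration, not iperf 2's code: the function names are made up, the 5 s select() timeout is an added safety assumption, and because the S8 histograms in this thread count ten select() waits per 40K burst of 4K writes, the sketch assumes one timed select() after every write() rather than one per burst. Histogram insertion (done in iperf 2's reporter thread) is reduced to storing the raw waits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* value from Linux <linux/tcp.h> */
#endif

/* Elapsed time from a to b in microseconds. */
static int64_t timespec_diff_us(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000 +
           (b->tv_nsec - a->tv_nsec) / 1000;
}

/* Ask TCP to report the socket writable only when fewer than `lowat`
 * bytes remain unsent. */
static int set_notsent_lowat(int fd, int lowat)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat));
}

/* Hypothetical burst: send `burst` bytes in `chunk`-byte write() calls and,
 * after each write, time how long select() blocks until the socket is
 * writable again. Each wait (us) is stored in waits_us[]. Returns the number
 * of samples, or -1 on error or on the assumed 5 s safety timeout. */
static int run_burst(int fd, size_t burst, size_t chunk,
                     int64_t *waits_us, int max_samples)
{
    static char buf[65536]; /* payload contents do not matter here */
    assert(chunk > 0 && chunk <= sizeof(buf));
    int samples = 0;
    size_t sent = 0;
    while (sent < burst && samples < max_samples) {
        size_t len = burst - sent < chunk ? burst - sent : chunk;
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        sent += (size_t)n;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tmo = { .tv_sec = 5, .tv_usec = 0 };
        if (select(fd + 1, NULL, &wfds, NULL, &tmo) <= 0)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        waits_us[samples++] = timespec_diff_us(&t0, &t1);
    }
    return samples;
}

/* Advance an absolute deadline by one burst period and sleep until it. */
static void wait_next_period(struct timespec *deadline, long period_ns)
{
    deadline->tv_nsec += period_ns;
    while (deadline->tv_nsec >= 1000000000L) {
        deadline->tv_nsec -= 1000000000L;
        deadline->tv_sec += 1;
    }
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL) == EINTR)
        ;
}
```

A caller would use `set_notsent_lowat(fd, 4)` once after connect() (matching the "pending queue watermark at 4 bytes" shown in the iperf output), then alternate `run_burst(fd, 40 * 1024, 4096, waits, 10)` with `wait_next_period(&deadline, 1000000000L)`. Sleeping to an absolute TIMER_ABSTIME deadline keeps the one-second burst period from drifting by the time each burst takes.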

You could check with tcpdump on eth1 that outgoing packets are no longer TSO/GSO packets but single-MSS ones. (Note: the device gso_max_size is only taken into account for flows established after the change.)

>
> [root@localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 2000 gso_max_segs 65535
> [root@localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
>
> It looks like RTT is being driven by WiFi TXOPs, as doubling the write size increases the aggregation by two but has no significant effect on the RTTs.
>
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
>
> 8k writes: tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
>
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [ 1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 20:25:29 (PDT)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10091 us 4
> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)
> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4990 us 8
> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4904 us 8
> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4964 us 8
> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4970 us 8
> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4986 us 8
> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5059 us 8
> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5407 us 8
> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343)
> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5188 us 8
> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284)
> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5306 us 8
> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 ms/1635305138.146316)
> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/5939 us 7
> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1 (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
>
> [root@localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 6287
> Binding to local address 192.168.1.1 and iface eth1
> Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
> TCP window size: 128 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) on 2021-10-26 20:25:29 (PDT)
> [ ID] Burst (start-end) Transfer Bandwidth XferTime (DC%) Reads=Dist NetPwr
> [ 1] 0.0001-0.0500 sec 40.1 KBytes 6.59 Mbits/sec 49.848 ms (5%) 12=12:0:0:0:0:0:0:0 0
> [ 1] 1.0002-1.0461 sec 40.0 KBytes 7.14 Mbits/sec 45.913 ms (4.6%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 2.0002-2.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.876 ms (4.9%) 11=11:0:0:0:0:0:0:0 0
> [ 1] 3.0002-3.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.886 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 4.0002-4.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.887 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 5.0002-5.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.881 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 6.0002-6.0511 sec 40.0 KBytes 6.44 Mbits/sec 50.895 ms (5.1%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 7.0002-7.0501 sec 40.0 KBytes 6.57 Mbits/sec 49.889 ms (5%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 8.0002-8.0481 sec 40.0 KBytes 6.84 Mbits/sec 47.901 ms (4.8%) 11=11:0:0:0:0:0:0:0 0
> [ 1] 9.0002-9.0491 sec 40.0 KBytes 6.70 Mbits/sec 48.872 ms (4.9%) 10=10:0:0:0:0:0:0:0 0
> [ 1] 0.0000-10.0031 sec 400 KBytes 328 Kbits/sec 104=104:0:0:0:0:0:0:0
>
> Bob
>
> On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet <eric.duma...@gmail.com> wrote:
>
> On 10/26/21 4:38 PM, Christoph Paasch wrote:
> > Hi Bob,
> >
> >> On Oct 26, 2021, at 4:23 PM, Bob McMahon <bob.mcma...@broadcom.com> wrote:
> >> I'm confused. I don't see any blocking or partial writes per write() at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, the write size is 4K and the watermark is 4 bytes. There are ten writes per burst.
> >
> > You are on Linux here, right?
> >
> > AFAICS, Linux will still accept whatever fits in an skb. And that is likely more than 4K (with GSO on by default).
>
> This (max payload per skb) can be tuned at the driver level, at least for experimental purposes or dedicated devices.
>
> ip link set dev eth0 gso_max_size 8000
>
> To fetch current values:
>
> ip -d link sh dev eth0
>
> > However, do you go back to select() after each write() or do you loop over the write() calls?
> >
> > Christoph
> >
> >> The S8 histograms are the times waiting on the select(). The first value is the bin number (multiplied by the 100 usec bin width) and the second is the bin count. The worst-case time is at the end and is timestamped per Unix epoch.
> >>
> >> The second run is over a controlled WiFi link where a 99.7% point of 4-8 ms for a WiFi TXOP arbitration win is in the ballpark. The first is 1G wired and is in the 600 usec range. (No media arbitration there.)
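To make the S8-PDF lines easier to read, here is a small C sketch that parses the `bin:count` list after `cnt(N)=` and recomputes the 5/95/99.7% bins. The names `struct bin_count`, `parse_bins` and `percentile_bin` are hypothetical, and the percentile rule (the first bin whose cumulative count strictly exceeds p% of the samples) is an assumption, not taken from the iperf 2 source. It does reproduce the printed bins for the summary lines in this thread, e.g. 1/7/11 for the 1G wired run and 1/65/85 for the later WiFi run on eth1. The worst-case times printed alongside (1.089 ms in bin 11, 5.121 ms in bin 52) are likewise consistent with bin k covering roughly (k-1)×100 us to k×100 us.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BINS 64

/* One "bin:count" pair from an S8-PDF line, e.g. "49:4". */
struct bin_count {
    int bin;    /* bin number; the bin width is 100 us in these runs */
    long count; /* number of select() waits that landed in this bin */
};

/* Parse a list such as "1:2,49:4,50:2" (the text after "cnt(N)=").
 * Returns the number of pairs parsed, or -1 if the input is malformed
 * or holds more than `max` pairs. */
static int parse_bins(const char *s, struct bin_count *out, int max)
{
    int n = 0;
    while (*s != '\0') {
        char *end;
        long bin = strtol(s, &end, 10);
        if (end == s || *end != ':' || n >= max)
            return -1;
        s = end + 1;
        long count = strtol(s, &end, 10);
        if (end == s)
            return -1;
        out[n].bin = (int)bin;
        out[n].count = count;
        n++;
        s = end;
        if (*s == ',')
            s++;
        else if (*s != '\0')
            return -1;
    }
    return n;
}

/* Assumed rule: the first bin (bins in ascending order, as printed) whose
 * cumulative count strictly exceeds per_mille/1000 of all samples, so
 * per_mille = 950 means 95%. Integer math keeps "95% of 100" exactly 95. */
static int percentile_bin(const struct bin_count *b, int n, int per_mille)
{
    assert(per_mille >= 0 && per_mille <= 1000);
    long total = 0;
    for (int i = 0; i < n; i++)
        total += b[i].count;
    long cum = 0;
    for (int i = 0; i < n; i++) {
        cum += b[i].count;
        if (cum * 1000 > (long)per_mille * total)
            return b[i].bin;
    }
    return n > 0 ? b[n - 1].bin : -1;
}
```

Note that a strict "exceeds" comparison matters: with a non-strict one, the wired run's 95% bin would come out as 6 instead of the printed 7, because exactly 95 of its 100 samples fall in bins 1 to 6.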
> >>
> >> [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [ 1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 2021-10-26 16:07:33 (PDT)
> >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/5368 us 8
> >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/569 us 72
> >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 ms/1635289654.928088)
> >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/312 us 131
> >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 ms/1635289655.927776)
> >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/302 us 136
> >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 ms/1635289656.927814)
> >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/316 us 130
> >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 ms/1635289657.927810)
> >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/253 us 162
> >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 ms/1635289658.927630)
> >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/290 us 141
> >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 ms/1635289659.927771)
> >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/359 us 114
> >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 ms/1635289660.927753)
> >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/349 us 117
> >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 ms/1635289661.927843)
> >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/347 us 118
> >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 ms/1635289662.927861)
> >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/1519 us 27
> >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 ms/1635289653.928360)
> >>
> >> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [ 1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 2021-10-26 16:07:56 (PDT)
> >> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> >> [ 1] 0.00-1.00 sec 40.1 KBytes 329 Kbits/sec 11/0 0 14K/10339 us 4
> >> [ 1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 ms/1635289676.802143)
> >> [ 1] 1.00-2.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4853 us 8
> >> [ 1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 ms/1635289677.802274)
> >> [ 1] 2.00-3.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4991 us 8
> >> [ 1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 ms/1635289678.794326)
> >> [ 1] 3.00-4.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/4610 us 9
> >> [ 1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 ms/1635289679.794335)
> >> [ 1] 4.00-5.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5028 us 8
> >> [ 1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 ms/1635289680.794399)
> >> [ 1] 5.00-6.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5113 us 8
> >> [ 1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 ms/1635289681.794392)
> >> [ 1] 6.00-7.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5054 us 8
> >> [ 1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 ms/1635289682.794335)
> >> [ 1] 7.00-8.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5138 us 8
> >> [ 1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 ms/1635289683.794338)
> >> [ 1] 8.00-9.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8
> >> [ 1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 ms/1635289684.794262)
> >> [ 1] 9.00-10.00 sec 40.0 KBytes 328 Kbits/sec 10/0 0 14K/5329 us 8
> >> [ 1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
> >> [ 1] 0.00-10.01 sec 400 KBytes 327 Kbits/sec 102/0 0 14K/6331 us 6
> >> [ 1] 0.00-10.01 sec S8(f)-PDF: bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1 (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 ms/1635289685.796315)
> >>
> >> Bob
> >>
> >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch <cpaa...@apple.com> wrote:
> >>
> >> Hello,
> >>
> >> > On Oct 25, 2021, at 9:24 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> >> >
> >> > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> >> >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast <make-wifi-f...@lists.bufferbloat.net> wrote:
> >> >>
> >> >>> Hi All,
> >> >>>
> >> >>> Sorry for the spam.
> >> >>> I'm trying to support a meaningful TCP message latency with iperf 2 from the sender side without requiring end-to-end clock synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to help with this. It seems that this event goes off when the bytes are in flight vs. when they have reached the destination network stack. If that's the case, then the iperf 2 client (sender) may be able to produce the message latency by adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled RTT.
> >> >>>
> >> >>> Does this seem reasonable?
> >> >>
> >> >> I’m not 100% sure what you’re asking, but I will try to help.
> >> >>
> >> >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report your endpoint as writable (e.g., via kqueue or epoll) until less than that threshold of data remains unsent. It won’t stop you writing more bytes if you want to, up to the socket send buffer size, but it won’t *ask* you for more data until the TCP_NOTSENT_LOWAT threshold is reached.
> >> >
> >> > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure that sendmsg() would actually stop feeding more bytes into the TCP transmit queue if the current amount of unsent bytes was above the threshold.
> >> >
> >> > So it looks like the Apple implementation is different, based on your description?
> >>
> >> Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...
> >>
> >> An app can still fill the send buffer if it does a sendmsg() with a large buffer or does repeated calls to sendmsg().
> >>
> >> For Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to quickly change the data it "scheduled" to send, and thus to allow the app to write the smallest "logical unit" it has. If that unit is 512KB large, the app is allowed to send that.
> >> For example, in the case of video streaming, one may want to skip ahead in the video.
> >> In that case the app still needs to transmit the remaining parts of the previous frame anyway, before it can send the new video frame.
> >> That's the reason why the Apple implementation allows one to write more than just the lowat threshold.
> >>
> >> That being said, I do think that Linux's way allows for an easier API, because the app does not need to be careful about how much data it sends after an epoll/kqueue wakeup. So the latency benefits will be easier to get.
> >>
> >> Christoph
> >>
> >> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> >> >
> >> > netperf does not use epoll(), but rather a loop over sendmsg().
> >> >
> >> > One of the points of TCP_NOTSENT_LOWAT for Google was to be able to considerably increase the max number of bytes in transmit queues (3rd column of /proc/sys/net/ipv4/tcp_wmem) by 10x, allowing autotuning to increase BDP for big-RTT flows, without increasing memory needs for flows with small RTT.
> >> >
> >> > In other words, the TCP implementation attempts to keep BDP bytes in flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of bytes in flight is necessary to fill the network pipe and get good throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is provided to give the source software some advance notice that the TCP implementation will soon be looking for more bytes to send, so that the buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF option conflates both “bytes in flight” and “bytes buffered and ready to go” into the same number.)
> >> >>
> >> >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT notification, that will tell you roughly how long it took n bytes to depart the machine. You won’t know why, though. The bytes could depart the machine in response to acks indicating that the same number of bytes have been accepted at the receiver. But the bytes can also depart the machine because CWND is growing. Of course, both of those things are usually happening at the same time.
> >> >>
> >> >> How to use TCP_NOTSENT_LOWAT is explained in this video:
> >> >>
> >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199>
> >> >>
> >> >> Later in the same video is a two-minute demo (time offset 42:00 to time offset 44:00) showing a “before and after” comparison illustrating the dramatic difference this makes for screen sharing responsiveness.
> >> >>
> >> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520>
> >> >>
> >> >> Stuart Cheshire
> >> >> _______________________________________________
> >> >> Bloat mailing list
> >> >> bl...@lists.bufferbloat.net
> >> >> https://lists.bufferbloat.net/listinfo/bloat
> >>
> >> This electronic communication and the information and any files transmitted with it, or attached to it, are confidential and are intended solely for the use of the individual or entity to whom it is addressed and may contain information that is confidential, legally privileged, protected by privacy laws, or otherwise restricted from disclosure to anyone else. If you are not the intended recipient or the person responsible for delivering the e-mail to the intended recipient, you are hereby notified that any use, copying, distributing, dissemination, forwarding, printing, or copying of this e-mail is strictly prohibited. If you received this e-mail in error, please return the e-mail to the sender, delete it from your computer, and destroy any printed copy of it.
_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel