--- Begin Message ---

> On Oct 26, 2021, at 8:45 PM, Bob McMahon <bob.mcma...@broadcom.com> wrote:
> 
> This is linux. The code flow is burst writes until the burst size, take a 
> timestamp, call select(), take second timestamp and insert time delta into 
> histogram, await clock_nanosleep() to schedule the next burst. (actually, the 
> deltas, inserts into the histogram and user i/o are done in another thread, 
> i.e. iperf 2's reporter thread.)
> I still must be missing something.  Does anything else need to be set to 
> reduce the skb size? Everything seems to be indicating 4K writes even when 
> gso_max_size is 2000 (I assume these are units of bytes?) There are ten 
> writes, ten reads and ten  RTTs for the bursts.  I don't see partial writes 
> at the app level. 

One thing to keep in mind is that once the congestion-window increased to > 
40KB (your burst-size), all of the writes will not be blocking at all. 
TCP_NOTSENT_LOWAT is really just about the "notsent" part. Once the 
congestion-window is big enough to send 40KB in a burst, it will just all be 
immediately sent out.

> [root@localhost iperf2-code]# ip link set dev eth1 gso_max_size 2000
> [root@localhost iperf2-code]# ip -d link sh dev eth1
> 9: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state 
> UNKNOWN mode DEFAULT group default qlen 1000
>     link/ether 00:90:4c:40:04:59 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 
> 68 maxmtu 1500 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 
> 2000 gso_max_segs 65535
> [root@localhost iperf2-code]# uname -r
> 5.0.9-301.fc30.x86_64
> 
> It looks like RTT is being driven by WiFi TXOPs as doubling the write size 
> increases the aggregation by two but has no significant effect on the RTTs.
> 
> 4K writes: tot_mpdus 328 tot_ampdus 209 mpduperampdu 2
> 
> 8k writes:  tot_mpdus 317 tot_ampdus 107 mpduperampdu 3
> 
> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1%eth1 --trip-times -i 1 
> -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> WARN: option of --burst-size without --burst-period defaults --burst-period 
> to 1 second
> ------------------------------------------------------------
> Client connecting to 192.168.1.1, TCP port 5001 with pid 5145 via eth1 (1 
> flows)
> Write buffer size: 4096 Byte
> Bursting: 40.0 KByte every 1.00 seconds
> TCP window size: 85.0 KByte (default)
> Event based writes (pending queue watermark at 4 bytes)
> Enabled select histograms bin-width=0.100 ms, bins=10000
> ------------------------------------------------------------
> [  1] local 192.168.1.4%eth1 port 45680 connected with 192.168.1.1 port 5001 
> (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.30 ms) on 2021-10-26 
> 20:25:29 (PDT)
> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     
> Cwnd/RTT        NetPwr
> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       
> 14K/10091 us  4
> [  1] 0.00-1.00 sec S8-PDF: 
> bin(w=100us):cnt(10)=1:1,36:1,40:1,44:1,46:1,48:1,49:1,50:2,52:1 
> (5.00/95.00/99.7%=1/52/52,Outliers=0,obl/obu=0/0) (5.121 ms/1635305129.152339)

Am I reading this correctly, that your writes take worst-case 5 milli-seconds ?

This looks correct then, because you seem to have an RTT of around 5ms.


It's surprising though that your congestion-window is not increasing.


Christoph


> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4990 us  8
> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,45:1,49:5,50:1 
> (5.00/95.00/99.7%=1/50/50,Outliers=0,obl/obu=0/0) (4.991 ms/1635305130.153330)
> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4904 us  8
> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,29:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.455 ms/1635305131.147353)
> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4964 us  8
> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:4,50:2,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.460 ms/1635305132.146338)
> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4970 us  8
> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.404 ms/1635305133.146335)
> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/4986 us  8
> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,48:1,49:1,50:4,59:1,64:1 
> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.395 ms/1635305134.146343)
> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5059 us  8
> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:2,60:1,85:1 
> (5.00/95.00/99.7%=1/85/85,Outliers=0,obl/obu=0/0) (8.417 ms/1635305135.148343)
> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5407 us  8
> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:4,50:1,59:1,75:1 
> (5.00/95.00/99.7%=1/75/75,Outliers=0,obl/obu=0/0) (7.428 ms/1635305136.147343)
> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5188 us  8
> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,40:1,49:3,50:3,64:1 
> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.388 ms/1635305137.146284)
> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> 14K/5306 us  8
> [  1] 9.00-10.00 sec S8-PDF: 
> bin(w=100us):cnt(10)=1:2,39:1,49:2,50:2,51:1,60:1,65:1 
> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.422 ms/1635305138.146316)
> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       
> 14K/5939 us  7
> [  1] 0.00-10.01 sec S8(f)-PDF: 
> bin(w=100us):cnt(100)=1:19,29:1,36:1,39:3,40:3,44:1,45:1,46:1,48:2,49:33,50:18,51:1,52:1,59:5,60:2,64:2,65:3,75:2,85:1
>  (5.00/95.00/99.7%=1/65/85,Outliers=0,obl/obu=0/0) (8.417 
> ms/1635305135.148343)
> 
> [root@localhost iperf2-code]# src/iperf -s -i 1 -e -B 192.168.1.1%eth1
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 6287
> Binding to local address 192.168.1.1 and iface eth1
> Read buffer size:  128 KByte (Dist bin width=16.0 KByte)
> TCP window size:  128 KByte (default)
> ------------------------------------------------------------
> [  1] local 192.168.1.1%eth1 port 5001 connected with 192.168.1.4 port 45680 
> (MSS=1448) (burst-period=1.0000s) (trip-times) (sock=4) (peer 2.1.4-master) 
> on 2021-10-26 20:25:29 (PDT)
> [ ID] Burst (start-end)  Transfer     Bandwidth       XferTime  (DC%)     
> Reads=Dist          NetPwr
> [  1] 0.0001-0.0500 sec  40.1 KBytes  6.59 Mbits/sec  49.848 ms (5%)    
> 12=12:0:0:0:0:0:0:0  0
> [  1] 1.0002-1.0461 sec  40.0 KBytes  7.14 Mbits/sec  45.913 ms (4.6%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 2.0002-2.0491 sec  40.0 KBytes  6.70 Mbits/sec  48.876 ms (4.9%)    
> 11=11:0:0:0:0:0:0:0  0
> [  1] 3.0002-3.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.886 ms (5%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 4.0002-4.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.887 ms (5%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 5.0002-5.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.881 ms (5%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 6.0002-6.0511 sec  40.0 KBytes  6.44 Mbits/sec  50.895 ms (5.1%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 7.0002-7.0501 sec  40.0 KBytes  6.57 Mbits/sec  49.889 ms (5%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 8.0002-8.0481 sec  40.0 KBytes  6.84 Mbits/sec  47.901 ms (4.8%)    
> 11=11:0:0:0:0:0:0:0  0
> [  1] 9.0002-9.0491 sec  40.0 KBytes  6.70 Mbits/sec  48.872 ms (4.9%)    
> 10=10:0:0:0:0:0:0:0  0
> [  1] 0.0000-10.0031 sec   400 KBytes   328 Kbits/sec               
> 104=104:0:0:0:0:0:0:0
> 
> Bob
> 
> On Tue, Oct 26, 2021 at 6:12 PM Eric Dumazet <eric.duma...@gmail.com> wrote:
> 
> 
> On 10/26/21 4:38 PM, Christoph Paasch wrote:
> > Hi Bob,
> > 
> >> On Oct 26, 2021, at 4:23 PM, Bob McMahon <bob.mcma...@broadcom.com 
> >> <mailto:bob.mcma...@broadcom.com>> wrote:
> >> I'm confused. I don't see any blocking nor partial writes per the write() 
> >> at the app level with TCP_NOTSENT_LOWAT set at 4 bytes. The burst is 40K, 
> >> the write size is 4K and the watermark is 4 bytes. There are ten writes 
> >> per burst.
> > 
> > You are on Linux here, right?
> > 
> > AFAICS, Linux will still accept whatever fits in an skb. And that is likely 
> > more than 4K (with GSO on by default).
> 
> This (max payload per skb) can be tuned at the driver level, at least for 
> experimental purposes or dedicated devices.
> 
> ip link set dev eth0 gso_max_size 8000
> 
> To fetch current values :
> 
> ip -d link sh dev eth0
> 
> 
> > 
> > However, do you go back to select() after each write() or do you loop over 
> > the write() calls?
> > 
> > 
> > Christoph
> > 
> >> The S8 histograms are the times waiting on the select().  The first value 
> >> is the bin number (multiplied by 100usec bin width) and second the bin 
> >> count. The worst case time is at the end and is timestamped per unix epoch.
> >>
> >> The second run is over a controlled WiFi link where a 99.7% point of 4-8ms 
> >> for a WiFi TX op arbitration win is in the ballpark. The first is 1G wired 
> >> and is in the 600 usec range. (No media arbitration there.)
> >>
> >>  [root@localhost iperf2-code]# src/iperf -c 10.19.87.9 --trip-times -i 1 
> >> -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults 
> >> --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 10.19.87.9, TCP port 5001 with pid 2124 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [  1] local 10.19.87.10%eth0 port 33166 connected with 10.19.87.9 port 
> >> 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=0.54 ms) on 
> >> 2021-10-26 16:07:33 (PDT)
> >> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     
> >> Cwnd/RTT        NetPwr
> >> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       
> >> 14K/5368 us  8
> >> [  1] 0.00-1.00 sec S8-PDF: bin(w=100us):cnt(10)=1:1,2:5,3:2,4:1,11:1 
> >> (5.00/95.00/99.7%=1/11/11,Outliers=0,obl/obu=0/0) (1.089 
> >> ms/1635289653.928360)
> >> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/569 us  72
> >> [  1] 1.00-2.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:1,3:4,4:1,7:1,8:1 
> >> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.736 
> >> ms/1635289654.928088)
> >> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/312 us  131
> >> [  1] 2.00-3.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:2,5:2,6:1 
> >> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.548 
> >> ms/1635289655.927776)
> >> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/302 us  136
> >> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,2:2,3:5,6:1 
> >> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.584 
> >> ms/1635289656.927814)
> >> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/316 us  130
> >> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:2,4:2,5:2,6:1 
> >> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.572 
> >> ms/1635289657.927810)
> >> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/253 us  162
> >> [  1] 5.00-6.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:2,3:4,5:1 
> >> (5.00/95.00/99.7%=1/5/5,Outliers=0,obl/obu=0/0) (0.417 
> >> ms/1635289658.927630)
> >> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/290 us  141
> >> [  1] 6.00-7.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:3,4:3,6:1 
> >> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.573 
> >> ms/1635289659.927771)
> >> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/359 us  114
> >> [  1] 7.00-8.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,3:4,4:3,6:1 
> >> (5.00/95.00/99.7%=1/6/6,Outliers=0,obl/obu=0/0) (0.570 
> >> ms/1635289660.927753)
> >> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/349 us  117
> >> [  1] 8.00-9.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,3:5,4:1,7:1 
> >> (5.00/95.00/99.7%=1/7/7,Outliers=0,obl/obu=0/0) (0.608 
> >> ms/1635289661.927843)
> >> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/347 us  118
> >> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:3,2:1,3:5,8:1 
> >> (5.00/95.00/99.7%=1/8/8,Outliers=0,obl/obu=0/0) (0.725 
> >> ms/1635289662.927861)
> >> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       
> >> 14K/1519 us  27
> >> [  1] 0.00-10.01 sec S8(f)-PDF: 
> >> bin(w=100us):cnt(100)=1:25,2:13,3:36,4:11,5:5,6:5,7:2,8:2,11:1 
> >> (5.00/95.00/99.7%=1/7/11,Outliers=0,obl/obu=0/0) (1.089 
> >> ms/1635289653.928360)
> >>
> >> [root@localhost iperf2-code]# src/iperf -c 192.168.1.1 --trip-times -i 1 
> >> -e --tcp-write-prefetch 4 -l 4K --burst-size=40K --histograms
> >> WARN: option of --burst-size without --burst-period defaults 
> >> --burst-period to 1 second
> >> ------------------------------------------------------------
> >> Client connecting to 192.168.1.1, TCP port 5001 with pid 2131 (1 flows)
> >> Write buffer size: 4096 Byte
> >> Bursting: 40.0 KByte every 1.00 seconds
> >> TCP window size: 85.0 KByte (default)
> >> Event based writes (pending queue watermark at 4 bytes)
> >> Enabled select histograms bin-width=0.100 ms, bins=10000
> >> ------------------------------------------------------------
> >> [  1] local 192.168.1.4%eth1 port 45518 connected with 192.168.1.1 port 
> >> 5001 (MSS=1448) (prefetch=4) (trip-times) (sock=3) (ct=5.48 ms) on 
> >> 2021-10-26 16:07:56 (PDT)
> >> [ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     
> >> Cwnd/RTT        NetPwr
> >> [  1] 0.00-1.00 sec  40.1 KBytes   329 Kbits/sec  11/0          0       
> >> 14K/10339 us  4
> >> [  1] 0.00-1.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:1,40:1,47:1,49:2,50:3,51:1,60:1 
> >> (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.990 
> >> ms/1635289676.802143)
> >> [  1] 1.00-2.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/4853 us  8
> >> [  1] 1.00-2.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,38:1,39:1,44:1,45:1,49:1,51:1,52:1,60:1 
> >> (5.00/95.00/99.7%=1/60/60,Outliers=0,obl/obu=0/0) (5.937 
> >> ms/1635289677.802274)
> >> [  1] 2.00-3.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/4991 us  8
> >> [  1] 2.00-3.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,48:1,49:2,50:2,51:1,60:1,64:1 
> >> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.307 
> >> ms/1635289678.794326)
> >> [  1] 3.00-4.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/4610 us  9
> >> [  1] 3.00-4.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:3,50:3,56:1,64:1 
> >> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.362 
> >> ms/1635289679.794335)
> >> [  1] 4.00-5.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5028 us  8
> >> [  1] 4.00-5.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,49:6,59:1,64:1 
> >> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.367 
> >> ms/1635289680.794399)
> >> [  1] 5.00-6.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5113 us  8
> >> [  1] 5.00-6.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,49:3,50:2,58:1,60:1,65:1 
> >> (5.00/95.00/99.7%=1/65/65,Outliers=0,obl/obu=0/0) (6.442 
> >> ms/1635289681.794392)
> >> [  1] 6.00-7.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5054 us  8
> >> [  1] 6.00-7.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,39:1,49:3,51:1,60:2,64:1 
> >> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.374 
> >> ms/1635289682.794335)
> >> [  1] 7.00-8.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5138 us  8
> >> [  1] 7.00-8.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,39:2,40:1,49:2,50:1,60:1,64:1 
> >> (5.00/95.00/99.7%=1/64/64,Outliers=0,obl/obu=0/0) (6.396 
> >> ms/1635289683.794338)
> >> [  1] 8.00-9.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5329 us  8
> >> [  1] 8.00-9.00 sec S8-PDF: 
> >> bin(w=100us):cnt(10)=1:2,38:1,45:2,49:1,50:3,63:1 
> >> (5.00/95.00/99.7%=1/63/63,Outliers=0,obl/obu=0/0) (6.292 
> >> ms/1635289684.794262)
> >> [  1] 9.00-10.00 sec  40.0 KBytes   328 Kbits/sec  10/0          0       
> >> 14K/5329 us  8
> >> [  1] 9.00-10.00 sec S8-PDF: bin(w=100us):cnt(10)=1:2,39:1,49:3,50:3,84:1 
> >> (5.00/95.00/99.7%=1/84/84,Outliers=0,obl/obu=0/0) (8.306 
> >> ms/1635289685.796315)
> >> [  1] 0.00-10.01 sec   400 KBytes   327 Kbits/sec  102/0          0       
> >> 14K/6331 us  6
> >> [  1] 0.00-10.01 sec S8(f)-PDF: 
> >> bin(w=100us):cnt(100)=1:19,38:2,39:5,40:2,44:1,45:3,47:1,48:1,49:26,50:17,51:4,52:1,56:1,58:1,59:1,60:7,63:1,64:5,65:1,84:1
> >>  (5.00/95.00/99.7%=1/64/84,Outliers=0,obl/obu=0/0) (8.306 
> >> ms/1635289685.796315)
> >>
> >> Bob
> >>
> >> On Tue, Oct 26, 2021 at 11:45 AM Christoph Paasch <cpaa...@apple.com 
> >> <mailto:cpaa...@apple.com>> wrote:
> >>
> >>     Hello,
> >>
> >>     > On Oct 25, 2021, at 9:24 PM, Eric Dumazet <eric.duma...@gmail.com 
> >> <mailto:eric.duma...@gmail.com>> wrote:
> >>     >
> >>     >
> >>     >
> >>     > On 10/25/21 8:11 PM, Stuart Cheshire via Bloat wrote:
> >>     >> On 21 Oct 2021, at 17:51, Bob McMahon via Make-wifi-fast 
> >> <make-wifi-f...@lists.bufferbloat.net 
> >> <mailto:make-wifi-f...@lists.bufferbloat.net>> wrote:
> >>     >>
> >>     >>> Hi All,
> >>     >>>
> >>     >>> Sorry for the spam. I'm trying to support a meaningful TCP message 
> >> latency w/iperf 2 from the sender side w/o requiring e2e clock 
> >> synchronization. I thought I'd try to use the TCP_NOTSENT_LOWAT event to 
> >> help with this. It seems that this event goes off when the bytes are in 
> >> flight vs have reached the destination network stack. If that's the case, 
> >> then iperf 2 client (sender) may be able to produce the message latency by 
> >> adding the drain time (write start to TCP_NOTSENT_LOWAT) and the sampled 
> >> RTT.
> >>     >>>
> >>     >>> Does this seem reasonable?
> >>     >>
> >>     >> I’m not 100% sure what you’re asking, but I will try to help.
> >>     >>
> >>     >> When you set TCP_NOTSENT_LOWAT, the TCP implementation won’t report 
> >> your endpoint as writable (e.g., via kqueue or epoll) until less than that 
> >> threshold of data remains unsent. It won’t stop you writing more bytes if 
> >> you want to, up to the socket send buffer size, but it won’t *ask* you for 
> >> more data until the TCP_NOTSENT_LOWAT threshold is reached.
> >>     >
> >>     >
> >>     > When I implemented TCP_NOTSENT_LOWAT back in 2013 [1], I made sure 
> >> that sendmsg() would actually
> >>     > stop feeding more bytes in TCP transmit queue if the current amount 
> >> of unsent bytes
> >>     > was above the threshold.
> >>     >
> >>     > So it looks like Apple implementation is different, based on your 
> >> description ?
> >>
> >>     Yes, TCP_NOTSENT_LOWAT only impacts the wakeup on iOS/macOS/...
> >>
> >>     An app can still fill the send-buffer if it does a sendmsg() with a 
> >> large buffer or does repeated calls to sendmsg().
> >>
> >>     Fur Apple, the goal of TCP_NOTSENT_LOWAT was to allow an app to 
> >> quickly change the data it "scheduled" to send. And thus allow the app to 
> >> write the smallest "logical unit" it has. If that unit is 512KB large, the 
> >> app is allowed to send that.
> >>     For example, in case of video-streaming one may want to skip ahead in 
> >> the video. In that case the app still needs to transmit the remaining 
> >> parts of the previous frame anyways, before it can send the new video 
> >> frame.
> >>     That's the reason why the Apple implementation allows one to write 
> >> more than just the lowat threshold.
> >>
> >>
> >>     That being said, I do think that Linux's way allows for an easier API 
> >> because the app does not need to be careful at how much data it sends 
> >> after an epoll/kqueue wakeup. So, the latency-benefits will be easier to 
> >> get.
> >>
> >>
> >>     Christoph
> >>
> >>
> >>
> >>     > [1] 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36
> >>  
> >> <https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c9bee3b7fdecb0c1d070c7b54113b3bdfb9a3d36>
> >>     >
> >>     > netperf does not use epoll(), but rather a loop over sendmsg().
> >>     >
> >>     > One of the point of TCP_NOTSENT_LOWAT for Google was to be able to 
> >> considerably increase
> >>     > max number of bytes in transmit queues (3rd column of 
> >> /proc/sys/net/ipv4/tcp_wmem)
> >>     > by 10x, allowing for autotune to increase BDP for big RTT flows, 
> >> this without
> >>     > increasing memory needs for flows with small RTT.
> >>     >
> >>     > In other words, the TCP implementation attempts to keep BDP bytes in 
> >> flight + TCP_NOTSENT_LOWAT bytes buffered and ready to go. The BDP of 
> >> bytes in flight is necessary to fill the network pipe and get good 
> >> throughput. The TCP_NOTSENT_LOWAT of bytes buffered and ready to go is 
> >> provided to give the source software some advance notice that the TCP 
> >> implementation will soon be looking for more bytes to send, so that the 
> >> buffer doesn’t run dry, thereby lowering throughput. (The old SO_SNDBUF 
> >> option conflates both “bytes in flight” and “bytes buffered and ready to 
> >> go” into the same number.)
> >>     >>
> >>     >> If you wait for the TCP_NOTSENT_LOWAT notification, write a chunk 
> >> of n bytes of data, and then wait for the next TCP_NOTSENT_LOWAT 
> >> notification, that will tell you roughly how long it took n bytes to 
> >> depart the machine. You won’t know why, though. The bytes could depart the 
> >> machine in response for acks indicating that the same number of bytes have 
> >> been accepted at the receiver. But the bytes can also depart the machine 
> >> because CWND is growing. Of course, both of those things are usually 
> >> happening at the same time.
> >>     >>
> >>     >> How to use TCP_NOTSENT_LOWAT is explained in this video:
> >>     >>
> >>     >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199 
> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2199>>
> >>     >>
> >>     >> Later in the same video is a two-minute demo (time offset 42:00 to 
> >> time offset 44:00) showing a “before and after” demo illustrating the 
> >> dramatic difference this makes for screen sharing responsiveness.
> >>     >>
> >>     >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520 
> >> <https://developer.apple.com/videos/play/wwdc2015/719/?time=2520>>
> >>     >>
> >>     >> Stuart Cheshire
> >>     >> _______________________________________________
> >>     >> Bloat mailing list
> >>     >> bl...@lists.bufferbloat.net <mailto:bl...@lists.bufferbloat.net>
> >>     >> https://lists.bufferbloat.net/listinfo/bloat 
> >> <https://lists.bufferbloat.net/listinfo/bloat>
> >>     >>
> >>     > _______________________________________________
> >>     > Bloat mailing list
> >>     > bl...@lists.bufferbloat.net <mailto:bl...@lists.bufferbloat.net>
> >>     > https://lists.bufferbloat.net/listinfo/bloat 
> >> <https://lists.bufferbloat.net/listinfo/bloat>
> >>
> >>
> >> This electronic communication and the information and any files 
> >> transmitted with it, or attached to it, are confidential and are intended 
> >> solely for the use of the individual or entity to whom it is addressed and 
> >> may contain information that is confidential, legally privileged, 
> >> protected by privacy laws, or otherwise restricted from disclosure to 
> >> anyone else. If you are not the intended recipient or the person 
> >> responsible for delivering the e-mail to the intended recipient, you are 
> >> hereby notified that any use, copying, distributing, dissemination, 
> >> forwarding, printing, or copying of this e-mail is strictly prohibited. If 
> >> you received this e-mail in error, please return the e-mail to the sender, 
> >> delete it from your computer, and destroy any printed copy of it.
> 
> This electronic communication and the information and any files transmitted 
> with it, or attached to it, are confidential and are intended solely for the 
> use of the individual or entity to whom it is addressed and may contain 
> information that is confidential, legally privileged, protected by privacy 
> laws, or otherwise restricted from disclosure to anyone else. If you are not 
> the intended recipient or the person responsible for delivering the e-mail to 
> the intended recipient, you are hereby notified that any use, copying, 
> distributing, dissemination, forwarding, printing, or copying of this e-mail 
> is strictly prohibited. If you received this e-mail in error, please return 
> the e-mail to the sender, delete it from your computer, and destroy any 
> printed copy of it.


--- End Message ---
_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel

Reply via email to