> On Sep 12, 2019, at 4:57 PM, Bryan Call <bc...@apache.org> wrote:
>
> I would double check your buffer settings just in case:
> $ sysctl -a | grep tcp | grep mem
> net.ipv4.tcp_rmem = 4096 87380 6291456
> net.ipv4.tcp_wmem = 4096 16384 4194304
>
> $ traffic_ctl config match buffer
> proxy.config.net.sock_send_buffer_size_in: 2097152
> proxy.config.net.sock_recv_buffer_size_in: 0
> proxy.config.net.sock_send_buffer_size_out: 0
> proxy.config.net.sock_recv_buffer_size_out: 2097152
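For context, a non-zero value in those sock_*_buffer_size settings gets pinned
on each socket with a plain setsockopt() at connection setup, and explicitly
setting SO_SNDBUF / SO_RCVBUF also turns off the kernel's autotuning for that
socket. Roughly this (a minimal sketch with a hypothetical helper, not the
actual ATS code):

    #include <sys/socket.h>

    /* Sketch: pin the send buffer to a fixed size (e.g. 2097152 bytes).
     * Linux doubles the value internally for bookkeeping overhead. */
    int set_fixed_sndbuf(int fd, int bytes) {
        return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
    }
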
The other thing to watch out for is that if you increase the ATS buffers like
the above, every connection will always use that much memory. We had a
situation where that caused ATS to consume all of the kernel's allowed memory
for sockets, and then things went sour really quickly (that max memory is a
percentage of all available memory, controlled by another sysctl). You really
do not want that to happen :).
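To put rough numbers on it: with 2 MB pinned in each direction, 10,000
concurrent connections can account for something on the order of 40 GB of
socket memory in the worst case, while net.ipv4.tcp_mem (counted in pages, and
derived from total RAM by default) caps TCP memory for the whole box at far
less than that on most machines. Once you cross the pressure threshold, every
connection suffers, not just the busy ones.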
We’ve since removed the ATS settings (or rather, set them to 0), allowing the
two sysctls above to take effect and letting the kernel autotune the buffer
sizes. However, we increased those sysctls as well; I believe upstream has also
changed the defaults, but something like this might be acceptable:
32768 131072 8388608
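(Those three numbers are the per-socket minimum, initial, and maximum in bytes
that the autotuning works within, i.e. something along the lines of
net.ipv4.tcp_wmem = 32768 131072 8388608 in sysctl.conf, and the same shape
for tcp_rmem.)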
And yes, as Bryan points out, for some use cases that might make things
slightly slower (due to BDP), but it's a tradeoff. We reclaimed significant
amounts of memory by letting the autotuning try its best (with the modified
min / initial / max settings).
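Back-of-the-envelope: a single flow needs about bandwidth x RTT of window to
stay full, so an 8 MB max like the above tops out around 8 MB / 0.1 s = 80 MB/s
(~640 Mbit/s) for one connection over a 100 ms RTT path; whether that matters
depends on your RTTs and per-connection throughput targets.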
Cheers,
— leif
>
> You can take a look at the buffer and window sizes in the kernel, but it is
> kinda hard to match that up to what ATS is doing. You might be able to take
> the ATS logs, if you have all the milestone information, and correlate them
> with a connection or with the strace logs.
> $ ss -tnei
>
> Another possibility is to have ATS get the tcpinfo information every time it
> does a write, to see if there has been any delay in the socket connection.
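>
> Roughly, that per-write check would just be a getsockopt() on the socket,
> which returns the same counters ss shows. A sketch of the idea (hypothetical
> helper, not something ATS does today):
>
>     #include <netinet/in.h>
>     #include <netinet/tcp.h>
>     #include <sys/socket.h>
>
>     /* Sketch: grab the kernel's view of the connection after a write. */
>     void log_tcp_state(int fd) {
>         struct tcp_info ti;
>         socklen_t len = sizeof(ti);
>         if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0) {
>             /* e.g. ti.tcpi_rtt (usec), ti.tcpi_snd_cwnd, ti.tcpi_retrans,
>                ti.tcpi_unacked -- enough to spot a stalled connection. */
>         }
>     }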
>
> -Bryan
>
>
>> On Sep 12, 2019, at 11:30 AM, Chou, Peter <pbc...@labs.att.com> wrote:
>>
>> Bryan,
>>
>> Thanks for the response. Good reminder about the transmission buffer
>> limiting the TCP transmission window, which needs to be sized for the
>> bandwidth-delay product. I am an L2/L3 guy, so not a TCP expert :-).
>> However, I don't know whether the default (I believe 1 MB in our RHEL
>> release) causes any problems in our situation.
>>
>> We were more focused on whether a smaller TCP transmission buffer size
>> required more frequent servicing by ATS, and on whether ATS had problems
>> keeping the buffer from emptying while data still needed to be sent. We did
>> some straces of ATS behavior, and we found that sometimes the delay between
>> successive writev() calls following an EAGAIN was fairly long (long enough
>> that it could jeopardize the delivery deadlines for streaming data). On my
>> development box, I saw the delay between EAGAIN and retry vary from 8 ms to
>> 1300 ms. I believe Jeremy saw a situation in the lab where it was 3 seconds!
>>
>> My setup was just a single VM running ATS 7.1.4, with curl (rate-limit
>> option set to 1 MB) fetching a previously cached 10 MB data file. I took a
>> look at the code, and it seems that on Linux ATS should be using the
>> epoll_wait mechanism (10 ms timeout), driven by a polling continuation. I
>> did not see anything there that should cause retry delays of 1+ seconds.
>> Any thoughts?
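>>
>> For reference, what I would expect on an EAGAIN is roughly the following (my
>> own sketch of the usual non-blocking write / epoll flow, with a hypothetical
>> helper, not the actual ATS code), where the retry should come on the order
>> of the 10 ms poll interval rather than seconds:
>>
>>     #include <errno.h>
>>     #include <sys/epoll.h>
>>     #include <sys/uio.h>
>>
>>     /* Sketch: on EAGAIN, arm EPOLLOUT and retry once the buffer drains.
>>      * Assumes fd was already registered with EPOLL_CTL_ADD. */
>>     void write_some(int epfd, int fd, struct iovec *iov, int iovcnt) {
>>         ssize_t n = writev(fd, iov, iovcnt);
>>         if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
>>             struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
>>             epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
>>             /* epoll_wait() elsewhere (10 ms timeout) should hand the fd
>>                back as soon as the send buffer has room again. */
>>         }
>>     }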
>>
>> Thanks,
>> Peter
>>
>> -----Original Message-----
>> From: Bryan Call <bc...@apache.org>
>> Sent: Thursday, September 12, 2019 9:24 AM
>> To: dev <dev@trafficserver.apache.org>
>> Subject: Re: TCP socket buffer size.
>>
>> I have seen issues where you can’t reach the max throughput of the network
>> connection without increasing the TCP buffers, because it affects the max
>> TCP window size (bandwidth-delay product). Here is a calculator I have used
>> before to figure out what your buffer size should be:
>> https://www.switch.ch/network/tools/tcp_throughput/
>>
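>> (As a worked example: to keep a 500 Mbit/s transfer busy over a 40 ms RTT
>> you need roughly 62.5 MB/s x 0.04 s = 2.5 MB of send buffer / window.)
>>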
>> Theoretically there should be some latency difference between a small
>> buffer size and a larger one (up to some limit), but my guess is it would be
>> hard to measure because it would be so small.
>>
>> -Bryan
>>
>>
>>> On Sep 11, 2019, at 11:50 AM, Chou, Peter <pbc...@labs.att.com> wrote:
>>>
>>> Hi all,
>>>
>>> Sometimes we see lots of EAGAIN result codes from ATS trying to write to
>>> the TCP socket file descriptor. I presume this is typically due to
>>> congestion or a rate mismatch between the client and ATS. Is there any
>>> benefit to increasing the TCP socket buffer size, which would reduce the
>>> number of these write operations? Specifically, should we expect any kind
>>> of latency difference? There is some concern about how long it takes ATS
>>> to re-schedule that particular VC for another write attempt.
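>>> (For scale: at a 10 Mbit/s client rate a 64 KB send buffer drains in about
>>> 50 ms, so ATS would be topping it up roughly 20 times a second per
>>> connection, whereas a 1 MB buffer drains in roughly 0.8 s.)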
>>>
>>> Thanks,
>>> Peter
>>
>