RE: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits

Prout, Andrew - LLSC - MITLL Wed, 10 Jul 2019 11:54:30 -0700

On 7/10/19 2:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On 7/10/19 8:23 PM, Prout, Andrew - LLSC - MITLL wrote:
>> On 6/17/19 8:19 PM, Christoph Paasch wrote:
>>>
>>> Yes, this does the trick for my packetdrill-test.
>>>
>>> I wonder, is there a way we could end up in a situation where we can't
>>> retransmit anymore?
>>> For example, sk_wmem_queued has grown so much that the new test fails.
>>> Then, if we legitimately need to fragment in __tcp_retransmit_skb() we
>>> won't be able to do so. So we will never retransmit. And if no ACK
>>> comes back in to make some room we are stuck, no?
>> 
>> We seem to be having exactly this problem. We’re running on the 4.14 branch. 
>> After recently updating our kernel, we’ve been having a problem with TCP 
>> connections stalling / dying off without disconnecting. They're stuck and 
>> never recover.
>> 
>> I bisected the problem to 4.14.127 commit 
>> 9daf226ff92679d09aeca1b5c1240e3607153336 (commit 
>> f070ef2ac66716357066b683fb0baf55f8191a2e upstream): tcp: tcp_fragment() 
>> should apply sane memory limits. That led me to this thread.
>> 
>> Our environment is a supercomputing center: lots of servers interconnected 
>> with a non-blocking 10Gbit ethernet network. We’ve zeroed in on the problem 
>> in two situations: remote users on VPN accessing large files via samba and 
>> compute jobs using Intel MPI over TCP/IP/ethernet. It certainly affects 
>> other situations; many of our workloads have been unstable since this patch 
>> went into production, but those are the two we clearly identified as they 
>> fail reliably every time. We had to take the system down for unscheduled 
>> maintenance to roll back to an older kernel.
>> 
>> The TCPWqueueTooBig count is incrementing when the problem occurs.
>> 
>> Using ftrace/trace-cmd on an affected process, it appears the call stack is:
>> run_timer_softirq
>> expire_timers
>> call_timer_fn
>> tcp_write_timer
>> tcp_write_timer_handler
>> tcp_retransmit_timer
>> tcp_retransmit_skb
>> __tcp_retransmit_skb
>> tcp_fragment
>> 
>> Andrew Prout
>> MIT Lincoln Laboratory Supercomputing Center
>> 
>
> What was the kernel version you used exactly ?
>
> This problem is supposed to be fixed in v4.14.131


Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as well: 
reliably with the samba test, and once (not reliably) with a synthetic test I 
was trying. A patched v4.14.132 with this patch partially reverted (just the 
four lines from tcp_fragment deleted) passed the samba test.

The synthetic test was a pair of simple send/recv test programs under the 
following conditions:
- The send socket was non-blocking
- SO_SNDBUF was set to 128KiB
- The receiver NIC was being flooded with traffic from multiple hosts (to
  induce packet loss/retransmits)
- Both systems were under CPU load: a while(1) program spinning on each CPU core
- The receiver was on an older, unaffected kernel
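
For reference, the socket-level conditions above could be set up roughly as 
follows. This is a minimal Python sketch, not the actual test programs: the 
function names (make_sender, send_all, drain), the chunk size, and the use of 
select() to wait for writability are all assumptions made for illustration. 
The NIC flooding and the CPU load are external to the programs and are not 
reproduced here.

```python
import select
import socket

SNDBUF_BYTES = 128 * 1024  # 128KiB, as in the conditions listed above


def make_sender(host, port):
    """Connect to (host, port) with SO_SNDBUF set and the socket non-blocking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request a 128KiB send buffer. The kernel may adjust the value it
    # actually applies (Linux doubles it and caps it at net.core.wmem_max).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SNDBUF_BYTES)
    sock.connect((host, port))
    # Switch to non-blocking mode only after the connection is established.
    sock.setblocking(False)
    return sock


def send_all(sock, nbytes, chunk_size=64 * 1024):
    """Send nbytes of filler data on a non-blocking socket; return bytes sent."""
    chunk = b"\0" * chunk_size
    sent = 0
    while sent < nbytes:
        # Wait until the kernel reports room in the send buffer.
        select.select([], [sock], [])
        try:
            sent += sock.send(chunk[: nbytes - sent])
        except BlockingIOError:
            # The buffer filled up again before we could write; retry.
            continue
    return sent


def drain(listener):
    """Receiver side: accept one connection, read until EOF, return byte count."""
    conn, _ = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(64 * 1024)
            if not data:
                break
            total += len(data)
    return total
```

On an affected sender, a stalled connection would show up here as send_all() 
blocking in select() indefinitely while the receiver sees no further data.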
