On 7/10/19 2:29 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On 7/10/19 8:23 PM, Prout, Andrew - LLSC - MITLL wrote:
>> On 6/17/19 8:19 PM, Christoph Paasch wrote:
>>>
>>> Yes, this does the trick for my packetdrill-test.
>>>
>>> I wonder, is there a way we could end up in a situation where we can't
>>> retransmit anymore?
>>> For example, sk_wmem_queued has grown so much that the new test fails.
>>> Then, if we legitimately need to fragment in __tcp_retransmit_skb() we
>>> won't be able to do so. So we will never retransmit. And if no ACK
>>> comes back in to make some room we are stuck, no?
>>
>> We seem to be having exactly this problem. We’re running on the 4.14 branch.
>> After recently updating our kernel, we’ve been having a problem with TCP
>> connections stalling / dying off without disconnecting. They're stuck and
>> never recover.
>>
>> I bisected the problem to 4.14.127 commit
>> 9daf226ff92679d09aeca1b5c1240e3607153336 (commit
>> f070ef2ac66716357066b683fb0baf55f8191a2e upstream): tcp: tcp_fragment()
>> should apply sane memory limits. That lead me to this thread.
>>
>> Our environment is a supercomputing center: lots of servers interconnected
>> with a non-blocking 10Gbit ethernet network. We’ve zeroed in on the problem
>> in two situations: remote users on VPN accessing large files via samba and
>> compute jobs using Intel MPI over TCP/IP/ethernet. It certainly affects
>> other situations, many of our workloads have been unstable since this patch
>> went into production, but those are the two we clearly identified as they
>> fail reliably every time. We had to take the system down for unscheduled
>> maintenance to roll back to an older kernel.
>>
>> The TCPWqueueTooBig count is incrementing when the problem occurs.
>>
>> Using ftrace/trace-cmd on an affected process, it appears the call stack is:
>> run_timer_softirq
>> expire_timers
>> call_timer_fn
>> tcp_write_timer
>> tcp_write_timer_handler
>> tcp_retransmit_timer
>> tcp_retransmit_skb
>> __tcp_retransmit_skb
>> tcp_fragment
>>
>> Andrew Prout
>> MIT Lincoln Laboratory Supercomputing Center
>>
>
> What was the kernel version you used exactly ?
>
> This problem is supposed to be fixed in v4.14.131

Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as well, reliably for the samba test and once (not reliably) with a synthetic test I was trying. A patched v4.14.132 with this patch partially reverted (just the four lines from tcp_fragment deleted) passed the samba test.

The synthetic test was a pair of simple send/recv test programs under the following conditions:
- The send socket was non-blocking
- SO_SNDBUF set to 128KiB
- The receiver NIC was being flooded with traffic from multiple hosts (to induce packet loss/retransmits)
- Load was on both systems: a while(1) program spinning on each CPU core
- The receiver was on an older unaffected kernel
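
For anyone trying to reproduce this, the sender side of such a test could look roughly like the sketch below. This is not the actual test program (it was not posted); the function names `make_sender` and `send_some` and the host/port arguments are made up for illustration. It only covers the first two conditions (non-blocking socket, SO_SNDBUF of 128KiB); the NIC flooding and CPU load have to be arranged separately.

```python
import socket

SNDBUF_BYTES = 128 * 1024  # 128 KiB, matching the condition described above


def make_sender(host, port):
    """Hypothetical helper: connect a TCP socket with a 128 KiB SO_SNDBUF
    request, then switch it to non-blocking mode."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the send buffer size before connecting. The kernel may adjust
    # the value it actually stores, so treat 128 KiB as a request only.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SNDBUF_BYTES)
    sock.connect((host, port))
    sock.setblocking(False)
    return sock


def send_some(sock, payload):
    """Hypothetical helper: attempt one non-blocking send and return the
    number of bytes accepted, or 0 if the send buffer is currently full."""
    try:
        return sock.send(payload)
    except BlockingIOError:
        return 0
```

A driver loop would call `send_some()` repeatedly and back off when it returns 0. With a sender like this, a stalled connection would presumably show up as `send_some()` returning 0 indefinitely while TCPWqueueTooBig keeps incrementing.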