On Wed, Jun 26, 2019 at 8:22 AM Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:
>
> On Wed, Jun 26, 2019 at 06:20:17AM +0200, Eric Dumazet wrote:
> > On Wed, Jun 26, 2019 at 5:43 AM Guenter Roeck <li...@roeck-us.net> wrote:
> > >
> > > On 6/25/19 7:29 PM, Greg Kroah-Hartman wrote:
> > > > On Tue, Jun 25, 2019 at 07:02:20PM -0700, Guenter Roeck wrote:
> > > >> Hi Greg,
> > > >>
> > > >> On Sat, Jun 22, 2019 at 09:37:53AM +0200, Greg Kroah-Hartman wrote:
> > > >>> On Fri, Jun 21, 2019 at 10:28:21PM -0700, Linus Torvalds wrote:
> > > >>>> On Fri, Jun 21, 2019 at 6:03 PM Pierre-Loup A. Griffais
> > > >>>> <pgriff...@valvesoftware.com> wrote:
> > > >>>>>
> > > >>>>> I applied Eric's path to the tip of the branch and ran that kernel
> > > >>>>> and the bug didn't occur through several logout / login cycles, so
> > > >>>>> things look good at first glance. I'll keep running that kernel and
> > > >>>>> report back if anything crops up in the future, but I believe we're
> > > >>>>> good, beyond getting distros to ship this additional fix.
> > > >>>>
> > > >>>> Good. It's now in my tree, so we can get it quickly into stable and
> > > >>>> then quickly to distributions.
> > > >>>>
> > > >>>> Greg, it's commit b6653b3629e5 ("tcp: refine memory limit test in
> > > >>>> tcp_fragment()"), and I'm building it right now and I'll push it out
> > > >>>> in a couple of minutes assuming nothing odd is going on.
> > > >>>
> > > >>> This looks good for 4.19 and 5.1, so I'll push out new stable kernels
> > > >>> in a bit for them.
> > > >>>
> > > >>> But for 4.14 and older, we don't have the "hint" to know this is an
> > > >>> outbound going packet and not to apply these checks at that point in
> > > >>> time, so this patch doesn't work.
> > > >>>
> > > >>> I'll see if I can figure anything else later this afternoon for those
> > > >>> kernels...
> > > >>>
> > > >>
> > > >> I may have missed it, but I don't see a fix for the problem in
> > > >> older stable branches. Any news ?
> > > >>
> > > >> One possibility might be be to apply the part of 75c119afe14f7 which
> > > >> introduces TCP_FRAG_IN_WRITE_QUEUE and TCP_FRAG_IN_RTX_QUEUE, if that
> > > >> is acceptable.
> > > >
> > > > That's what people have already discussed on the stable mailing list a
> > > > few hours ago, hopefully a patch shows up soon as I'm traveling at the
> > > > moment and can't do it myself...
> > > >
> > >
> > > Sounds good. Let me know if nothing shows up; I'll be happy to do it
> > > if needed.
> > >
> >
> > Without the rb-tree for rtx queues, old kernels are vulnerable to SACK
> > attacks if sk_sndbuf is too big,
> > so I would simply add a cushion in the test, instead of trying to
> > backport an illusion of the rb-tree fixes.
> >
> >
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index a8772e11dc1cb42d4319b6fc072c625d284c7ad5..a554213afa4ac41120d781fe64b7cd18ff9b56e8 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1274,7 +1274,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
> >         if (nsize < 0)
> >                 nsize = 0;
> >
> > -       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
> > +       if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + 131072)) {
> >                 NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
> >                 return -ENOMEM;
> >         }
>
> That's a funny magic number, can we document what it means?

This is because TCP can cook an skb with about 64KB of payload in
tcp_sendmsg() before checking whether memory limits are exceeded.
(This is mentioned in the changelog of commit
b6653b3629e5b88202be3c9abc44713973f5c4b4, "tcp: refine memory limit test
in tcp_fragment()".)

Then, if this giant TSO skb needs to be split into ~45 smaller skbs of
one segment each, the resulting truesize might be twice as big.

You could use 2 * 65536 if that looks better, and possibly a macro,
but I feel that adding a macro for this one particular spot and for
stable kernels might be overkill?

>
> And yes, it's a much simpler patch, I'd rather take this than the fake
> backport.
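
For anyone who wants to poke at the numbers, the cushioned test from the
patch quoted above can be modelled in userspace roughly as follows. This is
only a sketch, not kernel code: struct sock_model, SNDBUF_CUSHION and both
helper names are hypothetical stand-ins introduced here; only the two field
names and the comparison itself come from the diff.

```c
#include <stdbool.h>

/* 2 * 65536 == 131072, the constant used in the patch.
 * The macro name is hypothetical; the patch uses the bare number. */
#define SNDBUF_CUSHION (2 * 65536)

/* Hypothetical stand-in for the two struct sock fields the test reads. */
struct sock_model {
	int sk_wmem_queued;	/* memory currently charged to the write queue */
	int sk_sndbuf;		/* configured send buffer limit */
};

/* Test before the patch: no allowance beyond sk_sndbuf. */
static bool wqueue_too_big_old(const struct sock_model *sk)
{
	return (sk->sk_wmem_queued >> 1) > sk->sk_sndbuf;
}

/* Test after the patch: same comparison with the cushion added.
 * A true result corresponds to tcp_fragment() returning -ENOMEM. */
static bool wqueue_too_big(const struct sock_model *sk)
{
	return (sk->sk_wmem_queued >> 1) > sk->sk_sndbuf + SNDBUF_CUSHION;
}
```

With sk_sndbuf = 16384 and sk_wmem_queued = 40000, the old test trips
(20000 > 16384) while the cushioned one does not (20000 <= 147456), so the
split is allowed to proceed in the patched version.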