On 6/21/19 3:38 PM, Eric Dumazet wrote:
Please look at my recent patch.
Sorry I am travelling....
On Fri, Jun 21, 2019, 6:19 PM Linus Torvalds
<torva...@linux-foundation.org>
wrote:
On Fri, Jun 21, 2019 at 2:41 PM Greg Kroah-Hartman
<gre...@linuxfoundation.org> wrote:
>
> What specific commit caused the breakage?
Both on Reddit and on GitHub there seems to be confusion about whether
it's a problem or not. Some people have it working with the exact same
kernel that breaks for others.
And then some people seem to say it works intermittently for them,
which seems to indicate a timing issue.
Looking at the SACK patches (assuming it's one of them), I'd suspect
the "tcp: tcp_fragment() should apply sane memory limits".
Eric, that one does
	if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf)) {
		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
		return -ENOMEM;
	}
but I think it's *normal* for "sk_wmem_queued >> 1" to be around the
same size as sk_sndbuf. So if there is some fragmentation, and we add
more skb's to it, that would seem to trigger fairly easily.
Particularly since this is all in "truesize" units, which can be a lot
bigger than the packets themselves.
I don't know the code, so I may be out to lunch and barking up
completely the wrong tree, but that particular check does seem like it
might trigger much more easily than I think the code _intended_ it to
trigger?
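To make the arithmetic behind that suspicion concrete, here is a small user-space sketch (not kernel code) that models only the comparison quoted above. The names struct sock_model, fragment_would_fail() and skbs_until_check_trips() are hypothetical and exist only for this illustration; the numbers in the usage note are assumed, not measured.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two struct sock fields the check reads. */
struct sock_model {
	int sk_wmem_queued;	/* memory charged to the write queue, in truesize units */
	int sk_sndbuf;		/* send buffer size */
};

/* Same comparison as the quoted check: true means the -ENOMEM path is taken. */
static bool fragment_would_fail(const struct sock_model *sk)
{
	return (sk->sk_wmem_queued >> 1) > sk->sk_sndbuf;
}

/*
 * Count how many skbs of a given truesize can be charged to the queue
 * before the check starts failing. Returns -1 for a non-positive truesize.
 * Illustration only: nothing here models how the kernel actually limits
 * queueing, only where the comparison flips.
 */
static int skbs_until_check_trips(int sndbuf, int truesize)
{
	struct sock_model sk = { .sk_wmem_queued = 0, .sk_sndbuf = sndbuf };
	int n = 0;

	if (truesize <= 0)
		return -1;

	while (!fragment_would_fail(&sk)) {
		sk.sk_wmem_queued += truesize;
		n++;
	}
	return n;
}
```

With an assumed sk_sndbuf of 16384 and an assumed truesize of 2304 per skb, skbs_until_check_trips(16384, 2304) returns 15. If each of those skbs carried only a small payload, the check would flip with far less actual packet data queued than sk_sndbuf suggests, which is the "truesize" concern above.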
Pierre-Loup - do you guys have a test-case inside of Valve? Or is this
purely "we see some people with problems"?
Definitely the latter, although the volume of complaints clearly points
to a real problem from our experience. Reproducing locally, bisecting
and testing possible fixes is just now starting on our end.
I agree not all users seem affected; most affected people report success
by using -tcp to launch Steam, which makes it use direct TCP instead of
WebSockets, our current default connection method for Linux.
Thanks,
- Pierre-Loup
Linus