On Thu, 2017-02-02 at 10:56 -0500, Josef Bacik wrote:
> The problem is we set skb->pfmemalloc a bunch of different places, such
> as __skb_fill_page_desc, which appears to be used in both the RX and TX
> path, so we can't just kill it there.  Do we want to go through and
> audit each one, provide a way for callers to indicate if we care about
> pfmemalloc and solve this problem that way?  I feel like that's more
> likely to bite us in the ass down the line, and somebody who doesn't
> know the context is going to come along and change it and regress us to
> the current situation.  The only place this is a problem is with
> loopback, and my change is contained to this one weird case.  Thanks,

I mentioned this in another mail:

The same issue will happen with veth, or with any kind of driver that
allows an skb to be given back to the stack in RX.

So your patch on loopback is not the definitive patch.

We should probably clear skb->pfmemalloc directly in the TCP write
function.

Note that I clear it on the clone, not on the original skb.
(It might be very useful to keep skb->pfmemalloc on the original skbs in
the write queue, at least for debugging purposes.)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8ce50dc3ab8cac821b8a2c3e0d31f0aa42f5c9d5..010280f1592d3bd195315882c364bdbbd4a1c2ec 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -944,6 +944,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 			skb = skb_clone(skb, gfp_mask);
 		if (unlikely(!skb))
 			return -ENOBUFS;
+		skb->pfmemalloc = 0;
 	}
 
 	inet = inet_sk(sk);