On Thu, Nov 29, 2018 at 3:26 PM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote: > > From: Willem de Bruijn <will...@google.com> > > With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info. > Release of its last reference triggers a completion notification. > > The TCP stack in tcp_sendmsg_locked holds an extra ref independent of > the skbs, because it can build, send and free skbs within its loop, > possibly reaching refcount zero and freeing the ubuf_info too soon. > > The UDP stack currently also takes this extra ref, but does not need > it as all skbs are sent after return from __ip(6)_append_data. > > Avoid the extra refcount_inc and refcount_dec_and_test, and generally > the sock_zerocopy_put in the common path, by passing the initial > reference to the first skb. > > This approach is taken instead of initializing the refcount to 0, as > that would generate error "refcount_t: increment on 0" on the > next skb_zcopy_set. > > Signed-off-by: Willem de Bruijn <will...@google.com> > ---
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 9602746d7175..08ff04d12642 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -881,8 +881,8 @@ static int __ip_append_data(struct sock *sk, > int csummode = CHECKSUM_NONE; > struct rtable *rt = (struct rtable *)cork->dst; > unsigned int wmem_alloc_delta = 0; > + bool paged, extra_uref; > u32 tskey = 0; > - bool paged; > > skb = skb_peek_tail(queue); > > @@ -921,12 +921,13 @@ static int __ip_append_data(struct sock *sk, > uarg = sock_zerocopy_realloc(sk, length, skb_zcopy(skb)); > if (!uarg) > return -ENOBUFS; > + extra_uref = true; > if (rt->dst.dev->features & NETIF_F_SG && > csummode == CHECKSUM_PARTIAL) { > paged = true; > } else { > uarg->zerocopy = 0; > - skb_zcopy_set(skb, uarg); > + skb_zcopy_set(skb, uarg, &extra_uref); > } > } > > @@ -1019,7 +1020,7 @@ static int __ip_append_data(struct sock *sk, > cork->tx_flags = 0; > skb_shinfo(skb)->tskey = tskey; > tskey = 0; > - skb_zcopy_set(skb, uarg); > + skb_zcopy_set(skb, uarg, &extra_uref); > > /* > * Find where to start putting bytes. > @@ -1123,13 +1124,12 @@ static int __ip_append_data(struct sock *sk, > > if (wmem_alloc_delta) > refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc); > - sock_zerocopy_put(uarg); > return 0; > > error_efault: > err = -EFAULT; > error: > - sock_zerocopy_put_abort(uarg); > + sock_zerocopy_put_abort(uarg, extra_uref); I'll need another revision. Sorry for the spam. In the draft patch I suggested that the skb_zcopy_set needs to be moved below getfrag, so that the uarg is not freed on the only error path that calls kfree_skb inside the main loop. This is still needed. Else sock_zerocopy_put_abort here is reached with a non-NULL, but already freed uarg. Will send a v4. Will let this sit for at least a day in case anyone else has comments.