On 08/03/2018 09:58 AM, Toshiaki Makita wrote:
> This is the basic implementation of veth driver XDP.
>
> Incoming packets are sent from the peer veth device in the form of skb,
> so this is generally doing the same thing as generic XDP.
>
> This itself is not so useful, but a starting point to implement other
> useful veth XDP features like TX and REDIRECT.
>
> This introduces NAPI when XDP is enabled, because XDP is now heavily
> relies on NAPI context. Use ptr_ring to emulate NIC ring. Tx function
> enqueues packets to the ring and peer NAPI handler drains the ring.
>
> Currently only one ring is allocated for each veth device, so it does
> not scale on multiqueue env. This can be resolved by allocating rings
> on the per-queue basis later.
>
> Note that NAPI is not used but netif_rx is used when XDP is not loaded,
> so this does not change the default behaviour.
>
> v6:
> - Check skb->len only when allocation is needed.
> - Add __GFP_NOWARN to alloc_page() as it can be triggered by external
>   events.
>
> v3:
> - Fix race on closing the device.
> - Add extack messages in ndo_bpf.
>
> v2:
> - Squashed with the patch adding NAPI.
> - Implement adjust_tail.
> - Don't acquire consumer lock because it is guarded by NAPI.
> - Make poll_controller noop since it is unnecessary.
> - Register rxq_info on enabling XDP rather than on opening the device.
>
> Signed-off-by: Toshiaki Makita <makita.toshi...@lab.ntt.co.jp>
[...]
> +
> +static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
> +					struct sk_buff *skb)
> +{
> +	u32 pktlen, headroom, act, metalen;
> +	void *orig_data, *orig_data_end;
> +	struct bpf_prog *xdp_prog;
> +	int mac_len, delta, off;
> +	struct xdp_buff xdp;
> +
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(priv->xdp_prog);
> +	if (unlikely(!xdp_prog)) {
> +		rcu_read_unlock();
> +		goto out;
> +	}
> +
> +	mac_len = skb->data - skb_mac_header(skb);
> +	pktlen = skb->len + mac_len;
> +	headroom = skb_headroom(skb) - mac_len;
> +
> +	if (skb_shared(skb) || skb_head_is_locked(skb) ||
> +	    skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {

Hmm, I think this is not fully correct. What happens if you have cloned
skbs, as is the case with TCP, for example? This would also need a full,
expensive unclone to make the data private as expected by XDP (this is
basically a similar issue in generic XDP). It may also be worth sharing
the code here with the generic XDP implementation, given that it's quite
similar?

> +		struct sk_buff *nskb;
> +		int size, head_off;
> +		void *head, *start;
> +		struct page *page;
> +
> +		size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
> +		       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +		if (size > PAGE_SIZE)
> +			goto drop;
> +
> +		page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
> +		if (!page)
> +			goto drop;
> +
> +		head = page_address(page);
> +		start = head + VETH_XDP_HEADROOM;
> +		if (skb_copy_bits(skb, -mac_len, start, pktlen)) {
> +			page_frag_free(head);
> +			goto drop;
> +		}
> +
> +		nskb = veth_build_skb(head,
> +				      VETH_XDP_HEADROOM + mac_len, skb->len,
> +				      PAGE_SIZE);
> +		if (!nskb) {
> +			page_frag_free(head);
> +			goto drop;
> +		}
> +
> +		skb_copy_header(nskb, skb);
> +		head_off = skb_headroom(nskb) - skb_headroom(skb);
> +		skb_headers_offset_update(nskb, head_off);
> +		if (skb->sk)
> +			skb_set_owner_w(nskb, skb->sk);
> +		consume_skb(skb);
> +		skb = nskb;
> +	}
> +
> +	xdp.data_hard_start = skb->head;
> +	xdp.data = skb_mac_header(skb);
> +	xdp.data_end = xdp.data + pktlen;
> +	xdp.data_meta = xdp.data;
> +	xdp.rxq = &priv->xdp_rxq;
> +	orig_data = xdp.data;
> +	orig_data_end = xdp.data_end;
> +
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
> +	switch (act) {
> +	case XDP_PASS:
> +		break;
> +	default:
> +		bpf_warn_invalid_xdp_action(act);
> +	case XDP_ABORTED:
> +		trace_xdp_exception(priv->dev, xdp_prog, act);
> +	case XDP_DROP:
> +		goto drop;
> +	}
> +	rcu_read_unlock();
> +
> +	delta = orig_data - xdp.data;
> +	off = mac_len + delta;
> +	if (off > 0)
> +		__skb_push(skb, off);
> +	else if (off < 0)
> +		__skb_pull(skb, -off);
> +	skb->mac_header -= delta;
> +	off = xdp.data_end - orig_data_end;
> +	if (off != 0)
> +		__skb_put(skb, off);
> +	skb->protocol = eth_type_trans(skb, priv->dev);
> +
> +	metalen = xdp.data - xdp.data_meta;
> +	if (metalen)
> +		skb_metadata_set(skb, metalen);
> +out:
> +	return skb;
> +drop:
> +	rcu_read_unlock();
> +	kfree_skb(skb);
> +	return NULL;
> +}
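
To make the cloned-skb concern above concrete, here is a rough userspace sketch, not kernel code. All toy_* names are hypothetical and only model the idea that a clone is a second handle onto the same packet data, so an in-place rewrite through one handle is visible through the other unless the data is copied first. In the kernel, the corresponding helpers would be along the lines of skb_cloned() and skb_unclone(); the quoted condition tests skb_shared() but not skb_cloned().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an skb's data area; NOT the kernel's layout. */
struct toy_data {
	int refcnt;			/* number of handles using this data */
	size_t len;
	unsigned char bytes[64];
};

/* Toy stand-in for struct sk_buff: a handle onto possibly shared data. */
struct toy_buf {
	struct toy_data *data;
};

static struct toy_buf toy_alloc(const void *src, size_t len)
{
	struct toy_buf b = { .data = calloc(1, sizeof(struct toy_data)) };

	assert(b.data && len <= sizeof(b.data->bytes));
	b.data->refcnt = 1;
	b.data->len = len;
	memcpy(b.data->bytes, src, len);
	return b;
}

/* Models a clone: a new handle, but the same underlying data. */
static struct toy_buf toy_clone(struct toy_buf *b)
{
	b->data->refcnt++;
	return (struct toy_buf){ .data = b->data };
}

/* Models the "is the data shared with another handle?" check. */
static int toy_cloned(const struct toy_buf *b)
{
	return b->data->refcnt > 1;
}

/* Models the expensive unclone: copy the data so this handle owns it. */
static int toy_unclone(struct toy_buf *b)
{
	struct toy_data *copy;

	if (!toy_cloned(b))
		return 0;		/* already private, nothing to do */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;

	memcpy(copy, b->data, sizeof(*copy));
	copy->refcnt = 1;
	b->data->refcnt--;		/* drop our reference to the shared data */
	b->data = copy;
	return 0;
}

static void toy_free(struct toy_buf *b)
{
	if (--b->data->refcnt == 0)
		free(b->data);
	b->data = NULL;
}

/* Hypothetical in-place rewrite, standing in for what an XDP program may do. */
static void toy_rewrite(struct toy_buf *b, unsigned char v)
{
	b->data->bytes[0] = v;
}
```

With this model, calling toy_rewrite() on a clone without toy_unclone() first would also change the bytes seen through the original handle, which is the situation a TCP sender keeping the original for retransmission would not expect.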