2017-11-14 0:50 GMT+01:00 Alexei Starovoitov <a...@fb.com>:
> On 11/13/17 9:07 PM, Björn Töpel wrote:
>>
>> 2017-10-31 13:41 GMT+01:00 Björn Töpel <bjorn.to...@gmail.com>:
>>>
>>> From: Björn Töpel <bjorn.to...@intel.com>
>>>
>> [...]
>>>
>>>
>>> We'll do a presentation on AF_PACKET V4 in NetDev 2.2 [1] Seoul,
>>> Korea, and our paper with complete benchmarks will be released shortly
>>> on the NetDev 2.2 site.
>>>
>>
>> We're back in the saddle after an excellent netdevconf week. Kudos to
>> the organizers; we had a blast! Thanks for all the constructive
>> feedback.
>>
>> Below, I'll summarize the major points that we'll address in the
>> next RFC.
>>
>> * Instead of extending AF_PACKET with yet another version, introduce a
>>   new address/packet family. As for naming, we had some suggestions:
>>   AF_CAPTURE, AF_CHANNEL, AF_XDP and AF_ZEROCOPY. We'll go for
>>   AF_ZEROCOPY, unless there are strong opinions against it.
>>
>> * No explicit zerocopy enablement. Use the zerocopy path if
>>   supported; if not, fall back to the skb path for netdevs that
>>   don't support the required ndos. Further, we'll have the zerocopy
>>   behavior for the skb path as well, meaning that an AF_ZEROCOPY
>>   socket will consume the skb and we'll honor skb->queue_mapping,
>>   meaning that we only consume the packets for the enabled queue.
>>
>> * Limit the scope of the first patchset to Rx only, and introduce Tx
>>   in a separate patchset.
>
>
> all sounds good to me except above bit.
> I don't remember people suggesting to split it this way.
> What's the value of it without tx?
>

We definitely need Tx for our use-cases! Let me rephrase: the idea was
to make the initial patch set without Tx *driver*-specific code, and
e.g. use ndo_xdp_xmit/flush at a later point. So the AF_ZEROCOPY socket
parts would still have Tx support.

@John: Did I recall that correctly?

>> * Minimize the size of the i40e zerocopy patches by moving the
>>   driver-specific code to separate patches.
>>
>> * Do not introduce a new XDP action XDP_PASS_TO_KERNEL; instead, use
>>   the XDP redirect map call with the ingress flag.
>>
>> * Extend the XDP redirect to support explicit allocator/destructor
>>   functions. Right now, XDP redirect assumes that the page allocator
>>   was used, and the XDP redirect cleanup path is decreasing the page
>>   count of the XDP buffer. This assumption breaks for the zerocopy
>>   case.
>>
>>
>> Björn
>>
>>
>>> We based this patch set on net-next commit e1ea2f9856b7 ("Merge
>>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net").
>>>
>>> Please focus your review on:
>>>
>>> * The V4 user space interface
>>> * PACKET_ZEROCOPY and its semantics
>>> * Packet array interface
>>> * XDP semantics when executing in zero-copy mode (user space passed
>>>   buffers)
>>> * XDP_PASS_TO_KERNEL semantics
>>>
>>> To do:
>>>
>>> * Investigate the user-space ring structure’s performance problems
>>> * Continue the XDP integration into packet arrays
>>> * Optimize performance
>>> * SKB <-> V4 conversions in tp4a_populate & tp4a_flush
>>> * Packet buffer is unnecessarily pinned for virtual devices
>>> * Support shared packet buffers
>>> * Unify V4 and SKB receive path in I40E driver
>>> * Support for packets spanning multiple frames
>>> * Disassociate the packet array implementation from the V4 queue
>>>   structure
>>>
>>> We would really like to thank the reviewers of the limited
>>> distribution RFC for all their comments that have helped improve the
>>> interfaces and the code significantly: Alexei Starovoitov, Alexander
>>> Duyck, Jesper Dangaard Brouer, and John Fastabend.
>>> To the internal team at Intel that has been helping out with
>>> reviewing code, writing tests, and sanity checking our ideas (Rami
>>> Rosen, Jeff Shaw, Ferruh Yigit, and Qi Zhang): your participation
>>> has really helped.
>>>
>>> Thanks: Björn and Magnus
>>>
>>> [1] https://www.netdevconf.org/2.2/
>>>
>>>
>>> Björn Töpel (7):
>>>   packet: introduce AF_PACKET V4 userspace API
>>>   packet: implement PACKET_MEMREG setsockopt
>>>   packet: enable AF_PACKET V4 rings
>>>   packet: wire up zerocopy for AF_PACKET V4
>>>   i40e: AF_PACKET V4 ndo_tp4_zerocopy Rx support
>>>   i40e: AF_PACKET V4 ndo_tp4_zerocopy Tx support
>>>   samples/tpacket4: added tpbench
>>>
>>> Magnus Karlsson (7):
>>>   packet: enable Rx for AF_PACKET V4
>>>   packet: enable Tx support for AF_PACKET V4
>>>   netdevice: add AF_PACKET V4 zerocopy ops
>>>   veth: added support for PACKET_ZEROCOPY
>>>   samples/tpacket4: added veth support
>>>   i40e: added XDP support for TP4 enabled queue pairs
>>>   xdp: introducing XDP_PASS_TO_KERNEL for PACKET_ZEROCOPY use
>>>
>>>  drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
>>>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    9 +
>>>  drivers/net/ethernet/intel/i40e/i40e_main.c    |  837 ++++++++++++-
>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  582 ++++++++-
>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   38 +
>>>  drivers/net/veth.c                             |  174 +++
>>>  include/linux/netdevice.h                      |   16 +
>>>  include/linux/tpacket4.h                       | 1502 ++++++++++++++++++++++++
>>>  include/uapi/linux/bpf.h                       |    1 +
>>>  include/uapi/linux/if_packet.h                 |   65 +-
>>>  net/packet/af_packet.c                         | 1252 +++++++++++++++++---
>>>  net/packet/internal.h                          |    9 +
>>>  samples/tpacket4/Makefile                      |   12 +
>>>  samples/tpacket4/bench_all.sh                  |   28 +
>>>  samples/tpacket4/tpbench.c                     | 1390 ++++++++++++++++++++++
>>>  15 files changed, 5674 insertions(+), 244 deletions(-)
>>>  create mode 100644 include/linux/tpacket4.h
>>>  create mode 100644 samples/tpacket4/Makefile
>>>  create mode 100755 samples/tpacket4/bench_all.sh
>>>  create mode 100644 samples/tpacket4/tpbench.c
>>>
>>> --
>>> 2.11.0
>>>