2018-05-07 15:09 GMT+02:00 Jesper Dangaard Brouer <bro...@redhat.com>:
> On Mon, 7 May 2018 11:13:58 +0200
> Magnus Karlsson <magnus.karls...@gmail.com> wrote:
>
>> On Sat, May 5, 2018 at 2:34 AM, Alexei Starovoitov
>> <alexei.starovoi...@gmail.com> wrote:
>> > On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:
>> >> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
>> >> <alexei.starovoi...@gmail.com> wrote:
>> >> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>> >> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>> >> >> > From: Björn Töpel <bjorn.to...@intel.com>
>> >> >> >
>> >> >> > This patch set introduces a new address family called AF_XDP
>> >> >> > that is optimized for high-performance packet processing and,
>> >> >> > in upcoming patch sets, zero-copy semantics. In this patch set,
>> >> >> > we have removed all zero-copy related code in order to make it
>> >> >> > smaller, simpler and hopefully more review friendly. This patch
>> >> >> > set only supports copy-mode for the generic XDP path (XDP_SKB)
>> >> >> > for both RX and TX, and copy-mode for RX using the XDP_DRV
>> >> >> > path. Zero-copy support requires XDP and driver changes that
>> >> >> > Jesper Dangaard Brouer is working on. Some of his work has
>> >> >> > already been accepted. We will publish our zero-copy support
>> >> >> > for RX and TX on top of his patch sets at a later point in time.
>> >> >>
>> >> >> +1, would be great to see it land this cycle. Saw a few minor
>> >> >> nits here and there, but nothing to hold it up. For the series:
>> >> >>
>> >> >> Acked-by: Daniel Borkmann <dan...@iogearbox.net>
>> >> >>
>> >> >> Thanks everyone!
>> >> >
>> >> > Great stuff!
>> >> >
>> >> > Applied to bpf-next, with one condition.
>> >> > Upcoming zero-copy patches for both RX and TX need to be posted
>> >> > and reviewed within this release window.
>> >> > If the netdev community as a whole is not able to agree on the
>> >> > zero-copy bits, we'd need to revert this feature before the next
>> >> > merge window.
>> >>
>> >> Thanks everyone for reviewing this. Highly appreciated.
>> >>
>> >> Just so we understand the purpose correctly:
>> >>
>> >> 1: Do you want to see the ZC patches in order to verify that the
>> >> user space API holds? If so, we can produce an additional RFC patch
>> >> set using a big chunk of code that we had in RFC V1. We are not
>> >> proud of this code since it is clunky, but it hopefully proves the
>> >> point that the uapi stays the same.
>> >>
>> >> 2: And/or are you worried about us all (the netdev community) not
>> >> agreeing on a way to implement ZC internally in the drivers and the
>> >> XDP infrastructure? This is not going to be possible to finish
>> >> during this cycle, since we do not like the implementation we had
>> >> in RFC V1. It was too intrusive, and now we also have nicer
>> >> abstractions from Jesper that we can use and extend to provide a
>> >> (hopefully) much cleaner and less intrusive solution.
>> >
>> > Short answer: both.
>> >
>> > Cleanliness and performance of the ZC code are not as important as
>> > getting the API right. The main concern is that during the ZC
>> > review process we will find out that the existing API has issues,
>> > so we have to do this exercise before the merge window.
>> > And an RFC won't fly. Send the patches for real. They have to go
>> > through the proper code review. The hackers of the netdev community
>> > can accept a partial, a bit unclean, or slightly inefficient
>> > implementation, since it can be and will be improved later, but the
>> > API we cannot change once it goes into an official release.
>> >
>> > Here is an example of an API concern:
>> > this patch set added the shared umem concept. It sounds good in
>> > theory, but will it perform well with ZC? Earlier RFCs didn't have
>> > that feature. If it won't perform well, then it shouldn't be in the
>> > tree. The key reason to let AF_XDP into the tree is its performance
>> > promise. If it doesn't perform, we should rip it out and redesign.
>>
>> That is a fair point. We will try to produce patch sets for zero-copy
>> RX and TX using the latest interfaces within this merge window. Just
>> note that we will focus on this for the next week(s) instead of the
>> review items that you and Daniel Borkmann submitted. If we get those
>> patch sets out in time and we agree that they are a possible way
>> forward, then we will produce patches with your fixes. These were
>> mainly small items, so it should be quick.
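
For reviewers who want to poke at the shared umem part of the uapi that
Alexei is referring to while we respin the ZC sets: the surface is
small. A minimal sketch of binding a second socket to an already
registered umem, based on the if_xdp.h layout in this patch set (socket
setup and error handling elided):

#include <linux/if_xdp.h>
#include <sys/socket.h>

#ifndef AF_XDP
#define AF_XDP 44	/* PF_XDP from this patch set; not yet in libc headers */
#endif

/* Bind a second AF_XDP socket to the umem already registered on
 * umem_owner_fd. Only the owner registers the umem and its fill and
 * completion rings; sharers reference it via XDP_SHARED_UMEM and must
 * bind to the same ifindex/queue_id. */
static int xsk_bind_shared(int fd, int umem_owner_fd,
			   unsigned int ifindex, __u32 queue_id)
{
	struct sockaddr_xdp sxdp = {};

	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex;
	sxdp.sxdp_queue_id = queue_id;
	sxdp.sxdp_flags = XDP_SHARED_UMEM;
	sxdp.sxdp_shared_umem_fd = umem_owner_fd;

	return bind(fd, (struct sockaddr *)&sxdp, sizeof(sxdp));
}

The ZC question is whether this sharing still performs once the umem is
DMA-mapped to a device queue; that is what the upcoming sets have to show.
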
> I would like to see that you create a new xdp_mem_type for this new
> zero-copy type. This will allow other XDP redirect methods/types
> (e.g. devmap and cpumap) to react appropriately when receiving a
> zero-copy frame.
>

Yes, that's the plan!
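
Concretely, I imagine something along these lines on top of your
mem-model work. This is only a sketch -- the enum value name and the
driver-side variable names are placeholders, nothing is settled:

/* include/net/xdp.h (sketch) */
enum xdp_mem_type {
	MEM_TYPE_PAGE_SHARED,	/* Split-page refcnt based model */
	MEM_TYPE_PAGE_ORDER0,	/* Orig XDP full page model */
	MEM_TYPE_PAGE_POOL,
	MEM_TYPE_ZERO_COPY,	/* frame memory lives in an AF_XDP umem */
	MEM_TYPE_MAX,
};

/* A ZC-capable driver would register the model per RX queue, passing
 * the umem as the allocator cookie: */
err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
				 MEM_TYPE_ZERO_COPY, xs->umem);

Redirect targets can then key off the mem type carried in the frame,
e.g. devmap deciding whether a copy is needed before ndo_xdp_xmit.
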
> For devmap, I'm hoping we can allow/support using the ndo_xdp_xmit
> call without (first) copying (into a newly allocated page), by
> arguing that if an xsk-userspace app modifies a frame it is no longer
> allowed to touch, then that is simply a bug in the program. (Note,
> this would also allow using the ndo_xdp_xmit call for TX from
> xsk-userspace.)
>

Makes sense. I think the ZC rationale for Rx can indeed be extended to
devmap redirects -- i.e. no frame cloning is required.

> For cpumap, it is hard to avoid a copy, but I'm hoping we could delay
> the copy (and the alloc of the mem dest area) until on the remote
> CPU. This is already the principle of cpumap: moving the allocation
> of the SKB to the remote CPU.
>

I think that for most AF_XDP applications that would like to pass
frames to the kernel, cpumap would be preferred over XDP_PASS (moving
the stack execution to a separate, off-AF_XDP thread).

> For ZC to interact with the XDP redirect core and return API, the
> zero-copy memory type/allocator needs to provide an area for the
> xdp_frame data to be stored in (as we cannot use top-of-frame like
> the non-zero-copy variants do), and extend xdp_frame with a ZC
> umem-id. I imagine we can avoid any dynamic allocations, as we know
> the number of frames upfront (at bind and XDP_UMEM_REG time) -- e.g.
> pre-alloc in the xdp_umem_reg() call, and have an
> xdp_umem_get_xdp_frame lookup func.
>

Yeah, we can allocate a kernel-side-only xdp_frame for each umem frame.

> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
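
To make the pre-allocation idea concrete, below is roughly what I have
in mind. Sketch only -- all struct/function/field names are
placeholders riffing on your xdp_umem_get_xdp_frame suggestion, and
MEM_TYPE_ZERO_COPY is the placeholder mem type from above:

struct xdp_umem {
	/* ... existing fields ... */
	struct xdp_frame *frames;	/* kernel-side only, one per frame */
	u32 nframes;
};

/* Called from the XDP_UMEM_REG path, where the number of frames is
 * already known, so no dynamic allocation is needed later. */
static int xdp_umem_prealloc_frames(struct xdp_umem *umem)
{
	u32 i;

	umem->frames = kcalloc(umem->nframes, sizeof(*umem->frames),
			       GFP_KERNEL);
	if (!umem->frames)
		return -ENOMEM;

	for (i = 0; i < umem->nframes; i++) {
		umem->frames[i].mem.type = MEM_TYPE_ZERO_COPY;
		umem->frames[i].mem.id = umem->id;	/* your "ZC umem-id" */
	}
	return 0;
}

/* Lookup used when converting an xdp_buff that points into the umem,
 * instead of letting convert_to_xdp_frame() write metadata into
 * top-of-frame memory that userspace owns: */
static inline struct xdp_frame *
xdp_umem_get_xdp_frame(struct xdp_umem *umem, u32 idx)
{
	return idx < umem->nframes ? &umem->frames[idx] : NULL;
}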