2018-02-07 16:54 GMT+01:00 Willem de Bruijn <willemdebruijn.ker...@gmail.com>: >> We realized, a bit late maybe, that 24 patches is a bit mouthful, so >> let me try to make it more palatable. > > Overall, this approach looks great to me. >
Yay! :-) > The patch set incorporates all the feedback from AF_PACKET V4. > At this point I don't have additional high-level interface comments. > I have a thought on the socket API. Now, we're registering buffer memory *to* the kernel, but mmap:ing the Rx/Tx rings *from* the kernel. I'm leaning towards removing the mmap call, in favor of registering the rings to kernel analogous to the XDP_MEM_REG socket option. We wont guarantee physical contiguous memory for the rings, but I think we can live with that. Thoughts? > As you point out, 24 patches and nearly 6000 changed lines is > quite a bit to ingest. Splitting up in smaller patch sets will help > give more detailed implementation feedback. > > The frame pool and device driver changes are largely independent > from AF_XDP and probably should be resolved first (esp. the > observed regresssion even without AF_XDP). > Yeah, the regression is unacceptable. Another way is starting with the patches without zero-copy first (i.e. the copy path), and later add the driver modifications. That would be the first 7 patches. > As you suggest, it would be great if the need for a separate > xsk_packet_array data structure can be avoided. > Yes, we'll address that! > Since frames from the same frame pool can be forwarded between > multiple device ports and thus AF_XDP sockets, that should perhaps > be a separate object independent from the sockets. This comment > hints at the awkward situation if tied to a descriptor pair: > >> + /* Check if umem is from this socket, if so do not make >> + * circular references. >> + */ > > Since this is in principle just a large shared memory area, could > it reuse existing BPF map logic? > Hmm, care to elaborate on your thinking here? > More extreme, and perhaps unrealistic, is if the descriptor ring > could similarly be a BPF map and the Rx XDP program directly > writes the descriptor, instead of triggering xdp_do_xsk_redirect. > As we discussed before, this would avoid the need to specify a > descriptor format upfront. Having the XDP program writeback the descriptor to user space ring is really something that would be useful (writing a virtio-net descriptors...). I need to think a bit more about this. :-) Please share your ideas! Thanks for looking into the patches! Björn