2018-05-18 18:17 GMT+02:00 Daniel Borkmann <dan...@iogearbox.net>: > On 05/18/2018 05:18 PM, Björn Töpel wrote: >> 2018-05-18 15:43 GMT+02:00 Daniel Borkmann <dan...@iogearbox.net>: >>> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote: >>>> On 5/16/18 11:46 PM, Björn Töpel wrote: >>>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov >>>>> <alexei.starovoi...@gmail.com>: >>>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote: >>>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote: >>>>>>>> From: Björn Töpel <bjorn.to...@intel.com> >>>>>>>> >>>>>>>> This patch set introduces a new address family called AF_XDP that is >>>>>>>> optimized for high performance packet processing and, in upcoming >>>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed >>>>>>>> all zero-copy related code in order to make it smaller, simpler and >>>>>>>> hopefully more review friendly. This patch set only supports copy-mode >>>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode >>>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and >>>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his >>>>>>>> work has already been accepted. We will publish our zero-copy support >>>>>>>> for RX and TX on top of his patch sets at a later point in time. >>>>>>> >>>>>>> +1, would be great to see it land this cycle. Saw few minor nits here >>>>>>> and there but nothing to hold it up, for the series: >>>>>>> >>>>>>> Acked-by: Daniel Borkmann <dan...@iogearbox.net> >>>>>>> >>>>>>> Thanks everyone! >>>>>> >>>>>> Great stuff! >>>>>> >>>>>> Applied to bpf-next, with one condition. >>>>>> Upcoming zero-copy patches for both RX and TX need to be posted >>>>>> and reviewed within this release window. >>>>>> If netdev community as a whole won't be able to agree on the zero-copy >>>>>> bits we'd need to revert this feature before the next merge window. >>>>>> >>>>>> Few other minor nits: >>>>>> patch 3: >>>>>> +struct xdp_ring { >>>>>> + __u32 producer __attribute__((aligned(64))); >>>>>> + __u32 consumer __attribute__((aligned(64))); >>>>>> +}; >>>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi >>>>>> headers. >>>>> >>>>> Hmm, I need some guidance on what a sane uapi variant would be. We >>>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have >>>>> both 64B and 128B according to the specs. Contemporary IA processors >>>>> have 64B. >>>>> >>>>> The simplest, and maybe most future-proof, would be 128B aligned for >>>>> all. Another is having 128B for ARM and 64B for all IA. A third option >>>>> is having a hand-shaking API (I think virtio has that) for determine >>>>> the cache line size, but I'd rather not go down that route. >>>>> >>>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version >>>>> would look like? >>>> >>>> I suspect i40e+arm combination wasn't tested anyway. >>>> The api may have endianness issues too on something like sparc. >>>> I think the way to be backwards compatible in this area >>>> is to make the api usable on x86 only by adding >>>> to include/uapi/linux/if_xdp.h >>>> #if defined(__x86_64__) >>>> #define AF_XDP_CACHE_BYTES 64 >>>> #else >>>> #error "AF_XDP support is not yet available for this architecture" >>>> #endif >>>> and doing: >>>> __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES))); >>>> __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES))); >>>> >>>> And progressively add to this for arm64 and few other archs. >>>> Eventually removing #error and adding some generic define >>>> that's good enough for long tail of architectures that >>>> we really cannot test. >>> >>> Been looking into this yesterday as well a bit, and it's a bit of a mess >>> what >>> uapi headers do on this regard (though there are just a handful of such >>> headers). >>> Some of the kernel uapi headers hard-code generally 64 bytes regardless of >>> the >>> underlying arch. In general, the kernel does expose it to user space via >>> sysfs >>> (coherency_line_size). Here's what perf does to retrieve it: >>> >>> #ifdef _SC_LEVEL1_DCACHE_LINESIZE >>> #define cache_line_size(cacheline_sizep) *cacheline_sizep = >>> sysconf(_SC_LEVEL1_DCACHE_LINESIZE) >>> #else >>> static void cache_line_size(int *cacheline_sizep) >>> { >>> if >>> (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", >>> cacheline_sizep)) >>> pr_debug("cannot determine cache line size"); >>> } >>> #endif >>> >>> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only >>> available for x86, arm64, s390 and ppc on a cursory glance in the glibc >>> code. >>> In the x86 case it retrieves the info from cpuid insn. In order to >>> generically >>> use it in combination with the header you'd have some probe which would then >>> set this as a define before including the header. >> >> But as a uapi we cannot depend on the L1 cache line size for what's >> currently running on the system, right? So, either a "one cache line >> size for all flavors of an arch", e.g. for ARMv8 that would be 128, >> even though there can be 64 flavors out there. >> >> Another way would be to remove the ring structure completely, and >> leave that to the user-space application to figure out. So there's a >> runtime interface (getsockopt) to probe the offsets the head and tail >> pointer is after/before the mmap call, and only expose the descriptor >> format explicitly in if_xdp.h. Don't know if that is too unorthodox or >> not... > > Good point, I think that may not be too unreasonable thing to do, imho, > at least this doesn't get us into the issue discussed here in the first > place, and it would work for other archs more seamlessly rather than ugly > build error or single per arch 'catch-all' define. >
I'll try that route (runtime-based offset to prod/cons), and see where it ends up! Enjoy the weekend, Björn > Thanks, > Daniel