2017-11-03 3:29 GMT+01:00 Willem de Bruijn <willemdebruijn.ker...@gmail.com>:
>>>> +/*
>>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMREG
>>>> + * to register user memory which should be used to store the packet
>>>> + * data.
>>>> + *
>>>> + * There are some constraints for the memory being registered:
>>>> + * - The memory area has to be memory page size aligned.
>>>> + * - The frame size has to be a power of 2.
>>>> + * - The frame size cannot be smaller than 2048B.
>>>> + * - The frame size cannot be larger than the memory page size.
>>>> + *
>>>> + * Corollary: The number of frames that can be stored is
>>>> + * len / frame_size.
>>>> + *
>>>> + */
>>>> +struct tpacket_memreg_req {
>>>> +	unsigned long addr;		/* Start of packet data area */
>>>> +	unsigned long len;		/* Length of packet data area */
>>>> +	unsigned int frame_size;	/* Frame size */
>>>> +	unsigned int data_headroom;	/* Frame head room */
>>>> +};
>>>
>>> Existing packet sockets take a tpacket_req, allocate memory and let the
>>> user process mmap this. I understand that TPACKET_V4 distinguishes
>>> the descriptor rings from packet pools, but could both use the existing
>>> structs and logic (packet_mmap)? That would avoid introducing a lot of
>>> new code just for granting user pages to the kernel.
>>>
>>
>> We could certainly pass the "tpacket_memreg_req" fields as part of
>> descriptor ring setup ("tpacket_req4"), but we went with having the
>> memory register as a new, separate setsockopt. Having it separated
>> makes it easier to compare regions on the kernel side of things: "Is
>> this the same umem as another one?" If we go the path of passing the
>> range at descriptor ring setup, we need to handle all kinds of
>> overlapping ranges to determine when a copy is needed or not, in those
>> cases where the packet buffer (i.e. umem) is shared between processes.
>
> That's not what I meant. Both descriptor rings and packet pools are
> memory regions.
> Packet sockets already have logic to allocate regions
> and make them available to userspace with mmap(). Packet v4 reuses
> that logic for its descriptor rings. Can it use the same for its packet
> pool? Why does the kernel map user memory, instead? That is a lot of
> non-trivial new logic.
Ah, got it. So, why do we register packet pool memory, instead of
allocating it in the kernel and mapping *that* memory?

Actually, we started out with that approach, where the packet_mmap call
mapped both the Tx/Rx descriptor rings and the packet buffer region. We
later moved to this (register umem) approach, because it's more flexible
for user space, which then does not have to use an AF_PACKET-specific
allocator (i.e. it can continue to use regular mallocs, huge pages and
such).

I agree that the memory register code adds a lot of new logic, but I
believe it's worth the flexibility for user space. I'm looking into
whether I can share the memory registration logic from the
Infiniband/verbs subsystem (drivers/infiniband/core/umem.c).


Björn