On Mon, Aug 06, 2018 at 03:58:30PM +0200, Mauricio Vasquez B wrote: > Bpf queue implements a LIFO/FIFO data containers for ebpf programs.
queue/stack datastructure would be a great addition. > It allows to push an element to the queue by using the update operation > and to pop an element from the queue by using the lookup operation. > > A use case for this is to keep track of a pool of elements, like > network ports in a SNAT. > > Signed-off-by: Mauricio Vasquez B <mauricio.vasq...@polito.it> > --- > include/linux/bpf_types.h | 1 > include/uapi/linux/bpf.h | 5 + > kernel/bpf/Makefile | 2 > kernel/bpf/queuemap.c | 287 > +++++++++++++++++++++++++++++++++++++++++++++ > kernel/bpf/syscall.c | 61 +++++++--- > kernel/bpf/verifier.c | 16 ++- > 6 files changed, 353 insertions(+), 19 deletions(-) > create mode 100644 kernel/bpf/queuemap.c > > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h > index c5700c2d5549..6c7a62f3fe43 100644 > --- a/include/linux/bpf_types.h > +++ b/include/linux/bpf_types.h > @@ -58,3 +58,4 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops) > BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) > #endif > #endif > +BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 0ebaaf7f3568..2c171c40eb45 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -120,6 +120,7 @@ enum bpf_map_type { > BPF_MAP_TYPE_CPUMAP, > BPF_MAP_TYPE_XSKMAP, > BPF_MAP_TYPE_SOCKHASH, > + BPF_MAP_TYPE_QUEUE, > }; > > enum bpf_prog_type { > @@ -255,6 +256,10 @@ enum bpf_attach_type { > /* Flag for stack_map, store build_id+offset instead of pointer */ > #define BPF_F_STACK_BUILD_ID (1U << 5) > > +/* Flags for queue_map, type of queue */ > +#define BPF_F_QUEUE_FIFO (1U << 16) > +#define BPF_F_QUEUE_LIFO (2U << 16) the choice of flags looks odd. Why start at bit 16 and why waste two bits? It's either stack or queue. May be instead define two map_types: BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK that share common implementation? And how about adding three new helpers: push/pop/peek as well? Reusing lookup/update is neat, but does lookup == pop or does lookup == peek ? I suspect it will be confusing. Three new helpers cost nothing, but will make bpf progs easier to read. Could you also add a flag for replacement policy? In this implementation when max_entries limit is reached the map_update_elem (aka push) will fail with e2big. It would be useful to allow pushing and dropping elements from the other side then such queue/stack can be used to keep track of the last N events (the prog would unconditionally push and user space would swap the queue for new one via map-in-map and drain old queue). Speaking of map-in-map, please add a test to make sure queue/stack works with hash/array_of_maps. selftests in patch2 needs to have kernel side as well. it's not enough to test syscall access only.