RE: [RFC PATCH 00/14] Introducing AF_PACKET V4 support

Karlsson, Magnus Fri, 03 Nov 2017 03:14:52 -0700


> -----Original Message-----
> From: Willem de Bruijn [mailto:willemdebruijn.ker...@gmail.com]
> Sent: Friday, November 3, 2017 5:35 AM
> To: Björn Töpel <bjorn.to...@gmail.com>
> Cc: Karlsson, Magnus <magnus.karls...@intel.com>; Duyck, Alexander H
> <alexander.h.du...@intel.com>; Alexander Duyck
> <alexander.du...@gmail.com>; John Fastabend
> <john.fastab...@gmail.com>; Alexei Starovoitov <a...@fb.com>; Jesper
> Dangaard Brouer <bro...@redhat.com>; michael.lundkv...@ericsson.com;
> ravineet.si...@ericsson.com; Daniel Borkmann <dan...@iogearbox.net>;
> Network Development <netdev@vger.kernel.org>; Topel, Bjorn
> <bjorn.to...@intel.com>; Brandeburg, Jesse
> <jesse.brandeb...@intel.com>; Singhai, Anjali <anjali.sing...@intel.com>;
> Rosen, Rami <rami.ro...@intel.com>; Shaw, Jeffrey B
> <jeffrey.b.s...@intel.com>; Yigit, Ferruh <ferruh.yi...@intel.com>; Zhang,
> Qi Z <qi.z.zh...@intel.com>
> Subject: Re: [RFC PATCH 00/14] Introducing AF_PACKET V4 support
> 
> On Tue, Oct 31, 2017 at 9:41 PM, Björn Töpel <bjorn.to...@gmail.com>
> wrote:
> > From: Björn Töpel <bjorn.to...@intel.com>
> >
> > This RFC introduces AF_PACKET_V4 and PACKET_ZEROCOPY that are
> > optimized for high performance packet processing and zero-copy
> > semantics. Throughput improvements can be up to 40x compared to V2
> and
> > V3 for the micro benchmarks included. Would be great to get your
> > feedback on it.
> >
> > The main difference between V4 and V2/V3 is that TX and RX descriptors
> > are separated from packet buffers.
> 
> Cool feature. I'm looking forward to the netdev talk. Aside from the inline
> comments in the patches, a few architecture questions.


Glad to hear. Are you going to Netdev in Seoul? If so, let us hook up
and discuss your comments in further detail. Some initial thoughts
below.

> Is TX support needed? Existing PACKET_TX_RING already sends out packets
> without copying directly from the tx_ring. Indirection through a descriptor
> ring is not helpful on TX if all packets still have to come from a 
> pre-registered
> packet pool. The patch set adds a lot of tx-only code and is complex enough
> without it.

That is correct, but what if the packet you are going to transmit came
in from the receive path and is already in the packet buffer? This
might happen if the application is examining/sniffing packets then
sending them out, or doing some modification to them. In that case we
avoid a copy in V4 since the packet is already in the packet
buffer. With V2 and V3, a copy from the RX ring to the TX ring would
be needed. In the PACKET_ZEROCOPY case, avoiding this copy increases
performance quite a lot.

> Can you use the existing PACKET_V2 format for the packet pool? The
> v4 format is nearly the same as V2. Using the same version might avoid some
> code duplication and simplify upgrading existing legacy code.
> Instead of continuing to add new versions whose behavior is implicit,
> perhaps we can add explicit mode PACKET_INDIRECT to PACKET_V2.

Interesting idea that I think is worth thinking more about. One
problem though with the V2 ring format model, and the current V4
format too by the way, when applied to a user-space allocated memory
is that they are symmetric, i.e. that user space and kernel space have
to produce and consume the same amount of entries (within the length
of the descriptor area). User space sends down a buffer entry that the
kernel fills in for RX for example. Symmetric queues do not work when
you have a shared packet buffer between two processes. (This is not a
requirement, but someone might do a mmap with MAP_SHARED for the
packet buffer and then fork of a child that then inherits this packet
buffer.) One of the processes might just receive packets, while the
other one is transmitting. Or if you have a veth link pair between two
processes and they have been set up to share packet buffer area. With
a symmetric queue you have to copy even if they share the same packet
buffer, but with an asymmetric queue, you do not and the driver only
needs to copy the packet buffer id between the TX desc ring of the
sender to the RX desc ring of the receiver, not the data. I think this
gives an indication that we need a new structure. Anyway, I like your
idea and I think it is worth thinking more about it. Let us have a
discussion about this at Netdev, if you are there.

> Finally, is it necessary to define a new descriptor ring format? Same for the
> packet array and frame set. The kernel already has a few, such as virtio for
> the first, skb_array/ptr_ring, even linux list for the second. These 
> containers
> add a lot of new boilerplate code. If new formats are absolutely necessary, at
> least we should consider making them generic (like skb_array and ptr_ring).
> But I'd like to understand first why, e.g., virtio cannot be used.

Agree with you. Good if we can use something existing. The descriptor
format of V4 was based on one of the first Virtio 1.1 proposal by
Michael Tsirkin (tools/virtio/ringtest/ring.c). Then we have diverged
somewhat due to performance reasons and Virtio 1.1 has done the same
but in another direction. We should take a look at the latest Virtio
1.1 proposal again and see what it offers. The reason we did not go
with Virtio 0.9 was for performance. Too many indirections, something
that the people behind Virtio 1.1 had identified too. With ptr_ring,
how do we deal with the pointers in the structure as this now has to
go to user-space? In any way, we would like to have a ring structure
that is asymmetric for the reasons above. Other than that, we would
not mind using anything as long as it is fast. If it already exists,
perfect.

/Magnus

RE: [RFC PATCH 00/14] Introducing AF_PACKET V4 support

Reply via email to