> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Saturday, 12 April 2025 19.05
> 
> On Sat, 12 Apr 2025 13:06:55 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
> 
> > > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > > Sent: Saturday, 12 April 2025 01.45
> > >
> > > This is a rework of how packet capture is done in DPDK.
> > > The existing mechanism using callbacks has a number of problems;
> > > the main one is that any packets sent/received in a secondary process
> > > are not visible. The root cause is that callbacks cannot function
> > > across process boundaries; they are specific to the role.
> > >
> > > The new mechanism builds on the concept of port mirroring used
> > > in Linux/FreeBSD and also by router vendors (SPAN ports). The
> > > infrastructure is built around a new ethdev call to mirror a port.
> > >
> > > The internals of dumpcap are redone but the program command
> > > syntax is unchanged (no doc change needed). Using the new mirror
> > > mechanism, the dumpcap program creates a port using the ring PMD
> > > and then directs the mirror to that. The packets are then
> > > extracted from the ring. If capturing on multiple devices, they
> > > all get mirrored to the same ring PMD.
> >
> > Here's some general feedback...
> >
> > I very much like the concept of using a shared ring for carrying the
> > mirrored packets.
> > It allows other types of future consumers to process the mirrored
> > packets, e.g. encapsulating and forwarding them into an L2 or L3
> > tunnel, or Wireshark remote capture.
> >
> > Using a Ring PMD instead of setting up a dedicated ring also has some
> > advantages, such as the ability to set up multiple separate mirror
> > target instances.
> > For performance reasons, we should ensure that a lightweight Ring PMD
> > is available for mirroring, in case the Ring PMD is extended with new
> > features affecting its performance.
> > Or maybe create a new type of mirror/capture virtual PMD. This would
> > allow applications to enqueue packets from non-ethdev interfaces into
> > it.
> 
> The new modifications are simple, and do not impact performance of ring
> PMD.
> We don't need another PMD for this.

Agree; so long as the Ring PMD remains lightweight.

The Ring PMD sets the mbuf "port" field on RX, thereby overwriting the port of 
the mirrored packet.
Instead of adding a new mbuf field for the mirrored packet to hold the original 
port, you could add a "transparent" configuration option to the Ring PMD, so it 
doesn’t touch the transported mbuf.
Furthermore, not setting the mbuf "port" field would significantly improve 
performance of the Ring PMD; this is the only mbuf field written by the Ring 
PMD, so a memory write operation per packet is eliminated.
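Roughly what I have in mind for the "transparent" option, as a simplified sketch (the struct names here are stand-ins for the real rte_mbuf and the Ring PMD's internal queue; the "transparent" flag itself is my hypothetical devarg, not an existing option):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real types are struct rte_mbuf and the
 * Ring PMD's internal RX queue structure. */
struct mbuf {
	uint16_t port;   /* receiving port, normally overwritten on RX */
};

struct ring_queue {
	uint16_t in_port;    /* port_id of the Ring PMD itself */
	bool transparent;    /* hypothetical "transparent" devarg */
};

/* Per-packet RX fixup: with the transparent option set, the mirrored
 * packet's original "port" value survives the trip through the ring,
 * and the per-packet memory write is skipped entirely. */
static void
ring_rx_fixup(struct ring_queue *q, struct mbuf *m)
{
	if (!q->transparent)
		m->port = q->in_port;  /* current Ring PMD behaviour */
	/* else: leave the transported mbuf untouched */
}
```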

Metadata added to the mirrored packet, e.g. the egress port (for packets 
mirrored at egress) and maybe the queue_id and a timestamp (of the time of 
mirroring), could be stored in a dedicated dynfield or in a prepended header 
(as you already mentioned).
Prepending a header also works if there are no more dynfields available, 
especially if we add a lot of metadata. I think a prepended header is more 
future-proof.

Hmm... if the "port" field is also stored in the metadata, my suggested Ring 
PMD "transparent" option becomes a performance optimization only.
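To make the prepended-header idea concrete, a simplified sketch (the field set is illustrative, not a proposed ABI; in the real implementation the header would go into the mbuf headroom via rte_pktmbuf_prepend()):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror metadata prepended to each mirrored packet.
 * Field set is illustrative only. */
struct mirror_meta {
	uint16_t orig_port;   /* port the packet was mirrored from */
	uint16_t queue_id;    /* RX/TX queue it was seen on */
	uint64_t timestamp;   /* time of mirroring */
} __attribute__((packed));

/* Prepend the metadata in front of the packet data, as
 * rte_pktmbuf_prepend() would do in the mbuf headroom.  The caller
 * must guarantee sizeof(struct mirror_meta) bytes of headroom. */
static uint8_t *
mirror_prepend(uint8_t *pkt, const struct mirror_meta *meta)
{
	uint8_t *hdr = pkt - sizeof(*meta);
	memcpy(hdr, meta, sizeof(*meta));
	return hdr;
}
```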

> It did uncover some broken API's in ring PMD, but those can be marked
> deprecated
> and disappear in some future version.

+1

> 
> >
> > I'm not convinced you should undo the VLAN offloads when enqueueing a
> > mirror packet... If you do, you should also undo QinQ offloads. Undoing
> > offloads is never going to end.
> > And if you create a dedicated PMD type for carrying mirrored packets,
> > you can ensure that the offload fields remain intact on dequeue.
> 
> VLAN needs to be unoffloaded before filtering and writing to file.
> Can move that logic to the other end of the ring (i.e in dumpcap).

If the application at the dequeueing end of the ring is not dumpcap but some 
other application, it might prefer getting the mbufs as raw as possible.
I think moving the unoffloading logic to the dequeueing end of the ring would 
be better.

BTW, unoffloading should probably be a public DPDK library function, to avoid 
copy-pasting it from dumpcap to other applications.
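For the VLAN part, such a library function would essentially do what rte_vlan_insert() already does for a single stripped tag. A self-contained sketch on a plain byte buffer (the real helper would operate on an mbuf, grow the headroom, and clear RTE_MBUF_F_RX_VLAN_STRIPPED):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6
#define VLAN_HLEN      4
#define ETHERTYPE_VLAN 0x8100

/* Sketch of the "unoffload" step: re-insert a stripped VLAN tag into
 * the frame.  The buffer must have VLAN_HLEN spare bytes after len. */
static void
vlan_reinsert(uint8_t *frame, size_t len, uint16_t vlan_tci)
{
	/* Open a 4-byte gap after the two MAC addresses (12 bytes). */
	memmove(frame + 2 * ETHER_ADDR_LEN + VLAN_HLEN,
		frame + 2 * ETHER_ADDR_LEN,
		len - 2 * ETHER_ADDR_LEN);
	/* Write the 802.1Q tag: TPID then TCI, network byte order. */
	frame[12] = ETHERTYPE_VLAN >> 8;
	frame[13] = ETHERTYPE_VLAN & 0xff;
	frame[14] = vlan_tci >> 8;
	frame[15] = vlan_tci & 0xff;
}
```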

> 
> The issue with filtering is that current DPDK BPF has no way to
> reference offload metadata. There is a way to do that with Linux
> kernel BPF (via negative offsets).
> Looked into doing this in the DPDK BPF, but there are several
> blockers: not all the offloads are the same, and more importantly the
> pcap library used to build filters (pcap_compile) has some internal
> issues. The pcap library "knows" when it is building a direct Linux
> socket filter versus just classic BPF. For example, if you call
> pcap_compile() with "vlan 13", it will generate different code
> based on whether it is a Linux filter or not.
> 
> > You should consider sampling and VLAN filtering as typical mirroring
> > features.
> > It would improve the performance if such filtering is done before
> > copying the packets.
> 
> The problem is that filtering before going into the ring creates a
> BPF dependency inside ethdev, which is a build nightmare. Tried it
> and was not successful.

Filtering before going into the ring is only a performance optimization.
Full BPF would be nice, but apparently not doable.
VLAN filtering (mirroring one interesting VLAN only) and sampling (to reduce 
the traffic volume) are common for port mirroring. Perhaps just offer this 
limited pre-filtering as a performance optimization; full BPF remains in 
dumpcap.
Anyway, limited pre-filtering can be added later; just keep it in mind for API 
design purposes.

If VLAN pre-filtering is added to the enqueueing end of the ring, it obviously 
needs to parse the mbuf/packet to obtain the VLAN; but I still think it should 
not modify the packet, i.e. unoffloading the VLAN tag still belongs in the 
dequeueing end of the ring.
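The kind of limited pre-filter I have in mind is something this simple (names and struct are hypothetical, not a proposed DPDK API; the VLAN ID would come from already-parsed metadata such as the mbuf vlan_tci, so the packet itself is never touched):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-filter configuration: mirror only one VLAN and
 * only every Nth matching packet. */
struct mirror_filter {
	uint16_t vlan_id;      /* VLAN of interest (0 = match any) */
	uint32_t sample_rate;  /* mirror 1 of every sample_rate packets */
	uint32_t counter;      /* running matched-packet counter */
};

/* Decide, without modifying the packet, whether to copy it into the
 * mirror ring.  Un-offloading the VLAN tag stays at the dequeue end. */
static bool
mirror_should_copy(struct mirror_filter *f, uint16_t pkt_vlan_id)
{
	if (f->vlan_id != 0 && pkt_vlan_id != f->vlan_id)
		return false;                    /* wrong VLAN: skip copy */
	return (f->counter++ % f->sample_rate) == 0;  /* 1-in-N sampling */
}
```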

Another thought (more feature creep)...
It would be "nice to have" the ability to attach multiple mirrors to a port.
E.g. if an application uses the mirror feature for its own purposes, dumpcap 
can still attach its own mirror to ports already being mirrored by the 
application.
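Conceptually, that just means the per-port mirror state is a small table rather than a single slot; a toy sketch of the idea (purely illustrative, not the API proposed in this patch set):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MIRRORS 4

/* Hypothetical per-port mirror registry allowing several consumers
 * (e.g. an application mirror and a dumpcap mirror) on one port. */
struct port_mirrors {
	uint16_t targets[MAX_MIRRORS];  /* target (ring PMD) port_ids */
	int n;                          /* number of attached mirrors */
};

/* Attach one more mirror target; fails only when the table is full,
 * not because a mirror is already attached. */
static int
mirror_attach(struct port_mirrors *pm, uint16_t target_port)
{
	if (pm->n >= MAX_MIRRORS)
		return -1;  /* no free slot */
	pm->targets[pm->n++] = target_port;
	return 0;
}
```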

> 
> >
> > PS: I agree with your choice of copying (rather than cloning by
> > refcount) when mirroring the packets.
