> From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Saturday, 12 April 2025 19.05 > > On Sat, 12 Apr 2025 13:06:55 +0200 > Morten Brørup <m...@smartsharesystems.com> wrote: > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > > Sent: Saturday, 12 April 2025 01.45 > > > > > > This is a rework of how packet capture is done in DPDK. > > > The existing mechanism using callbacks has a number of problems; > > > the main one is that any packets sent/received in a secondary > process > > > are not visible. The root cause is that callbacks can not function > > > across process boundaries, they are specific to the role. > > > > > > The new mechanism builds on the concept of port mirroring used > > > in Linux/FreeBSD and also router vendors (SPAN ports). The > > > infrastructure > > > is built around a new ethdev call to mirror a port. > > > > > > The internals of dumpcap are redone but the program command > > > syntax is unchanged (no doc change needed). Usingthe new mirror > > > mechanism, > > > the dumpcap program creates a port using ring PMD and > > > then directs mirror to that. Then the packets are extracted from > the > > > ring. > > > If capturing on multiple devices, they all get mirrored to > > > the same ring PMD. > > > > Here's some general feedback... > > > > I very much like the concept of using a shared ring for carrying the > mirrored packets. > > It allows other types of future consumers to process the mirrored > packets, e.g. encapsulating and forwarding them into an L2 or L3 > tunnel, or Wireshark remote capture. > > > > Using a Ring PMD instead of setting up a dedicated ring also has some > advantages, such as the ability to set up multiple separate mirror > target instances. > > For performance reasons, we should ensure that a lightweight Ring PMD > is available for mirroring, in case the Ring PMD is extended with new > features affecting its performance. > > Or maybe create a new type of mirror/capture virtual PMD. This would > allow applications to enqueue packets from non-ethdev interfaces into > it. > > The new modifications are simple, and do not impact performance of ring > PMD. > We don't need another PMD for this.
Agree; so long as the Ring PMD remains lightweight. The Ring PMD sets the mbuf "port" field on RX, thereby overwriting the port of the mirrored packet. Instead of adding a new mbuf field for the mirrored packet to hold the original port, you could add a "transparent" configuration option to the Ring PMD, so it doesn’t touch the transported mbuf. Furthermore, not setting the mbuf "port" field would significantly improve performance of the Ring PMD; this is the only mbuf field written by the Ring PMD, so a memory write operation per packet is eliminated. Metadata added to the mirrored packet, e.g. the egress port (for packets mirrored at egress) and maybe the queue_id and a timestamp (of the time of mirroring), could be stored in a dedicated dynfield or in a prepended header (as you already mentioned). Prepending a header also works if there are no more dynfields available. Especially if we add a lot of metadata. I think a prepended header is more future proof. Hmm... if the "port" field is also stored in the metadata, my suggested Ring PMD "transparent" option becomes a performance optimization only. > It did uncover some broken API's in ring PMD, but those can be marked > deprecated > and disappear in some future version. +1 > > > > > I'm not convinced you should undo the VLAN offloads when enqueueing a > mirror packet... If you do, you should also undo QinQ offloads. Undoing > offloads is never going to end. > > And if you create a dedicated PMD type for carrying mirrored packets, > you can ensure that the offload fields remain intact on dequeue. > > VLAN needs to be unoffloaded before filtering and writing to file. > Can move that logic to the other end of the ring (i.e in dumpcap). If the application at the dequeueing end of the ring is not dumpcap, but some other application, this other application might prefer getting the mbufs as raw as possible. I think moving the unoffloading logic to the dequeueing end of the ring would be better. BTW, unoffloading should probably be a public DPDK library function, to avoid copy-pasting it from dumpcap to other applications. > > The issue with filtering is that current DPDK BPF has no way to > reference offload > meta data. There is a way to do that with Linux kernel BPF (via > negative offsets). > Looked into doing this in the DPDK BPF, but there are several blockers: > not all > the offloads are the same, and more importantly the pcap library to > build filters > (pcap_compile) has some internal issues. The pcap library "knows" when > it is > build a direct Linux socket filter, versus just class BPF. For example, > if you call pcap_compile() with "vlan 13", it will generate different > code > based on whether it is a Linux filter or not. > > > You should consider sampling and VLAN filtering as typical mirroring > features. > > It would improve the performance if such filtering is done before > copying the packets. > > The problem is that filtering before going into the ring leads to > creating BPF > dependency inside ethdev which is a build nightmare. Tried it and was > not successful. Filtering before going into the ring is only a performance optimization. Full BPF would be nice, but apparently not doable. VLAN filtering (mirroring one interesting VLAN only) and sampling (to reduce the traffic volume) is common for port mirroring. Perhaps just offer this limited pre-filtering as a performance optimization. Full BPF remains in dumpcap. Anyway, limited pre-filtering can be added later; just keep it in mind for API design purposes. If VLAN pre-filtering is added to the enqueueing end of the ring, it obviously needs to parse the mbuf/packet to obtain the VLAN; but I still think it should not modify the packet, i.e. unoffloading the VLAN tag still belongs in the dequeueing end of the ring. Another thought (more feature creep)... It would be "nice to have" the ability to attach multiple mirrors to a port. E.g. if an application uses the mirror feature for its own purposes, dumpcap can still attach its own mirror to ports already being mirrored by the application. > > > > > PS: I agree with your choice of copying (rather than cloning by > refcount) when mirroring the packets.