On 1/8/2024 10:41 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:step...@networkplumber.org]
>> Sent: Monday, 8 January 2024 02.59
>>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary
>> process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>
> So, callbacks don't work across processes, because code might differ across
> processes.
>
> If process A is running, and RX'ing and TX'ing, and process B wants to
> install its own callbacks (e.g. packet capture) on RX and TX, we basically
> want process A to execute code residing in process B, which is impossible.
>
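The usual way around this is to pass data rather than code: the process doing rx_burst only ever runs its own code, and it copies packets into a ring in shared memory that the capture process drains. Below is a minimal sketch of that pattern, assuming a hypothetical single-producer/single-consumer ring; `capture_ring`, `capture_enqueue`, `capture_dequeue` and `capture_rx_cb` are illustrative names, not DPDK's rte_ring or pdump API, and the sketch does not set up the shared mapping itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not rte_mbuf / rte_ring. */
#define RING_SIZE 8u                  /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)
#define PKT_MAX   64u

struct pkt {
    size_t len;
    unsigned char data[PKT_MAX];
};

/* Holds no pointers, so it can live in any memory both processes map. */
struct capture_ring {
    _Atomic unsigned head;            /* next slot the producer writes */
    _Atomic unsigned tail;            /* next slot the consumer reads */
    struct pkt slots[RING_SIZE];
};

static void capture_ring_init(struct capture_ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side (the process calling rx_burst). Copies the packet, so the
 * consumer never calls or dereferences anything owned by the producer. */
static bool capture_enqueue(struct capture_ring *r, const struct pkt *p)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    assert(p->len <= PKT_MAX);
    if (head - tail == RING_SIZE)
        return false;                 /* ring full: drop the copy only */
    r->slots[head & RING_MASK] = *p;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side (the capture process). */
static bool capture_dequeue(struct capture_ring *r, struct pkt *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;                 /* nothing captured yet */
    *out = r->slots[tail & RING_MASK];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Callback registered by the producer process itself: it copies every
 * packet it can into the ring and returns how many were captured. The
 * original packets are left untouched for normal processing. */
static unsigned capture_rx_cb(struct capture_ring *r,
                              const struct pkt *pkts, unsigned n)
{
    unsigned captured = 0;

    for (unsigned i = 0; i < n; i++)
        if (capture_enqueue(r, &pkts[i]))
            captured++;
    return captured;
}
```

The price of this design is one copy per captured packet on the RX/TX path, paid by the process doing the burst.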
Callbacks are stored in "struct rte_eth_dev", so they are per process,
which means the primary and the secondaries have their own copies of the
callbacks, as Konstantin explained.

So, here is how pdump works :) It uses MP support and a shared ring,
similar to what you mentioned below. In more detail:
- The primary registers an MP handler.
- The pdump secondary process sends an MP message with a ring and a
  mempool in the message.
- When the primary receives the MP message, it registers its *own*
  callbacks, which take the 'ring' as a parameter.
- The callbacks clone packets into the 'ring'; that is how the pdump
  secondary process gets access to the packets.

> An alternative could be to pass the packets through a ring in shared memory.
> However, this method would add the ring processing latency of process B to
> the RX/TX latency of process A.
>
> I think we can conclude that callbacks are one of the things that don't work
> with secondary processes.
>
> With this decided, we can then consider how to best add packet capture. The
> concept of passing "data" (instead of calling functions) across processes
> obviously applies to this use case.
>
>>
>> An example sequence would be:
>>    1. dumpcap (or pdump) as secondary tells pdump in primary to
>>       register callback
>>    2. secondary process calls rx_burst.
>>    3. rx_burst sees the callback but it has pointer pdump_rx which
>>       is not necessarily at same location in primary and secondary
>>       process.
>>    4. indirect function call in secondary to bad location likely
>>       causes crash.
>>
>> Some possible workarounds.
>>    1. Keep callback list per-process: messy, but won't crash.
>>       Capture won't work without other changes. In this primary
>>       would register callback, but secondaries would not use them
>>       in rx/tx burst.
>>
>>    2. Replace use of rx/tx callback in pdump with change to
>>       rte_ethdev to have a capture flag. (i.e. don't use indirection).
>>       Likely ABI problems.
>>       Basically, ignore the rx/tx callback mechanism. This is my
>>       preferred solution.
>>
>>    3. Some fix up mechanism (in EAL mp support?) to have each
>>       process fix up its callback mechanism.
>>
>>    4. Do something in pdump_init to register the callback in same
>>       process context (probably need callbacks to be per-process).
>>       Would mean callback is always on independent of capture
>>       being enabled.
>>
>>    5. Get rid of indirect function call pointer, and replace it by
>>       index into a static table of callback functions. Every process
>>       would have same code (in this case pdump_rx) but at different
>>       address. Requires all callbacks to be statically defined at
>>       build time.
>>
>> The existing rx/tx callback is not safe if rx/tx burst is called from
>> different process than where callback is registered.
>>
>
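Workaround 5 can be sketched roughly as follows, assuming hypothetical names throughout (`rx_cb_fn`, `rx_cb_table`, `struct shared_rx_hook`, `run_rx_hook`; none of these are DPDK APIs). Shared state records only an index; each process resolves that index against its own copy of a table compiled into every binary, so the call always lands on code in the calling process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback signature; not DPDK's rte_rx_callback_fn. */
typedef uint16_t (*rx_cb_fn)(void **pkts, uint16_t nb_pkts, void *user);

/* Example callbacks built into every process. Each process has its own
 * copy of this code, possibly at a different address. */
static uint16_t count_cb(void **pkts, uint16_t nb_pkts, void *user)
{
    (void)pkts;
    assert(user != NULL);
    *(uint64_t *)user += nb_pkts;     /* count packets, keep them all */
    return nb_pkts;
}

static uint16_t drop_all_cb(void **pkts, uint16_t nb_pkts, void *user)
{
    (void)pkts; (void)nb_pkts; (void)user;
    return 0;                         /* keep none of the packets */
}

/* Same order in every binary, so an index names the same function in
 * every process even though the addresses differ. */
enum rx_cb_id { RX_CB_NONE = 0, RX_CB_COUNT, RX_CB_DROP_ALL, RX_CB_MAX };

static const rx_cb_fn rx_cb_table[RX_CB_MAX] = {
    [RX_CB_NONE]     = NULL,
    [RX_CB_COUNT]    = count_cb,
    [RX_CB_DROP_ALL] = drop_all_cb,
};

/* What would live in shared memory: an index, never a function pointer.
 * 'user' must itself point into memory mapped at the same address in
 * every process, or it has the same problem as the function pointer. */
struct shared_rx_hook {
    uint16_t cb_id;
    void *user;
};

/* Called from rx_burst in whichever process does the burst. */
static uint16_t run_rx_hook(const struct shared_rx_hook *h,
                            void **pkts, uint16_t nb_pkts)
{
    /* No hook or unknown id: pass the packets through untouched. */
    if (h->cb_id == RX_CB_NONE || h->cb_id >= RX_CB_MAX)
        return nb_pkts;
    return rx_cb_table[h->cb_id](pkts, nb_pkts, h->user);
}
```

The trade-off is the one Stephen lists: every callback has to be in the table at build time. Memory from DPDK's shared hugepage heap is mapped at the same virtual address in primary and secondary processes, so it is a suitable home for 'user'; a process's ordinary heap or stack is not.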