On 1/8/2024 10:41 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:step...@networkplumber.org]
>> Sent: Monday, 8 January 2024 02.59
>>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary
>> process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
> 
> So, callbacks don't work across processes, because code might differ across 
> processes.
> 
> If process A is running, and RX'ing and TX'ing, and process B wants to 
> install its own callbacks (e.g. packet capture) on RX and TX, we basically 
> want process A to execute code residing in process B, which is impossible.
> 

Callbacks are stored in "struct rte_eth_dev", so they are per process,
which means the primary and each secondary have their own copies of the
callbacks, as Konstantin explained.

So, this is how pdump works :) It uses MP support and a shared ring,
similar to what you mentioned below. In more detail:
- The primary registers an MP handler.
- The pdump secondary process sends an MP message with a ring and a
  mempool in the message.
- When the primary receives the MP message, it registers its *own*
  callbacks, which get the 'ring' as a parameter.
- The callbacks clone packets into the 'ring'; that is how the pdump
  secondary process gets access to the packets.
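To make the last two steps concrete, here is a minimal single-process
model in plain C, not written against DPDK. The names (struct pkt,
struct ring, ring_enqueue(), ring_dequeue(), pdump_rx_cb()) are
hypothetical stand-ins, not the real rte_ring/rte_mbuf/pdump API; in
pdump the ring is an rte_ring in shared memory and the clone comes from
the mempool the secondary sent. What it is meant to show: the function
pointer only ever lives in the primary, and only data crosses over to
the capture side.

```c
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 8   /* must be a power of two */
#define PKT_LEN   64

/* Hypothetical stand-in for an mbuf with a fixed-size payload. */
struct pkt {
    size_t len;
    unsigned char data[PKT_LEN];
};

/* Hypothetical single-producer/single-consumer ring. A real ring shared
 * between processes (rte_ring) also needs atomics and memory barriers;
 * this single-threaded model omits them. */
struct ring {
    unsigned head;                 /* next slot the producer writes */
    unsigned tail;                 /* next slot the consumer reads */
    struct pkt *slots[RING_SIZE];
};

static int ring_enqueue(struct ring *r, struct pkt *p)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;                 /* full */
    r->slots[r->head & (RING_SIZE - 1)] = p;
    r->head++;
    return 0;
}

static struct pkt *ring_dequeue(struct ring *r)
{
    if (r->head == r->tail)
        return NULL;               /* empty */
    struct pkt *p = r->slots[r->tail & (RING_SIZE - 1)];
    r->tail++;
    return p;
}

/* Runs in the primary, which registered it, so the function pointer is
 * valid where rx_burst calls it. Its only link to the capture process is
 * 'arg' (the ring): data, not code. */
static void pdump_rx_cb(struct pkt **pkts, unsigned n, void *arg)
{
    struct ring *r = arg;

    for (unsigned i = 0; i < n; i++) {
        /* malloc stands in for allocating the clone from the shared mempool */
        struct pkt *copy = malloc(sizeof(*copy));
        if (copy == NULL)
            continue;
        memcpy(copy, pkts[i], sizeof(*copy));
        if (ring_enqueue(r, copy) != 0)
            free(copy);            /* ring full: drop the copy, not the packet */
    }
}
```

The capture process would simply loop on ring_dequeue() and write out
whatever it gets; when the ring is full, only the clone is dropped, so
the primary's datapath is never blocked by a slow capture process.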

> An alternative could be to pass the packets through a ring in shared memory. 
> However, this method would add the ring processing latency of process B to 
> the RX/TX latency of process A.
> 
> I think we can conclude that callbacks are one of the things that don't work 
> with secondary processes.
> 
> With this decided, we can then consider how to best add packet capture. The 
> concept of passing "data" (instead of calling functions) across processes 
> obviously applies to this use case.
> 
>>
>> An example sequence would be:
>>      1. dumpcap (or pdump) as secondary tells pdump in primary to
>>         register callback
>>      2. secondary process calls rx_burst.
>>      3. rx_burst sees the callback but it has pointer pdump_rx which
>>         is not necessarily at same location in primary and secondary
>>         process.
>>      4. indirect function call in secondary to bad location likely
>>         causes crash.
>>
>> Some possible workarounds.
>>      1. Keep callback list per-process: messy, but won't crash.
>>         Capture won't work without other changes. In this case the
>>         primary would register the callback, but secondaries would
>>         not use it in rx/tx burst.
>>
>>      2. Replace use of rx/tx callback in pdump with a change to
>>         rte_ethdev to have a capture flag (i.e. don't use
>>         indirection). Likely ABI problems. Basically, ignore the
>>         rx/tx callback mechanism. This is my preferred solution.
>>
>>      3. Some fix-up mechanism (in EAL mp support?) to have each
>>         process fix up its callback mechanism.
>>
>>      4. Do something in pdump_init to register the callback in the
>>         same process context (probably need callbacks to be
>>         per-process). Would mean the callback is always on,
>>         independent of capture being enabled.
>>
>>      5. Get rid of the indirect function call pointer, and replace it
>>         with an index into a static table of callback functions.
>>         Every process would have the same code (in this case
>>         pdump_rx) but at a different address. Requires all callbacks
>>         to be statically defined at build time.
>>
>> The existing rx/tx callback is not safe if rx/tx burst is called from
>> a different process than the one where the callback is registered.
>>
> 
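For reference, option 5 could look roughly like the sketch below. It is
plain C with hypothetical names (struct rx_cb_slot, rx_cb_table,
run_rx_callbacks()), not an existing ethdev API. Shared memory holds
only an index plus per-callback state; each process resolves the index
through its own copy of a static table, so the same entry lands on the
correct address in every process. It assumes all processes are built
from the same code so the table layout matches, which is the
build-time restriction Stephen mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf. */
struct pkt {
    size_t len;
};

/* What would live in shared memory: an index and plain data, no pointers
 * to code. */
struct rx_cb_slot {
    uint16_t id;        /* index into rx_cb_table, RX_CB_NONE if unused */
    uint64_t counter;   /* per-callback state */
};

typedef void (*rx_cb_fn)(struct rx_cb_slot *slot, struct pkt **pkts,
                         uint16_t n);

static void count_pkts_cb(struct rx_cb_slot *slot, struct pkt **pkts,
                          uint16_t n)
{
    (void)pkts;
    slot->counter += n;
}

static void count_bytes_cb(struct rx_cb_slot *slot, struct pkt **pkts,
                           uint16_t n)
{
    for (uint16_t i = 0; i < n; i++)
        slot->counter += pkts[i]->len;
}

enum rx_cb_id { RX_CB_NONE = 0, RX_CB_COUNT_PKTS, RX_CB_COUNT_BYTES, RX_CB_MAX };

/* Same contents in every process; the addresses it holds are whatever
 * they happen to be in the process reading it. */
static const rx_cb_fn rx_cb_table[RX_CB_MAX] = {
    [RX_CB_NONE]        = NULL,
    [RX_CB_COUNT_PKTS]  = count_pkts_cb,
    [RX_CB_COUNT_BYTES] = count_bytes_cb,
};

/* Would be called from rx_burst in whichever process does the burst. */
static void run_rx_callbacks(struct rx_cb_slot *slots, size_t nslots,
                             struct pkt **pkts, uint16_t n)
{
    for (size_t i = 0; i < nslots; i++) {
        uint16_t id = slots[i].id;
        if (id == RX_CB_NONE || id >= RX_CB_MAX)
            continue;   /* unused or out-of-range slot: skip, never jump */
        rx_cb_table[id](&slots[i], pkts, n);
    }
}
```

Registering a callback then only means writing an id into a slot, which
any process can do safely. Per-callback state (such as a pdump ring)
would still have to live in memory mapped at the same address in every
process.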
