On Mon, 8 Jan 2024 15:13:25 +0000 Konstantin Ananyev <konstantin.anan...@huawei.com> wrote:
> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in a secondary
> > process.
> >
> > The root cause is that the existing rx/tx callback model just doesn't
> > work unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> >
> > An example sequence would be:
> > 1. dumpcap (or pdump) as secondary tells pdump in primary to register
> >    a callback.
> > 2. The secondary process calls rx_burst.
> > 3. rx_burst sees the callback, but it has the pointer pdump_rx, which is
> >    not necessarily at the same location in the primary and secondary
> >    processes.
> > 4. The indirect function call in the secondary to a bad location likely
> >    causes a crash.
>
> As I remember, RX/TX callbacks were never intended to work across
> multiple processes.
> Right now RX/TX callbacks are private to the process; a different process
> simply should not see or execute them.
> I.e. the callback list is part of 'struct rte_eth_dev' itself, not of
> rte_eth_dev.data, which is shared between processes.
> It is normal that, for the same port/queue, you end up with a different
> list of callbacks in different processes.
> So, unless I am missing something, I don't see how we can end up with 3)
> and 4) from above:
> from my understanding, a secondary process will never see/call the
> primary's callbacks.
>
> About pdump itself, it has been a while since I last looked at it, but as
> I remember, to make it work, the server process has to call
> rte_pdump_init(), which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to
> call rte_pdump_init() itself, though I am not sure such an option is
> supported right now.
>
> > Some possible workarounds:
> > 1. Keep the callback list per-process: messy, but won't crash. Capture
> >    won't work without other changes. In this case the primary would
> >    register the callback, but secondaries would not use it in rx/tx
> >    burst.
> >
> > 2. Replace the use of rx/tx callbacks in pdump with a change to
> >    rte_ethdev to have a capture flag (i.e. don't use indirection).
> >    Likely ABI problems. Basically, ignore the rx/tx callback mechanism.
> >    This is my preferred solution.
>
> It is not only the capture flag, it is also what to do with the captured
> packets (copy? If yes, then where to? Examine? Drop? Do something else?).
> It is probably not the best choice to add all these things into the ethdev
> API.
>
> > 3. Some fix-up mechanism (in EAL mp support?) to have each process fix
> >    up its callback mechanism.
>
> Probably the easiest way to fix that is to pass rte_pdump_enable() extra
> information that would allow it to distinguish in which exact process
> (local or remote) we want to enable pdump functionality. Then it could act
> accordingly.
>
> > 4. Do something in pdump_init to register the callback in the same
> >    process context (probably needs callbacks to be per-process). This
> >    would mean the callback is always on, independent of whether capture
> >    is enabled.
> >
> > 5. Get rid of the indirect function call pointer, and replace it with an
> >    index into a static table of callback functions. Every process would
> >    have the same code (in this case pdump_rx) but at a different
> >    address. Requires all callbacks to be statically defined at build
> >    time.
>
> Doesn't look like a good approach - it will break many things.
>
> > The existing rx/tx callback is not safe if rx/tx burst is called from a
> > different process than the one where the callback is registered.

I have been looking into the best way to fix this, and the real answer is
not to use callbacks but to use a per-queue flag instead. The natural place
to put these is in rte_ethdev_driver. BUT this will mean an ABI breakage,
so it will have to wait for the 24.11 release. Sometimes fixing a design
flaw means an ABI change.
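To illustrate the per-queue flag idea, here is a minimal sketch in C. It is not the DPDK API: every `sketch_*` name is hypothetical, `struct sketch_shared_port_data` merely stands in for the shared per-port data (the role rte_eth_dev.data plays), and `sketch_capture()` stands in for whatever the capture code would do with the packets. The point it shows is that shared memory carries only a flag, while each process calls its own statically linked capture code directly, so no process ever jumps through a function address recorded by another process. It runs in a single process and does not set up real shared memory, so it only illustrates the data layout and the data-path check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue limit, for the sketch only. */
#define SKETCH_MAX_QUEUES 8

/*
 * Hypothetical stand-in for per-port data kept in shared memory.
 * It holds only plain flags, never a function pointer, so any
 * process can read it safely.
 */
struct sketch_shared_port_data {
	atomic_bool rx_capture[SKETCH_MAX_QUEUES];
};

/* Packets seen by this process's capture hook. */
static unsigned long sketch_captured;

/*
 * Process-local capture hook (hypothetical). Each process links its
 * own copy at its own address and calls it directly. Here it only
 * counts packets; copying or dropping them is left open.
 */
static void sketch_capture(uint16_t queue_id, void **pkts, uint16_t nb_pkts)
{
	(void)queue_id;
	(void)pkts;
	sketch_captured += nb_pkts;
}

/* Clear all capture flags; call once when the shared area is created. */
static void sketch_port_init(struct sketch_shared_port_data *shared)
{
	for (int q = 0; q < SKETCH_MAX_QUEUES; q++)
		atomic_init(&shared->rx_capture[q], false);
}

/* Control path: any process may toggle capture for a queue. */
static void sketch_set_rx_capture(struct sketch_shared_port_data *shared,
				  uint16_t queue_id, bool on)
{
	assert(queue_id < SKETCH_MAX_QUEUES);
	atomic_store_explicit(&shared->rx_capture[queue_id], on,
			      memory_order_release);
}

/*
 * Data path: stands in for rx_burst. A real driver would fill pkts;
 * here we pretend nb_pkts packets arrived, then test the shared flag
 * instead of walking a callback list.
 */
static uint16_t sketch_rx_burst(struct sketch_shared_port_data *shared,
				uint16_t queue_id, void **pkts,
				uint16_t nb_pkts)
{
	uint16_t nb_rx = nb_pkts;

	assert(queue_id < SKETCH_MAX_QUEUES);
	if (atomic_load_explicit(&shared->rx_capture[queue_id],
				 memory_order_acquire))
		sketch_capture(queue_id, pkts, nb_rx);
	return nb_rx;
}
```

The trade-off is one load and branch per burst on every queue, even when capture is off. As noted earlier in the thread, the flag alone does not decide what happens to the captured packets; in this sketch that policy stays inside the process-local hook.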