10/11/2021 15:37, Ananyev, Konstantin:
> 
> Hi Ferruh,
> 
> > >> 22/10/2021 23:14, Bing Zhao:
> > >>> In the function "eth_dev_fp_ops_reset", a structure assignment
> > >>> operation is used to reset one queue's callback functions, etc., but
> > >>> it is not thread safe.
> > >>>
> > >>> The structure assignment is not atomic; many instructions are
> > >>> generated. Since not all the fields are needed, the fields of
> > >>> "dummy_ops" that are not set explicitly will be 0s, based on the
> > >>> specification and compiler behavior. To make "fpo" have the same
> > >>> content as "dummy_ops", some zeroing operation is needed.
> > >>>
> > >>> By checking the object instructions (e.g. with GCC 4.8.5)
> > >>>     0x0000000000a58317 <+35>:   mov    %rsi,%rdi
> > >>>     0x0000000000a5831a <+38>:   mov    %rdx,%rcx
> > >>> => 0x0000000000a5831d <+41>:    rep stos %rax,%es:(%rdi)
> > >>>     0x0000000000a58320 <+44>:   mov    -0x38(%rsp),%rax
> > >>>     0x0000000000a58325 <+49>:   lea    -0xe0(%rip),%rdx
> > >>>          // # 0xa5824c <dummy_eth_rx_burst>
> > >>>
> > >>> It shows that "rep stos" will clear the "fpo" structure before
> > >>> assigning new values.
> > >>>
> > >>> If some data path Tx / Rx functions are still running in another
> > >>> thread, they risk reading 0 instead of the correct dummy
> > >>> content:
> > >>>    1. qd = p->rxq.data[queue_id]
> > >>>    2. (void **)&p->rxq.clbk[queue_id]
> > >>> "data" and "clbk" may be observed as NULL (0) in other threads.
> > >>> Even if this is temporary, accessing a NULL pointer will cause a
> > >>> crash. Using "memcpy" could get rid of this.
> > >>>
> > >>> Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
> > >>> Cc: konstantin.anan...@intel.com
> > >>>
> > >>> Signed-off-by: Bing Zhao <bi...@nvidia.com>
> > >>> ---
> > >>> --- a/lib/ethdev/ethdev_private.c
> > >>> +++ b/lib/ethdev/ethdev_private.c
> > >>> @@ -206,7 +206,7 @@ eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
> > >>>                 .txq = {.data = dummy_data, .clbk = dummy_data,},
> > >>>         };
> > >>>
> > >>> -       *fpo = dummy_ops;
> > >>> +       rte_memcpy(fpo, &dummy_ops, sizeof(struct rte_eth_fp_ops));
> > >>
> > >> That's not trivial.
> > >> Please add a comment to briefly explain that memcpy avoids zeroing of a 
> > >> simple assignment.
> > >>
> > >
> > > I think that patch is based on two totally wrong assumptions:
> > > 1) ethdev data-path and control-path API is MT-safe.
> > >      With the current design it is not.
> > >      When calling rx/tx_burst, it is the caller's responsibility to
> > >      make sure that the given port is already properly configured
> > >      and started. It is also the user's responsibility to guarantee
> > >      that no other thread is doing dev_stop for the same port
> > >      simultaneously.
> > >      And vice versa: when calling dev_stop(), it is the user's
> > >      responsibility to ensure that no other thread is doing
> > >      rx/tx_burst for the given port simultaneously.
> > >      If your app doesn't follow these principles, then it is a bug
> > >      that needs to be fixed.
> > > 2) rte_memcpy() provides some sort of atomicity and it is safe to
> > >      use it on its own in an MT environment. That's totally wrong.
> > >      In both cases the compiler has total freedom to perform the copy
> > >      in any order it likes (say, it can first read the whole source
> > >      data into some temporary buffer (SIMD register) and then write
> > >      it in one go, or it can do the same trick with 'rep stos' as
> > >      above).
> > >      Moreover, the CPU itself can reorder instructions.
> > >      So if you need this copy to be atomic, you need to use some
> > >      sort of sync primitives along with it (mutex, rwlock, rcu, etc.).
> > >      But as I said above, right now the ethdev API is not MT-safe, so
> > >      it is not required.
> > >
> > > To summarise: there is no point in making these changes,
> > > and the patch comment is wrong and misleading.
> > 
> > Can we mark this patch as rejected now?
> 
> I believe so.
> 
> > The patch seems to be trying to cover incorrect application usage,
> > which should be addressed at the application level.

Yes

