> 22/10/2021 23:14, Bing Zhao:
> > In the function "eth_dev_fp_ops_reset", a structure assignment
> > operation is used to reset one queue's callback functions, etc., but
> > it is not thread safe.
> >
> > The structure assignment is not atomic, a lot of instructions will
> > be generated. Right now, since not all the fields are needed, the
> > fields in the "dummy_ops" which is not set explicitly will be 0s
> > based on the specification and compiler behavior. In order to make
> > "fpo" has the same content with "dummy_ops", some clearing to 0
> > operation is needed.
> >
> > By checking the object instructions (e.g. with GCC 4.8.5)
> > 0x0000000000a58317 <+35>: mov %rsi,%rdi
> > 0x0000000000a5831a <+38>: mov %rdx,%rcx
> > => 0x0000000000a5831d <+41>: rep stos %rax,%es:(%rdi)
> > 0x0000000000a58320 <+44>: mov -0x38(%rsp),%rax
> > 0x0000000000a58325 <+49>: lea -0xe0(%rip),%rdx
> > // # 0xa5824c <dummy_eth_rx_burst>
> >
> > It shows that "rep stos" will clear the "fpo" structure before
> > assigning new values.
> >
> > In the other thread, if some data path Tx / Rx functions are still
> > running, there is a risk to get 0 instead of the correct dummy
> > content.
> > 1. qd = p->rxq.data[queue_id]
> > 2. (void **)&p->rxq.clbk[queue_id]
> > "data" and "clbk" may be observed with NULL (0) in other threads.
> > Even it is temporary, the accessing to a NULL pointer will cause a
> > crash. Using "memcpy" could get rid of this.
> >
> > Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
> > Cc: konstantin.anan...@intel.com
> >
> > Signed-off-by: Bing Zhao <bi...@nvidia.com>
> > ---
> > --- a/lib/ethdev/ethdev_private.c
> > +++ b/lib/ethdev/ethdev_private.c
> > @@ -206,7 +206,7 @@ eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
> > .txq = {.data = dummy_data, .clbk = dummy_data,},
> > };
> >
> > - *fpo = dummy_ops;
> > + rte_memcpy(fpo, &dummy_ops, sizeof(struct rte_eth_fp_ops));
>
> That's not trivial.
> Please add a comment to briefly explain that memcpy avoids zeroing of a
> simple assignment.
>
I think that patch is based on two totally wrong assumptions:
1) ethdev data-path and control-path API is MT-safe.
With the current design it is not.
When calling rx/tx_burst, it is the caller's responsibility to make sure
that the given port is already properly configured and started. It is
also the user's responsibility to guarantee that no other thread is doing
dev_stop for the same port simultaneously.
And vice versa: when calling dev_stop(), it is the user's responsibility
to ensure that no other thread is doing rx/tx_burst for the given port
simultaneously.
If your app doesn't follow these principles, then it is a bug that needs
to be fixed.
2) rte_memcpy() provides some sort of atomicity and it is safe to use it
on its own in an MT environment. That's totally wrong.
In both cases (struct assignment and rte_memcpy) the compiler has total
freedom to perform the copy in any order it likes (say, it can first read
the whole source data into some temporary buffer (SIMD register) and then
write it in one go, or it can do the same trick with 'rep stos' as above).
Moreover, the CPU itself can reorder instructions.
So if you need this copy to be atomic, you need to use some sort of
sync primitive along with it (mutex, rwlock, RCU, etc.).
But as I said above, right now the ethdev API is not MT-safe, so it is
not required.
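For illustration only, here is a minimal sketch of what "some sort of
sync primitive" could look like if an application really needed it.
It is plain C with a pthread mutex. struct fp_ops_sketch, ops_lock and the
*_locked helpers are made-up names for this example, not ethdev API, and
ethdev does nothing like this today. A mutex is used only for brevity;
a rwlock or RCU would be the more realistic choice on a data path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_eth_fp_ops (not the real layout). */
typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t n);

struct fp_ops_sketch {
	burst_fn rx_burst;
	void *rxq_data;
};

/* Dummy callback: receives nothing, like the dummy ops in the patch. */
static uint16_t dummy_rx_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue; (void)pkts; (void)n;
	return 0;
}

/* Hypothetical "real" callback, used only to exercise the sketch. */
static uint16_t echo_rx_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue; (void)pkts;
	return n;
}

static pthread_mutex_t ops_lock = PTHREAD_MUTEX_INITIALIZER;
static int dummy_queue;
static struct fp_ops_sketch ops = { dummy_rx_burst, &dummy_queue };

/* Control path: the whole multi-word copy happens under the lock. */
static void fp_ops_reset_locked(void)
{
	const struct fp_ops_sketch dummy = {
		.rx_burst = dummy_rx_burst,
		.rxq_data = &dummy_queue,
	};

	pthread_mutex_lock(&ops_lock);
	ops = dummy; /* may be compiled as clear-then-fill; readers are locked out */
	pthread_mutex_unlock(&ops_lock);
}

static void fp_ops_install_locked(burst_fn fn, void *queue)
{
	pthread_mutex_lock(&ops_lock);
	ops.rx_burst = fn;
	ops.rxq_data = queue;
	pthread_mutex_unlock(&ops_lock);
}

/* Data path: readers take the same lock, so they never see a half copy. */
static uint16_t rx_burst_locked(void **pkts, uint16_t n)
{
	uint16_t ret;

	pthread_mutex_lock(&ops_lock);
	ret = ops.rx_burst(ops.rxq_data, pkts, n);
	pthread_mutex_unlock(&ops_lock);
	return ret;
}
```

Note that the lock has to be taken on both sides. Making only the writer
"careful" (with memcpy or anything else) gives the readers no guarantee.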
To summarise: there is no point in making these changes,
and the patch comment is wrong and misleading.
Konstantin