On 10/23/2021 12:39 PM, Ananyev, Konstantin wrote:
22/10/2021 23:14, Bing Zhao:
In the function "eth_dev_fp_ops_reset", a structure assignment
operation is used to reset one queue's callback functions, etc., but
it is not thread safe.
The structure assignment is not atomic, a lot of instructions will
be generated. Right now, since not all the fields are needed, the
fields in the "dummy_ops" which is not set explicitly will be 0s
based on the specification and compiler behavior. In order to make
"fpo" has the same content with "dummy_ops", some clearing to 0
operation is needed.
By checking the object instructions (e.g. with GCC 4.8.5)
0x0000000000a58317 <+35>: mov %rsi,%rdi
0x0000000000a5831a <+38>: mov %rdx,%rcx
=> 0x0000000000a5831d <+41>: rep stos %rax,%es:(%rdi)
0x0000000000a58320 <+44>: mov -0x38(%rsp),%rax
0x0000000000a58325 <+49>: lea -0xe0(%rip),%rdx
// # 0xa5824c <dummy_eth_rx_burst>
It shows that "rep stos" will clear the "fpo" structure before
assigning new values.
In the other thread, if some data path Tx / Rx functions are still
running, there is a risk to get 0 instead of the correct dummy
content.
1. qd = p->rxq.data[queue_id]
2. (void **)&p->rxq.clbk[queue_id]
"data" and "clbk" may be observed with NULL (0) in other threads.
Even it is temporary, the accessing to a NULL pointer will cause a
crash. Using "memcpy" could get rid of this.
Fixes: c87d435a4d79 ("ethdev: copy fast-path API into separate structure")
Cc: konstantin.anan...@intel.com
Signed-off-by: Bing Zhao <bi...@nvidia.com>
---
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -206,7 +206,7 @@ eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
.txq = {.data = dummy_data, .clbk = dummy_data,},
};
- *fpo = dummy_ops;
+ rte_memcpy(fpo, &dummy_ops, sizeof(struct rte_eth_fp_ops));
That's not trivial.
Please add a comment to briefly explain that memcpy avoids zeroing of a simple
assignment.
I think that patch is based on two totally wrong assumptions:
1) ethdev data-path and control-path API is MT-safe.
With current design it is not.
When calling rx/tx_burst it is caller responsibility to make sure that
given port is
already properly configured and started. Also it is user responsibility to
guarantee
that none other thread doing dev_stop for the same port simultaneously.
And visa-versa when calling dev_stop(), it is user responsibility to
ensure that
none other thread doing rx/tx_burst for given port simultaneously.
If your app doesn't follow these principles, then it is a bug that needs
to be fixed.
2) rte_memcpy() provides some sort of atomicity and it is safe to use it on its
own
in MT environment. That's totally wrong.
In both cases compiler has total freedom to perform copy in any order it
likes
(let say it can first read whole source data in some temporary buffer
(SIMD register),
and then right it in one go, or it can do the same trick with 'rep stos'
as above).
Moreover CPU itself can reorder instructions.
So if you need this copy to be atomic you need to use some sort of
sync primitives along with it (mutex, rwlock, rcu, etc.).
But as I said above right now ethdev API is not MT-safe, so it is not
required.
To summarise - there is no point to mae these changes,
and patch comment is wrong and misleading.
Can we mark this patch as rejected now?
Patch seems trying to cover a wrong application usage, and it should
be addressed in the application level.