> From: Phil Yang <phil.y...@arm.com>
> Sent: Tuesday, March 17, 2020 1:18 AM
> To: tho...@monjalon.net; Van Haaren, Harry <harry.van.haa...@intel.com>;
> Ananyev, Konstantin <konstantin.anan...@intel.com>;
> step...@networkplumber.org; maxime.coque...@redhat.com; dev@dpdk.org
> Cc: david.march...@redhat.com; jer...@marvell.com; hemant.agra...@nxp.com;
> honnappa.nagaraha...@arm.com; gavin...@arm.com; ruifeng.w...@arm.com;
> joyce.k...@arm.com; n...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; sta...@dpdk.org
> Subject: [PATCH v3 09/12] service: avoid race condition for MT unsafe service
>
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
>
> It is possible that an MT unsafe service gets configured to run on
> another core while the service is currently running. This could result
> in the MT unsafe service running on multiple cores simultaneously.
> Take 'execute_lock' unconditionally when the service is MT unsafe.
>
> Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Reviewed-by: Phil Yang <phil.y...@arm.com>
> Reviewed-by: Gavin Hu <gavin...@arm.com>
We should put "fix" in the title, once we converge on an implementation.

Regarding the Fixes tag and stable backport: we should consider which is the
better solution for stable - fixing this with a performance degradation,
fixing it with a more complex change, or documenting it as a known issue.

This fix (always taking the atomic lock) will have a negative performance
impact on existing code using services. We should investigate a way to fix
it without degrading datapath performance. I think this is achievable by
moving more checks/time to the control path (the lcore updating the map),
instead of forcing the datapath lcore to always take an atomic.

In this particular case, we have a counter for the number of iterations a
service has done. If it increments, we know the lcore running the service
has re-entered the critical section, so it must have seen an updated
"needs atomic" flag.

This approach introduces a predictable branch on the datapath; however, the
cost of a predictable branch vs always taking an atomic is order(s?) of
magnitude, so the branch is much preferred. It must be possible to avoid
the datapath overhead using a scheme like this. It will likely be more
complex than your proposed change below, but if it avoids datapath
performance drops, I feel a more complex solution is worth investigating
at least.

A unit test is required to validate a fix like this - although the bug was
perhaps found by inspection/review, a real-world test to validate would
give confidence in the fix.

Thoughts on such an approach?
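To make the idea concrete, here is a rough sketch of the counter-based
scheme, modelled with plain C11 atomics instead of the rte_atomic API; the
names 'needs_lock' and 'quiesced' are invented for illustration and are not
part of rte_service:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical simplified model, not the actual DPDK implementation. */
struct service {
	atomic_uint_fast64_t calls; /* bumped once per service_run() iteration */
	atomic_int needs_lock;      /* set by control path when >1 core mapped */
	atomic_int execute_lock;
};

/* datapath: a predictable branch, atomic CAS only when actually needed */
static int service_run(struct service *s)
{
	if (atomic_load_explicit(&s->needs_lock, memory_order_acquire)) {
		int expected = 0;
		if (!atomic_compare_exchange_strong(&s->execute_lock,
						    &expected, 1))
			return -1; /* -EBUSY: another lcore is running it */
		/* ... invoke the service callback ... */
		atomic_store_explicit(&s->execute_lock, 0,
				      memory_order_release);
	} else {
		/* ... invoke the service callback, no lock required ... */
	}
	atomic_fetch_add_explicit(&s->calls, 1, memory_order_release);
	return 0;
}

/* control path: an increment of 'calls' after needs_lock was published
 * proves the service lcore re-entered service_run() and saw the flag */
static int quiesced(struct service *s, uint64_t calls_snapshot)
{
	return atomic_load_explicit(&s->calls, memory_order_acquire)
			!= calls_snapshot;
}
```

The control path would snapshot 'calls' before publishing a new
core-to-service mapping, then wait until quiesced() returns true before
treating the mapping as active.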
> ---
>  lib/librte_eal/common/rte_service.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/common/rte_service.c
> b/lib/librte_eal/common/rte_service.c
> index 557b5a9..32a2f8a 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -50,6 +50,10 @@ struct rte_service_spec_impl {
>  	uint8_t internal_flags;
>
>  	/* per service statistics */
> +	/* Indicates how many cores the service is mapped to run on.
> +	 * It does not indicate the number of cores the service is running
> +	 * on currently.
> +	 */
>  	rte_atomic32_t num_mapped_cores;
>  	uint64_t calls;
>  	uint64_t cycles_spent;
> @@ -370,12 +374,7 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask,
>
>  	cs->service_active_on_lcore[i] = 1;
>
> -	/* check do we need cmpset, if MT safe or <= 1 core
> -	 * mapped, atomic ops are not required.
> -	 */
> -	const int use_atomics = (service_mt_safe(s) == 0) &&
> -		(rte_atomic32_read(&s->num_mapped_cores) > 1);
> -	if (use_atomics) {
> +	if (service_mt_safe(s) == 0) {
>  		if (!rte_atomic32_cmpset((uint32_t *)&s->execute_lock, 0, 1))
>  			return -EBUSY;
>
> --
> 2.7.4
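On the unit test point above, a sketch of what such a test could look like,
with plain C11 atomics and pthreads standing in for the DPDK service/lcore
APIs (all names here are invented for illustration): two threads hammer an
MT unsafe "service" and the callback detects any concurrent entry.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

/* With a correct execute_lock, 'overlap' must stay 0. */
static atomic_int execute_lock;
static atomic_int in_callback;
static atomic_int overlap;

static void service_cb(void)
{
	if (atomic_fetch_add(&in_callback, 1) != 0)
		atomic_store(&overlap, 1); /* two threads inside at once */
	for (volatile int i = 0; i < 1000; i++)
		;                          /* widen the race window */
	atomic_fetch_sub(&in_callback, 1);
}

static void *runner(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		int expected = 0;
		if (!atomic_compare_exchange_strong(&execute_lock,
						    &expected, 1))
			continue;          /* -EBUSY: skip this iteration */
		service_cb();
		atomic_store(&execute_lock, 0);
	}
	return NULL;
}

/* returns 0 when no concurrent execution of the callback was observed */
static int mt_unsafe_race_test(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, NULL, runner, NULL);
	pthread_create(&t2, NULL, runner, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return atomic_load(&overlap);
}
```

A real test would of course go through rte_service_run_iter_on_app_lcore()
and the lcore map rather than a hand-rolled lock, but the overlap-detection
callback is the part that gives confidence.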