On 20.07.2020 at 14:09, Harry van Haaren wrote:
> This commit improves the service_lcore_stop() implementation,
> waiting for the service core in question to return. The service
> thread itself now has a variable to indicate if its thread is
> active. When zero the service thread has completed its service,
> and has returned from the service_runner_func() function.
>
> This fixes a race condition observed in the DPDK CI, where the
> statistics of the service were not consistent with the expectation
> due to the service thread still running, and incrementing a stat
> after stop was called.
>
> Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
>
> ---
>
> This is one possible solution, that avoids a class of race-conditions
> based on stop() api and following behaviours. Without a change in
> implementation of the service core thread, we could not detect when
> the thread was actually finished. This is now possible, and the stop
> api makes use of it to wait for 1000x one millisecond, or log a warning
> that a service core didn't return quickly.
>
> Thanks for the discussion/debug on list - I'm not sure how to add
> reported-by/suggested-by etc tags: but I'll resend a V2 (or can add
> on apply).
>
> ---
>  lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> index 6a0e0ff65..d2255587d 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -65,6 +65,7 @@ struct core_state {
>  	/* map of services IDs are run on this core */
>  	uint64_t service_mask;
>  	uint8_t runstate; /* running or stopped */
> +	uint8_t thread_active; /* indicates when the thread is in service_run() */
>  	uint8_t is_service_core; /* set if core is currently a service core */
>  	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
>  	uint64_t loops;
> @@ -457,6 +458,8 @@ service_runner_func(void *arg)
>  	const int lcore = rte_lcore_id();
>  	struct core_state *cs = &lcore_states[lcore];
>
> +	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> +
>  	/* runstate act as the guard variable. Use load-acquire
>  	 * memory order here to synchronize with store-release
>  	 * in runstate update functions.
> @@ -475,6 +478,7 @@ service_runner_func(void *arg)
>  		cs->loops++;
>  	}
>
> +	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
>  	return 0;
>  }
>
> @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
>  	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
>  		__ATOMIC_RELEASE);
>
> +	/* wait for service lcore to return */
> +	i = 0;
> +	uint8_t active;
> +	uint64_t start = rte_rdtsc();
> +	do {
> +		active = __atomic_load_n(&lcore_states[lcore].thread_active,
> +				__ATOMIC_RELAXED);
> +		if (active == 0)
> +			break;
> +		rte_delay_ms(1);
> +		i++;
> +	} while (i < 1000);
> +
> +	if (active != 0) {
> +		uint64_t end = rte_rdtsc();
> +		RTE_LOG(WARNING, EAL,
> +			"service lcore stop() failed, waited for %ld cycles\n",
> +			end - start);
> +	}
> +
>  	return 0;
>  }

I don't like the idea of inserting this polling loop inside an API call,
and I don't like hard-coding a limit of 1000 iterations.

How about keeping the thread_active flag, but moving the check of this
flag into a separate function? That way the user of the API would be
able to write their own loop. They might want a custom loop because
they:
* are waiting for more than one core
* would like to wait longer
* would like to check whether the service has finished less often...
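
Just to illustrate the idea, a minimal sketch (the function name
rte_service_lcore_may_be_active() and its return convention are my
invention here, not part of the posted patch). Inside rte_service.c
it could look something like:

/* Hypothetical helper: report whether the service thread on the given
 * lcore is still inside service_runner_func(). Returns 1 while the
 * thread is active, 0 once it has returned, -EINVAL for a bad lcore.
 */
int32_t
rte_service_lcore_may_be_active(uint32_t lcore)
{
	if (lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
		return -EINVAL;

	/* Relaxed ordering is enough: the caller just polls until the
	 * flag drops to zero after calling rte_service_lcore_stop().
	 */
	return __atomic_load_n(&lcore_states[lcore].thread_active,
			__ATOMIC_RELAXED);
}

Then the application writes whatever loop fits its needs, e.g.:

	rte_service_lcore_stop(lcore);
	while (rte_service_lcore_may_be_active(lcore) == 1)
		rte_delay_ms(1); /* or a longer delay, or poll several
				  * cores in one pass */

and stop() itself stays non-blocking, with no iteration limit baked
into the library.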
-- 
Lukasz Wojciechowski
Principal Software Engineer
Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciec...@partner.samsung.com