> -----Original Message----- > From: Lukasz Wojciechowski <l.wojciec...@partner.samsung.com> > Sent: Monday, July 20, 2020 1:52 PM > To: Van Haaren, Harry <harry.van.haa...@intel.com>; dev@dpdk.org > Cc: david.march...@redhat.com; igor.roma...@oktetlabs.ru; > honnappa.nagaraha...@arm.com; Yigit, Ferruh <ferruh.yi...@intel.com>; > n...@arm.com; acon...@redhat.com > Subject: Re: [PATCH] service: fix stop API to wait for service thread > > > W dniu 20.07.2020 o 14:09, Harry van Haaren pisze: > > This commit improves the service_lcore_stop() implementation, > > waiting for the service core in question to return. The service > > thread itself now has a variable to indicate if its thread is > > active. When zero the service thread has completed its service, > > and has returned from the service_runner_func() function. > > > > This fixes a race condition observed in the DPDK CI, where the > > statistics of the service were not consistent with the expectation > > due to the service thread still running, and incrementing a stat > > after stop was called. > > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com> > > > > --- > > > > This is one possible solution, that avoids a class of race-conditions > > based on stop() api and following behaviours. Without a change in > > implementation of the service core thread, we could not detect when > > the thread was actually finished. This is now possible, and the stop > > api makes use of it to wait for 1000x one millisecond, or log a warning > > that a service core didn't return quickly. > > > > Thanks for the discussion/debug on list - I'm not sure how to add > > reported-by/suggested-by etc tags: but I'll resend a V2 (or can add > > on apply). > > > > --- > > lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++ > > 1 file changed, 24 insertions(+) > > > > diff --git a/lib/librte_eal/common/rte_service.c > b/lib/librte_eal/common/rte_service.c > > index 6a0e0ff65..d2255587d 100644 > > --- a/lib/librte_eal/common/rte_service.c > > +++ b/lib/librte_eal/common/rte_service.c > > @@ -65,6 +65,7 @@ struct core_state { > > /* map of services IDs are run on this core */ > > uint64_t service_mask; > > uint8_t runstate; /* running or stopped */ > > + uint8_t thread_active; /* indicates when the thread is in service_run() > > */ > > uint8_t is_service_core; /* set if core is currently a service core */ > > uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX]; > > uint64_t loops; > > @@ -457,6 +458,8 @@ service_runner_func(void *arg) > > const int lcore = rte_lcore_id(); > > struct core_state *cs = &lcore_states[lcore]; > > > > + __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED); > > + > > /* runstate act as the guard variable. Use load-acquire > > * memory order here to synchronize with store-release > > * in runstate update functions. > > @@ -475,6 +478,7 @@ service_runner_func(void *arg) > > cs->loops++; > > } > > > > + __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED); > > return 0; > > } > > > > @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore) > > __atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED, > > __ATOMIC_RELEASE); > > > > + /* wait for service lcore to return */ > > + i = 0; > > + uint8_t active; > > + uint64_t start = rte_rdtsc(); > > + do { > > + active = __atomic_load_n(&lcore_states[lcore].thread_active, > > + __ATOMIC_RELAXED); > > + if (active == 0) > > + break; > > + rte_delay_ms(1); > > + i++; > > + } while (i < 1000); > > + > > + if (active != 0) { > > + uint64_t end = rte_rdtsc(); > > + RTE_LOG(WARNING, EAL, > > + "service lcore stop() failed, waited for %ld cycles\n", > > + end - start); > > + } > > + > > return 0; > > } > > > I don't like the idea of inserting this polling loop inside API call. > And I don't like setting up a 1000 iterations constraint. > How about keeping the thread_active flag, but moving checking state of > this flag to separate function. This way the user of the API would be > able to write own loop. > Maybe he/she would like a custom loop, because: > * waiting for more cores > * would like to wait longer > * would like to check if service is finished less often...
Agree - good feedback, thanks. v2 on the way, with this approach.