> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Saturday, April 18, 2020 7:22 AM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>; Phil Yang
> <phil.y...@arm.com>; tho...@monjalon.net; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; step...@networkplumber.org;
> maxime.coque...@redhat.com; dev@dpdk.org
> Cc: david.march...@redhat.com; jer...@marvell.com;
> hemant.agra...@nxp.com; Gavin Hu <gavin...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Joyce Kong <joyce.k...@arm.com>; nd
> <n...@arm.com>; sta...@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH v3 09/12] service: avoid race condition for MT unsafe
> service
>
> <snip>
>
> > > > To achieve our desired goal;
> > > > - control thread doing mapping performs the following operations
> > > > -- write service->num_mapped_lcores++ (atomic not required, only
> > > > single-writer allowed by APIs)
> > > This has to be atomic because of the
> > > rte_service_run_iter_on_app_lcore API. Performance should be fine, as
> > > that API is not called frequently, but we need to consider the
> > > implications of more than one thread updating num_mapped_cores.
> > >
> > > > -- MFENCE (full st-ld barrier) to flush write, and force later
> > > > loads to issue after
> > > I am not exactly sure what MFENCE on x86 does. On Arm platforms, the
> > > full barrier (DMB ISH) just makes sure that memory operations are not
> > > re-ordered around it. It does not say anything about when that store
> > > is visible to other cores; it will be visible at some point in time.
> > > But I do not think we need to be worried about flushing to memory.
> > >
> > > > -- read the "calls_per_service" counter for each lcore, add them up
> > > This can be trimmed down to the single core the service is mapped to
> > > currently, no need to add all the counters.
> >
> > Correct - however that requires figuring out which lcore is running the
> > service. Anyway, agreed - it's an implementation detail as to exactly
> > how we detect it.
> >
> > > > ---- Wait :)
> > > > -- re-read the "calls_per_service", and ensure the count has
> > > > changed.
>
> Here, there is an assumption that the service core function is running
> on the service core. If the service core is not running, the code will
> be stuck in this polling loop.

Right - we could add a timeout here, but that just moves the problem
somewhere else: the application now needs to handle error returns, and
possibly retries. It's a possible solution, but I'm not in favour of it
at the moment - it needs some more time to think about.
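To make it concrete, below is roughly the control-thread sequence we are
discussing, with a timeout bolted on. A sketch only - the struct and
field names here are invented stand-ins, not the real service library
internals:

/*
 * Sketch: control-thread side of the handshake. Bump the mapped-core
 * count, issue a full barrier, then poll the service core's call
 * counter until it advances. The timeout stops a non-running service
 * core from wedging the control thread forever.
 */
#include <errno.h>
#include <stdint.h>

#include <rte_atomic.h>
#include <rte_cycles.h>
#include <rte_lcore.h>
#include <rte_pause.h>

/* hypothetical stand-in for the internal service struct */
struct service_spec {
	rte_atomic32_t num_mapped_cores;
	uint64_t calls_per_service[RTE_MAX_LCORE];
};

static int
wait_for_service_progress(struct service_spec *s, uint32_t lcore,
		uint64_t timeout_cycles)
{
	/* atomic inc: rte_service_run_iter_on_app_lcore() means the
	 * control thread is not the only writer after all */
	rte_atomic32_inc(&s->num_mapped_cores);

	/* full st-ld barrier: make the store above visible before the
	 * counter loads below are issued */
	rte_smp_mb();

	uint64_t start = __atomic_load_n(&s->calls_per_service[lcore],
			__ATOMIC_RELAXED);
	uint64_t deadline = rte_get_timer_cycles() + timeout_cycles;

	/* wait for the service core to complete at least one iteration */
	while (__atomic_load_n(&s->calls_per_service[lcore],
			__ATOMIC_RELAXED) == start) {
		if (rte_get_timer_cycles() > deadline)
			return -ETIMEDOUT; /* service core not running? */
		rte_pause();
	}
	return 0;
}

The -ETIMEDOUT path is exactly the error-return-and-retry burden on the
application mentioned above, which is why I'm lukewarm on it.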
> I could not come up with a good way to check if the service core is
> running. Checking the app_runstate and comp_runstate is not enough, as
> they just indicate that the service is ready to run. Using the counter
> 'calls_per_service' introduces race conditions.
>
> The only way I can think of is asking the user to follow a specific
> sequence of APIs to ensure the service core is running before calling
> rte_service_map_lcore_set.

Good point - I'm still thinking about this, and haven't come to an
obvious conclusion yet. I'm considering other ways to detect whether the
core is/isn't running, and also considering temporarily "hijacking" the
service function pointer with a CAS, which gives some new options on
keeping threads out of the critical section. As above, I don't have a
good solution yet, but a rough sketch of the hijack idea is below the
snip.

<snip irrelevant to above discussion stuff>
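For illustration, the hijack idea looks something like this - again just
a sketch with invented names, and note it still spins forever if the
service core is not running, so it doesn't solve that problem by itself:

/*
 * The control thread CASes a trampoline in place of the real service
 * callback. The first call of the trampoline proves the service core
 * is alive and has re-loaded the function pointer since the swap; for
 * an MT-unsafe service only one core runs the callback, so at that
 * point no thread is inside the real callback.
 */
#include <stdint.h>

#include <rte_pause.h>

typedef int32_t (*service_func_t)(void *args);

/* hypothetical stand-in for the internal service struct */
struct service_spec {
	service_func_t callback;
};

static int trampoline_hit;

static int32_t
trampoline_cb(void *args)
{
	(void)args;
	__atomic_store_n(&trampoline_hit, 1, __ATOMIC_RELEASE);
	return 0;
}

static void
hijack_and_wait(struct service_spec *s)
{
	service_func_t orig = s->callback;

	__atomic_store_n(&trampoline_hit, 0, __ATOMIC_RELAXED);

	/* CAS rather than a plain store, so a concurrent unregister or a
	 * second control thread can't race us on the same pointer */
	if (!__atomic_compare_exchange_n(&s->callback, &orig, trampoline_cb,
			0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		return;

	/* same caveat as the counter approach: spins forever if the
	 * service core never runs */
	while (__atomic_load_n(&trampoline_hit, __ATOMIC_ACQUIRE) == 0)
		rte_pause();

	/* safe point: do the remapping work, then restore the callback */
	__atomic_store_n(&s->callback, orig, __ATOMIC_RELEASE);
}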