> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Saturday, April 18, 2020 7:22 AM
> To: Van Haaren, Harry <harry.van.haa...@intel.com>; Phil Yang
> <phil.y...@arm.com>; tho...@monjalon.net; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; step...@networkplumber.org;
> maxime.coque...@redhat.com; dev@dpdk.org
> Cc: david.march...@redhat.com; jer...@marvell.com;
> hemant.agra...@nxp.com; Gavin Hu <gavin...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Joyce Kong <joyce.k...@arm.com>; nd
> <n...@arm.com>; sta...@dpdk.org; nd <n...@arm.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH v3 09/12] service: avoid race condition for MT unsafe
> service
>
> <snip>
>
> > > > To achieve our desired goal;
> > > > - control thread doing mapping performs the following operations
> > > > -- write service->num_mapped_lcores++ (atomic not required, only
> > > > single-writer allowed by APIs)
> > > This has to be atomic because of the
> > > rte_service_run_iter_on_app_lcore API. Performance should be fine, as
> > > that API is not called frequently, but we need to consider the
> > > implications of more than one thread updating num_mapped_cores.
> > >
> > > > -- MFENCE (full st-ld barrier) to flush write, and force later
> > > > loads to issue after
> > > I am not exactly sure what MFENCE on x86 does. On Arm platforms, the
> > > full barrier (DMB ISH) just makes sure that memory operations are not
> > > re-ordered around it. It does not say anything about when that store
> > > is visible to other cores; it will be visible at some point in time.
> > > But I do not think we need to be worried about flushing to memory.
> > >
> > > > -- read the "calls_per_service" counter for each lcore, add them up
> > > This can be trimmed down to the single core the service is mapped to
> > > currently, no need to add all the counters.
> >
> > Correct - however that requires figuring out which lcore is running the
> > service. Anyway, agreed - it's an implementation detail as to exactly
> > how we detect it.
> >
> > > > ---- Wait :)
> > > > -- re-read the "calls_per_service", and ensure the count has
> > > > changed.
>
> Here, there is an assumption that the service core function is running
> on the service core. If the service core is not running, the code will
> be stuck in this polling loop.

Right - we could add a timeout here, but that just moves the problem
somewhere else: the application now needs to handle error returns, and
possibly retries. It's a possible solution, but I'm not in favour of it
at the moment - it needs some more time to think about.
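To make it concrete, below is roughly the control-thread sequence we are
discussing, with a timeout bolted on. A sketch only - the struct and
field names here are invented stand-ins, not the real service library
internals:

/*
 * Sketch: control-thread side of the handshake. Bump the mapped-core
 * count, issue a full barrier, then poll the service core's call
 * counter until it advances. The timeout stops a non-running service
 * core from wedging the control thread forever.
 */
#include <errno.h>
#include <stdint.h>

#include <rte_atomic.h>
#include <rte_cycles.h>
#include <rte_lcore.h>
#include <rte_pause.h>

/* hypothetical stand-in for the internal service struct */
struct service_spec {
	rte_atomic32_t num_mapped_cores;
	uint64_t calls_per_service[RTE_MAX_LCORE];
};

static int
wait_for_service_progress(struct service_spec *s, uint32_t lcore,
		uint64_t timeout_cycles)
{
	/* atomic inc: rte_service_run_iter_on_app_lcore() means the
	 * control thread is not the only writer after all */
	rte_atomic32_inc(&s->num_mapped_cores);

	/* full st-ld barrier: make the store above visible before the
	 * counter loads below are issued */
	rte_smp_mb();

	uint64_t start = __atomic_load_n(&s->calls_per_service[lcore],
			__ATOMIC_RELAXED);
	uint64_t deadline = rte_get_timer_cycles() + timeout_cycles;

	/* wait for the service core to complete at least one iteration */
	while (__atomic_load_n(&s->calls_per_service[lcore],
			__ATOMIC_RELAXED) == start) {
		if (rte_get_timer_cycles() > deadline)
			return -ETIMEDOUT; /* service core not running? */
		rte_pause();
	}
	return 0;
}

The -ETIMEDOUT path is exactly the error-return-and-retry burden on the
application mentioned above, which is why I'm lukewarm on it.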
> I could not come up with a good way to check if the service core is
> running. Checking the app_runstate and comp_runstate is not enough, as
> they just indicate that the service is ready to run. Using the counter
> 'calls_per_service' introduces race conditions.
>
> The only way I can think of is asking the user to follow a specific
> sequence of APIs to ensure the service core is running before calling
> rte_service_map_lcore_set.

Good point - I'm still thinking about this, and haven't come to an
obvious conclusion yet. I'm considering other ways to detect whether the
core is/isn't running, and also considering temporarily "hijacking" the
service function pointer with a CAS, which gives some new options on
keeping threads out of the critical section. As above, I don't have a
good solution yet, but a rough sketch of the hijack idea is below the
snip.

<snip irrelevant to above discussion stuff>
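For illustration, the hijack idea looks something like this - again just
a sketch with invented names, and note it still spins forever if the
service core is not running, so it doesn't solve that problem by itself:

/*
 * The control thread CASes a trampoline in place of the real service
 * callback. The first call of the trampoline proves the service core
 * is alive and has re-loaded the function pointer since the swap; for
 * an MT-unsafe service only one core runs the callback, so at that
 * point no thread is inside the real callback.
 */
#include <stdint.h>

#include <rte_pause.h>

typedef int32_t (*service_func_t)(void *args);

/* hypothetical stand-in for the internal service struct */
struct service_spec {
	service_func_t callback;
};

static int trampoline_hit;

static int32_t
trampoline_cb(void *args)
{
	(void)args;
	__atomic_store_n(&trampoline_hit, 1, __ATOMIC_RELEASE);
	return 0;
}

static void
hijack_and_wait(struct service_spec *s)
{
	service_func_t orig = s->callback;

	__atomic_store_n(&trampoline_hit, 0, __ATOMIC_RELAXED);

	/* CAS rather than a plain store, so a concurrent unregister or a
	 * second control thread can't race us on the same pointer */
	if (!__atomic_compare_exchange_n(&s->callback, &orig, trampoline_cb,
			0, __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		return;

	/* same caveat as the counter approach: spins forever if the
	 * service core never runs */
	while (__atomic_load_n(&trampoline_hit, __ATOMIC_ACQUIRE) == 0)
		rte_pause();

	/* safe point: do the remapping work, then restore the callback */
	__atomic_store_n(&s->callback, orig, __ATOMIC_RELEASE);
}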