> -----Original Message----- > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > Sent: Wednesday, April 29, 2020 11:49 PM > To: Van Haaren, Harry <harry.van.haa...@intel.com>; Phil Yang > <phil.y...@arm.com>; dev@dpdk.org > Cc: tho...@monjalon.net; david.march...@redhat.com; Ananyev, Konstantin > <konstantin.anan...@intel.com>; jer...@marvell.com; > hemant.agra...@nxp.com; Gavin Hu <gavin...@arm.com>; nd > <n...@arm.com>; sta...@dpdk.org; Eads, Gage <gage.e...@intel.com>; > Richardson, Bruce <bruce.richard...@intel.com>; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; nd <n...@arm.com> > Subject: RE: [PATCH v2 1/6] service: fix race condition for MT unsafe service > > Hi Harry, > Thanks for getting back on this. > > <snip> > > > > Subject: [PATCH v2 1/6] service: fix race condition for MT unsafe > > > service > > > > > > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > > > > > The MT unsafe service might get configured to run on another core > > > while the service is running currently. This might result in the MT > > > unsafe service running on multiple cores simultaneously. Use > > > 'execute_lock' always when the service is MT unsafe. > > > > > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore") > > > Cc: sta...@dpdk.org > > > > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > > > Reviewed-by: Phil Yang <phil.y...@arm.com> > > > --- > > > > Thanks for spinning a new revision - based on ML discussion previously, it > > seems like the "use service-run-count" to avoid this race would be a complex > > solution. > > > > Suggesting the following; > > 1) Take the approach as per this patch, to always take the atomic, fixing > > the > > race condition. > Ok
I've micro-benchmarked this code change inside the service cores autotest, and it introduces around 35 cycles of overhead per service call. This is not ideal, but given it's a bugfix, and by far the simplest method to fix this race-condition. Having discussed and investigated multiple other solutions, I believe this is the right solution. Thanks Honnappa and Phil for identifying and driving a solution. I suggest to post the benchmarking unit-test addition patch, and integrate that *before* the series under review here gets merged? This makes benchmarking of the "before bugfix" performance in future easier should it be required. > > 2) Add an API to service-cores, which allows "committing" of mappings. > > Committing the mapping would imply that the mappings will not be changed > > in future. With runtime-remapping being removed from the equation, the > > existing branch-over-atomic optimization is valid again. > Ok. Just to make sure I understand this: > a) on the data plane, if commit API is called (probably a new state variable) > and > num_mapped_cores is set to 1, there is no need to take the lock. > b) possible implementation of the commit API would check if > num_mapped_cores for the service is set to 1 and set a variable to indicate > that > the lock is not required. > > What do you think about asking the application to set the service capability > to > MT_SAFE if it knows that the service will run on a single core? This would > require us to change the documentation and does not require additional code. That's a nice idea - I like that if applications want to micro-optimize around the atomic, that they have a workaround/solution to do so, particularly that it doesn't require code-changes and backporting. Will send review and send feedback on the patches themselves. Regards, -Harry > > So this would offer applications two situations > > A) No application change: possible performance regression due to atomic > > always taken. > > B) Call "commit" API, and regain the performance as per previous DPDK > > versions. > > > > Thoughts/opinions on the above? I've flagged the rest of the patchset for > > review ASAP. Regards, -Harry > > > > > lib/librte_eal/common/rte_service.c | 11 +++++------ > > > 1 file changed, 5 insertions(+), 6 deletions(-) <snip patch changes>