> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>
> Sent: Saturday, July 24, 2021 3:52 AM
> To: Joyce Kong <joyce.k...@arm.com>; tho...@monjalon.net;
> david.march...@redhat.com; roret...@linux.microsoft.com;
> step...@networkplumber.org; olivier.m...@6wind.com;
> harry.van.haa...@intel.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>
> Subject: Re: [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
>
> On 7/20/21 6:51 AM, Joyce Kong wrote:
> > Covert rte_atomic usages to compiler atomic built-ins in rcu_perf
> > testcases.
> >
> > Signed-off-by: Joyce Kong <joyce.k...@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Acked-by: Stephen Hemminger <step...@networkplumber.org>
> > ---
> > app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
> > 1 file changed, 49 insertions(+), 49 deletions(-)
> >
> > diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
> > index 3017e71120..cf7b158d22 100644
> > --- a/app/test/test_rcu_qsbr_perf.c
> > +++ b/app/test/test_rcu_qsbr_perf.c
> > @@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
> > static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> > static struct rte_hash *h;
> > static char hash_name[8];
> > -static rte_atomic64_t updates, checks;
> > -static rte_atomic64_t update_cycles, check_cycles;
> > +static uint64_t updates, checks;
> > +static uint64_t update_cycles, check_cycles;
> >
> > /* Scale down results to 1000 operations to support lower
> > * granularity clocks.
> > @@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
> > }
> >
> > cycles = rte_rdtsc_precise() - begin;
> > - rte_atomic64_add(&update_cycles, cycles);
> > - rte_atomic64_add(&updates, loop_cnt);
> > + __atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
> > + __atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
>
> Shouldn't __atomic_add_fetch() be used instead since it pseudo-code is a bit
> simpler. What is the best option if return value is not actually used?

If the return value is not used, as is the case here, the instructions generated for __atomic_fetch_add() and __atomic_add_fetch() are the same on x86 and Arm, with both the gcc and clang versions I have tried. If the return value is used, __atomic_add_fetch() takes two more instructions ('mov' and 'add') than __atomic_fetch_add() in order to return the result of the calculation. This is based on experiments at https://godbolt.org/.
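
For reference, the three cases I compared look roughly like the sketch below. The names (counter, add_discard() and so on) are made up for illustration and are not taken from test_rcu_qsbr_perf.c; only the two built-ins and __ATOMIC_RELAXED come from the patch. Building it with -O2 and looking at the assembly should show whether the result matches on a given compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter, standing in for 'updates' in the test. */
static uint64_t counter;

/* Return value discarded: this is the situation in the patch. */
static void add_discard(uint64_t n)
{
	__atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* fetch_add returns the value held before the addition. */
static uint64_t add_return_old(uint64_t n)
{
	return __atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* add_fetch returns the value held after the addition. */
static uint64_t add_return_new(uint64_t n)
{
	return __atomic_add_fetch(&counter, n, __ATOMIC_RELAXED);
}

/* Single-threaded check of the return-value semantics only. */
static void check_semantics(void)
{
	__atomic_store_n(&counter, 0, __ATOMIC_RELAXED);
	add_discard(5);
	assert(add_return_old(3) == 5);  /* old value; counter is now 8 */
	assert(add_return_new(2) == 10); /* new value; counter is now 10 */
}
```

The only functional difference is which value is returned, so when the result is ignored either built-in expresses the same operation.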