> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Friday, 1 July 2022 20.38
>
> <snip>
> >
> > <big snip of previous discussions>
> >
> > > > At the time of the read operation (in the global counter
> > > > solution), there may well be cycles consumed or calls having been
> > > > made, but not yet posted. The window between call having been
> > > > made, and global counter having been incremented (and thus made
> > > > globally visible) is small, but non-zero.
> > > Agree. The read value is the atomic state of the system at a given
> > > instance (when the read was executed), though that instance
> > > happened a few cycles back.
> > > (Just to be clear, I am fine with per-core counters)
> >
> > Option 1: "Per core counters"
> >
> > > Agree we need atomic operations. I am not sure if
> > > __atomic_fetch_add or __atomic_store_n would have a large
> > > difference. __atomic_fetch_add would result in fewer instructions.
> > > I am fine with either.
> >
> > Option 2: "Use atomics for counter increments".
> >
> > > > >> I was fortunate to get some data from a real-world
> > > > >> application, and enabling service core stats resulted in a 7%
> > > > >> degradation of overall system capacity. I'm guessing atomic
> > > > >> instructions would not make things better.
> >
> > Agree, atomics are likely to reduce performance.. but correctness is
> > worth more than performance.
> >
> > <snip>
> >
> > In my mind, any LTS/backports get the simplest/highest-confidence
> > bugfix: using atomics.
> > The atomics are behind the "service stats" feature enable, so the
> > impact is only when those are enabled.
> >
> > If there is still a performance hit, and there are *no* MT services
> > registered, we could check a static-global flag, and if there are no
> > MT services use the normal adds. Thoughts on such a solution to
> > reduce the atomic perf impact only to apps with MT-services?
> The choice is not between atomic stats and non-atomic stats. The stats
> need to be incremented atomically in both cases (MT or no MT services)
> as the reader could be another thread. We need to ensure that the
> stores and loads are not split. Hence, we need atomic operations. We
> cannot avoid any performance hit here.
>
> I think the choice is between per-core stats vs global stats. IMO, we
> should go with per-core stats to be aligned with the norm.

+1
And the per-core counters can be summed when read for statistics, so it
still looks like one counter per service. They are only updated per-core
to prevent cache thrashing.

> > The code changes themselves are OK.. I can send a patch with fix if
> > there's agreement on the approach?
> >
> > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > index ef31b1f63c..a07c8fc2d7 100644
> > --- a/lib/eal/common/rte_service.c
> > +++ b/lib/eal/common/rte_service.c
> > @@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> >  		uint64_t start = rte_rdtsc();
> >  		s->spec.callback(userdata);
> >  		uint64_t end = rte_rdtsc();
> > -		s->cycles_spent += end - start;
> > +		__atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
> > +		__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> >  		cs->calls_per_service[service_idx]++;
> > -		s->calls++;
> >  	} else
> >  		s->spec.callback(userdata);
> >  }
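To make the per-core idea concrete, below is a minimal sketch, not the
actual rte_service code: all names in it (MAX_CORES, struct core_stats,
struct service_stats, service_stats_update, service_stats_read) are made
up for illustration. Each core updates only its own cache-line-sized
slot with relaxed atomic stores, so no read-modify-write instruction is
needed and the reader can never observe a torn 64-bit value; the reader
sums the slots with relaxed atomic loads, so the result still looks like
one counter per service.

/* Minimal sketch of per-core counters with sum-on-read (illustration
 * only, not the rte_service implementation). */
#include <stdint.h>

#define MAX_CORES 128 /* stand-in for RTE_MAX_LCORE */

/* One slot per core, padded to a cache line to avoid false sharing. */
struct core_stats {
	uint64_t calls;
	uint64_t cycles;
} __attribute__((aligned(64)));

struct service_stats {
	struct core_stats core[MAX_CORES];
};

/* Writer side: only ever called from the core identified by core_id,
 * so a relaxed load + relaxed store of the owner's own slot is enough;
 * no atomic read-modify-write is required. */
static inline void
service_stats_update(struct service_stats *s, unsigned int core_id,
		uint64_t cycles)
{
	uint64_t *c = &s->core[core_id].calls;
	uint64_t *cy = &s->core[core_id].cycles;

	__atomic_store_n(c, __atomic_load_n(c, __ATOMIC_RELAXED) + 1,
			__ATOMIC_RELAXED);
	__atomic_store_n(cy, __atomic_load_n(cy, __ATOMIC_RELAXED) + cycles,
			__ATOMIC_RELAXED);
}

/* Reader side: sum the per-core slots; loads are atomic so they cannot
 * be split, and relaxed ordering is fine for pure statistics. */
static inline void
service_stats_read(const struct service_stats *s,
		uint64_t *calls, uint64_t *cycles)
{
	uint64_t sum_calls = 0, sum_cycles = 0;
	unsigned int i;

	for (i = 0; i < MAX_CORES; i++) {
		sum_calls += __atomic_load_n(&s->core[i].calls,
				__ATOMIC_RELAXED);
		sum_cycles += __atomic_load_n(&s->core[i].cycles,
				__ATOMIC_RELAXED);
	}
	*calls = sum_calls;
	*cycles = sum_cycles;
}

In DPDK the core_id would typically come from rte_lcore_id(). Note that
even an MT-safe service running on several lcores at once keeps a single
writer per slot, since each lcore only touches its own entry, so the
plain store (rather than __atomic_fetch_add) stays correct; and relaxed
ordering should be sufficient because the counters are statistics only
and are not used to synchronize anything.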