<snip> > > <big snip of previous discussions> > > > > At the time of the read operation (in the global counter solution), > > > there may well be cycles consumed or calls having been made, but not > > > yet posted. The window between call having been made, and global > > > counter having been incremented (and thus made globally visible) is small, > but non-zero. > > Agree. The read value is the atomic state of the system at a given > > instance (when the read was executed), though that instance happened few > cycles back. > > (Just to be clear, I am fine with per-core counters) > > Option 1: "Per core counters" > > > Agree we need atomic operations. I am not sure if __atomic_fetch_add > > or __atomic_store_n would have a large difference. __atomic_fetch_add > > would result in less number of instructions. I am fine with either. > > Option 2: "Use atomics for counter increments". > > > > >> I was fortunate to get some data from a real-world application, > > > >> and enabling service core stats resulted in a 7% degradation of > > > >> overall system capacity. I'm guessing atomic instructions would > > > >> not make things > > > better. > > Agree, performance of atomics is likely to reduce performance.. but > correctness > is worth more than performance. > > <snip> > > In my mind, any LTS/backports get the simplest/highest-confidence bugfix: > using > atomics. > The atomics are behind the "service stats" feature enable, so impact is only > when those are enabled. > > If there is still a performance hit, and there are *no* MT services > registered, we > could check a static-global flag, and if there are no MT services use the > normal > adds. Thoughts on such a solution to reduce atomic perf impact only to apps > with MT-services? The choice is not between atomic stats and non-atomic stats. The stats need to be incremented atomically in both cases(MT or no MT services) as the reader could be another thread. We need to ensure that the stores and loads are not split. Hence, we need atomic operations. We cannot avoid any performance hit here.
I think the choice is between per-core stats vs global stats. IMO, we should go with per-core stats to be aligned with the norm. > > The code changes themselves are OK.. I can send a patch with fix if there's > agreement on the approach? > > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c index > ef31b1f63c..a07c8fc2d7 100644 > --- a/lib/eal/common/rte_service.c > +++ b/lib/eal/common/rte_service.c > @@ -363,9 +363,9 @@ service_runner_do_callback(struct > rte_service_spec_impl *s, > uint64_t start = rte_rdtsc(); > s->spec.callback(userdata); > uint64_t end = rte_rdtsc(); > - s->cycles_spent += end - start; > + __atomic_fetch_add(&s->cycles_spent, (end-start), > __ATOMIC_RELAXED); > + __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED); > cs->calls_per_service[service_idx]++; > - s->calls++; > } else > s->spec.callback(userdata); }