> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Friday, 1 July 2022 20.38
> 
> <snip>
> >
> > <big snip of previous discussions>
> >
> > > > At the time of the read operation (in the global counter solution),
> > > > there may well be cycles consumed or calls having been made, but not
> > > > yet posted. The window between the call having been made and the
> > > > global counter having been incremented (and thus made globally
> > > > visible) is small, but non-zero.
> > > Agree. The read value is the atomic state of the system at a given
> > > instance (when the read was executed), though that instance happened a
> > > few cycles back.
> > > (Just to be clear, I am fine with per-core counters)
> >
> > Option 1: "Per core counters"
> >
> > > Agree we need atomic operations. I am not sure if __atomic_fetch_add
> > > or __atomic_store_n would make a large difference. __atomic_fetch_add
> > > would result in fewer instructions. I am fine with either.
> >
> > Option 2: "Use atomics for counter increments".
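Just to illustrate the difference between the two builtins for anyone
following along (a sketch only, not actual rte_service.c code): with a
single writer per counter (option 1, per-core), a relaxed atomic load +
plain add + relaxed atomic store is enough to avoid torn accesses, whereas
__atomic_fetch_add() is the full read-modify-write needed when multiple
threads update the same counter (option 2, shared counters).

    #include <stdint.h>

    /* Sketch only; 'calls' stands in for a service stats counter. */
    static uint64_t calls;

    /* Option 1 style: only the owning lcore writes this counter, so an
     * ordinary add with non-torn (relaxed atomic) load and store is enough. */
    static inline void
    increment_single_writer(void)
    {
        uint64_t tmp = __atomic_load_n(&calls, __ATOMIC_RELAXED);
        __atomic_store_n(&calls, tmp + 1, __ATOMIC_RELAXED);
    }

    /* Option 2 style: multiple lcores may update the same counter, so a
     * full atomic read-modify-write is required. */
    static inline void
    increment_multi_writer(void)
    {
        __atomic_fetch_add(&calls, 1, __ATOMIC_RELAXED);
    }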
> >
> > > > >> I was fortunate to get some data from a real-world application,
> > > > >> and enabling service core stats resulted in a 7% degradation of
> > > > >> overall system capacity. I'm guessing atomic instructions would
> > > > >> not make things better.
> >
> > Agree, atomics are likely to reduce performance, but correctness is
> > worth more than performance.
> >
> > <snip>
> >
> > In my mind, any LTS/backports get the simplest/highest-confidence
> > bugfix: using atomics. The atomics are behind the "service stats"
> > feature enable, so the impact only applies when stats are enabled.
> >
> > If there is still a performance hit, and there are *no* MT services
> > registered, we could check a static-global flag and use the normal adds
> > in that case. Thoughts on such a solution, to limit the atomic perf
> > impact to apps with MT services?
> The choice is not between atomic stats and non-atomic stats. The stats
> need to be incremented atomically in both cases (MT or no MT services),
> as the reader could be another thread. We need to ensure that the stores
> and loads are not split. Hence, we need atomic operations. We cannot
> avoid the performance hit here.
> 
> I think the choice is between per-core stats and global stats. IMO, we
> should go with per-core stats to be aligned with the norm.
+1

And the per-core counters can be summed when read for statistics, so it still
looks like one counter per service. They are only updated per-core to prevent
cache thrashing.
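
Something along these lines could work (just a sketch; the struct, field and
function names below are made up, not the actual rte_service.c internals):

    #include <stdint.h>
    #include <rte_common.h>
    #include <rte_lcore.h>
    #include <rte_service.h>

    /* Hypothetical per-lcore stats, one cache line per lcore to avoid
     * false sharing between service cores. */
    struct service_lcore_stats {
        uint64_t calls;
        uint64_t cycles;
    } __rte_cache_aligned;

    static struct service_lcore_stats
        lcore_stats[RTE_MAX_LCORE][RTE_SERVICE_NUM_MAX];

    /* Writer: only the owning lcore updates its own slot, so a plain add
     * with a relaxed atomic store (to avoid torn stores) is sufficient. */
    static inline void
    service_stats_add(uint32_t lcore_id, uint32_t id, uint64_t cycles)
    {
        struct service_lcore_stats *st = &lcore_stats[lcore_id][id];

        __atomic_store_n(&st->calls, st->calls + 1, __ATOMIC_RELAXED);
        __atomic_store_n(&st->cycles, st->cycles + cycles,
                __ATOMIC_RELAXED);
    }

    /* Reader: sum across all lcores, so the application still sees one
     * counter per service. Relaxed atomic loads avoid torn reads. */
    static inline uint64_t
    service_stats_sum_calls(uint32_t id)
    {
        uint64_t sum = 0;
        uint32_t i;

        for (i = 0; i < RTE_MAX_LCORE; i++)
            sum += __atomic_load_n(&lcore_stats[i][id].calls,
                    __ATOMIC_RELAXED);
        return sum;
    }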

> 
> >
> > The code changes themselves are OK. I can send a patch with the fix if
> > there's agreement on the approach?
> >
> >
> > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > index ef31b1f63c..a07c8fc2d7 100644
> > --- a/lib/eal/common/rte_service.c
> > +++ b/lib/eal/common/rte_service.c
> > @@ -363,9 +363,9 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> >                 uint64_t start = rte_rdtsc();
> >                 s->spec.callback(userdata);
> >                 uint64_t end = rte_rdtsc();
> > -               s->cycles_spent += end - start;
> > +               __atomic_fetch_add(&s->cycles_spent, (end-start), __ATOMIC_RELAXED);
> > +               __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> >                 cs->calls_per_service[service_idx]++;
> > -               s->calls++;
> >         } else
> >                 s->spec.callback(userdata);
> > }
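
If we do go the atomics route for the backport, the read side (e.g.
rte_service_attr_get()) presumably needs matching atomic loads as well, so
the 64-bit counters cannot be torn while being read. Roughly something like
this (a sketch only, reusing the field names from the patch above; the
helper name is made up):

    /* Sketch: atomic reads of the stats fields updated in the patch above. */
    static uint64_t
    stats_attr_read(const struct rte_service_spec_impl *s, uint32_t attr_id)
    {
        switch (attr_id) {
        case RTE_SERVICE_ATTR_CALL_COUNT:
            return __atomic_load_n(&s->calls, __ATOMIC_RELAXED);
        case RTE_SERVICE_ATTR_CYCLES:
            return __atomic_load_n(&s->cycles_spent, __ATOMIC_RELAXED);
        default:
            return 0;
        }
    }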
