> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Friday, 8 July 2022 18.46
> 
> <snip>
> > > >
> > > > This commit fixes a potential racey-add that could occur if
> > > > multiple service-lcores were executing the same MT-safe service
> > > > at the same time, with service statistics collection enabled.
> > > >
> > > > Because multiple threads can run and execute the service, the
> > > > stats values can have multiple writer threads, resulting in the
> > > > requirement of using atomic addition for correctness.
> > > >
> > > > Note that when a MT unsafe service is executed, a spinlock is
> > > > held, so the stats increments are protected. This fact is used to
> > > > avoid executing atomic add instructions when not required.
> > > >
> > > > This patch causes a 1.25x increase in cycle-cost for polling a
> > > > MT safe service when statistics are enabled. No change was seen
> > > > for MT unsafe services, or when statistics are disabled.
> > > >
> > > > Reported-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
> > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > Suggested-by: Morten Brørup <m...@smartsharesystems.com>
> > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
> > > >
> > > > ---
> > > > ---
> > > >  lib/eal/common/rte_service.c | 10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > > > index ef31b1f63c..f045e74ef3 100644
> > > > --- a/lib/eal/common/rte_service.c
> > > > +++ b/lib/eal/common/rte_service.c
> > > > @@ -363,9 +363,15 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> > > >                 uint64_t start = rte_rdtsc();
> > > >                 s->spec.callback(userdata);
> > > >                 uint64_t end = rte_rdtsc();
> > > > -               s->cycles_spent += end - start;
> > > > +               uint64_t cycles = end - start;
> > > >                 cs->calls_per_service[service_idx]++;
> > > > -               s->calls++;
> > > > +               if (service_mt_safe(s)) {
> > > > +                       __atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
> > > > +                       __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> > > > +               } else {
> > > > +                       s->cycles_spent += cycles;
> > > > +                       s->calls++;
> > > This is still a problem from a reader perspective. It is possible
> > > that the writes could be split while a reader is reading the stats.
> > > These need to be atomic adds.
> >
> > I don't understand what you suggest can go wrong here, Honnappa. If
> > you are talking about 64 bit counters on 32 bit architectures, then I
> > understand the problem (and have many years of direct experience with
> > it myself). Otherwise, I hope you can elaborate or direct me to
> > educational material about the issue, considering this a learning
> > opportunity. :-)
> I am thinking of the case where the 64b write is split into two 32b
> (or more) write operations, either by the compiler or the
> micro-architecture. If this were to happen, it causes race conditions
> with the reader.
> 
> As far as I understand, the compiler does not provide any guarantees
> on generating non-tearing stores unless an atomic builtin/function is
> used.

This seems like a generic problem for all 64b statistics counters in DPDK, and 
for any other C code using 64b counters. Being a generic C problem, there is 
probably a generic solution to it.
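As a sketch of what such a generic solution could look like (the helper names here are mine, not an existing DPDK API): a single-writer 64b counter can be updated with a plain add but published through a relaxed atomic store, and read with a relaxed atomic load, so neither side can tear the 64b access:

```c
#include <stdint.h>

/* Hypothetical helpers, not DPDK API: a single-writer 64-bit counter with
 * concurrent readers. Relaxed ordering suffices; we only need each 64-bit
 * access to be non-tearing, not any ordering between counters. */
static inline void
counter_add(uint64_t *counter, uint64_t val)
{
	/* Single writer: a plain read-modify is fine, but the store must be
	 * atomic so a concurrent reader never sees a half-written value. */
	__atomic_store_n(counter,
			 __atomic_load_n(counter, __ATOMIC_RELAXED) + val,
			 __ATOMIC_RELAXED);
}

static inline uint64_t
counter_read(const uint64_t *counter)
{
	return __atomic_load_n(counter, __ATOMIC_RELAXED);
}
```

For the MT-safe (multi-writer) path, `__atomic_fetch_add` as in the patch remains necessary; the load/store pair above is only correct with a single writer, e.g. under the MT-unsafe spinlock.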

Is any compiler going to do something that stupid (i.e. tearing a store into 
multiple write operations) to a simple 64b counter on any 64 bit architecture 
(assuming the counter is 64b aligned)? Otherwise, we might only need to take 
special precautions for 32 bit architectures.
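To make the 32 bit concern concrete (a single-threaded simulation of the interleaving, not a real race): if the increment from 0xFFFFFFFF to 0x100000000 is performed as two 32b stores, a reader that loads the counter between them observes a value that was never written:

```c
#include <stdint.h>

/* Simulation of a torn read: the writer has stored the new low 32 bits but
 * not yet the new high 32 bits, so the reader combines the new low half
 * with the old high half. */
static uint64_t
torn_read(uint64_t old_val, uint64_t new_val)
{
	return (old_val & 0xFFFFFFFF00000000ULL) | (new_val & 0xFFFFFFFFULL);
}
```

Incrementing 0xFFFFFFFF yields 0x100000000, but the torn read observes 0, which is neither the old nor the new value. This is why the reader side also needs non-tearing loads on 32 bit targets.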

> If we have to ensure the micro-architecture does not generate split
> writes, we need to be careful that future code additions do not
> change the alignment of the stats.

Unless the structure containing the stats counters is packed, the contained 64b 
counters will be 64b aligned (on 64 bit architecture). So we should not worry 
about alignment, except perhaps on 32 bit architectures.
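One way to guard against future code additions silently changing that alignment is a compile-time assertion. The struct below is a stand-in for illustration (not the real rte_service_spec_impl), and the asserted offsets assume a 64 bit ABI where uint64_t is 8-byte aligned:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in stats struct: the compiler pads after 'id', so the 64-bit
 * counters stay naturally aligned as long as the struct is not packed. */
struct stats_example {
	uint32_t id;
	uint64_t calls;
	uint64_t cycles_spent;
};

/* Catch a future "packed" attribute or field reordering at build time. */
_Static_assert(offsetof(struct stats_example, calls) % 8 == 0,
	"calls must be 8-byte aligned");
_Static_assert(offsetof(struct stats_example, cycles_spent) % 8 == 0,
	"cycles_spent must be 8-byte aligned");
```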
