<snip>
> > > > >
> > > > > This commit fixes a potential racy add that could occur if multiple
> > > > > service-lcores were executing the same MT-safe service at the same time,
> > > > > with service statistics collection enabled.
> > > > >
> > > > > Because multiple threads can run and execute the service, the stats values
> > > > > can have multiple writer threads, resulting in the requirement of using
> > > > > atomic addition for correctness.
> > > > >
> > > > > Note that when an MT-unsafe service is executed, a spinlock is held, so the
> > > > > stats increments are protected. This fact is used to avoid executing atomic
> > > > > add instructions when not required.
> > > > >
> > > > > This patch causes a 1.25x increase in cycle-cost for polling an MT-safe
> > > > > service when statistics are enabled. No change was seen for MT-unsafe
> > > > > services, or when statistics are disabled.
> > > > >
> > > > > Reported-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
> > > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > > Suggested-by: Morten Brørup <m...@smartsharesystems.com>
> > > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
> > > > >
> > > > > ---
> > > > >  lib/eal/common/rte_service.c | 10 ++++++++--
> > > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > > > > index ef31b1f63c..f045e74ef3 100644
> > > > > --- a/lib/eal/common/rte_service.c
> > > > > +++ b/lib/eal/common/rte_service.c
> > > > > @@ -363,9 +363,15 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> > > > >               uint64_t start = rte_rdtsc();
> > > > >               s->spec.callback(userdata);
> > > > >               uint64_t end = rte_rdtsc();
> > > > > -             s->cycles_spent += end - start;
> > > > > +             uint64_t cycles = end - start;
> > > > >               cs->calls_per_service[service_idx]++;
> > > > > -             s->calls++;
> > > > > +             if (service_mt_safe(s)) {
> > > > > +                     __atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
> > > > > +                     __atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> > > > > +             } else {
> > > > > +                     s->cycles_spent += cycles;
> > > > > +                     s->calls++;
> > > > This is still a problem from a reader's perspective. It is possible that
> > > > the writes could be split while a reader is reading the stats. These
> > > > need to be atomic adds.
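A minimal sketch of what the comment above asks for, assuming a hypothetical `struct svc_stats` standing in for the counters in `struct rte_service_spec_impl` (the function names are illustrative, not DPDK API). Applied literally, every update uses `__atomic_fetch_add`, so the `service_mt_safe()` branch no longer distinguishes these two counters, and the reader uses the matching `__atomic_load_n` so it cannot observe a torn 64-bit value.

```c
#include <stdint.h>

/* Hypothetical stand-in for the stats fields of struct rte_service_spec_impl. */
struct svc_stats {
	uint64_t cycles_spent;
	uint64_t calls;
};

/* Writer side: an atomic RMW on every update, whether or not a lock is held,
 * so a concurrent reader never sees a half-written 64-bit value.
 * Relaxed ordering suffices: each counter only needs to be indivisible. */
static inline void
svc_stats_update(struct svc_stats *s, uint64_t cycles)
{
	__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
	__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
}

/* Reader side: matching non-tearing atomic loads. */
static inline uint64_t
svc_stats_read_calls(struct svc_stats *s)
{
	return __atomic_load_n(&s->calls, __ATOMIC_RELAXED);
}

static inline uint64_t
svc_stats_read_cycles(struct svc_stats *s)
{
	return __atomic_load_n(&s->cycles_spent, __ATOMIC_RELAXED);
}
```

Each counter is individually non-tearing, but a reader may still see `calls` and `cycles_spent` from different updates. On some 32-bit targets, 8-byte `__atomic` operations may be compiled to library calls that require linking with `-latomic`.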
> > >
> > > I don't understand what you suggest can go wrong here, Honnappa. If you are
> > > talking about 64 bit counters on 32 bit architectures, then I understand the
> > > problem (and have many years of direct experience with it myself).
> > > Otherwise, I hope you can elaborate or direct me to educational material
> > > about the issue, considering this a learning opportunity. :-)
> > I am thinking of the case where the 64b write is split into two 32b (or
> > more) write operations, either by the compiler or by the micro-architecture.
> > If this were to happen, it would cause a race condition with the reader.
> >
> > As far as I understand, the compiler does not provide any guarantees
> > on generating non-tearing stores unless an atomic builtin/function is
> > used.
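For the MT-unsafe branch, where the service spinlock already serializes writers, one common alternative (not something proposed in the messages quoted here) is to keep the plain increment logic but perform it as a relaxed atomic load followed by a relaxed atomic store. Each store is then a single non-tearing 64-bit write for concurrent readers, without the cost of a locked read-modify-write instruction. This sketch assumes exactly one writer at a time; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical counters written only while a lock is held (one writer at a
 * time), but read concurrently by other threads without taking the lock. */
struct single_writer_stats {
	uint64_t calls;
	uint64_t cycles_spent;
};

/* Writer: the caller must hold the lock that serializes all writers.
 * The load/store pair is not an atomic RMW, which is acceptable with a
 * single writer; what matters is that each store cannot be torn. */
static inline void
sw_stats_update(struct single_writer_stats *s, uint64_t cycles)
{
	uint64_t calls = __atomic_load_n(&s->calls, __ATOMIC_RELAXED);
	uint64_t spent = __atomic_load_n(&s->cycles_spent, __ATOMIC_RELAXED);

	__atomic_store_n(&s->calls, calls + 1, __ATOMIC_RELAXED);
	__atomic_store_n(&s->cycles_spent, spent + cycles, __ATOMIC_RELAXED);
}

/* Lock-free reader: sees either the old or the new value, never a mix. */
static inline uint64_t
sw_stats_read_calls(struct single_writer_stats *s)
{
	return __atomic_load_n(&s->calls, __ATOMIC_RELAXED);
}
```

If two writers could ever run this concurrently, increments would be lost, so it is only a substitute for `__atomic_fetch_add` where the lock is guaranteed.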
> 
> This seems like a generic problem for all 64b statistics counters in DPDK, and
> any other C code using 64 bit counters. Being a generic C problem, there is
> probably a generic solution to it.
Browsing through the code, I see similar problems elsewhere.

> 
> Is any compiler going to do something that stupid (i.e. tearing a store into
> multiple write operations) to a simple 64b counter on any 64 bit architecture
> (assuming the counter is 64b aligned)? Otherwise, we might only need to
> take special precautions for 32 bit architectures.
It is always a debate who is the stupid one, the compiler or the programmer 😊

Though it is not the same case, you can look at this discussion, where the
compiler generated torn stores [1] when we all thought it was generating a
64b store.

[1] http://inbox.dpdk.org/dev/d5d563ab-0411-3faf-39ec-4994f2bc9...@intel.com/
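To make the hazard concrete, here is a deterministic simulation of what a reader can observe if a 64-bit counter increment is emitted as two separate 32-bit stores. The halves-based representation and the function names are hypothetical: they model tearing in software and do not reproduce any particular compiler's output.

```c
#include <stdint.h>

/* A 64-bit counter modelled as the two 32-bit halves a torn store writes. */
struct halves {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t
combine(struct halves h)
{
	return ((uint64_t)h.hi << 32) | h.lo;
}

/* Value a reader sees if it runs after the writer has stored only the
 * low half of new_val into memory that still holds old_val. */
static uint64_t
read_after_low_store(uint64_t old_val, uint64_t new_val)
{
	struct halves mem = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

	mem.lo = (uint32_t)new_val;          /* first 32-bit store */
	return combine(mem);                 /* reader runs here */
}

/* Same, but the writer stores the high half first. */
static uint64_t
read_after_high_store(uint64_t old_val, uint64_t new_val)
{
	struct halves mem = { (uint32_t)old_val, (uint32_t)(old_val >> 32) };

	mem.hi = (uint32_t)(new_val >> 32);  /* first 32-bit store */
	return combine(mem);                 /* reader runs here */
}
```

Incrementing `0x00000000FFFFFFFF` to `0x0000000100000000` is the worst case: with the low half stored first the reader sees 0, and with the high half first it sees `0x00000001FFFFFFFF`. Neither value was ever stored by the writer.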

> 
> > If we have to ensure the micro-architecture does not generate split
> > writes, we need to be careful that future code additions do not change
> > the alignment of the stats.
> 
> Unless the structure containing the stats counters is packed, the contained
> 64b counters will be 64b aligned (on 64 bit architectures). So we should not
> worry about alignment, except perhaps on 32 bit architectures.
Agreed. Future code changes need to be aware of these issues, and DPDK does
support 32b architectures.
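One way to make the alignment assumption explicit, so that a future change (for example packing the structure, or inserting a 32-bit field) fails to build instead of silently misaligning the counters, is C11 `_Alignas` combined with `_Static_assert`. The structure and field names below are hypothetical. This addresses alignment only; whether the compiler emits a single store still depends on using the atomic builtins discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stats block. _Alignas(8) forces 8-byte alignment of the
 * counters even on 32-bit ABIs (e.g. i386) where uint64_t members are
 * otherwise only 4-byte aligned inside a struct. */
struct svc_stats {
	uint32_t flags;                 /* e.g. a 32-bit field added later */
	_Alignas(8) uint64_t calls;
	_Alignas(8) uint64_t cycles_spent;
};

/* Fail the build if a future edit breaks the 8-byte alignment. */
_Static_assert(offsetof(struct svc_stats, calls) % 8 == 0,
	       "calls must be 8-byte aligned");
_Static_assert(offsetof(struct svc_stats, cycles_spent) % 8 == 0,
	       "cycles_spent must be 8-byte aligned");
_Static_assert(_Alignof(struct svc_stats) >= 8,
	       "struct svc_stats must be at least 8-byte aligned");
```

The offset checks only guarantee aligned addresses when the structure itself starts on an 8-byte boundary, which is why the third assertion checks the structure's alignment as well.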
