> From: Honnappa Nagarahalli [mailto:honnappa.nagaraha...@arm.com]
> Sent: Friday, 8 July 2022 19.40
>
> <snip>
>
> > > > > > This commit fixes a potential racey-add that could occur if
> > > > > > multiple service-lcores were executing the same MT-safe
> > > > > > service at the same time, with service statistics collection
> > > > > > enabled.
> > > > > >
> > > > > > Because multiple threads can run and execute the service, the
> > > > > > stats values can have multiple writer threads, resulting in
> > > > > > the requirement of using atomic addition for correctness.
> > > > > >
> > > > > > Note that when a MT unsafe service is executed, a spinlock is
> > > > > > held, so the stats increments are protected. This fact is used
> > > > > > to avoid executing atomic add instructions when not required.
> > > > > >
> > > > > > This patch causes a 1.25x increase in cycle-cost for polling a
> > > > > > MT safe service when statistics are enabled. No change was
> > > > > > seen for MT unsafe services, or when statistics are disabled.
> > > > > >
> > > > > > Reported-by: Mattias Rönnblom <mattias.ronnb...@ericsson.com>
> > > > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > > > Suggested-by: Morten Brørup <m...@smartsharesystems.com>
> > > > > > Signed-off-by: Harry van Haaren <harry.van.haa...@intel.com>
> > > > > >
> > > > > > ---
> > > > > > ---
> > > > > >  lib/eal/common/rte_service.c | 10 ++++++++--
> > > > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
> > > > > > index ef31b1f63c..f045e74ef3 100644
> > > > > > --- a/lib/eal/common/rte_service.c
> > > > > > +++ b/lib/eal/common/rte_service.c
> > > > > > @@ -363,9 +363,15 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
> > > > > >  		uint64_t start = rte_rdtsc();
> > > > > >  		s->spec.callback(userdata);
> > > > > >  		uint64_t end = rte_rdtsc();
> > > > > > -		s->cycles_spent += end - start;
> > > > > > +		uint64_t cycles = end - start;
> > > > > >  		cs->calls_per_service[service_idx]++;
> > > > > > -		s->calls++;
> > > > > > +		if (service_mt_safe(s)) {
> > > > > > +			__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
> > > > > > +			__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
> > > > > > +		} else {
> > > > > > +			s->cycles_spent += cycles;
> > > > > > +			s->calls++;
> > > > >
> > > > > This is still a problem from a reader perspective. It is
> > > > > possible that the writes could be split while a reader is
> > > > > reading the stats. These need to be atomic adds.
> > > >
> > > > I don't understand what you suggest can go wrong here, Honnappa.
> > > > If you are talking about 64 bit counters on 32 bit architectures,
> > > > then I understand the problem (and have many years of direct
> > > > experience with it myself). Otherwise, I hope you can elaborate or
> > > > direct me to educational material about the issue, considering
> > > > this a learning opportunity. :-)
> > >
> > > I am thinking of the case where the 64b write is split into two 32b
> > > (or more) write operations either by the compiler or the
> > > micro-architecture. If this were to happen, it causes race
> > > conditions with the reader.
> > >
> > > As far as I understand, the compiler does not provide any
> > > guarantees on generating non-tearing stores unless an atomic
> > > builtin/function is used.
> >
> > This seems like a generic problem for all 64b statistics counters in
> > DPDK, and any other C code using 64 bit counters. Being a generic C
> > problem, there is probably a generic solution to it.
>
> Browsing through the code, I see similar problems elsewhere.
>
> > Is any compiler going to do something that stupid (i.e. tearing a
> > store into multiple write operations) to a simple 64b counter on any
> > 64 bit architecture (assuming the counter is 64b aligned)? Otherwise,
> > we might only need to take special precautions for 32 bit
> > architectures.
>
> It is always a debate on who is stupid, compiler or programmer 😊
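
Just to be explicit about what I think the non-torn counter handling
should look like, here is a minimal sketch (hypothetical names and a
simplified struct, not the actual rte_service code) with relaxed atomics
on both the writer and the reader side, so neither the compiler nor the
micro-architecture is allowed to split the 64b accesses:

#include <stdint.h>

/* Simplified stand-in for the per-service statistics. */
struct svc_stats {
	uint64_t calls;
	uint64_t cycles_spent;
};

/* Writer side: relaxed atomic adds; only atomicity is needed, not
 * ordering, so this stays as cheap as atomics can be.
 */
static inline void
svc_stats_add(struct svc_stats *st, uint64_t cycles)
{
	__atomic_fetch_add(&st->calls, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&st->cycles_spent, cycles, __ATOMIC_RELAXED);
}

/* Reader side: relaxed atomic loads, so a half-written counter can never
 * be observed, even where a plain 64b load could be split into two 32b
 * loads.
 */
static inline void
svc_stats_read(const struct svc_stats *st, uint64_t *calls,
		uint64_t *cycles)
{
	*calls = __atomic_load_n(&st->calls, __ATOMIC_RELAXED);
	*cycles = __atomic_load_n(&st->cycles_spent, __ATOMIC_RELAXED);
}

With something like this, a reader may see slightly stale values, but
never a torn one.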
Compilers will never stop surprising me. Thankfully, they are not so
unreliable and full of bugs as they were 25 years ago. :-)

> Though not the same case, you can look at this discussion where the
> compiler generated torn stores [1] when we all thought it had been
> generating a 64b store.
>
> [1] http://inbox.dpdk.org/dev/d5d563ab-0411-3faf-39ec-4994f2bc9...@intel.com/

Good reference. Technically, this sets a bunch of fields in the
rte_lpm_tbl_entry structure (which happens to be 32b in total size), so
it is not completely unreasonable for the compiler to store those fields
individually. I wonder if using a union to cast the rte_lpm_tbl_entry
struct to uint32_t (and ensuring 32b alignment) would have solved the
problem, so the __atomic_store() could be avoided? (A quick sketch of
what I mean is at the end of this mail.)

> > >
> > > If we have to ensure the micro-architecture does not generate split
> > > writes, we need to be careful that future code additions do not
> > > change the alignment of the stats.
> >
> > Unless the structure containing the stats counters is packed, the
> > contained 64b counters will be 64b aligned (on 64 bit architecture).
> > So we should not worry about alignment, except perhaps on 32 bit
> > architectures.
>
> Agree, future code changes need to be aware of these issues and DPDK
> supports 32b architectures.
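
Here is the sketch of the union idea I mentioned above. It uses a
simplified stand-in for the real rte_lpm_tbl_entry and names of my own
choosing, so treat it as an illustration only, not a drop-in replacement
for the rte_lpm code:

#include <stdint.h>

/* Simplified stand-in for rte_lpm_tbl_entry: several bitfields packed
 * into a single 32b word, which is the property the trick relies on.
 */
struct tbl_entry {
	uint32_t next_hop	:24;
	uint32_t valid		:1;
	uint32_t valid_group	:1;
	uint32_t depth		:6;
};

union tbl_entry_word {
	struct tbl_entry entry;	/* field-by-field view */
	uint32_t word;		/* whole-entry view */
};

/* Build the new entry in a local copy, then publish it with one aligned
 * 32b store, so the individual bitfield writes cannot be observed
 * separately by a concurrent reader.
 */
static inline void
tbl_entry_set(union tbl_entry_word *dst, struct tbl_entry new_entry)
{
	union tbl_entry_word u;

	u.entry = new_entry;
	*(volatile uint32_t *)&dst->word = u.word;
}

Whether a volatile, naturally aligned 32b store is a strong enough
guarantee to drop the __atomic_store() is of course exactly the
compiler-vs-programmer debate above; on the targets DPDK supports it
should end up as a single store instruction, but the atomic builtin is
the only thing the language actually promises.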