> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, 15 May 2024 17.03
>
> On Wed, 15 May 2024 11:30:45 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
>
> > With a long term perspective, I consider this patch very useful.
> > And its 32 bit implementation can be optimized for various
> > architectures/compilers later.
> >
> > In addition, it would be "nice to have" if reset() and fetch() could
> > be called from another thread than the thread adding to the counter.
> >
> > As previously discussed [1], I think it can be done without
> > significantly affecting fast path add() performance, by using an
> > "offset" with Release-Consume ordering.
> >
> > [1]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F427@smartserver.smartshare.dk/
> >
>
> Without a specific driver use case, not sure why this added complexity
> is needed.

Our application reads the stats counters from a thread other than the fast path threads. We don't pause the fast path forwarding loops to aggregate a bunch of counters.

I would guess that many other applications work that way too, especially latency sensitive applications.

> If there is a specific example, can add it later. Any atomic operation
> ends up impacting the speculative execution pipeline on modern CPU's.
> This version ends up being just a single add instruction on ARM and
> x86 64 bit.

I agree that everything is mostly fine on 64 bit. I am trying to ensure that we future-proof it for multi-threaded applications and 32 bit architectures too.
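
For illustration, here is a rough sketch of the "offset" idea for the 64 bit case only. All names (struct counter, counter_add(), counter_fetch(), counter_reset()) are hypothetical and not taken from the patch or from DPDK. It uses C11 atomics with acquire in place of consume, since compilers generally treat consume as acquire anyway. The 32 bit case, where a 64 bit value can tear, is not covered.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sketch; not the API from the patch under discussion. */
struct counter {
    _Atomic uint64_t cnt;    /* written only by the owning fast path thread */
    _Atomic uint64_t offset; /* written only by counter_reset(), any thread */
};

/* Fast path, single writer: relaxed load and store, no atomic
 * read-modify-write and no fence. */
static inline void counter_add(struct counter *c, uint64_t n)
{
    uint64_t v = atomic_load_explicit(&c->cnt, memory_order_relaxed);
    atomic_store_explicit(&c->cnt, v + n, memory_order_relaxed);
}

/* Slow path, any thread: the value since the last reset is cnt - offset.
 * The acquire load pairs with the release store in counter_reset(), so the
 * cnt value read here is never older than the one offset was taken from. */
static inline uint64_t counter_fetch(struct counter *c)
{
    uint64_t off = atomic_load_explicit(&c->offset, memory_order_acquire);
    uint64_t v = atomic_load_explicit(&c->cnt, memory_order_relaxed);
    return v - off;
}

/* Slow path, any thread: never writes cnt, so it cannot race with
 * counter_add(); it only records the current count as the new offset. */
static inline void counter_reset(struct counter *c)
{
    uint64_t v = atomic_load_explicit(&c->cnt, memory_order_relaxed);
    atomic_store_explicit(&c->offset, v, memory_order_release);
}
```

The point of the sketch is that reset() never touches the counter itself, so add() stays a plain load/add/store on the fast path and only fetch() and reset() pay for the ordering.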