> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, 22 May 2024 17.33
> 
> On Wed, 22 May 2024 10:31:39 +0200
> Morten Brørup <m...@smartsharesystems.com> wrote:
> 
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * On native 64 bit platform, counter is implemented as basic
> > > + * 64 bit unsigned integer that only increases.
> > > + */
> > > +typedef struct {
> > > + uint64_t current;
> > > + uint64_t offset;
> > > +} rte_counter64_t;
> >
> > As discussed in the other thread [1], I strongly prefer having
> "current" and "offset" separate, for performance reasons.
> > Keeping each offset close together with its counter will require more
> cache lines than necessary, because the offsets take up space in the hot
> part of a fast path data structure. E.g. the size_bins[] counters could
> fit into one cache line instead of two.
> >
> > [1]:
> https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F422@smarts
> erver.smartshare.dk/
> 
> There are no size_bins in the current version of the patch.
> And the number of counters in ethdev part are small so it is less of a
> concern.
> The code is easier to maintain if the counter object is self contained.

I agree that there are advantages to keeping the counter object self contained.

However, these counters are generic, so we cannot assume that there are only 
very few, based on how the current software device drivers use them.

Someone might want to add size_bins to the software device drivers.
And someone else might want to collect many counters in some application or 
library structure.

> 
> > > +
> > > +/* On 32 bit platform, need to use atomic to avoid load/store
> tearing */
> > > +typedef RTE_ATOMIC(uint64_t) rte_counter64_t;
> >
> > As shown by Godbolt experiments discussed in a previous thread [2],
> non-tearing 64 bit counters can be implemented without using atomic
> instructions on all 32 bit architectures supported by DPDK. So we should
> use the counter/offset design pattern for RTE_ARCH_32 too.
> >
> > [2]:
> https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F433@smarts
> erver.smartshare.dk/
> 
> Bruce found some 32 bit versions of x86 have the problem.
> Godbolt doesn't seem to list 32 bit x86 compiler for Gcc.
> If you try MSVC in 32 bit mode it will split the loads.
> I see no problem on ARM which is the only other 32 bit we care about.
> 

Yeah, there seems to be a lot of work testing compiler behavior here. Let's 
start with a generic 32 bit implementation based on atomics, which should work 
correctly on all architectures/compilers.
Optimized architecture- and compiler-optimized variants can be added by 
interested CPU vendors later.

Reply via email to