> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> Sent: Friday, 2 February 2024 11.19
> 
> On 2024-02-01 09:04, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
> >> Sent: Wednesday, 31 January 2024 19.46
> >>
> >> On 2024-01-31 17:06, Stephen Hemminger wrote:
> >>> On Wed, 31 Jan 2024 14:13:01 +0100
> >>> Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote:
> >
> > [...]
> >
> >>> FYI - the linux kernel has a similar but more complete set of
> >> operations.
> >>> It might be more efficient to use unsigned long rather than
> requiring
> >>> the elements to be uint64_t. Thinking of the few 32 bit platforms.
> >>>
> >>
> >> Keeping it 64-bit avoids a popcount-related #ifdef. DPDK doesn't
> have
> >> an
> >> equivalent to __builtin_popcountl().
> >>
> >> How much do we need to care about 32-bit ISA performance?
> >
> > At the 2023 DPDK Summit I talked to someone at a very well known
> network equipment vendor using 32 bit CPUs in some of their products;
> some sort of CPE, IIRC. 32 bit CPUs are still out there, and 32-bit CPU
> support has not been deprecated in DPDK.
> >
> > For the bitset parameter to functions, you could either use "unsigned
> long*" (as suggested by Stephen), or "void*" (followed by type casting
> inside the functions).
> >
> > If only using this library for the command line argument parser and
> similar, performance is irrelevant. If we foresee using it in the fast
> path, e.g. with the htimer library, we shouldn't tie its API tightly to
> 64 bit.
> >
> 
> I'm not even sure performance will be that much worse. Sure, two
> popcount instead of one. What is probably worse is older ISAs (32- or
> 64-bit, e.g. original x64_64) that lack machine instructions for
> counting set bits of *any* word size.

I'm sorry about being unclear. I didn't mean to suggest supporting *any* word 
size; I was thinking about one word size, either 32 or 64 bit, automatically 
selected at build time depending on CPU architecture.

> 
> That said, the only real concern I have about going "unsigned long" ->
> "uint64_t" is that I might feel I need to go fix <rte_bitops.h> first.

I see.
Otherwise you'll end up with a bunch of #if RTE_ARCH_32 rte_bit_<op>32() #else 
rte_bit_<op>64() #endif in the implementation.
Perhaps a string concatenation macro could replace that with something like 
rte_bit_<op>##RTE_ARCH_BITS(), or RTE_POSTFIX_ARCH_BITS(rte_bit_<op>, 
(params)). Just thinking out aloud.

> 
> >>
> >> I'll go through the below API and some other APIs to see if there's
> >> something obvious missing.
> >>
> >> When I originally wrote this code there were a few potential
> features
> >> where I wasn't sure to what extent they were useful. One example was
> >> the
> >> shift operation. Any input is appreciated.
> >
> > Start off with what you already have. If we need more operations,
> they can always be added later.
> >
> >>
> >>> Also, what if any thread safety guarantees? or atomic.
> >>>
> >>
> >> Currently, it's all MT unsafe.
> >>
> >> An atomic set and get/test would make sense, and maybe other
> operations
> >> would as well.
> >>
> >> Bringing in atomicity into the design makes it much less obvious:
> >>
> >> Would the atomic operations imply some memory ordering, or be
> >> "relaxed".
> >> I would lean toward relaxed, but then shouldn't bit-level atomics be
> >> consistent with the core DPDK atomics API? With that in mind, memory
> >> ordering should be user-configurable.
> >>
> >> If the code needs to be pure C11 atomics-wise, the words that makes
> up
> >> the bitset must be _Atomic uint64_t. Then you need to be careful or
> end
> >> up with "lock"-prefixed instructions if you manipulate the bitset
> >> words.
> >> Just a pure words[N] = 0; gives you a mov+mfence on x86, for
> example,
> >> plus all the fun memory_order_seq_cst in terms of preventing
> >> compiler-level optimizations. So you definitely can't have the
> bitset
> >> always using _Atomic uint64_t, since would risk non-shared use
> cases.
> >> You could have a variant I guess. Just duplicate the whole thing, or
> >> something with macros.
> >
> > It seems like MT unsafe suffices for the near term use cases.
> >
> > We can add an atomic variant of the library later, if the need should
> arise.
> >
> 
> Agreed. The only concern I have here is that you end up wanting to
> change the original design, to better be able to fit atomic bit
> operations.

In a perfect world, the design should have a roadmap leading towards atomic bit 
operations.
In a fast moving world, we could mark the lib experimental (and mean it!) - it 
is still an improvement over copy-pasting something similar all over the code.

If a potential roadmap towards atomic operations is not obvious after thinking 
a few moments about it, we have a clear conscience to simply deem the library 
unsafe for multithreading and proceed with it "as is".

Reply via email to