> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > Sent: Friday, 2 February 2024 11.19 > > On 2024-02-01 09:04, Morten Brørup wrote: > >> From: Mattias Rönnblom [mailto:hof...@lysator.liu.se] > >> Sent: Wednesday, 31 January 2024 19.46 > >> > >> On 2024-01-31 17:06, Stephen Hemminger wrote: > >>> On Wed, 31 Jan 2024 14:13:01 +0100 > >>> Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote: > > > > [...] > > > >>> FYI - the linux kernel has a similar but more complete set of > >> operations. > >>> It might be more efficient to use unsigned long rather than > requiring > >>> the elements to be uint64_t. Thinking of the few 32 bit platforms. > >>> > >> > >> Keeping it 64-bit avoids a popcount-related #ifdef. DPDK doesn't > have > >> an > >> equivalent to __builtin_popcountl(). > >> > >> How much do we need to care about 32-bit ISA performance? > > > > At the 2023 DPDK Summit I talked to someone at a very well known > network equipment vendor using 32 bit CPUs in some of their products; > some sort of CPE, IIRC. 32 bit CPUs are still out there, and 32-bit CPU > support has not been deprecated in DPDK. > > > > For the bitset parameter to functions, you could either use "unsigned > long*" (as suggested by Stephen), or "void*" (followed by type casting > inside the functions). > > > > If only using this library for the command line argument parser and > similar, performance is irrelevant. If we foresee using it in the fast > path, e.g. with the htimer library, we shouldn't tie its API tightly to > 64 bit. > > > > I'm not even sure performance will be that much worse. Sure, two > popcount instead of one. What is probably worse is older ISAs (32- or > 64-bit, e.g. original x64_64) that lack machine instructions for > counting set bits of *any* word size.
I'm sorry about being unclear. I didn't mean to suggest supporting *any* word size; I was thinking about one word size, either 32 or 64 bit, automatically selected at build time depending on CPU architecture. > > That said, the only real concern I have about going "unsigned long" -> > "uint64_t" is that I might feel I need to go fix <rte_bitops.h> first. I see. Otherwise you'll end up with a bunch of #if RTE_ARCH_32 rte_bit_<op>32() #else rte_bit_<op>64() #endif in the implementation. Perhaps a string concatenation macro could replace that with something like rte_bit_<op>##RTE_ARCH_BITS(), or RTE_POSTFIX_ARCH_BITS(rte_bit_<op>, (params)). Just thinking out aloud. > > >> > >> I'll go through the below API and some other APIs to see if there's > >> something obvious missing. > >> > >> When I originally wrote this code there were a few potential > features > >> where I wasn't sure to what extent they were useful. One example was > >> the > >> shift operation. Any input is appreciated. > > > > Start off with what you already have. If we need more operations, > they can always be added later. > > > >> > >>> Also, what if any thread safety guarantees? or atomic. > >>> > >> > >> Currently, it's all MT unsafe. > >> > >> An atomic set and get/test would make sense, and maybe other > operations > >> would as well. > >> > >> Bringing in atomicity into the design makes it much less obvious: > >> > >> Would the atomic operations imply some memory ordering, or be > >> "relaxed". > >> I would lean toward relaxed, but then shouldn't bit-level atomics be > >> consistent with the core DPDK atomics API? With that in mind, memory > >> ordering should be user-configurable. > >> > >> If the code needs to be pure C11 atomics-wise, the words that makes > up > >> the bitset must be _Atomic uint64_t. Then you need to be careful or > end > >> up with "lock"-prefixed instructions if you manipulate the bitset > >> words. > >> Just a pure words[N] = 0; gives you a mov+mfence on x86, for > example, > >> plus all the fun memory_order_seq_cst in terms of preventing > >> compiler-level optimizations. So you definitely can't have the > bitset > >> always using _Atomic uint64_t, since would risk non-shared use > cases. > >> You could have a variant I guess. Just duplicate the whole thing, or > >> something with macros. > > > > It seems like MT unsafe suffices for the near term use cases. > > > > We can add an atomic variant of the library later, if the need should > arise. > > > > Agreed. The only concern I have here is that you end up wanting to > change the original design, to better be able to fit atomic bit > operations. In a perfect world, the design should have a roadmap leading towards atomic bit operations. In a fast moving world, we could mark the lib experimental (and mean it!) - it is still an improvement over copy-pasting something similar all over the code. If a potential roadmap towards atomic operations is not obvious after thinking a few moments about it, we have a clear conscience to simply deem the library unsafe for multithreading and proceed with it "as is".