On 2024-02-01 09:04, Morten Brørup wrote:
From: Mattias Rönnblom [mailto:hof...@lysator.liu.se]
Sent: Wednesday, 31 January 2024 19.46

On 2024-01-31 17:06, Stephen Hemminger wrote:
On Wed, 31 Jan 2024 14:13:01 +0100
Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote:

[...]

FYI - the linux kernel has a similar but more complete set of
operations.
It might be more efficient to use unsigned long rather than requiring
the elements to be uint64_t. Thinking of the few 32 bit platforms.


Keeping it 64-bit avoids a popcount-related #ifdef. DPDK doesn't have
an
equivalent to __builtin_popcountl().

How much do we need to care about 32-bit ISA performance?

At the 2023 DPDK Summit I talked to someone at a very well known network 
equipment vendor using 32 bit CPUs in some of their products; some sort of CPE, 
IIRC. 32 bit CPUs are still out there, and 32-bit CPU support has not been 
deprecated in DPDK.

For the bitset parameter to functions, you could either use "unsigned long*" (as 
suggested by Stephen), or "void*" (followed by type casting inside the functions).

If only using this library for the command line argument parser and similar, 
performance is irrelevant. If we foresee using it in the fast path, e.g. with 
the htimer library, we shouldn't tie its API tightly to 64 bit.


I'm not even sure performance will be that much worse. Sure, two popcount instead of one. What is probably worse is older ISAs (32- or 64-bit, e.g. original x64_64) that lack machine instructions for counting set bits of *any* word size.

That said, the only real concern I have about going "unsigned long" -> "uint64_t" is that I might feel I need to go fix <rte_bitops.h> first.


I'll go through the below API and some other APIs to see if there's
something obvious missing.

When I originally wrote this code there were a few potential features
where I wasn't sure to what extent they were useful. One example was
the
shift operation. Any input is appreciated.

Start off with what you already have. If we need more operations, they can 
always be added later.


Also, what if any thread safety guarantees? or atomic.


Currently, it's all MT unsafe.

An atomic set and get/test would make sense, and maybe other operations
would as well.

Bringing in atomicity into the design makes it much less obvious:

Would the atomic operations imply some memory ordering, or be
"relaxed".
I would lean toward relaxed, but then shouldn't bit-level atomics be
consistent with the core DPDK atomics API? With that in mind, memory
ordering should be user-configurable.

If the code needs to be pure C11 atomics-wise, the words that makes up
the bitset must be _Atomic uint64_t. Then you need to be careful or end
up with "lock"-prefixed instructions if you manipulate the bitset
words.
Just a pure words[N] = 0; gives you a mov+mfence on x86, for example,
plus all the fun memory_order_seq_cst in terms of preventing
compiler-level optimizations. So you definitely can't have the bitset
always using _Atomic uint64_t, since would risk non-shared use cases.
You could have a variant I guess. Just duplicate the whole thing, or
something with macros.

It seems like MT unsafe suffices for the near term use cases.

We can add an atomic variant of the library later, if the need should arise.


Agreed. The only concern I have here is that you end up wanting to change the original design, to better be able to fit atomic bit operations.


With GCC C11 builtins, you can both have the atomic cake and eat it, in
that you both access the data non-atomically/normally, and in an atomic
manner.

Yep. And we care quite a lot about performance, so we are likely to keep using 
those until the compilers offer similar performance for C11 standard atomics.

Reply via email to