On 2024-01-31 17:06, Stephen Hemminger wrote:
On Wed, 31 Jan 2024 14:13:01 +0100
Mattias Rönnblom <mattias.ronnb...@ericsson.com> wrote:

+/**
+ * @file
+ * RTE Bitset
+ *
+ * This file provides functions and macros for querying and
+ * manipulating sets of bits kept in arrays of @c uint64_t-sized
+ * elements.
+ *
+ * The bits in a bitset are numbered from 0 to @c size - 1, with the
+ * lowest index being the least significant bit.
+ *
+ * The bitset array must be properly aligned.
+ *
+ * For optimal performance, the @c size parameter, required by
+ * many of the API's functions, should be a compile-time constant.
+ *
+ * For large bitsets, the rte_bitmap.h API may be more appropriate.
+ *
+ * @warning
+ * All functions modifying a bitset may overwrite any unused bits of
+ * the last word. Such unused bits are ignored by all functions reading
+ * bits.
+ *
+ */

FYI - the Linux kernel has a similar but more complete set of operations.
It might be more efficient to use unsigned long rather than requiring
the elements to be uint64_t, thinking of the few 32-bit platforms.


Keeping it 64-bit avoids a popcount-related #ifdef, since DPDK doesn't have an equivalent to __builtin_popcountl().
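
Roughly what I mean, as a sketch only (not the proposed API): counting set bits over uint64_t words can always use the 64-bit builtin, whereas an unsigned long-based layout would have to pick between __builtin_popcountl() and __builtin_popcountll() depending on the platform.

#include <stdint.h>
#include <stddef.h>

/* Count the set bits across all words of a uint64_t-based bitset. */
static inline size_t
bitset_count_set(const uint64_t *words, size_t num_words)
{
        size_t count = 0;
        size_t i;

        for (i = 0; i < num_words; i++)
                count += (size_t)__builtin_popcountll(words[i]);

        return count;
}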

How much do we need to care about 32-bit ISA performance?

I'll go through the API below, and some other APIs, to see if there's anything obvious missing.

When I originally wrote this code, there were a few potential features where I wasn't sure how useful they would be. One example was the shift operation. Any input is appreciated.

Also, what, if any, thread safety guarantees are there? Or atomic operations?


Currently, it's all MT unsafe.

An atomic set and get/test would make sense, and maybe other operations would as well.

Bringing atomicity into the design makes it much less obvious:

Would the atomic operations imply some memory ordering, or be "relaxed"? I would lean toward relaxed, but then shouldn't bit-level atomics be consistent with the core DPDK atomics API? With that in mind, memory ordering should be user-configurable.
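
For example (hypothetical names and signature, just to show the shape of a user-configurable ordering; the caller would pass one of the __ATOMIC_* constants, or the rte_memory_order_* equivalents):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Atomically read a single bit, with the memory order chosen by the
 * caller: relaxed for a plain flag, acquire if the bit publishes
 * other data.
 */
static inline bool
bitset_atomic_test(const uint64_t *words, size_t bit_num, int memory_order)
{
        uint64_t word;

        word = __atomic_load_n(&words[bit_num / 64], memory_order);

        return (word >> (bit_num % 64)) & 1;
}

/* e.g. bitset_atomic_test(words, 42, __ATOMIC_ACQUIRE); */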

If the code needs to be pure C11 atomics-wise, the words that make up the bitset must be _Atomic uint64_t. Then you need to be careful, or you end up with "lock"-prefixed instructions whenever you manipulate the bitset words. A plain words[N] = 0; gives you a mov+mfence on x86, for example, plus all the fun memory_order_seq_cst brings in terms of preventing compiler-level optimizations. So you definitely can't have the bitset always use _Atomic uint64_t, since that would hurt non-shared use cases. You could have an atomic variant, I guess - either duplicate the whole thing, or do something with macros.
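
To illustrate (a minimal standalone example, not DPDK code):

#include <stdatomic.h>
#include <stdint.h>

_Atomic uint64_t atomic_word;
uint64_t plain_word;

void
zero_words(void)
{
        /* A plain assignment to an _Atomic object is a seq_cst store;
         * on x86, GCC emits mov+mfence for it, and the compiler must
         * not reorder other memory accesses across it.
         */
        atomic_word = 0;

        /* A non-_Atomic word is just an ordinary store. */
        plain_word = 0;
}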

With the GCC C11-style __atomic builtins, you can both have the atomic cake and eat it, in that you can access the same data both normally (non-atomically) and atomically.
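
Something along these lines (a sketch only, with the memory order hardcoded to relaxed for brevity; real code would also want bounds checking):

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Non-atomic set: plain load/store, no penalty for non-shared bitsets. */
static inline void
bitset_set(uint64_t *words, size_t bit_num)
{
        words[bit_num / 64] |= UINT64_C(1) << (bit_num % 64);
}

/* Atomic test-and-set on the very same plain uint64_t storage, using
 * the GCC __atomic builtins.
 */
static inline bool
bitset_atomic_test_and_set(uint64_t *words, size_t bit_num)
{
        uint64_t mask = UINT64_C(1) << (bit_num % 64);
        uint64_t prev;

        prev = __atomic_fetch_or(&words[bit_num / 64], mask,
                                 __ATOMIC_RELAXED);

        return (prev & mask) != 0;
}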

From the kernel's bitmap.h:

/**
  * DOC: bitmap overview
  *
  * The available bitmap operations and their rough meaning in the
  * case that the bitmap is a single unsigned long are thus:
  *
  * The generated code is more efficient when nbits is known at
  * compile-time and at most BITS_PER_LONG.
  *
  * ::
  *
  *  bitmap_zero(dst, nbits)                     *dst = 0UL
  *  bitmap_fill(dst, nbits)                     *dst = ~0UL
  *  bitmap_copy(dst, src, nbits)                *dst = *src
  *  bitmap_and(dst, src1, src2, nbits)          *dst = *src1 & *src2
  *  bitmap_or(dst, src1, src2, nbits)           *dst = *src1 | *src2
  *  bitmap_xor(dst, src1, src2, nbits)          *dst = *src1 ^ *src2
  *  bitmap_andnot(dst, src1, src2, nbits)       *dst = *src1 & ~(*src2)
  *  bitmap_complement(dst, src, nbits)          *dst = ~(*src)
  *  bitmap_equal(src1, src2, nbits)             Are *src1 and *src2 equal?
  *  bitmap_intersects(src1, src2, nbits)        Do *src1 and *src2 overlap?
  *  bitmap_subset(src1, src2, nbits)            Is *src1 a subset of *src2?
  *  bitmap_empty(src, nbits)                    Are all bits zero in *src?
  *  bitmap_full(src, nbits)                     Are all bits set in *src?
  *  bitmap_weight(src, nbits)                   Hamming Weight: number set bits
  *  bitmap_weight_and(src1, src2, nbits)        Hamming Weight of and'ed bitmap
  *  bitmap_set(dst, pos, nbits)                 Set specified bit area
  *  bitmap_clear(dst, pos, nbits)               Clear specified bit area
  *  bitmap_find_next_zero_area(buf, len, pos, n, mask)  Find bit free area
  *  bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off)  as above
  *  bitmap_shift_right(dst, src, n, nbits)      *dst = *src >> n
  *  bitmap_shift_left(dst, src, n, nbits)       *dst = *src << n
  *  bitmap_cut(dst, src, first, n, nbits)       Cut n bits from first, copy rest
  *  bitmap_replace(dst, old, new, mask, nbits)  *dst = (*old & ~(*mask)) | (*new & *mask)
  *  bitmap_remap(dst, src, old, new, nbits)     *dst = map(old, new)(src)
  *  bitmap_bitremap(oldbit, old, new, nbits)    newbit = map(old, new)(oldbit)
  *  bitmap_onto(dst, orig, relmap, nbits)       *dst = orig relative to relmap
  *  bitmap_fold(dst, orig, sz, nbits)           dst bits = orig bits mod sz
  *  bitmap_parse(buf, buflen, dst, nbits)       Parse bitmap dst from kernel buf
  *  bitmap_parse_user(ubuf, ulen, dst, nbits)   Parse bitmap dst from user buf
  *  bitmap_parselist(buf, dst, nbits)           Parse bitmap dst from kernel buf
  *  bitmap_parselist_user(buf, dst, nbits)      Parse bitmap dst from user buf
  *  bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
  *  bitmap_release_region(bitmap, pos, order)   Free specified bit region
  *  bitmap_allocate_region(bitmap, pos, order)  Allocate specified bit region
  *  bitmap_from_arr32(dst, buf, nbits)          Copy nbits from u32[] buf to dst
  *  bitmap_from_arr64(dst, buf, nbits)          Copy nbits from u64[] buf to dst
  *  bitmap_to_arr32(buf, src, nbits)            Copy nbits from buf to u32[] dst
  *  bitmap_to_arr64(buf, src, nbits)            Copy nbits from buf to u64[] dst
  *  bitmap_get_value8(map, start)               Get 8bit value from map at start
  *  bitmap_set_value8(map, value, start)        Set 8bit value to map at start
  *
  * Note, bitmap_zero() and bitmap_fill() operate over the region of
  * unsigned longs, that is, bits behind bitmap till the unsigned long
  * boundary will be zeroed or filled as well. Consider to use
  * bitmap_clear() or bitmap_set() to make explicit zeroing or filling
  * respectively.
  */
