On Fri, May 03, 2024 at 05:56:24PM -0700, Stephen Hemminger wrote: > On Fri, 3 May 2024 10:46:05 +0100 > Daniel Gregory <daniel.greg...@bytedance.com> wrote: > > > On Thu, May 02, 2024 at 02:48:26PM -0700, Stephen Hemminger wrote: > > > There are already constant checks like this elsewhere in the file. > > > > Yes, but they're in macros, rather than inlined functions, so my > > understanding was that at compile time, macro expansion has put the > > memorder constant in the _Static_assert call as opposed to still being > > a function parameter in the inline definition. > > Gcc and clang are smart enough that it is possible to use the internal > __builtin_constant_p() in the function. Some examples in DPDK: > > static __rte_always_inline int > rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table, > unsigned int n, struct rte_mempool_cache *cache) > { > int ret; > unsigned int remaining; > uint32_t index, len; > void **cache_objs; > > /* No cache provided */ > if (unlikely(cache == NULL)) { > remaining = n; > goto driver_dequeue; > } > > /* The cache is a stack, so copy will be in reverse order. */ > cache_objs = &cache->objs[cache->len]; > > if (__extension__(__builtin_constant_p(n)) && n <= cache->len) { > > It should be possible to use RTE_BUILD_BUG_ON() or static_assert here.
Yes, it's possible to use RTE_BUILD_BUG_ON(!__builtin_constant_p(n)) on Clang, I am simply not seeing it succeed. In fact, the opposite check, that the memorder is not constant, builds just fine with Clang 16. diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h index 5cb8b59056..d0646320e6 100644 --- a/lib/eal/arm/include/rte_pause_64.h +++ b/lib/eal/arm/include/rte_pause_64.h @@ -153,8 +153,7 @@ rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected, { uint16_t value; - RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire && - memorder != rte_memory_order_relaxed); + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); __RTE_ARM_LOAD_EXC_16(addr, value, memorder) if (value != expected) { @@ -172,8 +171,7 @@ rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, { uint32_t value; - RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire && - memorder != rte_memory_order_relaxed); + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); __RTE_ARM_LOAD_EXC_32(addr, value, memorder) if (value != expected) { @@ -191,8 +189,7 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected, { uint64_t value; - RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire && - memorder != rte_memory_order_relaxed); + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); __RTE_ARM_LOAD_EXC_64(addr, value, memorder) if (value != expected) { diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h index f2a1eadcbd..3973488865 100644 --- a/lib/eal/include/generic/rte_pause.h +++ b/lib/eal/include/generic/rte_pause.h @@ -80,6 +80,7 @@ static __rte_always_inline void rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected, rte_memory_order memorder) { + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); assert(memorder == rte_memory_order_acquire || memorder == rte_memory_order_relaxed); while (rte_atomic_load_explicit((volatile __rte_atomic uint16_t *)addr, memorder) @@ -91,6 +92,7 @@ static __rte_always_inline void rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, rte_memory_order memorder) { + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); assert(memorder == rte_memory_order_acquire || memorder == rte_memory_order_relaxed); while (rte_atomic_load_explicit((volatile __rte_atomic uint32_t *)addr, memorder) @@ -102,6 +104,7 @@ static __rte_always_inline void rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected, rte_memory_order memorder) { + RTE_BUILD_BUG_ON(__builtin_constant_p(memorder)); assert(memorder == rte_memory_order_acquire || memorder == rte_memory_order_relaxed); while (rte_atomic_load_explicit((volatile __rte_atomic uint64_t *)addr, memorder) This seemed odd, and it doesn't line up with what the GCC documentation says about __builtin_constant_p: > [__builtin_constant_p] does not return 1 when you pass a constant > numeric value to the inline function unless you specify the -O option. https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html So I did some more looking and the behaviour I've seen is that both Clang and GCC don't allow for static checks on inline function parameters: static inline __attribute__((always_inline)) int fn(int val) { _Static_assert(val == 0, "val nonzero"); return 0; } int main(void) { return fn(0); } ( https://godbolt.org/z/TrfWqYoGo ) Both GCC and Clang's __builtin_constant_p return 1 at runtime when called on an inline function parameter static inline __attribute__((always_inline)) int fn(int val) { return __builtin_constant_p(val); } int main(void) { return fn(0); } ( https://godbolt.org/z/hTGqvj91K ) But GCC doesn't allow for static assertions on the return value of __builtin_constant_p, error-ing that the expression is not constant: static inline __attribute__((always_inline)) int fn(int val) { _Static_assert(__builtin_constant_p(val), "v isn't constant"); return 0; } int main(void) { return fn(0); } ( https://godbolt.org/z/4799nEK1T ) And on Clang these static assertions are allowed, but always fail ( https://godbolt.org/z/6j16c5MbT ). I don't really have the experience with Clang or GCC to assess whether this is intended behaviour or a bug (I've not seen evidence this behaviour is new or different from previous versions), it's just what I've observed. > Changing a compile time check into a runtime check means that buggy programs > blow up much later and in a confusing manner. And it impacts all code that > is doing a spin lock, certainly one of the hottest paths in DPDK. True, buggy programs will error later, but other than converting the rte_wait_until_equal_* functions to macros, I haven't seen an approach that will let us use a static assert on the passed in memory order value right now. However, new enough GCC and Clang versions will optimise out runtime checks that are guaranteed to pass, such as in this case when an inline functions is called with a constant value. To give a reduced example: #include <assert.h> enum E { X, Y, Z, }; static inline __attribute__((always_inline)) void add(int *dst, enum E e) { assert(e == X || e == Y); if (e == X) (*dst)++; else (*dst) += 4; } } int fn(int num) { add(&num, Y); return num; } (GCC https://godbolt.org/z/GKGEa7Wdd ) (Clang https://godbolt.org/z/1vYv8MYcs ) With optimisations they both produce a fn that doesn't perform any check on the value of e, and simply return the input+4. Therefore, this patch shouldn't have an impact on the performance of locks when built in release mode.