On Fri, May 03, 2024 at 05:56:24PM -0700, Stephen Hemminger wrote:
> On Fri, 3 May 2024 10:46:05 +0100
> Daniel Gregory <daniel.greg...@bytedance.com> wrote:
> 
> > On Thu, May 02, 2024 at 02:48:26PM -0700, Stephen Hemminger wrote:
> > > There are already constant checks like this elsewhere in the file.  
> > 
> > Yes, but they're in macros, rather than inlined functions, so my
> > understanding was that at compile time, macro expansion has put the
> > memorder constant in the _Static_assert call as opposed to still being
> > a function parameter in the inline definition.
> 
> Gcc and clang are smart enough that it is possible to use the internal
> __builtin_constant_p() in the function. Some examples in DPDK:
> 
> static __rte_always_inline int
> rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>                          unsigned int n, struct rte_mempool_cache *cache)
> {
>       int ret;
>       unsigned int remaining;
>       uint32_t index, len;
>       void **cache_objs;
> 
>       /* No cache provided */
>       if (unlikely(cache == NULL)) {
>               remaining = n;
>               goto driver_dequeue;
>       }
> 
>       /* The cache is a stack, so copy will be in reverse order. */
>       cache_objs = &cache->objs[cache->len];
> 
>       if (__extension__(__builtin_constant_p(n)) && n <= cache->len) {
> 
> It should be possible to use RTE_BUILD_BUG_ON() or static_assert here.

Yes, it's possible to use RTE_BUILD_BUG_ON(!__builtin_constant_p(n)) on
Clang, I am simply not seeing it succeed. In fact, the opposite check,
that the memorder is not constant, builds just fine with Clang 16.


diff --git a/lib/eal/arm/include/rte_pause_64.h 
b/lib/eal/arm/include/rte_pause_64.h
index 5cb8b59056..d0646320e6 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -153,8 +153,7 @@ rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t 
expected,
 {
        uint16_t value;
 
-       RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire &&
-               memorder != rte_memory_order_relaxed);
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
 
        __RTE_ARM_LOAD_EXC_16(addr, value, memorder)
        if (value != expected) {
@@ -172,8 +171,7 @@ rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t 
expected,
 {
        uint32_t value;
 
-       RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire &&
-               memorder != rte_memory_order_relaxed);
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
 
        __RTE_ARM_LOAD_EXC_32(addr, value, memorder)
        if (value != expected) {
@@ -191,8 +189,7 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t 
expected,
 {
        uint64_t value;
 
-       RTE_BUILD_BUG_ON(memorder != rte_memory_order_acquire &&
-               memorder != rte_memory_order_relaxed);
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
 
        __RTE_ARM_LOAD_EXC_64(addr, value, memorder)
        if (value != expected) {
diff --git a/lib/eal/include/generic/rte_pause.h 
b/lib/eal/include/generic/rte_pause.h
index f2a1eadcbd..3973488865 100644
--- a/lib/eal/include/generic/rte_pause.h
+++ b/lib/eal/include/generic/rte_pause.h
@@ -80,6 +80,7 @@ static __rte_always_inline void
 rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
                rte_memory_order memorder)
 {
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
        assert(memorder == rte_memory_order_acquire || memorder == 
rte_memory_order_relaxed);
 
        while (rte_atomic_load_explicit((volatile __rte_atomic uint16_t *)addr, 
memorder)
@@ -91,6 +92,7 @@ static __rte_always_inline void
 rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
                rte_memory_order memorder)
 {
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
        assert(memorder == rte_memory_order_acquire || memorder == 
rte_memory_order_relaxed);
 
        while (rte_atomic_load_explicit((volatile __rte_atomic uint32_t *)addr, 
memorder)
@@ -102,6 +104,7 @@ static __rte_always_inline void
 rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
                rte_memory_order memorder)
 {
+       RTE_BUILD_BUG_ON(__builtin_constant_p(memorder));
        assert(memorder == rte_memory_order_acquire || memorder == 
rte_memory_order_relaxed);
 
        while (rte_atomic_load_explicit((volatile __rte_atomic uint64_t *)addr, 
memorder)


This seemed odd, and it doesn't line up with what the GCC documentation
says about __builtin_constant_p:

> [__builtin_constant_p] does not return 1 when you pass a constant
> numeric value to the inline function unless you specify the -O option.
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

So I did some more looking and the behaviour I've seen is that both
Clang and GCC don't allow for static checks on inline function
parameters:

static inline __attribute__((always_inline)) int
fn(int val)
{
        _Static_assert(val == 0, "val nonzero");
        return 0;
}

int main(void) {
        return fn(0);
}

( https://godbolt.org/z/TrfWqYoGo )

Both GCC and Clang's __builtin_constant_p return 1 at runtime when
called on an inline function parameter

static inline __attribute__((always_inline)) int
fn(int val)
{
        return __builtin_constant_p(val);
}

int main(void) {
        return fn(0);
}

( https://godbolt.org/z/hTGqvj91K )

But GCC doesn't allow for static assertions on the return value of
__builtin_constant_p, error-ing that the expression is not constant:

static inline __attribute__((always_inline)) int
fn(int val)
{
        _Static_assert(__builtin_constant_p(val), "v isn't constant");
        return 0;
}

int main(void) {
        return fn(0);
}

( https://godbolt.org/z/4799nEK1T )

And on Clang these static assertions are allowed, but always fail
( https://godbolt.org/z/6j16c5MbT ).

I don't really have the experience with Clang or GCC to assess whether
this is intended behaviour or a bug (I've not seen evidence this
behaviour is new or different from previous versions), it's just what
I've observed.
 
> Changing a compile time check into a runtime check means that buggy programs
> blow up much later and in a confusing manner. And it impacts all code that
> is doing a spin lock, certainly one of the hottest paths in DPDK.

True, buggy programs will error later, but other than converting the
rte_wait_until_equal_* functions to macros, I haven't seen an approach
that will let us use a static assert on the passed in memory order
value right now.

However, new enough GCC and Clang versions will optimise out runtime
checks that are guaranteed to pass, such as in this case when an inline
functions is called with a constant value. To give a reduced example:

#include <assert.h>

enum E {
        X,
        Y,
        Z,
};

static inline __attribute__((always_inline)) void
add(int *dst, enum E e)
{
        assert(e == X || e == Y);
        if (e == X)
                (*dst)++;
        else
                (*dst) += 4;
}
}

int
fn(int num)
{
        add(&num, Y);
        return num;
}

(GCC https://godbolt.org/z/GKGEa7Wdd )
(Clang https://godbolt.org/z/1vYv8MYcs )

With optimisations they both produce a fn that doesn't perform any check
on the value of e, and simply return the input+4.

Therefore, this patch shouldn't have an impact on the performance of
locks when built in release mode.

Reply via email to