On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw <richard.earns...@foss.arm.com> wrote: > > > > On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote: > > On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw > > <richard.earns...@foss.arm.com> wrote: > >> > >> > >> > >> On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > >>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > >>> <richard.earns...@foss.arm.com> wrote: > >>>> > >>>> > >>>> > >>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > >>>>> Hi, > >>>>> The attached patch removes calls to builtins from vshl_n intrinsics, > >>>>> and replacing them > >>>>> with left shift operator. The patch passes bootstrap+test on > >>>>> arm-linux-gnueabihf. > >>>>> > >>>>> Altho, I noticed, that the patch causes 3 extra registers to spill > >>>>> using << instead > >>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of > >>>>> intrinsics ? > >>>>> Before patch, the shift operation was performed by call to > >>>>> __builtin_neon_vshl<type> (__a, __b) > >>>>> and now it's inlined to __a << __b, which might result in increased > >>>>> register pressure ? > >>>>> > >>>>> Thanks, > >>>>> Prathamesh > >>>>> > >>>> > >>>> > >>>> You're missing a ChangeLog for the patch. > >>> Sorry, updated in this patch. > >>>> > >>>> However, I'm not sure about this. The register shift form of VSHL > >>>> performs a right shift if the value is negative, which is UB if you > >>>> write `<<` instead. > >>>> > >>>> Have I missed something here? > >>> Hi Richard, > >>> According to this article: > >>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > >>> For vshl_n, the shift amount is always in the non-negative range for all > >>> types. > >>> > >>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following > >>> diagnostic: > >>> foo.c: In function ‘main’: > >>> foo.c:17:1: error: constant -1 out of range 0 - 31 > >>> 17 | } > >>> | ^ > >>> > >> > >> It does do that now, but that's because the intrinsic expansion does > >> some bounds checking; when you remove the call into the back-end > >> intrinsic that will no-longer happen. > >> > >> I think with this change various things are likely: > >> > >> - We'll no-longer reject non-immediate values, so users will be able to > >> write > >> > >> int b = 5; > >> vshl_n_s32 (a, b); > >> > >> which will expand to a vdup followed by the register form. > >> > >> - we'll rely on the front-end diagnosing out-of range shifts > >> > >> - code of the form > >> > >> int b = -1; > >> vshl_n_s32 (a, b); > >> > >> will probably now go through without any errors, especially at low > >> optimization levels. It may end up doing what the user wanted, but it's > >> definitely a change in behaviour - and perhaps worse, the compiler might > >> diagnose the above as UB and silently throw some stuff away. > >> > >> It might be that we need to insert some form of static assertion that > >> the second argument is a __builtin_constant_p(). > > Ah right, thanks for the suggestions! > > I tried the above example: > > int b = -1; > > vshl_n_s32 (a, b); > > and it compiled without any errors with -O0 after patch. > > > > Would it be OK to use _Static_assert (__builtin_constant_p (b)) to > > guard against non-immediate values ? > > > > With the following change: > > __extension__ extern __inline int32x2_t > > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > > vshl_n_s32 (int32x2_t __a, const int __b) > > { > > _Static_assert (__builtin_constant_p (__b)); > > return __a << __b; > > } > > > > the above example fails at -O0: > > ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: > > ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed > > 4904 | _Static_assert (__builtin_constant_p (__b)); > > | ^~~~~~~~~~~~~~ > > I've been playing with that but unfortunately it doesn't seem to work in > the way we want it to. For a complete test: > > > > typedef __simd64_int32_t int32x2_t; > > __extension__ extern __inline int32x2_t > __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) > vshl_n_s32 (int32x2_t __a, const int __b) > { > _Static_assert (__builtin_constant_p (__b), "Second argument must be > a litteral constant"); > return __a << __b; > } > > int32x2_t f (int32x2_t x, const int b) > { > return vshl_n_s32 (x, 1); > } > > At -O0 I get: > > test.c: In function ‘vshl_n_s32’: > test.c:7:3: error: static assertion failed: "Second argument must be a > litteral constant" > 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > must be a litteral constant"); > | ^~~~~~~~~~~~~~ > > While at -O1 and above I get: > > > test.c: In function ‘vshl_n_s32’: > test.c:7:19: error: expression in static assertion is not constant > 7 | _Static_assert (__builtin_constant_p (__b), "Second argument > must be a litteral constant"); > | ^~~~~~~~~~~~~~~~~~~~~~~~~~ > > Which indicates that it doesn't consider __builtin_constant_p() to be a > constant expression :( > > So either I'm writing the static assertion incorrectly, or something > weird is going on. The most likely issue is that the static assertion > is being processed too early, before the function is inlined. Ah indeed. I wonder if we should add an attribute to parameter that it should be constant, and emit an error if the caller passes non-constant value ? sth like: void foo(int x __attribute__((runtime_constant))); and the front-end can then diagnose if the argument is __builtin_constant_p while type-checking call to foo.
Thanks, Prathamesh > > R. > > > > > Thanks, > > Prathamesh > >> > >> R. > >> > >>> So, is the attached patch OK ? > >> > >>> > >>> Thanks, > >>> Prathamesh > >>>> > >>>> R.