On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw <richard.earns...@foss.arm.com> wrote: > > > > On 22/07/2021 12:32, Prathamesh Kulkarni wrote: > > On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw > > <richard.earns...@foss.arm.com> wrote: > >> > >> > >> > >> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote: > >>> Hi, > >>> The attached patch removes calls to builtins from vshl_n intrinsics, > >>> and replacing them > >>> with left shift operator. The patch passes bootstrap+test on > >>> arm-linux-gnueabihf. > >>> > >>> Altho, I noticed, that the patch causes 3 extra registers to spill > >>> using << instead > >>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of > >>> intrinsics ? > >>> Before patch, the shift operation was performed by call to > >>> __builtin_neon_vshl<type> (__a, __b) > >>> and now it's inlined to __a << __b, which might result in increased > >>> register pressure ? > >>> > >>> Thanks, > >>> Prathamesh > >>> > >> > >> > >> You're missing a ChangeLog for the patch. > > Sorry, updated in this patch. > >> > >> However, I'm not sure about this. The register shift form of VSHL > >> performs a right shift if the value is negative, which is UB if you > >> write `<<` instead. > >> > >> Have I missed something here? > > Hi Richard, > > According to this article: > > https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N > > For vshl_n, the shift amount is always in the non-negative range for all > > types. > > > > I tried using vshl_n_s32 (a, -1), and the compiler emitted following > > diagnostic: > > foo.c: In function ‘main’: > > foo.c:17:1: error: constant -1 out of range 0 - 31 > > 17 | } > > | ^ > > > > It does do that now, but that's because the intrinsic expansion does > some bounds checking; when you remove the call into the back-end > intrinsic that will no-longer happen. > > I think with this change various things are likely: > > - We'll no-longer reject non-immediate values, so users will be able to > write > > int b = 5; > vshl_n_s32 (a, b); > > which will expand to a vdup followed by the register form. > > - we'll rely on the front-end diagnosing out-of range shifts > > - code of the form > > int b = -1; > vshl_n_s32 (a, b); > > will probably now go through without any errors, especially at low > optimization levels. It may end up doing what the user wanted, but it's > definitely a change in behaviour - and perhaps worse, the compiler might > diagnose the above as UB and silently throw some stuff away. > > It might be that we need to insert some form of static assertion that > the second argument is a __builtin_constant_p(). Ah right, thanks for the suggestions! I tried the above example: int b = -1; vshl_n_s32 (a, b); and it compiled without any errors with -O0 after patch.
Would it be OK to use _Static_assert (__builtin_constant_p (b)) to guard against non-immediate values ? With the following change: __extension__ extern __inline int32x2_t __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) vshl_n_s32 (int32x2_t __a, const int __b) { _Static_assert (__builtin_constant_p (__b)); return __a << __b; } the above example fails at -O0: ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’: ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed 4904 | _Static_assert (__builtin_constant_p (__b)); | ^~~~~~~~~~~~~~ Thanks, Prathamesh > > R. > > > So, is the attached patch OK ? > > > > > Thanks, > > Prathamesh > >> > >> R.