On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw
<richard.earns...@foss.arm.com> wrote:
>
>
>
> On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> > On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
> > <richard.earns...@foss.arm.com> wrote:
> >>
> >>
> >>
> >> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
> >>> Hi,
> >>> The attached patch removes calls to builtins from vshl_n intrinsics,
> >>> and replacing them
> >>> with left shift operator. The patch passes bootstrap+test on
> >>> arm-linux-gnueabihf.
> >>>
> >>> Altho, I noticed, that the patch causes 3 extra registers to spill
> >>> using << instead
> >>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of
> >>> intrinsics ?
> >>> Before patch, the shift operation was performed by call to
> >>> __builtin_neon_vshl<type> (__a, __b)
> >>> and now it's inlined to __a << __b, which might result in increased
> >>> register pressure ?
> >>>
> >>> Thanks,
> >>> Prathamesh
> >>>
> >>
> >>
> >> You're missing a ChangeLog for the patch.
> > Sorry, updated in this patch.
> >>
> >> However, I'm not sure about this.  The register shift form of VSHL
> >> performs a right shift if the value is negative, which is UB if you
> >> write `<<` instead.
> >>
> >> Have I missed something here?
> > Hi Richard,
> > According to this article:
> > https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> > For vshl_n, the shift amount is always in the non-negative range for all 
> > types.
> >
> > I tried using vshl_n_s32 (a, -1), and the compiler emitted following 
> > diagnostic:
> > foo.c: In function ‘main’:
> > foo.c:17:1: error: constant -1 out of range 0 - 31
> >     17 | }
> >        | ^
> >
>
> It does do that now, but that's because the intrinsic expansion does
> some bounds checking; when you remove the call into the back-end
> intrinsic that will no-longer happen.
>
> I think with this change various things are likely:
>
> - We'll no-longer reject non-immediate values, so users will be able to
> write
>
>          int b = 5;
>         vshl_n_s32 (a, b);
>
>    which will expand to a vdup followed by the register form.
>
> - we'll rely on the front-end diagnosing out-of range shifts
>
> - code of the form
>
>         int b = -1;
>         vshl_n_s32 (a, b);
>
>    will probably now go through without any errors, especially at low
> optimization levels.  It may end up doing what the user wanted, but it's
> definitely a change in behaviour - and perhaps worse, the compiler might
> diagnose the above as UB and silently throw some stuff away.
>
> It might be that we need to insert some form of static assertion that
> the second argument is a __builtin_constant_p().
Ah right, thanks for the suggestions!
I tried the above example:
int b = -1;
vshl_n_s32 (a, b);
and it compiled without any errors with -O0 after patch.

Would it be OK to use _Static_assert (__builtin_constant_p (b)) to
guard against non-immediate values ?

With the following change:
__extension__ extern __inline int32x2_t
__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
vshl_n_s32 (int32x2_t __a, const int __b)
{
  _Static_assert (__builtin_constant_p (__b));
  return __a << __b;
}

the above example fails at -O0:
../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’:
../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed
 4904 |   _Static_assert (__builtin_constant_p (__b));
      |   ^~~~~~~~~~~~~~

Thanks,
Prathamesh
>
> R.
>
> > So, is the attached patch OK ?
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> R.

Reply via email to