On Thu, 22 Jul 2021 at 20:29, Richard Earnshaw
<richard.earns...@foss.arm.com> wrote:
>
>
>
> On 22/07/2021 14:47, Prathamesh Kulkarni via Gcc-patches wrote:
> > On Thu, 22 Jul 2021 at 17:28, Richard Earnshaw
> > <richard.earns...@foss.arm.com> wrote:
> >>
> >>
> >>
> >> On 22/07/2021 12:32, Prathamesh Kulkarni wrote:
> >>> On Thu, 22 Jul 2021 at 16:03, Richard Earnshaw
> >>> <richard.earns...@foss.arm.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 22/07/2021 08:45, Prathamesh Kulkarni via Gcc-patches wrote:
> >>>>> Hi,
> >>>>> The attached patch removes calls to builtins from vshl_n intrinsics,
> >>>>> and replacing them
> >>>>> with left shift operator. The patch passes bootstrap+test on
> >>>>> arm-linux-gnueabihf.
> >>>>>
> >>>>> Altho, I noticed, that the patch causes 3 extra registers to spill
> >>>>> using << instead
> >>>>> of the builtin for vshl_n.c. Could that be perhaps due to inlining of
> >>>>> intrinsics ?
> >>>>> Before patch, the shift operation was performed by call to
> >>>>> __builtin_neon_vshl<type> (__a, __b)
> >>>>> and now it's inlined to __a << __b, which might result in increased
> >>>>> register pressure ?
> >>>>>
> >>>>> Thanks,
> >>>>> Prathamesh
> >>>>>
> >>>>
> >>>>
> >>>> You're missing a ChangeLog for the patch.
> >>> Sorry, updated in this patch.
> >>>>
> >>>> However, I'm not sure about this.  The register shift form of VSHL
> >>>> performs a right shift if the value is negative, which is UB if you
> >>>> write `<<` instead.
> >>>>
> >>>> Have I missed something here?
> >>> Hi Richard,
> >>> According to this article:
> >>> https://developer.arm.com/documentation/den0018/a/NEON-Intrinsics-Reference/Shift/VSHL-N
> >>> For vshl_n, the shift amount is always in the non-negative range for all 
> >>> types.
> >>>
> >>> I tried using vshl_n_s32 (a, -1), and the compiler emitted following 
> >>> diagnostic:
> >>> foo.c: In function ‘main’:
> >>> foo.c:17:1: error: constant -1 out of range 0 - 31
> >>>      17 | }
> >>>         | ^
> >>>
> >>
> >> It does do that now, but that's because the intrinsic expansion does
> >> some bounds checking; when you remove the call into the back-end
> >> intrinsic that will no-longer happen.
> >>
> >> I think with this change various things are likely:
> >>
> >> - We'll no-longer reject non-immediate values, so users will be able to
> >> write
> >>
> >>           int b = 5;
> >>          vshl_n_s32 (a, b);
> >>
> >>     which will expand to a vdup followed by the register form.
> >>
> >> - we'll rely on the front-end diagnosing out-of range shifts
> >>
> >> - code of the form
> >>
> >>          int b = -1;
> >>          vshl_n_s32 (a, b);
> >>
> >>     will probably now go through without any errors, especially at low
> >> optimization levels.  It may end up doing what the user wanted, but it's
> >> definitely a change in behaviour - and perhaps worse, the compiler might
> >> diagnose the above as UB and silently throw some stuff away.
> >>
> >> It might be that we need to insert some form of static assertion that
> >> the second argument is a __builtin_constant_p().
> > Ah right, thanks for the suggestions!
> > I tried the above example:
> > int b = -1;
> > vshl_n_s32 (a, b);
> > and it compiled without any errors with -O0 after patch.
> >
> > Would it be OK to use _Static_assert (__builtin_constant_p (b)) to
> > guard against non-immediate values ?
> >
> > With the following change:
> > __extension__ extern __inline int32x2_t
> > __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> > vshl_n_s32 (int32x2_t __a, const int __b)
> > {
> >    _Static_assert (__builtin_constant_p (__b));
> >    return __a << __b;
> > }
> >
> > the above example fails at -O0:
> > ../armhf-build/gcc/include/arm_neon.h: In function ‘vshl_n_s32’:
> > ../armhf-build/gcc/include/arm_neon.h:4904:3: error: static assertion failed
> >   4904 |   _Static_assert (__builtin_constant_p (__b));
> >        |   ^~~~~~~~~~~~~~
>
> I've been playing with that but unfortunately it doesn't seem to work in
> the way we want it to.  For a complete test:
>
>
>
> typedef __simd64_int32_t int32x2_t;
>
> __extension__ extern __inline int32x2_t
> __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
> vshl_n_s32 (int32x2_t __a, const int __b)
> {
>    _Static_assert (__builtin_constant_p (__b), "Second argument must be
> a litteral constant");
>    return __a << __b;
> }
>
> int32x2_t f (int32x2_t x, const int b)
> {
>    return vshl_n_s32 (x, 1);
> }
>
> At -O0 I get:
>
> test.c: In function ‘vshl_n_s32’:
> test.c:7:3: error: static assertion failed: "Second argument must be a
> litteral constant"
>      7 |   _Static_assert (__builtin_constant_p (__b), "Second argument
> must be a litteral constant");
>        |   ^~~~~~~~~~~~~~
>
> While at -O1 and above I get:
>
>
> test.c: In function ‘vshl_n_s32’:
> test.c:7:19: error: expression in static assertion is not constant
>      7 |   _Static_assert (__builtin_constant_p (__b), "Second argument
> must be a litteral constant");
>        |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Which indicates that it doesn't consider __builtin_constant_p() to be a
> constant expression :(
>
> So either I'm writing the static assertion incorrectly, or something
> weird is going on.  The most likely issue is that the static assertion
> is being processed too early, before the function is inlined.
Ah indeed. I wonder if we should add an attribute to parameter that it
should be constant,
and emit an error if the caller passes non-constant value ?
sth like:
void foo(int x __attribute__((runtime_constant)));
and the front-end can then diagnose if the argument is
__builtin_constant_p while type-checking call to foo.

Thanks,
Prathamesh
>
> R.
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> R.
> >>
> >>> So, is the attached patch OK ?
> >>
> >>>
> >>> Thanks,
> >>> Prathamesh
> >>>>
> >>>> R.

Reply via email to