Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Monday, October 14, 2024 7:34 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
>> Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using VEC_PERM_EXPR
>> 
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch series adds support for a target to do a direct conversion for zero
>> > extends using permutes.
>> >
>> > To do this it uses a target hook use_permute_for_promotion which must be
>> > implemented by targets.  This hook is used to indicate:
>> >
>> >  1. whether the target can do this for the given modes,
>> >  2. whether it is profitable for the target to do so, and
>> >  3. whether the target can convert between the various vector modes with a
>> >     VIEW_CONVERT.
>> >
>> > Using permutations has a big benefit for multi-step zero extensions because
>> > they both reduce the number of needed instructions and increase throughput,
>> > as the dependency chain is removed.
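A rough count makes this claim concrete. Assume that each zip-based doubling step needs one zip1/zip2 pair per input vector, and that one TBL with a precomputed index vector yields one full output vector. A zero extension by a factor of 2^k then costs the following (k = 3 for unsigned char to long long, as in test4 below):

```latex
\begin{aligned}
N_{\mathrm{zip}} &= \sum_{j=1}^{k} 2^{j} = 2^{k+1} - 2, &\qquad d_{\mathrm{zip}} &= k,\\
N_{\mathrm{tbl}} &= 2^{k}, &\qquad d_{\mathrm{tbl}} &= 1,
\end{aligned}
\qquad k = 3:\; N_{\mathrm{zip}} = 14,\; N_{\mathrm{tbl}} = 8.
```

Here N is the number of permute instructions per 16 input bytes, and d is the length of the dependency chain of permutes after the load. This matches the 14 zip1/zip2 and 8 tbl instructions in the listings below. The count ignores the eight TBL index vectors (v24 to v31), which occupy registers but are not set up inside the loop body shown.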
>> >
>> > Concretely on AArch64 this changes:
>> >
>> > void test4(unsigned char *x, long long *y, int n) {
>> >     for(int i = 0; i < n; i++) {
>> >         y[i] = x[i];
>> >     }
>> > }
>> >
>> > from generating:
>> >
>> > .L4:
>> >         ldr     q30, [x4], 16
>> >         add     x3, x3, 128
>> >         zip1    v1.16b, v30.16b, v31.16b
>> >         zip2    v30.16b, v30.16b, v31.16b
>> >         zip1    v2.8h, v1.8h, v31.8h
>> >         zip1    v0.8h, v30.8h, v31.8h
>> >         zip2    v1.8h, v1.8h, v31.8h
>> >         zip2    v30.8h, v30.8h, v31.8h
>> >         zip1    v26.4s, v2.4s, v31.4s
>> >         zip1    v29.4s, v0.4s, v31.4s
>> >         zip1    v28.4s, v1.4s, v31.4s
>> >         zip1    v27.4s, v30.4s, v31.4s
>> >         zip2    v2.4s, v2.4s, v31.4s
>> >         zip2    v0.4s, v0.4s, v31.4s
>> >         zip2    v1.4s, v1.4s, v31.4s
>> >         zip2    v30.4s, v30.4s, v31.4s
>> >         stp     q26, q2, [x3, -128]
>> >         stp     q28, q1, [x3, -96]
>> >         stp     q29, q0, [x3, -64]
>> >         stp     q27, q30, [x3, -32]
>> >         cmp     x4, x5
>> >         bne     .L4
>> >
>> > and instead we get:
>> >
>> > .L4:
>> >         add     x3, x3, 128
>> >         ldr     q23, [x4], 16
>> >         tbl     v5.16b, {v23.16b}, v31.16b
>> >         tbl     v4.16b, {v23.16b}, v30.16b
>> >         tbl     v3.16b, {v23.16b}, v29.16b
>> >         tbl     v2.16b, {v23.16b}, v28.16b
>> >         tbl     v1.16b, {v23.16b}, v27.16b
>> >         tbl     v0.16b, {v23.16b}, v26.16b
>> >         tbl     v22.16b, {v23.16b}, v25.16b
>> >         tbl     v23.16b, {v23.16b}, v24.16b
>> >         stp     q5, q4, [x3, -128]
>> >         stp     q3, q2, [x3, -96]
>> >         stp     q1, q0, [x3, -64]
>> >         stp     q22, q23, [x3, -32]
>> >         cmp     x4, x5
>> >         bne     .L4
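The tbl listing can be modelled in plain C to show why a single lookup per output vector is enough. This is a sketch under stated assumptions, not the vectorizer's code. tbl16 is a hypothetical software model of a one-register TBL, in which an out-of-range index yields a zero byte. zext_u8_to_u64_tbl is a hypothetical helper that builds index vectors like the ones the loop above presumably keeps in v24 to v31. The model assumes a little-endian lane layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a single-register TBL: each result byte is
   src[idx[i]] when idx[i] < 16, and 0 otherwise.  */
static void tbl16(uint8_t dst[16], const uint8_t src[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = idx[i] < 16 ? src[idx[i]] : 0;
}

/* Hypothetical helper: zero-extend 16 bytes to 16 uint64_t values with
   8 independent lookups, one per 16-byte output chunk (two 64-bit lanes).
   Assumes little-endian lane layout.  */
static void zext_u8_to_u64_tbl(uint64_t out[16], const uint8_t in[16])
{
  for (int chunk = 0; chunk < 8; chunk++)
    {
      uint8_t idx[16], bytes[16];
      /* Out-of-range index 0xff makes TBL write a zero byte.  */
      memset(idx, 0xff, sizeof idx);
      /* Low byte of lane 0 and lane 1 select the next two input bytes.  */
      idx[0] = (uint8_t) (2 * chunk);
      idx[8] = (uint8_t) (2 * chunk + 1);
      tbl16(bytes, in, idx);
      /* Reinterpret the 16 result bytes as two 64-bit lanes.  */
      memcpy(&out[2 * chunk], bytes, sizeof bytes);
    }
}
```

Each of the eight lookups depends only on the loaded input and a constant index vector, so none of them waits on another permute. This is the removed dependency chain the cover letter refers to.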
>> >
>> > Tests are added in the AArch64 patch introducing the hook.  The testsuite also
>> > already contains about 800 runtime tests that are affected by this change.
>> >
>> > Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>> > x86_64-pc-linux-gnu -m32, -m64 with no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >    * target.def (use_permute_for_promotion): New.
>> >    * doc/tm.texi.in: Document it.
>> >    * doc/tm.texi: Regenerate.
>> >    * targhooks.cc (default_use_permute_for_promotion): New.
>> >    * targhooks.h (default_use_permute_for_promotion): New.
>> >    * tree-vect-stmts.cc (vectorizable_conversion): Support direct conversion
>> >    with permute.
>> >    (vect_create_vectorized_promotion_stmts): Likewise.
>> >    (supportable_widening_operation): Likewise.
>> >    (vect_gen_perm_mask_any): Allow vector permutes where input registers
>> >    are half the width of the result per the GCC 14 relaxation of
>> >    VEC_PERM_EXPR.
>> >
>> > ---
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index 4deb3d2c283a2964972b94f434370a6f57ea816a..e8192590ac14005bf7cb5f731c16ee7eacb78143 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -6480,6 +6480,15 @@ type @code{internal_fn}) should be considered expensive when the mask is
>> >  all zeros.  GCC can then try to branch around the instruction instead.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} bool TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION (const_tree @var{in_type}, const_tree @var{out_type})
>> > +This hook returns true if the operation promoting @var{in_type} to
>> > +@var{out_type} should be done as a vector permute.  If @var{out_type} is
>> > +a signed type, the operation will be done as the related unsigned type and
>> > +converted to @var{out_type}.  If the target supports the needed permute,
>> > +is able to convert unsigned(@var{out_type}) to @var{out_type}, and doing so
>> > +is beneficial, the hook should return true; otherwise it should return false.
>> > +@end deftypefn
>> 
>> Just a review of the documentation, but: is a two-step process really
>> necessary for signed out_types?  I thought it could be done directly,
>> since it's in_type rather than out_type that determines the type of
>> extension.
>
> Thanks!
>
> I think this is an indication the text is ambiguous.  The intention was to say
> that if out_type is signed, we still keep the type as signed, but insert an
> intermediate cast to (unsigned type(out_type)).
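Here is a scalar sketch of that signed case. It assumes, as in test4, that in_type is unsigned char and out_type is long long. The permute produces the zero-extended value in the unsigned counterpart of out_type. The intermediate cast to the signed type then preserves the value, because that value can never exceed 255. promote_via_unsigned is a hypothetical name used only for illustration.

```c
/* Hypothetical helper modelling the two steps for a signed out_type.  */
static long long promote_via_unsigned(unsigned char x)
{
  /* Step 1: zero-extend into unsigned(out_type).  This is the part the
     permute computes, since the extension kind comes from in_type.  */
  unsigned long long widened = x;
  /* Step 2: cast to the signed out_type.  The value is at most 255,
     so it is representable and unchanged.  */
  return (long long) widened;
}
```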

Yeah, the documentation explained that well.

I was simply confused, sorry.  I was still thinking in terms of the
type requirements for conversions (where going directly from unsigned
to signed would be ok).  But of course, that isn't true for VEC_PERM_EXPR.

So ignore my earlier comment.

Thanks,
Richard
