Tamar Christina <tamar.christ...@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandif...@arm.com>
>> Sent: Monday, October 14, 2024 7:34 PM
>> To: Tamar Christina <tamar.christ...@arm.com>
>> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; rguent...@suse.de
>> Subject: Re: [PATCH 1/4]middle-end: support multi-step zero-extends using
>> VEC_PERM_EXPR
>>
>> Tamar Christina <tamar.christ...@arm.com> writes:
>> > Hi All,
>> >
>> > This patch series adds support for a target to do a direct conversion for
>> > zero extends using permutes.
>> >
>> > To do this it uses a target hook use_permute_for_promotion which must be
>> > implemented by targets.  This hook is used to indicate:
>> >
>> >  1. can a target do this for the given modes.
>> >  2. is it profitable for the target to do it.
>> >  3. can the target convert between various vector modes with a
>> >     VIEW_CONVERT.
>> >
>> > Using permutations has a big benefit for multi-step zero extensions because
>> > they both reduce the number of needed instructions and increase throughput,
>> > as the dependency chain is removed.
>> >
>> > Concretely on AArch64 this changes:
>> >
>> > void test4(unsigned char *x, long long *y, int n) {
>> >     for(int i = 0; i < n; i++) {
>> >         y[i] = x[i];
>> >     }
>> > }
>> >
>> > from generating:
>> >
>> > .L4:
>> >         ldr     q30, [x4], 16
>> >         add     x3, x3, 128
>> >         zip1    v1.16b, v30.16b, v31.16b
>> >         zip2    v30.16b, v30.16b, v31.16b
>> >         zip1    v2.8h, v1.8h, v31.8h
>> >         zip1    v0.8h, v30.8h, v31.8h
>> >         zip2    v1.8h, v1.8h, v31.8h
>> >         zip2    v30.8h, v30.8h, v31.8h
>> >         zip1    v26.4s, v2.4s, v31.4s
>> >         zip1    v29.4s, v0.4s, v31.4s
>> >         zip1    v28.4s, v1.4s, v31.4s
>> >         zip1    v27.4s, v30.4s, v31.4s
>> >         zip2    v2.4s, v2.4s, v31.4s
>> >         zip2    v0.4s, v0.4s, v31.4s
>> >         zip2    v1.4s, v1.4s, v31.4s
>> >         zip2    v30.4s, v30.4s, v31.4s
>> >         stp     q26, q2, [x3, -128]
>> >         stp     q28, q1, [x3, -96]
>> >         stp     q29, q0, [x3, -64]
>> >         stp     q27, q30, [x3, -32]
>> >         cmp     x4, x5
>> >         bne     .L4
>> >
>> > and instead we get:
>> >
>> > .L4:
>> >         add     x3, x3, 128
>> >         ldr     q23, [x4], 16
>> >         tbl     v5.16b, {v23.16b}, v31.16b
>> >         tbl     v4.16b, {v23.16b}, v30.16b
>> >         tbl     v3.16b, {v23.16b}, v29.16b
>> >         tbl     v2.16b, {v23.16b}, v28.16b
>> >         tbl     v1.16b, {v23.16b}, v27.16b
>> >         tbl     v0.16b, {v23.16b}, v26.16b
>> >         tbl     v22.16b, {v23.16b}, v25.16b
>> >         tbl     v23.16b, {v23.16b}, v24.16b
>> >         stp     q5, q4, [x3, -128]
>> >         stp     q3, q2, [x3, -96]
>> >         stp     q1, q0, [x3, -64]
>> >         stp     q22, q23, [x3, -32]
>> >         cmp     x4, x5
>> >         bne     .L4
>> >
>> > Tests are added in the AArch64 patch introducing the hook.  The testsuite
>> > also already had about 800 runtime tests that get affected by this.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >     * target.def (use_permute_for_promotion): New.
>> >     * doc/tm.texi.in: Document it.
>> >     * doc/tm.texi: Regenerate.
>> >     * targhooks.cc (default_use_permute_for_promotion): New.
>> >     * targhooks.h (default_use_permute_for_promotion): New.
>> >     * tree-vect-stmts.cc (vectorizable_conversion): Support direct
>> >     conversion with permute.
>> >     (vect_create_vectorized_promotion_stmts): Likewise.
>> >     (supportable_widening_operation): Likewise.
>> >     (vect_gen_perm_mask_any): Allow vector permutes where input registers
>> >     are half the width of the result per the GCC 14 relaxation of
>> >     VEC_PERM_EXPR.
>> >
>> > ---
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index 4deb3d2c283a2964972b94f434370a6f57ea816a..e8192590ac14005bf7cb5f731c16ee7eacb78143 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -6480,6 +6480,15 @@ type @code{internal_fn}) should be considered expensive when the mask is
>> >  all zeros.  GCC can then try to branch around the instruction instead.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} bool TARGET_VECTORIZE_USE_PERMUTE_FOR_PROMOTION (const_tree @var{in_type}, const_tree @var{out_type})
>> > +This hook returns true if the operation promoting @var{in_type} to
>> > +@var{out_type} should be done as a vector permute.  If @var{out_type} is
>> > +a signed type the operation will be done as the related unsigned type and
>> > +converted to @var{out_type}.  If the target supports the needed permute,
>> > +is able to convert unsigned(@var{out_type}) to @var{out_type} and it is
>> > +beneficial to the hook should return true, else false should be returned.
>> > +@end deftypefn
>>
>> Just a review of the documentation, but: is a two-step process really
>> necessary for signed out_types?  I thought it could be done directly,
>> since it's in_type rather than out_type that determines the type of
>> extension.
>
> Thanks!
>
> I think this is an indication the text is ambiguous.  The intention was to say
> that if out_type is signed, we still keep the type as signed, but insert an
> intermediate cast to (unsigned type(out_type)).

Yeah, the documentation explained that well.  I was simply confused, sorry.
I was still thinking in terms of the type requirements for conversions
(where going directly from unsigned to signed would be ok).  But of course,
that isn't true for VEC_PERM_EXPR.

So ignore my earlier comment.

Thanks,
Richard
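
For readers following the thread, the point about signed out_types can be illustrated with a small scalar emulation.  This is only a sketch under stated assumptions, not the code from the patch: byte_permute and zero_extend_u8_to_s64 are hypothetical helper names, the permute mimics the AArch64 TBL behaviour of returning zero for out-of-range indices, and the host is assumed to be little-endian.  The permute produces unsigned 64-bit lanes, and the signed result is obtained only by reinterpreting those bits, which mirrors the intermediate cast described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

/* Emulate a TBL-style byte permute (hypothetical helper): out[i] is
   in[idx[i]] when idx[i] is in range, and 0 otherwise.  */
static void
byte_permute (const uint8_t in[VEC_BYTES], const uint8_t idx[VEC_BYTES],
              uint8_t out[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    out[i] = idx[i] < VEC_BYTES ? in[idx[i]] : 0;
}

/* Zero-extend 16 unsigned bytes to 16 signed 64-bit values (hypothetical
   helper).  Each of the eight permutes reads only the original input, so
   there is no dependency chain between them.  Assumes little-endian.  */
static void
zero_extend_u8_to_s64 (const uint8_t in[VEC_BYTES], int64_t out[VEC_BYTES])
{
  for (int v = 0; v < 8; v++)
    {
      uint8_t idx[VEC_BYTES], bytes[VEC_BYTES];
      uint64_t lanes[2];

      memset (idx, 0xff, sizeof idx);   /* 0xff is out of range: selects 0.  */
      idx[0] = (uint8_t) (2 * v);       /* Low byte of lane 0.  */
      idx[8] = (uint8_t) (2 * v + 1);   /* Low byte of lane 1.  */

      byte_permute (in, idx, bytes);

      /* The permute result is viewed as two unsigned 64-bit lanes...  */
      memcpy (lanes, bytes, sizeof lanes);
      /* ...and only then reinterpreted as the signed output type.  */
      memcpy (&out[2 * v], lanes, sizeof lanes);
    }
}
```

Calling zero_extend_u8_to_s64 on input bytes of 128 or above is a quick way to confirm that the result is zero-extended rather than sign-extended: the extension kind follows the unsigned in_type, and the signedness of out_type only affects how the final bits are viewed.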