On Wed, Nov 16, 2022 at 4:25 AM Richard Biener via Gcc-patches
<[email protected]> wrote:
>
> On Tue, 15 Nov 2022, Richard Sandiford wrote:
>
> > "Andre Vieira (lists)" <[email protected]> writes:
> > > On 07/11/2022 11:05, Richard Biener wrote:
> > >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> > >>
> > >>> Sorry for the delay, just been reminded I still had this patch
> > >>> outstanding from last stage 1. Hopefully, since it has been mostly
> > >>> reviewed, it could go in for this stage 1?
> > >>>
> > >>> I addressed the comments and gave the slp part of vectorizable_call
> > >>> some TLC to make it work.
> > >>>
> > >>> I also changed vect_get_slp_defs as I noticed that the call from
> > >>> vectorizable_call was creating an auto_vec with 'nargs' that might be
> > >>> less than the number of children in the slp_node
> > >> how so?  Please fix that in the caller.  It looks like it probably
> > >> should use vect_nargs instead?
> > > Well, that was my first intuition, but when I looked at it further, the
> > > variant it's calling:
> > >   void vect_get_slp_defs (vec_info *, slp_tree slp_node,
> > >                           vec<vec<tree> > *vec_oprnds, unsigned n)
> > >
> > > is actually creating a vector of vectors of slp defs. So for each child
> > > of slp_node it calls:
> > >   void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> > >
> > > which returns a vector of vectorized defs. So vect_nargs would be the
> > > right size for the inner vec<tree> of vec_defs, but the outer should
> > > have the same number of elements as the original slp_node has children.
> > >
> > > However, at the call site (vectorizable_call), the operand we pass to
> > > vect_get_slp_defs, 'vec_defs', is initialized before the code path is
> > > specialized for slp_node.
> > > I'll go see if I can change the call site to not have to do that;
> > > given the continue at the end of the if (slp_node) BB, I don't think
> > > it needs to use vec_defs after it, but it may require some massaging
> > > to be able to define it separately for each code path.
> > >
> > >>
> > >>> , so that quick_push might not be safe as is, so I added the
> > >>> reserve (n) to ensure it's safe to push. I didn't actually come
> > >>> across any failure because of it though. Happy to split this into a
> > >>> separate patch if needed.
> > >>>
> > >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> > >>> x86_64-pc-linux-gnu.
> > >>>
> > >>> OK for trunk?
> > >> I'll leave final approval to Richard but
> > >>
> > >> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> > >> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> > >>       layout.  */
> > >>    unsigned int vectorizable : 16;
> > >>
> > >> you don't actually change the width of the bitfield.  I would find
> > >> it more natural to have
> > >>
> > >>    signed int type0 : 7;
> > >>    signed int type0_vtrans : 1;
> > >>    signed int type1 : 7;
> > >>    signed int type1_vtrans : 1;
> > >>
> > >> with typeN_vtrans specifying how the types transform when vectorized.
> > >> I would imagine another variant we could need is narrow/widen
> > >> according to either result or other argument type?  That said,
> > >> just your flag would then be
> > >>
> > >>    signed int type0 : 7;
> > >>    signed int pad : 1;
> > >>    signed int type1 : 7;
> > >>    signed int type1_vect_as_scalar : 1;
> > >>
> > >> ?
> > > That's a cool idea! I'll leave it as a single bit for now like that;
> > > if we want to re-use it for multiple transformations, we will
> > > obviously need to rename & give it more bits.
> >
> > I think we should steal bits from vectorizable rather than shrink
> > type0 and type1 though.  Then add a 14-bit padding field to show
> > how many bits are left.
> >
> > > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
> > >        rhs_type = unsigned_type_node;
> > >      }
> > >
> > > +  /* The argument that is not of the same type as the others.  */
> > >    int mask_opno = -1;
> > > +  int scalar_opno = -1;
> > >    if (internal_fn_p (cfn))
> > > -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> > > +    {
> > > +      internal_fn ifn = as_internal_fn (cfn);
> > > +      if (direct_internal_fn_p (ifn)
> > > +          && direct_internal_fn (ifn).type1_is_scalar_p)
> > > +        scalar_opno = direct_internal_fn (ifn).type1;
> > > +      else
> > > +        /* For masked operations this represents the argument that
> > > +           carries the mask.  */
> > > +        mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> >
> > This doesn't seem logically like an else.  We should do both.
> >
> > LGTM otherwise for the bits outside match.pd.  If Richard's happy with
> > the match.pd bits then I think the patch is OK with those changes and
> > without the vect_get_slp_defs thing (as you mentioned downthread).
>
> Yes, the match.pd part looked OK.
I was in the process of cleaning up patchworks for aarch64 patches and
came across this one, which looks like it was approved but never went in.
I doubt it applies now, and we are in stage 3 already. Maybe this patch
can be revived/revisited for stage 1. I have not checked whether the
testcases now emit the expected code.

Thanks,
Andrew

> >
> > Thanks,
> > Richard
> >
> >
> > >>
> > >>> gcc/ChangeLog:
> > >>>
> > >>> 	* config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> > >>> 	pattern.
> > >>> 	* config/aarch64/iterators.md (FRINTNZ): New iterator.
> > >>> 	(frintnz_mode): New int attribute.
> > >>> 	(VSFDF): Make iterator conditional.
> > >>> 	* internal-fn.def (FTRUNC_INT): New IFN.
> > >>> 	* internal-fn.cc (ftrunc_int_direct): New define.
> > >>> 	(expand_ftrunc_int_optab_fn): New custom expander.
> > >>> 	(direct_ftrunc_int_optab_supported_p): New supported_p.
> > >>> 	* internal-fn.h (direct_internal_fn_info): Add new member
> > >>> 	type1_is_scalar_p.
> > >>> 	* match.pd: Add to the existing TRUNC pattern match.
> > >>> 	* optabs.def (ftrunc_int): New entry.
> > >>> 	* stor-layout.h (element_precision): Moved from here...
> > >>> 	* tree.h (element_precision): ... to here.
> > >>> 	(element_type): New declaration.
> > >>> 	* tree.cc (element_type): New function.
> > >>> 	(element_precision): Changed to use element_type.
> > >>> 	* tree-vect-stmts.cc (vectorizable_internal_function): Add
> > >>> 	support for IFNs with different input types.
> > >>> 	(vect_get_scalar_oprnds): New function.
> > >>> 	(vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> > >>> 	* tree-vect-slp.cc (check_scalar_arg_ok): New function.
> > >>> 	(vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> > >>> 	(vect_get_slp_defs): Ensure vec_oprnds has enough slots to
> > >>> 	push.
> > >>> 	* doc/md.texi: New entry for ftrunc pattern name.
> > >>> 	* doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> > >>>
> > >>> gcc/testsuite/ChangeLog:
> > >>>
> > >>> 	* gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> > >>> 	instructions available.
> > >>> 	* lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> > >>> 	aarch64_frintz options.
> > >>> 	* gcc.target/aarch64/frintnz.c: New test.
> > >>> 	* gcc.target/aarch64/frintnz_vec.c: New test.
> > >>> 	* gcc.target/aarch64/frintnz_slp.c: New test.
> > >>>
> >

> --
> Richard Biener <[email protected]>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
