Hi Richard,
On 7 January 2018 at 18:08, James Greenhalgh <james.greenha...@arm.com> wrote:
> On Mon, Dec 18, 2017 at 07:40:00PM +0000, Jeff Law wrote:
>> On 11/17/2017 07:56 AM, Richard Sandiford wrote:
>> > This patch adds support for using a single fully-predicated loop instead
>> > of a vector loop and a scalar tail. An SVE WHILELO instruction generates
>> > the predicate for each iteration of the loop, given the current scalar
>> > iv value and the loop bound. This operation is wrapped up in a new
>> > internal function called WHILE_ULT. E.g.:
>> >
>> >    WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
>> >    WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
>> >
>> > The third WHILE_ULT argument is needed to make the operation
>> > unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
>> > seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
>> > different numbers of elements.
>> >
>> > Note that the patch uses "mask" and "fully-masked" instead of
>> > "predicate" and "fully-predicated", to follow existing GCC terminology.
>> >
>> > This patch just handles the simple cases, punting for things like
>> > reductions and live-out values. Later patches remove most of these
>> > restrictions.
>> >
>> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> > and powerpc64le-linux-gnu. OK to install?
>> >
>> > Richard
>> >
>> >
>> > 2017-11-17  Richard Sandiford  <richard.sandif...@linaro.org>
>> >             Alan Hayward  <alan.hayw...@arm.com>
>> >             David Sherwood  <david.sherw...@arm.com>
>> >
>> > gcc/
>> >     * optabs.def (while_ult_optab): New optab.
>> >     * doc/md.texi (while_ult@var{m}@var{n}): Document.
>> >     * internal-fn.def (WHILE_ULT): New internal function.
>> >     * internal-fn.h (direct_internal_fn_supported_p): New override
>> >     that takes two types as argument.
>> >     * internal-fn.c (while_direct): New macro.
>> >     (expand_while_optab_fn): New function.
>> >     (convert_optab_supported_p): Likewise.
>> >     (direct_while_optab_supported_p): New macro.
>> >     * wide-int.h (wi::udiv_ceil): New function.
>> >     * tree-vectorizer.h (rgroup_masks): New structure.
>> >     (vec_loop_masks): New typedef.
>> >     (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
>> >     and fully_masked_p.
>> >     (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
>> >     (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
>> >     (vect_max_vf): New function.
>> >     (slpeel_make_loop_iterate_ntimes): Delete.
>> >     (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
>> >     (vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
>> >     (vect_record_loop_mask, vect_get_loop_mask): Likewise.
>> >     * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
>> >     internal-fn.h, stor-layout.h and optabs-query.h.
>> >     (vect_set_loop_mask): New function.
>> >     (add_preheader_seq): Likewise.
>> >     (add_header_seq): Likewise.
>> >     (vect_maybe_permute_loop_masks): Likewise.
>> >     (vect_set_loop_masks_directly): Likewise.
>> >     (vect_set_loop_condition_masked): Likewise.
>> >     (vect_set_loop_condition_unmasked): New function, split out from
>> >     slpeel_make_loop_iterate_ntimes.
>> >     (slpeel_make_loop_iterate_ntimes): Rename to...
>> >     (vect_set_loop_condition): ...this. Use vect_set_loop_condition_masked
>> >     for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
>> >     (vect_do_peeling): Update call accordingly.
>> >     (vect_gen_vector_loop_niters): Use VF as the step for fully-masked
>> >     loops.
>> >     * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>> >     mask_compare_type, can_fully_mask_p and fully_masked_p.
>> >     (release_vec_loop_masks): New function.
>> >     (_loop_vec_info): Use it to free the loop masks.
>> >     (can_produce_all_loop_masks_p): New function.
>> >     (vect_get_max_nscalars_per_iter): Likewise.
>> >     (vect_verify_full_masking): Likewise.
>> >     (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
>> >     retries, and free the mask rgroups before retrying. Check loop-wide
>> >     reasons for disallowing fully-masked loops. Make the final decision
>> >     about whether to use a fully-masked loop or not.
>> >     (vect_estimate_min_profitable_iters): Do not assume that peeling
>> >     for the number of iterations will be needed for fully-masked loops.
>> >     (vectorizable_reduction): Disable fully-masked loops.
>> >     (vectorizable_live_operation): Likewise.
>> >     (vect_halve_mask_nunits): New function.
>> >     (vect_double_mask_nunits): Likewise.
>> >     (vect_record_loop_mask): Likewise.
>> >     (vect_get_loop_mask): Likewise.
>> >     (vect_transform_loop): Handle the case in which the final loop
>> >     iteration might handle a partial vector. Call vect_set_loop_condition
>> >     instead of slpeel_make_loop_iterate_ntimes.
>> >     * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
>> >     (check_load_store_masking): New function.
>> >     (prepare_load_store_mask): Likewise.
>> >     (vectorizable_store): Handle fully-masked loops.
>> >     (vectorizable_load): Likewise.
>> >     (supportable_widening_operation): Use vect_halve_mask_nunits for
>> >     booleans.
>> >     (supportable_narrowing_operation): Likewise vect_double_mask_nunits.
>> >     (vect_gen_while): New function.
>> >     * config/aarch64/aarch64.md (umax<mode>3): New expander.
>> >     (aarch64_uqdec<mode>): New insn.
>> >     * config/aarch64/aarch64-sve.md (<perm_optab>_<mode>)
>> >     (*aarch64_sve_<perm_insn><perm_hilo><mode>): New predicate patterns.
>> >
>> > gcc/testsuite/
>> >     * gcc.dg/tree-ssa/cunroll-10.c: Disable vectorization.
>> >     * gcc.dg/tree-ssa/peel1.c: Likewise.
>> >     * gcc.dg/vect/vect-load-lanes-peeling-1.c: Remove XFAIL for
>> >     variable-length vectors.
>> >     * gcc.target/aarch64/sve_vcond_6.c: XFAIL test for AND.
>> >     * gcc.target/aarch64/sve_vec_bool_cmp_1.c: Expect BIC instead of NOT.
>> >     * gcc.target/aarch64/sve_slp_1.c: Check for a fully-masked loop.
>> >     * gcc.target/aarch64/sve_slp_2.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_3.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_4.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_6.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_8.c: New test.
>> >     * gcc.target/aarch64/sve_slp_8_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_9.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_9_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_10.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_10_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_11.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_11_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_12.c: Likewise.
>> >     * gcc.target/aarch64/sve_slp_12_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_ld1r_2.c: Likewise.
>> >     * gcc.target/aarch64/sve_ld1r_2_run.c: Likewise.
>> >     * gcc.target/aarch64/sve_while_1.c: Likewise.
>> >     * gcc.target/aarch64/sve_while_2.c: Likewise.
>> >     * gcc.target/aarch64/sve_while_3.c: Likewise.
>> >     * gcc.target/aarch64/sve_while_4.c: Likewise.
>> Like other SVE related patches, I haven't looked at the aarch64 specific
>> bits, just the generic bits.
>>
>> Sadly, I'm totally lost on this one.... I understand at a 30000ft
>> level what you're trying to do and many of the low level primitives made
>> sense. But I wasn't able to go from those primitives to the higher
>> level implementation details, even though the higher level
>> implementation details didn't seem all that large.
>>
>> I trust your judgment on this stuff.
>>
>> OK for the trunk.
>
> The AArch64 bits are OK.
>

As I reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83846,
I've noticed that one of the new tests (aarch64/sve/while_4.c) fails
when using -mabi=ilp32.

Christophe

> Thanks,
> James
>
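
For reference, the WHILE_ULT behaviour described in the quoted patch description can be modelled in plain scalar C. This is only a sketch, not code from the patch: the names while_ult_model and masked_add, the int-per-lane mask representation and the fixed lane count of 4 are all assumptions made for illustration. The model sets lane i when start + i is still below end without wrapping, which reproduces the two examples in the description.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical scalar model of WHILE_ULT (start, end, mask):
   lane i is active iff start + i < end, treating a wrapped
   start + i as "not less than" end.  Not GCC code.  */
static void
while_ult_model (unsigned start, unsigned end, int *mask, unsigned nlanes)
{
  assert (mask != NULL);
  for (unsigned i = 0; i < nlanes; i++)
    /* Compare i against the remaining count so that start + i
       is never computed and cannot overflow.  */
    mask[i] = (start < end) && (i < end - start);
}

/* Hypothetical fully-masked loop: one loop body covers both the
   full vectors and the final partial vector, with no scalar tail.
   Assumes n is small enough that i += NLANES does not wrap.  */
static void
masked_add (int *dst, const int *a, const int *b, unsigned n)
{
  enum { NLANES = 4 };	/* Assumed lane count; SVE's is variable.  */
  int mask[NLANES];

  for (unsigned i = 0; i < n; i += NLANES)
    {
      while_ult_model (i, n, mask, NLANES);
      for (unsigned lane = 0; lane < NLANES; lane++)
	if (mask[lane])	/* Inactive lanes touch no memory.  */
	  dst[i + lane] = a[i + lane] + b[i + lane];
    }
}
```

With n = 6 and 4 lanes, the second iteration of masked_add runs with only the first two lanes active, which is the partial-vector case the patch description refers to.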