On Fri, Jul 27, 2012 at 5:24 PM, Ulrich Weigand <uweig...@de.ibm.com> wrote: > Richard Guenther wrote: >> On Mon, Jun 11, 2012 at 5:25 PM, Richard Earnshaw <rearn...@arm.com> wrote: >> > On 11/06/12 15:53, Richard Guenther wrote: >> >> The type argument or the size argument looks redundant. >> > >> > Technically, yes, we could get rid of "tree_low_cst (TYPE_SIZE (type)" >> > and calculate it inside the alignment function if it was needed. >> > However, it seemed likely that most targets would need that number one >> > way or another, such that passing it would be helpful. >> >> Well, you don't need it in stor-layout and targets might think the value >> may be completely unrelated to the type ... >> >> >> Note that we still can have such vector "properly" aligned, thus the >> >> vectorizer would need to use build_aligned_type also if it knows the >> >> type is aligned, not only when thinks it is misaligned. You basically >> >> change the alignment of the "default" vector type. >> > >> > I'm not sure I follow... >> >> I say that a large vector may be still aligned, so the vectorizer when >> creating vector memory references has to use a non-default aligned vector >> type when the vector is aligned. It won't do that at the moment. > > Richard (Earnshaw) has asked me to take over working on this patch now. > > I've now made the change requested above and removed the size argument. > The target is now simply asked to return the required alignment for the > given vector type. I've also added a check for the case where the > target provides both an alignment and a mode for a vector type, but > the mode actually requires bigger alignment than the type. This is > simply rejected (the target can fix this by reporting a different > type alignment or changing the mode alignment). > > I've not made any attempts to have the vectorizer register larger > alignments than the one returned by the target hook. It's not > clear to me when this would be useful (at least on ARM) ... > > I've also run the testsuite, and this actually uncovered to bugs in > the vectorizer where it made an implicit assumption that vector types > must always be naturally aligned: > > - In vect_update_misalignment_for_peel, the code used the vector size > instead of the required alignment in order to bound misalignment > values -- leading to a misalignment value bigger than the underlying > alignment requirement of the vector type, causing an ICE later on > > - In vect_do_peeling_for_loop_bound, the code divided the vector type > alignment by the number of elements in order to arrive at the element > size ... this returns a wrong value if the alignment is less than the > vector size, causing incorrect code to be generated > > (This routine also had some confusion between size and alignment in > comments and variable names, which I've fixed as well.) > > Finally, two test cases still failed spuriously: > > - gcc.dg/align-2.c actually checked that vector types are naturally > aligned > > - gcc.dg/vect/slp-25.c checked that we needed to perform peeling for > alignment, which we actually don't need any more if vector types > have a lesser alignment requirement in the first place > > I've added a new effective target flag to check whether the target > requires natural alignment for vector types, and disabled those two > tests if it doesn't. > > With those changes, I've completed testing with no regressions on > arm-linux-gnueabi. > > OK for mainline?
Ok. Please add to the documentation that the default vector alignment has to be a power-of-two multiple of the default vector element alignment. You probably want to double-check vector_alignment_reachable_p as well which checks whether vector alignment can be reached by peeling off scalar iterations. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > * target.def (vector_alignment): New target hook. > * doc/tm.texi.in (TARGET_VECTOR_ALIGNMENT): Document new hook. > * doc/tm.texi: Regenerate. > * targhooks.c (default_vector_alignment): New function. > * targhooks.h (default_vector_alignment): Add prototype. > * stor-layout.c (layout_type): Use targetm.vector_alignment. > * config/arm/arm.c (arm_vector_alignment): New function. > (TARGET_VECTOR_ALIGNMENT): Define. > > * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use > vector type alignment instead of size. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Use > element type size directly instead of computing it from alignment. > Fix variable naming and comment. > > testsuite/ChangeLog: > > * lib/target-supports.exp > (check_effective_target_vect_natural_alignment): New function. > * gcc.dg/align-2.c: Only run on targets with natural alignment > of vector types. > * gcc.dg/vect/slp-25.c: Adjust tests for targets without natural > alignment of vector types. > > > Index: gcc/target.def > =================================================================== > *** gcc/target.def (revision 189809) > --- gcc/target.def (working copy) > *************** DEFHOOK > *** 1659,1664 **** > --- 1659,1672 ---- > bool, (enum machine_mode mode), > hook_bool_mode_false) > > + DEFHOOK > + (vector_alignment, > + "This hook can be used to define the alignment for a vector of type\n\ > + @var{type}, in order to comply with a platform ABI. The default is to\n\ > + require natural alignment for vector types.", > + HOST_WIDE_INT, (const_tree type), > + default_vector_alignment) > + > /* True if we should try to use a scalar mode to represent an array, > overriding the usual MAX_FIXED_MODE limit. */ > DEFHOOK > Index: gcc/doc/tm.texi > =================================================================== > *** gcc/doc/tm.texi (revision 189809) > --- gcc/doc/tm.texi (working copy) > *************** make it all fit in fewer cache lines. > *** 1107,1112 **** > --- 1107,1118 ---- > If the value of this macro has a type, it should be an unsigned type. > @end defmac > > + @deftypefn {Target Hook} HOST_WIDE_INT TARGET_VECTOR_ALIGNMENT (const_tree > @var{type}) > + This hook can be used to define the alignment for a vector of type > + @var{type}, in order to comply with a platform ABI. The default is to > + require natural alignment for vector types. > + @end deftypefn > + > @defmac STACK_SLOT_ALIGNMENT (@var{type}, @var{mode}, @var{basic-align}) > If defined, a C expression to compute the alignment for stack slot. > @var{type} is the data type, @var{mode} is the widest mode available, > Index: gcc/doc/tm.texi.in > =================================================================== > *** gcc/doc/tm.texi.in (revision 189809) > --- gcc/doc/tm.texi.in (working copy) > *************** make it all fit in fewer cache lines. > *** 1091,1096 **** > --- 1091,1098 ---- > If the value of this macro has a type, it should be an unsigned type. > @end defmac > > + @hook TARGET_VECTOR_ALIGNMENT > + > @defmac STACK_SLOT_ALIGNMENT (@var{type}, @var{mode}, @var{basic-align}) > If defined, a C expression to compute the alignment for stack slot. > @var{type} is the data type, @var{mode} is the widest mode available, > Index: gcc/targhooks.c > =================================================================== > *** gcc/targhooks.c (revision 189809) > --- gcc/targhooks.c (working copy) > *************** tree default_mangle_decl_assembler_name > *** 945,950 **** > --- 945,957 ---- > return id; > } > > + /* Default to natural alignment for vector types. */ > + HOST_WIDE_INT > + default_vector_alignment (const_tree type) > + { > + return tree_low_cst (TYPE_SIZE (type), 0); > + } > + > bool > default_builtin_vector_alignment_reachable (const_tree type, bool is_packed) > { > Index: gcc/targhooks.h > =================================================================== > *** gcc/targhooks.h (revision 189809) > --- gcc/targhooks.h (working copy) > *************** extern int default_builtin_vectorization > *** 83,88 **** > --- 83,90 ---- > > extern tree default_builtin_reciprocal (unsigned int, bool, bool); > > + extern HOST_WIDE_INT default_vector_alignment (const_tree); > + > extern bool default_builtin_vector_alignment_reachable (const_tree, bool); > extern bool > default_builtin_support_vector_misalignment (enum machine_mode mode, > Index: gcc/stor-layout.c > =================================================================== > *** gcc/stor-layout.c (revision 189809) > --- gcc/stor-layout.c (working copy) > *************** layout_type (tree type) > *** 2131,2139 **** > TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype), > bitsize_int (nunits)); > > ! /* Always naturally align vectors. This prevents ABI changes > ! depending on whether or not native vector modes are supported. */ > ! TYPE_ALIGN (type) = tree_low_cst (TYPE_SIZE (type), 0); > break; > } > > --- 2131,2147 ---- > TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype), > bitsize_int (nunits)); > > ! /* For vector types, we do not default to the mode's alignment. > ! Instead, query a target hook, defaulting to natural alignment. > ! This prevents ABI changes depending on whether or not native > ! vector modes are supported. */ > ! TYPE_ALIGN (type) = targetm.vector_alignment (type); > ! > ! /* However, if the underlying mode requires a bigger alignment than > ! what the target hook provides, we cannot use the mode. For now, > ! simply reject that case. */ > ! gcc_assert (TYPE_ALIGN (type) > ! >= GET_MODE_ALIGNMENT (TYPE_MODE (type))); > break; > } > > Index: gcc/config/arm/arm.c > =================================================================== > *** gcc/config/arm/arm.c (revision 189809) > --- gcc/config/arm/arm.c (working copy) > *************** static bool arm_array_mode_supported_p ( > *** 254,259 **** > --- 254,260 ---- > unsigned HOST_WIDE_INT); > static enum machine_mode arm_preferred_simd_mode (enum machine_mode); > static bool arm_class_likely_spilled_p (reg_class_t); > + static HOST_WIDE_INT arm_vector_alignment (const_tree type); > static bool arm_vector_alignment_reachable (const_tree type, bool > is_packed); > static bool arm_builtin_support_vector_misalignment (enum machine_mode mode, > const_tree type, > *************** static const struct attribute_spec arm_a > *** 602,607 **** > --- 603,611 ---- > #undef TARGET_CLASS_LIKELY_SPILLED_P > #define TARGET_CLASS_LIKELY_SPILLED_P arm_class_likely_spilled_p > > + #undef TARGET_VECTOR_ALIGNMENT > + #define TARGET_VECTOR_ALIGNMENT arm_vector_alignment > + > #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE > #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \ > arm_vector_alignment_reachable > *************** arm_have_conditional_execution (void) > *** 25051,25056 **** > --- 25055,25072 ---- > return !TARGET_THUMB1; > } > > + /* The AAPCS sets the maximum alignment of a vector to 64 bits. */ > + static HOST_WIDE_INT > + arm_vector_alignment (const_tree type) > + { > + HOST_WIDE_INT align = tree_low_cst (TYPE_SIZE (type), 0); > + > + if (TARGET_AAPCS_BASED) > + align = MIN (align, 64); > + > + return align; > + } > + > static unsigned int > arm_autovectorize_vector_sizes (void) > { > Index: gcc/tree-vect-data-refs.c > =================================================================== > *** gcc/tree-vect-data-refs.c (revision 189809) > --- gcc/tree-vect-data-refs.c (working copy) > *************** vect_update_misalignment_for_peel (struc > *** 1059,1065 **** > int misal = DR_MISALIGNMENT (dr); > tree vectype = STMT_VINFO_VECTYPE (stmt_info); > misal += negative ? -npeel * dr_size : npeel * dr_size; > ! misal &= GET_MODE_SIZE (TYPE_MODE (vectype)) - 1; > SET_DR_MISALIGNMENT (dr, misal); > return; > } > --- 1059,1065 ---- > int misal = DR_MISALIGNMENT (dr); > tree vectype = STMT_VINFO_VECTYPE (stmt_info); > misal += negative ? -npeel * dr_size : npeel * dr_size; > ! misal &= (TYPE_ALIGN (vectype) / BITS_PER_UNIT) - 1; > SET_DR_MISALIGNMENT (dr, misal); > return; > } > Index: gcc/tree-vect-loop-manip.c > =================================================================== > *** gcc/tree-vect-loop-manip.c (revision 189809) > --- gcc/tree-vect-loop-manip.c (working copy) > *************** vect_do_peeling_for_loop_bound (loop_vec > *** 1937,1943 **** > If the misalignment of DR is known at compile time: > addr_mis = int mis = DR_MISALIGNMENT (dr); > Else, compute address misalignment in bytes: > ! addr_mis = addr & (vectype_size - 1) > > prolog_niters = min (LOOP_NITERS, ((VF - > addr_mis/elem_size)&(VF-1))/step) > > --- 1937,1943 ---- > If the misalignment of DR is known at compile time: > addr_mis = int mis = DR_MISALIGNMENT (dr); > Else, compute address misalignment in bytes: > ! addr_mis = addr & (vectype_align - 1) > > prolog_niters = min (LOOP_NITERS, ((VF - > addr_mis/elem_size)&(VF-1))/step) > > *************** vect_gen_niters_for_prolog_loop (loop_ve > *** 1991,1999 **** > tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt, > &new_stmts, offset, loop); > tree type = unsigned_type_for (TREE_TYPE (start_addr)); > ! tree vectype_size_minus_1 = build_int_cst (type, vectype_align - 1); > ! tree elem_size_log = > ! build_int_cst (type, exact_log2 (vectype_align/nelements)); > tree nelements_minus_1 = build_int_cst (type, nelements - 1); > tree nelements_tree = build_int_cst (type, nelements); > tree byte_misalign; > --- 1991,2000 ---- > tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt, > &new_stmts, offset, loop); > tree type = unsigned_type_for (TREE_TYPE (start_addr)); > ! tree vectype_align_minus_1 = build_int_cst (type, vectype_align - 1); > ! HOST_WIDE_INT elem_size = > ! int_cst_value (TYPE_SIZE_UNIT (TREE_TYPE (vectype))); > ! tree elem_size_log = build_int_cst (type, exact_log2 (elem_size)); > tree nelements_minus_1 = build_int_cst (type, nelements - 1); > tree nelements_tree = build_int_cst (type, nelements); > tree byte_misalign; > *************** vect_gen_niters_for_prolog_loop (loop_ve > *** 2002,2011 **** > new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmts); > gcc_assert (!new_bb); > > ! /* Create: byte_misalign = addr & (vectype_size - 1) */ > byte_misalign = > fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr), > ! vectype_size_minus_1); > > /* Create: elem_misalign = byte_misalign / element_size */ > elem_misalign = > --- 2003,2012 ---- > new_bb = gsi_insert_seq_on_edge_immediate (pe, new_stmts); > gcc_assert (!new_bb); > > ! /* Create: byte_misalign = addr & (vectype_align - 1) */ > byte_misalign = > fold_build2 (BIT_AND_EXPR, type, fold_convert (type, start_addr), > ! vectype_align_minus_1); > > /* Create: elem_misalign = byte_misalign / element_size */ > elem_misalign = > Index: gcc/testsuite/lib/target-supports.exp > =================================================================== > *** gcc/testsuite/lib/target-supports.exp (revision 189809) > --- gcc/testsuite/lib/target-supports.exp (working copy) > *************** proc check_effective_target_natural_alig > *** 3379,3384 **** > --- 3379,3404 ---- > return $et_natural_alignment_64_saved > } > > + # Return 1 if all vector types are naturally aligned (aligned to their > + # type-size), 0 otherwise. > + # > + # This won't change for different subtargets so cache the result. > + > + proc check_effective_target_vect_natural_alignment { } { > + global et_vect_natural_alignment > + > + if [info exists et_vect_natural_alignment_saved] { > + verbose "check_effective_target_vect_natural_alignment: using > cached result" 2 > + } else { > + set et_vect_natural_alignment_saved 1 > + if { [check_effective_target_arm_eabi] } { > + set et_vect_natural_alignment_saved 0 > + } > + } > + verbose "check_effective_target_vect_natural_alignment: returning > $et_vect_natural_alignment_saved" 2 > + return $et_vect_natural_alignment_saved > + } > + > # Return 1 if vector alignment (for types of size 32 bit or less) is > reachable, 0 otherwise. > # > # This won't change for different subtargets so cache the result. > Index: gcc/testsuite/gcc.dg/align-2.c > =================================================================== > *** gcc/testsuite/gcc.dg/align-2.c (revision 189809) > --- gcc/testsuite/gcc.dg/align-2.c (working copy) > *************** > *** 1,5 **** > /* PR 17962 */ > ! /* { dg-do compile } */ > /* { dg-options "" } */ > > typedef float v4 __attribute__((vector_size(sizeof(float)*4))); > --- 1,5 ---- > /* PR 17962 */ > ! /* { dg-do compile { target vect_natural_alignment } } */ > /* { dg-options "" } */ > > typedef float v4 __attribute__((vector_size(sizeof(float)*4))); > Index: gcc/testsuite/gcc.dg/vect/slp-25.c > =================================================================== > *** gcc/testsuite/gcc.dg/vect/slp-25.c (revision 189809) > --- gcc/testsuite/gcc.dg/vect/slp-25.c (working copy) > *************** int main (void) > *** 56,60 **** > > /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ > /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 > "vect" } } */ > ! /* { dg-final { scan-tree-dump-times "Alignment of access forced using > peeling" 2 "vect" { xfail { vect_no_align } } } } */ > /* { dg-final { cleanup-tree-dump "vect" } } */ > --- 56,60 ---- > > /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */ > /* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 0 > "vect" } } */ > ! /* { dg-final { scan-tree-dump-times "Alignment of access forced using > peeling" 2 "vect" { xfail { vect_no_align || { ! vect_natural_alignment } } } > } } */ > /* { dg-final { cleanup-tree-dump "vect" } } */ > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > ulrich.weig...@de.ibm.com >