On Thu, Mar 31, 2011 at 3:32 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > This patch adds an array_mode_supported_p hook, which says whether > MAX_FIXED_MODE_SIZE should be ignored for a given type of array. > It follows on from the discussion here: > > http://gcc.gnu.org/ml/gcc/2011-03/msg00342.html > > The intended use of the hook is to allow small arrays of vectors > to have a non-BLK mode, and hence to be stored in rtl registers. > These arrays are used both in the ARM arm_neon.h API and in the > optabs proposed in: > > http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html > > The tail end of the thread was about the definition of TYPE_MODE: > > #define TYPE_MODE(NODE) \ > (TREE_CODE (TYPE_CHECK (NODE)) == VECTOR_TYPE \ > ? vector_type_mode (NODE) : (NODE)->type.mode) > > with this outcome: > > http://gcc.gnu.org/ml/gcc/2011-03/msg00470.html > > To summarise my take on it: > > - The current definition of TYPE_MODE isn't sufficient even for vector > modes and vector_mode_supported_p, because non-vector types can have > vector modes. > > - We should no longer treat types as having one mode everywhere. > We should instead replace TYPE_MODE with a function that takes > a context. Tests of things like vector_mode_supported_p would > move from layout_type to this new function. > > I think this patch fits within that scheme. array_mode_supported_p > would be treated in the same way as vector_mode_supported_p. > > I realise the ideal would be to get rid of TYPE_MODE first. > But that's going to be a longer-term thing. Now that there's > at least a plan, I'd like to press ahead with the array stuff > on the basis that > > (a) although the new hook won't work with the "target" attribute, > our current mode handling doesn't work in just the same way. > > (b) the new hook doesn't interfere with the plan. > > (c) getting good code from the intrinsics (and support for these > instructions in the vectoriser) is going to be much more important > to most ARM users than the ability to turn Neon on and off for > individual functions in a TU. > > To give an example of the difference, the Neon code posted here: > > http://hilbert-space.de/?p=22 > > produces this inner loop before the patch (but with > http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01996.html applied): > > .L3: > vld3.8 {d16-d18}, [r1]! > vstmia ip, {d16-d18} > fldd d19, [sp, #24] > adr r5, .L6 > ldmia r5, {r4-r5} > fldd d16, [sp, #32] > vmov d18, r4, r5 @ v8qi > vmull.u8 q9, d19, d18 > adr r5, .L6+8 > ldmia r5, {r4-r5} > vmov d17, r4, r5 @ v8qi > vstmia sp, {d18-d19} > vmlal.u8 q9, d16, d17 > fldd d16, [sp, #40] > adr r5, .L6+16 > ldmia r5, {r4-r5} > vmov d17, r4, r5 @ v8qi > vmlal.u8 q9, d16, d17 > add r3, r3, #1 > vshrn.i16 d16, q9, #8 > cmp r3, r2 > vst1.8 {d16}, [r0]! > bne .L3 > > With both patches applied, the inner loop is: > > .L3: > vld3.8 {d18-d20}, [r1]! > vmull.u8 q8, d18, d21 > vmlal.u8 q8, d19, d22 > vmlal.u8 q8, d20, d23 > add r3, r3, #1 > vshrn.i16 d16, q8, #8 > cmp r3, r2 > vst1.8 {d16}, [r0]! > bne .L3 > > Tested on arm-linux-gnueabi. OK to install?
It looks reasonable given the past discussion, but - can you move forward with the Neon stuff a bit to see if it really fits? Or is this all that is needed for the load/store lane support as well (apart from vectorizer changes of course). Can you check the code generated by for example float foo(char *p) { float a[2]; int i; ((char *)a)[0] = p[0]; ((char *)a)[1] = p[1]; ((char *)a)[2] = p[2]; ((char *)a)[3] = p[3]; ((char *)a)[4] = p[4]; ((char *)a)[5] = p[5]; ((char *)a)[6] = p[6]; ((char *)a)[7] = p[7]; return a[0] + a[1]; } for an array a that would get such a larger mode? Thus, check what happens with partial defs of different types (just to avoid ICEs like the ones Jakub was fixing yesterday). Thanks, Richard. > Richard > > > gcc/ > * hooks.h (hook_bool_mode_uhwi_false): Declare. > * hooks.c (hook_bool_mode_uhwi_false): New function. > * target.def (array_mode_supported_p): New hook. > * doc/tm.texi.in (TARGET_ARRAY_MODE_SUPPORTED_P): Add @hook. > * doc/tm.texi: Regenerate. > * stor-layout.c (mode_for_array): New function. > (layout_type): Use it. > * config/arm/arm.c (arm_array_mode_supported_p): New function. > (TARGET_ARRAY_MODE_SUPPORTED_P): Define. > > Index: gcc/hooks.h > =================================================================== > --- gcc/hooks.h 2011-03-31 10:57:26.000000000 +0100 > +++ gcc/hooks.h 2011-03-31 14:18:21.000000000 +0100 > @@ -34,6 +34,8 @@ extern bool hook_bool_mode_false (enum m > extern bool hook_bool_mode_true (enum machine_mode); > extern bool hook_bool_mode_const_rtx_false (enum machine_mode, const_rtx); > extern bool hook_bool_mode_const_rtx_true (enum machine_mode, const_rtx); > +extern bool hook_bool_mode_uhwi_false (enum machine_mode, > + unsigned HOST_WIDE_INT); > extern bool hook_bool_tree_false (tree); > extern bool hook_bool_const_tree_false (const_tree); > extern bool hook_bool_tree_true (tree); > Index: gcc/hooks.c > =================================================================== > --- gcc/hooks.c 2011-03-31 10:57:26.000000000 +0100 > +++ gcc/hooks.c 2011-03-31 14:18:21.000000000 +0100 > @@ -101,6 +101,15 @@ hook_bool_mode_const_rtx_true (enum mach > return true; > } > > +/* Generic hook that takes (enum machine_mode, unsigned HOST_WIDE_INT) > + and returns false. */ > +bool > +hook_bool_mode_uhwi_false (enum machine_mode mode ATTRIBUTE_UNUSED, > + unsigned HOST_WIDE_INT value ATTRIBUTE_UNUSED) > +{ > + return false; > +} > + > /* Generic hook that takes (FILE *, const char *) and does nothing. */ > void > hook_void_FILEptr_constcharptr (FILE *a ATTRIBUTE_UNUSED, const char *b > ATTRIBUTE_UNUSED) > Index: gcc/target.def > =================================================================== > --- gcc/target.def 2011-03-31 10:57:26.000000000 +0100 > +++ gcc/target.def 2011-03-31 14:18:41.000000000 +0100 > @@ -1611,6 +1611,38 @@ DEFHOOK > bool, (enum machine_mode mode), > hook_bool_mode_false) > > +/* True if we should try to use a scalar mode to represent an array, > + overriding the usual MAX_FIXED_MODE limit. */ > +DEFHOOK > +(array_mode_supported_p, > + "Return true if GCC should try to use a scalar mode to store an array\n\ > +of @var{nelems} elements, given that each element has mode @var{mode}.\n\ > +Returning true here overrides the usual @code{MAX_FIXED_MODE} limit\n\ > +and allows GCC to use any defined integer mode.\n\ > +\n\ > +One use of this hook is to support vector load and store operations\n\ > +that operate on several homogeneous vectors. For example, ARM Neon\n\ > +has operations like:\n\ > +\n\ > +@smallexample\n\ > +int8x8x3_t vld3_s8 (const int8_t *)\n\ > +@end smallexample\n\ > +\n\ > +where the return type is defined as:\n\ > +\n\ > +@smallexample\n\ > +typedef struct int8x8x3_t\n\ > +@{\n\ > + int8x8_t val[3];\n\ > +@} int8x8x3_t;\n\ > +@end smallexample\n\ > +\n\ > +If this hook allows @code{val} to have a scalar mode, then\n\ > +@code{int8x8x3_t} can have the same mode. GCC can then store\n\ > +@code{int8x8x3_t}s in registers rather than forcing them onto the stack.", > + bool, (enum machine_mode mode, unsigned HOST_WIDE_INT nelems), > + hook_bool_mode_uhwi_false) > + > /* Compute cost of moving data from a register of class FROM to one of > TO, using MODE. */ > DEFHOOK > Index: gcc/doc/tm.texi.in > =================================================================== > --- gcc/doc/tm.texi.in 2011-03-29 10:32:08.000000000 +0100 > +++ gcc/doc/tm.texi.in 2011-03-31 14:27:42.000000000 +0100 > @@ -4271,6 +4271,8 @@ insns involving vector mode @var{mode}. > must have move patterns for this mode. > @end deftypefn > > +@hook TARGET_ARRAY_MODE_SUPPORTED_P > + > @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P > Define this to return nonzero for machine modes for which the port has > small register classes. If this target hook returns nonzero for a given > Index: gcc/stor-layout.c > =================================================================== > --- gcc/stor-layout.c 2011-03-31 10:57:26.000000000 +0100 > +++ gcc/stor-layout.c 2011-03-31 14:22:23.000000000 +0100 > @@ -546,6 +546,34 @@ get_mode_alignment (enum machine_mode mo > return MIN (BIGGEST_ALIGNMENT, MAX (1, > mode_base_align[mode]*BITS_PER_UNIT)); > } > > +/* Return the natural mode of an array, given that it is SIZE bytes in > + total and has elements of type ELEM_TYPE. */ > + > +static enum machine_mode > +mode_for_array (tree elem_type, tree size) > +{ > + tree elem_size; > + unsigned HOST_WIDE_INT int_size, int_elem_size; > + bool limit_p; > + > + /* One-element arrays get the component type's mode. */ > + elem_size = TYPE_SIZE (elem_type); > + if (simple_cst_equal (size, elem_size)) > + return TYPE_MODE (elem_type); > + > + limit_p = true; > + if (host_integerp (size, 1) && host_integerp (elem_size, 1)) > + { > + int_size = tree_low_cst (size, 1); > + int_elem_size = tree_low_cst (elem_size, 1); > + if (int_elem_size > 0 > + && int_size % int_elem_size == 0 > + && targetm.array_mode_supported_p (TYPE_MODE (elem_type), > + int_size / int_elem_size)) > + limit_p = false; > + } > + return mode_for_size_tree (size, MODE_INT, limit_p); > +} > > /* Subroutine of layout_decl: Force alignment required for the data type. > But if the decl itself wants greater alignment, don't override that. */ > @@ -2039,14 +2067,8 @@ layout_type (tree type) > && (TYPE_MODE (TREE_TYPE (type)) != BLKmode > || TYPE_NO_FORCE_BLK (TREE_TYPE (type)))) > { > - /* One-element arrays get the component type's mode. */ > - if (simple_cst_equal (TYPE_SIZE (type), > - TYPE_SIZE (TREE_TYPE (type)))) > - SET_TYPE_MODE (type, TYPE_MODE (TREE_TYPE (type))); > - else > - SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type), > - MODE_INT, 1)); > - > + SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type), > + TYPE_SIZE (type))); > if (TYPE_MODE (type) != BLKmode > && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT > && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type))) > Index: gcc/config/arm/arm.c > =================================================================== > --- gcc/config/arm/arm.c 2011-03-31 14:10:12.000000000 +0100 > +++ gcc/config/arm/arm.c 2011-03-31 14:18:21.000000000 +0100 > @@ -243,6 +243,8 @@ static rtx arm_pic_static_addr (rtx orig > static bool cortex_a9_sched_adjust_cost (rtx, rtx, rtx, int *); > static bool xscale_sched_adjust_cost (rtx, rtx, rtx, int *); > static bool fa726te_sched_adjust_cost (rtx, rtx, rtx, int *); > +static bool arm_array_mode_supported_p (enum machine_mode, > + unsigned HOST_WIDE_INT); > static enum machine_mode arm_preferred_simd_mode (enum machine_mode); > static bool arm_class_likely_spilled_p (reg_class_t); > static bool arm_vector_alignment_reachable (const_tree type, bool is_packed); > @@ -403,6 +405,8 @@ #define TARGET_ADDRESS_COST arm_address_ > #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask > #undef TARGET_VECTOR_MODE_SUPPORTED_P > #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p > +#undef TARGET_ARRAY_MODE_SUPPORTED_P > +#define TARGET_ARRAY_MODE_SUPPORTED_P arm_array_mode_supported_p > #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE > #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arm_preferred_simd_mode > #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES > @@ -22377,6 +22381,20 @@ arm_vector_mode_supported_p (enum machin > return false; > } > > +/* Implements target hook array_mode_supported_p. */ > + > +static bool > +arm_array_mode_supported_p (enum machine_mode mode, > + unsigned HOST_WIDE_INT nelems) > +{ > + if (TARGET_NEON > + && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)) > + && (nelems >= 2 && nelems <= 4)) > + return true; > + > + return false; > +} > + > /* Use the option -mvectorize-with-neon-quad to override the use of > doubleword > registers when autovectorizing for Neon, at least until multiple vector > widths are supported properly by the middle-end. */ >