On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure...  If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep; movb; interacts with store to load forwarding?  Is
>>> that maybe optimized well on some archs?  movb should always forward,
>>> and wasn't the setup cost for small N reasonable on modern CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on Core, and for
>> blocks in the range of 24-128 bytes on Zen.  This is w/o store/load
>> forwarding, but I doubt those provide a cheap way around.
>>
>>>
>>> It probably depends on the width of the entries in the store buffer,
>>> whether they appear in-order, and the alignment of the stores (if they
>>> are larger than 8 bytes they are surely aligned).  IIRC CPUs had
>>> smaller store buffer entries than cache line size.
>>>
>>> Given that load bandwidth is usually higher than store bandwidth it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8 byte entry case use sth like
>>>
>>>   movq 0(%eax), %xmm0
>>>   movhps 8(%eax), %xmm0 // or vpinsert
>>>   mov[au]ps %xmm0, 0(%ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
>>
>> This may be somewhat faster indeed.  I am not sure if store to load
>> forwarding will work for the latter half when read again by halves.
>> It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever, with the usual restrictions of alignment/size being
> power of two "halves".
>
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements.  We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's.  I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations.  So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.
Like we generate

foo:
.LFB0:
        .cfi_startproc
        movl    4(%rdi), %eax
        movzwl  2(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movzbl  1(%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movzbl  (%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movq    %rax, (%rsi)
        ret

for

struct x { char e; char f; short c; int i; } a;

void foo (struct x *p, long *q)
{
  *q = (((((((unsigned long)(unsigned int)p->i) << 16)
           | (((unsigned long)(unsigned short)p->c))) << 8)
         | (((unsigned long)(unsigned char)p->f))) << 8)
       | ((unsigned long)(unsigned char)p->e);
}

if you disable the bswap pass.  Doing 4 byte stores in this case would
save some prefixes at least.  I expected the ORs and shifts to have
smaller encodings...  With 4 byte stores we end up with the same size
as with individual loads & stores.

> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

Also, if we do the stores in smaller chunks we are more likely to hit
the same store-to-load-forwarding issue elsewhere, for example when the
destination is memcpy'ed away.  So the proposed change isn't
necessarily a win without risking a regression similar to the one it
tries to fix.  Whole-program analysis of accesses might allow marking
affected objects.

Richard.

> Richard.
>
>> Honza
>>>
>>> As said, a general concern was that you are not copying padding.  If
>>> you put this into an even more common place you surely will break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >> Richard.
>>> >>
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> > 2017-10-12  Martin Jambor  <mjam...@suse.cz>
>>> >> >
>>> >> >         PR target/80689
>>> >> >         * tree-sra.h: New file.
>>> >> >         * ipa-prop.h: Moved declaration of build_ref_for_offset to
>>> >> >         tree-sra.h.
>>> >> >         * expr.c: Include params.h and tree-sra.h.
>>> >> >         (emit_move_elementwise): New function.
>>> >> >         (store_expr_with_bounds): Optionally use it.
>>> >> >         * ipa-cp.c: Include tree-sra.h.
>>> >> >         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>>> >> >         * config/i386/i386.c (ix86_option_override_internal): Set
>>> >> >         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>>> >> >         * tree-sra.c: Include tree-sra.h.
>>> >> >         (scalarizable_type_p): Renamed to
>>> >> >         simple_mix_of_records_and_arrays_p, made public, renamed the
>>> >> >         second parameter to allow_char_arrays.
>>> >> >         (extract_min_max_idx_from_array): New function.
>>> >> >         (completely_scalarize): Moved bits of the function to
>>> >> >         extract_min_max_idx_from_array.
>>> >> >
>>> >> >         testsuite/
>>> >> >         * gcc.target/i386/pr80689-1.c: New test.
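To make the forwarding asymmetry discussed above concrete: a wide store
read back by a narrower aligned load generally forwards fine, while
several narrow stores read back by one wider load are what typically
stalls.  A minimal, purely illustrative C sketch of the two cases (the
union and function names are invented for this example and appear
nowhere in the patch):

union fwd { unsigned long wide; unsigned int half[2]; };

unsigned int
forwards_ok (union fwd *p, unsigned long v)
{
  p->wide = v;        /* One 8-byte store ...  */
  return p->half[0];  /* ... read back by a 4-byte load: forwarded.  */
}

unsigned long
likely_stalls (union fwd *p, unsigned int lo, unsigned int hi)
{
  p->half[0] = lo;    /* Two 4-byte stores ...  */
  p->half[1] = hi;
  return p->wide;     /* ... read back by one 8-byte load: on most
                         micro-architectures a store-to-load-forwarding
                         stall.  */
}

The second case is the pattern pr80689 exhibits: a caller initializes a
struct field by field and an aggregate copy then reads it back with
wider (e.g. vector) loads.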
>>> >> > ---
>>> >> >  gcc/config/i386/i386.c                    |   4 ++
>>> >> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
>>> >> >  gcc/ipa-cp.c                              |   1 +
>>> >> >  gcc/ipa-prop.h                            |   4 --
>>> >> >  gcc/params.def                            |   6 ++
>>> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
>>> >> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
>>> >> >  gcc/tree-sra.h                            |  33 ++++++++++
>>> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
>>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> >  create mode 100644 gcc/tree-sra.h
>>> >> >
>>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> >> > index 1ee8351c21f..87f602e7ead 100644
>>> >> > --- a/gcc/config/i386/i386.c
>>> >> > +++ b/gcc/config/i386/i386.c
>>> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>>> >> >                          ix86_tune_cost->l2_cache_size,
>>> >> >                          opts->x_param_values,
>>> >> >                          opts_set->x_param_values);
>>> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > +                        35,
>>> >> > +                        opts->x_param_values,
>>> >> > +                        opts_set->x_param_values);
>>> >> >
>>> >> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
>>> >> >    if (opts->x_flag_prefetch_loop_arrays < 0
>>> >> > diff --git a/gcc/expr.c b/gcc/expr.c
>>> >> > index 134ee731c29..dff24e7f166 100644
>>> >> > --- a/gcc/expr.c
>>> >> > +++ b/gcc/expr.c
>>> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "tree-chkp.h"
>>> >> >  #include "rtl-chkp.h"
>>> >> >  #include "ccmp.h"
>>> >> > -
>>> >> > +#include "params.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  /* If this is nonzero, we do not bother generating VOLATILE
>>> >> >     around volatile memory references, and we are willing to
>>> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
>>> >> >    return maybe_expand_insn (code, 2, ops);
>>> >> >  }
>>> >> >
>>> >> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
>>> >> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
>>> >> > +   array within TYPE.  TYPE must either be a register type or an aggregate
>>> >> > +   complying with scalarizable_type_p.
>>> >> > +
>>> >> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
>>> >> > +   stack, and block moves may need to be treated specially.  */
>>> >> > +
>>> >> > +static void
>>> >> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
>>> >> > +                      int call_param_p)
>>> >> > +{
>>> >> > +  switch (TREE_CODE (type))
>>> >> > +    {
>>> >> > +    case RECORD_TYPE:
>>> >> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>>> >> > +       if (TREE_CODE (fld) == FIELD_DECL)
>>> >> > +         {
>>> >> > +           HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
>>> >> > +           tree ft = TREE_TYPE (fld);
>>> >> > +           emit_move_elementwise (ft, target, source, fld_offset,
>>> >> > +                                  call_param_p);
>>> >> > +         }
>>> >> > +      break;
>>> >> > +
>>> >> > +    case ARRAY_TYPE:
>>> >> > +      {
>>> >> > +       tree elem_type = TREE_TYPE (type);
>>> >> > +       HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
>>> >> > +       gcc_assert (el_size > 0);
>>> >> > +
>>> >> > +       offset_int idx, max;
>>> >> > +       /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>>> >> > +       if (extract_min_max_idx_from_array (type, &idx, &max))
>>> >> > +         {
>>> >> > +           HOST_WIDE_INT el_offset = offset;
>>> >> > +           for (; idx <= max; ++idx)
>>> >> > +             {
>>> >> > +               emit_move_elementwise (elem_type, target, source, el_offset,
>>> >> > +                                      call_param_p);
>>> >> > +               el_offset += el_size;
>>> >> > +             }
>>> >> > +         }
>>> >> > +      }
>>> >> > +      break;
>>> >> > +    default:
>>> >> > +      machine_mode mode = TYPE_MODE (type);
>>> >> > +
>>> >> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
>>> >> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
>>> >> > +
>>> >> > +      /* TODO: Figure out whether the following is actually necessary.  */
>>> >> > +      if (target == ntgt)
>>> >> > +       ntgt = copy_rtx (target);
>>> >> > +      if (source == nsrc)
>>> >> > +       nsrc = copy_rtx (source);
>>> >> > +
>>> >> > +      gcc_assert (mode != VOIDmode);
>>> >> > +      if (mode != BLKmode)
>>> >> > +       emit_move_insn (ntgt, nsrc);
>>> >> > +      else
>>> >> > +       {
>>> >> > +         /* For example vector gimple registers can end up here.  */
>>> >> > +         rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
>>> >> > +                                 TYPE_MODE (sizetype), EXPAND_NORMAL);
>>> >> > +         emit_block_move (ntgt, nsrc, size,
>>> >> > +                          (call_param_p
>>> >> > +                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +       }
>>> >> > +      break;
>>> >> > +    }
>>> >> > +  return;
>>> >> > +}
>>> >> > +
>>> >> >  /* Generate code for computing expression EXP,
>>> >> >     and storing the value into TARGET.
>>> >> >
>>> >> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
>>> >> >      emit_group_store (target, temp, TREE_TYPE (exp),
>>> >> >                       int_size_in_bytes (TREE_TYPE (exp)));
>>> >> >    else if (GET_MODE (temp) == BLKmode)
>>> >> > -    emit_block_move (target, temp, expr_size (exp),
>>> >> > -                    (call_param_p
>>> >> > -                     ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +    {
>>> >> > +      /* Copying smallish BLKmode structures with emit_block_move and thus
>>> >> > +        by-pieces can result in store-to-load stalls.  So copy some simple
>>> >> > +        small aggregates element or field-wise.  */
>>> >> > +      if (GET_MODE (target) == BLKmode
>>> >> > +         && AGGREGATE_TYPE_P (TREE_TYPE (exp))
>>> >> > +         && !TREE_ADDRESSABLE (TREE_TYPE (exp))
>>> >> > +         && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +         && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +             <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
>>> >> > +                 * BITS_PER_UNIT))
>>> >> > +         && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
>>> >> > +       {
>>> >> > +         /* FIXME: Can this happen?  What would it mean?  */
>>> >> > +         gcc_assert (!reverse);
>>> >> > +         emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
>>> >> > +                                call_param_p);
>>> >> > +       }
>>> >> > +      else
>>> >> > +       emit_block_move (target, temp, expr_size (exp),
>>> >> > +                        (call_param_p
>>> >> > +                         ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +    }
>>> >> >    /* If we emit a nontemporal store, there is nothing else to do.  */
>>> >> >    else if (nontemporal && emit_storent_insn (target, temp))
>>> >> >      ;
>>> >> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> >> > index 6b3d8d7364c..7d6019bbd30 100644
>>> >> > --- a/gcc/ipa-cp.c
>>> >> > +++ b/gcc/ipa-cp.c
>>> >> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "tree-ssa-ccp.h"
>>> >> >  #include "stringpool.h"
>>> >> >  #include "attribs.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  template <typename valtype> class ipcp_value;
>>> >> >
>>> >> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>>> >> > index fa5bed49ee0..2313cc884ed 100644
>>> >> > --- a/gcc/ipa-prop.h
>>> >> > +++ b/gcc/ipa-prop.h
>>> >> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
>>> >> >  void ipa_release_body_info (struct ipa_func_body_info *);
>>> >> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>>> >> >
>>> >> > -/* From tree-sra.c:  */
>>> >> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
>>> >> > -                          gimple_stmt_iterator *, bool);
>>> >> > -
>>> >> >  /* In ipa-cp.c  */
>>> >> >  void ipa_cp_c_finalize (void);
>>> >> >
>>> >> > diff --git a/gcc/params.def b/gcc/params.def
>>> >> > index e55afc28053..5e19f1414a0 100644
>>> >> > --- a/gcc/params.def
>>> >> > +++ b/gcc/params.def
>>> >> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>>> >> >          "Enable loop epilogue vectorization using smaller vector size.",
>>> >> >          0, 0, 1)
>>> >> >
>>> >> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > +         "max-size-for-elementwise-copy",
>>> >> > +         "Maximum size in bytes of a structure or array to be considered for "
>>> >> > +         "copying by its individual fields or elements",
>>> >> > +         0, 0, 512)
>>> >> > +
>>> >> >  /*
>>> >> >
>>> >> >  Local variables:
>>> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > new file mode 100644
>>> >> > index 00000000000..4156d4fba45
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > @@ -0,0 +1,38 @@
>>> >> > +/* { dg-do compile } */
>>> >> > +/* { dg-options "-O2" } */
>>> >> > +
>>> >> > +typedef struct st1
>>> >> > +{
>>> >> > +  long unsigned int a,b;
>>> >> > +  long int c,d;
>>> >> > +}R;
>>> >> > +
>>> >> > +typedef struct st2
>>> >> > +{
>>> >> > +  int t;
>>> >> > +  R reg;
>>> >> > +}N;
>>> >> > +
>>> >> > +void Set (const R *region, N *n_info );
>>> >> > +
>>> >> > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
>>> >> > +{
>>> >> > +  R reg;
>>> >> > +
>>> >> > +  reg.a=a;
>>> >> > +  reg.b=b;
>>> >> > +  reg.c=c;
>>> >> > +  reg.d=d;
>>> >> > +  Set (&reg, n_obj);
>>> >> > +
>>> >> > +}
>>> >> > +
>>> >> > +void Set (const R *reg, N *n_obj )
>>> >> > +{
>>> >> > +  n_obj->reg=(*reg);
>>> >> > +}
>>> >> > +
>>> >> > +
>>> >> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movdqu" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movups" } } */
>>> >> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>> >> > index bac593951e7..ade97964205 100644
>>> >> > --- a/gcc/tree-sra.c
>>> >> > +++ b/gcc/tree-sra.c
>>> >> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "ipa-fnsummary.h"
>>> >> >  #include "ipa-utils.h"
>>> >> >  #include "builtins.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  /* Enumeration of all aggregate reductions we can do.  */
>>> >> >  enum sra_mode { SRA_MODE_EARLY_IPA,  /* early call regularization */
>>> >> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
>>> >> >  }
>>> >> >
>>> >> >
>>> >> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
>>> >> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
>>> >> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
>>> >> > -   we are considering a decl from constant pool.  If it is false, char arrays
>>> >> > -   will be refused.  */
>>> >> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
>>> >> > +   fields/elements that are not bit-fields and are either register types or
>>> >> > +   recursively comply with simple_mix_of_records_and_arrays_p.  Furthermore, if
>>> >> > +   ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE
>>> >> > +   contains an array of elements that only have one byte.  */
>>> >> >
>>> >> > -static bool
>>> >> > -scalarizable_type_p (tree type, bool const_decl)
>>> >> > +bool
>>> >> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
>>> >> >  {
>>> >> >    gcc_assert (!is_gimple_reg_type (type));
>>> >> >    if (type_contains_placeholder_p (type))
>>> >> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >             return false;
>>> >> >
>>> >> >           if (!is_gimple_reg_type (ft)
>>> >> > -             && !scalarizable_type_p (ft, const_decl))
>>> >> > +             && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
>>> >> >             return false;
>>> >> >         }
>>> >> >
>>> >> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >      case ARRAY_TYPE:
>>> >> >        {
>>> >> >         HOST_WIDE_INT min_elem_size;
>>> >> > -       if (const_decl)
>>> >> > +       if (allow_char_arrays)
>>> >> >           min_elem_size = 0;
>>> >> >         else
>>> >> >           min_elem_size = BITS_PER_UNIT;
>>> >> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >
>>> >> >         tree elem = TREE_TYPE (type);
>>> >> >         if (!is_gimple_reg_type (elem)
>>> >> > -           && !scalarizable_type_p (elem, const_decl))
>>> >> > +           && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
>>> >> >           return false;
>>> >> >         return true;
>>> >> >       }
>>> >> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >      }
>>> >> >  }
>>> >> >
>>> >> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
>>> >> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
>>> >> > +                           tree);
>>> >> > +
>>> >> > +/* For a given array TYPE, return false if its domain does not have any maximum
>>> >> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
>>> >> > +   element.  */
>>> >> > +
>>> >> > +bool
>>> >> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
>>> >> > +{
>>> >> > +  tree domain = TYPE_DOMAIN (type);
>>> >> > +  tree minidx = TYPE_MIN_VALUE (domain);
>>> >> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > +  tree maxidx = TYPE_MAX_VALUE (domain);
>>> >> > +  if (!maxidx)
>>> >> > +    return false;
>>> >> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> > +
>>> >> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>>> >> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>>> >> > +  *min = wi::to_offset (minidx);
>>> >> > +  *max = wi::to_offset (maxidx);
>>> >> > +  if (!TYPE_UNSIGNED (domain))
>>> >> > +    {
>>> >> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
>>> >> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
>>> >> > +    }
>>> >> > +  return true;
>>> >> > +}
>>> >> >
>>> >> >  /* Create total_scalarization accesses for all scalar fields of a member
>>> >> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>>> >> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
>>> >> >     must be the top-most VAR_DECL representing the variable; within that,
>>> >> >     OFFSET locates the member and REF must be the memory reference expression for
>>> >> >     the member.  */
>>> >> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> >      {
>>> >> >        tree elemtype = TREE_TYPE (decl_type);
>>> >> >        tree elem_size = TYPE_SIZE (elemtype);
>>> >> > -      gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>>> >> >        HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>>> >> >        gcc_assert (el_size > 0);
>>> >> >
>>> >> > -      tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > -      gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > -      tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > +      offset_int idx, max;
>>> >> >        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
>>> >> > -      if (maxidx)
>>> >> > +      if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>>> >> >         {
>>> >> > -         gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> >           tree domain = TYPE_DOMAIN (decl_type);
>>> >> > -         /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>>> >> > -            DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>>> >> > -         offset_int idx = wi::to_offset (minidx);
>>> >> > -         offset_int max = wi::to_offset (maxidx);
>>> >> > -         if (!TYPE_UNSIGNED (domain))
>>> >> > -           {
>>> >> > -             idx = wi::sext (idx, TYPE_PRECISION (domain));
>>> >> > -             max = wi::sext (max, TYPE_PRECISION (domain));
>>> >> > -           }
>>> >> >           for (int el_off = offset; idx <= max; ++idx)
>>> >> >             {
>>> >> >               tree nref = build4 (ARRAY_REF, elemtype,
>>> >> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> >  }
>>> >> >
>>> >> >  /* Create total_scalarization accesses for a member of type TYPE, which must
>>> >> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>>> >> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
>>> >> > -   the member, REVERSE gives its torage order. and REF must be the reference
>>> >> > -   expression for it.  */
>>> >> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
>>> >> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
>>> >> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF must
>>> >> > +   be the reference expression for it.  */
>>> >> >
>>> >> >  static void
>>> >> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>>> >> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>>> >> >  }
>>> >> >
>>> >> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
>>> >> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
>>> >> > +   RECORD_TYPE or ARRAY_TYPE conforming to
>>> >> > +   simple_mix_of_records_and_arrays_p.  */
>>> >> >
>>> >> >  static void
>>> >> >  create_total_scalarization_access (tree var)
>>> >> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
>>> >> >      {
>>> >> >        tree var = candidate (i);
>>> >> >
>>> >> > -      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>>> >> > -                                              constant_decl_p (var)))
>>> >> > +      if (VAR_P (var)
>>> >> > +         && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
>>> >> > +                                                constant_decl_p (var)))
>>> >> >         {
>>> >> >           if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>>> >> >               <= max_scalarization_size)
>>> >> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
>>> >> > new file mode 100644
>>> >> > index 00000000000..dc901385994
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/tree-sra.h
>>> >> > @@ -0,0 +1,33 @@
>>> >> > +/* tree-sra.h - Run-time parameters.
>>> >> > +   Copyright (C) 2017 Free Software Foundation, Inc.
>>> >> > +
>>> >> > +This file is part of GCC.
>>> >> > +
>>> >> > +GCC is free software; you can redistribute it and/or modify it under
>>> >> > +the terms of the GNU General Public License as published by the Free
>>> >> > +Software Foundation; either version 3, or (at your option) any later
>>> >> > +version.
>>> >> > +
>>> >> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> >> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> >> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> >> > +for more details.
>>> >> > +
>>> >> > +You should have received a copy of the GNU General Public License
>>> >> > +along with GCC; see the file COPYING3.  If not see
>>> >> > +<http://www.gnu.org/licenses/>.  */
>>> >> > +
>>> >> > +#ifndef TREE_SRA_H
>>> >> > +#define TREE_SRA_H
>>> >> > +
>>> >> > +
>>> >> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
>>> >> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
>>> >> > +                                    offset_int *max);
>>> >> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
>>> >> > +                          bool reverse, tree exp_type,
>>> >> > +                          gimple_stmt_iterator *gsi, bool insert_after);
>>> >> > +
>>> >> > +
>>> >> > +#endif /* TREE_SRA_H */
>>> >> > --
>>> >> > 2.14.1
>>> >> >
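For reference, with the patch as posted the new threshold defaults to 0
(i.e. the element-wise path is disabled) and only the x86 back end
raises it, to 35 bytes, in ix86_option_override_internal; on other
targets the transformation would have to be enabled by hand.  Assuming
the patch above is applied, something like:

  gcc -O2 --param max-size-for-elementwise-copy=35 pr80689-1.c -S

The value is a size in bytes and is capped at 512 by the DEFPARAM
entry.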