Hi,

On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor <mjam...@suse.cz> wrote:
> > Hi,
> >
> > I'd like to request comments on the patch below, which aims to fix PR
> > 80689, an instance of a store-to-load forwarding stall on x86_64 CPUs
> > in the ImageMagick benchmark that is responsible for a slowdown of up
> > to 9% compared to GCC 6, depending on options and HW used.  (Actually,
> > I have just seen 24% in one specific combination but for various
> > reasons can no longer verify it today.)
> >
> > The revision causing the regression is 237074, which increased the
> > size of the mode for copying aggregates "by pieces" to 128 bits,
> > incurring big stalls when the values being copied are also still being
> > stored in a smaller data type, or when the copied values are loaded in
> > smaller types shortly afterwards.  Such situations happen in
> > ImageMagick even across calls, which means that any non-IPA
> > flow-sensitive approach would not detect them.  Therefore, the patch
> > changes the way we copy small BLKmode data that are simple
> > combinations of records and arrays (no unions or bit-fields; character
> > arrays are also disallowed) and simply copies them one field and/or
> > element at a time.
> >
> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> > (the structure in the benchmark has 32 bytes), but that is subject to
> > change after more benchmarking, and it is actually zero - meaning that
> > element-wise copying never happens - on all other architectures.  I
> > believe that any architecture with a store buffer can benefit, but it
> > is probably better to leave it to the respective maintainers to find a
> > different default value.  I am not sure this is how such HW-dependent
> > decisions should be made, and that is the primary reason why I am
> > sending this RFC first.
> >
> > I have decided to implement this change at the expansion level because
> > at that point the type information is still readily available, and at
> > the same time we can also handle various implicit copies, for example
> > those made when passing a parameter.  I found I could re-use some bits
> > and pieces of tree-SRA, and so I did, creating the tree-sra.h header
> > file in the process.
> >
> > I am fully aware that in the final patch the new parameter, or indeed
> > any new parameters, need to be documented.  I have skipped that
> > intentionally for now and will write the documentation if the feedback
> > here is generally good.
> >
> > I have bootstrapped and tested this patch on x86_64-linux with
> > different values of the parameter, and only found problems with
> > unreasonably high values leading to OOM.  I have done the same with a
> > previous version of the patch, which was equivalent to the limit being
> > 64 bytes, on aarch64-linux, ppc64le-linux and ia64-linux, and only ran
> > into failures of tests which assumed that structure padding is copied
> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff, but
> > also for example gcc.dg/vmx/varargs-4.c).
> >
> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> > 8% at the -O2 and -Ofast compilation levels respectively on one
> > particular new AMD CPU, and by 6% and 3% on one particular old Intel
> > machine.
> >
> > Thanks in advance for any comments,
>
> I wonder if you can, at the place you choose to hook this in, elide any
> copying of padding between fields.
>
> I'd rather have hooked such a "high-level" optimization in
> expand_assignment, where you can be reasonably sure you're seeing an
> actual source-level construct.

I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which has used the
larger mode for moving since GCC 7) exactly so that we handle not only
source-level assignments but also passing parameters by value and other
situations.
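
For example, a by-value argument involves an aggregate copy that never
appears as a source-level assignment (a minimal sketch; the type and
function names are made up for illustration):

  /* A 32-byte aggregate with no padding, like the one in the PR.  */
  struct small_rect { unsigned long w, h; long x, y; };

  extern void consume (struct small_rect r);

  void
  pass_by_value (struct small_rect *p)
  {
    /* The copy into the argument slot is implicit; it goes through the
       emit_block_move path with BLOCK_OP_CALL_PARM rather than through
       expand_assignment.  */
    consume (*p);
  }

Hooking only into expand_assignment would leave copies like this one on
the by-pieces path.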

> 35 bytes seems to be much - what is the code-size impact?

I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix ImageMagick, where the problematic structure is:

  typedef struct _RectangleInfo
  {
    size_t width, height;
    ssize_t x, y;
  } RectangleInfo;

...so four 8-byte integers, 32 bytes in total, no padding.  Since any
aggregate of 33-35 bytes has to consist of even smaller fields/elements,
it seemed reasonable to copy those element-wise too.  Nevertheless, I
still intend to experiment with the limit; I sent out this RFC exactly
so that I do not spend a lot of time benchmarking something that is
eventually deemed unacceptable on principle.

> IIRC the reason this may be slow isn't loading in smaller types than
> stored before by the copy - the store buffers can handle this
> reasonably well.  It's solely when previous smaller stores are
>
>  a1) not mergeable in the store buffer
>  a2) not merged because earlier stores are already committed
>
> and
>
>  b) loaded afterwards as a type that would access multiple store
>     buffers
>
> a) would be sure to happen in case b) involves accessing padding.  Is
> the ImageMagick case one that involves padding in the structure?

As I said above, there is no padding.  Basically, what happens is that
in a number of places there is a variable region of the aforementioned
type which is initialized and passed to the function
SetPixelCacheNexusPixels in the following manner:

  ...
  region.width=cache_info->columns;
  region.height=1;
  region.x=0;
  region.y=y;
  pixels=SetPixelCacheNexusPixels(cache_info,ReadMode,&region,MagickTrue,
    cache_nexus[id],exception);
  ...

and the first four statements in SetPixelCacheNexusPixels are:

  assert(cache_info != (const CacheInfo *) NULL);
  assert(cache_info->signature == MagickSignature);
  if (cache_info->type == UndefinedCache)
    return((PixelPacket *) NULL);
  nexus_info->region=(*region);

with the last one generating the stalls, on both Zen-based machines and
also on 2-3 year old Intel CPUs.

I have had a look at what Agner Fog's micro-architecture document says
about store-forwarding stalls, and:

  - on Broadwells and Haswells, any case where a "write of any size is
    followed by a read of a larger size" incurs a stall, which fits our
    example,

  - on Skylakes, "A read that is bigger than the write, or a read that
    covers both written and unwritten bytes, takes approximately 11
    clock cycles extra" seems to apply,

  - on Intel Silvermonts, there will also be a stall, because "A memory
    write can be forwarded to a subsequent read of the same size or a
    smaller size...",

  - on Zens, Agner Fog says they work perfectly except when crossing a
    page or when "A read that has a partial overlap with a preceding
    write has a penalty of 6-7 clock cycles," which must be why I see
    the stalls.

So I guess the pending stores are not really merged even without
padding.
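
Distilled into a self-contained sketch (modelled on the testcase
included in the patch; the names here are illustrative only), the
stalling pattern is:

  typedef struct
  {
    unsigned long width, height;
    long x, y;
  } RectangleInfo;              /* 32 bytes, no padding.  */

  void
  set_region (const RectangleInfo *region, RectangleInfo *dest)
  {
    /* Since r237074 this copy is expanded as two 16-byte vector moves;
       each 16-byte load needs bytes from two separate, possibly still
       pending, 8-byte store-buffer entries and therefore cannot be
       store-forwarded.  */
    *dest = *region;
  }

  void
  init_and_set (RectangleInfo *dest, unsigned long columns, long y)
  {
    RectangleInfo region;

    region.width = columns;     /* Four 8-byte stores...  */
    region.height = 1;
    region.x = 0;
    region.y = y;
    set_region (&region, dest); /* ...often still in flight here.  */
  }

With the element-wise copy, the assignment becomes four 8-byte moves
again, and the threshold can be tuned - or element-wise copying disabled
with a value of 0 - via --param max-size-for-elementwise-copy=N.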

Martin

> Richard.
>
> > Martin
> >
> >
> > 2017-10-12  Martin Jambor  <mjam...@suse.cz>
> >
> >         PR target/80689
> >         * tree-sra.h: New file.
> >         * ipa-prop.h: Moved declaration of build_ref_for_offset to
> >         tree-sra.h.
> >         * expr.c: Include params.h and tree-sra.h.
> >         (emit_move_elementwise): New function.
> >         (store_expr_with_bounds): Optionally use it.
> >         * ipa-cp.c: Include tree-sra.h.
> >         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >         * config/i386/i386.c (ix86_option_override_internal): Set
> >         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >         * tree-sra.c: Include tree-sra.h.
> >         (scalarizable_type_p): Renamed to
> >         simple_mix_of_records_and_arrays_p, made public, renamed the
> >         second parameter to allow_char_arrays.
> >         (extract_min_max_idx_from_array): New function.
> >         (completely_scalarize): Moved bits of the function to
> >         extract_min_max_idx_from_array.
> >
> > testsuite/
> >         * gcc.target/i386/pr80689-1.c: New test.
> > ---
> >  gcc/config/i386/i386.c                    |   4 ++
> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
> >  gcc/ipa-cp.c                              |   1 +
> >  gcc/ipa-prop.h                            |   4 --
> >  gcc/params.def                            |   6 ++
> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
> >  gcc/tree-sra.h                            |  33 ++++++++++
> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >  create mode 100644 gcc/tree-sra.h
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 1ee8351c21f..87f602e7ead 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> >                          ix86_tune_cost->l2_cache_size,
> >                          opts->x_param_values,
> >                          opts_set->x_param_values);
> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > +                         35,
> > +                         opts->x_param_values,
> > +                         opts_set->x_param_values);
> >
> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
> >    if (opts->x_flag_prefetch_loop_arrays < 0
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 134ee731c29..dff24e7f166 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-chkp.h"
> >  #include "rtl-chkp.h"
> >  #include "ccmp.h"
> > -
> > +#include "params.h"
> > +#include "tree-sra.h"
> >
> >  /* If this is nonzero, we do not bother generating VOLATILE
> >     around volatile memory references, and we are willing to
> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> >    return maybe_expand_insn (code, 2, ops);
> >  }
> >
> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
> > +   array within TYPE.  TYPE must either be a register type or an aggregate
> > +   complying with simple_mix_of_records_and_arrays_p.
> > +
> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
> > +   stack, and block moves may need to be treated specially.  */
> > +
> > +static void
> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
> > +                       int call_param_p)
> > +{
> > +  switch (TREE_CODE (type))
> > +    {
> > +    case RECORD_TYPE:
> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> > +        if (TREE_CODE (fld) == FIELD_DECL)
> > +          {
> > +            HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
> > +            tree ft = TREE_TYPE (fld);
> > +            emit_move_elementwise (ft, target, source, fld_offset,
> > +                                   call_param_p);
> > +          }
> > +      break;
> > +
> > +    case ARRAY_TYPE:
> > +      {
> > +        tree elem_type = TREE_TYPE (type);
> > +        HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
> > +        gcc_assert (el_size > 0);
> > +
> > +        offset_int idx, max;
> > +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > +        if (extract_min_max_idx_from_array (type, &idx, &max))
> > +          {
> > +            HOST_WIDE_INT el_offset = offset;
> > +            for (; idx <= max; ++idx)
> > +              {
> > +                emit_move_elementwise (elem_type, target, source, el_offset,
> > +                                       call_param_p);
> > +                el_offset += el_size;
> > +              }
> > +          }
> > +      }
> > +      break;
> > +    default:
> > +      machine_mode mode = TYPE_MODE (type);
> > +
> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> > +
> > +      /* TODO: Figure out whether the following is actually necessary.  */
> > +      if (target == ntgt)
> > +        ntgt = copy_rtx (target);
> > +      if (source == nsrc)
> > +        nsrc = copy_rtx (source);
> > +
> > +      gcc_assert (mode != VOIDmode);
> > +      if (mode != BLKmode)
> > +        emit_move_insn (ntgt, nsrc);
> > +      else
> > +        {
> > +          /* For example vector gimple registers can end up here.  */
> > +          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> > +                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
> > +          emit_block_move (ntgt, nsrc, size,
> > +                           (call_param_p
> > +                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +        }
> > +      break;
> > +    }
> > +  return;
> > +}
> > +
> >  /* Generate code for computing expression EXP,
> >     and storing the value into TARGET.
> >
> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
> >      emit_group_store (target, temp, TREE_TYPE (exp),
> >                        int_size_in_bytes (TREE_TYPE (exp)));
> >        else if (GET_MODE (temp) == BLKmode)
> > -     emit_block_move (target, temp, expr_size (exp),
> > -                      (call_param_p
> > -                       ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +     {
> > +       /* Copying smallish BLKmode structures with emit_block_move and thus
> > +          by-pieces can result in store-to-load stalls.  So copy some simple
> > +          small aggregates element or field-wise.  */
> > +       if (GET_MODE (target) == BLKmode
> > +           && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> > +           && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> > +           && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> > +           && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> > +               <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> > +                   * BITS_PER_UNIT))
> > +           && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
> > +         {
> > +           /* FIXME: Can this happen?  What would it mean?  */
> > +           gcc_assert (!reverse);
> > +           emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> > +                                  call_param_p);
> > +         }
> > +       else
> > +         emit_block_move (target, temp, expr_size (exp),
> > +                          (call_param_p
> > +                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +     }
> >        /* If we emit a nontemporal store, there is nothing else to do.  */
> >        else if (nontemporal && emit_storent_insn (target, temp))
> >      ;
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 6b3d8d7364c..7d6019bbd30 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-ccp.h"
> >  #include "stringpool.h"
> >  #include "attribs.h"
> > +#include "tree-sra.h"
> >
> >  template <typename valtype> class ipcp_value;
> >
> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> > index fa5bed49ee0..2313cc884ed 100644
> > --- a/gcc/ipa-prop.h
> > +++ b/gcc/ipa-prop.h
> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
> >  void ipa_release_body_info (struct ipa_func_body_info *);
> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
> >
> > -/* From tree-sra.c:  */
> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> > -                           gimple_stmt_iterator *, bool);
> > -
> >  /* In ipa-cp.c  */
> >  void ipa_cp_c_finalize (void);
> >
> > diff --git a/gcc/params.def b/gcc/params.def
> > index e55afc28053..5e19f1414a0 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
> >        "Enable loop epilogue vectorization using smaller vector size.",
> >        0, 0, 1)
> >
> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > +      "max-size-for-elementwise-copy",
> > +      "Maximum size in bytes of a structure or array to be considered for "
> > +      "copying by its individual fields or elements",
> > +      0, 0, 512)
> > +
> >  /*
> >
> >  Local variables:
> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > new file mode 100644
> > index 00000000000..4156d4fba45
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +typedef struct st1
> > +{
> > +  long unsigned int a,b;
> > +  long int c,d;
> > +}R;
> > +
> > +typedef struct st2
> > +{
> > +  int t;
> > +  R reg;
> > +}N;
> > +
> > +void Set (const R *region, N *n_info );
> > +
> > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
> > +{
> > +  R reg;
> > +
> > +  reg.a=a;
> > +  reg.b=b;
> > +  reg.c=c;
> > +  reg.d=d;
> > +  Set (&reg, n_obj);
> > +
> > +}
> > +
> > +void Set (const R *reg, N *n_obj )
> > +{
> > +  n_obj->reg=(*reg);
> > +}
> > +
> > +
> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> > +/* { dg-final { scan-assembler-not "movdqu" } } */
> > +/* { dg-final { scan-assembler-not "movups" } } */
> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> > index bac593951e7..ade97964205 100644
> > --- a/gcc/tree-sra.c
> > +++ b/gcc/tree-sra.c
> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "ipa-fnsummary.h"
> >  #include "ipa-utils.h"
> >  #include "builtins.h"
> > +#include "tree-sra.h"
> >
> >  /* Enumeration of all aggregate reductions we can do.  */
> >  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
> >  }
> >
> >
> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
> > -   we are considering a decl from constant pool.  If it is false, char arrays
> > -   will be refused.  */
> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
> > +   fields/elements that are not bit-fields and are either register types or
> > +   recursively comply with simple_mix_of_records_and_arrays_p.
> > +   Furthermore, if ALLOW_CHAR_ARRAYS is false, the function will return
> > +   false also if TYPE contains an array of elements that only have one
> > +   byte.  */
> >
> > -static bool
> > -scalarizable_type_p (tree type, bool const_decl)
> > +bool
> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
> >  {
> >    gcc_assert (!is_gimple_reg_type (type));
> >    if (type_contains_placeholder_p (type))
> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >          return false;
> >
> >        if (!is_gimple_reg_type (ft)
> > -          && !scalarizable_type_p (ft, const_decl))
> > +          && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
> >          return false;
> >      }
> >
> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >      case ARRAY_TYPE:
> >        {
> >      HOST_WIDE_INT min_elem_size;
> > -    if (const_decl)
> > +    if (allow_char_arrays)
> >        min_elem_size = 0;
> >      else
> >        min_elem_size = BITS_PER_UNIT;
> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >
> >      tree elem = TREE_TYPE (type);
> >      if (!is_gimple_reg_type (elem)
> > -        && !scalarizable_type_p (elem, const_decl))
> > +        && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
> >        return false;
> >      return true;
> >        }
> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
> >      }
> >  }
> >
> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> > +                            tree);
> > +
> > +/* For a given array TYPE, return false if its domain does not have any maximum
> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
> > +   element.  */
> > +
> > +bool
> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> > +{
> > +  tree domain = TYPE_DOMAIN (type);
> > +  tree minidx = TYPE_MIN_VALUE (domain);
> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > +  tree maxidx = TYPE_MAX_VALUE (domain);
> > +  if (!maxidx)
> > +    return false;
> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> > +
> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> > +  *min = wi::to_offset (minidx);
> > +  *max = wi::to_offset (maxidx);
> > +  if (!TYPE_UNSIGNED (domain))
> > +    {
> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
> > +    }
> > +  return true;
> > +}
> >
> >  /* Create total_scalarization accesses for all scalar fields of a member
> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
> >     must be the top-most VAR_DECL representing the variable; within that,
> >     OFFSET locates the member and REF must be the memory reference expression for
> >     the member.  */
> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >      {
> >     tree elemtype = TREE_TYPE (decl_type);
> >     tree elem_size = TYPE_SIZE (elemtype);
> > -   gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> >     HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> >     gcc_assert (el_size > 0);
> >
> > -   tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> > -   gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > -   tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> > +   offset_int idx, max;
> >     /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > -   if (maxidx)
> > +   if (extract_min_max_idx_from_array (decl_type, &idx, &max))
> >       {
> > -       gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> >         tree domain = TYPE_DOMAIN (decl_type);
> > -       /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > -          DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> > -       offset_int idx = wi::to_offset (minidx);
> > -       offset_int max = wi::to_offset (maxidx);
> > -       if (!TYPE_UNSIGNED (domain))
> > -         {
> > -           idx = wi::sext (idx, TYPE_PRECISION (domain));
> > -           max = wi::sext (max, TYPE_PRECISION (domain));
> > -         }
> >         for (int el_off = offset; idx <= max; ++idx)
> >           {
> >             tree nref = build4 (ARRAY_REF, elemtype,
> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >  }
> >
> >  /* Create total_scalarization accesses for a member of type TYPE, which must
> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> > -   the member, REVERSE gives its torage order. and REF must be the reference
> > -   expression for it.  */
> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF
> > +   must be the reference expression for it.  */
> >
> >  static void
> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> >  }
> >
> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
> > +   RECORD_TYPE or ARRAY_TYPE conforming to
> > +   simple_mix_of_records_and_arrays_p.  */
> >
> >  static void
> >  create_total_scalarization_access (tree var)
> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
> >      {
> >        tree var = candidate (i);
> >
> > -      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> > -                                              constant_decl_p (var)))
> > +      if (VAR_P (var)
> > +          && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> > +                                                 constant_decl_p (var)))
> >      {
> >        if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> >            <= max_scalarization_size)
> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> > new file mode 100644
> > index 00000000000..dc901385994
> > --- /dev/null
> > +++ b/gcc/tree-sra.h
> > @@ -0,0 +1,33 @@
> > +/* tree-sra.h - Scalar replacement of aggregates.
> > +   Copyright (C) 2017 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef TREE_SRA_H
> > +#define TREE_SRA_H
> > +
> > +
> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> > +                                     offset_int *max);
> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> > +                           bool reverse, tree exp_type,
> > +                           gimple_stmt_iterator *gsi, bool insert_after);
> > +
> > +
> > +
> > +#endif /* TREE_SRA_H */
> > --
> > 2.14.1
> >