Hi,

On Tue, Oct 17, 2017 at 01:34:54PM +0200, Richard Biener wrote:
> On Fri, Oct 13, 2017 at 6:13 PM, Martin Jambor <mjam...@suse.cz> wrote:
> > Hi,
> >
> > I'd like to request comments on the patch below, which aims to fix PR
> > 80689, an instance of a store-to-load forwarding stall on x86_64 CPUs
> > in the ImageMagick benchmark that is responsible for a slowdown of up
> > to 9% compared to GCC 6, depending on options and HW used.  (Actually,
> > I have just seen 24% in one specific combination but for various
> > reasons can no longer verify it today.)
> >
> > The revision causing the regression is 237074, which increased the
> > size of the mode for copying aggregates "by pieces" to 128 bits,
> > incurring big stalls when the values being copied are also still being
> > stored in a smaller data type, or when the copied values are loaded in
> > smaller types shortly afterwards.  Such situations happen in
> > ImageMagick even across calls, which means that any non-IPA
> > flow-sensitive approach would not detect them.  Therefore, the patch
> > changes the way we copy small BLKmode data that are simple
> > combinations of records and arrays (no unions or bit-fields; character
> > arrays are also disallowed) and simply copies them one field and/or
> > element at a time.
> >
> > "Small" in this RFC patch means up to 35 bytes on x86_64 and i386 CPUs
> > (the structure in the benchmark has 32 bytes), but that is subject to
> > change after more benchmarking, and it is actually zero - meaning that
> > element-wise copying never happens - on all other architectures.  I
> > believe that any architecture with a store buffer can benefit, but it
> > is probably better to leave it to the respective maintainers to find a
> > different default value.  I am not sure this is how such HW-dependent
> > decisions should be made, and that is the primary reason why I am
> > sending this RFC first.
> >
> > I have decided to implement this change at the expansion level because
> > at that point the type information is still readily available, and at
> > the same time we can also handle various implicit copies, for example
> > those made when passing a parameter.  I found I could re-use some bits
> > and pieces of tree-SRA, and so I did, creating the tree-sra.h header
> > file in the process.
> >
> > I am fully aware that in the final patch the new parameter, or indeed
> > any new parameters, need to be documented.  I have skipped that
> > intentionally for now and will write the documentation if the feedback
> > here is generally good.
> >
> > I have bootstrapped and tested this patch on x86_64-linux with
> > different values of the parameter, and only found problems with
> > unreasonably high values leading to OOM.  I have done the same with a
> > previous version of the patch, which was equivalent to the limit being
> > 64 bytes, on aarch64-linux, ppc64le-linux and ia64-linux, and only ran
> > into failures of tests which assumed that structure padding is copied
> > in aggregate copies (mostly gcc.target/aarch64/aapcs64/ stuff, but
> > also for example gcc.dg/vmx/varargs-4.c).
> >
> > The patch decreases the SPEC 2017 "rate" run-time of imagick by 9% and
> > 8% at the -O2 and -Ofast compilation levels respectively on one
> > particular new AMD CPU, and by 6% and 3% on one particular old Intel
> > machine.
> >
> > Thanks in advance for any comments,
>
> I wonder if you can, at the place you choose to hook this in, elide any
> copying of padding between fields.
>
> I'd rather have hooked such a "high-level" optimization in
> expand_assignment, where you can be reasonably sure you're seeing an
> actual source-level construct.

I have discussed this with Honza and we eventually decided to make the
element-wise copy an alternative to emit_block_move (which has used the
larger mode for moving since GCC 7) exactly so that we handle not only
source-level assignments but also passing parameters by value and other
situations.
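
For example, a by-value argument involves an aggregate copy that never
appears as a source-level assignment (a minimal sketch; the type and
function names are made up for illustration):

  /* A 32-byte aggregate with no padding, like the one in the PR.  */
  struct small_rect { unsigned long w, h; long x, y; };

  extern void consume (struct small_rect r);

  void
  pass_by_value (struct small_rect *p)
  {
    /* The copy into the argument slot is implicit; it goes through the
       emit_block_move path with BLOCK_OP_CALL_PARM rather than through
       expand_assignment.  */
    consume (*p);
  }

Hooking only into expand_assignment would leave copies like this one on
the by-pieces path.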

> 35 bytes seems to be much - what is the code-size impact?

I will find out and report on that.  I need at least 32 bytes (four
long ints) to fix ImageMagick, where the problematic structure is:

  typedef struct _RectangleInfo
  {
    size_t width, height;
    ssize_t x, y;
  } RectangleInfo;

...so four 8-byte integers, 32 bytes in total, no padding.  Since any
aggregate of 33-35 bytes has to consist of even smaller fields/elements,
it seemed reasonable to copy those element-wise too.  Nevertheless, I
still intend to experiment with the limit; I sent out this RFC exactly
so that I do not spend a lot of time benchmarking something that is
eventually deemed unacceptable on principle.

> IIRC the reason this may be slow isn't loading in smaller types than
> stored before by the copy - the store buffers can handle this
> reasonably well.  It's solely when previous smaller stores are
>
>  a1) not mergeable in the store buffer
>  a2) not merged because earlier stores are already committed
>
> and
>
>  b) loaded afterwards as a type that would access multiple store
>     buffers
>
> a) would be sure to happen in case b) involves accessing padding.  Is
> the ImageMagick case one that involves padding in the structure?

As I said above, there is no padding.  Basically, what happens is that
in a number of places there is a variable region of the aforementioned
type which is initialized and passed to the function
SetPixelCacheNexusPixels in the following manner:

  ...
  region.width=cache_info->columns;
  region.height=1;
  region.x=0;
  region.y=y;
  pixels=SetPixelCacheNexusPixels(cache_info,ReadMode,&region,MagickTrue,
    cache_nexus[id],exception);
  ...

and the first four statements in SetPixelCacheNexusPixels are:

  assert(cache_info != (const CacheInfo *) NULL);
  assert(cache_info->signature == MagickSignature);
  if (cache_info->type == UndefinedCache)
    return((PixelPacket *) NULL);
  nexus_info->region=(*region);

with the last one generating the stalls, on both Zen-based machines and
also on 2-3 year old Intel CPUs.

I have had a look at what Agner Fog's micro-architecture document says
about store-forwarding stalls, and:

  - on Broadwells and Haswells, any case where a "write of any size is
    followed by a read of a larger size" incurs a stall, which fits our
    example,

  - on Skylakes, "A read that is bigger than the write, or a read that
    covers both written and unwritten bytes, takes approximately 11
    clock cycles extra" seems to apply,

  - on Intel Silvermonts, there will also be a stall, because "A memory
    write can be forwarded to a subsequent read of the same size or a
    smaller size...",

  - on Zens, Agner Fog says they work perfectly except when crossing a
    page or when "A read that has a partial overlap with a preceding
    write has a penalty of 6-7 clock cycles," which must be why I see
    the stalls.

So I guess the pending stores are not really merged even without
padding.
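
Distilled into a self-contained sketch (modelled on the testcase
included in the patch; the names here are illustrative only), the
stalling pattern is:

  typedef struct
  {
    unsigned long width, height;
    long x, y;
  } RectangleInfo;              /* 32 bytes, no padding.  */

  void
  set_region (const RectangleInfo *region, RectangleInfo *dest)
  {
    /* Since r237074 this copy is expanded as two 16-byte vector moves;
       each 16-byte load needs bytes from two separate, possibly still
       pending, 8-byte store-buffer entries and therefore cannot be
       store-forwarded.  */
    *dest = *region;
  }

  void
  init_and_set (RectangleInfo *dest, unsigned long columns, long y)
  {
    RectangleInfo region;

    region.width = columns;     /* Four 8-byte stores...  */
    region.height = 1;
    region.x = 0;
    region.y = y;
    set_region (&region, dest); /* ...often still in flight here.  */
  }

With the element-wise copy, the assignment becomes four 8-byte moves
again, and the threshold can be tuned - or element-wise copying disabled
with a value of 0 - via --param max-size-for-elementwise-copy=N.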

Martin

> Richard.
>
> > Martin
> >
> >
> > 2017-10-12  Martin Jambor  <mjam...@suse.cz>
> >
> >         PR target/80689
> >         * tree-sra.h: New file.
> >         * ipa-prop.h: Moved declaration of build_ref_for_offset to
> >         tree-sra.h.
> >         * expr.c: Include params.h and tree-sra.h.
> >         (emit_move_elementwise): New function.
> >         (store_expr_with_bounds): Optionally use it.
> >         * ipa-cp.c: Include tree-sra.h.
> >         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >         * config/i386/i386.c (ix86_option_override_internal): Set
> >         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >         * tree-sra.c: Include tree-sra.h.
> >         (scalarizable_type_p): Renamed to
> >         simple_mix_of_records_and_arrays_p, made public, renamed the
> >         second parameter to allow_char_arrays.
> >         (extract_min_max_idx_from_array): New function.
> >         (completely_scalarize): Moved bits of the function to
> >         extract_min_max_idx_from_array.
> >
> > testsuite/
> >         * gcc.target/i386/pr80689-1.c: New test.
> > ---
> >  gcc/config/i386/i386.c                    |   4 ++
> >  gcc/expr.c                                | 103 ++++++++++++++++++++++++++++--
> >  gcc/ipa-cp.c                              |   1 +
> >  gcc/ipa-prop.h                            |   4 --
> >  gcc/params.def                            |   6 ++
> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
> >  gcc/tree-sra.c                            |  86 +++++++++++++++----------
> >  gcc/tree-sra.h                            |  33 ++++++++++
> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >  create mode 100644 gcc/tree-sra.h
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index 1ee8351c21f..87f602e7ead 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
> >                          ix86_tune_cost->l2_cache_size,
> >                          opts->x_param_values,
> >                          opts_set->x_param_values);
> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > +                         35,
> > +                         opts->x_param_values,
> > +                         opts_set->x_param_values);
> >
> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful.  */
> >    if (opts->x_flag_prefetch_loop_arrays < 0
> > diff --git a/gcc/expr.c b/gcc/expr.c
> > index 134ee731c29..dff24e7f166 100644
> > --- a/gcc/expr.c
> > +++ b/gcc/expr.c
> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-chkp.h"
> >  #include "rtl-chkp.h"
> >  #include "ccmp.h"
> > -
> > +#include "params.h"
> > +#include "tree-sra.h"
> >
> >  /* If this is nonzero, we do not bother generating VOLATILE
> >     around volatile memory references, and we are willing to
> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
> >    return maybe_expand_insn (code, 2, ops);
> >  }
> >
> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to TARGET
> > +   plus OFFSET, but do so element-wise and/or field-wise for each record and
> > +   array within TYPE.  TYPE must either be a register type or an aggregate
> > +   complying with simple_mix_of_records_and_arrays_p.
> > +
> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on the
> > +   stack, and block moves may need to be treated specially.  */
> > +
> > +static void
> > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT offset,
> > +                       int call_param_p)
> > +{
> > +  switch (TREE_CODE (type))
> > +    {
> > +    case RECORD_TYPE:
> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
> > +        if (TREE_CODE (fld) == FIELD_DECL)
> > +          {
> > +            HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
> > +            tree ft = TREE_TYPE (fld);
> > +            emit_move_elementwise (ft, target, source, fld_offset,
> > +                                   call_param_p);
> > +          }
> > +      break;
> > +
> > +    case ARRAY_TYPE:
> > +      {
> > +        tree elem_type = TREE_TYPE (type);
> > +        HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
> > +        gcc_assert (el_size > 0);
> > +
> > +        offset_int idx, max;
> > +        /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > +        if (extract_min_max_idx_from_array (type, &idx, &max))
> > +          {
> > +            HOST_WIDE_INT el_offset = offset;
> > +            for (; idx <= max; ++idx)
> > +              {
> > +                emit_move_elementwise (elem_type, target, source, el_offset,
> > +                                       call_param_p);
> > +                el_offset += el_size;
> > +              }
> > +          }
> > +      }
> > +      break;
> > +    default:
> > +      machine_mode mode = TYPE_MODE (type);
> > +
> > +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> > +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> > +
> > +      /* TODO: Figure out whether the following is actually necessary.  */
> > +      if (target == ntgt)
> > +        ntgt = copy_rtx (target);
> > +      if (source == nsrc)
> > +        nsrc = copy_rtx (source);
> > +
> > +      gcc_assert (mode != VOIDmode);
> > +      if (mode != BLKmode)
> > +        emit_move_insn (ntgt, nsrc);
> > +      else
> > +        {
> > +          /* For example vector gimple registers can end up here.  */
> > +          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> > +                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
> > +          emit_block_move (ntgt, nsrc, size,
> > +                           (call_param_p
> > +                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +        }
> > +      break;
> > +    }
> > +  return;
> > +}
> > +
> >  /* Generate code for computing expression EXP,
> >     and storing the value into TARGET.
> >
> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
> >      emit_group_store (target, temp, TREE_TYPE (exp),
> >                        int_size_in_bytes (TREE_TYPE (exp)));
> >        else if (GET_MODE (temp) == BLKmode)
> > -     emit_block_move (target, temp, expr_size (exp),
> > -                      (call_param_p
> > -                       ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +     {
> > +       /* Copying smallish BLKmode structures with emit_block_move and thus
> > +          by-pieces can result in store-to-load stalls.  So copy some simple
> > +          small aggregates element or field-wise.  */
> > +       if (GET_MODE (target) == BLKmode
> > +           && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> > +           && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> > +           && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> > +           && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> > +               <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> > +                   * BITS_PER_UNIT))
> > +           && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false))
> > +         {
> > +           /* FIXME: Can this happen?  What would it mean?  */
> > +           gcc_assert (!reverse);
> > +           emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> > +                                  call_param_p);
> > +         }
> > +       else
> > +         emit_block_move (target, temp, expr_size (exp),
> > +                          (call_param_p
> > +                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> > +     }
> >        /* If we emit a nontemporal store, there is nothing else to do.  */
> >        else if (nontemporal && emit_storent_insn (target, temp))
> >      ;
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index 6b3d8d7364c..7d6019bbd30 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-ssa-ccp.h"
> >  #include "stringpool.h"
> >  #include "attribs.h"
> > +#include "tree-sra.h"
> >
> >  template <typename valtype> class ipcp_value;
> >
> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> > index fa5bed49ee0..2313cc884ed 100644
> > --- a/gcc/ipa-prop.h
> > +++ b/gcc/ipa-prop.h
> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
> >  void ipa_release_body_info (struct ipa_func_body_info *);
> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
> >
> > -/* From tree-sra.c:  */
> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> > -                           gimple_stmt_iterator *, bool);
> > -
> >  /* In ipa-cp.c  */
> >  void ipa_cp_c_finalize (void);
> >
> > diff --git a/gcc/params.def b/gcc/params.def
> > index e55afc28053..5e19f1414a0 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
> >        "Enable loop epilogue vectorization using smaller vector size.",
> >        0, 0, 1)
> >
> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> > +      "max-size-for-elementwise-copy",
> > +      "Maximum size in bytes of a structure or array to be considered for "
> > +      "copying by its individual fields or elements",
> > +      0, 0, 512)
> > +
> >  /*
> >
> >  Local variables:
> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > new file mode 100644
> > index 00000000000..4156d4fba45
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +typedef struct st1
> > +{
> > +  long unsigned int a,b;
> > +  long int c,d;
> > +}R;
> > +
> > +typedef struct st2
> > +{
> > +  int t;
> > +  R reg;
> > +}N;
> > +
> > +void Set (const R *region, N *n_info );
> > +
> > +void test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
> > +{
> > +  R reg;
> > +
> > +  reg.a=a;
> > +  reg.b=b;
> > +  reg.c=c;
> > +  reg.d=d;
> > +  Set (&reg, n_obj);
> > +
> > +}
> > +
> > +void Set (const R *reg, N *n_obj )
> > +{
> > +  n_obj->reg=(*reg);
> > +}
> > +
> > +
> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> > +/* { dg-final { scan-assembler-not "movdqu" } } */
> > +/* { dg-final { scan-assembler-not "movups" } } */
> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> > index bac593951e7..ade97964205 100644
> > --- a/gcc/tree-sra.c
> > +++ b/gcc/tree-sra.c
> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "ipa-fnsummary.h"
> >  #include "ipa-utils.h"
> >  #include "builtins.h"
> > +#include "tree-sra.h"
> >
> >  /* Enumeration of all aggregate reductions we can do.  */
> >  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool write)
> >  }
> >
> >
> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> > -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must be true if
> > -   we are considering a decl from constant pool.  If it is false, char arrays
> > -   will be refused.  */
> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
> > +   fields/elements that are not bit-fields and are either register types or
> > +   recursively comply with simple_mix_of_records_and_arrays_p.
> > +   Furthermore, if ALLOW_CHAR_ARRAYS is false, the function will return
> > +   false also if TYPE contains an array of elements that only have one
> > +   byte.  */
> >
> > -static bool
> > -scalarizable_type_p (tree type, bool const_decl)
> > +bool
> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
> >  {
> >    gcc_assert (!is_gimple_reg_type (type));
> >    if (type_contains_placeholder_p (type))
> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >          return false;
> >
> >        if (!is_gimple_reg_type (ft)
> > -          && !scalarizable_type_p (ft, const_decl))
> > +          && !simple_mix_of_records_and_arrays_p (ft, allow_char_arrays))
> >          return false;
> >      }
> >
> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >      case ARRAY_TYPE:
> >        {
> >      HOST_WIDE_INT min_elem_size;
> > -    if (const_decl)
> > +    if (allow_char_arrays)
> >        min_elem_size = 0;
> >      else
> >        min_elem_size = BITS_PER_UNIT;
> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
> >
> >      tree elem = TREE_TYPE (type);
> >      if (!is_gimple_reg_type (elem)
> > -        && !scalarizable_type_p (elem, const_decl))
> > +        && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays))
> >        return false;
> >      return true;
> >        }
> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool const_decl)
> >      }
> >  }
> >
> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> > +                            tree);
> > +
> > +/* For a given array TYPE, return false if its domain does not have any maximum
> > +   value.  Otherwise calculate MIN and MAX indices of the first and the last
> > +   element.  */
> > +
> > +bool
> > +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> > +{
> > +  tree domain = TYPE_DOMAIN (type);
> > +  tree minidx = TYPE_MIN_VALUE (domain);
> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > +  tree maxidx = TYPE_MAX_VALUE (domain);
> > +  if (!maxidx)
> > +    return false;
> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> > +
> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> > +  *min = wi::to_offset (minidx);
> > +  *max = wi::to_offset (maxidx);
> > +  if (!TYPE_UNSIGNED (domain))
> > +    {
> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
> > +    }
> > +  return true;
> > +}
> >
> >  /* Create total_scalarization accesses for all scalar fields of a member
> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> > +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p.  BASE
> >     must be the top-most VAR_DECL representing the variable; within that,
> >     OFFSET locates the member and REF must be the memory reference expression for
> >     the member.  */
> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >      {
> >     tree elemtype = TREE_TYPE (decl_type);
> >     tree elem_size = TYPE_SIZE (elemtype);
> > -   gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> >     HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> >     gcc_assert (el_size > 0);
> >
> > -   tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> > -   gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> > -   tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> > +   offset_int idx, max;
> >     /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> > -   if (maxidx)
> > +   if (extract_min_max_idx_from_array (decl_type, &idx, &max))
> >       {
> > -       gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> >         tree domain = TYPE_DOMAIN (decl_type);
> > -       /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> > -          DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> > -       offset_int idx = wi::to_offset (minidx);
> > -       offset_int max = wi::to_offset (maxidx);
> > -       if (!TYPE_UNSIGNED (domain))
> > -         {
> > -           idx = wi::sext (idx, TYPE_PRECISION (domain));
> > -           max = wi::sext (max, TYPE_PRECISION (domain));
> > -         }
> >         for (int el_off = offset; idx <= max; ++idx)
> >           {
> >             tree nref = build4 (ARRAY_REF, elemtype,
> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> >  }
> >
> >  /* Create total_scalarization accesses for a member of type TYPE, which must
> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> > -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> > -   the member, REVERSE gives its torage order. and REF must be the reference
> > -   expression for it.  */
> > +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> > +   BASE must be the top-most VAR_DECL representing the variable; within that,
> > +   POS and SIZE locate the member, REVERSE gives its storage order, and REF
> > +   must be the reference expression for it.  */
> >
> >  static void
> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> >  }
> >
> >  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
> > +   RECORD_TYPE or ARRAY_TYPE conforming to
> > +   simple_mix_of_records_and_arrays_p.  */
> >
> >  static void
> >  create_total_scalarization_access (tree var)
> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
> >      {
> >        tree var = candidate (i);
> >
> > -      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> > -                                              constant_decl_p (var)))
> > +      if (VAR_P (var)
> > +          && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> > +                                                 constant_decl_p (var)))
> >      {
> >        if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> >            <= max_scalarization_size)
> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> > new file mode 100644
> > index 00000000000..dc901385994
> > --- /dev/null
> > +++ b/gcc/tree-sra.h
> > @@ -0,0 +1,33 @@
> > +/* tree-sra.h - Scalar replacement of aggregates.
> > +   Copyright (C) 2017 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it under
> > +the terms of the GNU General Public License as published by the Free
> > +Software Foundation; either version 3, or (at your option) any later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#ifndef TREE_SRA_H
> > +#define TREE_SRA_H
> > +
> > +
> > +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays);
> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> > +                                     offset_int *max);
> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> > +                           bool reverse, tree exp_type,
> > +                           gimple_stmt_iterator *gsi, bool insert_after);
> > +
> > +
> > +
> > +#endif /* TREE_SRA_H */
> > --
> > 2.14.1
> >