On Thu, Oct 26, 2017 at 4:38 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>>> I think the limit should be on the number of generated copies and not
>>> the overall size of the structure...  If the struct were composed of
>>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>>>
>>> I wonder how rep movsb interacts with store-to-load forwarding?  Is
>>> that perhaps optimized well on some archs?  movsb should always
>>> forward, and wasn't the setup cost for small N reasonable on modern
>>> CPUs?
>>
>> rep mov is a win over a loop for blocks over 128 bytes on Core, and for
>> blocks in the range 24-128 bytes on Zen.  This is measured without
>> store-to-load forwarding in play, but I doubt those stalls have a cheap
>> way around.
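
For reference, a minimal sketch of the two copy strategies being
compared here (the struct and function names are illustrative, not
taken from the patch):

#include <string.h>

struct s { unsigned long a, b, c, d; };  /* 32 bytes on x86_64.  */

/* Block copy; depending on target and size this may expand to
   rep movs, vector moves or a by-pieces sequence.  */
void
copy_block (struct s *dst, const struct s *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Field-wise copy; four 8-byte loads and four 8-byte stores, each
   store able to forward to a later same-sized aligned load.  */
void
copy_fields (struct s *dst, const struct s *src)
{
  dst->a = src->a;
  dst->b = src->b;
  dst->c = src->c;
  dst->d = src->d;
}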
>>
>>>
>>> It probably depends on the width of the store-buffer entries, on
>>> whether the stores appear in order, and on their alignment (if they
>>> are larger than 8 bytes they are surely aligned).  IIRC CPUs had
>>> store-buffer entries smaller than the cache-line size.
>>>
>>> Given that load bandwidth is usually higher than store bandwidth it
>>> might make sense to do the store combining in our copying sequence,
>>> like for the 8-byte entry case use something like
>>>
>>>   movq 0(%eax), %xmm0
>>>   movhps 8(%eax), %xmm0 // or vpinsert
>>>   mov[au]ps %xmm0, 0(%ebx)
>>> ...
>>>
>>> thus do two loads per store and perform the stores in wider
>>> mode?
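
In intrinsics terms, the suggested two-loads-per-wider-store sequence
would look roughly like this (a sketch assuming SSE2 and 8-byte
elements; the function name is made up):

#include <emmintrin.h>
#include <stddef.h>

/* Copy N bytes (a multiple of 16 here, for brevity) doing two 8-byte
   loads per 16-byte store.  */
void
copy_combining (char *dst, const char *src, size_t n)
{
  for (size_t i = 0; i + 16 <= n; i += 16)
    {
      __m128i lo = _mm_loadl_epi64 ((const __m128i *) (src + i));     /* movq */
      __m128i hi = _mm_loadl_epi64 ((const __m128i *) (src + i + 8));
      /* punpcklqdq stands in for the movhps/vpinsert above.  */
      _mm_storeu_si128 ((__m128i *) (dst + i),
                        _mm_unpacklo_epi64 (lo, hi));
    }
}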
>>
>> This may indeed be somewhat faster.  I am not sure if store-to-load
>> forwarding will work for the latter half when it is read again by
>> halves.  It would not happen on older CPUs :)
>
> Yes, forwarding larger stores to smaller loads has generally worked
> fine since forever, with the usual restrictions of alignment and size
> being power-of-two "halves".
>
> The question is of course what to do for 4-byte or smaller elements,
> or for mixed-size elements.  We can do zero-extending loads
> (do we have them for QImode and HImode loads as well?) and
> combine with shifts and ORs.  I'm quite sure the CPUs wouldn't like
> to see vpinserts with different vector mode destinations.  So it
> would be 8-byte stores from GPRs with the values built up via
> shift & or.

Like we generate

foo:
.LFB0:
        .cfi_startproc
        movl    4(%rdi), %eax
        movzwl  2(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movzbl  1(%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movzbl  (%rdi), %edx
        salq    $8, %rax
        orq     %rdx, %rax
        movq    %rax, (%rsi)
        ret

for

struct x { char e; char f; short c; int i; } a;

void foo (struct x *p, long *q)
{
 *q = (((((((unsigned long)(unsigned int)p->i) << 16)
   | (((unsigned long)(unsigned short)p->c))) << 8)
   | (((unsigned long)(unsigned char)p->f))) << 8)
   | ((unsigned long)(unsigned char)p->e);
}

if you disable the bswap pass.  Doing 4-byte stores in this
case would at least save some prefixes.  I expected the
ORs and shifts to have smaller encodings...

With 4-byte stores we end up with the same code size as with
individual loads & stores.
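
For the example above, the 4-byte-store variant could look roughly
like this (a hand-written sketch, not actual compiler output):

        movzbl  (%rdi), %eax
        movzbl  1(%rdi), %edx
        sall    $8, %edx
        orl     %edx, %eax
        movzwl  2(%rdi), %edx
        sall    $16, %edx
        orl     %edx, %eax
        movl    %eax, (%rsi)
        movl    4(%rdi), %eax
        movl    %eax, 4(%rsi)
        ret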

> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Especially Bulldozer
> with its split core was store-buffer-size limited (but it
> could do merging of store-buffer entries IIRC).

Also, if we do the stores in smaller chunks we are more
likely to hit the same store-to-load-forwarding issue
elsewhere, for example when the destination is later
memcpy'ed away.
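
Concretely, that scenario would be something like this (a sketch; the
struct and function names are made up):

struct s { int a, b; };

void
f (struct s *p, char *out)
{
  struct s tmp = *p;  /* Two 4-byte stores if copied field-wise.  */
  /* One 8-byte load that neither 4-byte store can forward to
     individually, so it typically stalls until both stores have
     committed.  */
  __builtin_memcpy (out, &tmp, sizeof tmp);
}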

So the proposed change isn't necessarily a win; it may introduce
a regression similar to the one it tries to fix.

Whole-program analysis of accesses might allow
marking affected objects.

Richard.

> Richard.
>
>> Honza
>>>
>>> As said, a general concern was that you are not copying padding.  If
>>> you put this into an even more common place you will surely break
>>> stuff, no?
>>>
>>> Richard.
>>>
>>> >
>>> > Martin
>>> >
>>> >
>>> >>
>>> >> Richard.
>>> >>
>>> >> > Martin
>>> >> >
>>> >> >
>>> >> > 2017-10-12  Martin Jambor  <mjam...@suse.cz>
>>> >> >
>>> >> >         PR target/80689
>>> >> >         * tree-sra.h: New file.
>>> >> >         * ipa-prop.h: Moved declaration of build_ref_for_offset to
>>> >> >         tree-sra.h.
>>> >> >         * expr.c: Include params.h and tree-sra.h.
>>> >> >         (emit_move_elementwise): New function.
>>> >> >         (store_expr_with_bounds): Optionally use it.
>>> >> >         * ipa-cp.c: Include tree-sra.h.
>>> >> >         * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
>>> >> >         * config/i386/i386.c (ix86_option_override_internal): Set
>>> >> >         PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
>>> >> >         * tree-sra.c: Include tree-sra.h.
>>> >> >         (scalarizable_type_p): Renamed to
>>> >> >         simple_mix_of_records_and_arrays_p, made public, renamed the
>>> >> >         second parameter to allow_char_arrays.
>>> >> >         (extract_min_max_idx_from_array): New function.
>>> >> >         (completely_scalarize): Moved bits of the function to
>>> >> >         extract_min_max_idx_from_array.
>>> >> >
>>> >> >         testsuite/
>>> >> >         * gcc.target/i386/pr80689-1.c: New test.
>>> >> > ---
>>> >> >  gcc/config/i386/i386.c                    |   4 ++
>>> >> >  gcc/expr.c                                | 103 
>>> >> > ++++++++++++++++++++++++++++--
>>> >> >  gcc/ipa-cp.c                              |   1 +
>>> >> >  gcc/ipa-prop.h                            |   4 --
>>> >> >  gcc/params.def                            |   6 ++
>>> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++++++++++
>>> >> >  gcc/tree-sra.c                            |  86 
>>> >> > +++++++++++++++----------
>>> >> >  gcc/tree-sra.h                            |  33 ++++++++++
>>> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
>>> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> >  create mode 100644 gcc/tree-sra.h
>>> >> >
>>> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> >> > index 1ee8351c21f..87f602e7ead 100644
>>> >> > --- a/gcc/config/i386/i386.c
>>> >> > +++ b/gcc/config/i386/i386.c
>>> >> > @@ -6511,6 +6511,10 @@ ix86_option_override_internal (bool main_args_p,
>>> >> >                          ix86_tune_cost->l2_cache_size,
>>> >> >                          opts->x_param_values,
>>> >> >                          opts_set->x_param_values);
>>> >> > +  maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > +                        35,
>>> >> > +                        opts->x_param_values,
>>> >> > +                        opts_set->x_param_values);
>>> >> >
>>> >> >    /* Enable sw prefetching at -O3 for CPUS that prefetching is 
>>> >> > helpful.  */
>>> >> >    if (opts->x_flag_prefetch_loop_arrays < 0
>>> >> > diff --git a/gcc/expr.c b/gcc/expr.c
>>> >> > index 134ee731c29..dff24e7f166 100644
>>> >> > --- a/gcc/expr.c
>>> >> > +++ b/gcc/expr.c
>>> >> > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "tree-chkp.h"
>>> >> >  #include "rtl-chkp.h"
>>> >> >  #include "ccmp.h"
>>> >> > -
>>> >> > +#include "params.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  /* If this is nonzero, we do not bother generating VOLATILE
>>> >> >     around volatile memory references, and we are willing to
>>> >> > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from)
>>> >> >    return maybe_expand_insn (code, 2, ops);
>>> >> >  }
>>> >> >
>>> >> > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET 
>>> >> > to TARGET
>>> >> > +   plus OFFSET, but do so element-wise and/or field-wise for each 
>>> >> > record and
>>> >> > +   array within TYPE.  TYPE must either be a register type or an 
>>> >> > aggregate
>>> >> > +   complying with simple_mix_of_records_and_arrays_p.
>>> >> > +
>>> >> > +   If CALL_PARAM_P is nonzero, this is a store into a call param on 
>>> >> > the
>>> >> > +   stack, and block moves may need to be treated specially.  */
>>> >> > +
>>> >> > +static void
>>> >> > +emit_move_elementwise (tree type, rtx target, rtx source, 
>>> >> > HOST_WIDE_INT offset,
>>> >> > +                      int call_param_p)
>>> >> > +{
>>> >> > +  switch (TREE_CODE (type))
>>> >> > +    {
>>> >> > +    case RECORD_TYPE:
>>> >> > +      for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
>>> >> > +       if (TREE_CODE (fld) == FIELD_DECL)
>>> >> > +         {
>>> >> > +           HOST_WIDE_INT fld_offset = offset + int_bit_position (fld);
>>> >> > +           tree ft = TREE_TYPE (fld);
>>> >> > +           emit_move_elementwise (ft, target, source, fld_offset,
>>> >> > +                                  call_param_p);
>>> >> > +         }
>>> >> > +      break;
>>> >> > +
>>> >> > +    case ARRAY_TYPE:
>>> >> > +      {
>>> >> > +       tree elem_type = TREE_TYPE (type);
>>> >> > +       HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type));
>>> >> > +       gcc_assert (el_size > 0);
>>> >> > +
>>> >> > +       offset_int idx, max;
>>> >> > +       /* Skip (some) zero-length arrays; others have MAXIDX == 
>>> >> > MINIDX - 1.  */
>>> >> > +       if (extract_min_max_idx_from_array (type, &idx, &max))
>>> >> > +         {
>>> >> > +           HOST_WIDE_INT el_offset = offset;
>>> >> > +           for (; idx <= max; ++idx)
>>> >> > +             {
>>> >> > +               emit_move_elementwise (elem_type, target, source, 
>>> >> > el_offset,
>>> >> > +                                      call_param_p);
>>> >> > +               el_offset += el_size;
>>> >> > +             }
>>> >> > +         }
>>> >> > +      }
>>> >> > +      break;
>>> >> > +    default:
>>> >> > +      machine_mode mode = TYPE_MODE (type);
>>> >> > +
>>> >> > +      rtx ntgt = adjust_address (target, mode, offset / 
>>> >> > BITS_PER_UNIT);
>>> >> > +      rtx nsrc = adjust_address (source, mode, offset / 
>>> >> > BITS_PER_UNIT);
>>> >> > +
>>> >> > +      /* TODO: Figure out whether the following is actually 
>>> >> > necessary.  */
>>> >> > +      if (target == ntgt)
>>> >> > +       ntgt = copy_rtx (target);
>>> >> > +      if (source == nsrc)
>>> >> > +       nsrc = copy_rtx (source);
>>> >> > +
>>> >> > +      gcc_assert (mode != VOIDmode);
>>> >> > +      if (mode != BLKmode)
>>> >> > +       emit_move_insn (ntgt, nsrc);
>>> >> > +      else
>>> >> > +       {
>>> >> > +         /* For example vector gimple registers can end up here.  */
>>> >> > +         rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
>>> >> > +                                 TYPE_MODE (sizetype), EXPAND_NORMAL);
>>> >> > +         emit_block_move (ntgt, nsrc, size,
>>> >> > +                          (call_param_p
>>> >> > +                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +       }
>>> >> > +      break;
>>> >> > +    }
>>> >> > +  return;
>>> >> > +}
>>> >> > +
>>> >> >  /* Generate code for computing expression EXP,
>>> >> >     and storing the value into TARGET.
>>> >> >
>>> >> > @@ -5713,9 +5788,29 @@ store_expr_with_bounds (tree exp, rtx target, 
>>> >> > int call_param_p,
>>> >> >         emit_group_store (target, temp, TREE_TYPE (exp),
>>> >> >                           int_size_in_bytes (TREE_TYPE (exp)));
>>> >> >        else if (GET_MODE (temp) == BLKmode)
>>> >> > -       emit_block_move (target, temp, expr_size (exp),
>>> >> > -                        (call_param_p
>>> >> > -                         ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +       {
>>> >> > +         /* Copying smallish BLKmode structures with emit_block_move 
>>> >> > and thus
>>> >> > +            by-pieces can result in store-to-load stalls.  So copy 
>>> >> > some simple
>>> >> > +            small aggregates element- or field-wise.  */
>>> >> > +         if (GET_MODE (target) == BLKmode
>>> >> > +             && AGGREGATE_TYPE_P (TREE_TYPE (exp))
>>> >> > +             && !TREE_ADDRESSABLE (TREE_TYPE (exp))
>>> >> > +             && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +             && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
>>> >> > +                 <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
>>> >> > +                     * BITS_PER_UNIT))
>>> >> > +             && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), 
>>> >> > false))
>>> >> > +           {
>>> >> > +             /* FIXME:  Can this happen?  What would it mean?  */
>>> >> > +             gcc_assert (!reverse);
>>> >> > +             emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
>>> >> > +                                    call_param_p);
>>> >> > +           }
>>> >> > +         else
>>> >> > +           emit_block_move (target, temp, expr_size (exp),
>>> >> > +                            (call_param_p
>>> >> > +                             ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>>> >> > +       }
>>> >> >        /* If we emit a nontemporal store, there is nothing else to do. 
>>> >> >  */
>>> >> >        else if (nontemporal && emit_storent_insn (target, temp))
>>> >> >         ;
>>> >> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> >> > index 6b3d8d7364c..7d6019bbd30 100644
>>> >> > --- a/gcc/ipa-cp.c
>>> >> > +++ b/gcc/ipa-cp.c
>>> >> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "tree-ssa-ccp.h"
>>> >> >  #include "stringpool.h"
>>> >> >  #include "attribs.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  template <typename valtype> class ipcp_value;
>>> >> >
>>> >> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>>> >> > index fa5bed49ee0..2313cc884ed 100644
>>> >> > --- a/gcc/ipa-prop.h
>>> >> > +++ b/gcc/ipa-prop.h
>>> >> > @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate 
>>> >> > (tree **, bool *,
>>> >> >  void ipa_release_body_info (struct ipa_func_body_info *);
>>> >> >  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>>> >> >
>>> >> > -/* From tree-sra.c:  */
>>> >> > -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, 
>>> >> > tree,
>>> >> > -                          gimple_stmt_iterator *, bool);
>>> >> > -
>>> >> >  /* In ipa-cp.c  */
>>> >> >  void ipa_cp_c_finalize (void);
>>> >> >
>>> >> > diff --git a/gcc/params.def b/gcc/params.def
>>> >> > index e55afc28053..5e19f1414a0 100644
>>> >> > --- a/gcc/params.def
>>> >> > +++ b/gcc/params.def
>>> >> > @@ -1294,6 +1294,12 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>>> >> >           "Enable loop epilogue vectorization using smaller vector 
>>> >> > size.",
>>> >> >           0, 0, 1)
>>> >> >
>>> >> > +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
>>> >> > +         "max-size-for-elementwise-copy",
>>> >> >          "Maximum size in bytes of a structure or array to be 
>>> >> > considered for "
>>> >> > +         "copying by its individual fields or elements",
>>> >> > +         0, 0, 512)
>>> >> > +
>>> >> >  /*
>>> >> >
>>> >> >  Local variables:
>>> >> > diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c 
>>> >> > b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > new file mode 100644
>>> >> > index 00000000000..4156d4fba45
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
>>> >> > @@ -0,0 +1,38 @@
>>> >> > +/* { dg-do compile } */
>>> >> > +/* { dg-options "-O2" } */
>>> >> > +
>>> >> > +typedef struct st1
>>> >> > +{
>>> >> > +        long unsigned int a,b;
>>> >> > +        long int c,d;
>>> >> > +}R;
>>> >> > +
>>> >> > +typedef struct st2
>>> >> > +{
>>> >> > +        int  t;
>>> >> > +        R  reg;
>>> >> > +}N;
>>> >> > +
>>> >> > +void Set (const R *region,  N *n_info );
>>> >> > +
>>> >> > +void test(N  *n_obj ,const long unsigned int a, const long unsigned 
>>> >> > int b,  const long int c,const long int d)
>>> >> > +{
>>> >> > +        R reg;
>>> >> > +
>>> >> > +        reg.a=a;
>>> >> > +        reg.b=b;
>>> >> > +        reg.c=c;
>>> >> > +        reg.d=d;
>>> >> > +        Set (&reg, n_obj);
>>> >> > +
>>> >> > +}
>>> >> > +
>>> >> > +void Set (const R *reg,  N *n_obj )
>>> >> > +{
>>> >> > +        n_obj->reg=(*reg);
>>> >> > +}
>>> >> > +
>>> >> > +
>>> >> > +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movdqu" } } */
>>> >> > +/* { dg-final { scan-assembler-not "movups" } } */
>>> >> > diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>> >> > index bac593951e7..ade97964205 100644
>>> >> > --- a/gcc/tree-sra.c
>>> >> > +++ b/gcc/tree-sra.c
>>> >> > @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
>>> >> >  #include "ipa-fnsummary.h"
>>> >> >  #include "ipa-utils.h"
>>> >> >  #include "builtins.h"
>>> >> > +#include "tree-sra.h"
>>> >> >
>>> >> >  /* Enumeration of all aggregate reductions we can do.  */
>>> >> >  enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
>>> >> > @@ -952,14 +953,14 @@ create_access (tree expr, gimple *stmt, bool 
>>> >> > write)
>>> >> >  }
>>> >> >
>>> >> >
>>> >> > -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or 
>>> >> > fixed-length
>>> >> > -   ARRAY_TYPE with fields that are either of gimple register types 
>>> >> > (excluding
>>> >> > -   bit-fields) or (recursively) scalarizable types.  CONST_DECL must 
>>> >> > be true if
>>> >> > -   we are considering a decl from constant pool.  If it is false, 
>>> >> > char arrays
>>> >> > -   will be refused.  */
>>> >> > +/* Return true if TYPE consists of RECORD_TYPE or fixed-length 
>>> >> > ARRAY_TYPE with
>>> >> > +   fields/elements that are not bit-fields and are either register 
>>> >> > types or
>>> >> > +   recursively comply with simple_mix_of_records_and_arrays_p.  
>>> >> > Furthermore, if
>>> >> > +   ALLOW_CHAR_ARRAYS is false, the function will also return false if 
>>> >> > TYPE
>>> >> > +   contains an array of single-byte elements.  */
>>> >> >
>>> >> > -static bool
>>> >> > -scalarizable_type_p (tree type, bool const_decl)
>>> >> > +bool
>>> >> > +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays)
>>> >> >  {
>>> >> >    gcc_assert (!is_gimple_reg_type (type));
>>> >> >    if (type_contains_placeholder_p (type))
>>> >> > @@ -977,7 +978,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >             return false;
>>> >> >
>>> >> >           if (!is_gimple_reg_type (ft)
>>> >> > -             && !scalarizable_type_p (ft, const_decl))
>>> >> > +             && !simple_mix_of_records_and_arrays_p (ft, 
>>> >> > allow_char_arrays))
>>> >> >             return false;
>>> >> >         }
>>> >> >
>>> >> > @@ -986,7 +987,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >    case ARRAY_TYPE:
>>> >> >      {
>>> >> >        HOST_WIDE_INT min_elem_size;
>>> >> > -      if (const_decl)
>>> >> > +      if (allow_char_arrays)
>>> >> >         min_elem_size = 0;
>>> >> >        else
>>> >> >         min_elem_size = BITS_PER_UNIT;
>>> >> > @@ -1008,7 +1009,7 @@ scalarizable_type_p (tree type, bool const_decl)
>>> >> >
>>> >> >        tree elem = TREE_TYPE (type);
>>> >> >        if (!is_gimple_reg_type (elem)
>>> >> > -         && !scalarizable_type_p (elem, const_decl))
>>> >> > +         && !simple_mix_of_records_and_arrays_p (elem, 
>>> >> > allow_char_arrays))
>>> >> >         return false;
>>> >> >        return true;
>>> >> >      }
>>> >> > @@ -1017,10 +1018,38 @@ scalarizable_type_p (tree type, bool 
>>> >> > const_decl)
>>> >> >    }
>>> >> >  }
>>> >> >
>>> >> > -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, 
>>> >> > tree, tree);
>>> >> > +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, 
>>> >> > tree,
>>> >> > +                           tree);
>>> >> > +
>>> >> > +/* For a given array TYPE, return false if its domain does not have 
>>> >> > any maximum
>>> >> > +   value.  Otherwise calculate MIN and MAX indices of the first and 
>>> >> > the last
>>> >> > +   element.  */
>>> >> > +
>>> >> > +bool
>>> >> > +extract_min_max_idx_from_array (tree type, offset_int *min, 
>>> >> > offset_int *max)
>>> >> > +{
>>> >> > +  tree domain = TYPE_DOMAIN (type);
>>> >> > +  tree minidx = TYPE_MIN_VALUE (domain);
>>> >> > +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > +  tree maxidx = TYPE_MAX_VALUE (domain);
>>> >> > +  if (!maxidx)
>>> >> > +    return false;
>>> >> > +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> > +
>>> >> > +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
>>> >> > +     DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
>>> >> > +  *min = wi::to_offset (minidx);
>>> >> > +  *max = wi::to_offset (maxidx);
>>> >> > +  if (!TYPE_UNSIGNED (domain))
>>> >> > +    {
>>> >> > +      *min = wi::sext (*min, TYPE_PRECISION (domain));
>>> >> > +      *max = wi::sext (*max, TYPE_PRECISION (domain));
>>> >> > +    }
>>> >> > +  return true;
>>> >> > +}
>>> >> >
>>> >> >  /* Create total_scalarization accesses for all scalar fields of a 
>>> >> > member
>>> >> > -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>>> >> > +   of type DECL_TYPE conforming to 
>>> >> > simple_mix_of_records_and_arrays_p.  BASE
>>> >> >     must be the top-most VAR_DECL representing the variable; within 
>>> >> > that,
>>> >> >     OFFSET locates the member and REF must be the memory reference 
>>> >> > expression for
>>> >> >     the member.  */
>>> >> > @@ -1047,27 +1076,14 @@ completely_scalarize (tree base, tree 
>>> >> > decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> >        {
>>> >> >         tree elemtype = TREE_TYPE (decl_type);
>>> >> >         tree elem_size = TYPE_SIZE (elemtype);
>>> >> > -       gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>>> >> >         HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>>> >> >         gcc_assert (el_size > 0);
>>> >> >
>>> >> > -       tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > -       gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
>>> >> > -       tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
>>> >> > +       offset_int idx, max;
>>> >> >         /* Skip (some) zero-length arrays; others have MAXIDX == 
>>> >> > MINIDX - 1.  */
>>> >> > -       if (maxidx)
>>> >> > +       if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>>> >> >           {
>>> >> > -           gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>>> >> >             tree domain = TYPE_DOMAIN (decl_type);
>>> >> > -           /* MINIDX and MAXIDX are inclusive, and must be 
>>> >> > interpreted in
>>> >> > -              DOMAIN (e.g. signed int, whereas min/max may be 
>>> >> > size_int).  */
>>> >> > -           offset_int idx = wi::to_offset (minidx);
>>> >> > -           offset_int max = wi::to_offset (maxidx);
>>> >> > -           if (!TYPE_UNSIGNED (domain))
>>> >> > -             {
>>> >> > -               idx = wi::sext (idx, TYPE_PRECISION (domain));
>>> >> > -               max = wi::sext (max, TYPE_PRECISION (domain));
>>> >> > -             }
>>> >> >             for (int el_off = offset; idx <= max; ++idx)
>>> >> >               {
>>> >> >                 tree nref = build4 (ARRAY_REF, elemtype,
>>> >> > @@ -1088,10 +1104,10 @@ completely_scalarize (tree base, tree 
>>> >> > decl_type, HOST_WIDE_INT offset, tree ref)
>>> >> >  }
>>> >> >
>>> >> >  /* Create total_scalarization accesses for a member of type TYPE, 
>>> >> > which must
>>> >> > -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE 
>>> >> > must be the
>>> >> > -   top-most VAR_DECL representing the variable; within that, POS and 
>>> >> > SIZE locate
>>> >> > -   the member, REVERSE gives its torage order. and REF must be the 
>>> >> > reference
>>> >> > -   expression for it.  */
>>> >> > +   satisfy either is_gimple_reg_type or 
>>> >> > simple_mix_of_records_and_arrays_p.
>>> >> > +   BASE must be the top-most VAR_DECL representing the variable; 
>>> >> > within that,
>>> >> > +   POS and SIZE locate the member, REVERSE gives its storage order, 
>>> >> > and REF must
>>> >> > +   be the reference expression for it.  */
>>> >> >
>>> >> >  static void
>>> >> >  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, 
>>> >> > bool reverse,
>>> >> > @@ -1111,7 +1127,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, 
>>> >> > HOST_WIDE_INT size, bool reverse,
>>> >> >  }
>>> >> >
>>> >> >  /* Create a total_scalarization access for VAR as a whole.  VAR must 
>>> >> > be of a
>>> >> > -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
>>> >> > +   RECORD_TYPE or ARRAY_TYPE conforming to
>>> >> > +   simple_mix_of_records_and_arrays_p.  */
>>> >> >
>>> >> >  static void
>>> >> >  create_total_scalarization_access (tree var)
>>> >> > @@ -2803,8 +2820,9 @@ analyze_all_variable_accesses (void)
>>> >> >        {
>>> >> >         tree var = candidate (i);
>>> >> >
>>> >> > -       if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>>> >> > -                                               constant_decl_p (var)))
>>> >> > +       if (VAR_P (var)
>>> >> > +           && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
>>> >> > +                                                  constant_decl_p 
>>> >> > (var)))
>>> >> >           {
>>> >> >             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>>> >> >                 <= max_scalarization_size)
>>> >> > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
>>> >> > new file mode 100644
>>> >> > index 00000000000..dc901385994
>>> >> > --- /dev/null
>>> >> > +++ b/gcc/tree-sra.h
>>> >> > @@ -0,0 +1,33 @@
>>> >> > +/* tree-sra.h - Scalar Replacement of Aggregates.
>>> >> > +   Copyright (C) 2017 Free Software Foundation, Inc.
>>> >> > +
>>> >> > +This file is part of GCC.
>>> >> > +
>>> >> > +GCC is free software; you can redistribute it and/or modify it under
>>> >> > +the terms of the GNU General Public License as published by the Free
>>> >> > +Software Foundation; either version 3, or (at your option) any later
>>> >> > +version.
>>> >> > +
>>> >> > +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> >> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> >> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> >> > +for more details.
>>> >> > +
>>> >> > +You should have received a copy of the GNU General Public License
>>> >> > +along with GCC; see the file COPYING3.  If not see
>>> >> > +<http://www.gnu.org/licenses/>.  */
>>> >> > +
>>> >> > +#ifndef TREE_SRA_H
>>> >> > +#define TREE_SRA_H
>>> >> > +
>>> >> > +
>>> >> > +bool simple_mix_of_records_and_arrays_p (tree type, bool 
>>> >> > allow_char_arrays);
>>> >> > +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
>>> >> > +                                    offset_int *max);
>>> >> > +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT 
>>> >> > offset,
>>> >> > +                          bool reverse, tree exp_type,
>>> >> > +                          gimple_stmt_iterator *gsi, bool 
>>> >> > insert_after);
>>> >> > +
>>> >> > +
>>> >> > +
>>> >> > +#endif /* TREE_SRA_H */
>>> >> > --
>>> >> > 2.14.1
>>> >> >
