On Fri, Aug 16, 2024 at 6:05 AM Alexandre Oliva <ol...@adacore.com> wrote:
>
> On Aug 15, 2024, Alexandre Oliva <ol...@adacore.com> wrote:
>
> > I can't quite envision what to check for in a target-independent test.
>
> Got it.  Also dropped some occurrences of CONST_CAST_TREE that I added,
> then changed function signatures but failed to remove them.
>
> Retested on x86_64-linux-gnu.  Ok to install?
>
>
> When small objects containing padding bits (or bytes) are fully
> initialized, we will often store them in registers, and setting
> bitfields and other small fields will attempt to preserve the
> uninitialized padding bits, which tends to be expensive.
> Zero-initializing registers, OTOH, tends to be cheap.
>
> So, if we're optimizing, zero-initialize such small padded objects
> even if that's not needed for correctness.  We can't zero-initialize
> all such padding objects, though: if there's no padding whatsoever,
> and all fields are initialized with nonzero, the zero initialization
> would be flagged as dead.  That's why we introduce machinery to detect
> whether objects have padding bits.  I considered distinguishing
> between bitfields, units and larger padding elements, but I didn't
> pursue that distinction.
>
> Since the object's zero-initialization subsumes fields'
> zero-initialization, the empty string test in builtin-snprintf-6.c's
> test_assign_aggregate would regress without the addition of
> native_encode_constructor.

I hope Jakub can look at the has_padding part, I note that we
have native_encode_initializer which handles CONSTRUCTORs.
For GIMPLE you'd only ever see empty ones, so I think handling
those in native_encode_expr is fine - a comment that one shouldn't
try to improve handling there would be nice.

Otherwise looks OK to me, one question and two minor adjustments
below.

Richard.

>
> for  gcc/ChangeLog
>
>         * expr.cc (categorize_ctor_elements_1): Change p_complete to
>         int, to distinguish complete initialization in presence or
>         absence of uninitialized padding bits.
>         (categorize_ctor_elements): Likewise.  Adjust all callers...
>         * expr.h (categorize_ctor_elements): ... and declaration.
>         (type_has_padding_at_level_p): New.
>         * gimple-fold.cc (type_has_padding_at_level_p): New.
>         * fold-const.cc (native_encode_constructor): New.
>         (native_encode_expr): Call it.
>         * gimplify.cc (gimplify_init_constructor): Clear small
>         non-addressable non-volatile objects with padding or
>         other uninitialized fields as an optimization.
>
> for  gcc/testsuite/ChangeLog
>
>         * gcc.dg/init-pad-1.c: New.
> ---
>  gcc/expr.cc                       |   20 ++++++++++-----
>  gcc/expr.h                        |    3 +-
>  gcc/fold-const.cc                 |   33 ++++++++++++++++++++++++
>  gcc/gimple-fold.cc                |   50 
> +++++++++++++++++++++++++++++++++++++
>  gcc/gimplify.cc                   |   14 ++++++++++
>  gcc/testsuite/gcc.dg/init-pad-1.c |   18 +++++++++++++
>  6 files changed, 129 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/init-pad-1.c
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 2089c2b86a98a..a701c67b3485d 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -7096,7 +7096,7 @@ count_type_elements (const_tree type, bool for_ctor_p)
>  static bool
>  categorize_ctor_elements_1 (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                             HOST_WIDE_INT *p_unique_nz_elts,
> -                           HOST_WIDE_INT *p_init_elts, bool *p_complete)
> +                           HOST_WIDE_INT *p_init_elts, int *p_complete)
>  {
>    unsigned HOST_WIDE_INT idx;
>    HOST_WIDE_INT nz_elts, unique_nz_elts, init_elts, num_fields;
> @@ -7218,7 +7218,10 @@ categorize_ctor_elements_1 (const_tree ctor, 
> HOST_WIDE_INT *p_nz_elts,
>
>    if (*p_complete && !complete_ctor_at_level_p (TREE_TYPE (ctor),
>                                                 num_fields, elt_type))
> -    *p_complete = false;
> +    *p_complete = 0;
> +  else if (*p_complete > 0
> +          && type_has_padding_at_level_p (TREE_TYPE (ctor)))
> +    *p_complete = -1;
>
>    *p_nz_elts += nz_elts;
>    *p_unique_nz_elts += unique_nz_elts;
> @@ -7239,7 +7242,10 @@ categorize_ctor_elements_1 (const_tree ctor, 
> HOST_WIDE_INT *p_nz_elts,
>       and place it in *P_ELT_COUNT.
>     * whether the constructor is complete -- in the sense that every
>       meaningful byte is explicitly given a value --
> -     and place it in *P_COMPLETE.
> +     and place it in *P_COMPLETE:
> +     -  0 if any field is missing
> +     -  1 if all fields are initialized, and there's no padding
> +     - -1 if all fields are initialized, but there's padding
>
>     Return whether or not CTOR is a valid static constant initializer, the 
> same
>     as "initializer_constant_valid_p (CTOR, TREE_TYPE (CTOR)) != 0".  */
> @@ -7247,12 +7253,12 @@ categorize_ctor_elements_1 (const_tree ctor, 
> HOST_WIDE_INT *p_nz_elts,
>  bool
>  categorize_ctor_elements (const_tree ctor, HOST_WIDE_INT *p_nz_elts,
>                           HOST_WIDE_INT *p_unique_nz_elts,
> -                         HOST_WIDE_INT *p_init_elts, bool *p_complete)
> +                         HOST_WIDE_INT *p_init_elts, int *p_complete)
>  {
>    *p_nz_elts = 0;
>    *p_unique_nz_elts = 0;
>    *p_init_elts = 0;
> -  *p_complete = true;
> +  *p_complete = 1;
>
>    return categorize_ctor_elements_1 (ctor, p_nz_elts, p_unique_nz_elts,
>                                      p_init_elts, p_complete);
> @@ -7313,7 +7319,7 @@ mostly_zeros_p (const_tree exp)
>    if (TREE_CODE (exp) == CONSTRUCTOR)
>      {
>        HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> -      bool complete_p;
> +      int complete_p;
>
>        categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
>                                 &complete_p);
> @@ -7331,7 +7337,7 @@ all_zeros_p (const_tree exp)
>    if (TREE_CODE (exp) == CONSTRUCTOR)
>      {
>        HOST_WIDE_INT nz_elts, unz_elts, init_elts;
> -      bool complete_p;
> +      int complete_p;
>
>        categorize_ctor_elements (exp, &nz_elts, &unz_elts, &init_elts,
>                                 &complete_p);
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 533ae0af3871a..04782b15f192c 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -361,7 +361,8 @@ extern unsigned HOST_WIDE_INT highest_pow2_factor 
> (const_tree);
>
>  extern bool categorize_ctor_elements (const_tree, HOST_WIDE_INT *,
>                                       HOST_WIDE_INT *, HOST_WIDE_INT *,
> -                                     bool *);
> +                                     int *);
> +extern bool type_has_padding_at_level_p (tree);
>  extern bool immediate_const_ctor_p (const_tree, unsigned int words = 1);
>  extern void store_constructor (tree, rtx, int, poly_int64, bool);
>  extern HOST_WIDE_INT int_expr_size (const_tree exp);
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 8908e7381e72c..ff2c8f35303b8 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8193,6 +8193,36 @@ native_encode_string (const_tree expr, unsigned char 
> *ptr, int len, int off)
>    return len;
>  }
>
> +/* Subroutine of native_encode_expr.  Encode the CONSTRUCTOR
> +   specified by EXPR into the buffer PTR of length LEN bytes.
> +   Return the number of bytes placed in the buffer, or zero
> +   upon failure.  */
> +
> +static int
> +native_encode_constructor (const_tree expr, unsigned char *ptr, int len, int 
> off)
> +{
> +  /* We are only concerned with zero-initialization constructors here.  */
> +  if (CONSTRUCTOR_NELTS (expr))
> +    return 0;
> +
> +  /* Wide-char strings are encoded in target byte-order so native
> +     encoding them is trivial.  */
> +  if (BITS_PER_UNIT != CHAR_BIT
> +      || !tree_fits_shwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))))
> +    return 0;
> +
> +  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE 
> (expr)));
> +  if ((off == -1 && total_bytes > len) || off >= total_bytes)
> +    return 0;
> +  if (off == -1)
> +    off = 0;
> +  len = MIN (total_bytes - off, len);
> +  if (ptr == NULL)
> +    /* Dry run.  */;
> +  else
> +    memset (ptr, 0, len);
> +  return len;
> +}
>
>  /* Subroutine of fold_view_convert_expr.  Encode the INTEGER_CST, REAL_CST,
>     FIXED_CST, COMPLEX_CST, STRING_CST, or VECTOR_CST specified by EXPR into
> @@ -8229,6 +8259,9 @@ native_encode_expr (const_tree expr, unsigned char 
> *ptr, int len, int off)
>      case STRING_CST:
>        return native_encode_string (expr, ptr, len, off);
>
> +    case CONSTRUCTOR:
> +      return native_encode_constructor (expr, ptr, len, off);
> +
>      default:
>        return 0;
>      }
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 18d7a6b176db7..f955a006ebe4e 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -4661,6 +4661,56 @@ clear_padding_type_may_have_padding_p (tree type)
>      }
>  }
>
> +/* Return true if TYPE has padding bits aside from those in fields,
> +   elements, etc.  */
> +
> +bool
> +type_has_padding_at_level_p (tree type)
> +{
> +  switch (TREE_CODE (type))
> +    {
> +    case RECORD_TYPE:
> +      {
> +       tree bitpos = size_zero_node;
> +       /* Expect fields to be sorted by bit position.  */
> +       for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> +         if (TREE_CODE (f) == FIELD_DECL)
> +           {
> +             if (DECL_PADDING_P (f))
> +               return true;
> +             tree pos = bit_position (f);
> +             if (simple_cst_equal (bitpos, pos) != 1)
> +               return true;
> +             if (!DECL_SIZE (f))
> +               return true;
> +             bitpos = int_const_binop (PLUS_EXPR, pos, DECL_SIZE (f));
> +           }
> +       if (simple_cst_equal (bitpos, TYPE_SIZE (type)) != 1)
> +         return true;
> +       return false;
> +      }
> +    case UNION_TYPE:
> +      /* If any of the fields is smaller than the whole, there is padding.  
> */
> +      for (tree f = TYPE_FIELDS (type); f; f = DECL_CHAIN (f))
> +       if (TREE_CODE (f) == FIELD_DECL)
> +         if (simple_cst_equal (TYPE_SIZE (TREE_TYPE (f)),
> +                               TREE_TYPE (type)) != 1)
> +           return true;
> +      return false;
> +    case ARRAY_TYPE:
> +    case COMPLEX_TYPE:
> +    case VECTOR_TYPE:
> +      /* No recursing here, no padding at this level.  */
> +      return false;
> +    case REAL_TYPE:
> +      return clear_padding_real_needs_padding_p (type);
> +    case BITINT_TYPE:
> +      return clear_padding_bitint_needs_padding_p (type);
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Emit a runtime loop:
>     for (; buf.base != end; buf.base += sz)
>       __builtin_clear_padding (buf.base);  */
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 26a216e151d6f..354a6f53a7216 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -5564,7 +5564,8 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
>         struct gimplify_init_ctor_preeval_data preeval_data;
>         HOST_WIDE_INT num_ctor_elements, num_nonzero_elements;
>         HOST_WIDE_INT num_unique_nonzero_elements;
> -       bool complete_p, valid_const_initializer;
> +       int complete_p;
> +       bool valid_const_initializer;
>
>         /* Aggregate types must lower constructors to initialization of
>            individual elements.  The exception is that a CONSTRUCTOR node
> @@ -5668,6 +5669,17 @@ gimplify_init_constructor (tree *expr_p, gimple_seq 
> *pre_p, gimple_seq *post_p,
>           /* If a single access to the target must be ensured and all elements
>              are zero, then it's optimal to clear whatever their number.  */
>           cleared = true;
> +       /* If the object is small enough to go in registers, and it's
> +          not required to be constructed in memory, clear it first.
> +          That will avoid wasting cycles preserving any padding bits
> +          that might be there, and if there aren't any, the compiler
> +          is smart enough to optimize the clearing out.  */
> +       else if (complete_p <= 0

I wonder if for complete_p == 1 zeroing first also makes a difference?
 I also wonder
whether the extra zeroing confuses SRA - try say struct s { short a;
int b; } and see
if SRA is still happily considering it.

> +                && !TREE_ADDRESSABLE (ctor) && !TREE_THIS_VOLATILE (object)

2nd && to the next line

> +                && (TYPE_MODE (type) != BLKmode || TYPE_NO_FORCE_BLK (type))
> +                && (opt_for_fn (cfun->decl, optimize)
> +                    || opt_for_fn (cfun->decl, optimize_size)))

that's simplified as && optimize

> +         cleared = true;
>         else
>           cleared = false;
>
> diff --git a/gcc/testsuite/gcc.dg/init-pad-1.c 
> b/gcc/testsuite/gcc.dg/init-pad-1.c
> new file mode 100644
> index 0000000000000..801f93813e3ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/init-pad-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Og -fdump-tree-gimple" } */
> +
> +struct s {
> +  short a : 3;
> +  short b : 3;
> +  char  c;
> +};
> +
> +extern void g(struct s *);
> +
> +void f() {
> +  struct s x = { 0, 0, 1 };
> +  g (&x);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "= {};" 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-not "= 0;" "gimple" } } */
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive

Reply via email to