Hi,

PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
assignment coming from a C struct assignment and one representing a
folded memcpy, can kill the latter and keep in place only the former,
which does not copy padding - at least when SRA decides to totally
scalarize at least one of the aggregates (when not doing total
scalarization, SRA cares about padding).

SRA would not totally scalarize an aggregate if it saw that it takes
part in a gimple assignment which is a folded memcpy (see how
type_changing_p is set in contains_vce_or_bfcref_p), but here it does
not see any such assignment because of the DSE decision.
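
To make this concrete (a rough sketch only; the actual reproducer is
the new testcase in the patch below):

  struct s { char c; int i; };  /* 3 bytes of padding between c and i */

  struct s w;
  __builtin_memset (&w, 0, sizeof (struct s));  /* folded to roughly w = {} */
  w = *q;                                       /* the C struct assignment */

To DSE, both statements are full-size aggregate stores to w, so it
removes the first one as dead; but once SRA totally scalarizes w, the
remaining assignment copies only the fields and leaves the padding
undefined.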

I was asked to modify SRA to take padding into account - and to copy
it around - when totally scalarizing, which is what the patch below
does.  I am not very happy about this; I am afraid it will lead to
performance regressions, but this has all been discussed (see
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html).

I tried to alleviate the problem by not only inserting accesses for
padding but also by enlarging existing accesses, whenever possible, to
extend over padding - the extended access would get copied when an
aggregate copy in the original IL is replaced with SRA copies, and a
BIT_FIELD_REF would be generated to replace a scalar access to a part
of the aggregate in the original IL.  I made it work in the sense
that the patch passed bootstrap and testing (see the git branch
refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127
if you are interested).  However, this approach meant that each such
extended replacement which was written to (so all of them) could
potentially be only partially assigned to, and therefore had to be
marked as addressable and could not become a gimple register - meaning
that total scalarization would be creating addressable variables.
Detecting such cases is not easy; it would mean introducing yet
another kind of write flag (written to exactly this access) and
propagating that flag across assignment sub-accesses.
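
As a hypothetical illustration of that problem (using the struct above
and w$c as the kind of replacement name SRA creates): if the
replacement for c were a 32-bit integer covering both c and the
padding, then

  w.c = 5;  /* updates only bits 0..7 of the 32-bit replacement w$c */

is a partial store into w$c, and a variable that may be written to
only partially has to stay addressable instead of becoming an SSA
name.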

So I decided that was not the way to go and instead only extended
integer accesses, and that is what the patch below does.  Like in the
previous attempt, whatever padding cannot be covered by extending an
access is covered by extra artificial accesses.  As you can see, this
adds a little complexity to various places of the pass which are
already not trivial, but hopefully it is manageable.
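
For the example struct, the effect is (a sketch of what the patch
does, not actual dump output):

  struct s { char c; int i; };
  /* Access for c: offset 0, size 8, type char
     -> extended: size 32, type a 32-bit unsigned integer,
        reg_acc_type char; now covers the padding at bits 8..31.  */

Aggregate copies then move the padding together with c, while scalar
uses of c are replaced using the original type kept in reg_acc_type.
Padding that no suitably aligned integer access can absorb this way is
covered by the extra artificial accesses instead.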

Bootstrapped and tested on x86_64-linux.  I'll be curious about the
feedback.

Thanks,

Martin


2020-01-27  Martin Jambor  <mjam...@suse.cz>

        PR tree-optimization/92486
        * tree-sra.c: Include langhooks.h.
        (struct access): New fields reg_size and reg_acc_type.
        (dump_access): Print new fields.
        (acc_size): New function.
        (find_access_in_subtree): Use it, new parameter reg.
        (get_var_base_offset_size_access): Pass true to
        find_access_in_subtree.
        (create_access_1): Initialize reg_size.
        (create_artificial_child_access): Likewise.
        (create_total_scalarization_access): Likewise.
        (build_ref_for_model): Do not use model expr if reg_acc_type is
        non-NULL.
        (get_reg_access_replacement): New function.
        (verify_sra_access_forest): Adjust verification for presence of
        extended accesses covering padding.
        (analyze_access_subtree): Undo extension over padding if total
        scalarization failed, set grp_partial_lhs if we are going to introduce
        a partial store to the new replacement, do not ignore holes when
        totally scalarizing.
        (sra_type_for_size): New function.
        (total_scalarization_fill_padding): Likewise.
        (total_should_skip_creating_access): Use it.
        (totally_scalarize_subtree): Likewise.
        (sra_modify_expr): Use get_reg_access_replacement instead of
        get_access_replacement, adjust for reg_acc_type.
        (sra_modify_assign): Likewise.
        (load_assign_lhs_subreplacements): Pass false to
        find_access_in_subtree.

        testsuite/
        * gcc.dg/tree-ssa/pr92486.c: New test.
---
 gcc/ChangeLog                           |  32 +++
 gcc/testsuite/ChangeLog                 |   5 +
 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c |  38 +++
 gcc/tree-sra.c                          | 371 +++++++++++++++++++++---
 4 files changed, 399 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3541c6638f9..34c60e6f2a3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,35 @@
+2020-01-26  Martin Jambor  <mjam...@suse.cz>
+
+       PR tree-optimization/92486
+       * tree-sra.c: Include langhooks.h.
+       (struct access): New fields reg_size and reg_acc_type.
+       (dump_access): Print new fields.
+       (acc_size): New function.
+       (find_access_in_subtree): Use it, new parameter reg.
+       (get_var_base_offset_size_access): Pass true to
+       find_access_in_subtree.
+       (create_access_1): Initialize reg_size.
+       (create_artificial_child_access): Likewise.
+       (create_total_scalarization_access): Likewise.
+       (build_ref_for_model): Do not use model expr if reg_acc_type is
+       non-NULL.
+       (get_reg_access_replacement): New function.
+       (verify_sra_access_forest): Adjust verification for presence of
+       extended accesses covering padding.
+       (analyze_access_subtree): Undo extension over padding if total
+       scalarization failed, set grp_partial_lhs if we are going to introduce
+       a partial store to the new replacement, do not ignore holes when
+       totally scalarizing.
+       (sra_type_for_size): New function.
+       (total_scalarization_fill_padding): Likewise.
+       (total_should_skip_creating_access): Use it.
+       (totally_scalarize_subtree): Likewise.
+       (sra_modify_expr): Use get_reg_access_replacement instead of
+       get_access_replacement, adjust for reg_acc_type.
+       (sra_modify_assign): Likewise.
+       (load_assign_lhs_subreplacements): Pass false to
+       find_access_in_subtree.
+
 2020-01-27  Martin Liska  <mli...@suse.cz>
 
        PR driver/91220
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 22a37dd1ab2..7ccb5098224 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-01-24  Martin Jambor  <mjam...@suse.cz>
+
+       PR tree-optimization/92486
+       * gcc.dg/tree-ssa/pr92486.c: New test.
+
 2020-01-27  Martin Liska  <mli...@suse.cz>
 
        PR target/93274
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
new file mode 100644
index 00000000000..77e84241eff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+struct s {
+    char c;
+    int i;
+};
+
+__attribute__((noipa))
+void f(struct s *p, struct s *q)
+{
+    struct s w;
+
+    __builtin_memset(&w, 0, sizeof(struct s));
+    w = *q;
+
+    __builtin_memset(p, 0, sizeof(struct s));
+    *p = w;
+}
+
+int main()
+{
+    struct s x;
+    __builtin_memset(&x, 1, sizeof(struct s));
+
+    struct s y;
+    __builtin_memset(&y, 2, sizeof(struct s));
+
+    f(&y, &x);
+
+    for (unsigned char *p = (unsigned char *)&y;
+        p < (unsigned char *)&y + sizeof(struct s);
+        p++)
+      if (*p != 1)
+       __builtin_abort ();
+
+    return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index ea8594db193..bda342ffdf9 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "builtins.h"
 #include "tree-sra.h"
+#include "langhooks.h"
 
 
 /* Enumeration of all aggregate reductions we can do.  */
@@ -130,11 +131,16 @@ struct assign_link;
 
 struct access
 {
-  /* Values returned by  `get_ref_base_and_extent' for each component reference
-     If EXPR isn't a component reference  just set `BASE = EXPR', `OFFSET = 0',
-     `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
+  /* Offset, size and base are values returned by `get_ref_base_and_extent' for
+     each component reference.  If EXPR isn't a component reference, just set
+     `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
   HOST_WIDE_INT offset;
   HOST_WIDE_INT size;
+
+  /* If reg_acc_type is non-NULL, this is the size of the actual register type
+     the access represented before it was extended to cover padding.  Otherwise
+     it must be equal to size.  */
+  HOST_WIDE_INT reg_size;
   tree base;
 
   /* Expression.  It is context dependent so do not use it to create new
@@ -144,6 +150,12 @@ struct access
   /* Type.  */
   tree type;
 
+  /* If non-NULL, this is the type that should be actually extracted or
+     inserted from/to the replacement decl when replacing accesses to the
+     individual field itself (as opposed to accesses created as part of
+     replacing aggregate copies which should use TYPE).  */
+  tree reg_acc_type;
+
   /* The statement this access belongs to.  */
   gimple *stmt;
 
@@ -391,10 +403,17 @@ dump_access (FILE *f, struct access *access, bool grp)
   print_generic_expr (f, access->base);
   fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset);
   fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size);
+  if (access->reg_size != access->size)
+    fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size);
   fprintf (f, ", expr = ");
   print_generic_expr (f, access->expr);
   fprintf (f, ", type = ");
   print_generic_expr (f, access->type);
+  if (access->reg_acc_type)
+    {
+      fprintf (f, ", reg_acc_type = ");
+      print_generic_expr (f, access->reg_acc_type);
+    }
   fprintf (f, ", reverse = %d", access->reverse);
   if (grp)
     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "
@@ -481,18 +500,27 @@ get_base_access_vector (tree base)
   return base_access_vec->get (base);
 }
 
+/* Return ACCESS's size or reg_size depending on REG.  */
+
+static HOST_WIDE_INT
+acc_size (struct access *access, bool reg)
+{
+  return reg ? access->reg_size : access->size;
+}
+
 /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted
    in ACCESS.  Return NULL if it cannot be found.  */
 
 static struct access *
 find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
-                       HOST_WIDE_INT size)
+                       HOST_WIDE_INT size, bool reg)
 {
-  while (access && (access->offset != offset || access->size != size))
+  while (access && (access->offset != offset
+                   || acc_size (access, reg) != size))
     {
       struct access *child = access->first_child;
 
-      while (child && (child->offset + child->size <= offset))
+      while (child && (child->offset + acc_size (child, reg) <= offset))
        child = child->next_sibling;
       access = child;
     }
@@ -503,7 +531,7 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
   if (access)
     while (access->first_child
           && access->first_child->offset == offset
-          && access->first_child->size == size)
+          && acc_size (access->first_child, reg) == size)
       access = access->first_child;
 
   return access;
@@ -539,7 +567,7 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset,
   if (!access)
     return NULL;
 
-  return find_access_in_subtree (access, offset, size);
+  return find_access_in_subtree (access, offset, size, true);
 }
 
 /* Add LINK to the linked list of assign links of RACC.  */
@@ -868,6 +896,7 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   access->base = base;
   access->offset = offset;
   access->size = size;
+  access->reg_size = size;
 
   base_access_vec->get_or_insert (base).safe_push (access);
 
@@ -1667,6 +1696,7 @@ build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
     {
       tree res;
       if (model->grp_same_access_path
+         && !model->reg_acc_type
          && !TREE_THIS_VOLATILE (base)
          && (TYPE_ADDR_SPACE (TREE_TYPE (base))
              == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
@@ -2256,6 +2286,30 @@ get_access_replacement (struct access *access)
   return access->replacement_decl;
 }
 
+/* Like above except in cases when ACCESS has non-NULL reg_acc_type, in which
+   case return a new SSA name of that type and store a statement converting it
+   from/to the replacement (with location LOC) to *CONVERSION.  */
+
+static tree
+get_reg_access_replacement (location_t loc, struct access *access, bool write,
+                           gassign **conversion)
+{
+  tree repl = get_access_replacement (access);
+  if (!access->reg_acc_type)
+    {
+      *conversion = NULL;
+      return repl;
+    }
+
+  tree tmp = make_ssa_name (access->reg_acc_type);
+  if (write)
+    *conversion = gimple_build_assign (repl, NOP_EXPR, tmp);
+  else
+    *conversion = gimple_build_assign (tmp, NOP_EXPR, repl);
+  gimple_set_location (*conversion, loc);
+  return tmp;
+}
+
 
 /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
    linked list along the way.  Stop when *ACCESS is NULL or the access pointed
@@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root)
       gcc_assert (offset == access->offset);
       gcc_assert (access->grp_unscalarizable_region
                  || size == max_size);
-      gcc_assert (max_size == access->size);
+      if (access->reg_acc_type)
+       gcc_assert (max_size == access->reg_size
+                   && max_size < access->size);
+      else
+       gcc_assert (max_size == access->size
+                   && max_size == access->reg_size);
       gcc_assert (reverse == access->reverse);
 
       if (access->first_child)
@@ -2405,6 +2464,42 @@ expr_with_var_bounded_array_refs_p (tree expr)
   return false;
 }
 
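+/* If ACCESS is a non-bit-field integral access whose type precision does not
+   match its size, change its type to a same-sized integer type and rebuild
+   the access expression accordingly.  */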
+static void
+upgrade_integral_size_to_prec (struct access *access)
+{
+  /* Always create access replacements that cover the whole access.
+     For integral types this means the precision has to match.
+     Avoid assumptions based on the integral type kind, too.  */
+  if (INTEGRAL_TYPE_P (access->type)
+      && (TREE_CODE (access->type) != INTEGER_TYPE
+         || TYPE_PRECISION (access->type) != access->size)
+      /* But leave bitfield accesses alone.  */
+      && (TREE_CODE (access->expr) != COMPONENT_REF
+         || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))))
+    {
+      tree rt = access->type;
+      gcc_assert ((access->offset % BITS_PER_UNIT) == 0
+                 && (access->size % BITS_PER_UNIT) == 0);
+      access->type = build_nonstandard_integer_type (access->size,
+                                                  TYPE_UNSIGNED (rt));
+      access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base,
+                                        access->offset, access->reverse,
+                                        access->type, NULL, false);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       {
+         fprintf (dump_file, "Changing the type of a replacement for ");
+         print_generic_expr (dump_file, access->base);
+         fprintf (dump_file, " offset: %u, size: %u ",
+                  (unsigned) access->offset, (unsigned) access->size);
+         fprintf (dump_file, " to an integer.\n");
+       }
+    }
+}
+
 /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
    both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY
    is set, we are totally scalarizing the aggregate.  Also set all sorts of
@@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct access *parent,
        hole = true;
     }
 
+  if (!totally && root->reg_acc_type)
+    {
+      /* If total scalarization did not eventually succeed, let's undo any
+        futile attempts to cover padding.  */
+      root->type = root->reg_acc_type;
+      root->size = root->reg_size;
+      root->reg_acc_type = NULL_TREE;
+    }
+
   if (allow_replacements && scalar && !root->first_child
       && (totally || !root->grp_total_scalarization)
       && (totally
@@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct access *parent,
          || ((root->grp_scalar_read || root->grp_assignment_read)
              && (root->grp_scalar_write || root->grp_assignment_write))))
     {
-      /* Always create access replacements that cover the whole access.
-         For integral types this means the precision has to match.
-        Avoid assumptions based on the integral type kind, too.  */
-      if (INTEGRAL_TYPE_P (root->type)
-         && (TREE_CODE (root->type) != INTEGER_TYPE
-             || TYPE_PRECISION (root->type) != root->size)
-         /* But leave bitfield accesses alone.  */
-         && (TREE_CODE (root->expr) != COMPONENT_REF
-             || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))
-       {
-         tree rt = root->type;
-         gcc_assert ((root->offset % BITS_PER_UNIT) == 0
-                     && (root->size % BITS_PER_UNIT) == 0);
-         root->type = build_nonstandard_integer_type (root->size,
-                                                      TYPE_UNSIGNED (rt));
-         root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,
-                                            root->offset, root->reverse,
-                                            root->type, NULL, false);
-
-         if (dump_file && (dump_flags & TDF_DETAILS))
-           {
-             fprintf (dump_file, "Changing the type of a replacement for ");
-             print_generic_expr (dump_file, root->base);
-             fprintf (dump_file, " offset: %u, size: %u ",
-                      (unsigned) root->offset, (unsigned) root->size);
-             fprintf (dump_file, " to an integer.\n");
-           }
-       }
+      upgrade_integral_size_to_prec (root);
 
       root->grp_to_be_replaced = 1;
       root->replacement_decl = create_access_replacement (root);
@@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct access *parent,
        root->grp_total_scalarization = 0;
     }
 
-  if (!hole || totally)
+  if (!hole)
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
@@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, struct access *model,
   access->expr = expr;
   access->offset = new_offset;
   access->size = model->size;
+  gcc_assert (model->size == model->reg_size);
+  access->reg_size = model->reg_size;
   access->type = model->type;
   access->parent = parent;
   access->grp_read = set_grp_read;
@@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,
   access->base = parent->base;
   access->offset = pos;
   access->size = size;
+  access->reg_size = size;
   access->expr = expr;
   access->type = type;
   access->parent = parent;
@@ -3015,6 +3092,138 @@ create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos,
   return new_acc;
 }
 
+/* Given SIZE, return an integer type to represent that many bits, unsigned if
+   UNSIGNEDP, or NULL if there is no suitable one.  */
+
+static tree
+sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1)
+{
+  tree type = lang_hooks.types.type_for_size (size, unsignedp);
+  scalar_int_mode mode;
+  if (type
+      && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+      && GET_MODE_BITSIZE (mode) == size)
+    return type;
+  else
+    return NULL_TREE;
+}
+
+/* Create a series of accesses to cover padding from FROM to TO and chain them
+   between LAST_PTR and NEXT_CHILD.  Return the pointer to the last created
+   one.  */
+
+static struct access *
+fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from,
+                           HOST_WIDE_INT to, struct access **last_ptr,
+                           struct access *next_child)
+{
+  struct access *access = NULL;
+
+  gcc_assert (from < to);
+  do
+    {
+      HOST_WIDE_INT diff = to - from;
+      gcc_assert (diff >= BITS_PER_UNIT);
+      HOST_WIDE_INT stsz = 1 << floor_log2 (diff);
+      tree type;
+
+      while (true)
+       {
+         type = sra_type_for_size (stsz);
+         if (type)
+           break;
+         stsz /= 2;
+         gcc_checking_assert (stsz >= BITS_PER_UNIT);
+       }
+
+      do {
+       tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base,
+                                         from, parent->reverse, type, NULL,
+                                         false);
+       access = create_total_scalarization_access (parent, from, stsz, type,
+                                                   expr, last_ptr, next_child);
+       access->grp_no_warning = 1;
+       from += stsz;
+      }
+      while (to - from >= stsz);
+      gcc_assert (from <= to);
+    }
+  while (from < to);
+  return access;
+}
+
+/* Check whether any padding between FROM and TO must be filled during
+   total scalarization and if so, extend *LAST_SIBLING, create additional
+   artificial accesses under PARENT or both to fill it.  Set *LAST_SIBLING to
+   the last created access and set its next_sibling to NEXT_CHILD.  */
+
+static void
+total_scalarization_fill_padding (struct access *parent,
+                                 struct access **last_sibling,
+                                 HOST_WIDE_INT from, HOST_WIDE_INT to,
+                                 struct access *next_child)
+{
+  if (constant_decl_p (parent->base))
+    return;
+
+  gcc_assert (from <= to);
+  if (from == to)
+    return;
+
+  if (struct access *ls = *last_sibling)
+    {
+      /* First, perform any upgrades to full integers we would have done
+        anyway.  */
+      upgrade_integral_size_to_prec (ls);
+
+      HOST_WIDE_INT lastsize = ls->size;
+      if (INTEGRAL_TYPE_P (ls->type)
+         && pow2_or_zerop (lastsize))
+       {
+         /* The last field was a reasonably sized integer, let's try to
+            enlarge as much as we can so that it is still naturally
+            aligned.  */
+         HOST_WIDE_INT lastpos = ls->offset;
+         HOST_WIDE_INT blocksize = lastsize;
+         tree type = NULL_TREE;
+         int unsignedp = TYPE_UNSIGNED (ls->type);
+
+         while (true)
+           {
+             HOST_WIDE_INT b2 = blocksize * 2;
+             if (lastpos + b2 > to
+                 || (lastpos % b2) != 0)
+               break;
+             tree t2 = sra_type_for_size (b2, unsignedp);
+             if (!t2)
+               break;
+
+             blocksize = b2;
+             type = t2;
+           }
+
+         if (blocksize != lastsize)
+           {
+             ls->reg_acc_type = ls->type;
+             ls->type = type;
+             ls->size = blocksize;
+             from = lastpos + blocksize;
+           }
+
+         if (from == to)
+           return;
+       }
+    }
+
+  struct access **last_ptr;
+  if (*last_sibling)
+    last_ptr = &(*last_sibling)->next_sibling;
+  else
+    last_ptr = &parent->first_child;
+  *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr,
+                                             next_child);
+}
+
 static bool totally_scalarize_subtree (struct access *root);
 
 /* Return true if INNER is either the same type as OUTER or if it is the type
@@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access *parent,
                                   HOST_WIDE_INT size)
 {
   struct access *next_child;
+  HOST_WIDE_INT covered;
   if (!*last_seen_sibling)
-    next_child = parent->first_child;
+    {
+      next_child = parent->first_child;
+      covered = parent->offset;
+    }
   else
-    next_child = (*last_seen_sibling)->next_sibling;
+    {
+      next_child = (*last_seen_sibling)->next_sibling;
+      covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size;
+    }
 
   /* First, traverse the chain of siblings until it points to an access with
      offset at least equal to POS.  Check all skipped accesses whether they
@@ -3085,10 +3301,16 @@ total_should_skip_creating_access (struct access *parent,
     {
       if (next_child->offset + next_child->size > pos)
        return TOTAL_FLD_FAILED;
+      total_scalarization_fill_padding (parent, last_seen_sibling, covered,
+                                       next_child->offset, next_child);
+      covered = next_child->offset + next_child->size;
       *last_seen_sibling = next_child;
       next_child = next_child->next_sibling;
     }
 
+  total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos,
+                                   next_child);
+
   /* Now check whether next_child has exactly the right POS and SIZE and if so,
      whether it can represent what we need and can be totally scalarized
      itself.  */
@@ -3273,6 +3495,34 @@ totally_scalarize_subtree (struct access *root)
     }
 
  out:
+  /* Even though there is nothing in the type, there can still be accesses here
+     in the IL.  */
+
+  HOST_WIDE_INT covered;
+  struct access *next_child;
+
+  if (last_seen_sibling)
+    {
+      covered = last_seen_sibling->offset + last_seen_sibling->size;
+      next_child = last_seen_sibling->next_sibling;
+    }
+  else
+    {
+      covered = root->offset;
+      next_child = root->first_child;
+    }
+
+  while (next_child)
+    {
+      total_scalarization_fill_padding (root, &last_seen_sibling, covered,
+                                       next_child->offset, next_child);
+      covered = next_child->offset + next_child->size;
+      last_seen_sibling = next_child;
+      next_child = next_child->next_sibling;
+    }
+
+  total_scalarization_fill_padding (root, &last_seen_sibling, covered,
+                                   root->offset + root->size, NULL);
   return true;
 }
 
@@ -3655,7 +3905,6 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 
   if (access->grp_to_be_replaced)
     {
-      tree repl = get_access_replacement (access);
       /* If we replace a non-register typed access simply use the original
          access expression to extract the scalar component afterwards.
         This happens if scalarizing a function return value or parameter
@@ -3666,10 +3915,14 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
          be accessed as a different type too, potentially creating a need for
          type conversion (see PR42196) and when scalarized unions are involved
          in assembler statements (see PR42398).  */
-      if (!useless_type_conversion_p (type, access->type))
+      tree actual_type
+       = access->reg_acc_type ? access->reg_acc_type : access->type;
+
+      if (!useless_type_conversion_p (type, actual_type))
        {
          tree ref;
 
+         tree repl = get_access_replacement (access);
          ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
 
          if (write)
@@ -3696,7 +3949,21 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
            }
        }
       else
-       *expr = repl;
+       {
+         gassign *conversion;
+         tree repl = get_reg_access_replacement (loc, access, write,
+                                                 &conversion);
+         *expr = repl;
+
+         if (conversion)
+           {
+             if (write)
+               gsi_insert_after (gsi, conversion, GSI_NEW_STMT);
+             else
+               gsi_insert_before (gsi, conversion, GSI_SAME_STMT);
+           }
+       }
+
       sra_stats.exprs++;
     }
   else if (write && access->grp_to_be_debug_replaced)
@@ -3806,7 +4073,8 @@ load_assign_lhs_subreplacements (struct access *lacc,
          gassign *stmt;
          tree rhs;
 
-         racc = find_access_in_subtree (sad->top_racc, offset, lacc->size);
+         racc = find_access_in_subtree (sad->top_racc, offset, lacc->size,
+                                        false);
          if (racc && racc->grp_to_be_replaced)
            {
              rhs = get_access_replacement (racc);
@@ -3857,7 +4125,7 @@ load_assign_lhs_subreplacements (struct access *lacc,
              tree drhs;
              struct access *racc = find_access_in_subtree (sad->top_racc,
                                                            offset,
-                                                           lacc->size);
+                                                           lacc->size, false);
 
              if (racc && racc->grp_to_be_replaced)
                {
@@ -4010,8 +4278,11 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
     {
-      lhs = get_access_replacement (lacc);
+      gassign *lhs_conversion = NULL;
+      lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion);
       gimple_assign_set_lhs (stmt, lhs);
+      if (lhs_conversion)
+       gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT);
       modify_this_stmt = true;
       if (lacc->grp_partial_lhs)
        force_gimple_rhs = true;
@@ -4020,7 +4291,10 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
 
   if (racc && racc->grp_to_be_replaced)
     {
-      rhs = get_access_replacement (racc);
+      gassign *rhs_conversion = NULL;
+      rhs = get_reg_access_replacement (loc, racc, false, &rhs_conversion);
+      if (rhs_conversion)
+       gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT);
       modify_this_stmt = true;
       if (racc->grp_partial_lhs)
        force_gimple_rhs = true;
-- 
2.24.1
