https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-17
            Version|unknown                     |13.0
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The frontend leaves us with

  <<cleanup_point   struct Foo tmp = {};>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
    tmp.next = f->next >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
    (void) (*NON_LVALUE_EXPR <f> = *(const struct Foo &) &tmp) >>>>>;

and ESRA sees

  <bb 2> :
  tmp = {};
  _1 = f_4(D)->next;
  tmp.next = _1;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;
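
For reference, a source-level sketch that produces this shape looks roughly
like the following (reconstructed from the dumps rather than taken from the
original testcase; everything apart from 'next' is guessed from the 4736-bit
total size):

  struct Foo
  {
    Foo *next;       /* the 64-bit field at offset 0 in the SRA dump below */
    char data[584];  /* assumed filler making sizeof (Foo) 592 bytes, 4736 bits */
  };

  void reset (Foo *f)
  {
    Foo tmp = {};        /* tmp = {}; */
    tmp.next = f->next;  /* _1 = f_4(D)->next; tmp.next = _1; */
    *f = tmp;            /* *f_4(D) = tmp; */
  }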

ESRA somewhat senselessly does

  <bb 2> :
  tmp = {};
  tmp$next_8 = 0B;
  _1 = f_4(D)->next;
  tmp$next_9 = _1;
  tmp.next = tmp$next_9;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

It doesn't scalarize the array member because it is too large.  I would guess
that Clang doesn't split the initializer and thus its aggregate copy
propagation manages to elide 'tmp'.  We don't have a good place to perform
the desired optimization, and the split initialization of 'tmp' certainly
complicates things.

In principle it would be SRA's job, since I think it already does most of the
necessary analysis; it just lacks knowledge of how to re-materialize
*f_4(D) efficiently at the point of the aggregate assignment.
It has

Candidate (2384): tmp
Too big to totally scalarize: tmp (UID: 2384)
Created a replacement for tmp offset: 0, size: 64: tmp$nextD.2425

Access trees for tmp (UID: 2384):
access { base = (2384)'tmp', offset = 0, size = 4736, expr = tmp,
  type = struct Foo, reverse = 0, grp_read = 1, grp_write = 1,
  grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
  grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0,
  grp_covered = 0, grp_unscalarizable_region = 0, grp_unscalarized_data = 1,
  grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 0,
  grp_to_be_debug_replaced = 0}
* access { base = (2384)'tmp', offset = 0, size = 64, expr = tmp.next,
  type = struct Foo *, reverse = 0, grp_read = 1, grp_write = 1,
  grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
  grp_scalar_write = 1, grp_total_scalarization = 0, grp_hint = 0,
  grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0,
  grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 1,
  grp_to_be_debug_replaced = 0}

but it fails to record that for the size 4736 access the write performed is a
clear, which is cheap to re-materialize (and requires no replacement
variables).  SRA could probably track writes coming only from constants that
way, avoiding the creation of scalar replacements.
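
Concretely, what is being asked for is equivalent to something like the
following at the source level (a sketch using the Foo layout assumed above,
not what SRA currently produces):

  #include <cstring>

  void reset_desired (Foo *f)
  {
    Foo *saved = f->next;           /* the single live scalar replacement (tmp$next) */
    std::memset (f, 0, sizeof *f);  /* re-materialize the constant clear on *f */
    f->next = saved;                /* restore the scalarized field */
  }

i.e. the aggregate assignment is replaced by the re-materialized clear plus
stores of the scalar replacements, and 'tmp' disappears entirely.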
