https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92706

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Martin - ESRA does something odd here, I see it turning

  MEM[(charD.7 * {ref-all})&vD.1911] = MEM[(charD.7 * {ref-all})p_3(D)];
  uD.1912 = vD.1911;
  wD.1913 = uD.1912;
  _7 = MEM[(my_int128D.1907 * {ref-all})&wD.1913];

into

  MEM[(charD.7 * {ref-all})&vD.1911] = MEM[(charD.7 * {ref-all})p_3(D)];
  uD.1912 = vD.1911;
  u$i$0_1 = MEM[(struct S *)&vD.1911];
  u$i$1_11 = MEM[(struct S *)&vD.1911 + 4B];
  u$i$2_12 = MEM[(struct S *)&vD.1911 + 8B];
  u$i$3_13 = MEM[(struct S *)&vD.1911 + 12B];
  MEM[(struct S *)&wD.1913] = u$i$0_1;
  MEM[(struct S *)&wD.1913 + 4B] = u$i$1_11;
  MEM[(struct S *)&wD.1913 + 8B] = u$i$2_12;
  MEM[(struct S *)&wD.1913 + 12B] = u$i$3_13;
  w_18 = MEM[(struct S *)&wD.1913];
  _7 = w_18;

where it totally scalarizes uD.1912 without factoring in the access
leading to w_18.

This creates a pass-ordering issue with FRE, which on the aggregate code
would have happily elided the aggregate copy but is confused by
the "scalarized" variant.
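
For readers without the testcase at hand, here is a minimal C sketch of
source that could plausibly lower to GIMPLE of the shape quoted above. It
is a reconstruction, not the testcase attached to the PR: the names S, i,
my_int128, v, u, w and p are taken from the dumps, while the function name
load and the aligned(128) attribute are assumptions (the latter chosen so
that sizeof (struct S) matches the size = 1024 bits shown for w). It needs
a target where GCC provides __int128.

```c
#include <string.h>

/* Assumed layout: four ints, over-aligned so the object spans 1024 bits. */
struct S { int i[4]; } __attribute__((aligned(128)));

/* may_alias lets the 128-bit load legally alias the struct ({ref-all}). */
typedef __int128 my_int128 __attribute__((may_alias));

/* Hypothetical function name; copies *p through v, u and w, then reads
   the first 16 bytes of w as a single 128-bit integer.  */
__int128 load(const void *p)
{
    struct S v, u, w;

    memcpy(&v, p, sizeof v);   /* char {ref-all} block copy into v */
    u = v;                     /* aggregate copy                   */
    w = u;                     /* aggregate copy                   */
    return *(my_int128 *)&w;   /* 128-bit scalar read from w       */
}
```

Compiling something like this with -O2 -fdump-tree-esra-details should
show whether ESRA splits the copy through u into four int-sized pieces;
the exact dump will depend on the GCC version.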

SRA sees

Access trees for w (UID: 1913):
access { base = (1913)'w', offset = 0, size = 1024, expr = w, type = struct S,
non_addressable = 0, reverse = 0, grp_read = 1, grp_write = 1,
grp_assignment_read = 0, grp_assignment_write = 1, grp_scalar_read = 0,
grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0, grp_covered =
0, grp_unscalarizable_region = 0, grp_unscalarized_data = 1, grp_partial_lhs =
0, grp_to_be_replaced = 0, grp_to_be_debug_replaced = 0, grp_maybe_modified =
0, grp_not_necessarilly_dereferenced = 0
* access { base = (1913)'w', offset = 0, size = 128, expr = MEM[(my_int128 *
{ref-all})&w], type = my_int128, non_addressable = 0, reverse = 0, grp_read =
1, grp_write = 1, grp_assignment_read = 1, grp_assignment_write = 1,
grp_scalar_read = 1, grp_scalar_write = 0, grp_total_scalarization = 0,
grp_hint = 0, grp_covered = 1, grp_unscalarizable_region = 0,
grp_unscalarized_data = 0, grp_partial_lhs = 0, grp_to_be_replaced = 1,
grp_to_be_debug_replaced = 0, grp_maybe_modified = 0,
grp_not_necessarilly_dereferenced = 0

so I wonder why it chooses to totally scalarize instead of using the
int128 access?
