https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92706

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Martin, ESRA does something odd here. I see it turning

  MEM[(charD.7 * {ref-all})&vD.1911] = MEM[(charD.7 * {ref-all})p_3(D)];
  uD.1912 = vD.1911;
  wD.1913 = uD.1912;
  _7 = MEM[(my_int128D.1907 * {ref-all})&wD.1913];

into

  MEM[(charD.7 * {ref-all})&vD.1911] = MEM[(charD.7 * {ref-all})p_3(D)];
  uD.1912 = vD.1911;
  u$i$0_1 = MEM[(struct S *)&vD.1911];
  u$i$1_11 = MEM[(struct S *)&vD.1911 + 4B];
  u$i$2_12 = MEM[(struct S *)&vD.1911 + 8B];
  u$i$3_13 = MEM[(struct S *)&vD.1911 + 12B];
  MEM[(struct S *)&wD.1913] = u$i$0_1;
  MEM[(struct S *)&wD.1913 + 4B] = u$i$1_11;
  MEM[(struct S *)&wD.1913 + 8B] = u$i$2_12;
  MEM[(struct S *)&wD.1913 + 12B] = u$i$3_13;
  w_18 = MEM[(struct S *)&wD.1913];
  _7 = w_18;

where it totally scalarizes uD.1912 without factoring in the access that loads w into w_18. This creates a pass-ordering issue with FRE: on the aggregate code FRE would happily have elided the aggregate copy, but it is confused by the "scalarized" variant.
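For readers without the original testcase at hand, here is a hypothetical C reduction that should lower to GIMPLE of roughly the "before" shape quoted above: a memcpy into v, two aggregate copies u = v and w = u, and a single 128-bit may_alias load from w. It is a sketch under assumptions, not the PR's testcase. The layout struct S { int i[4]; } is inferred from the four 4-byte pieces ESRA creates, and the aligned(128) attribute is an assumption chosen so that sizeof (struct S) is 128 bytes, matching the size = 1024 bits in the SRA dump. We expect GCC to fold the constant-size memcpy into the ref-all char copy seen in the dump, but that has not been checked against this exact source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: four ints (offsets 0, 4, 8, 12 as in the ESRA dump),
   over-aligned so that sizeof (struct S) == 128 bytes == 1024 bits. */
struct S { int i[4]; } __attribute__((aligned(128)));

/* may_alias lets us read the struct's storage through a 128-bit type. */
typedef __int128 my_int128 __attribute__((may_alias));

/* Hypothetical function name; p must point to at least sizeof (struct S) bytes. */
__int128 load_via_copies (const void *p)
{
  struct S v, u, w;
  memcpy (&v, p, sizeof v);     /* roughly MEM[(char *)&v] = MEM[(char *)p]  */
  u = v;                        /* u = v;  aggregate copy                     */
  w = u;                        /* w = u;  aggregate copy                     */
  return *(my_int128 *) &w;     /* _7 = MEM[(my_int128 *)&w]                  */
}
```

Compiling this with -O2 and looking at the esra and fre dumps (for example with -fdump-tree-esra-details) is one way to check whether it reproduces the scalarization described here.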
SRA sees

  Access trees for w (UID: 1913):
  access { base = (1913)'w', offset = 0, size = 1024, expr = w, type = struct S, non_addressable = 0, reverse = 0, grp_read = 1, grp_write = 1, grp_assignment_read = 0, grp_assignment_write = 1, grp_scalar_read = 0, grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0, grp_covered = 0, grp_unscalarizable_region = 0, grp_unscalarized_data = 1, grp_partial_lhs = 0, grp_to_be_replaced = 0, grp_to_be_debug_replaced = 0, grp_maybe_modified = 0, grp_not_necessarilly_dereferenced = 0
  * access { base = (1913)'w', offset = 0, size = 128, expr = MEM[(my_int128 * {ref-all})&w], type = my_int128, non_addressable = 0, reverse = 0, grp_read = 1, grp_write = 1, grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 1, grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0, grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0, grp_partial_lhs = 0, grp_to_be_replaced = 1, grp_to_be_debug_replaced = 0, grp_maybe_modified = 0, grp_not_necessarilly_dereferenced = 0

So I wonder why it chooses to totally scalarize instead of using the int128 access.
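To make the question concrete, the sketch below writes both strategies out by hand in C. It is only an illustration, not GCC code: the function names are hypothetical, and struct S is assumed to be four 32-bit ints (without the extra padding the dump's size = 1024 implies). The first function mirrors the ESRA output: u is split into four 32-bit replacements (u$i$0 .. u$i$3), stored piecewise into w, and w is then reloaded as one 128-bit value. The second uses the child access from the access tree: the only scalar read of w is the 128-bit load at offset 0, so a single 128-bit replacement would carry the same bytes straight through.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: four 32-bit pieces at offsets 0, 4, 8, 12. */
struct S { int32_t i[4]; };

/* Hypothetical analogue of total scalarization of u. */
__int128 via_total_scalarization (const struct S *v)
{
  /* u$i$0_1 .. u$i$3_13: one replacement per 4-byte field of u. */
  int32_t u0 = v->i[0], u1 = v->i[1], u2 = v->i[2], u3 = v->i[3];

  /* Stores back into the aggregate w, piece by piece. */
  struct S w;
  w.i[0] = u0;
  w.i[1] = u1;
  w.i[2] = u2;
  w.i[3] = u3;

  /* w_18 = MEM[(struct S *)&w]: reload the whole 16 bytes as one value. */
  __int128 w_18;
  memcpy (&w_18, &w, sizeof w_18);
  return w_18;
}

/* Hypothetical analogue of using the 128-bit child access instead:
   one 128-bit value carries the data from v to the final load. */
__int128 via_int128_access (const struct S *v)
{
  __int128 r;
  memcpy (&r, v, sizeof r);
  return r;
}
```

Both functions return the same value for any input; the difference is only in how many statements FRE later has to see through, which is the pass-ordering concern raised above.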