https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-17
            Version|unknown                     |13.0
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  The frontend leaves us with

  <<cleanup_point   struct Foo tmp = {};>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
    tmp.next = f->next >>>>>;
  <<cleanup_point <<< Unknown tree: expr_stmt
    (void) (*NON_LVALUE_EXPR <f> = *(const struct Foo &) &tmp) >>>>>;

and ESRA sees

  <bb 2> :
  tmp = {};
  _1 = f_4(D)->next;
  tmp.next = _1;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

ESRA somewhat senselessly does

  <bb 2> :
  tmp = {};
  tmp$next_8 = 0B;
  _1 = f_4(D)->next;
  tmp$next_9 = _1;
  tmp.next = tmp$next_9;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

It doesn't scalarize the array because that's too large.  I would guess
that Clang doesn't split the initializer and thus its aggregate copy
propagation somehow manages to elide 'tmp'.

We don't have a good place to perform the desired optimization; certainly
the split initialization of 'tmp' complicates things.  In principle it
would be SRA's job, since I think it does most of the necessary analysis;
it just lacks knowledge of how to re-materialize *f_4(D) efficiently at
the point of the aggregate assignment?
It has

Candidate (2384): tmp
Too big to totally scalarize: tmp (UID: 2384)
Created a replacement for tmp offset: 0, size: 64: tmp$nextD.2425

Access trees for tmp (UID: 2384):
access { base = (2384)'tmp', offset = 0, size = 4736, expr = tmp,
  type = struct Foo, reverse = 0, grp_read = 1, grp_write = 1,
  grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
  grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0,
  grp_covered = 0, grp_unscalarizable_region = 0,
  grp_unscalarized_data = 1, grp_same_access_path = 1,
  grp_partial_lhs = 0, grp_to_be_replaced = 0,
  grp_to_be_debug_replaced = 0}
* access { base = (2384)'tmp', offset = 0, size = 64, expr = tmp.next,
  type = struct Foo *, reverse = 0, grp_read = 1, grp_write = 1,
  grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
  grp_scalar_write = 1, grp_total_scalarization = 0, grp_hint = 0,
  grp_covered = 1, grp_unscalarizable_region = 0,
  grp_unscalarized_data = 0, grp_same_access_path = 1,
  grp_partial_lhs = 0, grp_to_be_replaced = 1,
  grp_to_be_debug_replaced = 0}

but it fails to record that for the size 4736 write there's a clear
performed that's cheap to re-materialize (and no variables need to be
created).  SRA could probably track writes from only constants that way,
avoiding the creation of scalar replacements.
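
For illustration only, here is a rough C++ sketch of the source shape the dumps above suggest, next to a hand-written version of what re-materializing the clear directly in *f would amount to.  The struct layout is an assumption inferred from the access sizes (4736 bits total, a 64-bit pointer at offset 0, so 584 remaining bytes on an LP64 target); the member name `payload` and both function names are hypothetical and do not come from the report.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout: only `next` is named in the dumps; the array
// member and its size are guessed from "size = 4736" (592 bytes).
struct Foo {
    Foo *next;
    char payload[584];
};

// Source shape matching the frontend dump: zero-initialize a
// temporary, restore one field, then aggregate-copy into *f.
void reset_via_tmp(Foo *f) {
    Foo tmp = {};
    tmp.next = f->next;
    *f = tmp;
}

// Hand-written equivalent of the desired result: clear *f in place
// and put the saved pointer back, so no temporary and no copy remain.
void reset_in_place(Foo *f) {
    Foo *next = f->next;           // load before the clear overwrites it
    std::memset(f, 0, sizeof *f);  // the cheap-to-re-materialize clear
    f->next = next;
}
```

Both functions should leave *f in the same state; the second one only spells out by hand the code one would hope to get once 'tmp' is elided, and says nothing about how SRA would implement it.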