https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111844
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We are not optimizing the code at all on the GIMPLE level; we expand from

  <bb 2> [local count: 1073741824]:
  memcpy (&p, buf_5(D), 88);
  _1 = p.x;
  inc.0_2 = (unsigned int) inc_7(D);
  _3 = _1 + inc.0_2;
  p.x = _3;
  memcpy (buf_5(D), &p, 88);
  p ={v} {CLOBBER(eol)};
  return;

and only when memcpy is expanded inline during RTL expansion do we seem to be able to clean up after that.

It seems to me this is a task for SRA (again...). SRA should be more forgiving towards selected statements that require taking the address of a local, but only when those statements are not rewritten, and it should analyze memcpy, memset (and other select builtins) for their effect.

SRA handles the following by totally scalarizing 'p':

  void foo(P* buf, int inc)
  {
    P p;
    p = *buf;
    p.x += inc;
    *buf = p;
  }

and you get

  _Z3fooP1Pi:
  .LFB16:
          .cfi_startproc
          addl    %esi, (%rdi)
          ret

with or without the call to bar ().

You could argue that more aggressive "inline expansion" of memcpy (to char[] = char[] in this case) is what is needed, but I think this might confuse SRA, and I'm not sure we would apply the same costing that RTL expansion uses when deciding whether to inline-expand the "memcpy".
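For reference, a minimal C++ sketch of the shapes being compared, under stated assumptions. The layout of P is hypothetical (22 unsigned ints, chosen only to match the 88-byte memcpy and the unsigned add in the GIMPLE dump), the names foo_memcpy, foo_scalarized and same_effect are made up for illustration, and the real testcase in the PR may differ (it also calls bar (), which is omitted here). foo is the aggregate-copy version from the comment; foo_scalarized is what total scalarization of p effectively reduces foo to, matching the single addl in the assembly in the comment.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layout: 22 x 4-byte members = 88 bytes, matching the memcpy
// size in the GIMPLE dump. The real testcase's P may look different.
struct P {
  unsigned int x;         // the field being incremented
  unsigned int rest[21];  // untouched payload
};
static_assert(sizeof(P) == 88, "P is assumed to be 88 bytes, as in the dump");

// Round trip through memcpy: p is address-taken because it is passed to
// memcpy, which (per the comment) SRA does not currently handle.
void foo_memcpy(void* buf, int inc) {
  P p;
  std::memcpy(&p, buf, sizeof p);
  p.x += inc;  // inc is converted to unsigned int, as in the dump
  std::memcpy(buf, &p, sizeof p);
}

// Aggregate-copy version from the comment, which SRA totally scalarizes.
void foo(P* buf, int inc) {
  P p;
  p = *buf;
  p.x += inc;
  *buf = p;
}

// What total scalarization of p boils down to: only x changes, so the copy
// in and out of p collapses to one in-place add.
void foo_scalarized(P* buf, int inc) {
  buf->x += inc;
}

// Runs all three variants on copies of `init` and reports whether they
// leave byte-identical results (P has no padding, so memcmp is meaningful).
bool same_effect(const P& init, int inc) {
  P a = init, b = init, c = init;
  foo_memcpy(&a, inc);
  foo(&b, inc);
  foo_scalarized(&c, inc);
  // Unsigned wrap-around is well defined, so this also holds for inc < 0.
  assert(a.x == init.x + static_cast<unsigned int>(inc));
  return std::memcmp(&a, &b, sizeof(P)) == 0 &&
         std::memcmp(&b, &c, sizeof(P)) == 0;
}
```

The sketch only checks that the three forms have the same effect; it says nothing about which of them a particular GCC release optimizes well. Compiling it with g++ -O2 -S and comparing foo_memcpy against foo, or looking at the -fdump-tree-optimized output, is one way to see the difference the comment describes.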