https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96966
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #4)
> Even
> extern char a[32];
>
> void f (const void *s)
> {
>   char *p = (char*)__builtin_memcpy (a, s, 16);
>   __builtin_memcpy (p, s, 16);
> }
>
>
> void g (const void *s)
> {
>   __builtin_memcpy (a, s, 16);
>   __builtin_memcpy (a, s, 16);
> }
> used to be optimized just in 8.1/8.2 and not in earlier or later GCC
> versions.
> Perhaps delaying the lowering of memcpy a tiny bit and trying to optimize it
> when it is still not lowered?

lowering early is quite an important thing since it enables better initial
into-SSA rewriting and early inline costing.  Note even w/o lowering FRE
would not optimize this.  Even the strlen pass doesn't:

extern char a[32];

void __GIMPLE (ssa,startwith("fre1")) g (const void *s)
{
__BB(2):
  __builtin_memcpy (&a[0], s_1(D), _Literal (__SIZE_TYPE__) 16);
  __builtin_memcpy (&a[0], s_1(D), _Literal (__SIZE_TYPE__) 16);
  return;
}

has this survive until .fab if you do -O2 -fno-tree-forwprop -fno-tree-vrp
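
For reference, a minimal sketch of why the second call in g() is redundant. It only checks at run time that the doubled copy and a single copy leave the buffer in the same state; it says nothing about which pass should remove the call. The names g_single and same_result are hypothetical helpers, not part of the report, and `a` is defined here rather than declared extern so the sketch links on its own. It assumes `s` does not overlap `a`.

```c
#include <string.h>

/* Destination buffer.  The report declares it extern; it is defined here
   (an assumption) so that the sketch is self-contained.  */
char a[32];

/* Same shape as g() in the report: the second call stores the same
   16 bytes from the same source to the same destination.  */
void g (const void *s)
{
  __builtin_memcpy (a, s, 16);
  __builtin_memcpy (a, s, 16);
}

/* Hypothetical reference version: what g() could be reduced to.  */
void g_single (const void *s)
{
  __builtin_memcpy (a, s, 16);
}

/* Returns 1 if g() and g_single() leave `a` in the same state.
   `s` must point to at least 16 readable bytes that do not overlap `a`.  */
int same_result (const void *s)
{
  char after_g[sizeof a], after_single[sizeof a];

  memset (a, 0, sizeof a);
  g (s);
  memcpy (after_g, a, sizeof a);

  memset (a, 0, sizeof a);
  g_single (s);
  memcpy (after_single, a, sizeof a);

  return memcmp (after_g, after_single, sizeof a) == 0;
}
```

To see whether a given GCC version removes the duplicate, one option is to compile g() with -O2 -fdump-tree-optimized and check in the resulting dump whether the 16-byte copy appears once or twice. The __GIMPLE testcase above additionally needs -fgimple.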