http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54971
--- Comment #13 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-10-29 08:36:03 UTC ---
So, beyond the creation of new debug-only accesses for whole-struct writes into holes (if there aren't too many holes), I wonder whether SRA doesn't already have the infrastructure to do aggregate assignment propagation. That could help with the rest of the -Os -m32 issues on the committed testcase, but also with code generation on, say:

struct A { int a, b, c, d, e, f, g, h; } z;
struct B { struct A a, b, c, d, e, f, g, h; } x, y;

void
foo (void)
{
  struct A a = { 1, 2, 3, 4, 5, 6, 7, 8 };
  struct A b = a;
  struct A c = b;
  struct A d = c;
  struct A e = d;
  z = e;
}

void
bar (void)
{
  struct B a = x;
  struct B b = a;
  struct B c = b;
  struct B d = c;
  struct B e = d;
  y = e;
}

Here, with -Os both routines result in terribly inefficient code, because total scalarization is not performed. Even in these simple cases, where there is one whole-aggregate store and one whole-aggregate read that is dominated by the store, SRA doesn't attempt to optimize by doing just y = x; (plus b = x; c = x; d = x; e = x;, which would then be DCEd away). No other optimization pass attempts this either, but IMHO SRA has the best infrastructure for it. With -O2 only the second routine generates terrible code.

While this testcase is artificial, the checked-in testcase shows that at least one level of extra aggregate assignment does happen in practice, e.g. with compound literals.
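For illustration only, here is a hand-written sketch of what the two routines would reduce to if the aggregate copies were propagated and the dead intermediate copies removed. This is not GCC output. The names foo_propagated, bar_propagated and init are hypothetical and do not appear in the testcase.

```c
struct A { int a, b, c, d, e, f, g, h; } z;
struct B { struct A a, b, c, d, e, f, g, h; } x, y;

/* Original routines from the comment: a chain of whole-aggregate copies.  */
void
foo (void)
{
  struct A a = { 1, 2, 3, 4, 5, 6, 7, 8 };
  struct A b = a;
  struct A c = b;
  struct A d = c;
  struct A e = d;
  z = e;
}

void
bar (void)
{
  struct B a = x;
  struct B b = a;
  struct B c = b;
  struct B d = c;
  struct B e = d;
  y = e;
}

/* Hand-propagated equivalent of foo (hypothetical name): the initializer
   reaches z directly, and the locals a..e are no longer needed.  */
void
foo_propagated (void)
{
  static const struct A init = { 1, 2, 3, 4, 5, 6, 7, 8 };
  z = init;
}

/* Hand-propagated equivalent of bar (hypothetical name): every copy in the
   chain has the value of x, so only the final store survives.  */
void
bar_propagated (void)
{
  y = x;
}
```

Both pairs store the same values into z and y; the difference is only in how many intermediate stack copies the compiler has to materialize when it does not see through the chain.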