On Mon, 30 Jan 2012, Richard Guenther wrote:

> On Fri, 27 Jan 2012, Richard Guenther wrote:
> >
> > This fixes PR51528 by hiding the issue that SRA generates
> > copy in/out with a type not suitable for preserving the data
> > (any non-mode-precision thing).  It fixes it by, instead of
> > emitting
> >
> >   x$i_8 = x.i;
> >   D.1720 = x;
> >   D.1720.i = x$i_8;
> >
> > for an assign to an unscalarized D.1720 from a partly scalarized x
> > (so the aggregate assignment prevails), emitting
> >
> >   x.i = x$i_8;
> >   D.1720 = x;
> >
> > instead.  This allows the unfortunate copy in/out to be optimized
> > away (and thus its undefined behavior).
> >
> > Basically, whenever the aggregate copy will prevail, prefer to
> > restore the RHS and reload the LHS components - just like we
> > do for aggregate uses/destinations of call statements.
> >
> > I think we can still clean up the aggregate copy path a lot,
> > but that's for 4.8 (unless new bugs pop up).
> >
> > Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
>
> So it turns out this wasn't that easy and I botched up some
> of the intricate execution flow in sra_modify_assign.  As a
> preparation for a smaller fix, the following patch refactors
> this function to be easier to understand and hopefully make
> the real fix obvious in what it does.
And this is the "real" fix - well, avoid the copy in/out we
currently generate, which we are not able to optimize, and instead
emit something from which we are able to remove the redundant (and
harmful) stores.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2012-01-30  Richard Guenther  <rguent...@suse.de>

	PR tree-optimization/51528
	* tree-sra.c (sra_modify_assign): Avoid copy-in/out for
	aggregate assigns.

	* gcc.dg/torture/pr51528.c: New testcase.

Index: gcc/tree-sra.c
===================================================================
*** gcc/tree-sra.c	(revision 183720)
--- gcc/tree-sra.c	(working copy)
*************** sra_modify_assign (gimple *stmt, gimple_
*** 3135,3143 ****
        sra_stats.deleted++;
        return SRA_AM_REMOVED;
      }
    if (access_has_children_p (racc))
!     generate_subtree_copies (racc->first_child, lhs, racc->offset,
!			     0, 0, gsi, false, true, loc);
    if (access_has_children_p (lacc))
      generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
  			     0, 0, gsi, true, true, loc);
--- 3135,3148 ----
        sra_stats.deleted++;
        return SRA_AM_REMOVED;
      }
+   /* Restore the aggregate RHS from its components so the
+      prevailing aggregate copy does the right thing.  */
    if (access_has_children_p (racc))
!     generate_subtree_copies (racc->first_child, racc->base, 0, 0, 0,
!			     gsi, false, false, loc);
!   /* Re-load the components of the aggregate copy destination.
!      But use the RHS aggregate to load from to expose more
!      optimization opportunities.  */
    if (access_has_children_p (lacc))
      generate_subtree_copies (lacc->first_child, rhs, lacc->offset,
  			     0, 0, gsi, true, true, loc);
Index: gcc/testsuite/gcc.dg/torture/pr51528.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr51528.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/pr51528.c	(revision 0)
***************
*** 0 ****
--- 1,46 ----
+ /* { dg-do run } */
+ /* { dg-options "-fno-early-inlining" } */
+ 
+ extern void abort (void);
+ 
+ union U
+ {
+   int i;
+   _Bool b;
+ };
+ 
+ _Bool gb;
+ 
+ void __attribute__ ((noinline))
+ use_bool (union U u)
+ {
+   gb = u.b;
+ }
+ 
+ union U
+ bar (void)
+ {
+   union U u;
+   u.i = 0xFFFE;
+   return u;
+ }
+ 
+ union U __attribute__ ((noinline))
+ foo (void)
+ {
+   union U u,v;
+ 
+   u.b = 1;
+   use_bool (u);
+   u = bar ();
+ 
+   return u;
+ }
+ 
+ int main (int argc, char **argv)
+ {
+   union U u = foo ();
+   if (u.i != 0xFFFE)
+     abort ();
+   return 0;
+ }