------- Comment #74 from bonzini at gnu dot org 2009-05-07 16:21 -------

Ok. One step at a time. :-) To recap, here is the situation:

- the CSE optimization you mention was *not* removed; it was moved to fwprop, so it does not run at -O1.
- once this was done, the way to go is to tune new optimizations, not to reintroduce old ones.
- for example, fwprop in turn triggered a bad choice in loop invariant motion, for which a patch has been posted. This patch will remove the need for -fno-move-loop-invariants on this testcase (this is a deficiency in LIM that is not specific to machine-generated code; OTOH, the presence of many fp[N] accesses helps trigger it).
- that scheduling is necessary now and was not in 4.2.x is probably just a matter of luck.
- why renaming registers is necessary now and was not in 4.2.x is still a mystery; however, there is an explanation as to why it helps (it prolongs live ranges, something that on non-x86 archs is done by the pre-regalloc scheduling).
- at least we have a set of options providing good performance on this testcase, and guidance towards better tuning of the various problematic optimizations.

To conclude, nobody is underestimating the significance of this PR; it's just a matter of priorities. Near the end of the release cycle, you tend to look at PRs with small testcases to minimize the time spent understanding the code; near the beginning, you hope that new features magically fix the PRs and concentrate on wrong-code bugs and so on. Complex P2s such as this one unfortunately tend to stay in limbo.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928