[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475

bonzini at gnu dot org Thu, 07 May 2009 09:21:37 -0700


------- Comment #74 from bonzini at gnu dot org  2009-05-07 16:21 -------
Ok.  One step at a time. :-)  To recap, here is the situation:


- the CSE optimization you mention was *not* removed; it was moved to fwprop,
so it no longer runs at -O1.

- once this was done, the way to go is to tune new optimizations, not to
reintroduce old ones

- for example, fwprop in turn triggered a bad choice in loop invariant motion,
for which a patch has been posted.  This patch will remove the need for
-fno-move-loop-invariants on this testcase (this is a deficiency in LIM that is
not specific to machine-generated code; OTOH, the presence of many fp[N]
accesses helps trigger it).

- that scheduling is necessary now and was not in 4.2.x is probably just a
matter of luck

- why renaming registers is necessary now and was not in 4.2.x is still a
mystery; however, there is an explanation as to why it helps (it prolongs live
ranges, something that on non-x86 archs is done by the pre-regalloc scheduling)

- at least we have a set of options providing good performance on this
testcase, and guidance towards better tuning of the various problematic
optimizations

To conclude, nobody is underestimating the significance of this PR; it's just a
matter of priorities.  Near the end of the release cycle, you tend to look at
PRs with small testcases to minimize the time spent understanding the code;
near the beginning, you hope that new features magically fix the PRs and
concentrate on wrong-code bugs and so on.  Complex P2s such as this one
unfortunately tend to stay in limbo.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33928

[Bug rtl-optimization/33928] [4.3/4.4/4.5 Regression] 30% performance slowdown in floating-point code caused by r118475
