FYI, here is the performance impact of this option on SPEC06 (built with the google_46 compiler and measured on a core2 box). The baseline number is FDO, and the ref number is FDO + reorder_with_partitioning.

xalancbmk improves by more than 3.5%
perlbench improves by more than 1.5%
dealII and bzip2 degrade by about 1.4%

Note the partitioning scheme is not tuned at all -- there is not even a tunable parameter to play with.

David

On Tue, Jul 19, 2011 at 2:33 PM, Richard Henderson <r...@redhat.com> wrote:
> There are a number of problems with this code that affect
> its ability to work with any non-x86-like target, that is,
> anyone that doesn't define at least HAS_LONG_UNCOND_BRANCH
> and possibly HAS_LONG_COND_BRANCH.
>
> We begin, quite sensibly, with pass_partition_blocks which
> performs a number of transformations upon the code that,
> while the actual code could be better factored, is quite
> easy to follow.  Depending on the features of the target,
> fallthrus are turned into unconditional jumps, conditional
> jumps are split into branch around branch, unconditional
> jumps are turned into indirect jumps.
>
> There's nice bits of commentary that say why things are
> implemented this way, including exposing the indirect jumps
> to the register allocator.
>
> But after pass_partition_blocks, we run into trouble.  There
> are no less than 4 other passes that add *new* crossing jumps
> without doing *any* of the subsequent fixups for less capable
> targets: pass_outof_cfg_layout_mode, pass_reorder_blocks,
> pass_sched2 (ia64 only? it's in code in haifa that looks like
> speculative load fixups), and pass_convert_to_eh_region_ranges.
>
> The worst part is that test coverage for this feature is
> extremely poor.  It's very difficult to tell if any cleanup
> in this area is likely to introduce more bugs than it fixes.
>
> After 3 days fighting with this code, I had a bit of a
> cathartic whine on IRC.  I got two votes to just rip the
> whole thing out.
>
> Andrew Pinski points out that the feature could probably be
> equivalently implemented via outlining and function calls
> (I assume well back at the gimple level).  At which point we
> no longer have cross-segment jump_insns at the rtl level,
> which seems like a Really Big Win to me at this point.
> Not that I'm volunteering to actually do the work to implement
> any such scheme.
>
> Thoughts?
>
>
> r~
>