On Thu, Oct 8, 2015 at 6:57 PM, Segher Boessenkool <seg...@kernel.crashing.org> wrote: > As the PR points out, the "simple" reorder algorithm makes bigger code > than the STC algorithm did, for -Os, for x86. I now tested it for many > different targets and it turns out to be worse everywhere. > > This simple patch tunes "simple" a bit; this makes it better than STC > almost everywhere. The only exceptions (for the targets where I have > results) are x86 and mn10300. For those targets it may be best to switch > the default algorithm for -Os to STC. > > The raw results. This is text size for vmlinux for 31 targets; as you > can see many did not build, but at least all primary targets did. > "none" is no basic block reordering; "trunk" is current trunk; "stc" is > with the STC algorithm; "nodyn" is with simple, but considering all edges > equally likely; "swaps" is that, but prefering the more likely edge from > conditional branches; and "falls" prefers the edge from conditional > branches that is already the fallthrough edge. This patch is "falls". > > none trunk stc nodyn swaps falls best > 3728663 3721479 3700831 3706407 3717971 3690367 falls arm > 2097684 2095560 2089484 2094024 2094212 2086720 falls blackfin > 2096118 2107356 2081894 2092276 2103732 2077162 falls cris > 3204044 3200972 3187196 3191932 3198156 3177980 falls frv > 9254222 9340258 9208805 9265886 9331362 9247119 stc i386 > 3353034 3355482 3334726 3334466 3349710 3314970 falls m32r > 4545720 4549824 4514256 4541832 4544540 4498416 falls microblaze > 4276743 4266363 4246255 4259923 4261367 4227723 falls mips > 5779199 5770567 5741663 5757663 5764475 5721803 falls mips64 > 2059078 2086000 2051005 2064365 2083923 2055705 stc mn10300 > 3311925 3320113 3293873 3305949 3317865 3284081 falls nios2 > 6738701 6747077 6710253 6742917 6740965 6696757 falls parisc64 > 8312408 8312744 8261016 8294480 8306488 8237188 falls powerpc > 17782722 17788382 17722326 17749526 17780642 17683810 falls powerpc64 > 11016289 11029481 10970081 10980065 11024617 10942409 falls s390 > 1406832 1417224 1400344 1409392 1416172 1399428 falls shnommu > 3771111 3776223 3751623 3768459 3771455 3732967 falls sparc > 6113427 6113527 6074875 6091167 6106015 6049571 falls sparc64 > 10449681 10529883 10398908 10458240 10522149 10440814 stc x86_64 > 1905733 1905733 1905733 1905733 1905733 1905733 ----- xtensa > > > Is this okay for trunk?
I think the patch makes sense but it also raises a question for me - how did we decide what edge gets EDGE_FALLTHRU when going out-of-cfglayout? And isn't _that_ mechanism then not part of basic-block reordering that needs to be tweaked for choosing the EDGE_FALLTHRU as better? Thanks, Richard. > > Segher > > > 2015-10-08 Segher Boessenkool <seg...@kernel.crashing.org> > > PR rtl-optimization/67864 > * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing > fallthrough edges for conditional jumps. Don't sort candidate > edges if not optimizing for speed. > > --- > gcc/bb-reorder.c | 16 ++++++++++++---- > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c > index cb001e8..3b7098e 100644 > --- a/gcc/bb-reorder.c > +++ b/gcc/bb-reorder.c > @@ -2318,16 +2318,24 @@ reorder_basic_blocks_simple (void) > > if (any_condjump_p (end)) > { > - edges[n++] = EDGE_SUCC (bb, 0); > - edges[n++] = EDGE_SUCC (bb, 1); > + edge e0 = EDGE_SUCC (bb, 0); > + edge e1 = EDGE_SUCC (bb, 1); > + /* When optimizing for size it is best to keep the original > + fallthrough edges. */ > + if (e1->flags & EDGE_FALLTHRU) > + std::swap (e0, e1); > + edges[n++] = e0; > + edges[n++] = e1; > } > else if (single_succ_p (bb)) > edges[n++] = EDGE_SUCC (bb, 0); > } > > - /* Sort the edges, the most desirable first. */ > + /* Sort the edges, the most desirable first. When optimizing for size > + all edges are equally desirable. */ > > - std::stable_sort (edges, edges + n, edge_order); > + if (optimize_function_for_speed_p (cfun)) > + std::stable_sort (edges, edges + n, edge_order); > > /* Now decide which of those edges to make fallthrough edges. We set > BB_VISITED if a block already has a fallthrough successor assigned > -- > 1.8.1.4 >