As the PR points out, the "simple" reorder algorithm makes bigger code than the STC algorithm did, for -Os, for x86. I now tested it for many different targets and it turns out to be worse everywhere.
This simple patch tunes "simple" a bit; this makes it better than STC almost everywhere. The only exceptions (for the targets where I have results) are x86 and mn10300. For those targets it may be best to switch the default algorithm for -Os to STC. The raw results. This is text size for vmlinux for 31 targets; as you can see many did not build, but at least all primary targets did. "none" is no basic block reordering; "trunk" is current trunk; "stc" is with the STC algorithm; "nodyn" is with simple, but considering all edges equally likely; "swaps" is that, but prefering the more likely edge from conditional branches; and "falls" prefers the edge from conditional branches that is already the fallthrough edge. This patch is "falls". none trunk stc nodyn swaps falls best 3728663 3721479 3700831 3706407 3717971 3690367 falls arm 2097684 2095560 2089484 2094024 2094212 2086720 falls blackfin 2096118 2107356 2081894 2092276 2103732 2077162 falls cris 3204044 3200972 3187196 3191932 3198156 3177980 falls frv 9254222 9340258 9208805 9265886 9331362 9247119 stc i386 3353034 3355482 3334726 3334466 3349710 3314970 falls m32r 4545720 4549824 4514256 4541832 4544540 4498416 falls microblaze 4276743 4266363 4246255 4259923 4261367 4227723 falls mips 5779199 5770567 5741663 5757663 5764475 5721803 falls mips64 2059078 2086000 2051005 2064365 2083923 2055705 stc mn10300 3311925 3320113 3293873 3305949 3317865 3284081 falls nios2 6738701 6747077 6710253 6742917 6740965 6696757 falls parisc64 8312408 8312744 8261016 8294480 8306488 8237188 falls powerpc 17782722 17788382 17722326 17749526 17780642 17683810 falls powerpc64 11016289 11029481 10970081 10980065 11024617 10942409 falls s390 1406832 1417224 1400344 1409392 1416172 1399428 falls shnommu 3771111 3776223 3751623 3768459 3771455 3732967 falls sparc 6113427 6113527 6074875 6091167 6106015 6049571 falls sparc64 10449681 10529883 10398908 10458240 10522149 10440814 stc x86_64 1905733 1905733 1905733 1905733 1905733 1905733 ----- xtensa Is this okay for trunk? Segher 2015-10-08 Segher Boessenkool <seg...@kernel.crashing.org> PR rtl-optimization/67864 * gcc/bb-reorder (reorder_basic_blocks_simple): Prefer existing fallthrough edges for conditional jumps. Don't sort candidate edges if not optimizing for speed. --- gcc/bb-reorder.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index cb001e8..3b7098e 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -2318,16 +2318,24 @@ reorder_basic_blocks_simple (void) if (any_condjump_p (end)) { - edges[n++] = EDGE_SUCC (bb, 0); - edges[n++] = EDGE_SUCC (bb, 1); + edge e0 = EDGE_SUCC (bb, 0); + edge e1 = EDGE_SUCC (bb, 1); + /* When optimizing for size it is best to keep the original + fallthrough edges. */ + if (e1->flags & EDGE_FALLTHRU) + std::swap (e0, e1); + edges[n++] = e0; + edges[n++] = e1; } else if (single_succ_p (bb)) edges[n++] = EDGE_SUCC (bb, 0); } - /* Sort the edges, the most desirable first. */ + /* Sort the edges, the most desirable first. When optimizing for size + all edges are equally desirable. */ - std::stable_sort (edges, edges + n, edge_order); + if (optimize_function_for_speed_p (cfun)) + std::stable_sort (edges, edges + n, edge_order); /* Now decide which of those edges to make fallthrough edges. We set BB_VISITED if a block already has a fallthrough successor assigned -- 1.8.1.4