https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409
--- Comment #9 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
There's definitely something in the threader, but I'm not sure it's the cause of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE = -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation and are taking more than 2-3 seconds are (42, 76, and 24). I've distilled -ftime-report for VRP and jump threading, which usually go hand in hand now that VRP2 runs with ranger:

dumping.42: tree VRP                  :   13.70 (  3%)   0.08 (  2%)   13.73 (  3%)    45M (  4%)
dumping.42: backwards jump threading  :   26.68 (  5%)   0.00 (  0%)   26.72 (  5%)  3609k (  0%)
dumping.42: TOTAL                     :  524.00          3.31          527.30         1277M
dumping.76: tree VRP                  :   38.30 ( 13%)   0.03 (  2%)   38.31 ( 13%)    19M (  2%)
dumping.76: backwards jump threading  :   47.38 ( 17%)   0.01 (  1%)   47.37 ( 16%)  1671k (  0%)
dumping.76: TOTAL                     :  286.03          1.79          287.82         1173M
dumping.24: tree VRP                  :   87.43 (  8%)   0.07 (  2%)   87.53 (  8%)    58M (  3%)
dumping.24: backwards jump threading  :  129.81 ( 12%)   0.00 (  0%)  129.81 ( 12%)  8986k (  0%)
dumping.24: TOTAL                     : 1042.37          3.58         1045.93         2325M

Threading is usually more expensive than VRP because it tries candidates over and over, but it's not meant to be orders of magnitude slower.

Prior to the bisected patch in r12-5228, we had:

dumping.42: tree VRP                  :   14.58 (  3%)   0.07 (  2%)   14.62 (  3%)    45M (  4%)
dumping.42: backwards jump threading  :   13.88 (  3%)   0.00 (  0%)   13.89 (  3%)  3609k (  0%)
dumping.42: TOTAL                     :  484.12          3.06          487.18         1277M
dumping.76: tree VRP                  :   37.68 ( 13%)   0.04 (  2%)   37.79 ( 13%)    19M (  2%)
dumping.76: backwards jump threading  :   45.50 ( 15%)   0.03 (  2%)   45.52 ( 15%)  1671k (  0%)
dumping.76: TOTAL                     :  293.74          1.81          295.55         1173M
dumping.24: tree VRP                  :   94.27 (  9%)   0.11 (  3%)   94.39 (  9%)    58M (  3%)
dumping.24: backwards jump threading  :  102.63 ( 10%)   0.02 (  0%)  102.67 ( 10%)  8986k (  0%)
dumping.24: TOTAL                     : 1021.66          4.28         1025.92         2325M

So at least for ltrans42, there's a big slowdown with this patch. Before, threading was 4.80% faster than VRP, whereas now it's 94.7% slower.

I have a patch for the above slowdown, but I wouldn't characterize the above difference as a "compile hog". When I add up the 3 ltrans unit totals (which are basically the entire compilation), the difference is a 3% slowdown.

If this PR is for a larger than 3-4% slowdown, I think we should look elsewhere. I could be wrong though ;-).
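
For anyone wanting to re-check the percentages quoted above, here is a minimal sketch that recomputes them from the first timing column of each row (usr seconds). The numbers are transcribed from this comment; the dictionary and function names are only illustrative and are not part of GCC or the SPEC harness.

```python
# usr seconds transcribed from the -ftime-report excerpts in this comment.
# Layout: ltrans unit -> (tree VRP, backwards jump threading, TOTAL)
after_patch = {
    "42": (13.70, 26.68, 524.00),
    "76": (38.30, 47.38, 286.03),
    "24": (87.43, 129.81, 1042.37),
}
before_patch = {
    "42": (14.58, 13.88, 484.12),
    "76": (37.68, 45.50, 293.74),
    "24": (94.27, 102.63, 1021.66),
}


def threading_vs_vrp(vrp, threading):
    """Percent by which threading is slower (+) or faster (-) than VRP."""
    return (threading - vrp) / vrp * 100.0


def total_slowdown(before, after):
    """Percent change in the summed TOTAL time across all units."""
    b = sum(total for _, _, total in before.values())
    a = sum(total for _, _, total in after.values())
    return (a - b) / b * 100.0


if __name__ == "__main__":
    for unit in after_patch:
        old = threading_vs_vrp(*before_patch[unit][:2])
        new = threading_vs_vrp(*after_patch[unit][:2])
        print(f"ltrans{unit}: threading vs VRP {old:+.2f}% -> {new:+.2f}%")
    overall = total_slowdown(before_patch, after_patch)
    print(f"summed TOTAL change: {overall:+.2f}%")
```

For ltrans42 this gives roughly -4.8% before and +94.7% after, and about 2.9% for the summed totals, which matches the figures above.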