https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409

--- Comment #9 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
There's definitely something in the threader, but I'm not sure it's the cause
of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE    = -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation, each taking
more than 2-3 seconds, are 42, 76, and 24.  I've distilled -ftime-report for
VRP and jump threading, which usually go hand in hand now that VRP2 runs with
ranger:

dumping.42: tree VRP                           :  13.70 (  3%)   0.08 (  2%)   13.73 (  3%)    45M (  4%)
dumping.42: backwards jump threading           :  26.68 (  5%)   0.00 (  0%)   26.72 (  5%)  3609k (  0%)
dumping.42: TOTAL                              : 524.00          3.31          527.30         1277M
dumping.76: tree VRP                           :  38.30 ( 13%)   0.03 (  2%)   38.31 ( 13%)    19M (  2%)
dumping.76: backwards jump threading           :  47.38 ( 17%)   0.01 (  1%)   47.37 ( 16%)  1671k (  0%)
dumping.76: TOTAL                              : 286.03          1.79          287.82         1173M
dumping.24: tree VRP                           :  87.43 (  8%)   0.07 (  2%)   87.53 (  8%)    58M (  3%)
dumping.24: backwards jump threading           : 129.81 ( 12%)   0.00 (  0%)  129.81 ( 12%)  8986k (  0%)
dumping.24: TOTAL                              :1042.37          3.58         1045.93         2325M
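
For anyone wanting to redo the distilling step, here is a minimal sketch of
pulling selected rows out of raw -ftime-report text.  It assumes the usual
row layout (name, then usr, sys and wall times, then memory) and rows without
the "dumping.NN:" prefix shown above; the ROW pattern and the distill() helper
are hypothetical names of mine, not part of GCC.

```python
import re

# Matches "<name> : <usr> (p%) <sys> (p%) <wall> ...".  The percentage
# groups are optional because the TOTAL row does not carry them.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*(?:\(\s*\d+%\))?\s+"
    r"(?P<sys>\d+\.\d+)\s*(?:\(\s*\d+%\))?\s+"
    r"(?P<wall>\d+\.\d+)"
)

def distill(report, wanted):
    """Return {row name: (usr, sys, wall)} for the rows named in `wanted`."""
    rows = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m and m.group("name") in wanted:
            rows[m.group("name")] = tuple(
                float(m.group(k)) for k in ("usr", "sys", "wall")
            )
    return rows

# Sample rows copied from the ltrans42 numbers above.
sample = """\
 tree VRP                           :  13.70 (  3%)   0.08 (  2%)   13.73 (  3%)    45M (  4%)
 backwards jump threading           :  26.68 (  5%)   0.00 (  0%)   26.72 (  5%)  3609k (  0%)
 TOTAL                              : 524.00          3.31          527.30         1277M
"""

for name, times in distill(sample, {"tree VRP", "backwards jump threading", "TOTAL"}).items():
    print(name, times)
```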

Threading is usually more expensive than VRP because it tries candidates over
and over, but it's not meant to be orders of magnitude slower.  Prior to the
bisected patch in r12-5228, we had:

dumping.42: tree VRP                           :  14.58 (  3%)   0.07 (  2%)   14.62 (  3%)    45M (  4%)
dumping.42: backwards jump threading           :  13.88 (  3%)   0.00 (  0%)   13.89 (  3%)  3609k (  0%)
dumping.42: TOTAL                              : 484.12          3.06          487.18         1277M
dumping.76: tree VRP                           :  37.68 ( 13%)   0.04 (  2%)   37.79 ( 13%)    19M (  2%)
dumping.76: backwards jump threading           :  45.50 ( 15%)   0.03 (  2%)   45.52 ( 15%)  1671k (  0%)
dumping.76: TOTAL                              : 293.74          1.81          295.55         1173M
dumping.24: tree VRP                           :  94.27 (  9%)   0.11 (  3%)   94.39 (  9%)    58M (  3%)
dumping.24: backwards jump threading           : 102.63 ( 10%)   0.02 (  0%)  102.67 ( 10%)  8986k (  0%)
dumping.24: TOTAL                              :1021.66          4.28         1025.92         2325M

So at least for ltrans42, there's a big slowdown with this patch.  Before,
threading was 4.80% faster than VRP, whereas now it's 94.7% slower.
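
The two percentages can be rechecked from the ltrans42 rows.  This is only a
sketch of the arithmetic, using the first time column of each row; the
pct_vs_vrp() helper is a hypothetical name for illustration.

```python
# First-column times (seconds) for ltrans42, copied from the reports above.
before = {"vrp": 14.58, "threader": 13.88}  # prior to r12-5228
after = {"vrp": 13.70, "threader": 26.68}   # with r12-5228

def pct_vs_vrp(times):
    """Threader time relative to VRP, as a signed percentage of VRP time."""
    return (times["threader"] - times["vrp"]) / times["vrp"] * 100.0

print(f"before: {pct_vs_vrp(before):+.2f}%")  # negative: threader faster than VRP
print(f"after:  {pct_vs_vrp(after):+.2f}%")   # positive: threader slower than VRP
```

A negative result means the threader took less time than VRP, which is the
4.80% figure; the positive result corresponds to the 94.7% figure.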

I have a patch for the above slowdown, but I wouldn't characterize the above
difference as a "compile hog".  When I add up the 3 ltrans unit totals (which
are basically the entire compilation), the difference is a 3% slowdown.
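
The 3% figure follows from summing the TOTAL rows of the three units.  A
short sketch of that sum, again using the first time column (the variable
names are mine):

```python
# TOTAL rows (first time column, seconds) for ltrans units 42, 76 and 24.
totals_before = {"42": 484.12, "76": 293.74, "24": 1021.66}  # prior to r12-5228
totals_after = {"42": 524.00, "76": 286.03, "24": 1042.37}   # with r12-5228

sum_before = sum(totals_before.values())
sum_after = sum(totals_after.values())

# Relative slowdown of the combined ltrans time, in percent.
slowdown = (sum_after - sum_before) / sum_before * 100.0

print(f"before: {sum_before:.2f}s  after: {sum_after:.2f}s  slowdown: {slowdown:.1f}%")
```

Note that ltrans76 actually got slightly faster in these runs, so the overall
difference is driven by units 42 and 24.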

If this PR is for a larger than 3-4% slowdown, I think we should look
elsewhere.  I could be wrong though ;-).
