[Bug tree-optimization/103409] [12 Regression] 18% SPEC2017 WRF compile-time regression with -O2 -flto since r12-5228-gb7a23949b0dcc4205fcc2be6b84b91441faa384d

aldyh at gcc dot gnu.org via Gcc-bugs Mon, 29 Nov 2021 06:22:49 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103409


--- Comment #9 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
There's definitely something in the threader, but I'm not sure it's the cause
of all the regression.

For the record, I've reproduced on ppc64le with a spec .cfg file having:

OPTIMIZE    = -O2 -flto=100 -save-temps -ftime-report -v -fno-checking

The slow wrf_r.ltransNN.o files that dominate the compilation, each taking
more than 2-3 seconds, are 42, 76, and 24.  I've distilled the -ftime-report
output for VRP and jump threading, which usually go hand in hand now that VRP2
runs with ranger:

dumping.42: tree VRP                           :  13.70 (  3%)   0.08 (  2%)   13.73 (  3%)    45M (  4%)
dumping.42: backwards jump threading           :  26.68 (  5%)   0.00 (  0%)   26.72 (  5%)  3609k (  0%)
dumping.42: TOTAL                              : 524.00          3.31         527.30         1277M
dumping.76: tree VRP                           :  38.30 ( 13%)   0.03 (  2%)   38.31 ( 13%)    19M (  2%)
dumping.76: backwards jump threading           :  47.38 ( 17%)   0.01 (  1%)   47.37 ( 16%)  1671k (  0%)
dumping.76: TOTAL                              : 286.03          1.79         287.82         1173M
dumping.24: tree VRP                           :  87.43 (  8%)   0.07 (  2%)   87.53 (  8%)    58M (  3%)
dumping.24: backwards jump threading           : 129.81 ( 12%)   0.00 (  0%)  129.81 ( 12%)  8986k (  0%)
dumping.24: TOTAL                              :1042.37          3.58        1045.93         2325M

Threading is usually more expensive than VRP because it tries candidates over
and over, but it's not meant to be orders of magnitude slower.  Prior to the
bisected patch in r12-5228, we had:

dumping.42: tree VRP                           :  14.58 (  3%)   0.07 (  2%)   14.62 (  3%)    45M (  4%)
dumping.42: backwards jump threading           :  13.88 (  3%)   0.00 (  0%)   13.89 (  3%)  3609k (  0%)
dumping.42: TOTAL                              : 484.12          3.06         487.18         1277M
dumping.76: tree VRP                           :  37.68 ( 13%)   0.04 (  2%)   37.79 ( 13%)    19M (  2%)
dumping.76: backwards jump threading           :  45.50 ( 15%)   0.03 (  2%)   45.52 ( 15%)  1671k (  0%)
dumping.76: TOTAL                              : 293.74          1.81         295.55         1173M
dumping.24: tree VRP                           :  94.27 (  9%)   0.11 (  3%)   94.39 (  9%)    58M (  3%)
dumping.24: backwards jump threading           : 102.63 ( 10%)   0.02 (  0%)  102.67 ( 10%)  8986k (  0%)
dumping.24: TOTAL                              :1021.66          4.28        1025.92         2325M

So at least for ltrans42, there's a big slowdown with this patch.  Before,
threading was 4.80% faster than VRP, whereas now it's 94.7% slower.
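
For anyone wanting to check those two percentages, here is a minimal sketch of
the arithmetic.  It assumes the figures are the first time column of the
ltrans42 rows quoted above, and that "faster/slower" is measured relative to
the VRP time; the helper name relative_to is made up for illustration.

```python
# Times in seconds, copied from the first column of the ltrans42 rows above.
def relative_to(baseline, value):
    """Percentage by which `value` differs from `baseline` (positive = slower)."""
    return (value - baseline) / baseline * 100.0

before = {"vrp": 14.58, "threader": 13.88}   # prior to r12-5228
after = {"vrp": 13.70, "threader": 26.68}    # with r12-5228

before_pct = relative_to(before["vrp"], before["threader"])
after_pct = relative_to(after["vrp"], after["threader"])

print(f"before: threader vs VRP {before_pct:+.2f}%")   # about -4.80%
print(f"after:  threader vs VRP {after_pct:+.1f}%")    # about +94.7%
```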

I have a patch for the above slowdown, but I wouldn't characterize the above
difference as a "compile hog".  When I add up the 3 ltrans unit totals (which
are basically the entire compilation), the difference is a 3% slowdown.
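
The 3% figure can be reproduced roughly as follows.  This is only a sketch: it
assumes the comparison uses the first column of the TOTAL rows quoted above,
and the dictionary names are made up for illustration.

```python
# TOTAL rows (first time column, seconds) for the three slow ltrans units.
totals_after = {"ltrans42": 524.00, "ltrans76": 286.03, "ltrans24": 1042.37}
totals_before = {"ltrans42": 484.12, "ltrans76": 293.74, "ltrans24": 1021.66}

sum_after = sum(totals_after.values())
sum_before = sum(totals_before.values())
slowdown_pct = (sum_after - sum_before) / sum_before * 100.0

# Per-unit change, to show which units moved and in which direction.
for unit in totals_after:
    delta = (totals_after[unit] - totals_before[unit]) / totals_before[unit] * 100.0
    print(f"{unit}: {delta:+.1f}%")

print(f"combined: {sum_before:.2f}s -> {sum_after:.2f}s ({slowdown_pct:+.1f}%)")
```

The combined slowdown comes out at roughly 2.9%, which is where the 3% above
comes from.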

If this PR is for a larger than 3-4% slowdown, I think we should look
elsewhere.  I could be wrong though ;-).
