Here are SPEC CPU 2000 results with plain trunk and the two alias-oracle
patches.  Base results are plain -O3 -ffast-math, peak results include
--param max-fields-for-field-sensitive=0 which effectively disables the
creation of SFTs.

Unpatched (three runs):

                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ========================================================================
   164.gzip          1400   100      1400    *     1400    99.9    1402    *
   175.vpr           1400    80.3    1742    *     1400    80.5    1738    *
   176.gcc           1100    48.1    2288    *     1100    46.8    2353    *
   181.mcf           1800   131      1371    *     1800   131      1370    *
   186.crafty        1000    38.0    2635    *     1000    36.6    2732    *
   197.parser        1800   134      1348    *     1800   133      1353    *
   252.eon                                   X                             X
   253.perlbmk       1800    70.8    2541    *     1800    70.4    2557    *
   254.gap           1100    57.3    1921    *     1100    57.1    1925    *
   255.vortex                                X                             X
   256.bzip2         1500    79.3    1892    *     1500    79.9    1877    *
   300.twolf         3000   114      2635    *     3000   114      2633    *
   Est. SPECint_base2000             1914    
   Est. SPECint2000                                                1927    


                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ========================================================================
   168.wupwise       1600      79.9      2002*     1600      80.0      2000*
   171.swim          3100     155        1999*     3100     155        1999*
   172.mgrid         1800      98.6      1825*     1800      98.4      1829*
   173.applu         2100     178        1178*     2100     178        1181*
   177.mesa          1400      57.8      2421*     1400      58.1      2411*
   178.galgel        2900      69.0      4204*     2900      69.0      4203*
   179.art           2600      34.7      7482*     2600      34.1      7617*
   183.equake        1300      74.1      1755*     1300      74.0      1757*
   187.facerec       1900      75.3      2523*     1900      75.3      2522*
   188.ammp          2200     119        1845*     2200     119        1843*
   189.lucas         2000     119        1688*     2000     118        1697*
   191.fma3d         2100     132        1590*     2100     131        1598*
   200.sixtrack      1100     120         919*     1100     120         918*
   301.apsi          2600     171        1518*     2600     172        1509*
   Est. SPECfp_base2000                  2029
   Est. SPECfp2000                                                     2032
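
For readers who want to check the aggregate lines: a minimal sketch,
assuming the usual SPEC CPU2000 convention that the estimated score is
the geometric mean of the per-benchmark ratios over the benchmarks that
actually ran.  The dictionary and function names are mine, for
illustration only; the numbers are the unpatched SPECint base ratios
from the table above.

```python
import math

# Base ratios for the unpatched SPECint run, copied from the table above.
# 252.eon and 255.vortex are marked X (no result) and are left out.
unpatched_int_base = {
    "164.gzip": 1400,
    "175.vpr": 1742,
    "176.gcc": 2288,
    "181.mcf": 1371,
    "186.crafty": 2635,
    "197.parser": 1348,
    "253.perlbmk": 2541,
    "254.gap": 1921,
    "256.bzip2": 1892,
    "300.twolf": 2635,
}


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid a huge product."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


estimate = geometric_mean(unpatched_int_base.values())
print(f"Est. SPECint_base2000 ~ {estimate:.0f}")
```

The result should land close to the reported 1914; small deviations are
expected because the table ratios are already rounded.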


Patched (three runs):


                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ========================================================================
   164.gzip          1400   100      1400    *     1400    99.9    1401    *
   175.vpr           1400    80.0    1751    *     1400    80.1    1749    *
   176.gcc           1100    47.4    2319    *     1100    46.8    2352    *
   181.mcf           1800   133      1358    *     1800   133      1349    *
   186.crafty        1000    37.6    2656    *     1000    36.8    2718    *
   197.parser        1800   133      1350    *     1800   133      1349    *
   252.eon                                   X                             X
   253.perlbmk       1800    70.4    2557    *     1800    70.0    2573    *
   254.gap           1100    57.3    1918    *     1100    57.4    1918    *
   255.vortex                                X                             X
   256.bzip2         1500    79.9    1877    *     1500    80.6    1862    *
   300.twolf         3000   114      2641    *     3000   114      2638    *
   Est. SPECint_base2000             1918    
   Est. SPECint2000                                                1923    


                                     Estimated                     Estimated
                   Base      Base      Base      Peak      Peak      Peak
   Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
   ========================================================================
   168.wupwise       1600      80.2      1995*     1600      80.1      1998*
   171.swim          3100     156        1993*     3100     155        1994*
   172.mgrid         1800      98.7      1824*     1800      98.6      1826*
   173.applu         2100     178        1178*     2100     178        1178*
   177.mesa          1400      57.8      2422*     1400      57.9      2417*
   178.galgel        2900      69.3      4188*     2900      69.2      4191*
   179.art           2600      36.8      7063*     2600      33.5      7762*
   183.equake        1300      74.0      1756*     1300      74.1      1754*
   187.facerec       1900      76.0      2500*     1900      74.0      2569*
   188.ammp          2200     119        1846*     2200     119        1845*
   189.lucas         2000     117        1706*     2000     117        1703*
   191.fma3d         2100     130        1612*     2100     129        1633*
   200.sixtrack      1100     120         920*     1100     119         921*
   301.apsi          2600     173        1505*     2600     174        1498*
   Est. SPECfp_base2000                  2020
   Est. SPECfp2000                                                     2039


You can see that in both cases the runs without SFTs are significantly
better(!), which hints that we do a poor job with partitioning and/or
that partitioning triggers earlier with SFTs enabled.

The oracle patches are able to slightly improve the results in the non-SFT
case, but overall there is less difference patched vs. unpatched compared
to the differences that result if you disable SFTs.
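
To put rough numbers on the two comparisons, here is a small sketch that
computes relative differences from the aggregate scores in the four
tables.  The names are hypothetical and the script only restates the
reported aggregates; it says nothing about run-to-run noise.

```python
# Aggregate scores copied from the tables above:
# (base = with SFTs, peak = SFTs disabled).
aggregates = {
    ("unpatched", "SPECint"): (1914, 1927),
    ("unpatched", "SPECfp"): (2029, 2032),
    ("patched", "SPECint"): (1918, 1923),
    ("patched", "SPECfp"): (2020, 2039),
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Effect of disabling SFTs within each build (peak vs. base).
sft_gain = {
    key: percent_change(base, peak) for key, (base, peak) in aggregates.items()
}
for (build, suite), gain in sft_gain.items():
    print(f"{build:9s} {suite:7s} no-SFT vs. SFT: {gain:+.2f}%")

# Effect of the oracle patches with SFTs disabled (peak columns only).
patch_gain = {
    suite: percent_change(
        aggregates[("unpatched", suite)][1], aggregates[("patched", suite)][1]
    )
    for suite in ("SPECint", "SPECfp")
}
for suite, gain in patch_gain.items():
    print(f"{suite:7s} patched vs. unpatched, no SFTs: {gain:+.2f}%")
```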

If you compare test results with SFTs disabled, unpatched vs. patched, you
can see that the oracle patches can retain optimizations that were only
possible with SFTs previously (uninteresting parts snipped, full testsuite
for all default languages was run, -m32 results only if they differ
from -m64 results):

unpatched, SFTs disabled:

                === g++ tests ===


Running target unix/
FAIL: g++.dg/torture/pr34850.C  -O0   (test for warnings, line 14)
FAIL: g++.dg/torture/pr34850.C  -O1   (test for warnings, line 14)
FAIL: g++.dg/torture/pr34850.C  -O2   (test for warnings, line 14)
FAIL: g++.dg/torture/pr34850.C  -O3 -fomit-frame-pointer   (test for warnings, line 14)
FAIL: g++.dg/torture/pr34850.C  -O3 -g   (test for warnings, line 14)
FAIL: g++.dg/torture/pr34850.C  -Os   (test for warnings, line 14)

                === g++ Summary for unix/ ===

# of expected passes            17440
# of unexpected failures        6
# of expected failures          82
# of unsupported tests          119

                === gcc tests ===


Running target unix/
FAIL: gcc.dg/tree-ssa/alias-10.c scan-tree-dump optimized "return 3;"
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump salias "SFT.5 created for var m offset 128"
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump-times salias "VUSE <SFT.5_" 2
FAIL: gcc.dg/tree-ssa/alias-3.c scan-tree-dump optimized "return 1;"
FAIL: gcc.dg/tree-ssa/alias-4.c scan-tree-dump optimized "return 1;"
FAIL: gcc.dg/tree-ssa/alias-5.c scan-tree-dump optimized "return 1;"
FAIL: gcc.dg/tree-ssa/ldist-4.c scan-tree-dump-times ldist "distributed: split to 2 loops" 0
FAIL: gcc.dg/tree-ssa/loadpre8.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/pr26421.c scan-tree-dump-times salias "VDEF" 4
FAIL: gcc.dg/tree-ssa/salias-1.c scan-tree-dump-times salias "structure field tag SFT" 2
FAIL: gcc.dg/tree-ssa/structopt-1.c scan-tree-dump-times lim "Executing store motion of global.y" 1
FAIL: gcc.dg/tree-ssa/structopt-2.c scan-tree-dump-times optimized "a.e" 0
FAIL: gcc.dg/tree-ssa/structopt-2.c scan-tree-dump-times optimized "a.f" 0
FAIL: gcc.dg/tree-ssa/structopt-2.c scan-tree-dump-times optimized "a.g" 0
FAIL: gcc.dg/tree-ssa/structopt-3.c scan-tree-dump-times optimized "return 11" 1

                === gcc Summary ===

# of expected passes            97489
# of unexpected failures        41
# of expected failures          335
# of untested testcases         70
# of unsupported tests          839
/space/rguenther/obj/gcc/xgcc  version 4.4.0 20080304 (experimental) (GCC) 


patched, SFTs disabled:

                === g++ tests ===


Running target unix/
FAIL: g++.dg/tree-ssa/pr34355.C (test for excess errors)

                === g++ Summary for unix/ ===

# of expected passes            17445
# of unexpected failures        1
# of expected failures          82
# of unsupported tests          119

                === gcc tests ===


Running target unix/
FAIL: gcc.dg/autopar/parallelization-1.c (internal compiler error)
FAIL: gcc.dg/autopar/parallelization-1.c (test for excess errors)
FAIL: gcc.dg/autopar/parallelization-1.c scan-tree-dump-times final_cleanup "loopfn" 5
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump salias "SFT.5 created for var m offset 128"
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump-times salias "VUSE <SFT.5_" 2
FAIL: gcc.dg/tree-ssa/ldist-4.c scan-tree-dump-times ldist "distributed: split to 2 loops" 0
FAIL: gcc.dg/tree-ssa/loop-32.c scan-tree-dump-times lim "Executing store motion of" 7
FAIL: gcc.dg/tree-ssa/pr26421.c scan-tree-dump-times salias "VDEF" 4
FAIL: gcc.dg/tree-ssa/salias-1.c scan-tree-dump-times salias "structure field tag SFT" 2

                === gcc Summary for unix/ ===

# of expected passes            48691
# of unexpected failures        15
# of expected failures          166
# of untested testcases         35
# of unsupported tests          478

Running target unix//-m32
FAIL: gcc.dg/autopar/parallelization-1.c (internal compiler error)
FAIL: gcc.dg/autopar/parallelization-1.c (test for excess errors)
FAIL: gcc.dg/autopar/parallelization-1.c scan-tree-dump-times final_cleanup "loopfn" 5
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump salias "SFT.5 created for var m offset 128"
FAIL: gcc.dg/tree-ssa/alias-15.c scan-tree-dump-times salias "VUSE <SFT.5_" 2
FAIL: gcc.dg/tree-ssa/pr26421.c scan-tree-dump-times salias "VDEF" 4
FAIL: gcc.dg/tree-ssa/salias-1.c scan-tree-dump-times salias "structure field tag SFT" 2

                === gcc Summary for unix//-m32 ===

# of expected passes            48839
# of unexpected failures        13
# of expected failures          167
# of untested testcases         35
# of unsupported tests          361


Some of the failures with SFTs disabled occur only because the testcases
scan for SFTs in the dumps, which are obviously not available.  Those
tests need to be disabled or adjusted to test the optimization outcome
instead.

Thus, with the above results I propose we disable generating SFTs by
default on the mainline (--param max-fields-for-field-sensitive=100
is still available for comparison).  I will prepare a patch to adjust
the false negative testcases above to check for optimization outcome
as well.

Richard.
