I've created a branch for my allocator project which is called Yet
Another Register Allocator (or YARA - yet another recursive acronim).

 I am think I reached the point when my work on a public branch can
be made.  I am focused only on x86 right now.  So YARA will work only
for x86 and probably for x86_64 for some tests (at least for SPEC).
There is no way that it will work for PPC, Itanium or other targets
for now.

 I wanted to report the current state of YARA on coming gcc summit.
Unfortunately my proposal was rejected by the review committee.
Therefore I try to describe YARA briefly here to prevent questions for
me (sorry I can not spend much time on answering the questions).

1. Changes in the infrastructure in comparison with the report on
  previous gcc summit.

 o Copy activeness has been removed because it complicates the
   implementation a lot.

 o More coarse grain allocnos which is now close to store ranges (see
   for example, Kolte's article load/store range analysis for global
   register allocation).

 o New notion CAN (set of connected allocnos) is added.

2. Currently YARA has the following passes (in the same order as they work).

 o build IR (allocnos, copies, allocno-allocno conflicts,
             allocno-copy conflicts).
 o GVN (global value numbering) and removing conflicts for allocnos
   with the same value.
 o Building CANs and CAN conflicts
 o optimistic and extended CAN coalescing (optional)
 o Calculating register costs and classes for CAN (only classes from
   class cover set are used.  Cover set is a set of intersected
   classes covering all register classes.  The set is calculated such
   way that moving between the register classes included in a cover
   set class is cheaper than register-memory moves).
 o Briggs CAN coloring (optimistic and biased coloring) with
   optional live range splitting during coloring (it is analogous to
   splitting in Chow's article about priority based coloring or
   Bergner's article).
 o Optional Chow's priority based coloring is implemented too but
   it is practically debugged because the code is a bit worse than
   one generated by Briggs coloring.
 o Local allocation done on allocno level: satisfying insn constraint,
   register elimination, allocno spilling.  This pass does what reload
   does in the current gcc.
 o Coalescing on allocno level.
 o Synchronization optimization (analogous to one described in ...):
   redundant move removing and changing ld/st by register moves.
 o RTL rewriting

 YARA still uses old allocator passes (regclass and post-reload) for
some reasons but I hope finally it will not needed.

3. x86 YARA on SPEC

SPEC was tested on 3.2Ghz Pentium4 on gcc mainline as of Dec. 2005
with option -O2 -mtune=pentium4 (for base) and -O2 -mtune=pentium4
-fyara (for peak).

The results are the following: no improvement on SPECINT2000, 4-5%
improvement for SPECFP2000, code size is practically the same,
compiler 8% slower on SPECINT (in other words YARA is 2 times slower
than the current gcc allocator, SPECFP has some anomaly tests where
compiler 40% slower).  You can find more info about SPEC at the email
end.

I have 3% improvement of SPECFP for x86_64 (on Intel Nocona, sorry I
have no free AMD machine and I think the results for AMD will be
worse) but I had no time to work on its improvement.

If you are interesting where YARA spend more time:

It is local allocator (about 40%) and building IR (15%).  So they might
be rewritten (again).

YARA is a long project so please don't expect that it will be in main
line soon.  Moreover even if YARA is successful don't expect reload
will be gone after that (all target ports are bound to reload very
tight that is what can I say after long work on the allocator).

What I am going to do in short perspective is
 o work on code quality of some SPECINT tests (e.g. reload is doing
   better job for crafty with many multi-registers than YARA)
 o removing regressions for x86_64
 o make it work for PPC and Itanium
 o tuning for x86_64 (now YARA is very bad when a lot of hard registers
   used in RTL that what differs x86_64 code from x86 one) and than
   other platforms
 o work on its speeding up

In long perspective I am going to add more passes:
 o rematerialization (at least for const and bp+const that is what
   reload does now).
 o register pressure relief (splitting during coloring does not see
   CFG well).
 o Uncoalescing during splitting (for better splitting decision
   during coloring).
 o Chameleon intervals: instead of

   M:=R0                   Swap R0, R1
   R0:=R1     generate     Use R0
   use R0                  Swap R0, R1
   R0:=M

 o solving debug information problem (too many location where the
   variable can be placed)


                                 Estimated                     Estimated
               Base      Base      Base      Peak      Peak      Peak
Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
164.gzip          1400     149         942*     1400     144         971*
175.vpr           1400     236         594*     1400     236         592*
176.gcc           1100      96.4      1141*     1100      96.7      1138*
181.mcf           1800     262         688*     1800     261         689*
186.crafty        1000     100         998*     1000     107         932*
197.parser        1800     232         776*     1800     228         791*
252.eon           1300     172         755*     1300     173         750*
253.perlbmk       1800     153        1174*     1800     135        1338*
254.gap           1100     103        1065*     1100     106        1038*
255.vortex        1900     175        1087*     1900     175        1085*
256.bzip2         1500     196         764*     1500     199         756*
300.twolf         3000     376         798*     3000     390         770*
Est. SPECint_base2000                  879
Est. SPECint2000                                                     881


                                 Estimated                     Estimated
               Base      Base      Base      Peak      Peak      Peak
Benchmarks    Ref Time  Run Time   Ratio    Ref Time  Run Time   Ratio
------------  --------  --------  --------  --------  --------  --------
168.wupwise       1600       174       919*     1600       156      1027*
171.swim          3100       471       659*     3100       458       678*
172.mgrid         1800       273       660*     1800       283       637*
173.applu         2100       235       894*     2100       227       925*
177.mesa          1400       181       775*     1400       185       757*
178.galgel        2900       550       527*     2900       548       529*
179.art           2600       689       377*     2600       695       374*
183.equake        1300       108      1204*     1300       107      1216*
187.facerec       1900       370       514*     1900       254       748*
188.ammp          2200       433       508*     2200       426       516*
189.lucas         2000       247       808*     2000       216       924*
191.fma3d         2100       331       635*     2100       310       677*
200.sixtrack      1100       197       557*     1100       208       529*
301.apsi          2600       399       652*     2600       398       654*
Est. SPECfp_base2000                   664
Est. SPECfp2000                                                      696


Code size (text segment)

----------------CINT2000-----------------
1.047%          33613          33965 164.gzip
-0.785%         130485         129461 175.vpr
-1.212%        1248740        1233604 176.gcc
2.131%           9760           9968 181.mcf
3.360%         196677         203285 186.crafty
-0.129%          86944          86832 197.parser
3.250%         419854         433499 252.eon
-0.353%         466461         464813 253.perlbmk
-0.155%         424017         423361 254.gap
0.267%         540252         541692 255.vortex
-0.149%          28857          28814 256.bzip2
-2.669%         187603         182595 300.twolf
Average = 0.306833%
----------------CFP2000-----------------
-2.888%          25483          24747 168.wupwise
0.106%           7512           7520 171.swim
-1.768%          12671          12447 172.mgrid
3.579%          46718          48390 173.applu
0.456%         452877         454941 177.mesa
2.806%         236603         243243 178.galgel
0.936%          13246          13370 179.art
0.464%          17246          17326 183.equake
-3.257%          60917          58933 187.facerec
2.970%         106134         109286 188.ammp
-11.536%          45909          40613 189.lucas
3.402%         954621         987093 191.fma3d
-2.581%         831071         809623 200.sixtrack
-0.088%         109703         109607 301.apsi
Average = -0.435182%

Reply via email to