16 Mar 06 notes from GCC improvement for Itanium conference call

Mark K. Smith Wed, 22 Mar 2006 13:29:18 -0800

ON THE CALL: Shin-ming Liu (HP), Vladimir Makarov (Red Hat), Mark
Smith (Gelato), Bob Kidd (UIUC), Andrey Belevantsev (RAS), Arutyun
Avetisyan (RAS), Mark Davis (Intel)


Diego Novillo (Red Hat) was unable to join the call, but supplied an
update to include in these notes.

The GCC track at the upcoming Gelato ICE conference is now finalized.
Gerolf Hoflehner's talk on SPEC2006 had to be canceled because the
release of SPEC2006 has been delayed. A new addition to the GCC track is
Arutyun Avetisyan, who will give an overview of the RAS work and begin
soliciting input for the August 2006 GCC meeting in Moscow. Confirmed
topics/speakers for the Gelato ICE GCC track include:

* Russian Academy of Sciences work overview and plans for the August GCC
meeting in Moscow - Arutyun Avetisyan
* GCC IP issues - Dan Berlin
* LLVM - Chris Lattner
* LTO - Mark Mitchell
* ORC back end for GCC - Shin-ming Liu
* Aliasing update - Dan Berlin
* Russian Academy of Sciences scheduler improvement update - Andrey
Belevantsev
* Superblock work - Bob Kidd
* Parallel programming with GCC - Diego Novillo
* Intel micro-architecture talk - Cameron McNairy

For a detailed list of confirmed speakers and topics for Gelato ICE
2006, visit: www.gelato.org/meeting#agenda

Updates from call participants can be found below.

NEXT MEETING: At the Gelato ICE meeting in San Jose, CA, April 24-26,
2006.

Andrey Belevantsev:
-------------------
Testing the aliasing patch with the latest mainline revealed changes in
structure aliasing, so we had to rewrite some of the code that handles
variables with structure field tags (SFTs). Small arrays can now also be
decomposed into their elements for the sake of better aliasing. We also
made the propagation of the original tree expressions that are saved
with MEMs during expand more accurate. We have sent an updated patch to
the gcc-patches list.
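As a source-level illustration only (GCC does this internally on its own
representation, and the function names below are hypothetical): when a
small array is treated as one memory object, a store to t[0] counts as a
possible modification of t[1]. Decomposing the array gives each element
its own identity, which is equivalent to rewriting it as separate
scalars, so a store to one element can no longer be assumed to clobber
another.

```c
#include <stddef.h>

/* Hypothetical example: a small local array indexed only by constants.
   Viewed as a single object, every store to t[] may affect every load. */
int mix_array(const int *p, size_t n)
{
    int t[2] = {0, 0};
    for (size_t i = 0; i < n; i++) {
        t[0] += p[i];   /* running sum */
        t[1] ^= p[i];   /* running XOR */
    }
    return t[0] - t[1];
}

/* The same computation with t[] decomposed into one scalar per element.
   sum and x are distinct objects, so a store to one cannot clobber the
   other. */
int mix_scalars(const int *p, size_t n)
{
    int sum = 0, x = 0;
    for (size_t i = 0; i < n; i++) {
        sum += p[i];
        x ^= p[i];
    }
    return sum - x;
}
```

Both functions compute the same result; only the second makes the
independence of the two elements explicit.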

Vladimir Makarov has approved the speculation patch and provided
comments on the ia64 part of the patch. We have fixed all the issues
Vladimir pointed out. After additional regtesting on ia64 and i686, the
patch was committed to trunk as rev. 112129. An earlier version of the
patch was also bootstrapped and regtested on sparc-solaris. Using the
patch on other platforms revealed some bugs (PR26275 and PR26734); fixes
for those PRs have been submitted to the list.

This month we tested the basic features of code motion. To do this, we
wrote the main scheduling loop. A single iteration of the scheduling
loop tries to form a group of instructions that can be executed in
parallel during one cycle (this roughly corresponds to an IA-64
instruction group). First, we tested code motion of entire instructions
within a basic block. Now we are testing interblock motion, which may
require creating bookkeeping code. Code motion of conditional branches
is currently disabled. Our next step is to enable code motion of the
right-hand sides of expressions.
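The notes do not show the scheduler's code. As a rough, hypothetical
sketch of what forming "a group of instructions that can be executed in
parallel during one cycle" means, the C below greedily fills each cycle
with instructions whose inputs were all produced in earlier cycles, up
to an assumed issue width. This is plain list scheduling inside one
block; it is not the RAS scheduler, which also moves code between blocks
and creates bookkeeping copies. The struct insn layout and the
form_groups name are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPS 4

/* Hypothetical instruction record: indices of earlier instructions
   whose results this instruction consumes. */
struct insn {
    int ndeps;
    int deps[MAX_DEPS];
};

/* Greedily form one instruction group per cycle.  An instruction may
   join the current group only if every instruction it depends on was
   placed in an earlier group.  Dependencies must point to lower indices
   (a topological order), which guarantees progress.  On return,
   group[i] holds the cycle of instruction i; the result is the number
   of cycles used. */
int form_groups(const struct insn *insns, int n, int width, int *group)
{
    int scheduled = 0, cycle = 0;

    assert(width > 0);
    for (int i = 0; i < n; i++)
        group[i] = -1;                      /* -1 means not yet placed */

    while (scheduled < n) {
        int issued = 0;
        for (int i = 0; i < n && issued < width; i++) {
            if (group[i] != -1)
                continue;                   /* already in a group */
            bool ready = true;
            for (int d = 0; d < insns[i].ndeps; d++) {
                assert(insns[i].deps[d] < i);
                int g = group[insns[i].deps[d]];
                if (g == -1 || g == cycle) { /* input not available yet */
                    ready = false;
                    break;
                }
            }
            if (ready) {
                group[i] = cycle;
                issued++;
                scheduled++;
            }
        }
        cycle++;
    }
    return cycle;
}
```

For example, with two loads feeding an add, an independent third load,
and a store that uses the add and the third load, an issue width of 2
yields three groups: {load, load}, {add, load}, {store}.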

Last but not least, our paper proposal for GCC Summit 2006 has been
accepted. The paper will describe the new scheduler work, its proposed
design, and the current state of the implementation.


Bob Kidd:
---------
(Bob had his paper proposal for the GCC Summit 2006 accepted. The
paper will cover the GCC superblock work in detail.)

I checked the Superblock patch into the ia64-improvements tree. The
patch has no significant effect on the overall estimated SPEC score for
ia64 or ppc, and causes a slight degradation on x86_64. On IA64, some
benchmarks run faster while others slow down, and the overall score
varies by one point. I'm looking into the changed benchmarks to see what
causes the speedup or slowdown.

I investigated 300.twolf, which slows down when superblocks are formed
at the Tree-SSA level. One function (new_dbox_a) is significantly
slower with the superblock patch than without. This function takes a
pointer to an integer as an argument and updates the value of that
integer inside a hot loop. The loop is structured along these lines:

for (...) {        /* hot loop */
   if (cond)       /* biased */
      a = ...;
   else
      a = ...;
   *arg += a ...;
}

Tail duplication generates two copies of the *arg += ... line, and
therefore two copies of the load and store of *arg. When tail
duplication is not done, PRE can move the load and store of *arg out of
the loop, but it is unable to do this in the superblock loop. My
suspicion is that superblock formation needs to fix up the alias info
so that later optimizers realize these two loads are the same.
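To make the three loop shapes concrete, here is a hypothetical C
rendering. The values assigned to a are elided in the notes, so this
sketch assumes they come from two input arrays x and y, and the function
names are made up. The first function is the original loop; the second
is the same loop after tail duplication, with the *arg update copied
into both arms (two loads and two stores of *arg in the body); the third
is what PRE/store motion achieves on the original loop, loading *arg
once before the loop and storing it once after. That last form is only
valid when nothing else in the loop can alias *arg, which is the fact
the alias information must still convey after superblock formation.

```c
/* Original shape: both arms of the biased branch join before the
   single update of *arg. */
void loop_original(int *arg, const int *cond, const int *x,
                   const int *y, int n)
{
    for (int i = 0; i < n; i++) {
        int a;
        if (cond[i])            /* biased branch */
            a = x[i];
        else
            a = y[i];
        *arg += a;              /* one load and one store of *arg */
    }
}

/* After tail duplication: the join block is copied into each arm, so
   the loop body contains two loads and two stores of *arg. */
void loop_tail_duplicated(int *arg, const int *cond, const int *x,
                          const int *y, int n)
{
    for (int i = 0; i < n; i++) {
        if (cond[i])
            *arg += x[i];
        else
            *arg += y[i];
    }
}

/* What PRE/store motion can produce when *arg is known not to alias
   cond, x or y: one load before the loop, one store after it. */
void loop_promoted(int *arg, const int *cond, const int *x,
                   const int *y, int n)
{
    int acc = *arg;             /* single load hoisted out of the loop */
    for (int i = 0; i < n; i++)
        acc += cond[i] ? x[i] : y[i];
    *arg = acc;                 /* single store sunk below the loop */
}
```

All three compute the same value when *arg does not overlap the input
arrays; they differ only in how many memory accesses to *arg remain
inside the loop.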


Shin-ming Liu
-------------
- HP has posted the GCC 4.1 release binary for HP-UX on the HP portal:
www.hp.com/go/gcc

- HP submitted 11 patches to stock GCC and 3 patches to binutils.

- The alternative back end project has made reasonable progress. The
front end for this compiler is still at 3.3.2. Both C and Fortran are
functional and achieve performance similar to ORC 2.1. The current
focus is to update the back end to support the Itanium C++ ABI.


Vladimir Makarov:
-----------------
Robert Kidd's superblock scheduling in gcc will probably not give an
improvement on x86 and x86_64, because interblock scheduling (before
reload) is switched off for these architectures, mostly because reload
cannot deal with RTL insns containing hard registers that were moved by
the scheduler before reload. So the code will be bigger, with less code
locality, and consequently slower. If superblock scheduling does give an
improvement, it will most probably be on Itanium, which is the least
sensitive to code locality of the architectures I have seen.

From my point of view, the major problem of the gcc scheduler (for
in-order processors like Itanium) is that it runs in the middle of the
back end, and there is insn splitting and a lot of optimization after
it. So I think that the current RAS work on the scheduler is more
promising.

The gcc Itanium port has no description of vector insns, although gcc
now has a vectorizer. I proposed describing them. Mark Davis said that,
in Intel's experience, this would not improve the code. After some
thought, I guessed that this is because ia64 bundles contain a lot of
nops: those slots can instead be filled with non-vector insns, which
execute in the same time as a vector insn sharing a bundle with a lot
of nops.


Diego Novillo:
--------------
- OpenMP has been completely integrated in GCC mainline.

- I will be presenting a design/implementation document on OpenMP at
the next GCC summit.

- We have been working with Dmitry on the tree->rtl alias export
patch. I think we could add it to the ia64-improvements branch
shortly, but I have been a bit sidetracked and haven't been able to
check out their latest version.

- Bob said he was considering doing a mainline->branch merge on the
ia64-improvements branch. I'm not sure whether he's finished that.

- I will continue to work on representation changes for our SSA form
on the mem-ssa branch
(http://gcc.gnu.org/ml/gcc/2006-02/msg00620.html).