Hello,
I am porting GCC 4.2.1 to our 2-issue VLIW processor and have run into
the following problem.

Source code:
#define MIN(a, b)  (a<b? a: b)
void applySnrClampCo(short *snr, int toneIx, int logGroupSize, short *maxSnrPtr)
{
    snr[toneIx]=snr[toneIx] + 5;
}

After tree level optimization:

applySnrClampCo (snr, toneIx, logGroupSize, maxSnrPtr)
{
  short int * D.1510;

<bb 2>:
  D.1510 = snr + (short int *) ((unsigned int) toneIx * 2);
  *D.1510 = (short int) ((short unsigned int) *D.1510 + 5);
  return;

}
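
For readers less used to tree dumps, the statements above correspond
roughly to the hand-written C below. This is only a sketch: the name
apply_hoisted and the temporary p are illustrative and do not come from
GCC's output, and the two unused parameters are dropped.

```c
/* Hand-written C equivalent of the optimized tree dump (illustrative names). */
void apply_hoisted(short *snr, int toneIx)
{
    /* D.1510 = snr + toneIx * 2 bytes: the address is computed once ... */
    short *p = (short *)((char *)snr + (unsigned int)toneIx * 2);

    /* ... and the same pointer is then used for both the load and the store. */
    *p = (short)((unsigned short)*p + 5);
}
```

Calling apply_hoisted(snr, 2) on an array holding {10, 20, 30, 40}
leaves 35 in snr[2], the same result as the original function.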


Note that D.1510 has been extracted as a common subexpression.

The final assembly code looks like this:

        addw r1, r1, r1
        addw r0, r0, r1
        ldh r1, [r0]
        addh r1, r1, #0x5
        sbl [link]      :       sth r1, [r0]


However, the compiler should have generated the following code:

        ldh r2, [r0, r1 << 1]
        addh r2, r2, #0x5
        sbl [link]      :       sth r2, [r0, r1 << 1]

This saves not only two cycles but also two instructions.
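
To make the intended form concrete, here is a rough C model of it. It is
a sketch under assumptions: load_h and store_h are hypothetical helpers
that each stand for one halfword access using the "[base, index << 1]"
addressing mode, and apply_indexed is an illustrative name. The base and
index stay separate, so no address is materialized in a register.

```c
/* Hypothetical model of a halfword load with a scaled register offset. */
static short load_h(const short *base, int index)
{
    return *(const short *)((const char *)base + ((unsigned int)index << 1));
}

/* Hypothetical model of a halfword store with a scaled register offset. */
static void store_h(short *base, int index, short value)
{
    *(short *)((char *)base + ((unsigned int)index << 1)) = value;
}

/* Same computation as the source function, one addressed access per step. */
void apply_indexed(short *snr, int toneIx)
{
    short v = load_h(snr, toneIx);          /* load:  [snr, toneIx << 1] */
    v = (short)((unsigned short)v + 5);     /* 16-bit add of 5           */
    store_h(snr, toneIx, v);                /* store: [snr, toneIx << 1] */
}
```

The result is identical to the version with the address computed up
front; the only difference is whether the shift and add are folded into
the memory accesses or issued as separate instructions.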


The trouble is with the common subexpression elimination in the
tree-level optimizers. The instruction set supports this scaled
register-offset addressing mode at no extra cost, and at the RTL level
it is difficult to reverse the optimization. Our GCC 3.4.6-based port
actually generates the latter code. How can I avoid CSE in this
situation? Any suggestions are greatly appreciated.


Cheers,
Bingfeng Mei
Broadcom UK
