Hello, I am porting GCC 4.2.1 to our 2-issue VLIW processor and have encountered the following problem.
Source code:

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    void applySnrClampCo(short *snr, int toneIx, int logGroupSize, short *maxSnrPtr)
    {
      snr[toneIx] = snr[toneIx] + 5;
    }

After tree-level optimization:

    applySnrClampCo (snr, toneIx, logGroupSize, maxSnrPtr)
    {
      short int * D.1510;

    <bb 2>:
      D.1510 = snr + (short int *) ((unsigned int) toneIx * 2);
      *D.1510 = (short int) ((short unsigned int) *D.1510 + 5);
      return;

    }

Note that D.1510 is extracted as a common subexpression. The final assembly code looks like this:

    addw r1, r1, r1
    addw r0, r0, r1
    ldh  r1, [r0]
    addh r1, r1, #0x5
    sbl  [link] : sth r1, [r0]

Instead, we should generate the following code:

    ldh  r2, [r0, r1 << 1]
    addh r2, r2, #0x5
    sbl  [link] : sth r2, [r0, r1 << 1]

This saves not only two cycles but also two instructions. The problem is caused by common subexpression elimination in the tree-level optimizations. The scaled-index address is supported by the instruction set at no extra cost, but once the address has been computed into a separate register, it is difficult to reverse the optimization at the RTL level. In our port based on GCC 3.4.6, the compiler actually generates the latter code.

How can we avoid CSE in this situation? Any suggestion is greatly appreciated.

Cheers,
Bingfeng Mei
Broadcom UK