Hello,
I looked at an inefficient code sequence for a simple program using
GCC's picochip port (not yet submitted to mainline). Basically, a
program like
long carray[10];
void fn (long c, int i)
{
  carray[i] = c;
}
produces good assembly code. But if I were to do
struct complex16
{
  int re, im;
};
struct complex16 carray[10];
void fn (struct complex16 c, int i)
{
  carray[i] = c;
}
GCC generates poor code. It has an extra save and restore of the
frame pointer, even though we don't use the frame.
I dug a bit further and found that the get_frame_size() call returns
4 in this case, and hence the port's prologue generation code emits
the frame-pointer update.
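To make the dependency concrete, here is a toy model of the kind of
decision I mean. It is only a sketch: the struct and function names are
hypothetical, and it is not the picochip port's actual prologue code or
the real get_frame_size() interface.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-function value a port would read
   from get_frame_size(); this is not the real GCC data structure. */
struct toy_function {
    long frame_size;   /* bytes of local frame requested for the function */
};

/* Toy prologue decision: save/restore the frame pointer only when the
   function asks for a non-empty frame. */
bool toy_needs_frame_pointer(const struct toy_function *fn)
{
    return fn->frame_size > 0;
}
```

With a frame size of 0 (the "long" version) this model skips the frame
pointer; with a frame size of 4 (the struct version) it does not.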
It seems to me that each element of the struct is copied to the stack
from the parameter registers, and that copy is then used in the
function. This is the RTL I see at the point we first get into RTL.
(insn 6 2 7 2 (set (reg:HI 26)
(reg:HI 0 R0 [ c ])) -1 (nil)
(nil))
(insn 7 6 10 2 (set (reg:HI 27)
(reg:HI 1 R1 [ c+2 ])) -1 (nil)
(nil))
(insn 10 7 8 2 (set (reg/v:HI 28 [ i ])
(reg:HI 2 R2 [ i ])) -1 (nil)
(nil))
(insn 8 10 9 2 (set (mem/s/c:HI (reg/f:HI 21 virtual-stack-vars) [3 c+0 S2 A16])
(reg:HI 26)) -1 (nil)
(nil))
(insn 9 8 11 2 (set (mem/s/c:HI (plus:HI (reg/f:HI 21 virtual-stack-vars)
(const_int 2 [0x2])) [3 c+2 S2 A16])
(reg:HI 27)) -1 (nil)
(nil))
Note that the parameter is being written to the frame in the last two
instructions above. This, I am guessing, is the reason get_frame_size()
returns 4 later on, even though the actual store of the struct
parameter to the stack is eliminated by later optimization phases
(CSE and DCE, I believe).
Why does the compiler do this? I vaguely remember x86 storing all
parameter values on the stack. Is that the reason for this behaviour?
Is there anything I can do in the port to get around this problem?
Note: in our port, "int" is 16 bits and "long" is 32 bits.
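As a sanity check on the numbers, here is a host-side sketch of those
sizes. The fixed-width typedefs (pico_int, pico_long) and the helper
names are my own stand-ins so that a host compiler reproduces the
port's type widths; they are not part of the port. The struct comes out
at 4 bytes with "im" at offset 2, which lines up with the frame size of
4 and the "c+2" / (const_int 2) store in the RTL above, assuming my
guess about where the 4 comes from is right.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the port's types: "int" is 16 bits and
   "long" is 32 bits on picochip. */
typedef int16_t pico_int;
typedef int32_t pico_long;

struct complex16 {
    pico_int re, im;
};

/* Bytes needed to hold one struct complex16 parameter in memory. */
size_t complex16_param_bytes(void)
{
    return sizeof(struct complex16);
}

/* Bytes needed to hold one "long" parameter. */
size_t long_param_bytes(void)
{
    return sizeof(pico_long);
}

/* Byte offset of the second field, i.e. where "c+2" would live. */
size_t complex16_im_offset(void)
{
    return offsetof(struct complex16, im);
}
```

Both parameters are the same size (4 bytes), so the difference in the
generated code does not come from the amount of data being passed.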
Thanks in advance,
Regards
Hari