On 10/31/14 16:01, David Kang wrote:
Hi,
I'm a newbie at gcc porting.
The architecture that I'm porting gcc to has a hardware FPU.
But the compiler has to generate code which builds an FPU instruction in an
integer register at run time and writes the value to the FPU command register.
To make a single FPU instruction, three instructions are needed.
Two instructions build the FPU instruction in a 32-bit (cmd, operands[2],
operands[1], operands[0]) format.
Here the operands are FPU register numbers, which can be 0 ~ 32.
As an example, f3 = f1 + f2 can be encoded as (code of 'add', 2, 1, 3).
And the third instruction writes it to the FPU command register.
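
To make the encoding concrete, here is a minimal C sketch of packing the
(cmd, operands[2], operands[1], operands[0]) tuple into one 32-bit word.
The post does not give the field widths or the opcode values, so the four
8-bit fields, the name fpu_encode and the value FPU_CMD_ADD are assumptions
made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode for 'add'; the real value is not given in the post. */
#define FPU_CMD_ADD 0x01u

/* Pack (cmd, op2, op1, op0) into a 32-bit FPU instruction word.
   Assumed layout: four 8-bit fields, cmd in the most significant byte. */
static uint32_t fpu_encode(uint32_t cmd, uint32_t op2, uint32_t op1,
                           uint32_t op0)
{
    /* Every field must fit in 8 bits under the assumed layout. */
    assert(cmd <= 0xFFu && op2 <= 0xFFu && op1 <= 0xFFu && op0 <= 0xFFu);
    return (cmd << 24) | (op2 << 16) | (op1 << 8) | op0;
}
```

With this layout, fpu_encode(FPU_CMD_ADD, 2, 1, 3) would be the word for
the f3 = f1 + f2 example. Every register number is baked into the word as
a constant.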
The architecture can issue up to 3 instructions at a time.
The difficulty is that we need to know the FPU register numbers of those
operands to generate the FPU instruction.
The easiest but lowest-performance implementation is to emit those three
instructions from a single "define_insn" as three consecutive instructions.
However, we then lose any chance of bundling those 3 instructions with other
instructions for optimization.
So, I'm trying to find a better way.
I used "define_insn_and_split" and split a single FPU instruction into 3
instructions like this:
(Here I assume register r10 is used, but it can be any integer register.)
operands[0] = plus (operands[1], operands[2])
==>
(1) r10 <- lower half of FPU instruction using
(code of 'add', operands[0], operands[1], operands[2])
(2) r10 <- r10 | upper half of FPU instruction using (code of 'add',
operands[0], operands[1], operands[2])
(3) (FPU cmd register) <- r10
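
The effect of steps (1) to (3) can be modeled in plain C as below. This is
only a sketch of what the three machine instructions do at run time: the
16/16 split into halves, the variable fpu_cmd_reg standing in for the FPU
command register, and the function name fpu_issue are assumptions, not
details taken from the real port.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the FPU command register (hypothetical). */
static uint32_t fpu_cmd_reg;

/* Model the three-instruction sequence for one FPU instruction word. */
static void fpu_issue(uint32_t insn_word)
{
    uint32_t r10;

    r10 = insn_word & 0x0000FFFFu;   /* (1) r10 <- lower half */
    r10 |= insn_word & 0xFFFF0000u;  /* (2) r10 <- r10 | upper half */

    /* Sanity check: the two halves reassemble the original word. */
    assert(r10 == insn_word);

    fpu_cmd_reg = r10;               /* (3) FPU cmd register <- r10 */
}
```

Both immediates in (1) and (2) contain operand register numbers, so those
numbers have to be final at the point where the immediates are computed.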
The problem is that gcc sees that operands[0] is used before the 3rd
instruction, and allocates two different hard registers for it: one for
instructions (1) and (2) and another for instruction (3).
So, when the code is generated, the first two instructions assume the wrong
register for operands[0].
This happens especially frequently when the '-unroll' option is used.
So, I wonder whether there is a way to tell gcc to use the same hard register
for operands[0] across those three instructions.
Is it possible?
Or would there be any better way to generate efficient FPU code?
I would appreciate any advice or pointers to further information.
Use a define_insn_and_split, but only split it after register allocation
& reloading.
Jeff