registers and addressing. When I did this port originally, I mostlyh hid the accumulator
from the register allocator. But I did implement extended precision arithmetic as a pattern that
optimized use of the accumulator.
We got pretty good code generated. There's a pretty complete TCP/IP stack implemented for this chip (it's basically an internet toaster controller) and this is a machine with (IIRC) only 16k words of instruction and 4K BYTES of
data (and that's banked memory at that!).
On Mar 25, 2005, at 08:13, <[EMAIL PROTECTED]> wrote:
On 22 Mar 2005, Ian Lance Taylor wrote:
Miles Bader <[EMAIL PROTECTED]> writes:
I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help things along), and am now running into more understandable reload problems: "unable to find a register to spill in class" :-/
The problem, as I understand, is that reload doesn't deal with conflicts
between secondary and primary reloads -- which are common with my arch
because it's an accumulator architecture.
For instance, slightly modifying my previous example:
Say I've got a mov instruction that only works via an accumulator A,
and a two-operand add instruction. "r" regclass includes regs A,X,Y,
and "a" regclass only includes reg A.
mov has constraints like: 0 = "g,a" 1 = "a,gi"
and add3 has constraints: 0 = "a" 1 = "0" 2 = "ri" (say)
So if before reload you've got an instruction like:
add temp, [sp + 4], [sp + 6]
and v2 and v3 are in memory, it will have to have generate something like:
mov A, [sp + 4] ; primary reload 1 in X, with secondary reload 0 A
mov X, A ; ""
mov A, [sp + 6] ; primary reload 2 in A, with no secondary reload
add A, X
mov temp, A
There's really only _one_ register that can be used for many reloads, A.
I don't think there is any way that reload can cope with this directly. reload would have to get a lot smarter about ordering the reloads.
Since you need the accumulator for so much, one approach you should consider is not exposing the accumulator register until after reload. You could do this by writing pretty much every insn as a define_insn_and_split, with reload_completed as the split condition. Then you split into code that uses the accumulator. Your add instruction permits you to add any two general registers, and you split into moving one into the accumulator, doing the add, and moving the result whereever it should go. If you then split all the insns before the postreload pass, perhaps the generated code won't even be too horrible.
Ian
This approach by itself has obvious problems.
It will generate a lot of redundant moves to/from the accumulator because
the accumulator is exposed much too late.
Consider the 3AC code:
add i,j,k add k,l,m
it will be broken down into:
mov i,a add j,a mov a,k mov k,a add l,a mov a,m
where the third and fourth instructions are basically redundant.
I did a lot of processor architecture research about three years ago, and
I came to some interesting conclusions about accumulator architectures.
Basically, with naive code generation, you will generate 3x as many instructions for an accumulator machine than for a 3AC machine.
If you have a SWAP instruction so you can swap the accumulator with the
index registers, then you can lower the instruction count penalty to about
2x that of a 3AC machine. If you think about this for a while, the reason
will become readily apparent.
In order to reach this 2x figure, it requires a good understanding of how
the data flows through the accumulator in an accumulator arch.
Toshi
Alan Lehotsky - [EMAIL PROTECTED] Carbon Design Systems, Inc