On 22 Mar 2005, Ian Lance Taylor wrote:

> Miles Bader <[EMAIL PROTECTED]> writes:
>
> > I've defined SECONDARY_*_RELOAD_CLASS (and PREFERRED_* to try to help
> > things along), and am now running into more understandable reload
> > problems: "unable to find a register to spill in class" :-/
> >
> > The problem, as I understand, is that reload doesn't deal with conflicts
> > between secondary and primary reloads -- which are common with my arch
> > because it's an accumulator architecture.
> >
> > For instance, slightly modifying my previous example:
> >
> > Say I've got a mov instruction that only works via an accumulator A,
> > and a two-operand add instruction. "r" regclass includes regs A,X,Y,
> > and "a" regclass only includes reg A.
> >
> > mov has constraints like:  0 = "g,a"  1 = "a,gi"
> > and add3 has constraints:  0 = "a"  1 = "0"  2 = "ri" (say)
> >
> > So if before reload you've got an instruction like:
> >
> >    add temp, [sp + 4], [sp + 6]
> >
> > and v2 and v3 are in memory, it will have to have generate something like:
> >
> >    mov A, [sp + 4]   ; primary reload 1 in X, with secondary reload 0 A
> >    mov X, A          ;   ""
> >    mov A, [sp + 6]   ; primary reload 2 in A, with no secondary reload
> >    add A, X
> >    mov temp, A
> >
> > There's really only _one_ register that can be used for many reloads, A.
>
> I don't think there is any way that reload can cope with this
> directly. reload would have to get a lot smarter about ordering the
> reloads.
>
> Since you need the accumulator for so much, one approach you should
> consider is not exposing the accumulator register until after reload.
> You could do this by writing pretty much every insn as a
> define_insn_and_split, with reload_completed as the split condition.
> Then you split into code that uses the accumulator. Your add
> instruction permits you to add any two general registers, and you
> split into moving one into the accumulator, doing the add, and moving
> the result whereever it should go. If you then split all the insns
> before the postreload pass, perhaps the generated code won't even be
> too horrible.
>
> Ian
>
This approach by itself has obvious problems. It will generate a lot of
redundant moves to and from the accumulator, because the accumulator is
exposed much too late. Consider the 3AC (three-address) code:

    add i,j,k
    add k,l,m

It will be broken down into:

    mov i,a
    add j,a
    mov a,k
    mov k,a
    add l,a
    mov a,m

where the third and fourth instructions are basically redundant: the value
of k is already in the accumulator when it is reloaded.

I did a lot of processor architecture research about three years ago, and I
came to some interesting conclusions about accumulator architectures.
Basically, with naive code generation, you will generate 3x as many
instructions for an accumulator machine as for a 3AC machine. If you have a
SWAP instruction so you can swap the accumulator with the index registers,
then you can lower the instruction count penalty to about 2x that of a 3AC
machine. If you think about this for a while, the reason will become
readily apparent.

Reaching this 2x figure requires a good understanding of how data flows
through the accumulator in an accumulator architecture.

Toshi
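
To make the redundancy in the add i,j,k / add k,l,m example concrete, here is a minimal sketch in Python. It is not GCC code and does not model reload. The function names expand_3ac and peephole, the tuple encoding of instructions, and the live_out parameter are all assumptions made up for illustration. It lowers each 3AC instruction the naive way, then removes a store to a variable that is immediately reloaded into the accumulator.

```python
# Instructions are tuples (hypothetical encoding, for illustration only):
#   3AC:          (op, src1, src2, dst)   e.g. ("add", "i", "j", "k")
#   accumulator:  (op, src, dst)          e.g. ("mov", "i", "a")
# "a" is the accumulator.

def expand_3ac(insns):
    """Naively lower each 3AC insn into three accumulator insns."""
    out = []
    for op, src1, src2, dst in insns:
        out.append(("mov", src1, "a"))  # load the first operand into a
        out.append((op, src2, "a"))     # a = a op src2
        out.append(("mov", "a", dst))   # store the result
    return out

def peephole(code, live_out=()):
    """Drop 'mov x,a' when it directly follows 'mov a,x'.

    The reload is always redundant, since a already holds x. The store is
    dropped too, but only if x is not in live_out and no later insn reads
    x as a source (a conservative check: it ignores later overwrites).
    """
    out = []
    i = 0
    while i < len(code):
        insn = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        is_store = insn[0] == "mov" and insn[1] == "a"
        if is_store and nxt == ("mov", insn[2], "a"):
            var = insn[2]
            read_later = any(c[1] == var for c in code[i + 2:])
            if var in live_out or read_later:
                out.append(insn)        # keep the store, drop only the reload
            i += 2                      # skip the store/reload pair
            continue
        out.append(insn)
        i += 1
    return out
```

Calling peephole(expand_3ac([("add", "i", "j", "k"), ("add", "k", "l", "m")]), live_out={"m"}) removes the mov a,k / mov k,a pair from the six-instruction sequence. If k is also listed in live_out, only the reload is removed and the store is kept. A real compiler would need proper liveness information rather than this linear scan.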