I'm doing some research on a pretty plain 32-bit RISC architecture that has some extra facilities for doing vector operations. Not exactly new, I know.
The difference with this one is that the vectors are pairs of normal registers. That part isn't new either: lots of architectures have register-pair loads and stores, lots of machines use pairs of registers to hold DI values, and lots of architectures can do V2SI, V4HI or V8QI operations on a 64-bit value... but usually those values live in special vector registers. The main difference here is that we can access either half of a V2SI result from any kind of vector operation (add, sub, shift, etc.) with any SI instruction, without any additional copies or moves. Similarly, several instructions can pick out either HI from any register, which means they can pick out any arbitrary element of a V2HI or V4HI for free.

In the same way that we can use either 32-bit register of the pair as a source for a normal SI instruction, if you allocate your registers so that one operation lands in an even register and another lands in the odd register next to it, you can use both of them together as a V2SI without any additional modification... free vector formation.

As it turns out, this is very handy when I'm writing code by hand, but I haven't figured out a good way to teach GCC about it. So my question is: what strategy should I use to teach GCC about this?

Basically, I think all of these are equivalent and free (assuming a little-endian target; SUBREG_BYTE is a byte offset, so the high word of a DI is at byte 4):

low word:
  (subreg:SI (match_operand:DI 1 "register_pair_operand" "P") 0)
  (vec_select:SI (match_operand:V2SI 1 "register_pair_operand" "P")
                 (parallel [(const_int 0)]))
  (truncate:SI (match_operand:DI 1 "register_pair_operand" "P"))

high word:
  (subreg:SI (match_operand:DI 1 "register_pair_operand" "P") 4)
  (vec_select:SI (match_operand:V2SI 1 "register_pair_operand" "P")
                 (parallel [(const_int 1)]))
  (truncate:SI (lshiftrt:DI (match_operand:DI 1 "register_pair_operand" "P")
                            (const_int 32)))

And maybe others, of course.
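To make the claimed equivalences concrete, here is a minimal C model of one register pair. It assumes a little-endian target where the even register holds the low word of a DI and element 0 of a V2SI; the names (reg_pair, subreg_si, v4hi_elt and so on) are hypothetical, not GCC functions. It only checks that the three spellings of each word read the same bits, and says nothing about how the combiner will treat them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one register pair on a little-endian target:
   r[0] is the even register (low word of a DImode value, element 0 of a
   V2SI), r[1] is the odd register (high word, element 1). */
typedef struct { uint32_t r[2]; } reg_pair;

/* The pair viewed as a single DImode value. */
static uint64_t as_di(reg_pair p) {
    return ((uint64_t)p.r[1] << 32) | p.r[0];
}

/* (subreg:SI (reg:DI) BYTE): BYTE is a byte offset, so 0 or 4 here. */
static uint32_t subreg_si(reg_pair p, unsigned byte) {
    assert(byte == 0 || byte == 4);
    return (uint32_t)(as_di(p) >> (8 * byte));
}

/* (vec_select:SI (reg:V2SI) (parallel [(const_int IDX)])) */
static uint32_t vec_select_si(reg_pair p, unsigned idx) {
    assert(idx < 2);
    return p.r[idx];
}

/* (truncate:SI (reg:DI)) and
   (truncate:SI (lshiftrt:DI (reg:DI) (const_int 32))) */
static uint32_t truncate_low(reg_pair p)  { return (uint32_t)as_di(p); }
static uint32_t truncate_high(reg_pair p) { return (uint32_t)(as_di(p) >> 32); }

/* Element IDX (0..3) of a V4HI held in the pair: halfword IDX % 2 of
   register IDX / 2, i.e. what a "pick either HI" instruction reads. */
static uint16_t v4hi_elt(reg_pair p, unsigned idx) {
    assert(idx < 4);
    return (uint16_t)(p.r[idx / 2] >> (16 * (idx % 2)));
}

/* True when all three spellings of each word read the same bits. */
static bool extractions_agree(reg_pair p) {
    bool low_ok  = subreg_si(p, 0) == vec_select_si(p, 0)
                && vec_select_si(p, 0) == truncate_low(p);
    bool high_ok = subreg_si(p, 4) == vec_select_si(p, 1)
                && vec_select_si(p, 1) == truncate_high(p);
    return low_ok && high_ok;
}
```

On a big-endian target the roles flip (byte 0 of the subreg and element 0 both name the high word), so the model would need r[0] and r[1] swapped in as_di.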
Similarly, any two instructions whose destination is a register_operand can get a free

  (vec_concat:V2SI (match_operand:SI 1 "even_register_operand" "r")
                   (match_operand:SI 2 "odd_register_operand" "r"))

if their destination registers are placed in an even/odd pair.

So my theories so far are:

A) Generate large numbers of define_insns that cross all the instructions with all the ways of generating inputs and outputs, and let the combiner try to figure it out, either by using scripts to generate them or by using mode macros...

B) Write peephole patterns for these things. The problem here is that all peephole work happens after register allocation, and by then the likelihood that we could do the even/odd pairing is much lower... so maybe add a "peephole3" pass that runs before register allocation and tries to peephole out all these formations of, and extractions from, vector types.

C) Generate the instructions to form or extract from vectors, tell GCC that they don't cost anything, and then try to eliminate them later and rewrite registers... but that sounds scary and likely to cause problems deep in the bowels of reload.

I have the feeling that someone has already come up with a clever solution to problems like this, and it's probably best to ask... It seems like several ports could benefit from this sort of thing. For instance, the SPARC port seems to have several peephole2s that take two SI loads that are adjacent in both memory and the register file and combine them into an ldd... but it looks like that only works if the register allocator happens to pick adjacent registers for the two loads. I've looked at other ports but still haven't found what I'm looking for...

If anyone has any ideas, I'm all ears.

Thanks in advance,
Erich