I'm doing some research on a pretty plain 32-bit RISC architecture that has
some extra facilities for doing vector operations.  Not exactly new, I know.

The difference with this one is that the vectors are pairs of normal
registers.

Register pairs by themselves aren't new either: lots of architectures have
register-pair loads and stores, lots of machines use pairs of registers to
hold DI values, and lots of architectures can do V2SI or V4HI or V8QI
operations on a 64-bit value... but usually those live in special vector
registers.

The main difference here is that we can access either half of
a V2SI result from any kind of vector operation (add, sub, shift, etc)
with any SI instruction without any additional copies or moves.

Similarly, several instructions can pick out either HI from any register,
which means that they can pick out any arbitrary element from a V2HI or
V4HI for free.

In the same way that we can use either 32-bit register of the pair as a source
for a normal SI instruction, if you allocate your registers such that
one operation lands in an even register and another operation lands in the odd
register next to it, you can use both of them together as a V2SI without any
additional modification... free vector formation.
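
To make the aliasing concrete, here is a minimal C model of what I mean.
It is only a sketch: the 16-register file and the names regs, si_add,
v2si_add, and demo are made up for illustration and are not the real ISA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 SI registers; pair n is (regs[2n], regs[2n+1]). */
enum { NUM_REGS = 16 };
static uint32_t regs[NUM_REGS];

/* Plain SI add: any register, even or odd, may be a source or destination. */
static void si_add(int dst, int a, int b)
{
    regs[dst] = regs[a] + regs[b];
}

/* V2SI add on register pairs: both 32-bit lanes are added independently.
   Pair operands are named by their even register. */
static void v2si_add(int dst, int a, int b)
{
    assert(dst % 2 == 0 && a % 2 == 0 && b % 2 == 0);
    regs[dst]     = regs[a]     + regs[b];
    regs[dst + 1] = regs[a + 1] + regs[b + 1];
}

/* Walks through free formation and free extraction; returns r8. */
static uint32_t demo(void)
{
    regs[0] = 10; regs[1] = 20;
    regs[4] = 1;  regs[5] = 2;
    si_add(2, 0, 0);    /* r2 = 20, lands in an even register            */
    si_add(3, 1, 1);    /* r3 = 40, lands in the odd register next to it */
    v2si_add(6, 2, 4);  /* (r2,r3) used as a V2SI with no extra moves    */
    si_add(8, 7, 7);    /* odd half of the V2SI result read by an SI op  */
    return regs[8];
}
```

Nothing in demo() copies a value just to get it into or out of a vector;
the only requirement is that r2 and r3 were allocated as an even/odd pair.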

As it turns out this is very handy when I'm writing code by hand, but I haven't
figured out a good way to teach GCC about it.  So my question is: what
strategy should I use to teach GCC about this?

Basically, (I think) all of these are equivalent and free:

low word:
   (subreg:SI (match_operand:DI "register_pair_operand" "P") 0)
   (vec_select:SI (match_operand:V2SI "register_pair_operand" "P")
       (parallel [(const_int 0)]))
   (truncate:SI (match_operand:DI "register_pair_operand" "P"))

high word:
   (subreg:SI (match_operand:DI "register_pair_operand" "P") 1)
   (vec_select:SI (match_operand:V2SI "register_pair_operand" "P")
       (parallel [(const_int 1)]))
   (truncate:SI (lshiftrt:DI (match_operand:DI "register_pair_operand" "P")
        (const_int 32)))

And maybe others, of course.  Similarly, any two instructions whose
destination is a register_operand can get a free

   (vec_concat:V2SI (match_operand:SI "even_register_operand" "r")
       (match_operand:SI "odd_register_operand" "r"))

if the registers are placed in an even/odd pair.
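
As a sanity check on the equivalences, here is a small C sketch of the
value semantics.  It assumes the even register holds bits 0..31 of the DI
value and is lane 0 of the V2SI; that ordering, the reg_pair type, and the
function names are assumptions of the sketch, not something GCC defines.

```c
#include <stdint.h>

/* Assumption: the even register is the low word and vector lane 0. */
typedef struct { uint32_t even, odd; } reg_pair;

/* The pair viewed as a single DI value. */
static uint64_t pair_as_di(reg_pair p)
{
    return ((uint64_t)p.odd << 32) | p.even;
}

/* (truncate:SI (reg:DI)) */
static uint32_t low_by_truncate(reg_pair p)
{
    return (uint32_t)pair_as_di(p);
}

/* (truncate:SI (lshiftrt:DI (reg:DI) (const_int 32))) */
static uint32_t high_by_shift(reg_pair p)
{
    return (uint32_t)(pair_as_di(p) >> 32);
}

/* (vec_select:SI (reg:V2SI) (parallel [(const_int LANE)])) */
static uint32_t select_lane(reg_pair p, int lane)
{
    return lane == 0 ? p.even : p.odd;
}
```

Under that assumption, low_by_truncate() and select_lane(p, 0) both just
read the even register, and high_by_shift() and select_lane(p, 1) both just
read the odd one, which is why none of them should cost an instruction.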

So my theories so far are:

A) Generate large numbers of define_insn's that cross all the instructions
       with all the ways of generating inputs and outputs, and let
       the combiner try to figure it out, either by using scripts to
       generate things or by trying to use mode macros...

B) Trying to make peephole patterns for these things.  The problem here
       is that all peephole stuff happens after register allocation,
       and then the likelihood that we could do the even/odd pairing
       is much lower... so maybe add a "peephole3" pass that occurs before
       register allocation and attempts to peephole out all these
       formations of and extractions from vector types.

C) Generating the instructions to form or extract from vectors,
       telling GCC that they don't cost anything, and then trying
       to eliminate them later and rewrite registers... but that
       sounds scary and likely to cause problems deep in the bowels of
       reload.

I have the feeling that someone has already created a clever solution
to problems like this and it's probably best to ask...

It seems like several ports could benefit from this sort of thing.
For instance, it looks like the Sparc port has several peephole2's that
take two SI loads that are adjacent in both memory and the regfile
and combine them into an ldd... but it looks like that will only work
if the register allocator happens to pick adjacent registers
for the two loads.  I've looked in other ports but I still haven't
found what I'm looking for...
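
For reference, the condition such a post-allocation peephole has to test
boils down to something like the following.  This is a hypothetical helper
written for this mail, not code copied from any port, and it works on plain
hard register numbers rather than rtx.

```c
#include <stdbool.h>

/* Hypothetical helper: true when hard registers LO and HI form an aligned
   even/odd pair, i.e. LO is even and HI is the register right after it. */
static bool regs_form_pair(unsigned lo, unsigned hi)
{
    return (lo % 2 == 0) && (hi == lo + 1);
}
```

By the time a check like this runs, the registers are already fixed, so it
can only notice a lucky pairing; it can't ask for one.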

If anyone has any ideas, I'm all ears.

Thanks in advance,

   Erich

--
Why are ``tolerant'' people so intolerant of intolerant people?
