Erich Plondke wrote:
As it turns out this is very handy when I'm writing code by hand, but I
haven't figured out a good way to teach GCC about it. So my question is:
what strategy should I use to teach GCC about this?
There are several issues you are asking about here.
One of them is general representation of vector operations.
Unfortunately, in GCC, the vector RTL representation hasn't been very
well specified yet. There are overlapping meanings between operators,
and some operators are ambiguous. Also, there are very few RTL
optimizations on vectors. To a first approximation, the RTL
optimizers will only see what the middle end generated, so you don't
need to worry about all of the possible representation forms. Just pick
one and use it: generate it in an expander, match it in a pattern,
and don't worry about the other forms.
Ideally, we would have a canonical form defined for each vector
operation. The combiner only accepts canonical form, and emits
canonical form, so there is no need to define RTL that matches every
possible representation of every arithmetic operation; you only need to
match the canonical forms. Unfortunately, we don't have canonical forms
for vector operations yet. But since we also lack much RTL vector
optimization, this is mostly a moot issue at the moment.
Also, we lack C syntax for extracting elements from a vector. It has
been suggested that array syntax could be overloaded for this, but at
the moment there is no useful syntax unless you define your own built-in
functions. Since you have to have your own builtins anyway, you can just
emit RTL of your choice for extracting an element, and no RTL other than
the one you choose will be generated. I'd use vec_select myself.
One of the other issues you mentioned is related to register allocation:
getting pseudo-regs in the right place to avoid register moves. Ideally,
this should be handled invisibly by the register allocator. There isn't
really much you can do to help it. There is the macro MODES_TIEABLE_P,
which should be defined correctly, but this doesn't guarantee optimal
placement.
--
Jim Wilson, GNU Tools Support, http://www.specifix.com