I've been tinkering with the autovectorizer.  It's really cool.
I particularly like the realignment support.

I've noticed just a few things while tinkering with it (in 4.1.1):

0) The realignment code takes the floor of the unaligned pointer, and we
increment the unaligned pointer in the loop.  This is great for
architectures like Alpha that have floor addressing modes, because finding
the floor is free.  But for architectures like ARM, it's much better to
take the floor outside the loop and be able to postincrement by VECSIZE
inside the loop.
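
To make the difference concrete, here is a rough C sketch of the two loop shapes. It only tracks the aligned addresses that would be loaded; no real vector loads happen. VECSIZE = 16 and the function names are my own assumptions for illustration, not anything taken from the vectorizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VECSIZE = 16 };  /* assumed vector width in bytes */

/* Round an address down to the enclosing VECSIZE-aligned boundary. */
static uintptr_t floor_addr(uintptr_t addr)
{
    assert((VECSIZE & (VECSIZE - 1)) == 0);  /* mask trick needs a power of two */
    return addr & ~(uintptr_t)(VECSIZE - 1);
}

/* Shape (a): the unaligned pointer is the induction variable, so the
   floor (an AND) is recomputed on every iteration. */
static void addrs_floor_in_loop(uintptr_t unaligned, size_t nvec, uintptr_t *out)
{
    for (size_t i = 0; i < nvec; i++) {
        out[i] = floor_addr(unaligned);  /* extra work inside the loop */
        unaligned += VECSIZE;
    }
}

/* Shape (b): take the floor once, then step the aligned pointer.  The
   loop body is a plain load plus postincrement by VECSIZE. */
static void addrs_floor_hoisted(uintptr_t unaligned, size_t nvec, uintptr_t *out)
{
    uintptr_t aligned = floor_addr(unaligned);  /* hoisted out of the loop */
    for (size_t i = 0; i < nvec; i++) {
        out[i] = aligned;
        aligned += VECSIZE;
    }
}
```

Both shapes visit the same aligned addresses; the only difference is where the masking happens.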


1) The definition of the realignment instruction doesn't match hardware for
instruction sets like ARM WMMX, where aligned amounts shift by 0 bytes
instead of VECSIZE bytes.  This makes it useless for vector realignment,
because in the case that the pointer happens to be aligned, we get the
wrong vector.  Looks like the SPARC realignment hook does the same thing...
Indeed, it looks like Altivec is the only one to support it, and they do
some trickery with shifting the wrong (against endianness) way based on the
two's complement of the source (a very clever trick).  No other machine
(evidently) can easily meet the description of the current realignment
mechanism.
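
Here is a small C model of the mismatch, as I understand it. It assumes a software-pipelined loop that carries the previous aligned vector forward as "lo" and loads "hi" from floor(p + VECSIZE - 1); that loop shape, VECSIZE = 8, and all of the names below are assumptions for illustration, not code from the compiler. Offsets are relative to an aligned buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VECSIZE = 8 };  /* assumed vector width in bytes */

/* Take VECSIZE bytes starting at byte `shift` of the concatenation lo:hi. */
static void realign(const uint8_t *lo, const uint8_t *hi, unsigned shift,
                    uint8_t *out)
{
    uint8_t both[2 * VECSIZE];
    assert(shift <= VECSIZE);
    memcpy(both, lo, VECSIZE);
    memcpy(both + VECSIZE, hi, VECSIZE);
    memcpy(out, both + shift, VECSIZE);
}

/* Token the scheme wants: an aligned pointer selects `hi` whole. */
static unsigned token_expected(size_t offset)
{
    unsigned m = (unsigned)(offset % VECSIZE);
    return m ? m : VECSIZE;
}

/* Token a shift-style instruction gives: an aligned pointer shifts by 0. */
static unsigned token_shift_style(size_t offset)
{
    return (unsigned)(offset % VECSIZE);
}

/* Copy nvec vectors starting at byte offset `start` of the aligned `buf`. */
static void copy_realigned(const uint8_t *buf, size_t start, size_t nvec,
                           unsigned (*token)(size_t), uint8_t *out)
{
    unsigned shift = token(start);                        /* loop invariant */
    const uint8_t *lo = buf + start / VECSIZE * VECSIZE;  /* floor(p) */
    for (size_t i = 0; i < nvec; i++) {
        size_t p = start + i * VECSIZE;
        const uint8_t *hi = buf + (p + VECSIZE - 1) / VECSIZE * VECSIZE;
        realign(lo, hi, shift, out + i * VECSIZE);
        lo = hi;  /* carry the aligned vector into the next iteration */
    }
}
```

In this model both tokens agree for a misaligned start. For an aligned start, the shift-by-0 token returns the carried-over vector from the second iteration onward, which is the previous vector rather than the current one.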

Of course, for safety reasons I guess we don't always get the next vector
(the one at address floor(ptr+VECSIZE)), which would allow us to use the
shift-style instructions.

So, there may be a few options:

* Have a flag or hook where we can say it is always OK to read the next
       element.  This is probably a bad option; everyone who used the
       vectorizer would have to know that they may need to pad their
       arrays if they are in a protected memory environment.

* Conditionally fetch the next bundle, and don't do the fetch of the
       next data the last time around if it might not be safe.  Probably
       a bad idea for architectures without conditional execution.

* Currently we drop out of the loop when there are VEC_ELEMENTS - 1
       iterations or fewer left.  We could drop out when there are
       VEC_ELEMENTS or fewer, and then we could always fetch the next
       aligned data.

* Some other clever trick I don't know about. :-)

* Or keep it the way it is, and leave out the machines that have the
       shift-by-zero instead of the shift-by-VECSIZE behavior for
       an aligned pointer.
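
A rough C sketch of why the third option (exiting one vector earlier) would make the unconditional fetch safe. It works in element indices, with index 0 vector-aligned, and treats a fetch as safe if the aligned vector holds at least one array element. VEC_ELEMENTS = 4 and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { VEC_ELEMENTS = 4 };  /* assumed elements per vector */

/* Vector iterations run while at least `min_remaining` elements are left.
   The current policy is min_remaining == VEC_ELEMENTS; the proposed one
   is min_remaining == VEC_ELEMENTS + 1. */
static size_t vector_iterations(size_t n, size_t min_remaining)
{
    size_t iters = 0;
    assert(min_remaining >= VEC_ELEMENTS);
    while (n >= min_remaining) {
        n -= VEC_ELEMENTS;
        iters++;
    }
    return iters;
}

/* The array occupies element indices [start, start + n).  Returns 1 if
   always fetching the aligned vector after floor(p) only ever touches
   vectors that contain at least one array element, 0 otherwise. */
static int next_fetch_stays_in_bounds(size_t start, size_t n,
                                      size_t min_remaining)
{
    size_t end = start + n;
    size_t p = start;
    assert(min_remaining >= VEC_ELEMENTS);
    while (end - p >= min_remaining) {
        size_t next_vec = p / VEC_ELEMENTS * VEC_ELEMENTS + VEC_ELEMENTS;
        if (next_vec >= end)
            return 0;  /* the whole next vector lies past the array */
        p += VEC_ELEMENTS;
    }
    return 1;
}
```

The price is that the scalar epilogue may have to handle up to VEC_ELEMENTS elements instead of VEC_ELEMENTS - 1.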


2) It seems like there may be some hooks that aren't documented.  For
instance, there seems to be some kind of support for the "vcond"
standard name, but I can't seem to find it in the documentation.


In general things work quite well, and it seems to play reasonably well with
things like the modulo scheduler.

Cheers,

 Erich

--
Why are ``tolerant'' people so intolerant of intolerant people?
