On 19/09/18 14:45, Richard Biener wrote:
So I guess the current_vector_size thing isn't too hard to get rid of, what
you'd end up with would be using that size when you decide for vector
types for loads (where there are no USEs with vector types, so for example
this would not apply to gathers).
I've finally got back to looking at this ...
My patch works because current_vector_size is only referenced in two
places. One is passed to get_vectype_for_scalar_type_and_size, and that
function simply calls targetm.vectorize.preferred_simd_mode when the
requested size is zero. The other is passed to build_truth_vector_type,
which only uses it to call targetm.vectorize.get_mask_mode, and the GCN
backend ignores the size parameter because it only has one option.
Presumably other backends would object to a zero size mask.
So, as I said originally, the effect is that leaving current_vector_size
zeroed means "always ask the backend".
Pretty much everything else chains off of those places using
get_same_sized_vectype, so ignoring current_vector_size is safe on GCN,
and might even be safe on other architectures?
So I'd say you want to refactor get_same_sized_vectype uses and
make the size argument to get_vectype_for_scalar_type_and_size
a hint only.
I've looked through the uses of get_same_sized_vectype and I've come to
the conclusion that many of them really mean it.
For example, vectorizable_bswap tries to reinterpret a vector register
as a byte vector so that it can permute it. This is an optimization that
won't work on GCN (because the vector registers don't work like that),
but seems like a valid use of the vector size characteristic of other
architectures.
For another example, vectorizable_conversion is targeting the
vec_pack_trunc patterns, and therefore really does want to specify the
types. Again, this isn't something we want to do on GCN (a regular trunc
pattern with a vector mode will work fine).
However, vectorizable_operation seems to use it to try to match the
input and output types to the same vector unit (i.e. vector size); at
least that's my interpretation. It returns "not vectorizable" if the
input and output vectors have different numbers of elements. For most
operators the lhs and rhs types will be the same, so we're all good, but
I imagine that this code will prevent TRUNC being vectorized on GCN
because the "same size" vector doesn't exist, and it doesn't check if
there's a vector with the same number of elements (I've not actually
tried that, yet, and there may be extra magic elsewhere for that case,
but YSWIM).
I don't think changing this case to a new "get_same_length_vectype"
would be appropriate for many architectures, so I'm not sure what to do
here?
We could fix this with new target hooks, perhaps?
TARGET_VECTORIZE_REINTERPRET_VECTOR (vectype_in, scalartype_out)
Returns a new vectype (or mode) that uses the same vector register as
vectype_in, but has elements of scalartype_out.
The default implementation would be get_same_sized_vectype.
GCN would just return NULL, because you can't do that kind of
optimization.
TARGET_VECTORIZE_COMPATIBLE_VECTOR (opcode, vectype_in, scalartype_out)
Returns a new vectype (or mode) that has the right number of elements
for the opcode (i.e. the same number, or 2x for packed opcodes), and
elements of scalartype_out. The backend might choose a different
vector size, but promises that hardware can do the operation (i.e.
it's not mixing vector units).
The default implementation would be get_same_sized_vectype, for
backward compatibility.
GCN would simply return V64xx according to scalartype_out, and NULL
for unsupported opcodes.
Of course, none of this addresses the question of which vector size to
choose in the first place. I've not figured out how it might ever start
with a type other than the "preferred SIMD mode", yet.
Thoughts?
Andrew