On Fri, Sep 28, 2018 at 2:47 PM Andrew Stubbs <a...@codesourcery.com> wrote:
>
> On 19/09/18 14:45, Richard Biener wrote:
> > So I guess the current_vector_size thing isn't too hard to get rid of, what
> > you'd end up with would be using that size when you decide for vector
> > types for loads (where there are no USEs with vector types, so for example
> > this would not apply to gathers).
>
> I've finally got back to looking at this ...
>
> My patch works because current_vector_size is only referenced in two
> places. One is passed to get_vectype_for_scalar_type_and_size, and that
> function simply calls targetm.vectorize.preferred_simd_mode when the
> requested size is zero. The other is passed to build_truth_vector_type,
> which only uses it to call targetm.vectorize.get_mask_mode, and the GCN
> backend ignores the size parameter because it only has one option.
> Presumably other backends would object to a zero size mask.
>
> So, as I said originally, the effect is that leaving current_vector_size
> zeroed means "always ask the backend".

Yes.

> Pretty much everything else chains off of those places using
> get_same_sized_vectype, so ignoring current_vector_size is safe on GCN,
> and might even be safe on other architectures?

Other architectures really only use it when there's a choice, like
choosing between V4SI, V8SI and V16SI on x86_64.  current_vector_size
was introduced to be able to "iterate" over supported ISAs and let the
vectorizer decide which one to use in the end (SSE vs. AVX vs. AVX512).

The value of zero is simply to give the target another chance to set
its prefered
value based on the first call.  I'd call that a bit awkward (*)

For architectures that only have a single "vector size" this variable
is really spurious and whether it is zero or non-zero doesn't make a difference.
Apart from your architecture of course where non-zero doesn't work ;)

(*) so one possibility would be to forgo with the special-value of zero
("auto-detect") and thus not change current_vector_size in
get_vectype_for_scalar_type at all.  For targets which report multiple
vector size support set current_vector_size to the prefered one in the
loop over vector sizes and for targets that do not simply keep it at zero.

> > So I'd say you want to refactor get_same_sized_vectype uses and
> > make the size argument to get_vectype_for_scalar_type_and_size
> > a hint only.
>
> I've looked through the uses of get_same_sized_vectype and I've come to
> the conclusion that many of them really mean it.
>
> For example, vectorizable_bswap tries to reinterpret a vector register
> as a byte vector so that it can permute it. This is an optimization that
> won't work on GCN (because the vector registers don't work like that),
> but seems like a valid use of the vector size characteristic of other
> architectures.

True.

> For another example, vectorizable_conversion is targeting the
> vec_pack_trunc patterns, and therefore really does want to specify the
> types. Again, this isn't something we want to do on GCN (a regular trunc
> pattern with a vector mode will work fine).
>
> However, vectorizable_operation seems to use it to try to match the
> input and output types to the same vector unit (i.e. vector size); at
> least that's my interpretation. It returns "not vectorizable" if the
> input and output vectors have different numbers of elements. For most
> operators the lhs and rhs types will be the same, so we're all good, but
> I imagine that this code will prevent TRUNC being vectorized on GCN
> because the "same size" vector doesn't exist, and it doesn't check if
> there's a vector with the same number of elements (I've not actually
> tried that, yet, and there may be extra magic elsewhere for that case,
> but YSWIM).

Yeah, we don't have a get_vector_type_for_scalar_type_and_nelems
which would probably be semantically better in many places.

> I don't think changing this case to a new "get_same_length_vectype"
> would be appropriate for many architectures, so I'm not sure what to do
> here?
>
> We could fix this with new target hooks, perhaps?
>
> TARGET_VECTORIZE_REINTERPRET_VECTOR (vectype_in, scalartype_out)
>
>    Returns a new vectype (or mode) that uses the same vector register as
>    vectype_in, but has elements of scalartype_out.
>
>    The default implementation would be get_same_sized_vectype.
>
>    GCN would just return NULL, because you can't do that kind of
>    optimization.
>
> TARGET_VECTORIZE_COMPATIBLE_VECTOR (opcode, vectype_in, scalartype_out)
>
>    Returns a new vectype (or mode) that has the right number of elements
>    for the opcode (i.e. the same number, or 2x for packed opcodes), and
>    elements of scalartype_out.  The backend might choose a different
>    vector size, but promises that hardware can do the operation (i.e.
>    it's not mixing vector units).
>
>    The default implementation would be get_same_sized_vectype, for
>    backward compatibility.
>
>    GCN would simply return V64xx according to scalartype_out, and NULL
>    for unsupported opcodes.

I don't like putting the burden on the target here too much given the vectorizer
should know what kind of constraints it has given it implements the
vectorization
on GIMPLE which as IL constraints that are to be met - we just need to
ask for vector types with the appropriate constraints rather than using
same-size everywhere.

> Of course, none of this addresses the question of which vector size to
> choose in the first place.

See above for a suggestion.

> I've not figured out how it might ever start
> with a type other than the "preferred SIMD mode", yet.

In practically all cases vect_analyze_data_refs calling
get_vectype_for_scalar_type
on a load will be the one nailing down current_vector_size (if zero).
I also cannot
quickly think of a case where that would differ from "preferred SIMD
mode" unless
the target simply lies to us here ;)

So, would a current_vector_size re-org like outlined above help you?  I agree
leaving it at zero should work unless there's code in the vectorizer
that is simply
wrong.  Addressing some GCN issues with get_vectype_for_scalar_type_and_nunits
would also OK with me (if that works).

Thanks,
Richard.

> Thoughts?
>
> Andrew

Reply via email to