Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.

Richard Sandiford Thu, 20 Sep 2018 07:21:24 -0700

Richard Biener <richard.guent...@gmail.com> writes:
> On Thu, Sep 20, 2018 at 3:40 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs <a...@codesourcery.com> wrote:
>> >> On 17/09/18 12:43, Richard Sandiford wrote:
>> >> > OK, sounds like the cost of vec_construct is too low then.  But looking
>> >> > at the port, I see you have:
>> >> >
>> >> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
>> >> >
>> >> > int
>> >> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
>> >> >                         tree ARG_UNUSED (vectype),
>> >> >                         int ARG_UNUSED (misalign))
>> >> > {
>> >> >    /* Always vectorize.  */
>> >> >    return 1;
>> >> > }
>> >> >
>> >> > which short-circuits the cost-model altogether.  Isn't that part
>> >> > of the problem?
>> >>
>> >> Well, it's possible that that's a little simplistic. ;-)
>> >>
>> >> Although, actually the elementwise issue predates the existence of
>> >> gcn_vectorization_cost, and the default does appear to penalize
>> >> vec_construct somewhat.
>> >>
>> >> Actually, the default definition doesn't seem to do much besides
>> >> increase vec_construct, so I'm not sure now why I needed to change it?
>> >> Hmm, more experiments to do.
>> >>
>> >> Thanks for the pointer.
>> >
>> > Btw, we do not consider using gather/scatter for VMAT_ELEMENTWISE,
>> > that's a missed "optimization" quite possibly because gather/scatter is so
>> > expensive on x86.  Thus the vectorizer should consider this and use the
>> > cheaper alternative according to the cost model (which you of course should
>> > fill with sensible values...).
>>
>> Do you mean it this way round, or that it doesn't consider using
>> VMAT_ELEMENTWISE for natural gather/scatter accesses?  We do use
>> VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible for SVE,
>> but that relies on implementing the new optabs instead of using the old
>> built-in-based interface, so it doesn't work for x86 yet.
>>
>> I guess we might need some way of selecting between the two if
>> the costs of gather and scatter are context-dependent in some way.
>> But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
>> for certain modes then it's probably better not to define the optabs
>> for those modes.
>
> I think we can't vectorize true gathers (indexed from memory loads) w/o
> gather yet, right?


Right.

> So I really was thinking of implementing VMAT_ELEMENTWISE (invariant
> stride) and VMAT_STRIDED_SLP by composing the appropriate index vector
> with a splat and multiplication and using a gather.  I think that's
> not yet implemented?

For SVE we use:

      /* As a last resort, try using a gather load or scatter store.

         ??? Although the code can handle all group sizes correctly,
         it probably isn't a win to use separate strided accesses based
         on nearby locations.  Or, even if it's a win over scalar code,
         it might not be a win over vectorizing at a lower VF, if that
         allows us to use contiguous accesses.  */
      if (*memory_access_type == VMAT_ELEMENTWISE
          && single_element_p
          && loop_vinfo
          && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
                                                 masked_p, gs_info))
        *memory_access_type = VMAT_GATHER_SCATTER;

in get_group_load_store_type.  This only works when the target defines
gather/scatter using optabs rather than built-ins.
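
To make the "index vector from a splat and a multiplication, then a gather"
idea concrete, here is a minimal scalar model in C.  It is only a sketch, not
vectorizer code: VF, build_strided_index and gather_load are hypothetical
names chosen for illustration, and the loops stand in for what would be
single vector operations on a target with gather support.

```c
#include <stddef.h>

#define VF 4  /* Hypothetical vectorization factor (lanes per vector).  */

/* Model the index vector { 0, 1, ..., VF-1 } * splat (stride).
   A real target would form this with a series and a vector multiply.  */
static void
build_strided_index (size_t *index, size_t stride)
{
  for (size_t lane = 0; lane < VF; lane++)
    index[lane] = lane * stride;
}

/* Model a gather load: each lane reads base[index[lane]].  */
static void
gather_load (int *dst, const int *base, const size_t *index)
{
  for (size_t lane = 0; lane < VF; lane++)
    dst[lane] = base[index[lane]];
}
```

With a stride of 3, the index vector is { 0, 3, 6, 9 }, so the gather reads
the same elements that VMAT_ELEMENTWISE would load one at a time and then
assemble with a vec_construct.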

But yeah, no VMAT_STRIDED_SLP support yet.  That would be good
to have...
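
For the cost-model point raised earlier in the thread, the following C sketch
shows why a hook that returns 1 for everything makes elementwise accesses
look cheap.  Everything here is an assumption made for illustration: the
enum, the function names and the numbers are hypothetical and are not the
GCC hook interface or the values any real port uses.

```c
/* Hypothetical statement kinds, loosely modelled on vect_cost_for_stmt.  */
enum cost_kind { SCALAR_LOAD, VEC_CONSTRUCT };

typedef int (*cost_fn) (enum cost_kind kind, int nunits);

/* "Always vectorize": every statement costs 1, whatever it is.  */
static int
flat_cost (enum cost_kind kind, int nunits)
{
  (void) kind;
  (void) nunits;
  return 1;
}

/* Assumed alternative: building a vector from scalars costs one
   insertion per extra element; everything else still costs 1.  */
static int
scaled_cost (enum cost_kind kind, int nunits)
{
  return kind == VEC_CONSTRUCT ? nunits - 1 : 1;
}

/* Cost of one elementwise vector load: NUNITS scalar loads plus one
   vec_construct to assemble the result.  */
static int
elementwise_cost (cost_fn cost, int nunits)
{
  return nunits * cost (SCALAR_LOAD, nunits) + cost (VEC_CONSTRUCT, nunits);
}
```

For a 64-element vector, flat_cost gives 64 + 1 = 65 and scaled_cost gives
64 + 63 = 127, so under these assumed numbers the elementwise form looks
roughly twice as expensive once vec_construct scales with the element count.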

Richard

Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.
