Richard Biener <richard.guent...@gmail.com> writes:
> On Thu, Sep 20, 2018 at 3:40 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs <a...@codesourcery.com>
>> > wrote:
>> >> On 17/09/18 12:43, Richard Sandiford wrote:
>> >> > OK, sounds like the cost of vec_construct is too low then.  But looking
>> >> > at the port, I see you have:
>> >> >
>> >> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
>> >> >
>> >> > int
>> >> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
>> >> >                         tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
>> >> > {
>> >> >   /* Always vectorize.  */
>> >> >   return 1;
>> >> > }
>> >> >
>> >> > which short-circuits the cost-model altogether.  Isn't that part
>> >> > of the problem?
>> >>
>> >> Well, it's possible that that's a little simplistic. ;-)
>> >>
>> >> Although, actually the elementwise issue predates the existence of
>> >> gcn_vectorization_cost, and the default does appear to penalize
>> >> vec_construct somewhat.
>> >>
>> >> Actually, the default definition doesn't seem to do much besides
>> >> increase vec_construct, so I'm not sure now why I needed to change it?
>> >> Hmm, more experiments to do.
>> >>
>> >> Thanks for the pointer.
>> >
>> > Btw, we do not consider to use gather/scatter for VMAT_ELEMENTWISE,
>> > that's a missed "optimization" quite possibly because gather/scatter is so
>> > expensive on x86.  Thus the vectorizer should consider this and use the
>> > cheaper alternative according to the cost model (which you of course should
>> > fill with sensible values...).
>>
>> Do you mean it this way round, or that it doesn't consider using
>> VMAT_ELEMENTWISE for natural gather/scatter accesses?
>> We do use VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible
>> for SVE, but that relies on implementing the new optabs instead of using
>> the old built-in-based interface, so it doesn't work for x86 yet.
>>
>> I guess we might need some way of selecting between the two if
>> the costs of gather and scatter are context-dependent in some way.
>> But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
>> for certain modes then it's probably better not to define the optabs
>> for those modes.
>
> I think we can't vectorize true gathers (indexed from memory loads) w/o
> gather yet, right?
Right.

> So I really was thinking of implementing VMAT_ELEMENTWISE (invariant
> stride) and VMAT_STRIDED_SLP by composing the appropriate index vector
> with a splat and multiplication and using a gather.  I think that's
> not yet implemented?

For SVE we use:

      /* As a last resort, trying using a gather load or scatter store.

         ??? Although the code can handle all group sizes correctly,
         it probably isn't a win to use separate strided accesses based
         on nearby locations.  Or, even if it's a win over scalar code,
         it might not be a win over vectorizing at a lower VF, if that
         allows us to use contiguous accesses.  */
      if (*memory_access_type == VMAT_ELEMENTWISE
          && single_element_p
          && loop_vinfo
          && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
                                                 masked_p, gs_info))
        *memory_access_type = VMAT_GATHER_SCATTER;

in get_group_load_store_type.  This only works when the target defines
gather/scatter using optabs rather than built-ins.

But yeah, no VMAT_STRIDED_SLP support yet.  That would be good to have...

Richard
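For illustration only: the idea discussed above of handling an invariant-stride VMAT_ELEMENTWISE access by composing an index vector (a splat of the stride multiplied by a linear series) and issuing one gather can be modelled in plain C, with arrays standing in for vector registers. This is a sketch under stated assumptions, not GCC code: VF, strided_load_elementwise and strided_load_via_gather are hypothetical names, and the stride is measured in elements.

```c
#include <stddef.h>

/* Hypothetical model: a "vector" is VF int lanes held in an array.  */
#define VF 8

/* Elementwise strategy: one scalar load per lane, after which the
   lanes would be combined into a vector (the vec_construct step).  */
void
strided_load_elementwise (const int *base, ptrdiff_t stride, int out[VF])
{
  for (int lane = 0; lane < VF; lane++)
    out[lane] = base[lane * stride];
}

/* Gather strategy: build index = splat (stride) * { 0, 1, ..., VF-1 },
   then do a single gather that loads base[index[lane]] for each lane.  */
void
strided_load_via_gather (const int *base, ptrdiff_t stride, int out[VF])
{
  ptrdiff_t splat[VF], series[VF], index[VF];

  for (int lane = 0; lane < VF; lane++)
    {
      splat[lane] = stride;   /* Broadcast the loop-invariant stride.  */
      series[lane] = lane;    /* Linear series 0, 1, ..., VF-1.  */
      index[lane] = splat[lane] * series[lane];
    }

  /* Model of the gather: every lane loads from base + index[lane].  */
  for (int lane = 0; lane < VF; lane++)
    out[lane] = base[index[lane]];
}
```

Both functions produce the same lanes; the difference is only in the operations a target would execute (VF scalar loads plus a vec_construct versus one index computation plus one gather), which is why the choice between them belongs in the target's cost model rather than being fixed in advance.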