On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:

> Hi,
> 
> The aim of this RFC is to explore a way of cleaning up the codegen around
> data_references.  To be specific, I'd like to reuse the main-loop's updated
> data_reference as the base_address for the epilogue's corresponding
> data_reference, rather than use the niters.  We have found this leads to
> better codegen in the vectorized epilogue loops.
> 
> The approach in this RFC creates a map of iv_updates, each of which always
> contains an updated pointer that is captured in vectorizable_{load,store}.  An
> iv_update may also contain a skip_edge in case we decide the vectorization can
> be skipped in 'vect_do_peeling'.  During the epilogue update this map of
> iv_updates is checked for an entry for each data_reference.  If one exists it
> is used accordingly; if not, we revert to the old behavior of using the niters
> to advance the data_reference.
> 
> The motivation for this work is to improve codegen for the option `--param
> vect-partial-vector-usage=1` for SVE. We found that one of the main problems
> for the codegen here was coming from unnecessary conversions caused by the way
> we update the data_references in the epilogue.
> 
> This patch passes regression tests in aarch64-linux-gnu, but the codegen is
> still not optimal in some cases, specifically those where we have a scalar
> epilogue.  The scalar epilogue does not use the data_references and relies on
> the gimple scalar code, thus again constructing a memory access using the
> niters.  This is a limitation for which I haven't quite worked out a solution
> yet, and it does cause some minor regressions due to unfortunate spills.
> 
> Let me know what you think and if you have ideas of how we can better achieve
> this.

Hmm, so the patch adds a kludge to improve the kludge we have in place ;)

I think it might be interesting to create a C testcase mimicking the
update problem without involving the vectorizer.  That way we can
see how the various components involved behave (FRE + ivopts most
specifically).
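
A minimal sketch of what such a testcase could look like, assuming a
hand-unrolled "main loop" followed by an "epilogue" whose base addresses
are rebuilt from the iteration count.  The function name, the VF constant
and the add-one kernel are made up for illustration and are not taken from
the patch:

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* hypothetical stand-in for the vectorization factor */

/* Hand-written stand-in for a vectorized loop plus its epilogue.
   The epilogue recomputes its base addresses from the iteration count,
   mimicking the niters-based data_reference update.  */
void
add_one_niters (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  size_t main_iters = n / VF;
  int *d = dst;
  const int *s = src;

  /* "Main loop": the pointer IVs advance by VF elements per iteration.  */
  for (size_t i = 0; i < main_iters; i++)
    {
      for (size_t j = 0; j < VF; j++)
        d[j] = s[j] + 1;
      d += VF;
      s += VF;
    }

  /* "Epilogue": base addresses are rebuilt from main_iters instead of
     reusing the final values of d and s.  */
  int *d2 = dst + main_iters * VF;
  const int *s2 = src + main_iters * VF;
  for (size_t i = 0; i < n - main_iters * VF; i++)
    d2[i] = s2[i] + 1;
}
```

Compiling something like this with -fno-tree-vectorize and looking at what
FRE and ivopts make of d versus d2 might show whether the recomputed base
is what causes the extra conversions.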

That said, a cleaner approach to dealing with this would be to
explicitly track the IVs we generate for vectorized DRs, eventually
factoring that out from vectorizable_{store,load} so we can simply
carry over the actual pointer IV final value to the epilogue as
initial value.  For each DR group we'd create a single IV (we can
even do better in case we have load + store of the "same" group).
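
In source terms the intended shape would be roughly the following, where
the epilogue simply starts from wherever the main loop's pointer IVs ended
up.  This is only a sketch at the C level with made-up names (VF,
add_one_carried), not what the vectorizer would emit:

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* hypothetical stand-in for the vectorization factor */

/* The epilogue takes the final values of the main loop's pointer IVs
   as its initial values; nothing is recomputed from the iteration
   count.  */
void
add_one_carried (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  int *d = dst;
  const int *s = src;
  size_t remaining = n;

  /* "Main loop": one pointer IV per access, bumped by VF elements.  */
  for (; remaining >= VF; remaining -= VF)
    {
      for (size_t j = 0; j < VF; j++)
        d[j] = s[j] + 1;
      d += VF;
      s += VF;
    }

  /* "Epilogue": continues directly from d and s.  */
  for (; remaining > 0; remaining--)
    *d++ = *s++ + 1;
}
```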

We already kind-of track things via the ivexpr_map, but I'm not sure
if this lazily populated map can be reliably re-used to "re-populate"
the epilogue one (walk the map, create epilogue IVs with the appropriate
initial value & adjusted update).

Richard.

> Kind regards,
> Andre Vieira
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
