On Fri, 30 Apr 2021, Andre Vieira (lists) wrote:

> Hi,
>
> The aim of this RFC is to explore a way of cleaning up the codegen around
> data_references.  To be specific, I'd like to reuse the main-loop's updated
> data_reference as the base_address for the epilogue's corresponding
> data_reference, rather than use the niters.  We have found this leads to
> better codegen in the vectorized epilogue loops.
>
> The approach in this RFC creates a map of iv_updates which always contains
> an updated pointer that is captured in vectorizable_{load,store}.  An
> iv_update may also contain a skip_edge in case we decide the vectorization
> can be skipped in 'vect_do_peeling'.  During the epilogue update this map of
> iv_updates is then checked to see if it contains an entry for a
> data_reference, and if so it is used accordingly; if not, it reverts back to
> the old behavior of using the niters to advance the data_reference.
>
> The motivation for this work is to improve codegen for the option `--param
> vect-partial-vector-usage=1` for SVE.  We found that one of the main problems
> for the codegen here was coming from unnecessary conversions caused by the
> way we update the data_references in the epilogue.
>
> This patch passes regression tests in aarch64-linux-gnu, but the codegen is
> still not optimal in some cases.  Specifically those where we have a scalar
> epilogue, as this does not use the data_references and will rely on the
> gimple scalar code, thus constructing again a memory access using the niters.
> This is a limitation for which I haven't quite worked out a solution yet and
> does cause some minor regressions due to unfortunate spills.
>
> Let me know what you think and if you have ideas of how we can better achieve
> this.

Hmm, so the patch adds a kludge to improve the kludge we have in place ;)

I think it might be interesting to create a C testcase mimicking the
update problem without involving the vectorizer.  That way we can see how
the various components involved behave (FRE + ivopts most specifically).

That said, a cleaner approach to dealing with this would be to explicitly
track the IVs we generate for vectorized DRs, eventually factoring that
out from vectorizable_{store,load} so we can simply carry over the actual
pointer IV final value to the epilogue as initial value.  For each DR
group we'd create a single IV (we can even do better in case we have
load + store of the "same" group).

We already kind-of track things via the ivexpr_map, but I'm not sure if
this lazily populated map can be reliably re-used to "re-populate" the
epilogue one (walk the map, create epilogue IVs with the appropriate
initial value & adjusted update).

Richard.

> Kind regards,
> Andre Vieira

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
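
For illustration only, a minimal sketch of what a C testcase along the lines
suggested above might look like.  It is not taken from the patch or from GCC:
the names VF, sum_niters and sum_carried are hypothetical, and a hand-unrolled
loop stands in for the vectorized main loop.  The first function rebuilds the
epilogue pointer from the iteration count, the second carries over the final
value of the main loop's pointer IV.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the vectorization factor of the main loop.  */
#define VF 4

/* Epilogue pointer is recomputed from the base address and the number of
   main-loop iterations, mimicking the "advance by niters" scheme.  */
long
sum_niters (const int *a, size_t n)
{
  assert (a != NULL);
  long s = 0;
  size_t main_iters = n / VF;
  const int *p = a;

  /* Hand-unrolled "vector" main loop.  */
  for (size_t i = 0; i < main_iters; i++, p += VF)
    s += (long) p[0] + p[1] + p[2] + p[3];

  /* Epilogue: fresh pointer built from base + niters * VF.  */
  const int *q = a + main_iters * VF;
  for (size_t i = main_iters * VF; i < n; i++)
    s += *q++;

  return s;
}

/* Epilogue reuses the final value of the main loop's pointer IV as its
   initial value, mimicking the "carry over the IV" scheme.  */
long
sum_carried (const int *a, size_t n)
{
  assert (a != NULL);
  long s = 0;
  size_t main_iters = n / VF;
  const int *p = a;

  /* Same hand-unrolled main loop as above.  */
  for (size_t i = 0; i < main_iters; i++, p += VF)
    s += (long) p[0] + p[1] + p[2] + p[3];

  /* Epilogue: continue from wherever the main loop left p.  */
  for (size_t i = main_iters * VF; i < n; i++)
    s += *p++;

  return s;
}
```

Both functions compute the same sum, so any difference in the generated code
comes from how the epilogue address is formed.  One way to compare them,
assuming a reasonably recent GCC, would be to compile with
-O2 -fno-tree-vectorize -fdump-tree-fre -fdump-tree-ivopts and look at what
FRE and ivopts make of the two epilogues.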