On Thu, Oct 24, 2019 at 12:56 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> "H.J. Lu" <hjl.to...@gmail.com> writes:
> > On Wed, Oct 23, 2019 at 4:51 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Richard Biener <richard.guent...@gmail.com> writes:
> >> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> >> > <richard.sandif...@arm.com> wrote:
> >> >>
> >> >> This patch is the first of a series that tries to remove two
> >> >> assumptions:
> >> >>
> >> >> (1) that all vectors involved in vectorisation must be the same size
> >> >>
> >> >> (2) that there is only one vector mode for a given element mode and
> >> >>     number of elements
> >> >>
> >> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> >> that require the number of elements to stay the same.  E.g. if we're
> >> >> vectorising code that operates on narrow and wide elements, and the
> >> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> >> for the wide elements.
> >> >>
> >> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> >> vectors to work with -msve-vector-bits=256.
> >> >>
> >> >> The patch adds a new hook that targets can use to control how we
> >> >> move from one vector mode to another.  The hook takes a starting vector
> >> >> mode, a new element mode, and (optionally) a new number of elements.
> >> >> The flexibility needed for (1) comes in when the number of elements
> >> >> isn't specified.
> >> >>
> >> >> All callers in this patch specify the number of elements, but a later
> >> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> >> for a few days, hence the RFC/A tag.
> >> >>
> >> >> Tested individually on aarch64-linux-gnu and as a series on
> >> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> >> look OK?
> >> >
> >> > In isolation the idea looks good but maybe a bit limited?  I see
> >> > how it works for the same-size case but if you consider x86
> >> > where we have SSE, AVX256 and AVX512 what would it return
> >> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> >> > kind of query not intended (where the component modes match
> >> > but nunits is zero)?
> >>
> >> In that case we'd normally get V4SImode back.  It's an allowed
> >> combination, but not very useful.
> >>
> >> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> >> > it just used to stay in the same register set for different component
> >> > modes?
> >>
> >> Yeah, the idea is to use the original vector mode as essentially
> >> a base architecture.
> >>
> >> The follow-on patches replace vec_info::vector_size with
> >> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> >> with targetm.vectorize.autovectorize_vector_modes.  These are the
> >> starting modes that would be passed to the hook in the nunits==0 case.
> >>
> >
> > For a target with different vector sizes,
> > targetm.vectorize.autovectorize_vector_sizes doesn't return the
> > optimal vector sizes for known trip count and unknown trip count.
> > For a target with 128-bit and 256-bit vectors, 256-bit followed by
> > 128-bit works well for known trip count since vectorizer knows the
> > maximum usable vector size.  But for unknown trip count, we may want
> > to use 128-bit vector when 256-bit code path won't be used at
> > run-time, but 128-bit vector will.  At the moment, we can only use
> > one set of vector sizes for both known trip count and unknown trip
> > count.
>
> Yeah, we're hit by this for AArch64 too.  Andre's recent patches:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01564.html
> https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00205.html
>
> should help.
>
> > Can vectorizer support 2 sets of vector sizes, one for known trip
> > count and the other for unknown trip count?
>
> The approach Andre's taking is to continue to use the wider vector size
> for unknown trip counts, and instead ensure that the epilogue loop is
> vectorised at the narrower vector size if possible.  The patches then
> use this vectorised epilogue as a fallback "main" loop if the runtime
> trip count is too low for the wide vectors.
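
For reference, the loop structure described in the quoted text above can be
sketched roughly in scalar C.  This is only an illustration under assumed
parameters, not what the patches or GCC actually emit: the lane counts
(8 and 4, standing in for 256-bit and 128-bit vectors of 32-bit elements),
the function name sum_with_epilogue and the loop_stats struct are all
hypothetical, and each fixed-count inner loop stands in for a single vector
operation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane counts: 8 x 32-bit = 256 bits, 4 x 32-bit = 128 bits.  */
enum { WIDE_LANES = 8, NARROW_LANES = 4 };

/* Hypothetical bookkeeping: how many iterations each loop executed.  */
struct loop_stats { size_t wide_iters, narrow_iters, scalar_iters; };

/* Sum the N elements of A.  Each fixed-count inner loop models one
   vector operation of the corresponding width.  */
static int64_t
sum_with_epilogue (const int32_t *a, size_t n, struct loop_stats *stats)
{
  int64_t sum = 0;
  size_t i = 0;
  stats->wide_iters = stats->narrow_iters = stats->scalar_iters = 0;

  /* Runtime check: enter the wide loop only if the trip count is large
     enough for at least one wide iteration.  Otherwise take the short
     cut straight to the narrower loop below.  */
  if (n >= WIDE_LANES)
    for (; i + WIDE_LANES <= n; i += WIDE_LANES)
      {
        for (size_t j = 0; j < WIDE_LANES; ++j)
          sum += a[i + j];
        stats->wide_iters++;
      }

  /* Narrower loop: the epilogue of the wide loop, or the fallback
     "main" loop when the wide loop was skipped.  */
  for (; i + NARROW_LANES <= n; i += NARROW_LANES)
    {
      for (size_t j = 0; j < NARROW_LANES; ++j)
        sum += a[i + j];
      stats->narrow_iters++;
    }

  /* Scalar tail for whatever is left.  */
  for (; i < n; ++i)
    {
      sum += a[i];
      stats->scalar_iters++;
    }

  return sum;
}
```

With these assumed widths, a trip count of 13 runs one wide iteration, one
narrow iteration and one scalar iteration, while a trip count of 6 skips the
wide loop entirely and still gets one narrow iteration.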

I tried it on 548.exchange2_r in SPEC CPU 2017.  There is a short cut
to the vectorized epilogue for low trip counts.

-- 
H.J.