Richard Henderson <r...@redhat.com> wrote on 17/11/2009 03:39:42:
> To: Ira Rosen/Haifa/i...@ibmil
> cc: gcc@gcc.gnu.org
> Subject: targetm.vectorize.builtin_vec_perm
>
> What is this hook supposed to do?  There is no description of its
> arguments.
>
> What is the theory of operation of permute within the vectorizer?  Do
> you actually need variable permute, or would constants be ok?

It is currently used for a specific load permutation in RGB to YUV
conversion (http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00445.html).
The arguments are the vector type and the mask type (the latter is
returned by the hook). The permute is constant; it depends on the number
of loads (group size) and their type. However, there are cases that we
may want to support in the future that require variable permute, for
example indirect accesses.

> I'm contemplating adding a tree- and gimple-level VEC_PERMUTE_EXPR of
> the form:
>
>    VEC_PERMUTE_EXPR (vlow, vhigh, vperm)
>
> which would be exactly equal to
>
>    (vec_select
>      (vec_concat vlow vhigh)
>      vperm)
>
> at the rtl level.  I.e. vperm is an integral vector of the same number
> of elements as vlow.
>
> Truly variable permutation is something that's only supported by ppc and
> spu.

Altivec and SPU also support byte permutation (not only element
permutation); however, the vectorizer does not make use of this at
present.

> Intel AVX has a limited variable permutation -- 64-bit or 32-bit
> elements can be rearranged but only within a 128-bit subvector.
> So if you're working with 128-bit vectors, it's fully variable, but if
> you're working with 256-bit vectors, it's like doing 2 128-bit permute
> operations in parallel.  Intel before AVX has no variable permute.
>
> HOWEVER!  Most of the useful permutations that I can think of for the
> optimizers to generate are actually constant.  And these can be
> implemented everywhere (with varying degrees of efficiency).
>
> Anyway, I'm thinking that it might be better to add such a general
> operation instead of continuing to add things like
>
>    VEC_EXTRACT_EVEN_EXPR,
>    VEC_EXTRACT_ODD_EXPR,
>    VEC_INTERLEAVE_HIGH_EXPR,
>    VEC_INTERLEAVE_LOW_EXPR,
>
> and other obvious patterns like broadcast, duplicate even to odd,
> duplicate odd to even, etc.

If the back end is able to identify specific masks, e.g., {0,2,4,6} as
an extract-even operation, then we can certainly remove those codes.

> I can imagine having some sort of target hook that computed a cost
> metric for a given constant permutation pattern.  For instance, I'd
> imagine that the interleave patterns are half as expensive as a full
> permute for altivec, due to not having to load a mask.  This hook would
> be fairly complicated for x86, given all of the permuting insns that
> were incrementally added in various ISA revisions, but such is life.
>
> In any case, would a VEC_PERMUTE_EXPR, as described above, work for the
> uses of builtin_vec_perm within the vectorizer at present?

Yes.

Ira

>
>
> r~
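
To make the proposed semantics concrete, here is a minimal scalar sketch
in C of what VEC_PERMUTE_EXPR (vlow, vhigh, vperm) would compute, plus
the kind of mask check a back end might use to spot an extract-even
pattern such as {0,2,4,6}. The names vec_permute_model and
is_extract_even are hypothetical and not part of GCC; the sketch also
assumes int elements and that every selector is in the range [0, 2n),
which the thread does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of VEC_PERMUTE_EXPR (vlow, vhigh, vperm):
   out[i] = (vec_concat vlow vhigh)[vperm[i]].
   Assumption: each selector lies in [0, 2n).  */
static void
vec_permute_model (const int *vlow, const int *vhigh,
                   const unsigned *vperm, int *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned sel = vperm[i];
      assert (sel < 2 * n);
      /* Selectors below n pick from vlow, the rest pick from vhigh.  */
      out[i] = sel < n ? vlow[sel] : vhigh[sel - n];
    }
}

/* Hypothetical mask check: return 1 if vperm is {0, 2, 4, ...},
   i.e. the constant mask selects the even elements of the
   concatenation, and 0 otherwise.  */
static int
is_extract_even (const unsigned *vperm, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (vperm[i] != 2 * i)
      return 0;
  return 1;
}
```

With vlow = {0,1,2,3}, vhigh = {4,5,6,7} and vperm = {0,2,4,6}, the
model selects elements 0, 2, 4 and 6 of the concatenated vector, which
is the extract-even case mentioned above.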