> +The elements of the input vectors are numbered from left to right across > +one or both of the vectors. Each element in the mask specifies a number > +of element from the input vector(s). Consider the following example.
It would be more preferable to talk about the memory ordering of the elements rather than "left" and "right" which are ambiguous at best. > + if (TREE_CODE (mask) == VECTOR_CST) > + { > + tree m_type, call; > + tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type); > + /*rtx t;*/ Leftover crap. > + > + if (!fn) > + goto vshuffle; > + > + if (m_type != TREE_TYPE (TREE_TYPE (mask))) > + { > + int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)); > + tree cvt = build_vector_type (m_type, units); > + mask = fold_convert (cvt, mask); > + } > + > + call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), > fn); > + call = build_call_nary (type /* ? */, call, 3, v0, v1, mask); > + > + return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, > NULL); > + } > + > +vshuffle: > + gcc_assert (operand_equal_p (v0, v1, 0)); Why can't a non-constant shuffle have different V0 and V1? That seems like a direct violation of the documentation, and any sort of usefulness. Also, while I'm ok with the use of builtin_vec_perm here in the short term, I think that in the long term we should simply force the named pattern to handle constants. Then the vectorizer can simply use the predicate and the tree code and we can drop the large redundancy of builtins with different argument types. Indeed, once this patch is applied, I think that ought to be the very next task in this domain. > +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR<v0, v1, maks> Typo in "mask". > + foreach i in length (mask): > + A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]] Surely it's v1[mask[i] - length]. > + if (TREE_CODE (vect) == VECTOR_CST) > + { > + unsigned i; Indentation is off all through this function. > + mask = gen_rtx_AND (maskmode, mask, mm); > + > + /* Convert mask to vector of chars. */ > + mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0); > + mask = force_reg (V16QImode, mask); Why are you using force_reg to do all the dirty work? Seems to me this should be using expand_normal. All throughout this function. That would also avoid the need for all of the extra force_reg stuff that ought not be there for -O0. I also see that you're not even attempting to use xop_pperm. Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1? And give the vshuffle named pattern the wrong number of arguments? It's certainly possible to handle it, though it takes a few more steps, and might well be more efficient as a libgcc function rather than inline. r~