On 06/05/2014 08:29 AM, Evgeny Stupachenko wrote:
> +  /* Figure out where permutation elements stay not in their
> +     respective lanes.  */
> +  for (i = 0, which = 0; i < nelt; ++i)
> +    {
> +      unsigned e = d->perm[i];
> +      if (e != i)
> +        which |= (e < nelt ? 1 : 2);
> +    }
> +  /* We can pblend the part where elements stay not in their
> +     respective lanes only when these elements are all in one
> +     half of a permutation.
> +     {0 1 8 3 4 5 9 7} is ok as 8, 9 are not at their respective
> +     lanes, but both 8 and 9 >= 8
> +     {0 1 8 3 4 5 2 7} is not ok as 2 and 8 are not at their
> +     respective lanes and 8 >= 8, but 2 not.  */
> +  if (which != 1 && which != 2)
> +    return false;
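
For reference, the classification in the quoted hunk can be tried outside the compiler. This is a minimal standalone sketch, not code from the patch: the helper name out_of_place_mask is hypothetical, and it takes a plain array where the patch reads d->perm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone version of the loop in the quoted hunk.
   Returns a bit mask describing the elements that are not in their
   own lane: bit 0 (value 1) is set if any such element comes from the
   first operand (e < nelt), bit 1 (value 2) if any comes from the
   second operand (e >= nelt).  */
unsigned
out_of_place_mask (const unsigned *perm, unsigned nelt)
{
  unsigned i, which = 0;

  assert (perm != NULL);
  for (i = 0; i < nelt; ++i)
    {
      unsigned e = perm[i];
      if (e != i)
        which |= (e < nelt ? 1 : 2);
    }
  return which;
}
```

With this helper, {0 1 8 3 4 5 9 7} gives 2 (accepted by the patch's test) and {0 1 8 3 4 5 2 7} gives 3 (rejected). An identity permutation gives 0, which the test `which != 1 && which != 2` also rejects.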

I was about to suggest that you'd get more success by putting the blend
first and doing the shuffle second.  But I suppose your direction does
cover a few cases that the other way would miss, e.g.

  { 0 4 7 3 }

because we can't blend 0 and 4 (or 3 and 7) into the same vector.  Whereas
the direction you're trying can't handle

  { 0 6 6 1 }

But that can be implemented with

  { 0 1 2 3 }
  { 4 5 6 7 }
  -----------
  { 0 1 6 3 }  (pblend)
  -----------
  { 0 6 6 1 }  (pshufb)

So I guess we should cover these two cases in successive patches.

> +  if (!expand_vec_perm_blend (&dcopy1))
> +    return false;
> +
> +  return true;

You should avoid doing any work in this function if the ISA isn't enabled.
Don't wait until the last test for blend to fail.  Separate that out from
the start of expand_vec_perm_blend as a subroutine, perhaps.

We should be able to prove that we've got a valid blend as input here, so
I'd be more inclined to write

  ok = expand_vec_perm_blend (&dcopy1);
  gcc_assert (ok);
  return true;

> +  if (!expand_vec_perm_1 (&dcopy))
> +    return false;

If we know we have pblend, then we know we have pshufb, so again I don't
see how expand_vec_perm_1 can fail.  Another assert would be good.

There is a point, earlier in the function, where we know whether we're
going to succeed or not.  I believe that is just after

> +  if (which != 1 && which != 2)
> +    return false;

You should add an

  if (d->testing_p)
    return true;

at that point.


r~
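
The blend-first direction from the { 0 6 6 1 } example can be sketched the same way. This is only an illustration under stated assumptions, not the proposed implementation: the function name split_blend_then_shuffle and the MAX_NELT bound are hypothetical, and the blend is encoded as a two-operand selector in which lane i holds either i or i + nelt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NELT 64  /* Hypothetical upper bound, for this sketch only.  */

/* Try to split the two-operand permutation PERM into a lane-wise blend
   followed by a one-operand shuffle of the blended vector.  On success
   BLEND[l] is l or l + nelt (which operand supplies lane l) and
   SHUFFLE[i] is the lane of the blended vector placed at position i.
   Fails when one lane is needed from both operands.  */
bool
split_blend_then_shuffle (const unsigned *perm, unsigned nelt,
                          unsigned *blend, unsigned *shuffle)
{
  bool used[MAX_NELT] = { false };
  unsigned i;

  assert (nelt <= MAX_NELT);

  /* Lanes nobody asks for default to the first operand.  */
  for (i = 0; i < nelt; ++i)
    blend[i] = i;

  for (i = 0; i < nelt; ++i)
    {
      unsigned e = perm[i];
      unsigned lane = e % nelt;

      assert (e < 2 * nelt);
      /* The blend can keep only one of e and its counterpart in the
         other operand, since both live in the same lane.  */
      if (used[lane] && blend[lane] != e)
        return false;
      used[lane] = true;
      blend[lane] = e;
      shuffle[i] = lane;
    }
  return true;
}
```

For { 0 6 6 1 } this yields the blend { 0 1 6 3 } and the shuffle indices { 0 2 2 1 }, matching the diagram above. For { 0 4 7 3 } it fails, because 0 and 4 both need lane 0.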