http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607
Marc Glisse <marc.glisse at normalesup dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26912|0                           |1
        is obsolete|                            |

--- Comment #17 from Marc Glisse <marc.glisse at normalesup dot org> 2012-03-20 21:50:40 UTC ---
Created attachment 26938
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26938
intra-lane shuffle in 3 insn

This (mostly untested) patch is a reformulation of the generic v8sf
single-vector shuffle in 4 insns as a generic intra-lane 2-vector shuffle in
at most 3 insns. Reformulating __builtin_shuffle(x,m) as
__builtin_shuffle(x,vperm2f128(x,1),mm) would then guarantee a maximum size of
4 insns.

Note that the strategy of doing a 2-vector shuffle by shuffling each vector
(not restricted to one vpermilp*) and blending the results gives a maximum of
9 insns, whereas the current code often generates twice that number.

By the way, I have trouble understanding this comment:

  /* For d->op0 == d->op1 the only useful vperm2f128 permutation is 0x10.  */

Is it really 0x10, or is there a stray 0 at the end and it is really just 1?
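To make the reformulation and the immediate question concrete, here is a minimal portable sketch in C. It is not taken from the attached patch: the type v8sf_model and the helpers model_vperm2f128, model_shuffle2 and make_intralane_mask are hypothetical names introduced only for illustration. The immediate encoding follows my reading of the documented VPERM2F128 semantics, and the mask handling follows the documented modulo behaviour of __builtin_shuffle.

```c
#include <assert.h>
#include <string.h>

/* Model of a 256-bit v8sf: two 128-bit lanes of four floats each.
   (Hypothetical type, for illustration only.)  */
typedef struct { float e[8]; } v8sf_model;

/* Model of VPERM2F128: imm[1:0] picks the low result lane and imm[5:4]
   the high one (0 = a.low, 1 = a.high, 2 = b.low, 3 = b.high);
   imm[3] and imm[7] zero the corresponding result lane instead.  */
static v8sf_model model_vperm2f128(v8sf_model a, v8sf_model b, unsigned imm)
{
    v8sf_model r;
    assert(imm <= 0xFF);
    for (int half = 0; half < 2; half++) {
        unsigned ctl = (imm >> (4 * half)) & 0xF;
        if (ctl & 8) {
            memset(r.e + 4 * half, 0, 4 * sizeof(float));
        } else {
            const float *src = (ctl & 2) ? b.e : a.e;
            memcpy(r.e + 4 * half, src + 4 * (ctl & 1), 4 * sizeof(float));
        }
    }
    return r;
}

/* Two-operand shuffle with __builtin_shuffle (a, b, mask) semantics:
   indices are taken modulo 16; 0..7 select from a, 8..15 from b.  */
static v8sf_model model_shuffle2(v8sf_model a, v8sf_model b, const int mask[8])
{
    v8sf_model r;
    for (int j = 0; j < 8; j++) {
        int i = mask[j] & 15;
        r.e[j] = (i < 8) ? a.e[i] : b.e[i - 8];
    }
    return r;
}

/* Rewrite a one-operand mask m (applied to x) into a two-operand mask mm
   applied to (x, y), where y is x with its lanes swapped.  An element that
   m takes from the other lane of x is taken from the same lane of y, so
   every element of the result stays within its own lane.  */
static void make_intralane_mask(const int m[8], int mm[8])
{
    for (int j = 0; j < 8; j++) {
        int i = m[j] & 7;                 /* one-operand index, modulo 8 */
        int same_lane = (i / 4) == (j / 4);
        mm[j] = same_lane ? i : 8 + (i ^ 4);
    }
}
```

In this model, with both operands equal to x, immediate 0x10 returns x unchanged and immediate 0x01 returns x with its two lanes swapped. Calling make_intralane_mask(m, mm) and then model_shuffle2(x, model_vperm2f128(x, x, 0x01), mm) reproduces the one-operand shuffle of x by m using only intra-lane selections, which is the shape the reformulation above relies on.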