http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54566
Bug #: 54566
Summary: __builtin_shuffle: use psrldq+pslldq+por for rotations
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gli...@gcc.gnu.org
Target: x86_64-linux-gnu

Hello,

this PR is based on these two emails:
http://gcc.gnu.org/ml/libstdc++/2012-09/msg00048.html
http://gcc.gnu.org/ml/libstdc++/2012-09/msg00050.html

They say that permutations which are rotations (like {1,2,3,4,5,6,7,0}) should more often be implemented with the byte-shift instructions psrldq and pslldq (as in _mm_srli_si128 and _mm_slli_si128), combined with por. Whether this is better than pshufb is not for me to say, but it could at least help where pshufb is not available (pshufb requires SSSE3). 256-bit AVX2 versions are also possible, although they require an additional lane swap.