https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460

Bug ID: 90460
Summary: Inefficient vector construction from pieces
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

Split out from PR90424

template <class T>
using V [[gnu::vector_size(16)]] = T;

template <class T, unsigned... I>
V<T> load(const void *p) {
  const T* q = static_cast<const T*>(p);
  V<T> r = {q[I]...};
  return r;
}

// movq or movsd
template V<char  > load<char  , 0,1,2,3,4,5,6,7>(const void *);
template V<short > load<short , 0,1,2,3>(const void *);
template V<int   > load<int   , 0,1>(const void *);
template V<long  > load<long  , 0>(const void *);
template V<float > load<float , 0,1>(const void *);
template V<double> load<double, 0>(const void *);

// movd or movss
template V<char > load<char , 0,1,2,3>(const void *);
template V<short> load<short, 0,1>(const void *);
template V<int  > load<int  , 0>(const void *);
template V<float> load<float, 0>(const void *);

ends up with IL like

load<int, 0, 1> (const void * p)
{
  V r;
  int _1;
  int _2;

  <bb 2> [local count: 1073741824]:
  _1 = MEM[(const int *)p_3(D)];
  _2 = MEM[(const int *)p_3(D) + 4B];
  r_5 = {_1, _2};
  return r_5;

which looks like a job for bswap.
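
For illustration only, here is a minimal sketch of the source-level equivalence that such load merging would rely on for the load<int, 0, 1> case. The names load_pieces, load_merged and same_bits are hypothetical and not part of the report. The sketch assumes the GNU vector extension and relies on elements without an initializer being zero. It only shows that the two forms produce the same bits; it makes no claim about the instructions any particular GCC version emits for either one.

```cpp
#include <cassert>
#include <cstring>

template <class T>
using V [[gnu::vector_size(16)]] = T;

// Element-wise form, as in the report: two scalar loads followed by a
// vector construction. Elements 2 and 3 have no initializer and are zero.
V<int> load_pieces(const void *p) {
  const int *q = static_cast<const int *>(p);
  V<int> r = {q[0], q[1]};
  return r;
}

// Hand-merged form (hypothetical): one 8-byte copy into the low half of
// a zero-initialized vector, i.e. the single 64-bit load one would hope for.
V<int> load_merged(const void *p) {
  V<int> r = {};
  std::memcpy(&r, p, 2 * sizeof(int));
  return r;
}

// Bitwise comparison of two vectors (hypothetical helper).
bool same_bits(V<int> a, V<int> b) {
  return std::memcmp(&a, &b, sizeof a) == 0;
}

// Small self-check: both forms agree on an arbitrary buffer.
void self_check() {
  int buf[4] = {10, 20, 30, 40};
  assert(same_bits(load_pieces(buf), load_merged(buf)));
}
```

Comparing the assembly of load_pieces and load_merged at -O2 is one way to see whether the element-wise construction has been merged into a single load.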