https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90460

            Bug ID: 90460
           Summary: Inefficient vector construction from pieces
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Split out from PR90424

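// 16-byte GNU vector of T (GCC extension).
template <class T>
using V [[gnu::vector_size(16)]] = T;

// Build a vector from the scalars q[I]...; lanes without an initializer
// are zero.
template <class T, unsigned... I>
V<T> load(const void *p) {
  const T* q = static_cast<const T*>(p);
  V<T> r = {q[I]...};
  return r;
}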

// movq or movsd
template V<char  > load<char  , 0,1,2,3,4,5,6,7>(const void *);
template V<short > load<short , 0,1,2,3>(const void *);
template V<int   > load<int   , 0,1>(const void *);
template V<long  > load<long  , 0>(const void *);
template V<float > load<float , 0,1>(const void *);
template V<double> load<double, 0>(const void *);

// movd or movss
template V<char > load<char , 0,1,2,3>(const void *);
template V<short> load<short, 0,1>(const void *);
template V<int  > load<int  , 0>(const void *);
template V<float> load<float, 0>(const void *);
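For reference, here is a hand-written sketch of what the load<int, 0, 1> instantiation means. It does one 8-byte copy into a zero-initialised vector. The plain typedef v4si and the function name load_int01 are hypothetical illustrations and are not part of the report. The code states the intended semantics, a single 8-byte load with the upper half zeroed. That is the movq the comments above ask for, though the exact code emitted depends on the compiler version.

```cpp
#include <cassert>
#include <cstring>

// Plain GNU vector typedef (GCC/Clang extension): four ints in 16 bytes.
typedef int v4si __attribute__((vector_size(16)));

// Hypothetical hand-written counterpart of load<int, 0, 1>: copy the two
// adjacent ints with one 8-byte memcpy; lanes 2 and 3 stay zero, matching
// the brace-initialised original.
v4si load_int01(const void* p) {
  v4si r = {};                          // all four lanes zero
  std::memcpy(&r, p, 2 * sizeof(int));  // one 8-byte copy into lanes 0 and 1
  return r;
}
```

Given int buf[2] = {7, -3}, load_int01(buf) yields the lanes {7, -3, 0, 0}, the same as load<int, 0, 1>(buf).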


This ends up with GIMPLE IL like the following (shown for load<int, 0, 1>):

load<int, 0, 1> (const void * p)
{
  V r;
  int _1;
  int _2;

  <bb 2> [local count: 1073741824]:
  _1 = MEM[(const int *)p_3(D)];
  _2 = MEM[(const int *)p_3(D) + 4B];
  r_5 = {_1, _2};
  return r_5;

}

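The two adjacent 4-byte loads from p and p + 4 could be merged into a single 8-byte load (movq) feeding the low half of the vector. That looks like a job for the bswap pass, which already combines adjacent narrow loads into one wider load in scalar code.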
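For comparison, here is a sketch of the scalar pattern the bswap pass already handles: a value assembled from adjacent byte loads with shifts and ORs. The function name load_le32 is a hypothetical illustration. On little-endian targets, GCC at -O2 can typically replace the four byte loads with a single 32-bit load. The vector CONSTRUCTOR {_1, _2} above is the analogous case that this report suggests handling the same way.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian assembly of a 32-bit value from four adjacent bytes.
// This shift-and-OR shape is what the bswap pass recognizes as a
// single wider load (plus a byte swap on big-endian targets).
std::uint32_t load_le32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0])
       | static_cast<std::uint32_t>(p[1]) << 8
       | static_cast<std::uint32_t>(p[2]) << 16
       | static_cast<std::uint32_t>(p[3]) << 24;
}
```

The casts to std::uint32_t matter: without them each byte is promoted to int, and shifting a byte of 0x80 or more left by 24 would overflow a signed int.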
