The vectoriser can handle interleaved loads such as:

    for (int i = 0; i < N; i++)
      res[i] = a[2 * i] + a[2 * i + 1];

The vectorised code loads two consecutive vectors from a, then permutes
the elements to separate the even-indexed values from the odd-indexed
ones.  It can handle stores in a similar way.
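As a rough illustration of the load-then-permute approach, here is a
portable C emulation, assuming a hypothetical vector width of 4 ints
(VF) and hypothetical helpers load_vec and extract_even_odd standing in
for a contiguous vector load and a permute.  It is a sketch of the idea,
not the code the vectoriser actually generates:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical vector width: 4 ints, as in a 128-bit vector register.  */
#define VF 4

typedef struct { int lane[VF]; } vec4;

/* Contiguous vector load (hypothetical helper).  */
static vec4
load_vec (const int *p)
{
  vec4 v;
  memcpy (v.lane, p, sizeof v.lane);
  return v;
}

/* Permute: take the even-numbered (ODD == 0) or odd-numbered (ODD == 1)
   elements of the 2*VF-element concatenation of LO and HI.  */
static vec4
extract_even_odd (vec4 lo, vec4 hi, int odd)
{
  vec4 r;
  for (int k = 0; k < VF; k++)
    {
      int idx = 2 * k + odd;    /* index into the concatenation LO:HI */
      r.lane[k] = idx < VF ? lo.lane[idx] : hi.lane[idx - VF];
    }
  return r;
}

/* res[i] = a[2 * i] + a[2 * i + 1], computing VF results per iteration.
   Assumes N is a multiple of VF, so no epilogue loop is needed.  */
static void
sum_pairs (int *res, const int *a, int n)
{
  assert (n % VF == 0);
  for (int i = 0; i < n; i += VF)
    {
      vec4 lo = load_vec (&a[2 * i]);        /* a[2i] .. a[2i+3] */
      vec4 hi = load_vec (&a[2 * i + VF]);   /* a[2i+4] .. a[2i+7] */
      vec4 even = extract_even_odd (lo, hi, 0);
      vec4 odd = extract_even_odd (lo, hi, 1);
      for (int k = 0; k < VF; k++)
        res[i + k] = even.lane[k] + odd.lane[k];
    }
}
```

Each iteration needs two ordinary loads plus two permutes to produce the
even and odd vectors.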

This patch series adds support for load and store instructions that have
the interleaving "built in", such as NEON's vldN and vstN.  The series
is based on the outline here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

except that I'm now using "internal" functions rather than built-ins.
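By contrast, an instruction like NEON's vld2 (exposed in C as, for
example, vld2q_s32 in arm_neon.h) does the de-interleaving as part of the
load itself.  The sketch below is a portable emulation of that behaviour,
again assuming a hypothetical 4-int vector; the names load_lanes_2,
store_lanes_2 and vec4x2 are illustrative only and are not GCC's internal
function names:

```c
#include <assert.h>

/* Hypothetical vector width: 4 ints.  */
#define VF 4

typedef struct { int lane[VF]; } vec4;
/* Two vectors returned together, loosely analogous to int32x4x2_t.  */
typedef struct { vec4 val[2]; } vec4x2;

/* vld2-style load: read 2*VF consecutive elements and de-interleave them,
   so val[0] gets the even-numbered elements and val[1] the odd ones.  */
static vec4x2
load_lanes_2 (const int *p)
{
  vec4x2 r;
  for (int k = 0; k < VF; k++)
    {
      r.val[0].lane[k] = p[2 * k];
      r.val[1].lane[k] = p[2 * k + 1];
    }
  return r;
}

/* vst2-style store: the inverse, interleaving two vectors into memory.  */
static void
store_lanes_2 (int *p, vec4x2 v)
{
  for (int k = 0; k < VF; k++)
    {
      p[2 * k] = v.val[0].lane[k];
      p[2 * k + 1] = v.val[1].lane[k];
    }
}

/* res[i] = a[2 * i] + a[2 * i + 1] using one structure load per
   iteration and no separate permute.  Assumes N is a multiple of VF.  */
static void
sum_pairs_lanes (int *res, const int *a, int n)
{
  assert (n % VF == 0);
  for (int i = 0; i < n; i += VF)
    {
      vec4x2 v = load_lanes_2 (&a[2 * i]);
      for (int k = 0; k < VF; k++)
        res[i + k] = v.val[0].lane[k] + v.val[1].lane[k];
    }
}
```

Storing with store_lanes_2 the value just loaded by load_lanes_2
reproduces the original memory layout, which is the load/store symmetry
the series relies on.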

I'll update my internal function patch:

    http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00609.html

after Richard's recent changes and retest, but the patches in this
series are unaffected.

Richard
