The vectoriser can handle interleaved loads such as:

    for (int i = 0; i < N; i++)
      res[i] = a[2 * i] + a[2 * i + 1];

The vectorised code loads two consecutive vectors from a, then permutes the elements. It can handle stores in a similar way.

This patch series adds support for load and store instructions that have the interleaving "built in", such as NEON's vldN and vstN. The series is based on the outline here:

    http://gcc.gnu.org/ml/gcc/2011-03/msg00322.html

except that I'm now using "internal" functions rather than built-ins. I'll update my internal function patch:

    http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00609.html

after Richard's recent changes and retest, but the patches in this series are unaffected.

Richard
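
For anyone who wants the access pattern spelled out, here is a minimal C sketch of the example loop next to a scalar model of a load with the interleaving "built in". The names sum_pairs_scalar, sum_pairs_blocked and load_deinterleave2, and the block size VF, are hypothetical and chosen only for illustration. This models the effect in plain C under those assumptions; it is not the code the vectoriser generates and not the internal function interface from the patches.

```c
#include <stddef.h>

#define VF 4  /* hypothetical vector length in elements (an assumption) */

/* Scalar reference: the loop from the example above. */
void sum_pairs_scalar(int *res, const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        res[i] = a[2 * i] + a[2 * i + 1];
}

/* Scalar model of a two-way de-interleaving load: reads 2 * VF
   consecutive elements from src and splits them into the
   even-indexed and odd-indexed elements. */
static void load_deinterleave2(int even[VF], int odd[VF], const int *src)
{
    for (size_t j = 0; j < VF; j++) {
        even[j] = src[2 * j];
        odd[j] = src[2 * j + 1];
    }
}

/* The same computation, processed VF results at a time. */
void sum_pairs_blocked(int *res, const int *a, size_t n)
{
    size_t i = 0;

    for (; i + VF <= n; i += VF) {
        int even[VF], odd[VF];

        load_deinterleave2(even, odd, a + 2 * i);
        for (size_t j = 0; j < VF; j++)
            res[i + j] = even[j] + odd[j];
    }

    /* Scalar tail for the iterations that do not fill a whole block. */
    for (; i < n; i++)
        res[i] = a[2 * i] + a[2 * i + 1];
}
```

Both functions should produce the same res for any n, provided a holds at least 2 * n elements. The point of the sketch is only that the split into even and odd elements happens as part of the load rather than as a separate permute step.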