https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69908
--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Yuri Gribov from comment #5)
> Well, as we all know there are a lot of missing optimizations in GCC :) I
> think the real question is whether it's ever going to be fixed if there's no
> standard API for this code pattern which we can recognize as builtin.
>
> I believe the answer is "No". ATM GCC does not vectorize even the simplest
> memcpy equivalent code:
> // gcc tmp.c -O3 -mtune=native -ftree-vectorize -o- -S
> void memcpy_(char * __restrict a, char * __restrict b, unsigned n) {
>   unsigned i;
>   for (i = 0; i < n; ++i)
>     a[i] = b[i];
> }

Please look again. ldist turns this into a call to memcpy. And if you disable
ldist, it does get vectorized.
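
For anyone who wants to check this locally, here is a minimal sketch. It assumes that "ldist" refers to the loop distribution pattern matching controlled by -ftree-loop-distribute-patterns. The helper copy_matches_memcpy and its 256-byte buffers are hypothetical additions for illustration and are not part of the original report.

```c
#include <string.h>

/* Loop from comment #5, unchanged apart from indentation. */
void memcpy_(char * __restrict a, char * __restrict b, unsigned n) {
  unsigned i;
  for (i = 0; i < n; ++i)
    a[i] = b[i];
}

/* Hypothetical helper (not in the report): returns 1 if memcpy_ and the
   library memcpy produce identical buffers for a copy of n bytes, and 0
   otherwise. Copies larger than the 256-byte test buffers are rejected. */
int copy_matches_memcpy(unsigned n) {
  char src[256], got[256], want[256];
  unsigned i;

  if (n > sizeof src)
    return 0;

  for (i = 0; i < sizeof src; ++i) {
    src[i] = (char)(i * 7 + 3);  /* arbitrary non-constant fill */
    got[i] = 0;
    want[i] = 0;
  }

  memcpy_(got, src, n);
  memcpy(want, src, n);

  /* Compare the whole buffers so a write past n bytes is also caught. */
  return memcmp(got, want, sizeof got) == 0;
}
```

To look at what the compiler does with the loop, something along these lines should work, though the exact output depends on the GCC version and target:

    gcc tmp.c -O3 -S -o-
    gcc tmp.c -O3 -fno-tree-loop-distribute-patterns -fopt-info-vec -S -o-

With the first command, the body of memcpy_ would be expected to contain a call to memcpy. With the second, -fopt-info-vec should report the loop as vectorized (it may also report the fill loop in the helper).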