https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80647
--- Comment #2 from Yale Zhang <yzhang1985 at gmail dot com> ---
Very interesting case. First, I didn't know unaligned loads were undefined behavior on x86.

ICC 17 doesn't vectorize the loop, probably because the destination and source of the memmove() alias. But apparently GCC knows how to vectorize memmove(). In this function, the destination always comes before the source, so it's trivial to vectorize. Vectorizing the case where destination > source is harder, and I wonder if GCC can do that.

This is legacy code from more than 10 years ago. Manually vectorizing the memmove() was too smart for modern compilers. But the solution is simple: I'll just use the simple fallback implementation that is used on unknown platforms. It's still vectorizable, though.

Thanks, Andrew.
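
For illustration only, here is a minimal sketch of the "destination before source" case described above. It is not the code from this report: the name move_down and the 8-byte chunk size are assumptions made for the example. It copies forward in chunks and uses memcpy() into a local for each chunk, so no misaligned pointer is ever dereferenced.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the bug report): move n bytes from src to
   dst, assuming dst <= src. The two ranges may overlap. */
static void move_down(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    /* Copy forward, 8 bytes at a time. Each chunk is fully loaded into a
       local before it is stored, and because dst <= src a store never
       overwrites source bytes that have not been read yet. */
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);  /* unaligned-safe load  */
        memcpy(dst + i, &chunk, sizeof chunk);  /* unaligned-safe store */
    }

    /* Copy the remaining tail one byte at a time. */
    for (; i < n; ++i)
        dst[i] = src[i];
}
```

When destination > source, the same idea would have to run from the end of the buffer backwards, which is the harder case mentioned above.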