https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796
--- Comment #5 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
so with latest 4.9
gcc version 4.10.0 20140611 (experimental) [trunk revision 211467] (GCC)
situation has not changed much (the scalar version is now faster!):
I think that the cost of gather instructions is still under-estimated