https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120930
--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I'm seeing a difference between -O2 and -O3: the -O2 version gets the proper result (3). In the -O3 version we completely unroll the loop but don't seem to populate the "b" array entirely, just the first 16 strided elements, i.e. one full vector at 256b. In 187.pcom the gimple still seems alright.