https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67577
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For aarch64-linux-gnu on the trunk (GCC 6), we are able to produce the
vectorized code correctly:

        adrp    x1, .LANCHOR0
        add     x0, x1, :lo12:.LANCHOR0
        ldr     q0, [x1, #:lo12:.LANCHOR0]
        ldr     q1, [x0, 16]
        ldr     q4, [x0, 64]
        ldr     q3, [x0, 48]
        ldr     s2, [x0, 32]
        fsub    v4.4s, v4.4s, v1.4s
        fsub    v3.4s, v3.4s, v0.4s
        dup     v2.4s, v2.s[0]
        fmla    v1.4s, v2.4s, v4.4s
        fmla    v0.4s, v2.4s, v3.4s
        str     q1, [x0, 96]
        str     q0, [x0, 80]
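The testcase itself is not quoted in this comment. As a hedged sketch only, the C loop below is one source that is consistent with the arithmetic in the listing: each array spans two 4-lane vectors, each vector gets one fsub and one fmla, and the scalar loaded into s2 is broadcast with dup. The names a, b, c, t and lerp8 are hypothetical, and the mapping of arrays to .LANCHOR0 offsets in the comments is an assumption; the section-anchor layout need not follow declaration order.

```c
/* Hypothetical reconstruction, not the original testcase: a loop whose
   arithmetic matches the listing (8 floats = two 128-bit vectors of 4 lanes). */
#define N 8

float a[N];   /* assumed: loaded into q0 (.LANCHOR0+0) and q1 (+16) */
float t;      /* assumed: loaded into s2 (+32), then broadcast by dup */
float b[N];   /* assumed: loaded into q3 (+48) and q4 (+64) */
float c[N];   /* assumed: stored from q0 (+80) and q1 (+96) */

/* c[i] = a[i] + t * (b[i] - a[i]); the subtraction maps to fsub and the
   multiply-add onto the a[] register maps to fmla. */
void lerp8(void)
{
    for (int i = 0; i < N; i++)
        c[i] = a[i] + t * (b[i] - a[i]);
}
```

GCC only fuses the multiply and add into fmla when floating-point contraction is allowed, which is the default (-ffp-contract=fast) in GNU C modes but not in ISO C modes such as -std=c99.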