https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60408
--- Comment #3 from Jim Wilson <wilson at tuliptree dot org> ---
Even if we could fix the vec_extract constraints, we still end up with 3 instructions, as the optimizer can't do anything interesting with the vec_extract RTL.

For a 32-bit SFmode value though, we can just use a subreg instead of a vector extract. The ARM port models the vector registers as 32-bit registers, so a subreg for a 32-bit mode will always be valid. Using a subreg instead of a vector extract here, I get 2 instructions.

    vmov.f32 s15, s0
    vadd.f32 s0, s1, s15

The extra vmov is there because the register allocator thinks it needs a temp, since the inputs and outputs partially overlap. That is a harder problem to fix.

Subregs should also work for 64-bit modes.

I have an experimental patch which is mostly untested. I don't know if this works for both big-endian and little-endian. I don't know if this works for all 32-bit modes and all vector types. Etc. All I know is that it seems to work for this testcase.
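
As a rough source-level analogy for the vec_extract versus subreg distinction, the following C sketch models one 64-bit vector register as 8 bytes holding two floats. It is not the patch and not GCC code: the type `dreg_model` and the helpers `make_dreg`, `lane_via_extract`, and `lane_via_subreg` are hypothetical names invented for illustration. It assumes a 4-byte `float` and uses host memory order, so it says nothing about the big-endian question raised above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: one 64-bit vector register holding two 32-bit floats. */
typedef struct {
    unsigned char bytes[2 * sizeof(float)];
} dreg_model;

/* Build the modelled register from two lane values (host memory order). */
static dreg_model make_dreg(float lane0, float lane1)
{
    dreg_model d;
    memcpy(d.bytes, &lane0, sizeof lane0);
    memcpy(d.bytes + sizeof(float), &lane1, sizeof lane1);
    return d;
}

/* "Vector extract" view: treat the register as a whole vector,
   then select one element from it. */
static float lane_via_extract(const dreg_model *d, unsigned lane)
{
    float lanes[2];
    assert(lane < 2);
    memcpy(lanes, d->bytes, sizeof lanes);
    return lanes[lane];
}

/* "Subreg" view: read the 32-bit piece at a byte offset directly,
   without ever forming the full vector. */
static float lane_via_subreg(const dreg_model *d, unsigned lane)
{
    float value;
    assert(lane < 2);
    memcpy(&value, d->bytes + lane * sizeof(float), sizeof value);
    return value;
}
```

Both helpers return the same value for a given lane. The difference is only in how the access is expressed: the second form names a 32-bit piece of the register directly, which is loosely what a subreg gives the optimizer to work with.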