https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510
Devin Hussey <husseydevin at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GCC generates inefficient   |GCC generates inefficient
                   |U64x2 scalar multiply for   |U64x2/v2di scalar multiply
                   |NEON32                      |for NEON32
--- Comment #1 from Devin Hussey <husseydevin at gmail dot com> ---
I noticed that the scalarization is performed in the veclower21 stage.
While writing a patch for LLVM, I found that the x86 code could basically be
copy-pasted over: the only changes were adding truncates and replacing the SSE
instructions with their NEON equivalents. I would be willing to do the same for
GCC if someone could tell me where the SSE code lives and where the NEON code
should go. Getting those pointers is what helped me with the LLVM patch.
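
For reference, here is a rough scalar sketch of the decomposition I mean, not the
actual GCC or LLVM code. The names mul64_via_32, u64x2 and u64x2_mul are made up
for illustration, and the NEON intrinsics named in the comments are only my
guess at how each step would map. Each 64-bit lane is split into 32-bit halves,
and the low 64 bits of the product are rebuilt from three 32x32->64 multiplies:

```c
#include <stdint.h>

/* Hypothetical helper: low 64 bits of a * b for one lane, using only
 * 32x32->64 multiplies. Comments name the NEON intrinsic each step is
 * assumed to correspond to. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a;          /* truncate, cf. vmovn_u64 */
    uint32_t a_hi = (uint32_t)(a >> 32);  /* cf. vshrn_n_u64(a, 32) */
    uint32_t b_lo = (uint32_t)b;
    uint32_t b_hi = (uint32_t)(b >> 32);

    /* Cross terms. Unsigned wraparound is harmless here because only
     * the low 32 bits of the sum survive the shift below. */
    uint64_t cross = (uint64_t)a_hi * b_lo;   /* cf. vmull_u32 */
    cross += (uint64_t)a_lo * b_hi;           /* cf. vmlal_u32 */
    cross <<= 32;                             /* cf. vshlq_n_u64(x, 32) */

    /* Add the low x low product; a_hi * b_hi only affects bits >= 64. */
    return cross + (uint64_t)a_lo * b_lo;     /* cf. vmlal_u32 */
}

/* Hypothetical stand-in for a U64x2/v2di vector. */
typedef struct { uint64_t lane[2]; } u64x2;

static u64x2 u64x2_mul(u64x2 a, u64x2 b)
{
    u64x2 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = mul64_via_32(a.lane[i], b.lane[i]);
    return r;
}
```

On NEON32 both lanes would be processed at once, so the loop disappears and no
lane has to be moved to the core registers.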