https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510

Devin Hussey <husseydevin at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GCC generates inefficient   |GCC generates inefficient
                   |U64x2 scalar multiply for   |U64x2/v2di scalar multiply
                   |NEON32                      |for NEON32

--- Comment #1 from Devin Hussey <husseydevin at gmail dot com> ---
I noticed that the scalarization is performed in the veclower21 stage.

In making a patch for LLVM, I found that the x86 code could basically be
copy-pasted over, just adding truncates and replacing the SSE instructions
with NEON instructions.

I would add it if someone told me where the SSE code is and where to put the
NEON code. That is what helped me with the LLVM patch.
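
For readers unfamiliar with the decomposition being discussed, here is a
minimal portable C sketch of a two-lane 64-bit multiply built only from
32x32->64 widening multiplies. It is not the GCC or LLVM patch. The names
u64x2_model, widening_mul32 and mul_u64x2_model are hypothetical, and the
mapping in the comments (pmuludq on SSE2, vmull.u32 and vmovn.i64 on NEON)
is an assumption about which instructions would fill each role.

```c
#include <stdint.h>

/* Hypothetical two-lane type standing in for U64x2 / v2di. */
typedef struct { uint64_t lane[2]; } u64x2_model;

/* 32x32 -> 64 widening multiply. Assumed to correspond to pmuludq on
 * SSE2 and vmull.u32 on NEON. */
static uint64_t widening_mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Per-lane 64-bit multiply, modulo 2^64, from 32-bit halves:
 *   a * b = a_lo*b_lo + ((a_hi*b_lo + a_lo*b_hi) << 32)
 * The a_hi*b_hi term is shifted out entirely and can be dropped. */
static u64x2_model mul_u64x2_model(u64x2_model a, u64x2_model b)
{
    u64x2_model r;
    for (int i = 0; i < 2; i++) {
        /* The "truncates": narrow each 64-bit lane to its halves
         * (assumed to be vmovn.i64 / a narrowing shift on NEON). */
        uint32_t a_lo = (uint32_t)a.lane[i];
        uint32_t a_hi = (uint32_t)(a.lane[i] >> 32);
        uint32_t b_lo = (uint32_t)b.lane[i];
        uint32_t b_hi = (uint32_t)(b.lane[i] >> 32);

        /* Cross terms; unsigned wraparound here is harmless because
         * only the low 32 bits survive the shift below. */
        uint64_t cross = widening_mul32(a_hi, b_lo)
                       + widening_mul32(a_lo, b_hi);

        r.lane[i] = widening_mul32(a_lo, b_lo) + (cross << 32);
    }
    return r;
}
```

The loop over lanes is only there to keep the sketch portable. In a vector
lowering each step would operate on both lanes at once, which is what
avoids the scalarization described above.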