https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Hongtao.liu from comment #7)
> > Hmm, we have specific code to add the scalar->vector (vmovq) cost to a
> > vector construct, but it doesn't seem to work here. My guess is that,
> > because of &r0, the element is treated as a load rather than a scalar?
> Yes, gimple_assign_load_p is true for it:
>
> (gdb) p debug_gimple_stmt (def)
> # VUSE <.MEM_46>
> r0.0_20 = r0;

It's a load from the stack, and it is eventually eliminated by the RTL dse1
pass, but the vectorizer doesn't know that at this point.

And SLP will not vectorize when the extra scalar->vector cost is counted.
For example, this function is left as four scalar stores:

typedef long long uint64_t;
void add4i(uint64_t r0, uint64_t r1, uint64_t r2, uint64_t r3, uint64_t *dst)
{
  dst[0] = r0;
  dst[1] = r1;
  dst[2] = r2;
  dst[3] = r3;
}

add4i:
        mov     QWORD PTR [r8], rdi
        mov     QWORD PTR [r8+8], rsi
        mov     QWORD PTR [r8+16], rdx
        mov     QWORD PTR [r8+24], rcx
        ret