https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65078

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, in *.optimized the changes are just 16 times a difference like:
-  _62 = __builtin_ia32_vec_ext_v2di (_63, 0);
+  _62 = BIT_FIELD_REF <_63, 64, 0>;
And during expansion, the difference is:
-;; _62 = __builtin_ia32_vec_ext_v2di (_63, 0);
-
-(insn 42 41 43 (set (reg:V2DI 329)
-        (subreg:V2DI (reg:V16QI 138 [ D.4823 ]) 0)) ./include/emmintrin.h:722 -1
-     (nil))
-
-(insn 43 42 44 (set (reg:DI 330)
-        (vec_select:DI (reg:V2DI 329)
-            (parallel [
-                    (const_int 0 [0])
-                ]))) ./include/emmintrin.h:722 -1
-     (nil))
-
-(insn 44 43 0 (set (reg:DI 136 [ D.4825 ])
-        (reg:DI 330)) ./include/emmintrin.h:722 -1
-     (nil))
-
-;; MEM[(long long int *)dest_268] = _62;
-
-(insn 45 44 0 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
-        (reg:DI 136 [ D.4825 ])) ./include/emmintrin.h:722 -1
-     (nil))
+;; MEM[(long long int *)dest_268] = _62;
+
+(insn 42 41 43 (set (reg:TI 329)
+        (subreg:TI (reg:V16QI 138 [ D.4825 ]) 0)) ./include/emmintrin.h:722 -1
+     (nil))
+(insn 43 42 0 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
+        (subreg:DI (reg:TI 329) 0)) ./include/emmintrin.h:722 -1
+     (nil))

With the new storel_epi64 we get before RA:
(insn 43 40 44 3 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
        (subreg:DI (reg:V16QI 328) 0)) ./include/emmintrin.h:722 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg:V16QI 328)
        (nil)))
out of this, and not surprisingly the RA reloads it by storing the V16QI 328
into stack and loads back a DImode value, while with the old intrinsic before
RA we have:
(insn 45 43 46 3 (set (mem:DI (reg/v/f:SI 317 [ dest ]) [3 MEM[(long long int *)dest_268]+0 S8 A64])
        (vec_select:DI (subreg:V2DI (reg:V16QI 328) 0)
            (parallel [
                    (const_int 0 [0])
                ]))) ./include/emmintrin.h:722 3660 {*vec_extractv2di_0_sse}
     (expr_list:REG_DEAD (reg:V16QI 328)
        (nil)))
and don't need to spill that.

Now the question is if we can tell RA somehow (secondary reload) that to get a
DImode lowpart subreg (and SImode too?) out of a vector register it can use the
*vec_extractv2di_0_sse instruction for that.  Or add !TARGET_64BIT pattern for
storing a DImode lowpart subreg of a vector register (any mode there?) into
memory?  Or ensure that the BIT_FIELD_REF is expanded as the builtin used to
be.
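
For readers who want to look at the dumps themselves, the store discussed above can be exercised with a small C file. This is only a sketch and not the testcase from the bug report: the function names `store_low64`, `copy_low_halves` and `low_halves_match` are hypothetical, and the suggested command line (`gcc -m32 -msse2 -O2 -fdump-tree-optimized -fdump-rtl-expand`) is an assumption based on the 32-bit pointer mode (`reg/v/f:SI`) and the `!TARGET_64BIT` remark in the comment. Whether a given GCC version shows the stack spill depends on how its `emmintrin.h` implements `_mm_storel_epi64`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_storel_epi64 */
#include <string.h>     /* memcpy, used only by the reference check */

/* Hypothetical helper: store the low 64 bits of a 128-bit vector to memory.
   This is the operation whose expansion differs in the dumps above. */
void store_low64(long long *dest, __m128i v)
{
    _mm_storel_epi64((__m128i *)dest, v);
}

/* Hypothetical loop shape (an assumption, not the code from the PR):
   load 16 bytes at a time and keep only the low half of each vector. */
void copy_low_halves(long long *dest, const unsigned char *src, int n)
{
    for (int i = 0; i < n; i++) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + 16 * i));
        store_low64(dest + i, v);
    }
}

/* Reference check: each stored value must equal the first 8 bytes of the
   corresponding 16-byte block. Returns 1 on match, 0 otherwise. */
int low_halves_match(const long long *dest, const unsigned char *src, int n)
{
    for (int i = 0; i < n; i++) {
        long long expect;
        memcpy(&expect, src + 16 * i, sizeof expect);
        if (dest[i] != expect)
            return 0;
    }
    return 1;
}
```

Comparing the generated assembly for `store_low64` across compiler versions is one way to see whether the vector register is stored directly or goes through a stack slot first.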