https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119368
Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #3 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> So it seems sse.md expects combiner to be able to simplify the vec_select of
> mem into shorter mem, while combiner doesn't do that?

There is related code in simplify-rtx.cc:

      /* If we select a low-part subreg, return that.  */
      if (vec_series_lowpart_p (mode, GET_MODE (trueop0), trueop1))
        {
          rtx new_rtx = lowpart_subreg (mode, trueop0,
                                        GET_MODE (trueop0));
          if (new_rtx != NULL_RTX)
            return new_rtx;
        }

but it relies on targetm.can_change_mode_class (op_mode, result_mode,
ALL_REGS), which returns false for x86:

/* Return true if, for all OP of mode OP_MODE:

     (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)
{
  int nunits;
  if (GET_MODE_NUNITS (op_mode).is_constant (&nunits)
      && targetm.can_change_mode_class (op_mode, result_mode, ALL_REGS))
    {
      int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (sel, 0) : 0;
      return rtvec_series_p (XVEC (sel, 0), offset);
    }
  return false;
}

I once tried to enable it for x86 (always using subreg instead of vec_select),
but it regressed lots of testcases, some of which need backend pattern
changes and some of which need middle-end adjustment.

But for this case, I think the targetm.can_change_mode_class (op_mode,
result_mode, ALL_REGS) check is not needed, since the operand is memory.
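
To illustrate the last point, here is a minimal standalone sketch of the idea. It is a toy model, not GCC code and not a proposed patch. The names Operand, target_can_change_mode_class, is_series and selects_lowpart are hypothetical stand-ins invented for this example. The target hook is hard-coded to return false to mimic the x86 behaviour described above, and the register-class check is skipped only when the operand is a memory reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RTL operand: records only whether it is
// a memory reference and how many vector elements its mode has.
struct Operand {
    bool is_mem;
    int nunits;
};

// Hypothetical stand-in for targetm.can_change_mode_class; hard-coded
// to false to mimic the x86 behaviour described in the comment.
bool target_can_change_mode_class() { return false; }

// Modeled on rtvec_series_p: true if sel is start, start + 1, ...
bool is_series(const std::vector<int>& sel, int start) {
    for (std::size_t i = 0; i < sel.size(); ++i)
        if (sel[i] != start + static_cast<int>(i))
            return false;
    return true;
}

// Sketch of a lowpart check that consults the target hook only for
// non-memory operands (assumption: a MEM can always be narrowed).
bool selects_lowpart(const Operand& op, const std::vector<int>& sel,
                     bool big_endian = false) {
    assert(op.nunits > 0);
    if (!op.is_mem && !target_can_change_mode_class())
        return false;
    if (sel.empty() || sel.size() > static_cast<std::size_t>(op.nunits))
        return false;
    // The low part starts at element 0 on little-endian targets and at
    // nunits - len on big-endian ones, as in vec_series_lowpart_p.
    int offset = big_endian ? op.nunits - static_cast<int>(sel.size()) : 0;
    return is_series(sel, offset);
}
```

In this model, selecting elements {0, 1} of a four-element memory operand counts as a lowpart access, while the same selection on a register operand is rejected because the mocked target hook says no.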