On Mon, Feb 11, 2019 at 02:15:18PM +0100, Uros Bizjak wrote:
> > Here is the updated patch to remove ext_sse_reg_operand check with a
> > testcase.
> >
> > OK for trunk?
>
> No.  As said, please correctly set mode to XImode in mode attribute
> calculation.
The instructions in question are

  vmovdqu32  mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
  vmovdqu32  %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
  vmovdqa32  mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
  vmovdqa32  %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
  vmovdqa32  %{x,y}mm{[0-9],[12][0-9],3[01]}, %{x,y}mm{1[6-9],2[0-9],3[01]}
  vmovdqa32  %{x,y}mm{1[6-9],2[0-9],3[01]}, %{x,y}mm{[0-9],[12][0-9],3[01]}

Why should those instructions be XImode?  They have 16- or 32-byte operands,
never 64-byte.  Checking EXT_REX_SSE_REG_P is what *movdi_internal does too:

    case MODE_TI:
      /* Handle AVX512 registers set.  */
      if (EXT_REX_SSE_REG_P (operands[0])
          || EXT_REX_SSE_REG_P (operands[1]))
        return "vmovdqa64\t{%1, %0|%0, %1}";
      return "%vmovdqa\t{%1, %0|%0, %1}";

Sure, in that case it is not MODE_DI but MODE_TI, because the move is
actually 128-bit, not 64-bit, but we do not claim it is 512-bit.

*movsi_internal is incorrect (and inefficient):

    case MODE_TI:
      return "%vmovdqa\t{%1, %0|%0, %1}";
    case MODE_XI:
      return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
...
             (eq_attr "alternative" "8,9")
               (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                           (match_operand 1 "ext_sse_reg_operand"))
                        (const_string "XI")
                      (ior (not (match_test "TARGET_SSE2"))
                           (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
                        (const_string "V4SF")
                      (match_test "TARGET_AVX")
                        (const_string "TI")
                      (match_test "optimize_function_for_size_p (cfun)")
                        (const_string "V4SF")
                     ]
                     (const_string "TI"))

In my reading, for

  (set (reg:SI xmm16) (reg:SI xmm17))

the above will emit vmovdqa32 %zmm17, %zmm16 even for -mavx512vl, which
looks wrong.  So, I'd suggest

                      (and (not (match_test "TARGET_AVX512VL"))
                           (ior (match_operand 0 "ext_sse_reg_operand")
                                (match_operand 1 "ext_sse_reg_operand")))
                        (const_string "XI")

I see other weirdo stuff e.g. in *movdf_internal:

             /* movaps is one byte shorter for non-AVX targets.  */
             (eq_attr "alternative" "13,17")
               (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
                                (not (match_test "TARGET_AVX512VL")))
                           (ior (match_operand 0 "ext_sse_reg_operand")
                                (match_operand 1 "ext_sse_reg_operand")))
                        (const_string "V8DF")

If !TARGET_AVX512VL and one or both of the operands are ext_sse_reg_operand,
obviously MODE_V8DF needs to be used.  But doesn't the above force use of
vmovapd %zmm16, %zmm17 even with just -mavx512vl -mprefer-vector-width=512?
I don't see any reason not to use vmovapd %xmm16, %xmm17 in that case.
-mprefer-vector-width=512 does not mean ZMM must be used all the time, just
that it is fine to use EVEX instructions and 512-bit vector width.
Ditto *movsf_internal.

	Jakub
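P.S. In case it is not clear where I mean the guard to go, here is an
untested sketch of the alternative "8,9" mode computation in *movsi_internal
with the extra TARGET_AVX512VL test added; the rest of the cond is unchanged
from what is quoted above:

             (eq_attr "alternative" "8,9")
               (cond [(and (not (match_test "TARGET_AVX512VL"))
                           (ior (match_operand 0 "ext_sse_reg_operand")
                                (match_operand 1 "ext_sse_reg_operand")))
                        (const_string "XI")
                      (ior (not (match_test "TARGET_SSE2"))
                           (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
                        (const_string "V4SF")
                      (match_test "TARGET_AVX")
                        (const_string "TI")
                      (match_test "optimize_function_for_size_p (cfun)")
                        (const_string "V4SF")
                     ]
                     (const_string "TI"))

For *movdf_internal (and its *movsf_internal counterpart) the analogous
change would presumably be to key the V8DF choice on !TARGET_AVX512VL alone
rather than also on the TARGET_PREFER_AVX256 test, but I have not tried that.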