On Mon, Feb 11, 2019 at 02:15:18PM +0100, Uros Bizjak wrote:
> > Here is the updated patch to remove ext_sse_reg_operand check with a
> > testcase.
> >
> > OK for trunk?
> 
> No. As said, please correctly set mode to XImode in mode attribute 
> calculation.

The instructions in question are
vmovdqu32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqu32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 %{x,y}mm{[0-9],[12][0-9],3[01]}, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, %{x,y}mm{[0-9],[12][0-9],3[01]}
Why should those instructions be XImode?  They have 16- or 32-byte operands,
never 64-byte ones; XImode would claim a 64-byte move.
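
Something along these lines (a hypothetical example, not the testcase from
the patch) triggers such a reg-reg SImode move:

        int
        foo (int x)
        {
          register int a __asm__ ("xmm16") = x;
          register int b __asm__ ("xmm17");
          __asm__ ("" : "+v" (a));   /* make sure a really lives in %xmm16 */
          b = a;                     /* (set (reg:SI xmm17) (reg:SI xmm16)) */
          __asm__ ("" : "+v" (b));   /* and b in %xmm17 */
          return b;
        }

As discussed below, even with -mavx512vl the b = a copy is emitted as
vmovdqa32 %zmm16, %zmm17, although the 16-byte vmovdqa32 %xmm16, %xmm17
would do.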

Checking EXT_REX_SSE_REG_P is what *movdi_internal does too:
        case MODE_TI:
          /* Handle AVX512 registers set.  */
          if (EXT_REX_SSE_REG_P (operands[0])
              || EXT_REX_SSE_REG_P (operands[1]))
            return "vmovdqa64\t{%1, %0|%0, %1}";
          return "%vmovdqa\t{%1, %0|%0, %1}";
Sure, in that case it is not MODE_DI but MODE_TI, because the move is
actually 128-bit rather than 64-bit; still, we do not claim it is 512-bit.
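
The analogous handling in *movsi_internal's output would then be (just a
sketch, assuming MODE_TI is only chosen for the ext registers when
TARGET_AVX512VL, as suggested below):

        case MODE_TI:
          /* Handle AVX512 registers set.  */
          if (EXT_REX_SSE_REG_P (operands[0])
              || EXT_REX_SSE_REG_P (operands[1]))
            return "vmovdqa32\t{%1, %0|%0, %1}";
          return "%vmovdqa\t{%1, %0|%0, %1}";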

*movsi_internal is incorrect (and inefficient):
        case MODE_TI:
          return "%vmovdqa\t{%1, %0|%0, %1}";
        case MODE_XI:
          return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
...
            (eq_attr "alternative" "8,9")
              (cond [(ior (match_operand 0 "ext_sse_reg_operand")
                          (match_operand 1 "ext_sse_reg_operand"))
                       (const_string "XI")
                     (ior (not (match_test "TARGET_SSE2"))
                          (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
                       (const_string "V4SF")
                     (match_test "TARGET_AVX")
                       (const_string "TI")
                     (match_test "optimize_function_for_size_p (cfun)")
                       (const_string "V4SF")
                    ]
                    (const_string "TI"))
In my reading, for (set (reg:SI xmm16) (reg:SI xmm17)) the above will
emit vmovdqa32 %zmm17, %zmm16 even for -mavx512vl, which looks wrong.
So, I'd suggest (and (not (match_test "TARGET_AVX512VL"))
                     (ior (match_operand 0 "ext_sse_reg_operand")
                          (match_operand 1 "ext_sse_reg_operand")))
                       (const_string "XI")
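
i.e. the whole calculation for alternatives 8,9 would become (just splicing
that condition into the cond quoted above):

            (eq_attr "alternative" "8,9")
              (cond [(and (not (match_test "TARGET_AVX512VL"))
                          (ior (match_operand 0 "ext_sse_reg_operand")
                               (match_operand 1 "ext_sse_reg_operand")))
                       (const_string "XI")
                     (ior (not (match_test "TARGET_SSE2"))
                          (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
                       (const_string "V4SF")
                     (match_test "TARGET_AVX")
                       (const_string "TI")
                     (match_test "optimize_function_for_size_p (cfun)")
                       (const_string "V4SF")
                    ]
                    (const_string "TI"))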

I see other weirdo stuff, e.g. in *movdf_internal:
               /* movaps is one byte shorter for non-AVX targets.  */
               (eq_attr "alternative" "13,17")
                 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
                                  (not (match_test "TARGET_AVX512VL")))
                             (ior (match_operand 0 "ext_sse_reg_operand")
                                  (match_operand 1 "ext_sse_reg_operand")))
                          (const_string "V8DF")
If !TARGET_AVX512VL and one or both of the operands are ext_sse_reg_operand,
obviously MODE_V8DF needs to be used.  But doesn't the above force the use
of vmovapd %zmm16, %zmm17 even with just -mavx512vl
-mprefer-vector-width=512?  I don't see any reason not to use
vmovapd %xmm16, %xmm17 in that case.  -mprefer-vector-width=512 does not
mean ZMM must be used all the time; it only means that it is fine to use
EVEX instructions even at 512-bit width.  Ditto for *movsf_internal.
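
For *movdf_internal that would mean something like this (again only a
sketch, mirroring the *movsi_internal suggestion above; the rest of the
cond stays as is):

               /* movaps is one byte shorter for non-AVX targets.  */
               (eq_attr "alternative" "13,17")
                 (cond [(and (not (match_test "TARGET_AVX512VL"))
                             (ior (match_operand 0 "ext_sse_reg_operand")
                                  (match_operand 1 "ext_sse_reg_operand")))
                          (const_string "V8DF")
                        ...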

        Jakub
