https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81225

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kyukhin at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This seems to be a total mess in the vec_extract_lo_* patterns.
The vec_extract_lo_*_maskm patterns correctly use register_operand as the input
operand.
Then for V8D[FI] we have:
(define_insn "vec_extract_lo_<mode><mask_name>"
  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
          "=<store_mask_constraint>,v")
        (vec_select:<ssehalfvecmode>
          (match_operand:V8FI 1 "nonimmediate_operand" "v,m")
          (parallel [(const_int 0) (const_int 1)
            (const_int 2) (const_int 3)])))]
  "TARGET_AVX512F
   && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
{
  if (<mask_applied> || !TARGET_AVX512VL)
    return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
  else
    return "#";
}
but the vextract[if]64x4 instruction does not support a memory operand as %1, so I
have no idea how this can work unless we split the insn.
The corresponding splitter is:
(define_split
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
        (vec_select:<ssehalfvecmode>
          (match_operand:V8FI 1 "nonimmediate_operand")
          (parallel [(const_int 0) (const_int 1)
            (const_int 2) (const_int 3)])))]
  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))
   && reload_completed
   && (TARGET_AVX512VL
       || (REG_P (operands[0]) && !EXT_REX_SSE_REG_P (operands[1])))"
  [(set (match_dup 0) (match_dup 1))]
  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")
but it won't trigger if <mask_applied>, or if operands[1] is in xmm16+ and we
don't have AVX512VL.  The xmm16+ case is handled by the vextract insn itself
though.
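For the record, when that splitter does fire for the unmasked memory-input case,
gen_lowpart of the 512-bit MEM is just the same address in the half-width mode,
so the extraction degenerates into a plain half-width move, e.g. for V8DF with a
register destination roughly (register and address placeholders only):

  (set (reg:V4DF <dst>)
       (mem:V4DF <addr>))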
So, I wonder if the "v,m" in the pattern shouldn't actually be
"v,<store_mask_constraint>", with nonimmediate_operand on the input replaced by
<store_mask_predicate>, in order to avoid masked extractions from memory
(or would it work to split the masked extraction of the low half from memory
just by doing a masked load from the memory in ssehalfvecmode for both
operands?).  In any case, we should always return "#" if MEM_P (operands[1])
in these vec_extract_lo_* insns, to make it clear that they don't support
memory as input.
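For concreteness, a rough and entirely untested sketch of the first option
(showing just the predicate/constraint change and the unconditional "#" for a
memory input; not an actual patch) could look like:

(define_insn "vec_extract_lo_<mode><mask_name>"
  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
          "=<store_mask_constraint>,v")
        (vec_select:<ssehalfvecmode>
          ;; <store_mask_predicate>/<store_mask_constraint> on the input, so
          ;; the memory alternative goes away once masking is applied.
          (match_operand:V8FI 1 "<store_mask_predicate>"
            "v,<store_mask_constraint>")
          (parallel [(const_int 0) (const_int 1)
            (const_int 2) (const_int 3)])))]
  "TARGET_AVX512F
   && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
{
  /* Never claim the hardware insn can read the 512-bit input from memory;
     the memory-input case is always split.  */
  if (MEM_P (operands[1]))
    return "#";
  if (<mask_applied> || !TARGET_AVX512VL)
    return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
  return "#";
})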
Similarly for V16SF/V16SI and V4DI/V4DF.  But V8SI/V8SF is different:
(define_split
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "nonimmediate_operand")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
  "TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1]))
   && reload_completed"
  [(set (match_dup 0) (match_dup 1))]
  "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);")

(define_insn "vec_extract_lo_<mode><mask_name>"
  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
          "=<store_mask_constraint>")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "register_operand" "v")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
  "TARGET_AVX && <mask_avx512vl_condition> && <mask_avx512dq_condition>"
{
  if (<mask_applied>)
    return "vextract<shuffletype>32x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
  else
    return "#";
}
While the splitter works with a nonimmediate (possibly memory) input and no
masking, the insn accepts only a register operand.  And then we have:
        case IX86_BUILTIN_GATHER3ALTSIV4DF:
        case IX86_BUILTIN_GATHER3ALTSIV4DI:
        case IX86_BUILTIN_GATHERALTSIV4DF:
        case IX86_BUILTIN_GATHERALTSIV4DI:
          half = gen_reg_rtx (V4SImode);
          if (!nonimmediate_operand (op2, V8SImode))
            op2 = copy_to_mode_reg (V8SImode, op2);
          emit_insn (gen_vec_extract_lo_v8si (half, op2));
          op2 = half;
...
        case IX86_BUILTIN_GATHER3ALTDIV8SF:
        case IX86_BUILTIN_GATHER3ALTDIV8SI:
        case IX86_BUILTIN_GATHERALTDIV8SF:
        case IX86_BUILTIN_GATHERALTDIV8SI:
          half = gen_reg_rtx (mode0);
          if (mode0 == V4SFmode)
            gen = gen_vec_extract_lo_v8sf;
          else
            gen = gen_vec_extract_lo_v8si;
          if (!nonimmediate_operand (op0, GET_MODE (op0)))
            op0 = copy_to_mode_reg (GET_MODE (op0), op0);
          emit_insn (gen (half, op0));
which doesn't match those vec_extract_lo_v8s[fi] patterns, because those
require register_operand.
So, a quick fix for this ICE is certainly:
--- gcc/config/i386/i386.c.jj   2017-06-28 09:50:34.000000000 +0200
+++ gcc/config/i386/i386.c      2017-06-28 12:20:25.091261482 +0200
@@ -38918,7 +38918,7 @@ rdseed_step:
        case IX86_BUILTIN_GATHERALTSIV4DF:
        case IX86_BUILTIN_GATHERALTSIV4DI:
          half = gen_reg_rtx (V4SImode);
-         if (!nonimmediate_operand (op2, V8SImode))
+         if (!register_operand (op2, V8SImode))
            op2 = copy_to_mode_reg (V8SImode, op2);
          emit_insn (gen_vec_extract_lo_v8si (half, op2));
          op2 = half;
@@ -38951,13 +38951,13 @@ rdseed_step:
            gen = gen_vec_extract_lo_v8sf;
          else
            gen = gen_vec_extract_lo_v8si;
-         if (!nonimmediate_operand (op0, GET_MODE (op0)))
+         if (!register_operand (op0, GET_MODE (op0)))
            op0 = copy_to_mode_reg (GET_MODE (op0), op0);
          emit_insn (gen (half, op0));
          op0 = half;
          if (GET_MODE (op3) != VOIDmode)
            {
-             if (!nonimmediate_operand (op3, GET_MODE (op3)))
+             if (!register_operand (op3, GET_MODE (op3)))
                op3 = copy_to_mode_reg (GET_MODE (op3), op3);
              emit_insn (gen (half, op3));
              op3 = half;

but it is not the right fix; IMHO we need to fix up all the vec_extract_lo_*
patterns instead (and make them behave consistently).
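In the same spirit, a rough and entirely untested sketch (just to illustrate
what "consistently" could mean for the VI4F_256 variant that triggers this ICE,
not an actual patch) would be to let it accept an unmasked memory input, which
is what the gather expansion feeds it, and defer everything except the masked
reg-reg case to the existing splitter:

(define_insn "vec_extract_lo_<mode><mask_name>"
  [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>"
          "=<store_mask_constraint>,v")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "<store_mask_predicate>"
            "v,<store_mask_constraint>")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
  "TARGET_AVX && <mask_avx512vl_condition> && <mask_avx512dq_condition>
   && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))"
{
  /* Only the masked register-to-register form maps to a real instruction;
     everything else is left to the post-reload define_split as a plain move.  */
  if (<mask_applied> && !MEM_P (operands[1]))
    return "vextract<shuffletype>32x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}";
  return "#";
})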
