https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81225
Jakub Jelinek <jakub at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kyukhin at gcc dot gnu.org, | |uros at gcc dot gnu.org --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> --- This seems to be a total mess in vec_extract_lo_* patterns. vec_extract_lo_*_maskm patterns correctly use register_operand as input operand. Then for V8D[FI] we have: (define_insn "vec_extract_lo_<mode><mask_name>" [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>,v") (vec_select:<ssehalfvecmode> (match_operand:V8FI 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX512F && (<mask_applied> || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if (<mask_applied> || !TARGET_AVX512VL) return "vextract<shuffletype>64x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}"; else return "#"; } but vextract[if]64x4 instruction does not support memory operand as %1, so I have no idea how it can work unless we split the insn. The corresponding splitter is: (define_split [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand") (vec_select:<ssehalfvecmode> (match_operand:V8FI 1 "nonimmediate_operand") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1])) && reload_completed && (TARGET_AVX512VL || (REG_P (operands[0]) && !EXT_REX_SSE_REG_P (operands[1])))" [(set (match_dup 0) (match_dup 1))] "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);") but it won't trigger if <mask_applied> or if operands[1] is xmm32+ and not AVX512VL. For xmm32+ it is handled by the vextract insn though. So, I wonder if the "v,m" in the pattern shouldn't actually be "v,<store_mask_constraint>" and replace nonimmediate_operand on the input with <store_mask_predicate> in order to avoid masked extractions from memory (or would it work to split the masked extractions of the low half from memory just by doing a masked load from the memory in ssehalfvecmode for both operands)? In any case, we should always return "#" if (MEM_P (operands[1])) in these vec_extract_lo_* insns to make it clear that the insns don't support memory as input. Similarly for V16SF/V16SI and V4DI/V4DF. But V8SI/V8SF is different: (define_split [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand") (vec_select:<ssehalfvecmode> (match_operand:VI4F_256 1 "nonimmediate_operand") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX && !(MEM_P (operands[0]) && MEM_P (operands[1])) && reload_completed" [(set (match_dup 0) (match_dup 1))] "operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);") (define_insn "vec_extract_lo_<mode><mask_name>" [(set (match_operand:<ssehalfvecmode> 0 "<store_mask_predicate>" "=<store_mask_constraint>") (vec_select:<ssehalfvecmode> (match_operand:VI4F_256 1 "register_operand" "v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX && <mask_avx512vl_condition> && <mask_avx512dq_condition>" { if (<mask_applied>) return "vextract<shuffletype>32x4\t{$0x0, %1, %0<mask_operand2>|%0<mask_operand2>, %1, 0x0}"; else return "#"; } While the splitter works with nonimmediate memory (and no masking), the insn requires only register operand. And then we have: case IX86_BUILTIN_GATHER3ALTSIV4DF: case IX86_BUILTIN_GATHER3ALTSIV4DI: case IX86_BUILTIN_GATHERALTSIV4DF: case IX86_BUILTIN_GATHERALTSIV4DI: half = gen_reg_rtx (V4SImode); if (!nonimmediate_operand (op2, V8SImode)) op2 = copy_to_mode_reg (V8SImode, op2); emit_insn (gen_vec_extract_lo_v8si (half, op2)); op2 = half; ... case IX86_BUILTIN_GATHER3ALTDIV8SF: case IX86_BUILTIN_GATHER3ALTDIV8SI: case IX86_BUILTIN_GATHERALTDIV8SF: case IX86_BUILTIN_GATHERALTDIV8SI: half = gen_reg_rtx (mode0); if (mode0 == V4SFmode) gen = gen_vec_extract_lo_v8sf; else gen = gen_vec_extract_lo_v8si; if (!nonimmediate_operand (op0, GET_MODE (op0))) op0 = copy_to_mode_reg (GET_MODE (op0), op0); emit_insn (gen (half, op0)); which doesn't match those vec_extract_lo_v8s[fi] patterns, because those require register_operand. So, quick fix for this ICE is certainly to: --- gcc/config/i386/i386.c.jj 2017-06-28 09:50:34.000000000 +0200 +++ gcc/config/i386/i386.c 2017-06-28 12:20:25.091261482 +0200 @@ -38918,7 +38918,7 @@ rdseed_step: case IX86_BUILTIN_GATHERALTSIV4DF: case IX86_BUILTIN_GATHERALTSIV4DI: half = gen_reg_rtx (V4SImode); - if (!nonimmediate_operand (op2, V8SImode)) + if (!register_operand (op2, V8SImode)) op2 = copy_to_mode_reg (V8SImode, op2); emit_insn (gen_vec_extract_lo_v8si (half, op2)); op2 = half; @@ -38951,13 +38951,13 @@ rdseed_step: gen = gen_vec_extract_lo_v8sf; else gen = gen_vec_extract_lo_v8si; - if (!nonimmediate_operand (op0, GET_MODE (op0))) + if (!register_operand (op0, GET_MODE (op0))) op0 = copy_to_mode_reg (GET_MODE (op0), op0); emit_insn (gen (half, op0)); op0 = half; if (GET_MODE (op3) != VOIDmode) { - if (!nonimmediate_operand (op3, GET_MODE (op3))) + if (!register_operand (op3, GET_MODE (op3))) op3 = copy_to_mode_reg (GET_MODE (op3), op3); emit_insn (gen (half, op3)); op3 = half; but it is not the right fix, IMHO we need to fix up all the vec_extract_lo_* patterns instead (and make them behave consistently).