https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214
--- Comment #19 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
OK. The .optimized dumps seem to be the same for both -mavx2 and
-march=skylake-avx512. Things only diverge during expand.

It looks like it might be a bug in:

  (define_insn "<mask_codefor>avx512dq_shuf_<shuffletype>64x2_1<mask_name>"
    [(set (match_operand:VI8F_256 0 "register_operand" "=v")
          (vec_select:VI8F_256
            (vec_concat:<ssedoublemode>
              (match_operand:VI8F_256 1 "register_operand" "v")
              (match_operand:VI8F_256 2 "nonimmediate_operand" "vm"))
            (parallel [(match_operand 3 "const_0_to_3_operand")
                       (match_operand 4 "const_0_to_3_operand")
                       (match_operand 5 "const_4_to_7_operand")
                       (match_operand 6 "const_4_to_7_operand")])))]
    "TARGET_AVX512VL
     && (INTVAL (operands[3]) == (INTVAL (operands[4]) - 1)
         && INTVAL (operands[5]) == (INTVAL (operands[6]) - 1))"
  {
    int mask;
    mask = INTVAL (operands[3]) / 2;
    mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
    operands[3] = GEN_INT (mask);
    return "vshuf<shuffletype>64x2\t{%3, %2, %1, %0<mask_operand7>|%0<mask_operand7>, %1, %2, %3}";
  }
    [(set_attr "type" "sselog")
     (set_attr "length_immediate" "1")
     (set_attr "prefix" "evex")
     (set_attr "mode" "XI")])

which AFAICT assumes, without checking, that operands 3 and 5 are even
(0 or 2 and 4 or 6 respectively). In this case we're using it to match:

  (insn 40 39 41 6 (set (reg:V4DI 101 [ vect__5.17 ])
          (vec_select:V4DI (vec_concat:V8DI (reg:V4DI 98 [ vect__5.14 ])
                  (reg:V4DI 140 [ vect__5.15 ]))
              (parallel [
                      (const_int 2 [0x2])
                      (const_int 3 [0x3])
                      (const_int 5 [0x5])
                      (const_int 6 [0x6])
                  ]))) "/tmp/foo.c":8:22 4069 {*avx512dq_shuf_i64x2_1}
       (nil))

and treat the permute mask as {2, 3, 4, 5} instead. Operand 3 gives
2 / 2 = 1 and operand 5 gives (5 - 4) / 2 = 0, so the immediate is 1,
which selects the upper 128-bit half of operand 1 (elements 2 and 3 of
the vec_concat) and the lower half of operand 2 (elements 4 and 5).