It's pretty clear that the operand numbers in the MEM_P() checks are off
by one, perhaps due to a copy-and-paste oversight (unlike in most other
places, here we're dealing with two outputs).
---
What I don't understand is why operand 2 is "nonimmediate_operand", not
"register_operand" (which afaict would eliminate the need for these
MEM_P() checks). This would then also extend to e.g. the subsequent
umul<mode><dwi>3_1 and mul<mode><dwi>3_1 (and apparently quite a few
more).
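
For illustration only, a toy C model of the two conditions. This is not
GCC code: the enum and helper names are made up, and it assumes operands
0 and 1 are the two (register) outputs while operands 2 and 3 are the
inputs. Under that assumption the old check never looks at operand 3, so
a pair of memory inputs slips through.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an RTL operand; not GCC's rtx. */
enum opkind { OP_REG, OP_MEM };

/* Toy counterpart of GCC's MEM_P (). */
static bool is_mem (enum opkind op)
{
  return op == OP_MEM;
}

/* Old condition: inspects operands 1 and 2.  Operand 1 is assumed to be
   an output, so this effectively never rejects anything.  */
static bool cond_old (const enum opkind ops[4])
{
  return !(is_mem (ops[1]) && is_mem (ops[2]));
}

/* Corrected condition: inspects the two inputs, operands 2 and 3.  */
static bool cond_new (const enum opkind ops[4])
{
  return !(is_mem (ops[2]) && is_mem (ops[3]));
}
```

With register outputs and both inputs in memory, cond_old () accepts the
operands while cond_new () rejects them. With only one input in memory
both accept.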

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8465,7 +8465,7 @@ (zero_extend:<DWI> (match_dup 3)))
 	    (match_operand:QI 4 "const_int_operand" "n"))))]
   "TARGET_BMI2
    && INTVAL (operands[4]) == <MODE_SIZE> * BITS_PER_UNIT
-   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+   && !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "mulx\t{%3, %0, %1|%1, %0, %3}"
   [(set_attr "type" "imulx")
    (set_attr "prefix" "vex")