https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072
--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>: https://gcc.gnu.org/g:330782a1b6cfe881ad884617ffab441aeb1c2b5c commit r15-4398-g330782a1b6cfe881ad884617ffab441aeb1c2b5c Author: liuhongt <hongtao....@intel.com> Date: Mon Oct 14 17:16:13 2024 +0800 Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask). For x86 masked fma, there're 2 rtl representations 1) (vec_merge (fma op2 op1 op3) op1 mask) 2) (vec_merge (fma op1 op2 op3) op1 mask). 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>" 5895 [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v") 5896 (vec_merge:VFH_AVX512VL 5897 (fma:VFH_AVX512VL 5898 (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0") 5899 (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v") 5900 (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>")) 5901 (match_dup 1) 5902 (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))] 5903 "TARGET_AVX512F && <round_mode_condition>" 5904 "@ 5905 vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>} 5906 vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}" 5907 [(set_attr "type" "ssemuladd") 5908 (set_attr "prefix" "evex") 5909 (set_attr "mode" "<MODE>")]) Here op1 has constraint "0", and the scecond op1 is (match_dup 1), we once tried to replace it with (match_operand:M 5 "nonimmediate_operand" "0")) to enable more flexibility for pattern match and recog, but it triggered an ICE in reload(reload can handle at most one perand with "0" constraint). So we need either add 2 patterns in the backend or just do the canonicalization in the middle-end. gcc/ChangeLog: PR middle-end/117072 * combine.cc (maybe_swap_commutative_operands): Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask). --- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>: https://gcc.gnu.org/g:edf4db8355dead3413bad64f6a89bae82dabd0ad commit r15-4399-gedf4db8355dead3413bad64f6a89bae82dabd0ad Author: liuhongt <hongtao....@intel.com> Date: Mon Oct 14 13:09:59 2024 +0800 Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1)) mask) to (vec_merge (fma: op1 op2 op3) (match_dup 1)) mask) For masked FMA, there're 2 forms of RTL representation 1) (vec_merge (fma: op2 op1 op3) op1) mask) 2) (vec_merge (fma: op1 op2 op3) op1) mask) It's because op1 op2 are communatative in RTL(the second op1 is written as (match_dup 1)) we once tried to replace (match_dup 1) with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0")), but trigger an ICE in reload(reload can handle at most one operand with "0" constraint). So the patch do the canonicalizaton for the backend part. gcc/ChangeLog: PR target/117072 * config/i386/sse.md (<avx512>_fmadd_<mode>_mask<round_name>): Relax predicates of fma operands from register_operand to nonimmediate_operand. (<avx512>_fmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fmsub_<mode>_mask3<round_name>): Ditto. (<avx512>_fnmadd_<mode>_mask<round_name>): Ditto. (<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fnmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto. (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto. (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto. (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask<round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto. (*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto. (avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto. (*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_mask<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_maskz_1<round_name>): Ditto. (avx10_2_fmaddnepbf16_<mode>_mask3): Ditto. (avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto. (avx10_2_fmsubnepbf16_<mode>_mask3): Ditto. (avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto. (fmai_vmfmadd_<mode><round_name>): Swap operands[1] and operands[2]. (fmai_vmfmsub_<mode><round_name>): Ditto. (fmai_vmfnmadd_<mode><round_name>): Ditto. (fmai_vmfnmsub_<mode><round_name>): Ditto. (*fmai_fmadd_<mode>): Swap operands[1] and operands[2] adjust operands[1] predicates from register_operand to nonimmediate_operand. (*fmai_fmsub_<mode>): Ditto. (*fmai_fnmadd_<mode><round_name>): Ditto. (*fmai_fnmsub_<mode><round_name>): Ditto.