https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072

--- Comment #13 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>:

https://gcc.gnu.org/g:330782a1b6cfe881ad884617ffab441aeb1c2b5c

commit r15-4398-g330782a1b6cfe881ad884617ffab441aeb1c2b5c
Author: liuhongt <hongtao....@intel.com>
Date:   Mon Oct 14 17:16:13 2024 +0800

    Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
    (vec_merge (fma op1 op2 op3) op1 mask).

    For x86 masked FMA, there are two RTL representations:
    1) (vec_merge (fma op2 op1 op3) op1 mask)
    2) (vec_merge (fma op1 op2 op3) op1 mask).

    (define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
      [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
            (vec_merge:VFH_AVX512VL
              (fma:VFH_AVX512VL
                (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
                (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
                (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
              (match_dup 1)
              (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
      "TARGET_AVX512F && <round_mode_condition>"
      "@
       vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
       vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
      [(set_attr "type" "ssemuladd")
       (set_attr "prefix" "evex")
       (set_attr "mode" "<MODE>")])

    Here op1 has constraint "0", and the second op1 is (match_dup 1).
    We once tried to replace it with (match_operand:M 5
    "nonimmediate_operand" "0") to allow more flexibility in pattern
    matching and recog, but that triggered an ICE in reload (reload can
    handle at most one operand with a "0" constraint).

    So we need to either add two patterns in the backend or do the
    canonicalization in the middle-end.

    gcc/ChangeLog:

            PR middle-end/117072
            * combine.cc (maybe_swap_commutative_operands):
            Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
            to (vec_merge (fma op1 op2 op3) op1 mask).
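
The canonicalization described above can be illustrated with a toy model. This is only a sketch, not the code added to combine.cc: the MaskedFma struct and the canonicalize function are hypothetical names, operands are modeled as plain strings rather than rtx nodes, and the equality test stands in for whatever operand comparison the real pass uses.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for (vec_merge (fma mul1 mul2 addend) passthrough mask).
// Real RTL uses rtx nodes; strings keep the sketch self-contained.
struct MaskedFma {
    std::string mul1;         // first multiplicand of the fma
    std::string mul2;         // second multiplicand of the fma
    std::string addend;       // third fma operand
    std::string passthrough;  // vec_merge operand selected when a mask bit is clear
    std::string mask;
};

// If the second multiplicand is the passthrough operand and the first is not,
// swap the two multiplicands so the shared operand comes first.
// Multiplication is commutative, so the computed value is unchanged.
// Returns true when a swap was performed.
bool canonicalize(MaskedFma &x) {
    if (x.mul2 == x.passthrough && x.mul1 != x.passthrough) {
        std::swap(x.mul1, x.mul2);
        assert(x.mul1 == x.passthrough);  // sanity check on the canonical form
        return true;
    }
    return false;
}
```

With this in place, a value built as MaskedFma{"op2", "op1", "op3", "op1", "mask"} is rewritten to have op1 as its first multiplicand, while an already-canonical value is left untouched, so a backend pattern only needs to match the single form with (match_dup 1).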

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuho...@gcc.gnu.org>:

https://gcc.gnu.org/g:edf4db8355dead3413bad64f6a89bae82dabd0ad

commit r15-4399-gedf4db8355dead3413bad64f6a89bae82dabd0ad
Author: liuhongt <hongtao....@intel.com>
Date:   Mon Oct 14 13:09:59 2024 +0800

    Canonicalize (vec_merge (fma: op2 op1 op3) (match_dup 1) mask) to
    (vec_merge (fma: op1 op2 op3) (match_dup 1) mask)

    For masked FMA, there are two forms of RTL representation:
    1) (vec_merge (fma: op2 op1 op3) op1 mask)
    2) (vec_merge (fma: op1 op2 op3) op1 mask)
    This is because op1 and op2 are commutative in RTL (the second op1
    is written as (match_dup 1)).

    We once tried to replace (match_dup 1) with
    (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0"), but that
    triggered an ICE in reload (reload can handle at most one operand
    with a "0" constraint).

    So the patch does the canonicalization for the backend part.

    gcc/ChangeLog:

            PR target/117072
            * config/i386/sse.md (<avx512>_fmadd_<mode>_mask<round_name>):
            Relax predicates of fma operands from register_operand to
            nonimmediate_operand.
            (<avx512>_fmadd_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmsub_<mode>_mask<round_name>): Ditto.
            (<avx512>_fmsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fnmadd_<mode>_mask<round_name>): Ditto.
            (<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fnmsub_<mode>_mask<round_name>): Ditto.
            (<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
            (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto.
            (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto.
            (*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto.
            (*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_mask<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto.
            (avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto.
            (*avx512f_vmfnmsub_<mode>_maskz_1<round_name>): Ditto.
            (avx10_2_fmaddnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fmsubnepbf16_<mode>_mask3): Ditto.
            (avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto.
            (fmai_vmfmadd_<mode><round_name>): Swap operands[1] and
            operands[2].
            (fmai_vmfmsub_<mode><round_name>): Ditto.
            (fmai_vmfnmadd_<mode><round_name>): Ditto.
            (fmai_vmfnmsub_<mode><round_name>): Ditto.
            (*fmai_fmadd_<mode>): Swap operands[1] and operands[2], and
            adjust operands[1] predicate from register_operand to
            nonimmediate_operand.
            (*fmai_fmsub_<mode>): Ditto.
            (*fmai_fnmadd_<mode><round_name>): Ditto.
            (*fmai_fnmsub_<mode><round_name>): Ditto.
