https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124315

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2026-03-02

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Zdenek Sojka from comment #0)
> Created attachment 63799 [details]
> reduced testcase
> 
> Output:
> $ x86_64-pc-linux-gnu-gcc -mavx512fp16 -c testcase.c
> $ objdump -S testcase.o > att.S
> $ x86_64-pc-linux-gnu-gcc -mavx512fp16 -c testcase.c -masm=intel
> $ objdump -S testcase.o > intel.S
> $ diff -u att.S intel.S
> --- att.S       2026-03-02 07:54:30.144550317 +0100
> +++ intel.S     2026-03-02 07:54:35.554550344 +0100
> @@ -15,7 +15,7 @@
>    1b:  00 
>    1c:  b8 00 00 00 00          mov    $0x0,%eax
>    21:  c5 f9 92 c8             kmovb  %eax,%k1
> -  25:  62 f6 75 59 bd c2       vfnmadd231sh {ru-sae},%xmm2,%xmm1,%xmm0{%k1}
> +  25:  62 f6 7d 59 bd c2       vfnmadd231sh {ru-sae},%xmm2,%xmm0,%xmm0{%k1}
>    2b:  5d                      pop    %rbp
>    2c:  c3                      ret

(define_insn "avx512f_vmfnmadd_<mode>_mask3<round_name>"
...
  "vfnmadd231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%},
%<iptr>3, %<iptr>2<round_op5>}"

The %<iptr>3 in the Intel dialect alternative is wrong; it should be %1.
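
To illustrate, here is a minimal Python sketch of how the {att|intel} dialect
alternatives of an output template are selected and filled in. It is not GCC's
implementation: expand() and the operand tables are made up for this example,
the <iptr> and <round_op5> attributes are left out, and operand 3 is assumed to
live in the same register as operand 0, which is what the disassembly above
suggests.

```python
def expand(template, operands, dialect):
    """Expand a GCC-style output template (simplified model).

    dialect 0 selects the AT&T alternative, dialect 1 the Intel one.
    Handles {a|b} alternatives, %N operands and the %{ %} %| %% escapes.
    """
    out = []
    alt = None  # index of the alternative being scanned, None outside {...}
    i = 0
    while i < len(template):
        c = template[i]
        emit = alt is None or alt == dialect
        if c == "%":
            nxt = template[i + 1]
            if nxt in "{}|%":
                text = nxt                  # escaped literal character
            elif nxt.isdigit():
                text = operands[int(nxt)]   # operand substitution
            else:
                raise ValueError("unsupported escape: %" + nxt)
            if emit:
                out.append(text)
            i += 2
            continue
        if c == "{" and alt is None:
            alt = 0
        elif c == "|" and alt is not None:
            alt += 1
        elif c == "}" and alt is not None:
            alt = None
        elif emit:
            out.append(c)
        i += 1
    return "".join(out)


# Hypothetical register assignment; operand 3 is assumed tied to operand 0.
ATT_OPS = {0: "%xmm0", 1: "%xmm1", 2: "%xmm2", 3: "%xmm0", 4: "%k1"}
INTEL_OPS = {0: "xmm0", 1: "xmm1", 2: "xmm2", 3: "xmm0", 4: "k1"}

BUGGY = "vfnmadd231sh {%2, %1, %0%{%4%}|%0%{%4%}, %3, %2}"
FIXED = "vfnmadd231sh {%2, %1, %0%{%4%}|%0%{%4%}, %1, %2}"

print(expand(BUGGY, ATT_OPS, 0))    # vfnmadd231sh %xmm2, %xmm1, %xmm0{%k1}
print(expand(BUGGY, INTEL_OPS, 1))  # vfnmadd231sh xmm0{k1}, xmm0, xmm2
print(expand(FIXED, INTEL_OPS, 1))  # vfnmadd231sh xmm0{k1}, xmm1, xmm2
```

With the %3 template the Intel alternative names xmm0 twice, while the AT&T
alternative of the same template names xmm1, matching the diff.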

> @@ -30,6 +30,6 @@
>    48:  00 
>    49:  b8 00 00 00 00          mov    $0x0,%eax
>    4e:  c5 f9 92 c8             kmovb  %eax,%k1
> -  52:  62 f2 ed 59 bb c1       vfmsub231sd {ru-sae},%xmm1,%xmm2,%xmm0{%k1}
> +  52:  62 f2 fd 59 bb c1       vfmsub231sd {ru-sae},%xmm1,%xmm0,%xmm0{%k1}
>    58:  5d                      pop    %rbp
>    59:  c3                      ret

(define_insn "avx512f_vmfmsub_<mode>_mask3<round_name>"
...
  "vfmsub231<ssescalarmodesuffix>\t{<round_op5>%2, %1, %0%{%4%}|%0%{%4%},
%<iptr>3, %<iptr>2<round_op5>}"

Same as above: %<iptr>3 in the Intel dialect alternative should be %1.

> The -masm=intel output has xmm0 twice as the operand.
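
The quoted observation can be cross-checked against the instruction bytes in
the two diff hunks. The sketch below decodes only the EVEX.vvvv field (the
first source register), which is stored inverted in bits 6:3 of the third
prefix byte, with V' in bit 3 of the fourth. The helper name evex_vvvv is made
up for this example, and the byte strings are copied from the objdump output
above.

```python
def evex_vvvv(insn_bytes):
    """Return the register number held in EVEX.vvvv, with V' as bit 4."""
    if insn_bytes[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    p1, p2 = insn_bytes[2], insn_bytes[3]
    vvvv = ~(p1 >> 3) & 0xF   # bits 6:3 of P1, stored inverted
    v_hi = ~(p2 >> 3) & 0x1   # V' is bit 3 of P2, stored inverted
    return (v_hi << 4) | vvvv


# Instruction bytes copied from the diff (AT&T build vs. -masm=intel build).
sh_att = bytes.fromhex("62f67559bdc2")
sh_intel = bytes.fromhex("62f67d59bdc2")
sd_att = bytes.fromhex("62f2ed59bbc1")
sd_intel = bytes.fromhex("62f2fd59bbc1")

for name, raw in [("vfnmadd231sh att", sh_att),
                  ("vfnmadd231sh intel", sh_intel),
                  ("vfmsub231sd att", sd_att),
                  ("vfmsub231sd intel", sd_intel)]:
    print(f"{name}: xmm{evex_vvvv(raw)}")
# vfnmadd231sh att: xmm1
# vfnmadd231sh intel: xmm0
# vfmsub231sd att: xmm2
# vfmsub231sd intel: xmm0
```

In both cases only this field differs between the two builds, and the
-masm=intel build encodes xmm0 where the AT&T build encodes the real source
register.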

BTW: Adding -dp to the compile flags will report the name of the problematic
insn pattern in the asm dump. -dP will write out the whole RTL pattern.
