On Wed, May 8, 2019 at 5:06 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, May 8, 2019 at 2:33 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Tue, May 7, 2019 at 8:49 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > > > > > > > > >     This patch is about to enable support for bfloat16 
> > > > > > > > > > > which will be in Future Cooper Lake, Please refer to 
> > > > > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
> > > > > > > > > > > for more details about BF16.
> > > > > > > > > > >
> > > > > > > > > > > There are 3 instructions for AVX512BF16: VCVTNE2PS2BF16, 
> > > > > > > > > > > VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector 
> > > > > > > > > > > Neural Network Instructions supporting:
> > > > > > > > > > >
> > > > > > > > > > > -       VCVTNE2PS2BF16: Convert Two Packed Single Data to 
> > > > > > > > > > > One Packed BF16 Data.
> > > > > > > > > > > -       VCVTNEPS2BF16: Convert Packed Single Data to 
> > > > > > > > > > > Packed BF16 Data.
> > > > > > > > > > > -       VDPBF16PS: Dot Product of BF16 Pairs Accumulated 
> > > > > > > > > > > into Packed Single Precision.
> > > > > > > > > > >
> > > > > > > > > > > Since only BF16 intrinsics are supported, we treat it as 
> > > > > > > > > > > HI for simplicity.
> > > > > > > > > >
> > > > > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps 
> > > > > > > > > > using HImode
> > > > > > > > > > instead of HFmode. Is there a compelling reason not to 
> > > > > > > > > > introduce
> > > > > > > > > > corresponding bf16_format supporting infrastructure and 
> > > > > > > > > > declare these
> > > > > > > > > > intrinsics using half-binary (HBmode ?) mode instead?
> > > > > > > > > >
> > > > > > > > > > Uros.
> > > > > > > > >
> > > > > > > > > Bfloat16 isn't IEEE standard which we want to reserve HFmode 
> > > > > > > > > for.
> > > > > > > >
> > > > > > > > True.
> > > > > > > >
> > > > > > > > > The IEEE 754 standard specifies a binary16 as having the 
> > > > > > > > > following format:
> > > > > > > > > Sign bit: 1 bit
> > > > > > > > > Exponent width: 5 bits
> > > > > > > > > Significand precision: 11 bits (10 explicitly stored)
> > > > > > > > >
> > > > > > > > > Bfloat16 has the following format:
> > > > > > > > > Sign bit: 1 bit
> > > > > > > > > Exponent width: 8 bits
> > > > > > > > > Significand precision: 8 bits (7 explicitly stored), as 
> > > > > > > > > opposed to 24
> > > > > > > > > bits in a classical single-precision floating-point format
> > > > > > > >
> > > > > > > > This is why I proposed to introduce HBmode (and corresponding
> > > > > > > > bfloat16_format) to distingush between ieee HFmode and BFmode.
> > > > > > > >
> > > > > > >
> > > > > > > Unless there is BF16 language level support,  HBmode has no 
> > > > > > > advantage
> > > > > > > over HImode.   We can add HBmode when we gain BF16 language 
> > > > > > > support.
> > > > > > >
> > > > > > > --
> > > > > > > H.J.
> > > > > >
> > > > > > Any other comments, I'll merge this to trunk?
> > > > >
> > > > > It is not a regression, so please no.
> > > >
> > > > Ehm, "regression fix" ...
> > > >
> > > > Uros.
> > >
> > > Update patch.
> >
> > Index: gcc/config/i386/i386-builtins.c
> > ===================================================================
> > --- gcc/config/i386/i386-builtins.c    (revision 270934)
> > +++ gcc/config/i386/i386-builtins.c    (working copy)
> > @@ -1920,6 +1920,7 @@
> >    F_VPCLMULQDQ,
> >    F_AVX512VNNI,
> >    F_AVX512BITALG,
> > +  F_AVX512BF16,
> >    F_MAX
> >  };
> >
> > @@ -2064,7 +2065,8 @@
> >    {"gfni",    F_GFNI,    P_ZERO},
> >    {"vpclmulqdq", F_VPCLMULQDQ, P_ZERO},
> >    {"avx512vnni", F_AVX512VNNI, P_ZERO},
> > -  {"avx512bitalg", F_AVX512BITALG, P_ZERO}
> > +  {"avx512bitalg", F_AVX512BITALG, P_ZERO},
> > +  {"avx512bf16", F_AVX512BF16, P_ZERO}
> >  };
> >
> >  /* This parses the attribute arguments to target in DECL and determines
> >
> > You also need to update cpuinfo.h and cpuinfo.c in libgcc/config/i386
> > with avx512bf16, plus relevant test files.
> >
> > Index: gcc/testsuite/gcc.target/i386/avx-1.c
> > Index: gcc/testsuite/gcc.target/i386/avx-2.c
> >
> > No need to update above two files, sse-*.c changes are enough to cover
> > new functionality.
> >
> > Otherwise LGTM, but please repost updated patch with the ChangeLog
> > entry (please see [1]).
> >
> > [1] https://www.gnu.org/software/gcc/contribute.html#patches
> >
> > Uros.
>
> Update patch:
> 1. Add Changelog.

In fututre, please add ChangeLog entries as plaintext to the message,
as instructed in [1]. Also, please read the links to GCC coding
conventions and GNU Coding Standards for further information.

[1] https://www.gnu.org/software/gcc/contribute.html#patches

> 2. Update libgcc part.

Please add one line of space after

+2019-05-07  Wei Xiao  <wei3.x...@intel.com>
+
+    *

in all ChangeLog entries (please follow the form of existing entries
as an example).

+    * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
+    * gcc.target/i386/sse-12.c: Add -mavx512bf16.
+    * gcc.target/i386/sse-13.c: Ditto.
+    * gcc.target/i386/sse-14.c: Ditto.
+    * gcc.target/i386/sse-22.c: Ditto.
+    * gcc.target/i386/sse-23.c: Ditto.
+    * g++.dg/other/i386-2.C: Ditto.
+    * g++.dg/other/i386-3.C: Add avx512bf16.
+
+2019-05-07 Hongtao Liu <hongtao....@intel.com>
+    * gcc.target/i386/builtin_target.c: Ditto.

You should group entries a bit and write:

    * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
    * gcc.target/i386/builtin_target.c: Handle avx512bf16.
    * gcc.target/i386/sse-12.c: Add -mavx512bf16.
    * gcc.target/i386/sse-13.c: Ditto.
    * gcc.target/i386/sse-14.c: Ditto.
    * gcc.target/i386/sse-22.c: Ditto.
    * gcc.target/i386/sse-23.c: Ditto.
    * g++.dg/other/i386-2.C: Ditto.
    * g++.dg/other/i386-3.C: Ditto.

@@ -265,6 +265,10 @@
     assert (__builtin_cpu_supports ("avx5124vnniw"));
       if (edx & bit_AVX5124FMAPS)
     assert (__builtin_cpu_supports ("avx5124fmaps"));
+
+      __cpuid_count (7, 1, eax, ebx, ecx, edx);
+      if (eax & bit_AVX512BF16)
+          assert (__builtin_cpu_supports ("avx512bf16"));
     }

Please fix indentation here (too many leading spaces).

OK for mainline with the above fixes.

Thanks,
Uros.

Reply via email to