On Wed, May 8, 2019 at 5:06 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, May 8, 2019 at 2:33 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Tue, May 7, 2019 at 8:49 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > > > > > > > > > This patch is about to enable support for bfloat16,
> > > > > > > > > > > which will be in the future Cooper Lake.  Please refer to
> > > > > > > > > > > https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
> > > > > > > > > > > for more details about BF16.
> > > > > > > > > > >
> > > > > > > > > > > There are 3 instructions for AVX512BF16: VCVTNE2PS2BF16,
> > > > > > > > > > > VCVTNEPS2BF16 and DPBF16PS, which are Vector Neural
> > > > > > > > > > > Network Instructions supporting:
> > > > > > > > > > >
> > > > > > > > > > >   - VCVTNE2PS2BF16: Convert Two Packed Single Data to
> > > > > > > > > > >     One Packed BF16 Data.
> > > > > > > > > > >   - VCVTNEPS2BF16: Convert Packed Single Data to
> > > > > > > > > > >     Packed BF16 Data.
> > > > > > > > > > >   - VDPBF16PS: Dot Product of BF16 Pairs Accumulated
> > > > > > > > > > >     into Packed Single Precision.
> > > > > > > > > > >
> > > > > > > > > > > Since only BF16 intrinsics are supported, we treat it
> > > > > > > > > > > as HI for simplicity.
> > > > > > > > > >
> > > > > > > > > > I think it was a mistake declaring cvtps2ph and cvtph2ps
> > > > > > > > > > using HImode instead of HFmode.  Is there a compelling
> > > > > > > > > > reason not to introduce corresponding bf16_format
> > > > > > > > > > supporting infrastructure and declare these intrinsics
> > > > > > > > > > using a half-binary (HBmode?) mode instead?
> > > > > > > > > >
> > > > > > > > > > Uros.
> > > > > > > > >
> > > > > > > > > Bfloat16 isn't an IEEE standard format, which is what we
> > > > > > > > > want to reserve HFmode for.
> > > > > > > >
> > > > > > > > True.
> > > > > > > >
> > > > > > > > > The IEEE 754 standard specifies binary16 as having the
> > > > > > > > > following format:
> > > > > > > > >   Sign bit: 1 bit
> > > > > > > > >   Exponent width: 5 bits
> > > > > > > > >   Significand precision: 11 bits (10 explicitly stored)
> > > > > > > > >
> > > > > > > > > Bfloat16 has the following format:
> > > > > > > > >   Sign bit: 1 bit
> > > > > > > > >   Exponent width: 8 bits
> > > > > > > > >   Significand precision: 8 bits (7 explicitly stored), as
> > > > > > > > >   opposed to 24 bits in a classical single-precision
> > > > > > > > >   floating-point format.
> > > > > > > >
> > > > > > > > This is why I proposed to introduce HBmode (and a corresponding
> > > > > > > > bfloat16_format) to distinguish between IEEE HFmode and BFmode.
> > > > > > >
> > > > > > > Unless there is BF16 language-level support, HBmode has no
> > > > > > > advantage over HImode.  We can add HBmode when we gain BF16
> > > > > > > language support.
> > > > > > >
> > > > > > > --
> > > > > > > H.J.
> > > > > >
> > > > > > Any other comments?  I'll merge this to trunk.
> > > > >
> > > > > It is not a regression, so please no.
> > > >
> > > > Ehm, "regression fix" ...
> > > >
> > > > Uros.
> > >
> > > Update patch.
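
For readers following along, the layout difference described above is easy
to model in plain C.  The following is only an illustrative scalar sketch of
the conversion and of the dot-product step the new instructions provide; it
is not part of the patch, it ignores the exact NaN/denormal behaviour of the
hardware, and all function names are made up for the example:

#include <stdint.h>
#include <string.h>

/* bfloat16 is the upper 16 bits of an IEEE binary32 value:
   1 sign bit, 8 exponent bits, 7 explicitly stored significand bits.  */

static inline float
bf16_to_float (uint16_t x)
{
  uint32_t bits = (uint32_t) x << 16;
  float f;
  memcpy (&f, &bits, sizeof (f));
  return f;
}

/* Convert float to bfloat16 with round-to-nearest-even (the "NE" in
   VCVTNEPS2BF16).  NaN inputs would need special-casing; omitted here.  */
static inline uint16_t
float_to_bf16 (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof (bits));
  uint32_t lsb = (bits >> 16) & 1;
  bits += 0x7fff + lsb;   /* Round to nearest, ties to even.  */
  return (uint16_t) (bits >> 16);
}

/* One lane of the VDPBF16PS idea: a dot product of two BF16 pairs
   accumulated into a single-precision element.  */
static inline float
dpbf16_step (float acc, uint16_t a_hi, uint16_t a_lo,
             uint16_t b_hi, uint16_t b_lo)
{
  return acc + bf16_to_float (a_hi) * bf16_to_float (b_hi)
             + bf16_to_float (a_lo) * bf16_to_float (b_lo);
}
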
> >
> > Index: gcc/config/i386/i386-builtins.c
> > ===================================================================
> > --- gcc/config/i386/i386-builtins.c (revision 270934)
> > +++ gcc/config/i386/i386-builtins.c (working copy)
> > @@ -1920,6 +1920,7 @@
> >    F_VPCLMULQDQ,
> >    F_AVX512VNNI,
> >    F_AVX512BITALG,
> > +  F_AVX512BF16,
> >    F_MAX
> >  };
> >
> > @@ -2064,7 +2065,8 @@
> >    {"gfni", F_GFNI, P_ZERO},
> >    {"vpclmulqdq", F_VPCLMULQDQ, P_ZERO},
> >    {"avx512vnni", F_AVX512VNNI, P_ZERO},
> > -  {"avx512bitalg", F_AVX512BITALG, P_ZERO}
> > +  {"avx512bitalg", F_AVX512BITALG, P_ZERO},
> > +  {"avx512bf16", F_AVX512BF16, P_ZERO}
> >  };
> >
> >  /* This parses the attribute arguments to target in DECL and determines
> >
> > You also need to update cpuinfo.h and cpuinfo.c in libgcc/config/i386
> > with avx512bf16, plus relevant test files.
> >
> > Index: gcc/testsuite/gcc.target/i386/avx-1.c
> > Index: gcc/testsuite/gcc.target/i386/avx-2.c
> >
> > No need to update above two files, sse-*.c changes are enough to cover
> > new functionality.
> >
> > Otherwise LGTM, but please repost updated patch with the ChangeLog
> > entry (please see [1]).
> >
> > [1] https://www.gnu.org/software/gcc/contribute.html#patches
> >
> > Uros.
>
> Update patch:
> 1. Add Changelog.
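
As a side note on what the F_AVX512BF16 / "avx512bf16" entries above,
together with the requested libgcc cpuinfo update, buy us: they make the
feature name visible to __builtin_cpu_supports and the target attribute
machinery.  A rough usage sketch, with made-up function names, assuming the
usual -mavx512bf16 / target("avx512bf16") plumbing from the rest of the
patch, and making no claim about the final intrinsics:

__attribute__ ((target ("avx512bf16")))
static void
dot_product_bf16 (void)
{
  /* Compiled as if -mavx512bf16 were in effect; this is where the new
     AVX512BF16 intrinsics would be used.  */
}

static void
dot_product_fallback (void)
{
  /* Generic fallback path.  */
}

void
dot_product (void)
{
  /* Runtime dispatch keyed off the new feature name.  */
  if (__builtin_cpu_supports ("avx512bf16"))
    dot_product_bf16 ();
  else
    dot_product_fallback ();
}
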
In the future, please add ChangeLog entries as plain text to the message,
as instructed in [1].  Also, please read the links to the GCC coding
conventions and the GNU Coding Standards for further information.

[1] https://www.gnu.org/software/gcc/contribute.html#patches

> 2. Update libgcc part.

Please add one line of space after

+2019-05-07  Wei Xiao  <wei3.x...@intel.com>
+
+        *

in all ChangeLog entries (please follow the form of existing entries as
an example).

+        * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
+        * gcc.target/i386/sse-12.c: Add -mavx512bf16.
+        * gcc.target/i386/sse-13.c: Ditto.
+        * gcc.target/i386/sse-14.c: Ditto.
+        * gcc.target/i386/sse-22.c: Ditto.
+        * gcc.target/i386/sse-23.c: Ditto.
+        * g++.dg/other/i386-2.C: Ditto.
+        * g++.dg/other/i386-3.C: Add avx512bf16.
+
+2019-05-07  Hongtao Liu  <hongtao....@intel.com>
+        * gcc.target/i386/builtin_target.c: Ditto.

You should group entries a bit and write:

        * gcc.target/i386/avx512bf16vl-vdpbf16ps-1.c: New test.
        * gcc.target/i386/builtin_target.c: Handle avx512bf16.
        * gcc.target/i386/sse-12.c: Add -mavx512bf16.
        * gcc.target/i386/sse-13.c: Ditto.
        * gcc.target/i386/sse-14.c: Ditto.
        * gcc.target/i386/sse-22.c: Ditto.
        * gcc.target/i386/sse-23.c: Ditto.
        * g++.dg/other/i386-2.C: Ditto.
        * g++.dg/other/i386-3.C: Ditto.

@@ -265,6 +265,10 @@
     assert (__builtin_cpu_supports ("avx5124vnniw"));
   if (edx & bit_AVX5124FMAPS)
     assert (__builtin_cpu_supports ("avx5124fmaps"));
+
+      __cpuid_count (7, 1, eax, ebx, ecx, edx);
+      if (eax & bit_AVX512BF16)
+        assert (__builtin_cpu_supports ("avx512bf16"));
 }

Please fix the indentation here (too many leading spaces).

OK for mainline with the above fixes.

Thanks,
Uros.