Re: ACLE intrinsics: BFloat16 load intrinsics for AArch32

Kyrill Tkachov Wed, 04 Mar 2020 09:21:22 -0800

Hi Delia,

On 3/4/20 2:05 PM, Delia Burduv wrote:

Hi,

The previous version of this patch shared part of its code with the
store intrinsics patch
(https://gcc.gnu.org/ml/gcc-patches/2020-03/msg00145.html) so I removed
any duplicated code. This patch now depends on the previously mentioned
store intrinsics patch.

Here is the latest version and the updated ChangeLog.

gcc/ChangeLog:

2020-03-04  Delia Burduv  <delia.bur...@arm.com>

        * config/arm/arm_neon.h (bfloat16_t): New typedef.
        (vld2_bf16): New.
        (vld2q_bf16): New.
        (vld3_bf16): New.
        (vld3q_bf16): New.
        (vld4_bf16): New.
        (vld4q_bf16): New.
        (vld2_dup_bf16): New.
        (vld2q_dup_bf16): New.
        (vld3_dup_bf16): New.
        (vld3q_dup_bf16): New.
        (vld4_dup_bf16): New.
        (vld4q_dup_bf16): New.
        * config/arm/arm_neon_builtins.def
        (vld2): Changed to VAR13 and added v4bf, v8bf
        (vld2_dup): Changed to VAR8 and added v4bf, v8bf
        (vld3): Changed to VAR13 and added v4bf, v8bf
        (vld3_dup): Changed to VAR8 and added v4bf, v8bf
        (vld4): Changed to VAR13 and added v4bf, v8bf
        (vld4_dup): Changed to VAR8 and added v4bf, v8bf
        * config/arm/iterators.md (VDXBF): New iterator.
        (VQ2BF): New iterator.
        * config/arm/neon.md (vld2): Used new iterators.
        (vld2_dup<mode>): Used new iterators.
        (vld2_dupv8bf): New.
        (vst3): Used new iterators.
        (vst3qa): Used new iterators.
        (vst3qb): Used new iterators.
        (vld3_dup<mode>): Used new iterators.
        (vld3_dupv8bf): New.
        (vst4): Used new iterators.
        (vst4qa): Used new iterators.
        (vst4qb): Used new iterators.
        (vld4_dup<mode>): Used new iterators.
        (vld4_dupv8bf): New.

gcc/testsuite/ChangeLog:

2020-03-04  Delia Burduv  <delia.bur...@arm.com>

        * gcc.target/arm/simd/bf16_vldn_1.c: New test.

Thanks,
Delia

On 2/19/20 5:25 PM, Delia Burduv wrote:
>
> Hi,
>
> Here is the latest version of the patch. It just has some minor
> formatting changes that were brought up by Richard Sandiford in the
> AArch64 patches.
>
> Thanks,
> Delia
>
> On 1/22/20 5:31 PM, Delia Burduv wrote:
>> Ping.
>>
>> I will change the tests to use the exact input and output registers as
>> Richard Sandiford suggested for the AArch64 patches.
>>
>> On 12/20/19 6:48 PM, Delia Burduv wrote:
>>> This patch adds the ARMv8.6 ACLE BFloat16 load intrinsics
>>> vld<n>{q}_bf16 as part of the BFloat16 extension.
>>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>>
>>> The intrinsics are declared in arm_neon.h .
>>> A new test is added to check assembler output.
>>>
>>> This patch depends on the Arm back-end patch.
>>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>>
>>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't
>>> have commit rights, so if this is ok can someone please commit it for
>>> me?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>>
>>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>>      (bfloat16x4x2_t): New typedef.
>>>      (bfloat16x8x2_t): New typedef.
>>>      (bfloat16x4x3_t): New typedef.
>>>      (bfloat16x8x3_t): New typedef.
>>>      (bfloat16x4x4_t): New typedef.
>>>      (bfloat16x8x4_t): New typedef.
>>>      (vld2_bf16): New.
>>>      (vld2q_bf16): New.
>>>      (vld3_bf16): New.
>>>      (vld3q_bf16): New.
>>>      (vld4_bf16): New.
>>>      (vld4q_bf16): New.
>>>      (vld2_dup_bf16): New.
>>>      (vld2q_dup_bf16): New.
>>>      (vld3_dup_bf16): New.
>>>      (vld3q_dup_bf16): New.
>>>      (vld4_dup_bf16): New.
>>>      (vld4q_dup_bf16): New.
>>>      * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>>      (VAR13): New.
>>>      (arm_simd_types[Bfloat16x2_t]): New type.
>>>      * config/arm/arm-modes.def (V2BF): New mode.
>>>      * config/arm/arm-simd-builtin-types.def
>>>      (Bfloat16x2_t): New entry.
>>>      * config/arm/arm_neon_builtins.def
>>>      (vld2): Changed to VAR13 and added v4bf, v8bf
>>>      (vld2_dup): Changed to VAR8 and added v4bf, v8bf
>>>      (vld3): Changed to VAR13 and added v4bf, v8bf
>>>      (vld3_dup): Changed to VAR8 and added v4bf, v8bf
>>>      (vld4): Changed to VAR13 and added v4bf, v8bf
>>>      (vld4_dup): Changed to VAR8 and added v4bf, v8bf
>>>      * config/arm/iterators.md (VDXBF): New iterator.
>>>      (VQ2BF): New iterator.
>>>      (V_elem): Added V4BF, V8BF.
>>>      (V_sz_elem): Added V4BF, V8BF.
>>>      (V_mode_nunits): Added V4BF, V8BF.
>>>      (q): Added V4BF, V8BF.
>>>      * config/arm/neon.md (vld2): Used new iterators.
>>>      (vld2_dup<mode>): Used new iterators.
>>>      (vld2_dupv8bf): New.
>>>      (vst3): Used new iterators.
>>>      (vst3qa): Used new iterators.
>>>      (vst3qb): Used new iterators.
>>>      (vld3_dup<mode>): Used new iterators.
>>>      (vld3_dupv8bf): New.
>>>      (vst4): Used new iterators.
>>>      (vst4qa): Used new iterators.
>>>      (vst4qb): Used new iterators.
>>>      (vld4_dup<mode>): Used new iterators.
>>>      (vld4_dupv8bf): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>>
>>>      * gcc.target/arm/simd/bf16_vldn_1.c: New test.



diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..7ff8b600827e5c2e313ce40d14382aa641b4bb31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vldn_1.c
@@ -0,0 +1,152 @@
+/* { dg-do assemble } */
+/* { dg-options "-save-temps" }  */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-final { check-function-bodies "**" "" } } */


I think this should include an optimisation option like -O2 because...

+
+#include "arm_neon.h"
+
+
+/*
+**test_vld2_bf16:
+**     ...
+**     vld2.16 {d16-d17}, \[r3\]

... this codegen is unstable: it depends on the -O0 register allocator moving
the ptr argument from its initial r0 to r3.
The address register should really be r0, and the load instruction should
load the low D regs.
So let's add -O2 to the dg-options and scan for the code that produces.
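
For illustration, a rough sketch of how the adjusted header and first
function might look. The register list d0-d1 is an assumption: it presumes a
hard-float ABI in which the result comes back in the low D registers, and it
presumes the function returns the loaded value. The actual -O2 output should
be checked before committing to a pattern.

```c
/* { dg-do assemble } */
/* { dg-options "-save-temps -O2" }  */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */
/* { dg-final { check-function-bodies "**" "" } } */

#include "arm_neon.h"

/*
**test_vld2_bf16:
**     ...
**     vld2.16 {d0-d1}, \[r0\]
**     ...
*/
/* Assumed expectation: ptr stays in r0 and the result lands in d0-d1.  */
bfloat16x4x2_t
test_vld2_bf16 (bfloat16_t * ptr)
{
  return vld2_bf16 (ptr);
}
```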


Otherwise this is ok.
Thanks!
Kyrill


+**     ...
+*/
+bfloat16x4x2_t
+test_vld2_bf16 (bfloat16_t * ptr)
+{
+  return vld2_bf16 (ptr);
+}
+
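
As background on what the vld2.16 scanned for above does: it is a
de-interleaving load of 2-element structures. Below is a minimal portable
model in plain C, not the arm_neon.h implementation. The names
model_bf16x4x2_t and model_vld2_bf16 are hypothetical, and uint16_t stands in
for the raw bfloat16 storage.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for bfloat16x4x2_t: two 4-lane vectors holding
   bfloat16 values as raw 16-bit patterns.  */
typedef struct
{
  uint16_t val[2][4];
} model_bf16x4x2_t;

/* Scalar model of a 2-way de-interleaving load: memory element 2*i goes
   to lane i of val[0], memory element 2*i+1 to lane i of val[1].  */
static model_bf16x4x2_t
model_vld2_bf16 (const uint16_t *ptr)
{
  model_bf16x4x2_t r;
  for (size_t i = 0; i < 4; i++)
    {
      r.val[0][i] = ptr[2 * i];
      r.val[1][i] = ptr[2 * i + 1];
    }
  return r;
}
```

The model reads eight consecutive 16-bit elements, which matches the two
D registers (2 x 64 bits) the real instruction fills.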
