Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

Kyrill Tkachov Fri, 06 Mar 2020 02:45:45 -0800

Hi Delia,

On 3/5/20 3:53 PM, Delia Burduv wrote:

Hi,
This is the latest version of the patch. I am forcing -mfloat-abi=hardbecause the register allocator behaves differently depending on thefloat-abi used.

Thanks, I've pushed it to master with an updated ChangeLog reflectingthe recent changes. In the future, please send an updated ChangeLogwhenever something changes in the patches.


Thanks again!

Kyrill


2020-03-06  Delia Burduv  <delia.bur...@arm.com>

    * config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
    (bfloat16x8x2_t): New typedef.
    (bfloat16x4x3_t): New typedef.
    (bfloat16x8x3_t): New typedef.
    (bfloat16x4x4_t): New typedef.
    (bfloat16x8x4_t): New typedef.
    (vst2_bf16): New.
    (vst2q_bf16): New.
    (vst3_bf16): New.
    (vst3q_bf16): New.
    (vst4_bf16): New.
    (vst4q_bf16): New.
    * config/arm/arm-builtins.c (v2bf_UP): Define.
    (VAR13): New.
    (arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
    * config/arm/arm-modes.def (V2BF): New mode.
    * config/arm/arm-simd-builtin-types.def
    (Bfloat16x2_t): New entry.
    * config/arm/arm_neon_builtins.def
    (vst2): Changed to VAR13 and added v4bf, v8bf
    (vst3): Changed to VAR13 and added v4bf, v8bf
    (vst4): Changed to VAR13 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF): New iterator.
    (VQ2BF): New iterator.
    *config/arm/neon.md (neon_vst2<mode>): Used new iterators.
    (neon_vst2<mode>): Used new iterators.
    (neon_vst3<mode>): Used new iterators.
    (neon_vst3<mode>): Used new iterators.
    (neon_vst3qa<mode>): Used new iterators.
    (neon_vst3qb<mode>): Used new iterators.
    (neon_vst4<mode>): Used new iterators.
    (neon_vst4<mode>): Used new iterators.
    (neon_vst4qa<mode>): Used new iterators.
    (neon_vst4qb<mode>): Used new iterators.


Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:

Hi Delia,

On 3/3/20 5:23 PM, Delia Burduv wrote:

Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and thisis the latest version.


Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:

Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:

Hi,

I made a mistake in the previous patch. This is the latestversion. Please let me know if it is ok.


Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:

Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is howthe aarch64 versions are implemented too.


Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:

Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:

Hi,

Here is the latest version of the patch. It just has some minor

formatting changes that were brought up by Richard Sandiford inthe

AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>

> I will change the tests to use the exact input and outputregisters as

> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst<n>{q}_bf16 as part of the BFloat16 extension.

>>(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)

>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patche.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>

>> Tested for regression on arm-none-eabi and armeb-none-eabi.I don't>> have commit rights, so if this is ok can someone pleasecommit it for me?

>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>
>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>          (bfloat16x4x2_t): New typedef.
>>          (bfloat16x8x2_t): New typedef.
>>          (bfloat16x4x3_t): New typedef.
>>          (bfloat16x8x3_t): New typedef.
>>          (bfloat16x4x4_t): New typedef.
>>          (bfloat16x8x4_t): New typedef.
>>          (vst2_bf16): New.
>>      (vst2q_bf16): New.
>>      (vst3_bf16): New.
>>      (vst3q_bf16): New.
>>      (vst4_bf16): New.
>>      (vst4q_bf16): New.
>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>          (VAR13): New.
>>          (arm_simd_types[Bfloat16x2_t]):New type.
>>          * config/arm/arm-modes.def (V2BF): New mode.
>>          * config/arm/arm-simd-builtin-types.def
>>          (Bfloat16x2_t): New entry.
>>          * config/arm/arm_neon_builtins.def
>>          (vst2): Changed to VAR13 and added v4bf, v8bf
>>          (vst3): Changed to VAR13 and added v4bf, v8bf
>>          (vst4): Changed to VAR13 and added v4bf, v8bf
>>          * config/arm/iterators.md (VDXBF): New iterator.
>>          (VQ2BF): New iterator.
>>          (V_elem): Added V4BF, V8BF.
>>          (V_sz_elem): Added V4BF, V8BF.
>>          (V_mode_nunits): Added V4BF, V8BF.
>>          (q): Added V4BF, V8BF.
>>          *config/arm/neon.md (vst2): Used new iterators.
>>          (vst3): Used new iterators.
>>          (vst3qa): Used new iterators.
>>          (vst3qb): Used new iterators.
>>          (vst4): Used new iterators.
>>          (vst4qa): Used new iterators.
>>          (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>
>>      * gcc.target/arm/simd/bf16_vstn_1.c: New test.

One thing I just noticed in this and the other arm bfloat16patches...


diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h

index3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a100644

--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h

@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t__r, float32x4_t __a, float32x4_t __b, return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b,__index);

  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;

These should be in a new arm_bf16.h file that gets included inthe main arm_neon.h file, right?

I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+

diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.cb/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c

new file mode 100644

index0000000000000000000000000000000000000000..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
@@ -0,0 +1,84 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+

I don't see the check-function-bodies checks being performed in mytesting. Changing the directives order to:

/* { dg-do assemble } */
/* { dg-options "-save-temps" }  */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */
/* { dg-final { check-function-bodies "**" "" } } */

makes them run but they fail, I think because this test also needs an-O2 option, same as the load intrinsics patch. Can you please adjustthe order of the dg-* directives in the test and the function bodyscan tests to match the codegen?

With this, it will be ready to go :)
Thanks,
Kyrill



  +#include "arm_neon.h"
+
+/*
+**test_vst2_bf16:
+**    ...
+**    vst2.16    {d16-d17}, \[r0\]
+**    ...
+*/
+void
+test_vst2_bf16 (bfloat16_t *ptr, bfloat16x4x2_t val)
+{
+  vst2_bf16 (ptr, val);
+}
+

Re: ACLE intrinsics: BFloat16 store (vst{q}_bf16) intrinsics for AArch32

Reply via email to