Hi Delia,

On 3/5/20 3:53 PM, Delia Burduv wrote:
Hi,

This is the latest version of the patch. I am forcing -mfloat-abi=hard because the register allocator behaves differently depending on the float-abi used.

Thanks, I've pushed it to master with an updated ChangeLog reflecting the recent changes. In the future, please send an updated ChangeLog whenever something changes in the patches.

Thanks again!

Kyrill


2020-03-06  Delia Burduv  <delia.bur...@arm.com>

    * config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
    (bfloat16x8x2_t): New typedef.
    (bfloat16x4x3_t): New typedef.
    (bfloat16x8x3_t): New typedef.
    (bfloat16x4x4_t): New typedef.
    (bfloat16x8x4_t): New typedef.
    (vst2_bf16): New.
    (vst2q_bf16): New.
    (vst3_bf16): New.
    (vst3q_bf16): New.
    (vst4_bf16): New.
    (vst4q_bf16): New.
    * config/arm/arm-builtins.c (v2bf_UP): Define.
    (VAR13): New.
    (arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
    * config/arm/arm-modes.def (V2BF): New mode.
    * config/arm/arm-simd-builtin-types.def
    (Bfloat16x2_t): New entry.
    * config/arm/arm_neon_builtins.def
    (vst2): Changed to VAR13 and added v4bf, v8bf
    (vst3): Changed to VAR13 and added v4bf, v8bf
    (vst4): Changed to VAR13 and added v4bf, v8bf
    * config/arm/iterators.md (VDXBF): New iterator.
    (VQ2BF): New iterator.
    *config/arm/neon.md (neon_vst2<mode>): Used new iterators.
    (neon_vst2<mode>): Used new iterators.
    (neon_vst3<mode>): Used new iterators.
    (neon_vst3<mode>): Used new iterators.
    (neon_vst3qa<mode>): Used new iterators.
    (neon_vst3qb<mode>): Used new iterators.
    (neon_vst4<mode>): Used new iterators.
    (neon_vst4<mode>): Used new iterators.
    (neon_vst4qa<mode>): Used new iterators.
    (neon_vst4qb<mode>): Used new iterators.




Thanks,
Delia

On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
Hi Delia,

On 3/3/20 5:23 PM, Delia Burduv wrote:
Hi,

I noticed that the patch doesn't apply cleanly. I fixed it and this is the latest version.

Thanks,
Delia

On 3/3/20 4:23 PM, Delia Burduv wrote:
Sorry, I forgot the attachment.

On 3/3/20 4:20 PM, Delia Burduv wrote:
Hi,

I made a mistake in the previous patch. This is the latest version. Please let me know if it is ok.

Thanks,
Delia

On 2/21/20 3:18 PM, Delia Burduv wrote:
Hi Kyrill,

The arm_bf16.h is only used for scalar operations. That is how the aarch64 versions are implemented too.

Thanks,
Delia

On 2/21/20 2:06 PM, Kyrill Tkachov wrote:
Hi Delia,

On 2/19/20 5:25 PM, Delia Burduv wrote:
Hi,

Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in the
AArch64 patches

Thanks,
Delia

On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst<n>{q}_bf16 as part of the BFloat16 extension.
>> (https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patche.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi. I don't >> have commit rights, so if this is ok can someone please commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>
>>      * config/arm/arm_neon.h (bfloat16_t): New typedef.
>>          (bfloat16x4x2_t): New typedef.
>>          (bfloat16x8x2_t): New typedef.
>>          (bfloat16x4x3_t): New typedef.
>>          (bfloat16x8x3_t): New typedef.
>>          (bfloat16x4x4_t): New typedef.
>>          (bfloat16x8x4_t): New typedef.
>>          (vst2_bf16): New.
>>      (vst2q_bf16): New.
>>      (vst3_bf16): New.
>>      (vst3q_bf16): New.
>>      (vst4_bf16): New.
>>      (vst4q_bf16): New.
>>          * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>>          (VAR13): New.
>>          (arm_simd_types[Bfloat16x2_t]):New type.
>>          * config/arm/arm-modes.def (V2BF): New mode.
>>          * config/arm/arm-simd-builtin-types.def
>>          (Bfloat16x2_t): New entry.
>>          * config/arm/arm_neon_builtins.def
>>          (vst2): Changed to VAR13 and added v4bf, v8bf
>>          (vst3): Changed to VAR13 and added v4bf, v8bf
>>          (vst4): Changed to VAR13 and added v4bf, v8bf
>>          * config/arm/iterators.md (VDXBF): New iterator.
>>          (VQ2BF): New iterator.
>>          (V_elem): Added V4BF, V8BF.
>>          (V_sz_elem): Added V4BF, V8BF.
>>          (V_mode_nunits): Added V4BF, V8BF.
>>          (q): Added V4BF, V8BF.
>>          *config/arm/neon.md (vst2): Used new iterators.
>>          (vst3): Used new iterators.
>>          (vst3qa): Used new iterators.
>>          (vst3qb): Used new iterators.
>>          (vst4): Used new iterators.
>>          (vst4qa): Used new iterators.
>>          (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14  Delia Burduv <delia.bur...@arm.com>
>>
>>      * gcc.target/arm/simd/bf16_vstn_1.c: New test.

One thing I just noticed in this and the other arm bfloat16 patches...

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t __r, float32x4_t __a, float32x4_t __b,     return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b, __index);
  }

+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+  bfloat16x4_t val[2];
+} bfloat16x4x2_t;


These should be in a new arm_bf16.h file that gets included in the main arm_neon.h file, right?
I believe the aarch64 versions are implemented that way.

Otherwise the patch looks good to me.
Thanks!
Kyrill


  +
+typedef struct bfloat16x8x2_t
+{
+  bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+


diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
@@ -0,0 +1,84 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+


I don't see the check-function-bodies checks being performed in my testing. Changing the directives order to:
/* { dg-do assemble } */
/* { dg-options "-save-temps" }  */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */
/* { dg-final { check-function-bodies "**" "" } } */

makes them run but they fail, I think because this test also needs an -O2 option, same as the load intrinsics patch. Can you please adjust the order of the dg-* directives in the test and the function body scan tests to match the codegen?
With this, it will be ready to go :)
Thanks,
Kyrill



  +#include "arm_neon.h"
+
+/*
+**test_vst2_bf16:
+**    ...
+**    vst2.16    {d16-d17}, \[r0\]
+**    ...
+*/
+void
+test_vst2_bf16 (bfloat16_t *ptr, bfloat16x4x2_t val)
+{
+  vst2_bf16 (ptr, val);
+}
+

Reply via email to