Hi Delia,
On 3/5/20 3:53 PM, Delia Burduv wrote:
Hi,
This is the latest version of the patch. I am forcing -mfloat-abi=hard
because the register allocator behaves differently depending on the
float-abi used.
Thanks, I've pushed it to master with an updated ChangeLog reflecting
the recent changes. In the future, please send an updated ChangeLog
whenever something changes in the patches.
Thanks again!
Kyrill
2020-03-06 Delia Burduv <delia.bur...@arm.com>
* config/arm/arm_neon.h (bfloat16x4x2_t): New typedef.
(bfloat16x8x2_t): New typedef.
(bfloat16x4x3_t): New typedef.
(bfloat16x8x3_t): New typedef.
(bfloat16x4x4_t): New typedef.
(bfloat16x8x4_t): New typedef.
(vst2_bf16): New.
(vst2q_bf16): New.
(vst3_bf16): New.
(vst3q_bf16): New.
(vst4_bf16): New.
(vst4q_bf16): New.
* config/arm/arm-builtins.c (v2bf_UP): Define.
(VAR13): New.
(arm_init_simd_builtin_types): Init Bfloat16x2_t eltype.
* config/arm/arm-modes.def (V2BF): New mode.
* config/arm/arm-simd-builtin-types.def
(Bfloat16x2_t): New entry.
* config/arm/arm_neon_builtins.def
(vst2): Changed to VAR13 and added v4bf, v8bf
(vst3): Changed to VAR13 and added v4bf, v8bf
(vst4): Changed to VAR13 and added v4bf, v8bf
* config/arm/iterators.md (VDXBF): New iterator.
(VQ2BF): New iterator.
*config/arm/neon.md (neon_vst2<mode>): Used new iterators.
(neon_vst2<mode>): Used new iterators.
(neon_vst3<mode>): Used new iterators.
(neon_vst3<mode>): Used new iterators.
(neon_vst3qa<mode>): Used new iterators.
(neon_vst3qb<mode>): Used new iterators.
(neon_vst4<mode>): Used new iterators.
(neon_vst4<mode>): Used new iterators.
(neon_vst4qa<mode>): Used new iterators.
(neon_vst4qb<mode>): Used new iterators.
Thanks,
Delia
On 3/4/20 5:20 PM, Kyrill Tkachov wrote:
Hi Delia,
On 3/3/20 5:23 PM, Delia Burduv wrote:
Hi,
I noticed that the patch doesn't apply cleanly. I fixed it and this
is the latest version.
Thanks,
Delia
On 3/3/20 4:23 PM, Delia Burduv wrote:
Sorry, I forgot the attachment.
On 3/3/20 4:20 PM, Delia Burduv wrote:
Hi,
I made a mistake in the previous patch. This is the latest
version. Please let me know if it is ok.
Thanks,
Delia
On 2/21/20 3:18 PM, Delia Burduv wrote:
Hi Kyrill,
The arm_bf16.h is only used for scalar operations. That is how
the aarch64 versions are implemented too.
Thanks,
Delia
On 2/21/20 2:06 PM, Kyrill Tkachov wrote:
Hi Delia,
On 2/19/20 5:25 PM, Delia Burduv wrote:
Hi,
Here is the latest version of the patch. It just has some minor
formatting changes that were brought up by Richard Sandiford in
the
AArch64 patches
Thanks,
Delia
On 1/22/20 5:29 PM, Delia Burduv wrote:
> Ping.
>
> I will change the tests to use the exact input and output
registers as
> Richard Sandiford suggested for the AArch64 patches.
>
> On 12/20/19 6:46 PM, Delia Burduv wrote:
>> This patch adds the ARMv8.6 ACLE BFloat16 store intrinsics
>> vst<n>{q}_bf16 as part of the BFloat16 extension.
>>
(https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics)
>>
>> The intrinsics are declared in arm_neon.h .
>> A new test is added to check assembler output.
>>
>> This patch depends on the Arm back-end patche.
>> (https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html)
>>
>> Tested for regression on arm-none-eabi and armeb-none-eabi.
I don't
>> have commit rights, so if this is ok can someone please
commit it for me?
>>
>> gcc/ChangeLog:
>>
>> 2019-11-14 Delia Burduv <delia.bur...@arm.com>
>>
>> * config/arm/arm_neon.h (bfloat16_t): New typedef.
>> (bfloat16x4x2_t): New typedef.
>> (bfloat16x8x2_t): New typedef.
>> (bfloat16x4x3_t): New typedef.
>> (bfloat16x8x3_t): New typedef.
>> (bfloat16x4x4_t): New typedef.
>> (bfloat16x8x4_t): New typedef.
>> (vst2_bf16): New.
>> (vst2q_bf16): New.
>> (vst3_bf16): New.
>> (vst3q_bf16): New.
>> (vst4_bf16): New.
>> (vst4q_bf16): New.
>> * config/arm/arm-builtins.c (E_V2BFmode): New mode.
>> (VAR13): New.
>> (arm_simd_types[Bfloat16x2_t]):New type.
>> * config/arm/arm-modes.def (V2BF): New mode.
>> * config/arm/arm-simd-builtin-types.def
>> (Bfloat16x2_t): New entry.
>> * config/arm/arm_neon_builtins.def
>> (vst2): Changed to VAR13 and added v4bf, v8bf
>> (vst3): Changed to VAR13 and added v4bf, v8bf
>> (vst4): Changed to VAR13 and added v4bf, v8bf
>> * config/arm/iterators.md (VDXBF): New iterator.
>> (VQ2BF): New iterator.
>> (V_elem): Added V4BF, V8BF.
>> (V_sz_elem): Added V4BF, V8BF.
>> (V_mode_nunits): Added V4BF, V8BF.
>> (q): Added V4BF, V8BF.
>> *config/arm/neon.md (vst2): Used new iterators.
>> (vst3): Used new iterators.
>> (vst3qa): Used new iterators.
>> (vst3qb): Used new iterators.
>> (vst4): Used new iterators.
>> (vst4qa): Used new iterators.
>> (vst4qb): Used new iterators.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-11-14 Delia Burduv <delia.bur...@arm.com>
>>
>> * gcc.target/arm/simd/bf16_vstn_1.c: New test.
One thing I just noticed in this and the other arm bfloat16
patches...
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index
3c78f435009ab027f92693d00ab5b40960d5419d..fd81c18948db3a7f6e8e863d32511f75bf950e6a
100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -18742,6 +18742,89 @@ vcmlaq_rot270_laneq_f32 (float32x4_t
__r, float32x4_t __a, float32x4_t __b,
return __builtin_neon_vcmla_lane270v4sf (__r, __a, __b,
__index);
}
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+typedef struct bfloat16x4x2_t
+{
+ bfloat16x4_t val[2];
+} bfloat16x4x2_t;
These should be in a new arm_bf16.h file that gets included in
the main arm_neon.h file, right?
I believe the aarch64 versions are implemented that way.
Otherwise the patch looks good to me.
Thanks!
Kyrill
+
+typedef struct bfloat16x8x2_t
+{
+ bfloat16x8_t val[2];
+} bfloat16x8x2_t;
+
diff --git a/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
new file mode 100644
index
0000000000000000000000000000000000000000..b52ecfb959776fd04c7c33908cb7f8898ec3fe0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/bf16_vstn_1.c
@@ -0,0 +1,84 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon } */
+/* { dg-additional-options "-save-temps" } */
+/* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
+
I don't see the check-function-bodies checks being performed in my
testing. Changing the directives order to:
/* { dg-do assemble } */
/* { dg-options "-save-temps" } */
/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
/* { dg-add-options arm_v8_2a_bf16_neon } */
/* { dg-final { check-function-bodies "**" "" } } */
makes them run but they fail, I think because this test also needs an
-O2 option, same as the load intrinsics patch. Can you please adjust
the order of the dg-* directives in the test and the function body
scan tests to match the codegen?
With this, it will be ready to go :)
Thanks,
Kyrill
+#include "arm_neon.h"
+
+/*
+**test_vst2_bf16:
+** ...
+** vst2.16 {d16-d17}, \[r0\]
+** ...
+*/
+void
+test_vst2_bf16 (bfloat16_t *ptr, bfloat16x4x2_t val)
+{
+ vst2_bf16 (ptr, val);
+}
+