On 07/25/2018 07:08 PM, Sudakshina Das wrote:
Hi Sam
On 25/07/18 14:08, Sam Tebbs wrote:
On 07/23/2018 05:01 PM, Sudakshina Das wrote:
Hi Sam
On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:
Hi all,
This patch extends the aarch64_get_lane_zero_extendsi instruction
definition to
also cover DI mode. This prevents a redundant AND instruction from
being
generated due to the pattern failing to be matched.
Example:
typedef char v16qi __attribute__ ((vector_size (16)));
unsigned long long
foo (v16qi a)
{
return a[0];
}
Previously generated:
foo:
umov w0, v0.b[0]
and x0, x0, 255
ret
And now generates:
foo:
umov w0, v0.b[0]
ret
Bootstrapped on aarch64-none-linux-gnu and tested on
aarch64-none-elf with no
regressions.
gcc/
2018-07-23 Sam Tebbs <sam.te...@arm.com>
* config/aarch64/aarch64-simd.md
(*aarch64_get_lane_zero_extendsi<mode>):
Rename to...
(*aarch64_get_lane_zero_extend<mode><VDQQH:mode>): ... This.
Use GPI iterator instead of SI mode.
gcc/testsuite
2018-07-23 Sam Tebbs <sam.te...@arm.com>
* gcc.target/aarch64/extract_zero_extend.c: New file
You will need an approval from a maintainer, but I would only add
one request to this:
diff --git a/gcc/config/aarch64/aarch64-simd.md
b/gcc/config/aarch64/aarch64-simd.md
index 89e38e6..15fb661 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3032,15 +3032,16 @@
[(set_attr "type" "neon_to_gp<q>")]
)
-(define_insn "*aarch64_get_lane_zero_extendsi<mode>"
- [(set (match_operand:SI 0 "register_operand" "=r")
- (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend<mode><VDQQH:mode>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (zero_extend:GPI
Since you are adding 4 new patterns with this change, could you add
more cases in your test as well to make sure you have coverage for
each of them.
Thanks
Sudi
Hi Sudi,
Thanks for the feedback. Here is an updated patch that adds more
testcases to cover the patterns generated by the different mode
combinations. The changelog and description from my original email
still apply.
Thanks for making the changes and adding more test cases. I do however
see that you are only covering 2 out of 4 new
*aarch64_get_lane_zero_extenddi<> patterns. The
*aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
those tests. I would just ask you to add the other two new patterns
as well. Also since the different versions of the instruction generate
same instructions (like foo_16qi and foo_8qi both give out the same
instruction), I would suggest using a -fdump-rtl-final (or any relevant
rtl dump) with the dg-options and using a scan-rtl-dump to scan the
pattern name. Something like:
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */
...
...
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi"
"final" } } */
Thanks
Sudi
Hi Sudi,
Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
generated are now tested for!
Below is the updated changelog
gcc/
2018-07-26 Sam Tebbs <sam.te...@arm.com>
* config/aarch64/aarch64-simd.md
(*aarch64_get_lane_zero_extendsi<mode>):
Rename to...
(*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>): ... This.
Use GPI iterator instead of SI mode.
gcc/testsuite
2018-07-26 Sam Tebbs <sam.te...@arm.com>
* gcc.target/aarch64/extract_zero_extend.c: New file
(vec_select:<VEL>
(match_operand:VDQQH 1 "register_operand" "w")
(parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
"TARGET_SIMD"
{
- operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL
(operands[2]));
+ operands[2] = aarch64_endian_lane_rtx (<VDQQH:MODE>mode,
+ INTVAL (operands[2]));
return "umov\\t%w0, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon_to_gp<q>")]
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f1784d72e55c412d076de43f2f7aad4632d55ecb..e92a3b49c65e84d2a16a2a480c359a0b4d8fa3e3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3033,15 +3033,16 @@
[(set_attr "type" "neon_to_gp<q>")]
)
-(define_insn "*aarch64_get_lane_zero_extendsi<mode>"
- [(set (match_operand:SI 0 "register_operand" "=r")
- (zero_extend:SI
+(define_insn "*aarch64_get_lane_zero_extend<GPI:mode><VDQQH:mode>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (zero_extend:GPI
(vec_select:<VEL>
(match_operand:VDQQH 1 "register_operand" "w")
(parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
"TARGET_SIMD"
{
- operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
+ operands[2] = aarch64_endian_lane_rtx (<VDQQH:MODE>mode,
+ INTVAL (operands[2]));
return "umov\\t%w0, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon_to_gp<q>")]
diff --git a/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
new file mode 100644
index 0000000000000000000000000000000000000000..a294b261909a1d67ab339c929f2609dcda01c067
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extract_zero_extend.c
@@ -0,0 +1,81 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-rtl-final" } */
+
+/* Tests div16qi. */
+typedef unsigned char div16qi __attribute__ ((vector_size (16)));
+/* Tests div8qi. */
+typedef unsigned char div8qi __attribute__ ((vector_size (8)));
+/* Tests div8hi. */
+typedef unsigned short div8hi __attribute__ ((vector_size (16)));
+/* Tests div4hi. */
+typedef unsigned short div4hi __attribute__ ((vector_size (8)));
+
+/* Tests siv16qi. */
+typedef unsigned char siv16qi __attribute__ ((vector_size (16)));
+/* Tests siv8qi. */
+typedef unsigned char siv8qi __attribute__ ((vector_size (8)));
+/* Tests siv8hi. */
+typedef unsigned short siv8hi __attribute__ ((vector_size (16)));
+/* Tests siv4hi. */
+typedef unsigned short siv4hi __attribute__ ((vector_size (8)));
+
+
+unsigned long long
+foo_div16qi (div16qi a)
+{
+ return a[0];
+}
+
+unsigned long long
+foo_div8qi (div8qi a)
+{
+ return a[0];
+}
+
+unsigned long long
+foo_div8hi (div8hi a)
+{
+ return a[0];
+}
+
+unsigned long long
+foo_div4hi (div4hi a)
+{
+ return a[0];
+}
+
+unsigned int
+foo_siv16qi (siv16qi a)
+{
+ return a[0];
+}
+
+unsigned int
+foo_siv8qi (siv8qi a)
+{
+ return a[0];
+}
+
+unsigned int
+foo_siv8hi (siv8hi a)
+{
+ return a[0];
+}
+
+unsigned int
+foo_siv4hi (siv4hi a)
+{
+ return a[0];
+}
+
+/* { dg-final { scan-assembler-times "umov\\t" 8 } } */
+/* { dg-final { scan-assembler-not "and\\t" } } */
+
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */
+/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */