On 10/31/22 05:56, Tamar Christina wrote:
Hi All,
This patch series adds recognition of pairwise operations (reductions)
in match.pd so that we can benefit from them even at -O1 when the
vectorizer isn't enabled.
The use of these allows for much simpler codegen on AArch64 and lets us
avoid quite a few codegen warts.
As an example, a simple:
typedef float v4sf __attribute__((vector_size (16)));
float
foo3 (v4sf x)
{
return x[1] + x[2];
}
currently generates:
foo3:
        dup     s1, v0.s[1]
        dup     s0, v0.s[2]
        fadd    s0, s1, s0
        ret
while with this patch series now generates:
foo3:
        ext     v0.16b, v0.16b, v0.16b, #4
        faddp   s0, v0.2s
        ret
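Not part of the patch, but for illustration: a hypothetical companion case
(name foo_low is mine) where the two lanes are already adjacent at the
bottom of the register, so no ext shuffle would be needed before the
pairwise add. Assumes little-endian lane numbering.

```c
typedef float v4sf __attribute__((vector_size (16)));

/* Pairwise add of the two low lanes.  Since lanes 0 and 1 are already
   adjacent at the bottom of the vector register, a pattern like the one
   in this series could map this straight to "faddp s0, v0.2s".  */
float
foo_low (v4sf x)
{
  return x[0] + x[1];
}
```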
This patch does not perform the transformation if the source is not a
gimple register; it leaves memory sources to the vectorizer, which is
able to deal correctly with clobbers.
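As an illustrative sketch (not taken from the patch; the name foo3_mem is
mine), a memory-sourced variant of the earlier example that, per the
description above, the match.pd pattern would skip and leave to the
vectorizer:

```c
typedef float v4sf __attribute__((vector_size (16)));

/* The two operands come from a memory load rather than a gimple
   register, so the new match.pd pattern leaves this form alone.  */
float
foo3_mem (v4sf *p)
{
  return (*p)[1] + (*p)[2];
}
```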
The use of these instructions makes a significant difference in codegen
quality for AArch64 and Arm.
NOTE: The last entry in the series contains tests for all of the previous
patches, as it's a bit of an all-or-nothing thing.
Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:

	* match.pd (adjacent_data_access_p): Import.
	Add new pattern for bitwise plus, min, max, fmax, fmin.
	* tree-cfg.cc (verify_gimple_call): Allow function arguments in IFNs.
	* tree.cc (adjacent_data_access_p): New.
	* tree.h (adjacent_data_access_p): New.
Nice stuff. I'd pondered some similar stuff at Tachyum, but got dragged
away before it could be implemented.
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 007c9325b17076f474e6681c49966c59cf6b91c7..5315af38a1ead89ca5f75dc4b19de9841e29d311 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10457,6 +10457,90 @@ bitmask_inv_cst_vector_p (tree t)
return builder.build ();
}
+/* Returns base address if the two operands represent adjacent access of data
+   such that a pairwise operation can be used.  OP1 must be a lower subpart
+   than OP2.  If POS is not NULL and a value is returned, POS will indicate
+   the position of the lower address.  If COMMUTATIVE_P then the operation
+   is also tried with OP1 and OP2 flipped.  */
+
+tree adjacent_data_access_p (tree op1, tree op2, poly_uint64 *pos,
+ bool commutative_p)
Formatting nit. Return type on a different line.
OK with that fixed.
jeff