This patch improves SLP performance in combination with some patches I
have in development to add multiple vector sizes to amdgcn.
The problem is that amdgcn's preferred vector size has 64 lanes, and SLP
does not support lane masking. My patches will add smaller vector sizes
(32, 16, 8, 4, 2), which make the lane masking implicit, but SLP still
doesn't use them; it simply rejects the first (preferred) size it tries
and gives up.
This patch detects the rejection early and checks whether a smaller,
more suitable vector size is available. The result is many more
successful SLP testcases.
OK to commit? (I have an x86_64 bootstrap and test in progress.)
Andrew
vect: Try smaller vector size when SLP split fails
If the preferred vector size is larger than can be used, try again with
a smaller size. This allows SLP to work more effectively on targets with very
large vectors.
gcc/ChangeLog:
* tree-vect-slp.c (vect_analyze_slp_instance): Reduce vector size if
the default mode is too large.
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 72192b5a813..95518a263c7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2367,6 +2367,16 @@ vect_analyze_slp_instance (vec_info *vinfo,
for (i = 0; i < group_size; i++)
if (!matches[i]) break;
+ if (i > 1 && i < group_size && i < const_nunits && scalar_type)
+ {
+ tree vec = get_vectype_for_scalar_type (vinfo, scalar_type, i);
+ if (vec)
+ {
+ nunits = TYPE_VECTOR_SUBPARTS (vec);
+ const_nunits = nunits.to_constant ();
+ }
+ }
+
if (i >= const_nunits && i < group_size)
{
/* Split into two groups at the first vector boundary before i. */