https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120927
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Testcase that segfaults at runtime with -O3 -mavx512bw -mavx512vl --param
vect-partial-vector-usage=1
#include <vector>
std::vector<double> quadrature_points;
double weights[5];
static double __attribute__((aligned(64))) wts[]{2., 2., 2., 2., 5.};
void __attribute__((noipa)) foo(unsigned n)
{
  for (unsigned i = 0; i < n; ++i)
    quadrature_points[i] = weights[i] = wts[i];
}
int main()
{
  quadrature_points.push_back (0.);
  quadrature_points.push_back (0.);
  quadrature_points.push_back (0.);
  quadrature_points.push_back (0.);
  quadrature_points.push_back (0.);
  foo (5);
}
or alternatively the C testcase
static const double a[] = { 1., 2., 3., 4., 5. };
void __attribute__((noipa))
foo (double *b, double *bp, double c, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = bp[i] = a[i] * c;
}
int main()
{
  double b[5], bp[5];
  foo (b, bp, 3., 5);
}
The reason is we run into
static bool
vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
{
  ...
  else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
           /* ??? When peeling for gaps but not alignment, we could
              try to check whether the (variable) niters is known to be
              VF * N + 1. That's something of a niche case though. */
           || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
           || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
           || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
                < (unsigned) exact_log2 (const_vf))
               /* In case of versioning, check if the maximum number of
                  iterations is greater than th. If they are identical,
                  the epilogue is unnecessary. */
               && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
                   || ((unsigned HOST_WIDE_INT) max_niter
                       /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
                          but that's only computed later based on our result.
                          The following is the most conservative approximation. */
                       > (std::max ((unsigned HOST_WIDE_INT) th,
                                    const_vf) / const_vf) * const_vf))))
    return true;
  return false;
which decides that peeling or partial vectors are _not_ necessary: we
are versioning for aliasing, and max_niter (== 5) > 8 does not hold.
But we use LOOP_VINFO_COST_MODEL_THRESHOLD which isn't even computed yet.
Also the code uses > rather than ==, so it's wrong, at least for
the partial vector case.
OTOH we should never even consider an epilogue with a greater VF than
its main loop.