https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81366
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|rguenth at gcc dot gnu.org |unassigned at gcc dot gnu.org
CC| |rguenth at gcc dot gnu.org
Keywords| |openmp
Status|ASSIGNED |NEW
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So we vectorize this now, but we emit a runtime check that's always false
around the loop. This is from vect_do_peeling, the condition to skip_vector,
generated as
_8 + 4294967295 <= 2
with _8 defined as _8 = .GOMP_SIMD_VF (simduid.3_7(D))
and when updating .GOMP_SIMD_VF, which appears in BB2 in this case, there is
no htab, thus no VF is recorded and we use VF == 1, which chooses the scalar
code. This might be because the loop in question has loop->simduid == NULL.
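To make the arithmetic in that check concrete, here is a minimal C sketch. It
assumes _8 is a 32-bit unsigned value (the constant 4294967295 suggests this,
but the dump excerpt does not show the type), and skip_vector_cond is a
hypothetical name used only for illustration. Adding 4294967295 modulo 2^32 is
the same as subtracting 1, so the condition holds for a VF of 1, 2 or 3.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the dumped condition
   "_8 + 4294967295 <= 2", assuming _8 is a 32-bit unsigned value.
   Unsigned addition wraps modulo 2^32, so this computes vf - 1 <= 2.  */
static int
skip_vector_cond (uint32_t vf)
{
  return (uint32_t) (vf + 4294967295u) <= 2u;
}
```

Under that assumption, substituting VF == 1 gives 0 <= 2, so the guard selects
the scalar path, matching the observation above.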
We do not vectorize the main computation loop (only the init and the final
reduction loops), because
t.c:7:21: note: ==> examining statement: pretmp_31 = .MASK_LOAD (_20, 64, _49, 0.0);
t.c:7:21: note: vect_is_simple_use: operand _21 < _27, type of def: internal
t.c:7:21: missed: unsupported masked emulated gather.
t.c:7:17: missed: not vectorized: relevant stmt not supported: pretmp_31 = .MASK_LOAD (_20, 64, _49, 0.0);
t.c:7:21: note: unsupported SLP instance starting from: D.36226[_15] = prephitmp_30;
t.c:7:21: missed: unsupported SLP instances
It seems that gather handling isn't working well, or that we fail to
recognize SIMD loads from .MASK_LOADs.
I also see
t.c:7:21: note: === vect_analyze_data_ref_dependences ===
(compute_affine_dependence
ref_a: MEM[(const double &)_19], stmt_a: _21 = MEM[(const double &)_19];
ref_b: D.36226[_15], stmt_b: D.36226[_15] = prephitmp_30;
) -> no dependence
(compute_affine_dependence
ref_a: MEM <double[32]> [(const double &)&D.36226][_15], stmt_a: _27 = MEM <double[32]> [(const double &)&D.36226][_15];
ref_b: D.36226[_15], stmt_b: D.36226[_15] = prephitmp_30;
) -> dependence analysis failed
(compute_affine_dependence
ref_a: MEM[(const double &)_20], stmt_a: pretmp_31 = .MASK_LOAD (_20, 64, _49, 0.0);
ref_b: D.36226[_15], stmt_b: D.36226[_15] = prephitmp_30;
) -> dependence analysis failed
Using -fno-tree-sink gets the loop vectorized. With that option we're facing
_15 = .GOMP_SIMD_LANE (simduid.3_7(D), 0);
_16 = (long unsigned int) i_37;
_17 = _16 * 8;
_19 = x_18(D) + _17;
_32 = (sizetype) _15;
_41 = _32 * 8;
_20 = &D.36226 + _41;
_21 = MEM[(const double &)_19];
_27 = MEM <double[32]> [(const double &)&D.36226][_15];
_48 = _21 < _27;
pretmp_31 = .MASK_LOAD (_20, 64, _48, 0.0);
instead of
_53 = (unsigned long) &D.36226;
...
_15 = .GOMP_SIMD_LANE (simduid.3_7(D), 0);
_16 = (long unsigned int) i_37;
_17 = _16 * 8;
_19 = x_18(D) + _17;
_21 = MEM[(const double &)_19];
_27 = MEM <double[32]> [(const double &)&D.36226][_15];
_49 = _21 < _27;
_32 = (sizetype) _15;
_41 = _32 * 8;
_54 = _53 + _41;
_20 = (double *) _54;
pretmp_31 = .MASK_LOAD (_20, 64, _49, 0.0);
that is, we sink
_32 = (sizetype) _15;
_41 = _32 * 8;
_20 = &D.36226 + _41;
and ifcvt turns the conditional POINTER_PLUS_EXPR into a PLUS_EXPR
(because of UB), which the data reference analysis fails to handle. I think
the relevant code is in vect_find_stmt_data_reference, where we fail to match
if (integer_zerop (off)
&& TREE_CODE (base_address) == POINTER_PLUS_EXPR)
{
off = TREE_OPERAND (base_address, 1);
base_address = TREE_OPERAND (base_address, 0);
for the base address (double *) ((sizetype) _15 * 8 + (sizetype) &D.36226).
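As a rough C-level illustration of the two address forms seen in the dumps,
the sketch below computes a lane address once with pointer arithmetic
(analogous to _20 = &D.36226 + _41) and once with integer arithmetic followed
by a cast back (analogous to _54 = _53 + _41; _20 = (double *) _54). The
function names are hypothetical. It also assumes a flat address space in which
the uintptr_t round trip preserves addresses; ISO C leaves that conversion
implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer-typed form: byte offset added to the base pointer,
   analogous to a POINTER_PLUS_EXPR.  */
static double *
addr_pointer_plus (double *base, size_t lane)
{
  return (double *) ((char *) base + lane * sizeof (double));
}

/* Integer-typed form: the base is converted to an unsigned integer,
   the offset is added with wrapping arithmetic, and the sum is
   converted back.  Assumes the uintptr_t round trip preserves the
   address, which is implementation-defined in ISO C.  */
static double *
addr_integer_plus (double *base, size_t lane)
{
  uintptr_t b = (uintptr_t) base;
  uintptr_t off = (uintptr_t) lane * sizeof (double);
  return (double *) (b + off);
}
```

On such targets both functions return the same address for in-bounds lanes.
The integer form has no pointer-overflow UB, but the base and the offset are
no longer distinguished by the operation itself.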
The following crude hack, which lacks verification, fixes this and gets the
main loop vectorized. I think it would be more maintainable to move this
pattern matching to a match.pd match. I'm not working on this.
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index ae556d85c7f..c320223926e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5250,6 +5250,13 @@ vect_find_stmt_data_reference (loop_p loop, gimple *stmt,
off = TREE_OPERAND (base_address, 1);
base_address = TREE_OPERAND (base_address, 0);
}
+ else if (integer_zerop (off)
+ && CONVERT_EXPR_CODE_P (TREE_CODE (base_address))
+ && TREE_CODE (TREE_OPERAND (base_address, 0)) == PLUS_EXPR)
+ {
+ off = TREE_OPERAND (TREE_OPERAND (base_address, 0), 0);
+ base_address = TREE_OPERAND (TREE_OPERAND (base_address, 0), 1);
+ }
STRIP_NOPS (off);
if (TREE_CODE (off) == MULT_EXPR
&& tree_fits_uhwi_p (TREE_OPERAND (off, 1)))
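
To illustrate what verification for such a match could look like, here is a
self-contained toy model in C. It does not use GCC's tree API: toy_expr,
toy_code, toy_strip_convert and toy_split_base_offset are hypothetical names
invented for this sketch. Unlike the hack, it checks which operand of the
integer PLUS is the converted address instead of assuming an operand order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression codes, loosely named after the GIMPLE codes discussed
   above.  This is not GCC's representation.  */
enum toy_code { TOY_ADDR, TOY_SSA, TOY_POINTER_PLUS, TOY_PLUS, TOY_CONVERT };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0;   /* operand of CONVERT, first of (POINTER_)PLUS */
  const struct toy_expr *op1;   /* second operand of (POINTER_)PLUS */
};

/* Look through any number of conversions.  */
static const struct toy_expr *
toy_strip_convert (const struct toy_expr *e)
{
  while (e->code == TOY_CONVERT)
    e = e->op0;
  return e;
}

/* Split ADDR into BASE and OFF.  Handles "base p+ off" and
   "(T *) (x + y)" where exactly one side is a converted address.
   Returns false if neither shape matches.  */
static bool
toy_split_base_offset (const struct toy_expr *addr,
                       const struct toy_expr **base,
                       const struct toy_expr **off)
{
  if (addr->code == TOY_POINTER_PLUS)
    {
      *base = addr->op0;
      *off = addr->op1;
      return true;
    }
  if (addr->code == TOY_CONVERT && addr->op0->code == TOY_PLUS)
    {
      const struct toy_expr *a = addr->op0->op0;
      const struct toy_expr *b = addr->op0->op1;
      bool a_is_addr = toy_strip_convert (a)->code == TOY_ADDR;
      bool b_is_addr = toy_strip_convert (b)->code == TOY_ADDR;
      if (a_is_addr == b_is_addr)
        return false;           /* ambiguous, or no address operand */
      *base = toy_strip_convert (a_is_addr ? a : b);
      *off = a_is_addr ? b : a;
      return true;
    }
  return false;
}
```

In this model the base address from the comment,
(double *) ((sizetype) _15 * 8 + (sizetype) &D.36226), splits into the address
operand as base and the scaled lane as offset regardless of operand order.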