Abe wrote:
In other words, the problem about which I was concerned is not going to be triggered by
e.g. "if (c) x = ..."
which lacks an attached "else x = ..." in a multithreaded program without
enough locking just because 'x' is global/static.
The only remaining case to consider is if some code being compiler takes the address of
something thread-local and then "gives"
that pointer to another thread. Even for _that_ extreme case, Sebastian says
that the gimplifier will detect this
"address has been taken" situation and do the right thing such that the new if
converter also does the right thing.
Great :). I don't understand much/anything about how gcc deals with
thread-locals, but everything before that, all sounds good...
[Alan wrote:]
Can you give an example?
The test cases in the GCC tree at "gcc.dg/vect/pr61194.c" and
"gcc.dg/vect/vect-mask-load-1.c"
currently test as: the new if-converter is "converting" something that`s
already vectorizer-friendly...
> [snip]
However, TTBOMK the vectorizer already "understands" that in cases where its
input looks like:
x = c ? y : z;
... and 'y' and 'z' are both pure [side-effect-free] -- including, but not limited to,
they must be non-"volatile" --
it may vectorize a loop containing code like the preceding, ignoring for this
particular instance the C mandate
that only one of {y, z} be evaluated...
My understanding, is that any decision as to whether one or both of y or z is
evaluated (when 'evaluation' involves doing any work, e.g. a load), has already
been encoded into the gimple/tree IR. Thus, if we are to only evaluate one of
'y' or 'z' in your example, the IR will (prior to if-conversion), contain basic
blocks and control flow, that means we jump around the one that's not evaluated.
This appears to be the case in pr61194.c: prior to if-conversion, the IR for the
loop in barX is
<bb 3>:
# i_16 = PHI <i_13(7), 0(2)>
# ivtmp_21 = PHI <ivtmp_20(7), 1024(2)>
_5 = x[i_16];
_6 = _5 > 0.0;
_7 = w[i_16];
_8 = _7 < 0.0;
_9 = _6 & _8;
if (_9 != 0)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
iftmp.0_10 = z[i_16];
goto <bb 6>;
<bb 5>:
iftmp.0_11 = y[i_16];
<bb 6>:
# iftmp.0_2 = PHI <iftmp.0_10(4), iftmp.0_11(5)>
z[i_16] = iftmp.0_2;
i_13 = i_16 + 1;
ivtmp_20 = ivtmp_21 - 1;
if (ivtmp_20 != 0)
goto <bb 7>;
else
goto <bb 8>;
<bb 7>:
goto <bb 3>;
which clearly contains (unvectorizable!) control flow. Without
-ftree-loop-if-convert-stores, if-conversion leaves this alone, and
vectorization fails (i.e. the vectorizer bails out because the loop has >2 basic
blocks). With -ftree-loop-if-convert-stores, if-conversion produces
<bb 3>:
# i_16 = PHI <i_13(4), 0(2)>
# ivtmp_21 = PHI <ivtmp_20(4), 1024(2)>
_5 = x[i_16];
_6 = _5 > 0.0;
_7 = w[i_16];
_8 = _7 < 0.0;
_9 = _6 & _8;
iftmp.0_10 = z[i_16]; // <== here
iftmp.0_11 = y[i_16]; // <== here
iftmp.0_2 = _9 ? iftmp.0_10 : iftmp.0_11;
z[i_16] = iftmp.0_2;
i_13 = i_16 + 1;
ivtmp_20 = ivtmp_21 - 1;
if (ivtmp_20 != 0)
goto <bb 4>;
else
goto <bb 5>;
<bb 4>:
goto <bb 3>;
where I have commented the conditional loads that have become unconditional.
(Hence, "-ftree-loop-if-convert-stores" looks misnamed - it affects how the
if-conversion phase converts loads, too - please correct me if I misunderstand
(Richard?) ?!) This contains no control flow, and so is vectorizable.
(This is all without your scratchpad patch, of course.) IOW this being
vectorized, or not, relies upon the preceding if-conversion phase removing the
control flow.
HTH
Alan