(if-conversion could directly generate masked load/stores
 of course and not use a scratch-pad at all in that case).

IMO that`s a great idea, but I don`t know how to do it.  Hints would be welcome.  In 
particular, how does one "generate masked load/stores" at the GIMPLE level?


But are we correctly handling these cases in the current if conversion code?

I`m uncertain to what that is intended to refer, but I believe Sebastian would 
agree that the new if converter is safer than the old one in terms of 
correctness at the time of running the code being compiled.


Abe's changes would seem like a step forward from a correctness standpoint

Not to argue, but as a point of humility: Sebastian did by far the most work on 
this patch.  I just modernized it and helped move it along.  Even _that_ was 
done with Sebastian`s help.


even if they take us a step backwards from a performance standpoint.

For now, we have a few performance regressions, and so far we have found that 
it`s non-trivial to remove all of those regressions.
We may be better off pushing the current patch to trunk and having the 
performance regressions currently introduced by the new if converter be fixed 
by later patches.
Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the code will 
get more "eyeballs" than if it lingers in an uncommitted patch or in a branch.
I, for one, would love some help in fixing these performance regressions. ;-)

If fixing the performance regressions winds up taking too long, perhaps the 
current imperfect patch could be undone on trunk just before a release is 
tagged,
and then we`ll push it in again when trunk goes back to being allowed to be 
unstable?  According to my analysis of the data near the end of the page at
<https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to work 
on not-yet-perfect patches in trunk.


So the question is whether we get more non-vectorized if-converted
code out of this (and thus whether we want to use --param
allow-store-data-races to get the old code back which is nicer to less
capable CPUs and probably faster than using scatter/gather or masked 
loads/stores).

I do think conditionalizing some of this on the allow-store-data-races makes 
sense.

I think having both the old if-converter and the new one "live on" in GCC is 
nontrivial, but not impossible.  I also don`t think it`s the best long-term goal,
but only a short-term workaround.  In the long run, IMO there should be only one if 
converter, albeit perhaps with tweaking flags [e.g. 
"-fallow-unsafe-if-conversion"].


I also wonder if we should really care about load data races (not sure your 
patch does).

According to a recent long discussion I had with Sebastian, our current patch 
does not have the flaw I was concerned it might have in terms of loads because:

  [1] the scratchpad is only being used to if-convert assignments to 
thread-local scalars, never to globals/statics, and because...

  [2] the gimplifier is supposed to detect "the address of this scalar has been 
taken" and when such is detected in the code being compiled,
      it causes the scalar to no longer look like a scalar in GIMPLE so that we 
are also safe from stale-data problems that could come from
      corner-case code that takes the address of a thread-local variable and 
gives that address to another thread [which then proceeds to
      overwrite the value in the supposedly-thread-local scalar that belongs to 
a different thread from the one doing the writing]


Regards,

Abe


Reply via email to