On Wed, Jun 24, 2015 at 7:10 PM, Jeff Law <l...@redhat.com> wrote:
> On 06/22/2015 10:27 AM, Alan Lawrence wrote:
>
>>
>> My main thought concerns the direction we are travelling here. A major
>> reason why we do if-conversion is to enable vectorization. Is this is
>> targetted at gathering/scattering loads? Following vectorization,
>> different elements of the vector being loaded/stored may have to go
>> to/from the scratchpad or to/from main memory.
>>
>> Or, are we aiming at the case where the predicate or address are
>> invariant? That seems unlikely - loop unswitching would be better for
>> the predicate; loading from an address, we'd just peel and hoist;
>> storing, this'd result in the address holding the last value written, at
>> exit from the loop, a curious idiom. Where the predicate/address is
>> invariant across the vector? (!)
>>
>> Or, at we aiming at non-vectorized code?
>
> I think we're aiming at correctness issues, particularly WRT not allowing
> the optimizers to introduce new data races for C11/C++11.

Yes, and if you just look at scalar code then with the rewrite we can
enable store if-conversion unconditionally.

_But_

when the new scheme triggers vectorization cannot succeed on the
result as we get

  if (cond)
    *p = val;

if-converted to

  tem = cond ? p : &scratch;
  *tem = val;

and

   if (cond)
     val = *p;

if-converted to

  scatch = val;
  tem = cond ? p : &scratch;
  val = *tem;

and thus the store and loads appear as scather/gather ones to
the vectorizer (if-conversion could directly generate masked
load/stores of course and not use a scratch-pad at all in that case).

So the question is whether we get more non-vectorized if-converted
code out of this (and thus whether we want to use
--param allow-store-data-races to get the old code back which is
nicer to less capable CPUs and probably faster than using
scatter/gather or masked loads/stores).  Note that I think scatter
support is still not implemented in the vectorizer.  I also wonder
if we should really care about load data races (not sure your patch
does).

I didn't look at the patch in detail yet - please address Alans comments
and re-post an updated patch.

In general I like the idea.

Thanks,
Richard.

>>
>>
>> Re. creation of scratchpads:
>>     (1) Should the '64' byte size be the result of scanning the
>> function, for the largest data size to which we store? (ideally,
>> conditionally store!)
>
> I suspect most functions have conditional stores, but far fewer have
> conditional stores that we'd like to if-convert.  ISTM that if we can lazily
> allocate the scratchpad that'd be best.   If this were an RTL pass, then I'd
> say query the backend for the widest mode store insn and use that to size
> the scratchpad.  We may have something similar we can do in gimple without
> resorting querying insn backend capabilities. Perhaps walking the in-scope
> addressable variables or somesuch.
>
>
>>     (2) Allocating only once per function: if we had one scratchpad per
>> loop, it could/would live inside the test of "gimple_build_call_internal
>> (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
>> loops in the function, but then fail to vectorize them, we'll leave the
>> scratchpad around for later phases to clean up. Is that OK?
>
> If the scratchpad is local to a function, then I'd expect we'd clean it up
> just like any other unused local.  Once it's a global, then all bets are
> off.
>
> Anyway, I probably should just look at the patch before making more
> comments.
>
> jeff
>

Reply via email to