On Wed, Jun 24, 2015 at 7:10 PM, Jeff Law <l...@redhat.com> wrote:
> On 06/22/2015 10:27 AM, Alan Lawrence wrote:
>>
>> My main thought concerns the direction we are travelling here. A major
>> reason why we do if-conversion is to enable vectorization. Is this
>> targeted at gathering/scattering loads? Following vectorization,
>> different elements of the vector being loaded/stored may have to go
>> to/from the scratchpad or to/from main memory.
>>
>> Or, are we aiming at the case where the predicate or address is
>> invariant? That seems unlikely - loop unswitching would be better for
>> the predicate; loading from an address, we'd just peel and hoist;
>> storing, this'd result in the address holding the last value written
>> at exit from the loop, a curious idiom. Where the predicate/address is
>> invariant across the vector? (!)
>>
>> Or, are we aiming at non-vectorized code?
>
> I think we're aiming at correctness issues, particularly WRT not allowing
> the optimizers to introduce new data races for C11/C++11.
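To make the correctness concern concrete, here is a minimal C sketch (the
names are illustrative only, not taken from the patch): if-converting the
guarded store into an unconditional write touches *p even on iterations
where the condition never holds, which is a new store data race under the
C11/C++11 memory model if another thread accesses *p concurrently.

  /* Original loop: *p is written only when a[i] > 0.  */
  void
  foo (int *p, const int *a, int n, int val)
  {
    for (int i = 0; i < n; i++)
      if (a[i] > 0)
        *p = val;
  }

  /* Naive if-conversion: *p is now read and written on every iteration,
     even when no a[i] is positive -- stores the source never performed.  */
  void
  foo_ifcvt (int *p, const int *a, int n, int val)
  {
    for (int i = 0; i < n; i++)
      *p = a[i] > 0 ? val : *p;
  }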
Yes, and if you just look at scalar code then with the rewrite we can
enable store if-conversion unconditionally.  _But_ when the new scheme
triggers, vectorization cannot succeed on the result, as we get

  if (cond)
    *p = val;

if-converted to

  tem = cond ? p : &scratch;
  *tem = val;

and

  if (cond)
    val = *p;

if-converted to

  scratch = val;
  tem = cond ? p : &scratch;
  val = *tem;

and thus the stores and loads appear as scatter/gather ones to the
vectorizer (if-conversion could of course directly generate masked
loads/stores and not use a scratch-pad at all in that case).

So the question is whether we get more non-vectorized if-converted code
out of this (and thus whether we want to use --param allow-store-data-races
to get the old code back, which is nicer to less capable CPUs and probably
faster than using scatter/gather or masked loads/stores).  Note that I
think scatter support is still not implemented in the vectorizer.

I also wonder if we should really care about load data races (I'm not
sure your patch does).

I didn't look at the patch in detail yet - please address Alan's comments
and re-post an updated patch.  In general I like the idea.

Thanks,
Richard.

>>
>> Re. creation of scratchpads:
>> (1) Should the '64' byte size be the result of scanning the
>> function for the largest data size to which we store? (ideally,
>> conditionally store!)
>
> I suspect most functions have conditional stores, but far fewer have
> conditional stores that we'd like to if-convert.  ISTM that if we can
> lazily allocate the scratchpad that'd be best.  If this were an RTL pass,
> then I'd say query the backend for the widest mode store insn and use
> that to size the scratchpad.  We may have something similar we can do in
> gimple without resorting to querying insn backend capabilities.  Perhaps
> walking the in-scope addressable variables or somesuch.
>
>> (2) Allocating only once per function: if we had one scratchpad per
>> loop, it could/would live inside the test of "gimple_build_call_internal
>> (IFN_LOOP_VECTORIZED, ...".  Otherwise, if we if-convert one or more
>> loops in the function, but then fail to vectorize them, we'll leave the
>> scratchpad around for later phases to clean up.  Is that OK?
>
> If the scratchpad is local to a function, then I'd expect we'd clean it
> up just like any other unused local.  Once it's a global, then all bets
> are off.
>
> Anyway, I probably should just look at the patch before making more
> comments.
>
> jeff
>
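For reference, a rough C-level sketch of the scratchpad rewrite described
above; the names (scratch, cond_store, cond_load) are purely illustrative
and do not come from the patch:

  static int scratch;   /* stand-in for the per-function scratchpad */

  /* if (cond) *p = val;  becomes an unconditional store whose address is
     redirected to the scratchpad when the condition is false, so no new
     store to *p is introduced.  */
  void
  cond_store (int *p, int cond, int val)
  {
    int *tem = cond ? p : &scratch;
    *tem = val;
  }

  /* if (cond) val = *p;  becomes an unconditional load; pre-seeding the
     scratchpad with the old value keeps the result correct when the
     condition is false.  */
  int
  cond_load (int *p, int cond, int val)
  {
    scratch = val;
    int *tem = cond ? p : &scratch;
    return *tem;
  }

Because the address now varies per element, the vectorizer sees such
accesses as gather/scatter rather than contiguous loads/stores, which is
the vectorization concern raised above.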