On Fri, Jul 31, 2015 at 7:43 PM, Abe <abe_skol...@yahoo.com> wrote: > [Abe wrote:] >>> >>> TTBOMK, that "expense" is temporary, pending fixes/improvement >>> to the new converter. IOW, I don`t think any of the relevant >>> regressions are as a result of "essential complexity". > > > [Richard wrote:] >> >> I don't see how you can fix this without scatter support in the CPU. > > > Well, for some test cases, I can see in the GIMPLE dumps that > a/the scratchpad wasn`t needed here and wasn`t needed there. > IOW, the converter that "knows how" to indirect via scratchpads > is doing so _always_ rather than only when it`s truly needed. > This is causing additional indirection which the vectorizer > rejects. Making the new converter "smarter" should fix this. > > Overall, however, I see your point and agree with it. Gather support > is needed for the loads and scatter for the stores, in the general case.
You skipped my analysis that even with scatter/gather support as implemented by AVX on x86_64 vectorization is not possible because of how those instructions are designed. Note that the old converter is already "smarter". Note that the old converter knows to use masked loads/stores which is a closer match to the original code and which will be faster and trivially vectorizable. Also note that all x86_64 CPUs that suport gather also support masked loads and stores. I know no other CPU architecture with scatter/gather support (and doubt they do not at the same time implement masked loads/stores) Richard. > > [Richard wrote:] >> >> But then you should restrict the if-conversion >> to targets that can do gather vectorization. > > > I think adding this requirement -- to generalize, _two_ req.s: > _gather_ support for _load_ if conversion with [a] scratchpad[s], > and scatter support for _store_ if conversion with [a] scratchpad[s] -- > is a reasonable and probably wise goal. Doing so should eventually > enable us to conclude automatically "it is safe to enable this kind of > if conversion for/on this target" and enable the relevant transformations > automatically when given e.g. "-O3" and autovectorization. > > > [Abe wrote] >>> >>> As of now, the relevant "improvement" in/due-to the new >>> converter is not always effective [...] We intend to fix this. > > > [Richard wrote:] >> >> This should be fixed before the patch can go in. > > > The current plan is to add new switches that control which [if any] > if converter is active, and to have the old one _and_ the new one > be included in GCC trunk for the time being. Then I/we can have > more eyes on the new code, helping with bugs etc., without having > any impact at all on code that only either is compiled with > the old flags [which shall retain their old semantics > for now] or with no relevant flags whatsoever. > > Is that plan OK? > > > [Richard wrote:] >> >> There is the existing mechanism to if-convert only a copy >> of the loop guarded with IFN_LOOP_VECOTRIZED so the >> cost model is that of the vectorizer when it tries to vectorize >> the if-converted copy. If it doesn't vectorize that copy the >> non-if-converted loop is preserved, otherwise it is deleted. > > > I think I found the code to which you are referring here, > and I tried to use it [so as to not reduplicate efforts], > but my first effort at doing so was unsuccessful AFAIK. > Trying again -- this time with help from Jakub, > I hope -- is on my work agenda. > > > [Richard wrote:] >> >> I've argued in the past that the GIMPLE if-converter should >> always go that path (thus strictly be a vectorization enabler only). > > > If I understand correctly what you mean by that, then I agree. > > >> if-conversion is already on by default if vectorizing. > >> We only do not introduce store-data-races by default. > > Thanks. > > Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included] > release[s] were "safe" even without your recent bug fix [July 10] > for code _not_ compiled with "-ftree-loop-if-convert-stores"? > I ask b/c the test case I wrote which was able to cause GCC to > generate crashing code [for a correct-under-low-optimization test > case] intentionally had a crash due to dereferencing a null pointer > in a _load_. IIRC, getting GCC to generate that bad code required > "-ftree-loop-if-convert-stores" despite it being a faulty _load_. > > FYI: I have made some progress on the SPEC testing, but I do not > yet have the work at a point where I`m ready to report numbers. > > > Regards, > > Abe