On Fri, Jul 31, 2015 at 7:43 PM, Abe <abe_skol...@yahoo.com> wrote:
> [Abe wrote:]
>>>
>>> TTBOMK, that "expense" is temporary, pending fixes/improvement
>>> to the new converter.  IOW, I don`t think any of the relevant
>>> regressions are as a result of "essential complexity".
>
>
> [Richard wrote:]
>>
>> I don't see how you can fix this without scatter support in the CPU.
>
>
> Well, for some test cases, I can see in the GIMPLE dumps that
> a/the scratchpad wasn`t needed here and wasn`t needed there.
> IOW, the converter that "knows how" to indirect via scratchpads
> is doing so _always_ rather than only when it`s truly needed.
> This is causing additional indirection which the vectorizer
> rejects.  Making the new converter "smarter" should fix this.
>
> Overall, however, I see your point and agree with it.  Gather support
> is needed for the loads and scatter for the stores, in the general case.

You skipped my analysis that even with scatter/gather support as
implemented by AVX on x86_64 vectorization is not possible because
of how those instructions are designed.

Note that the old converter is already "smarter".  Note that the old
converter knows to use masked loads/stores which is a closer
match to the original code and which will be faster and trivially
vectorizable.  Also note that all x86_64 CPUs that suport gather
also support masked loads and stores.

I know no other CPU architecture with scatter/gather support
(and doubt they do not at the same time implement masked loads/stores)

Richard.

>
> [Richard wrote:]
>>
>> But then you should restrict the if-conversion
>> to targets that can do gather vectorization.
>
>
> I think adding this requirement -- to generalize, _two_ req.s:
> _gather_ support for _load_ if conversion with [a] scratchpad[s],
> and scatter support for _store_ if conversion with [a] scratchpad[s] --
> is a reasonable and probably wise goal.  Doing so should eventually
> enable us to conclude automatically "it is safe to enable this kind of
> if conversion for/on this target" and enable the relevant transformations
> automatically when given e.g. "-O3" and autovectorization.
>
>
> [Abe wrote]
>>>
>>> As of now, the relevant "improvement" in/due-to the new
>>> converter is not always effective [...] We intend to fix this.
>
>
> [Richard wrote:]
>>
>> This should be fixed before the patch can go in.
>
>
> The current plan is to add new switches that control which [if any]
> if converter is active, and to have the old one _and_ the new one
> be included in GCC trunk for the time being.  Then I/we can have
> more eyes on the new code, helping with bugs etc., without having
> any impact at all on code that only either is compiled with
> the old flags [which shall retain their old semantics
> for now] or with no relevant flags whatsoever.
>
> Is that plan OK?
>
>
> [Richard wrote:]
>>
>> There is the existing mechanism to if-convert only a copy
>> of the loop guarded with IFN_LOOP_VECOTRIZED so the
>> cost model is that of the vectorizer when it tries to vectorize
>> the if-converted copy.  If it doesn't vectorize that copy the
>> non-if-converted loop is preserved, otherwise it is deleted.
>
>
> I think I found the code to which you are referring here,
> and I tried to use it [so as to not reduplicate efforts],
> but my first effort at doing so was unsuccessful AFAIK.
> Trying again -- this time with help from Jakub,
> I hope -- is on my work agenda.
>
>
> [Richard wrote:]
>>
>> I've argued in the past that the GIMPLE if-converter should
>> always go that path (thus strictly be a vectorization enabler only).
>
>
> If I understand correctly what you mean by that, then I agree.
>
>
>> if-conversion is already on by default if vectorizing.
>
>> We only do not introduce store-data-races by default.
>
> Thanks.
>
> Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included]
> release[s] were "safe" even without your recent bug fix [July 10]
> for code _not_ compiled with "-ftree-loop-if-convert-stores"?
> I ask b/c the test case I wrote which was able to cause GCC to
> generate crashing code [for a correct-under-low-optimization test
> case] intentionally had a crash due to dereferencing a null pointer
> in a _load_.  IIRC, getting GCC to generate that bad code required
> "-ftree-loop-if-convert-stores" despite it being a faulty _load_.
>
> FYI: I have made some progress on the SPEC testing, but I do not
> yet have the work at a point where I`m ready to report numbers.
>
>
> Regards,
>
> Abe

Reply via email to