2016-06-16 8:22 GMT+03:00 Jeff Law <l...@redhat.com>:
> On 06/15/2016 05:03 AM, Richard Biener wrote:
>>
>> On Thu, May 19, 2016 at 9:39 PM, Ilya Enkovich
>> <enkovich....@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> This patch introduces changes required to run vectorizer on loop
>>> epilogue. This also enables epilogue vectorization using a vector
>>> of smaller size.
>>
>>
>> While the idea of epilogue vectorization sounds straight-forward the
>> implementation is somewhat icky with all the ->aux stuff, "redundant"
>> if-conversion and loop iteration stuff.
>>
>> So I was thinking of when epilogue vectorization is beneficial which
>> is obviously when the overall loop trip count is low.  We are not
>> good at optimizing for that case generally (too much peeling for
>> alignment, using expensive avx256 vectorization, etc.), so I wonder
>> if versioning for that case would be a better idea
>> (performance-wise).
>>
>> Thus - what cases were you looking at when deciding that vectorizing
>> the epilogue (with a smaller vector size) is profitable?  Do other
>> compilers generally do this?
>
> I would think it's better stated that the relative benefits of vectorizing
> the epilogue are greater the shorter the loop, but that's nit-picking the
> discussion.
>
> I do think you've got a legitimate question though.   Ilya, can you give any
> insights here based on your KNL and Haswell testing or data/insights from
> the LLVM and/or ICC teams?

I have no information about LLVM.  As I said in the other thread, ICC uses
all of these options (masked epilogue, combined loop, vectorized epilogue
with a smaller vector size).  It may also generate several versions (e.g.
combined and with masked epilogue) and choose between them at run time
depending on the iteration count.
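
To illustrate the third option (vectorized epilogue with a smaller vector
size), here is a rough scalar C sketch of the loop structure.  It is not
what GCC or ICC actually emits.  The function name add_arrays and the
widths 8 and 4 are assumptions made up for the example, and the inner
"lane" loops only stand in for single vector instructions.

```c
#include <stddef.h>

/* Hypothetical vectorization factors: 8 lanes for the main loop,
   4 lanes for the epilogue.  Both values are assumptions. */
enum { VF_MAIN = 8, VF_EPILOGUE = 4 };

void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main vectorized loop: each outer iteration models one
       8-lane vector add. */
    for (; i + VF_MAIN <= n; i += VF_MAIN)
        for (size_t lane = 0; lane < VF_MAIN; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];

    /* Vectorized epilogue with the smaller vector size.  Fewer than
       VF_MAIN elements remain here, so this runs at most once. */
    if (i + VF_EPILOGUE <= n) {
        for (size_t lane = 0; lane < VF_EPILOGUE; ++lane)
            dst[i + lane] = a[i + lane] + b[i + lane];
        i += VF_EPILOGUE;
    }

    /* Scalar tail: fewer than VF_EPILOGUE elements remain. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

With n = 13, for example, the main loop handles 8 elements, the epilogue
handles 4, and the scalar tail handles the last one.  The shorter the
loop, the larger the share of iterations that land in the epilogue.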

Thanks,
Ilya

>
> Jeff
