2016-06-17 17:48 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>:
> On Fri, Jun 17, 2016 at 3:33 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>> 2016-06-16 9:00 GMT+03:00 Jeff Law <l...@redhat.com>:
>>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch introduces changes required to run vectorizer on loop epilogue.
>>>> This also enables epilogue vectorization using a vector of smaller size.
>>>>
>>>> Thanks,
>>>> Ilya
>>>> --
>>>> gcc/
>>>>
>>>> 2016-05-19  Ilya Enkovich  <ilya.enkov...@intel.com>
>>>>
>>>>         * tree-if-conv.c (tree_if_conversion): Make public.
>>>>         * tree-if-conv.h: New file.
>>>>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>>>         try to enhance alignment for epilogues.
>>>>         * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>>>         created loop.
>>>>         * tree-vect-loop.c: include tree-if-conv.h.
>>>>         (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>>>         loop->aux.
>>>>         (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>>>         loop->aux.
>>>>         (vect_analyze_loop): Reset loop->aux.
>>>>         (vect_transform_loop): Check if created epilogue should be returned
>>>>         for further vectorization.  If-convert epilogue if required.
>>>>         * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>>>         process and insert vectorized loop epilogues into this queue.
>>>>         * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return created
>>>>         loop.
>>>>         (vect_transform_loop): Return created loop.
>>>
>>> As Richi noted, the additional calls into the if-converter are unfortunate.
>>> I'm not sure how else to avoid them though.  It looks like we can run
>>> if-conversion on just the epilogue, so maybe that's not too bad.
>>>
>>>
>>>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
>>>>    destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>>>    loop_vinfo->scalar_cost_vec.release ();
>>>>
>>>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>>>    free (loop_vinfo);
>>>> -  loop->aux = NULL;
>>>>  }
>>>
>>> Hmm, there seems to be a level of indirection I'm missing here.  We're
>>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
>>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>>> the original loop to the vectorized epilogue.  What am I missing?  Rather
>>> than smuggling around in the aux field, is there some inherent reason why we
>>> can't just copy the info from the original loop directly into
>>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>>
>> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
>>  - mark this loop as epilogue
>>  - get VF of original loop (required for both mask and nomask modes)
>>  - get decision about epilogue masking
>>
>> That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
>> vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
>> loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
>> LOOP_VINFO_ORIG_LOOP_INFO.  When a loop has no LOOP_VINFO associated I have no
>> place to bind it with the original loop and therefore I use vacant loop->aux
>> for that.  Any other way to bind epilogue with its original loop would work
>> as well.  I just chose loop->aux to avoid new fields and data structures.
>>
>>>
>>>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>>>> +     Don't try to vectorize epilogue because it will require
>>>> +     additional alias checks.  */
>>>
>>> Are the alias checks here redundant with the ones done for the original
>>> loop?  If so won't DOM eliminate them?
>>
>> I revisited this part recently and thought it should actually be safe to
>> assume we have no aliasing in epilogue because we are dominated by alias
>> checks of the original loop.  So I prepared a patch to remove this restriction
>> and avoid alias check generation for epilogues (so we compute the alias checks
>> required but don't emit them).  I didn't send this patch yet.
>> Do you think it is a valid assumption?
> I recently visited that part and agree it's valid, unless epilogue
> loop is vectorized in larger vector-units, but that would be unlikely
> to happen, right?  BTW, does this patch start all over analyzing
> epilogue loop?  As you said the alias checks will be computed.

Original loop is vectorized for the max possible vector size and we can't
(and don't want to) choose a bigger one.

We don't preserve any info for epilogue.  Actually even when we try various
vector sizes for a single loop we recompute everything for each vector size.
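
To illustrate the two points above, here is a rough, self-contained sketch
(not the actual GCC code).  All names in it (Analysis, analyze,
pick_vector_size) are hypothetical, and it makes simplifying assumptions:
the iteration count is known, only the non-masked mode is modelled, and
"analysis" is reduced to a single profitability check.  It only shows that
each candidate vector size gets a fresh analysis and that the epilogue is
restricted to sizes smaller than the one chosen for the original loop.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a per-loop analysis result (not loop_vec_info).
struct Analysis {
  unsigned vector_bytes;  // vector size this analysis was computed for
  unsigned vf;            // vectorization factor for that size
};

// Toy analysis, recomputed from scratch for every vector size: it succeeds
// only if the loop runs at least one full vector iteration.
std::optional<Analysis> analyze(unsigned niters, unsigned elem_bytes,
                                unsigned vector_bytes) {
  unsigned vf = vector_bytes / elem_bytes;
  if (vf < 2 || niters < vf)
    return std::nullopt;
  return Analysis{vector_bytes, vf};
}

// Try candidate sizes from largest to smallest.  A non-zero limit_bytes
// skips every size that is not strictly smaller than the limit, which is
// how the epilogue is kept below the original loop's vector size here.
std::optional<Analysis> pick_vector_size(unsigned niters, unsigned elem_bytes,
                                         const std::vector<unsigned> &sizes_desc,
                                         unsigned limit_bytes = 0) {
  for (unsigned size : sizes_desc) {
    if (limit_bytes != 0 && size >= limit_bytes)
      continue;
    // Nothing is carried over from the previous attempt.
    if (auto result = analyze(niters, elem_bytes, size))
      return result;
  }
  return std::nullopt;
}
```

For example, a loop with 100 iterations over 4-byte elements and candidate
sizes 64, 32 and 16 bytes would get 64 bytes in this model; its 4 remaining
iterations would then be analyzed again with the limit set to 64.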

Thanks,
Ilya

>
> Thanks,
> bin
>>
>>>
>>>
>>> And something just occurred to me -- is there some inherent reason why SLP
>>> doesn't vectorize the epilogue, particularly for the cases where we can
>>> vectorize the epilogue using smaller vectors?  Sorry if you've already
>>> answered this somewhere or it's a dumb question.
>>
>> IIUC this may happen only if we unroll epilogue into a single BB, which happens
>> only when epilogue iterations count is known. Right?
>>
>>>
>>>
>>>
>>>>
>>>> +       /* Add new loop to a processing queue.  To make it easier
>>>> +          to match loop and its epilogue vectorization in dumps
>>>> +          put new loop as the next loop to process.  */
>>>> +       if (new_loop)
>>>> +         {
>>>> +           loops.safe_insert (i + 1, new_loop->num);
>>>> +           vect_loops_num = number_of_loops (cfun);
>>>> +         }
>>>> +
>>>
>>> So just to be clear, the only reason to do this is for dumps -- other than
>>> processing the loop before its epilogue, there's no other inherently
>>> necessary ordering of the loops, right?
>>
>> Right, I don't see other reasons to do it.
>>
>> Thanks,
>> Ilya
>>
>>>
>>>
>>> Jeff
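
The queue handling quoted above (loops.safe_insert (i + 1, new_loop->num))
can be modelled with a small standalone sketch.  This is not the GCC code:
it uses std::vector instead of GCC's vec, and LoopNum, Vectorize and
process_loops are hypothetical names.  The callback stands in for
vect_transform_loop returning a newly created epilogue, if any.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

using LoopNum = int;

// Hypothetical callback: vectorize one loop and return the number of the
// epilogue loop it created, or nothing if no epilogue was created.
using Vectorize = std::function<std::optional<LoopNum>(LoopNum)>;

// Process loops in queue order.  A newly created epilogue is inserted right
// after the loop it belongs to, so it is the next loop processed and the two
// appear next to each other in dumps.  Returns the processing order.
std::vector<LoopNum> process_loops(std::vector<LoopNum> queue,
                                   const Vectorize &vectorize) {
  std::vector<LoopNum> order;
  // Index-based iteration stays valid even though the queue grows.
  for (std::size_t i = 0; i < queue.size(); ++i) {
    LoopNum current = queue[i];
    order.push_back(current);
    if (auto epilogue = vectorize(current))
      queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(i + 1),
                   *epilogue);
  }
  return order;
}
```

Appending the epilogue at the end of the queue instead would also process it
after its original loop; inserting at i + 1 only keeps the pair adjacent.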
