2016-06-17 17:48 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>:
> On Fri, Jun 17, 2016 at 3:33 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>> 2016-06-16 9:00 GMT+03:00 Jeff Law <l...@redhat.com>:
>>> On 05/19/2016 01:39 PM, Ilya Enkovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch introduces changes required to run vectorizer on loop epilogue.
>>>> This also enables epilogue vectorization using a vector of smaller size.
>>>>
>>>> Thanks,
>>>> Ilya
>>>> --
>>>> gcc/
>>>>
>>>> 2016-05-19  Ilya Enkovich  <ilya.enkov...@intel.com>
>>>>
>>>>         * tree-if-conv.c (tree_if_conversion): Make public.
>>>>         * tree-if-conv.h: New file.
>>>>         * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Don't
>>>>         try to enhance alignment for epilogues.
>>>>         * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Return
>>>>         created loop.
>>>>         * tree-vect-loop.c: include tree-if-conv.h.
>>>>         (destroy_loop_vec_info): Preserve LOOP_VINFO_ORIG_LOOP_INFO in
>>>>         loop->aux.
>>>>         (vect_analyze_loop_form): Init LOOP_VINFO_ORIG_LOOP_INFO and reset
>>>>         loop->aux.
>>>>         (vect_analyze_loop): Reset loop->aux.
>>>>         (vect_transform_loop): Check if created epilogue should be returned
>>>>         for further vectorization.  If-convert epilogue if required.
>>>>         * tree-vectorizer.c (vectorize_loops): Add a queue of loops to
>>>>         process and insert vectorized loop epilogues into this queue.
>>>>         * tree-vectorizer.h (vect_do_peeling_for_loop_bound): Return
>>>>         created loop.
>>>>         (vect_transform_loop): Return created loop.
>>>
>>> As Richi noted, the additional calls into the if-converter are unfortunate.
>>> I'm not sure how else to avoid them though.  It looks like we can run
>>> if-conversion on just the epilogue, so maybe that's not too bad.
>>>
>>>
>>>> @@ -1212,8 +1213,8 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo,
>>>> bool clean_stmts)
>>>>    destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo));
>>>>    loop_vinfo->scalar_cost_vec.release ();
>>>>
>>>> +  loop->aux = LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo);
>>>>    free (loop_vinfo);
>>>> -  loop->aux = NULL;
>>>>  }
>>>
>>> Hmm, there seems to be a level of indirection I'm missing here.  We're
>>> smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
>>> the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
>>> the original loop to the vectorized epilogue.  What am I missing?  Rather
>>> than smuggling around in the aux field, is there some inherent reason why we
>>> can't just copy the info from the original loop directly into
>>> LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?
>>
>> LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
>>  - mark this loop as epilogue
>>  - get VF of original loop (required for both mask and nomask modes)
>>  - get decision about epilogue masking
>>
>> That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
>> vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
>> loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
>> LOOP_VINFO_ORIG_LOOP_INFO.  When Loop has no LOOP_VINFO associated I have no
>> place to bind it with the original loop and therefore I use vacant loop->aux
>> for that.  Any other way to bind epilogue with its original loop would work
>> as well.  I just chose loop->aux to avoid new fields and data structures.
>>
>>>
>>>> +  /* FORNOW: Currently alias checks are not inherited for epilogues.
>>>> +     Don't try to vectorize epilogue because it will require
>>>> +     additional alias checks.  */
>>>
>>> Are the alias checks here redundant with the ones done for the original
>>> loop?  If so won't DOM eliminate them?
>>
>> I revisited this part recently and thought it should actually be safe to
>> assume we have no aliasing in epilogue because we are dominated by alias
>> checks of the original loop.  So I prepared a patch to remove this restriction
>> and avoid alias checks generation for epilogues (so we compute aliases checks
>> required but don't emit them).  I didn't send this patch yet.
>> Do you think it is a valid assumption?
> I recently visited that part and agree it's valid, unless epilogue
> loop is vectorized in larger vector-units, but that would be unlikely
> to happen, right?  BTW, does this patch start all over analyzing
> epilogue loop?  As you said the alias checks will be computed.

The original loop is vectorized with the maximum possible vector size, so we
can't (and don't want to) choose a bigger one for the epilogue.  We don't
preserve any info for the epilogue, so its analysis starts from scratch.
Actually, even when we try various vector sizes for a single loop, we
recompute everything for each vector size.

Thanks,
Ilya

>
> Thanks,
> bin
>>
>>>
>>>
>>> And something just occurred to me -- is there some inherent reason why SLP
>>> doesn't vectorize the epilogue, particularly for the cases where we can
>>> vectorize the epilogue using smaller vectors?  Sorry if you've already
>>> answered this somewhere or it's a dumb question.
>>
>> IIUC this may happen only if we unroll epilogue into a single BB which happens
>> only when epilogue iterations count is known.  Right?
>>
>>>
>>>
>>>
>>>>
>>>> +      /* Add new loop to a processing queue.  To make it easier
>>>> +         to match loop and its epilogue vectorization in dumps
>>>> +         put new loop as the next loop to process.  */
>>>> +      if (new_loop)
>>>> +        {
>>>> +          loops.safe_insert (i + 1, new_loop->num);
>>>> +          vect_loops_num = number_of_loops (cfun);
>>>> +        }
>>>> +
>>>
>>> So just to be clear, the only reason to do this is for dumps -- other than
>>> processing the loop before it's epilogue, there's no other inherently
>>> necessary ordering of the loops, right?
>>
>> Right, I don't see other reasons to do it.
>>
>> Thanks,
>> Ilya
>>
>>>
>>>
>>> Jeff
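
For readers following the queue discussion quoted above, here is a minimal standalone sketch of the ordering effect: a newly created epilogue is inserted at position i + 1 so it is processed (and dumped) right after its loop. This is not the GCC code. `vectorize_one`, `process_loops` and the rule for which loops get an epilogue are hypothetical, and `std::vector::insert` stands in for GCC's `vec::safe_insert`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the transform step: returns the number of a
// newly created epilogue loop, or std::nullopt if no epilogue was created.
// Purely for illustration, loops 1 and 2 each produce one epilogue.
static std::optional<int> vectorize_one(int loop_num, int &next_free_num) {
  if (loop_num == 1 || loop_num == 2)
    return next_free_num++;
  return std::nullopt;
}

// Walks a queue of loop numbers.  When processing a loop creates an
// epilogue, the epilogue is inserted directly after the current position,
// so it becomes the next loop to be processed.  Returns the visit order.
std::vector<int> process_loops(std::vector<int> queue, int next_free_num) {
  std::vector<int> order;
  for (std::size_t i = 0; i < queue.size(); ++i) {
    order.push_back(queue[i]);
    if (auto epilogue = vectorize_one(queue[i], next_free_num)) {
      // Assumption: std::vector::insert plays the role of safe_insert.
      queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(i + 1),
                   *epilogue);
    }
  }
  return order;
}
```

Calling `process_loops({1, 2, 3}, 4)` visits each epilogue immediately after the loop that produced it, which is the only ordering property the thread says is wanted here (for matching loops and epilogues in dumps).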