-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, July 10, 2015 4:04 AM
To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE
On 06/02/2015 10:43 PM, Ajit Kumar Agarwal wrote:
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, June 02, 2015 9:19 PM
> To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC] Design and Implementation for Path Splitting for
> Loop with Conditional IF-THEN-ELSE
>
> On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:
>>>> I don't offhand know if any of the benchmarks you cite above are
>>>> free-enough to derive a testcase from.  But one trick many of us use
>>>> is to instrument the pass and compile some known free software (often
>>>> gcc itself) to find triggering code and use that to generate tests
>>>> for the new transformation.
>>
>> I will add tests to the suite.  I could see that many existing tests in
>> the suite also get triggered by this optimization.
>>> Thanks.  For cases in the existing testsuite where you need to change
>>> the expected output, it's useful to note why the expected output was
>>> changed.  Sometimes a test is compromised by a new optimization,
>>> sometimes the expected output is changed and is papering over a
>>> problem, etc., so it's something we look at reasonably closely.
>
> Thanks.  I will modify accordingly.
>
>>> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
>>> index 9faa339..559ca96 100644
>>> --- a/gcc/cfghooks.c
>>> +++ b/gcc/cfghooks.c
>>> @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)
>>>
>>>    /* If we remove the header or the latch of a loop, mark the loop
>>>       for removal.  */
>>> -  if (loop->latch == bb
>>> +  if (loop && loop->latch == bb
>>>        || loop->header == bb)
>>>      mark_loop_for_removal (loop);
>>>> So what caused you to add this additional test?  In general loop
>>>> structures are supposed to always be available.  The change here
>>>> implies that the loop structures were gone at some point.  That seems
>>>> at first glance a mistake.
>>
>> I was using gimple_duplicate_bb, which does not add the duplicate basic
>> block to current_loops; that is why the above condition was required.
>> I am now using duplicate_block instead of gimple_duplicate_bb.  With
>> this change the above check on loop is not required, because
>> duplicate_block adds the duplicate basic block to the loop structures.
>>> OK.  Good to hear it's not required anymore.
>
>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>> index aed5254..b25e409 100644
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
>>>      }
>>>  }
>>>
>>> +void
>>> +gimple_threaded_merge_blocks (basic_block a, basic_block b)
>>>> If we keep this function it will need a block comment.
>>>>
>>>> I say "if" for a couple of reasons.  First, we already have support
>>>> routines that know how to merge blocks.  If you really need to merge
>>>> blocks you should try to use them.
>>>>
>>>> Second, I'm not sure that you really need to worry about block merging
>>>> in this pass.  Just create the duplicates, wire them into the CFG and
>>>> let the existing block merging support handle this problem.
>>
>> The above routine is not merging; it duplicates the join node into its
>> predecessors.  If I change the name of the above function to
>> gimple_threaded_duplicating_join_node it should be fine.
>>> But you don't need to duplicate into the predecessors.  If you create
>>> the duplicates and wire them into the CFG properly, the existing code
>>> in cfgcleanup should take care of this for you.
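
To make the join-node duplication concrete, here is a small, hedged C sketch
(invented purely for illustration; it is not taken from the patch or from any
benchmark) of what duplicating the join block of an if-then-else into its two
predecessors means at the source level:

/* Illustrative sketch only -- invented example, not from the patch.
   Before path splitting: both arms of the if-then-else fall into a
   common join block at the bottom of the loop body.  */
void
before (int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      int t;
      if (a[i] < b[i])
        t = a[i];
      else
        t = b[i];
      /* Join block shared by both arms.  */
      a[i] = t + 1;
    }
}

/* After duplicating the join block into each predecessor, each arm ends
   with its own copy of the join statements, and the loop latch becomes
   the new join point.  */
void
after (int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (a[i] < b[i])
        {
          int t = a[i];
          a[i] = t + 1;   /* copy of the join block in the THEN arm */
        }
      else
        {
          int t = b[i];
          a[i] = t + 1;   /* copy of the join block in the ELSE arm */
        }
    }
}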
> Certainly I will do it.
>
>>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>>> index 4303a18..2c7d36d 100644
>>> --- a/gcc/tree-ssa-threadedge.c
>>> +++ b/gcc/tree-ssa-threadedge.c
>>> @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
>>>     return 0;
>>>   }
>>>
>>> +static void
>>> +replace_threaded_uses (basic_block a, basic_block b)
>>>> If you keep this function, then it'll need a function comment.
>>>>
>>>> It looks like this is just doing const/copy propagation.  I think a
>>>> better structure is to implement your optimization as a distinct pass,
>>>> then rely on existing passes such as update_ssa, DOM, CCP to handle
>>>> updating the SSA graph and propagation opportunities exposed by your
>>>> transformation.
>>>>
>>>> Similarly for the other replace_ functions.
>>
>> I think these replace_ functions are required, because the existing DOM,
>> CCP and other propagation passes do not perform the propagation shown
>> below.
>>
>> <bb 29>:
>> xk_124 = MIN_EXPR <xy_123, xc_121>;
>> xc_126 = xc_121 - xk_6;
>> xm_127 = xm_122 - xk_6;
>> xy_128 = xy_123 - xk_6;
>> *EritePtr_14 = xc_126;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
>> EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
>> i_137 = i_4 + 1;
>> goto <bb 31>;
>>
>> <bb 30>:
>> xk_125 = MIN_EXPR <xy_123, xm_122>;
>> xc_165 = xc_121 - xk_6;
>> xm_166 = xm_122 - xk_6;
>> xy_167 = xy_123 - xk_6;
>> *EritePtr_14 = xc_126;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
>> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
>> i_173 = i_4 + 1;
>>
>> <bb 29> and <bb 30> are the predecessors of the join node.  The above is
>> the GIMPLE representation after duplicating the join node into the
>> predecessors <bb 29> and <bb 30>.
>>
>> But <bb 30> should look as follows after the transformation:
>>
>> <bb 30>:
>> xk_125 = MIN_EXPR <xy_123, xm_122>;
>> xc_165 = xc_121 - xk_125;
>> xm_166 = xm_122 - xk_125;
>> xy_167 = xy_123 - xk_125;
>> *EritePtr_14 = xc_165;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
>> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
>> i_173 = i_4 + 1;
>>
>> The phi-only cprop pass will replace xk_6 with xk_125, but the other
>> replacements -- xc_126, xm_127, xy_128 with xc_165, xm_166, xy_167 --
>> will not be done by an existing routine like phi-only cprop, because
>> this is not plain copy propagation: the duplicated block defines new
>> results, while the copied uses still refer to the old names from the
>> join node.
>>
>> The replace_ functions perform the above transformation, and I think
>> they are required.
>>> Can you send me the testcase this came from so that I can compile and
>>> examine the dumps myself and possibly poke at things with the debugger?
>>> I don't have enough context to know what's missing.
>>>
>>> If indeed there's a form of propagation missing, we should enhance the
>>> propagation routines to handle the missing cases rather than write new
>>> ones inside a pass.
>
> Please find the testcase given below.
>> Thanks.  But I don't offhand see how that testcase matches up with the
>> code you provided.  If I use your patch from May on x86_64, compiling
>> with -O2, I have something like 9 basic blocks where your GIMPLE hunks
>> above show ~30.

Thanks.
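
For readers without the testcase in hand: the loop below is a purely
hypothetical C sketch -- it is NOT the testcase referred to above, which is
not reproduced in this message -- whose shape roughly matches the GIMPLE dump:
each arm of the if-then-else computes a MIN_EXPR, and the shared join block
subtracts the minimum from each component and stores four bytes before
advancing the pointer.

/* Hypothetical illustration only -- not the testcase referred to above.
   A loop of roughly this shape would produce GIMPLE like the dump.  */
typedef unsigned char Byte;

void
convert (const int *in, Byte *WritePtr, int n)
{
  for (int i = 0; i < n; i++)
    {
      int xc = in[3 * i];
      int xm = in[3 * i + 1];
      int xy = in[3 * i + 2];
      int xk;

      if (xc < xm)
        xk = xy < xc ? xy : xc;        /* xk = MIN_EXPR <xy, xc> */
      else
        xk = xy < xm ? xy : xm;        /* xk = MIN_EXPR <xy, xm> */

      /* Join block: the candidate that path splitting duplicates
         into both arms of the if-then-else.  */
      WritePtr[0] = xc - xk;
      WritePtr[1] = xm - xk;
      WritePtr[2] = xy - xk;
      WritePtr[3] = xk;
      WritePtr += 4;
    }
}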
The GIMPLE above with ~30 basic blocks is from the application compiled with
the changes.  The testcase I sent is a subset of the application with fewer
basic blocks.  Sorry for the inconvenience caused.

>> I'm having to read between the lines a lot here, but I think the missing
>> propagations are really an artifact of not calling the SSA updater and
>> instead doing a by-hand update inside replace_threaded_uses_block.
>>
>> When you duplicate blocks, you duplicate side effects.  I.e., you will
>> have one SSA_NAME that is set multiple times.  Thankfully the block
>> duplication code does the bookkeeping necessary so that if you tell the
>> pass manager that you need an incremental SSA update, the right things
>> will "just happen".

Yes, I agree.

>> So again, I like what you're trying to do, but I think that implementing
>> SSA updates, const/copy propagation, block merging, etc. within your
>> code is the wrong thing to do.
>>
>> Instead make your code an independent pass -- which would include
>> duplicating blocks and wiring up the new edges in the CFG and possibly
>> deleting some edges in the CFG.  Make sure the new pass returns
>> TODO_cleanup_cfg and TODO_update_ssa, which should handle the block
>> merging and an incremental update of the SSA graph.

Thanks.  In the updated patch I sent a week back, I have made path splitting
an independent pass and made the changes described above.

>> I believe you'll want a phi-only cprop pass run after your path splitting
>> which will take care of the const/copy propagation exposed by path
>> splitting.

In the updated patch, I have placed the path splitting optimization pass well
before the cprop pass.

>> I'll try to look at the updated patch shortly.

Thanks! I am looking forward to it.

Thanks & Regards
Ajit

Jeff
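
For reference, a minimal, hedged sketch of how an independent GIMPLE pass is
typically wired up in GCC sources of that period (the GCC 5/6 timeframe),
with TODO_cleanup_cfg and TODO_update_ssa requested in todo_flags_finish as
suggested above.  The pass name "path-split", the class name, the include
list and the empty execute body are placeholders, not the contents of the
actual patch.

/* Hedged sketch only -- not the actual patch.  The include preamble is
   abbreviated; a real pass file needs GCC's full standard include list.  */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree-pass.h"
#include "context.h"

namespace {

const pass_data pass_data_path_split =
{
  GIMPLE_PASS,                          /* type */
  "path-split",                         /* name (dump file suffix) */
  OPTGROUP_NONE,                        /* optinfo_flags */
  TV_NONE,                              /* tv_id */
  PROP_ssa | PROP_cfg,                  /* properties_required */
  0,                                    /* properties_provided */
  0,                                    /* properties_destroyed */
  0,                                    /* todo_flags_start */
  TODO_cleanup_cfg | TODO_update_ssa,   /* todo_flags_finish: block merging
                                           plus incremental SSA update run
                                           by the pass manager afterwards */
};

class pass_path_split : public gimple_opt_pass
{
public:
  pass_path_split (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_path_split, ctxt)
  {}

  virtual unsigned int
  execute (function *fun)
  {
    /* Placeholder: duplicate qualifying join blocks into their
       predecessors and rewire the CFG edges here; the TODOs above let
       the pass manager handle SSA update and block merging.  */
    (void) fun;
    return 0;
  }
};

} // anonymous namespace

gimple_opt_pass *
make_pass_path_split (gcc::context *ctxt)
{
  return new pass_path_split (ctxt);
}

A matching NEXT_PASS entry in passes.def would then place the pass in the
pipeline, e.g. ahead of the phi-only cprop pass as discussed above.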