-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Friday, July 10, 2015 4:04 AM
To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE
On 06/02/2015 10:43 PM, Ajit Kumar Agarwal wrote:
>
> -----Original Message-----
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, June 02, 2015 9:19 PM
> To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
> Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
> Subject: Re: [RFC] Design and Implementation for Path Splitting for
> Loop with Conditional IF-THEN-ELSE
>
> On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:
>>>> I don't offhand know if any of the benchmarks you cite above are
>>>> free-enough to derive a testcase from.  But one trick many of us use
>>>> is to instrument the pass and compile some known free software (often
>>>> gcc itself) to find triggering code and use that to generate tests
>>>> for the new transformation.
>>
>> I will add tests to the suite.  I could see that many existing tests in
>> the suite also get triggered by this optimization.
>>> Thanks.  For cases in the existing testsuite where you need to change
>>> the expected output, it's useful to note why the expected output was
>>> changed.  Sometimes a test is compromised by a new optimization,
>>> sometimes the expected output is changed and is papering over a
>>> problem, etc., so it's something we look at reasonably closely.
>
> Thanks.  I will modify accordingly.
>
>>> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
>>> index 9faa339..559ca96 100644
>>> --- a/gcc/cfghooks.c
>>> +++ b/gcc/cfghooks.c
>>> @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)
>>>
>>>    /* If we remove the header or the latch of a loop, mark the loop
>>>       for removal.  */
>>> -  if (loop->latch == bb
>>> +  if (loop && loop->latch == bb
>>>        || loop->header == bb)
>>>      mark_loop_for_removal (loop);
>>>> So what caused you to add this additional test?  In general loop
>>>> structures are supposed to always be available.  The change here
>>>> implies that the loop structures were gone at some point.  That seems
>>>> at first glance a mistake.
>>
>> I was using gimple_duplicate_bb, which does not add the duplicate basic
>> block to current_loops; that is why the above condition was required.
>> I am now using duplicate_block instead of gimple_duplicate_bb.  With
>> this change the above check on loop is not required, because
>> duplicate_block adds the duplicate basic block to the loop structures.
>>> OK.  Good to hear it's not required anymore.
>
>>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>>> index aed5254..b25e409 100644
>>> --- a/gcc/tree-cfg.c
>>> +++ b/gcc/tree-cfg.c
>>> @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
>>>      }
>>>  }
>>>
>>> +void
>>> +gimple_threaded_merge_blocks (basic_block a, basic_block b)
>>>> If we keep this function it will need a block comment.
>>>>
>>>> I say "if" for a couple of reasons.  First, we already have support
>>>> routines that know how to merge blocks.  If you really need to merge
>>>> blocks you should try to use them.
>>>>
>>>> Second, I'm not sure that you really need to worry about block merging
>>>> in this pass.  Just create the duplicates, wire them into the CFG and
>>>> let the existing block merging support handle this problem.
>>
>> The above routine is not merging; it duplicates the join node into its
>> predecessors.  If I change the name of the above function to
>> gimple_threaded_duplicating_join_node it should be fine.
>>> But you don't need to duplicate into the predecessors.  If you create
>>> the duplicates and wire them into the CFG properly, the existing code
>>> in cfgcleanup should take care of this for you.
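
To make the join-node duplication concrete, here is a small, hedged C sketch
(invented purely for illustration; it is not taken from the patch or from any
benchmark) of what duplicating the join block of an if-then-else into its two
predecessors means at the source level:

/* Illustrative sketch only -- invented example, not from the patch.
   Before path splitting: both arms of the if-then-else fall into a
   common join block at the bottom of the loop body.  */
void
before (int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      int t;
      if (a[i] < b[i])
        t = a[i];
      else
        t = b[i];
      /* Join block shared by both arms.  */
      a[i] = t + 1;
    }
}

/* After duplicating the join block into each predecessor, each arm ends
   with its own copy of the join statements, and the loop latch becomes
   the new join point.  */
void
after (int *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      if (a[i] < b[i])
        {
          int t = a[i];
          a[i] = t + 1;   /* copy of the join block in the THEN arm */
        }
      else
        {
          int t = b[i];
          a[i] = t + 1;   /* copy of the join block in the ELSE arm */
        }
    }
}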
> Certainly I will do it.
>
>>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>>> index 4303a18..2c7d36d 100644
>>> --- a/gcc/tree-ssa-threadedge.c
>>> +++ b/gcc/tree-ssa-threadedge.c
>>> @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
>>>     return 0;
>>>   }
>>>
>>> +static void
>>> +replace_threaded_uses (basic_block a, basic_block b)
>>>> If you keep this function, then it'll need a function comment.
>>>>
>>>> It looks like this is just doing const/copy propagation.  I think a
>>>> better structure is to implement your optimization as a distinct pass,
>>>> then rely on existing passes such as update_ssa, DOM, CCP to handle
>>>> updating the SSA graph and propagation opportunities exposed by your
>>>> transformation.
>>>>
>>>> Similarly for the other replace_ functions.
>>
>> I think these replace_ functions are required, because the existing DOM,
>> CCP and other propagation passes do not perform the propagation shown
>> below.
>>
>> <bb 29>:
>> xk_124 = MIN_EXPR <xy_123, xc_121>;
>> xc_126 = xc_121 - xk_6;
>> xm_127 = xm_122 - xk_6;
>> xy_128 = xy_123 - xk_6;
>> *EritePtr_14 = xc_126;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
>> EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
>> i_137 = i_4 + 1;
>> goto <bb 31>;
>>
>> <bb 30>:
>> xk_125 = MIN_EXPR <xy_123, xm_122>;
>> xc_165 = xc_121 - xk_6;
>> xm_166 = xm_122 - xk_6;
>> xy_167 = xy_123 - xk_6;
>> *EritePtr_14 = xc_126;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
>> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
>> i_173 = i_4 + 1;
>>
>> <bb 29> and <bb 30> are the predecessors of the join node.  The above is
>> the GIMPLE representation after duplicating the join node into the
>> predecessors <bb 29> and <bb 30>.
>>
>> But <bb 30> should look as follows after the transformation:
>>
>> <bb 30>:
>> xk_125 = MIN_EXPR <xy_123, xm_122>;
>> xc_165 = xc_121 - xk_125;
>> xm_166 = xm_122 - xk_125;
>> xy_167 = xy_123 - xk_125;
>> *EritePtr_14 = xc_165;
>> MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
>> MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
>> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
>> MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
>> i_173 = i_4 + 1;
>>
>> The phi-only cprop pass will replace xk_6 with xk_125, but the other
>> replacements -- xc_126, xm_127, xy_128 with xc_165, xm_166, xy_167 --
>> will not be done by an existing routine like phi-only cprop, because
>> this is not plain copy propagation: the duplicated block defines new
>> results, while the copied uses still refer to the old names from the
>> join node.
>>
>> The replace_ functions perform the above transformation, and I think
>> they are required.
>>> Can you send me the testcase this came from so that I can compile and
>>> examine the dumps myself and possibly poke at things with the debugger?
>>> I don't have enough context to know what's missing.
>>>
>>> If indeed there's a form of propagation missing, we should enhance the
>>> propagation routines to handle the missing cases rather than write new
>>> ones inside a pass.
>
> Please find the testcase given below.
>> Thanks.  But I don't offhand see how that testcase matches up with the
>> code you provided.  If I use your patch from May on x86_64, compiling
>> with -O2, I have something like 9 basic blocks where your GIMPLE hunks
>> above show ~30.

Thanks.
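
For readers without the testcase in hand: the loop below is a purely
hypothetical C sketch -- it is NOT the testcase referred to above, which is
not reproduced in this message -- whose shape roughly matches the GIMPLE dump:
each arm of the if-then-else computes a MIN_EXPR, and the shared join block
subtracts the minimum from each component and stores four bytes before
advancing the pointer.

/* Hypothetical illustration only -- not the testcase referred to above.
   A loop of roughly this shape would produce GIMPLE like the dump.  */
typedef unsigned char Byte;

void
convert (const int *in, Byte *WritePtr, int n)
{
  for (int i = 0; i < n; i++)
    {
      int xc = in[3 * i];
      int xm = in[3 * i + 1];
      int xy = in[3 * i + 2];
      int xk;

      if (xc < xm)
        xk = xy < xc ? xy : xc;        /* xk = MIN_EXPR <xy, xc> */
      else
        xk = xy < xm ? xy : xm;        /* xk = MIN_EXPR <xy, xm> */

      /* Join block: the candidate that path splitting duplicates
         into both arms of the if-then-else.  */
      WritePtr[0] = xc - xk;
      WritePtr[1] = xm - xk;
      WritePtr[2] = xy - xk;
      WritePtr[3] = xk;
      WritePtr += 4;
    }
}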
The GIMPLE above with ~30 basic blocks is from the application compiled with
the changes.  The testcase I sent is a subset of the application with fewer
basic blocks.  Sorry for the inconvenience caused.

>> I'm having to read between the lines a lot here, but I think the missing
>> propagations are really an artifact of not calling the SSA updater and
>> instead doing a by-hand update inside replace_threaded_uses_block.
>>
>> When you duplicate blocks, you duplicate side effects.  I.e., you will
>> have one SSA_NAME that is set multiple times.  Thankfully the block
>> duplication code does the bookkeeping necessary so that if you tell the
>> pass manager that you need an incremental SSA update, the right things
>> will "just happen".

Yes, I agree.

>> So again, I like what you're trying to do, but I think that implementing
>> SSA updates, const/copy propagation, block merging, etc. within your
>> code is the wrong thing to do.
>>
>> Instead make your code an independent pass -- which would include
>> duplicating blocks and wiring up the new edges in the CFG and possibly
>> deleting some edges in the CFG.  Make sure the new pass returns
>> TODO_cleanup_cfg and TODO_update_ssa, which should handle the block
>> merging and an incremental update of the SSA graph.

Thanks.  In the updated patch I sent a week back, I have made path splitting
an independent pass and made the changes described above.

>> I believe you'll want a phi-only cprop pass run after your path splitting
>> which will take care of the const/copy propagation exposed by path
>> splitting.

In the updated patch, I have placed the path splitting optimization pass well
before the cprop pass.

>> I'll try to look at the updated patch shortly.

Thanks! I am looking forward to it.

Thanks & Regards
Ajit

Jeff
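
For reference, a minimal, hedged sketch of how an independent GIMPLE pass is
typically wired up in GCC sources of that period (the GCC 5/6 timeframe),
with TODO_cleanup_cfg and TODO_update_ssa requested in todo_flags_finish as
suggested above.  The pass name "path-split", the class name, the include
list and the empty execute body are placeholders, not the contents of the
actual patch.

/* Hedged sketch only -- not the actual patch.  The include preamble is
   abbreviated; a real pass file needs GCC's full standard include list.  */
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree-pass.h"
#include "context.h"

namespace {

const pass_data pass_data_path_split =
{
  GIMPLE_PASS,                          /* type */
  "path-split",                         /* name (dump file suffix) */
  OPTGROUP_NONE,                        /* optinfo_flags */
  TV_NONE,                              /* tv_id */
  PROP_ssa | PROP_cfg,                  /* properties_required */
  0,                                    /* properties_provided */
  0,                                    /* properties_destroyed */
  0,                                    /* todo_flags_start */
  TODO_cleanup_cfg | TODO_update_ssa,   /* todo_flags_finish: block merging
                                           plus incremental SSA update run
                                           by the pass manager afterwards */
};

class pass_path_split : public gimple_opt_pass
{
public:
  pass_path_split (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_path_split, ctxt)
  {}

  virtual unsigned int
  execute (function *fun)
  {
    /* Placeholder: duplicate qualifying join blocks into their
       predecessors and rewire the CFG edges here; the TODOs above let
       the pass manager handle SSA update and block merging.  */
    (void) fun;
    return 0;
  }
};

} // anonymous namespace

gimple_opt_pass *
make_pass_path_split (gcc::context *ctxt)
{
  return new pass_path_split (ctxt);
}

A matching NEXT_PASS entry in passes.def would then place the pass in the
pipeline, e.g. ahead of the phi-only cprop pass as discussed above.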