-----Original Message----- From: Jeff Law [mailto:l...@redhat.com] Sent: Tuesday, June 02, 2015 9:19 PM To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE
On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote: >>> I don't offhand know if any of the benchmarks you cite above are >>> free-enough to derive a testcase from. But one trick many of us use >>> is to instrument the >>pass and compile some known free software >>> (often gcc >>> itself) to find triggering code and use that to generate tests for the new >>> transformation. > > I will add tests in the suite. I could see many existing tests in the suite > also get triggered with this optimization. >>Thanks. For cases in the existing testsuite where you need to change the >>expected output, it's useful to note why the expected output was changed. >>>>Sometimes a test is compromised by a new optimization, sometimes the >>expected output is changed and is papering over a problem, etc so it's >>something >>we look at reasonably closely. Thanks. I will modify accordingly. > >> >> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c index 9faa339..559ca96 >> 100644 >> --- a/gcc/cfghooks.c >> +++ b/gcc/cfghooks.c >> @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb) >> >> /* If we remove the header or the latch of a loop, mark the loop for >> removal. */ >> - if (loop->latch == bb >> + if (loop && loop->latch == bb >> || loop->header == bb) >> mark_loop_for_removal (loop); >>> So what caused you to add this additional test? In general loop >>> structures are supposed to always be available. The change here >>> implies that the loop structures were gone at some point. That >>> seems at first glance a mistake. > > I was using the gimple_duplicate_bb which will not add the duplicate > basic block inside the current_loops. That's why the above Condition > is required. I am using duplicate_block instead of gimple_duplicate_bb. With > this change the above check with loop Is not required as it adds the > duplicate basic block inside the loops. >>OK. Good to hear it's not required anymore. > >> >> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index aed5254..b25e409 >> 100644 >> --- a/gcc/tree-cfg.c >> +++ b/gcc/tree-cfg.c >> @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val) >> } >> } >> >> +void >> +gimple_threaded_merge_blocks (basic_block a, basic_block b) > If we keep this function it will need a block comment. > >>> I say "if" for a couple reasons. First we already have support >>> routines that know how to merge blocks. If you really need to merge >>> blocks you should try to use them. > >>> Second, I'm not sure that you really need to worry about block >>> merging in this pass. Just create the duplicates, wire them into >>> the CFG and let the existing block merging support handle this problem. > > The above routine is not merging but duplicates the join nodes into > its predecessors. If I change the name of the above Function to the > gimple_threaded_duplicating_join_node it should be fine. >>But you don't need to duplicate into the predecessors. If you create the >>duplicates and wire them into the CFG properly the existing code in >>cfgcleanup >>should take care of this for you. Certainly I will do it. > >> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c >> index 4303a18..2c7d36d 100644 >> --- a/gcc/tree-ssa-threadedge.c >> +++ b/gcc/tree-ssa-threadedge.c >> @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e, >> return 0; >> } >> >> +static void >> +replace_threaded_uses (basic_block a,basic_block b) >>> If you keep this function, then it'll need a function comment. > >>> It looks like this is just doing const/copy propagation. I think a >>> better structure is to implement your optimization as a distinct pass, >>> then rely on existing passes such as update_ssa, DOM, CCP to handle >>> updating the SSA graph and propagation opportunities exposed by your >>> transformation. > > >>> Similarly for the other replace_ functions. > > I think these replace_ functions are required as existing Dom, CCP and > propagation opportunities doesn't transform these > Propagation given below. > > <bb 29>: > xk_124 = MIN_EXPR <xy_123, xc_121>; > xc_126 = xc_121 - xk_6; > xm_127 = xm_122 - xk_6; > xy_128 = xy_123 - xk_6; > *EritePtr_14 = xc_126; > MEM[(Byte *)EritePtr_14 + 1B] = xm_127; > MEM[(Byte *)EritePtr_14 + 2B] = xy_128; > EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B]; > MEM[(Byte *)EritePtr_14 + 3B] = xk_6; > i_137 = i_4 + 1; > goto <bb 31>; > > <bb 30>: > xk_125 = MIN_EXPR <xy_123, xm_122>; > xc_165 = xc_121 - xk_6; > xm_166 = xm_122 - xk_6; > xy_167 = xy_123 - xk_6; > *EritePtr_14 = xc_126; > MEM[(Byte *)EritePtr_14 + 1B] = xm_127; > MEM[(Byte *)EritePtr_14 + 2B] = xy_128; > EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B]; > MEM[(Byte *)EritePtr_14 + 3B] = xk_6; > i_173 = i_4 + 1; > > <bb 29> and <bb 30 > are the predecessors of the join node. After the > duplication of join node and the duplicating the > Join node into the predecessors < bb 29> and <bb 30>, the above is the gimple > representations. > } > > But the < bb 30 > should be the following after the transformation. > > <bb 30>: > xk_125 = MIN_EXPR <xy_123, xm_122>; > xc_165 = xc_121 - xk_125; > xm_166 = xm_122 - xk_125; > xy_167 = xy_123 - xk_125; > *EritePtr_14 = xc_165; > MEM[(Byte *)EritePtr_14 + 1B] = xm_166; > MEM[(Byte *)EritePtr_14 + 2B] = xy_167; > EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B]; > MEM[(Byte *)EritePtr_14 + 3B] = xk_125; > i_173 = i_4 + 1; > > The phi-only cprop will replace the x_6 with xk_125 but the other > replacements like xk_126, xk_127, xk_128 with > Xk_165, xk_166, xk_167 will not be done by the existing routine like > phi-only-cprop as this not only copy propagation because the duplicate block > defines > New results but the copy is still old with the join node. > > The replace function does the above transformation and I think it is required. >>Can you send me the testcase this came from so that I can compile and >>examine the dumps myself and possibly poke at things with the debugger? >>I don't have enough context to know what's missing. >>If indeed there's a form of propagation missing, we should enhance the >>propagation routines to handle the missing cases rather than write new >>ones inside a pass. Please find the testcase given below. #include <stdio.h> #define RGBMAX 255 int test() { int i, Pels; unsigned char xr, xg, xb; unsigned char xc, xm, xy, xk; unsigned char *ReadPtr, *EritePtr; for (i = 0; i < 100; i++) { xr = *ReadPtr++; xg = *ReadPtr++; xb = *ReadPtr++; xc = (unsigned char) (RGBMAX - xr); xm = (unsigned char) (RGBMAX - xg); xy = (unsigned char) (RGBMAX - xb); if (xc < xm) { xk = (unsigned char) (xc < xy ? xc : xy); } else { xk = (unsigned char) (xm < xy ? xm : xy); } xc = (unsigned char) (xc - xk); xm = (unsigned char) (xm - xk); xy = (unsigned char) (xy - xk); *EritePtr++ = xc; *EritePtr++ = xm; *EritePtr++ = xy; *EritePtr++ = xk; } printf(" In tests \n"); return 0; } int main(){ printf("%d \n",test()); return 0; } Thanks & Regards Ajit jeff