Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE

Jeff Law Thu, 09 Jul 2015 15:34:54 -0700

On 06/02/2015 10:43 PM, Ajit Kumar Agarwal wrote:



-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Tuesday, June 02, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with 
Conditional IF-THEN-ELSE

On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:

I don't offhand know if any of the benchmarks you cite above are
free-enough to derive a testcase from.  But one trick many of us use
is to instrument the >>pass and compile some known free software
(often gcc
itself) to find triggering code and use that to generate tests for the new 
transformation.


I will add tests in the suite. I could see many existing tests in the suite 
also get triggered with this optimization.

Thanks.  For cases in the existing testsuite where you need to change the expected output, 
it's useful to note why the expected output was changed.  >>Sometimes a test is 
compromised by a new optimization, sometimes the expected output is changed and is papering 
over a problem, etc so it's something >>we look at reasonably closely.


Thanks. I will modify accordingly.


diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c index 9faa339..559ca96
100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)

          /* If we remove the header or the latch of a loop, mark the loop for
         removal.  */
-      if (loop->latch == bb
+      if (loop && loop->latch == bb
          || loop->header == bb)
        mark_loop_for_removal (loop);

So what caused you to add this additional test?  In general loop
structures are supposed to always be available.  The change here
implies that the loop structures were gone at some point.  That
seems at first glance a mistake.


I was using the gimple_duplicate_bb which will not add the duplicate
basic block inside the current_loops. That's why the above Condition
is required. I am using duplicate_block instead of gimple_duplicate_bb. With 
this change the above check with loop Is not required as it adds the duplicate 
basic block inside the loops.

OK.  Good to hear it's not required anymore.


diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index aed5254..b25e409
100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
        }
    }

+void
+gimple_threaded_merge_blocks (basic_block a, basic_block b)

If we keep this function it will need a block comment.

I say "if" for a couple reasons.  First we already have support
routines that know how to merge blocks.  If you really need to merge
blocks you should try to use them.

Second, I'm not sure that you really need to worry about block
merging in this pass.  Just create the duplicates, wire them into
the CFG and let the existing block merging support handle this problem.


The above routine is not merging but duplicates the join nodes into
its predecessors. If I change the name of the above Function to the 
gimple_threaded_duplicating_join_node it should be fine.

But you don't need to duplicate into the predecessors.  If you create the duplicates 
and wire them into the CFG properly the existing code in cfgcleanup >>should 
take care of this for you.


Certainly I will do it.

diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 4303a18..2c7d36d 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
      return 0;
    }

+static void
+replace_threaded_uses (basic_block a,basic_block b)

If you keep this function, then it'll need a function comment.

It looks like this is just doing const/copy propagation.  I think a
better structure is to implement your optimization as a distinct pass,
then rely on existing passes such as update_ssa, DOM, CCP to handle
updating the SSA graph and propagation opportunities exposed by your
transformation.

Similarly for the other replace_ functions.


I think these replace_ functions are required as existing Dom, CCP and 
propagation opportunities doesn't transform these
Propagation given below.

<bb 29>:
xk_124 = MIN_EXPR <xy_123, xc_121>;
xc_126 = xc_121 - xk_6;
xm_127 = xm_122 - xk_6;
xy_128 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_137 = i_4 + 1;
goto <bb 31>;

<bb 30>:
xk_125 = MIN_EXPR <xy_123, xm_122>;
xc_165 = xc_121 - xk_6;
xm_166 = xm_122 - xk_6;
xy_167 = xy_123 - xk_6;
*EritePtr_14 = xc_126;
MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
i_173 = i_4 + 1;

<bb 29> and <bb 30 > are the predecessors of the join node. After the 
duplication of join node and the duplicating the
Join node into the predecessors < bb 29> and <bb 30>, the above is the gimple 
representations.
}

But the < bb 30 > should be the following after the transformation.

<bb 30>:
xk_125 = MIN_EXPR <xy_123, xm_122>;
xc_165 = xc_121 - xk_125;
xm_166 = xm_122 - xk_125;
xy_167 = xy_123 - xk_125;
*EritePtr_14 = xc_165;
MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
i_173 = i_4 + 1;

The phi-only cprop will  replace the x_6 with xk_125 but the other replacements 
like xk_126, xk_127, xk_128 with
Xk_165, xk_166, xk_167 will not be done by the existing routine like 
phi-only-cprop as this not only copy propagation because the duplicate block 
defines
New results but the copy is still old with the join node.

The replace function does the above transformation and I think it is required.

Can you send me the testcase this came from so that I can compile and
examine the dumps myself and possibly poke at things with the debugger?

   >>I don't have enough context to know what's missing.

If indeed there's a form of propagation missing, we should enhance the
propagation routines to handle the missing cases rather than write new
ones inside a pass.


Please find the testcase given below.

Thanks. But I don't offhand see how that testcase matches up with thecode you provided. If I use your patch from May on x86_64 compilingwith -O2, I have something like 9 basic blocks where your gimple hunksabove show ~30.

I've having to read between the lines a lot here, but I think themissing propagations are really an artifact of not calling the SSAupdater and instead doing a by-hand update insidereplace_threaded_uses_block.

When you duplicate blocks, you duplicate side effects. ie, you willhave one SSA_NAME that is set multiple times. Thankfully the blockduplication code does the bookkeeping necessary so that if you tell thepass manager that you need an incremental ssa update, the right thingswill "just happen".

So again, I like what you're trying to do, but I think that implementingSSA updates, const/copy propagation, block merging, etc within your codeis the wrong thing to do.

Instead make your code an independent pass -- which would includeduplicating blocks and wiring up the new edges in the CFG and possiblydeleting some edges in the CFG. Make sure the new pass returnsTODO_cleanup_cfg and TODO_update_ssa which should handle the blockmerging and an incremental update of the SSA graph.

I believe you'll want a phi-only cprop pass run after your pathsplitting which will take care of the const/copy propagation exposed bypath splitting.


I'll try to look at the updated patch shortly.

Jeff

Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE

Reply via email to