-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Tuesday, June 02, 2015 9:19 PM
To: Ajit Kumar Agarwal; Richard Biener; gcc@gcc.gnu.org
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [RFC] Design and Implementation for Path Splitting for Loop with 
Conditional IF-THEN-ELSE

On 06/02/2015 12:35 AM, Ajit Kumar Agarwal wrote:
>>> I don't offhand know if any of the benchmarks you cite above are 
>>> free-enough to derive a testcase from.  But one trick many of us use 
>>> is to instrument the >>pass and compile some known free software 
>>> (often gcc
>>> itself) to find triggering code and use that to generate tests for the new 
>>> transformation.
>
> I will add tests in the suite. I could see many existing tests in the suite 
> also get triggered with this optimization.
>>Thanks.  For cases in the existing testsuite where you need to change the 
>>expected output, it's useful to note why the expected output was changed.  
>>>>Sometimes a test is compromised by a new optimization, sometimes the 
>>expected output is changed and is papering over a problem, etc so it's 
>>something >>we look at reasonably closely.

Thanks. I will modify accordingly.

>
>>
>> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c index 9faa339..559ca96 
>> 100644
>> --- a/gcc/cfghooks.c
>> +++ b/gcc/cfghooks.c
>> @@ -581,7 +581,7 @@ delete_basic_block (basic_block bb)
>>
>>          /* If we remove the header or the latch of a loop, mark the loop for
>>       removal.  */
>> -      if (loop->latch == bb
>> +      if (loop && loop->latch == bb
>>        || loop->header == bb)
>>      mark_loop_for_removal (loop);
>>> So what caused you to add this additional test?  In general loop 
>>> structures are supposed to always be available.  The change here 
>>> implies that the loop structures were gone at some point.  That 
>>> seems at first glance a mistake.
>
> I was using the gimple_duplicate_bb which will not add the duplicate 
> basic block inside the current_loops. That's why the above Condition 
> is required. I am using duplicate_block instead of gimple_duplicate_bb. With 
> this change the above check with loop Is not required as it adds the 
> duplicate basic block inside the loops.
>>OK.  Good to hear it's not required anymore.


>
>>
>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index aed5254..b25e409 
>> 100644
>> --- a/gcc/tree-cfg.c
>> +++ b/gcc/tree-cfg.c
>> @@ -1838,6 +1838,64 @@ replace_uses_by (tree name, tree val)
>>        }
>>    }
>>
>> +void
>> +gimple_threaded_merge_blocks (basic_block a, basic_block b)
> If we keep this function it will need a block comment.
>
>>> I say "if" for a couple reasons.  First we already have support 
>>> routines that know how to merge blocks.  If you really need to merge 
>>> blocks you should try to use them.
>
>>> Second, I'm not sure that you really need to worry about block 
>>> merging in this pass.  Just create the duplicates, wire them into 
>>> the CFG and let the existing block merging support handle this problem.
>
> The above routine is not merging but duplicates the join nodes into 
> its predecessors. If I change the name of the above Function to the 
> gimple_threaded_duplicating_join_node it should be fine.
>>But you don't need to duplicate into the predecessors.  If you create the 
>>duplicates and wire them into the CFG properly the existing code in 
>>cfgcleanup >>should take care of this for you.

Certainly I will do it.
>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 4303a18..2c7d36d 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -1359,6 +1359,322 @@ thread_through_normal_block (edge e,
>>      return 0;
>>    }
>>
>> +static void
>> +replace_threaded_uses (basic_block a,basic_block b)
>>> If you keep this function, then it'll need a function comment.
>
>>> It looks like this is just doing const/copy propagation.  I think a
>>> better structure is to implement your optimization as a distinct pass,
>>> then rely on existing passes such as update_ssa, DOM, CCP to handle
>>> updating the SSA graph and propagation opportunities exposed by your
>>> transformation.
>
>
>>> Similarly for the other replace_ functions.
>
> I think these replace_ functions are required as existing Dom, CCP and 
> propagation opportunities doesn't transform these
> Propagation given below.
>
> <bb 29>:
> xk_124 = MIN_EXPR <xy_123, xc_121>;
> xc_126 = xc_121 - xk_6;
> xm_127 = xm_122 - xk_6;
> xy_128 = xy_123 - xk_6;
> *EritePtr_14 = xc_126;
> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
> EritePtr_135 = &MEM[(void *)EritePtr_14 + 4B];
> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
> i_137 = i_4 + 1;
> goto <bb 31>;
>
> <bb 30>:
> xk_125 = MIN_EXPR <xy_123, xm_122>;
> xc_165 = xc_121 - xk_6;
> xm_166 = xm_122 - xk_6;
> xy_167 = xy_123 - xk_6;
> *EritePtr_14 = xc_126;
> MEM[(Byte *)EritePtr_14 + 1B] = xm_127;
> MEM[(Byte *)EritePtr_14 + 2B] = xy_128;
> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
> MEM[(Byte *)EritePtr_14 + 3B] = xk_6;
> i_173 = i_4 + 1;
>
> <bb 29> and <bb 30 > are the predecessors of the join node. After the 
> duplication of join node and the duplicating the
> Join node into the predecessors < bb 29> and <bb 30>, the above is the gimple 
> representations.
> }
>
> But the < bb 30 > should be the following after the transformation.
>
> <bb 30>:
> xk_125 = MIN_EXPR <xy_123, xm_122>;
> xc_165 = xc_121 - xk_125;
> xm_166 = xm_122 - xk_125;
> xy_167 = xy_123 - xk_125;
> *EritePtr_14 = xc_165;
> MEM[(Byte *)EritePtr_14 + 1B] = xm_166;
> MEM[(Byte *)EritePtr_14 + 2B] = xy_167;
> EritePtr_171 = &MEM[(void *)EritePtr_14 + 4B];
> MEM[(Byte *)EritePtr_14 + 3B] = xk_125;
> i_173 = i_4 + 1;
>
> The phi-only cprop will  replace the x_6 with xk_125 but the other 
> replacements like xk_126, xk_127, xk_128 with
> Xk_165, xk_166, xk_167 will not be done by the existing routine like 
> phi-only-cprop as this not only copy propagation because the duplicate block 
> defines
> New results but the copy is still old with the join node.
>
> The replace function does the above transformation and I think it is required.
>>Can you send me the testcase this came from so that I can compile and 
>>examine the dumps myself and possibly poke at things with the debugger? 
  >>I don't have enough context to know what's missing.

>>If indeed there's a form of propagation missing, we should enhance the 
>>propagation routines to handle the missing cases rather than write new 
>>ones inside a pass.

Please find the testcase given below. 

#include <stdio.h>
#define RGBMAX 255
int test()
{
  int           i, Pels;
  unsigned char            xr, xg, xb;
  unsigned char            xc, xm, xy, xk;
  unsigned char            *ReadPtr, *EritePtr;

  for (i = 0; i < 100; i++) {

    xr = *ReadPtr++;
    xg = *ReadPtr++;
    xb = *ReadPtr++;

    xc = (unsigned char) (RGBMAX - xr);
    xm = (unsigned char) (RGBMAX - xg);
    xy = (unsigned char) (RGBMAX - xb);

    if (xc < xm) {
      xk = (unsigned char) (xc < xy ? xc : xy);
    } else {
      xk = (unsigned char) (xm < xy ? xm : xy);
    }

    xc = (unsigned char) (xc - xk);
    xm = (unsigned char) (xm - xk);
    xy = (unsigned char) (xy - xk);

    *EritePtr++ = xc;
    *EritePtr++ = xm;
    *EritePtr++ = xy;
    *EritePtr++ = xk;

  }
  printf(" In tests \n");
  return 0;
}

int main(){

  printf("%d \n",test());
  return 0;
}

Thanks & Regards
Ajit

jeff

Reply via email to