-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Saturday, January 16, 2016 4:33 AM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa 
representation

On 01/14/2016 01:55 AM, Jeff Law wrote:
[ Replying to myself again, mostly to make sure we've got these thoughts in the 
archives. ]
>
> Anyway, going back to adpcm_decode, we do end up splitting this path:
>
>   # vpdiff_12 = PHI <vpdiff_11(12), vpdiff_50(13)>
>    if (sign_41 != 0)
>      goto <bb 15>;
>    else
>      goto <bb 16>;
> ;;    succ:       15
> ;;                16
>
> ;;   basic block 15, loop depth 1
> ;;    pred:       14
>    valpred_51 = valpred_76 - vpdiff_12;
>    goto <bb 17>;
> ;;    succ:       17
>
> ;;   basic block 16, loop depth 1
> ;;    pred:       14
>    valpred_52 = vpdiff_12 + valpred_76;
> ;;    succ:       17
>
> ;;   basic block 17, loop depth 1
> ;;    pred:       15
> ;;                16
>    # valpred_7 = PHI <valpred_51(15), valpred_52(16)>
>    _85 = MAX_EXPR <valpred_7, -32768>;
>    valpred_13 = MIN_EXPR <_85, 32767>;
>    step_53 = stepsizeTable[index_62];
>    outp_54 = outp_69 + 2;
>    _55 = (short int) valpred_13;
>    MEM[base: outp_54, offset: -2B] = _55;
>    if (outp_54 != _74)
>      goto <bb 20>;
>    else
>      goto <bb 18>;
>
> This doesn't result in anything particularly interesting/good AFAICT. 
> We propagate valpred_51/52 into the use in the MAX_EXPR in the 
> duplicate paths, but that doesn't allow any further simplification.
>>So with the heuristic I'm poking at, this gets rejected.  Essentially it 
>>doesn't think it's likely to expose CSE/DCE opportunities (and it's correct). 
>> The number of statements in predecessor >>blocks that feed operands in the 
>>to-be-copied-block is too small relative to the size of the 
>>to-be-copied-block.


>
> Ajit, can you confirm which of adpcm_code or adpcm_decode where path 
> splitting is showing a gain?  I suspect it's the former but would like 
> to make sure so that I can adjust the heuristics properly.
>>I'd still like to have this answered when you can Ajit, just to be 100%
 >> that it's the path splitting in adpcm_code that's responsible for the 
 >> improvements you're seeing in adpcm.

The adpcm_coder get optimized with path splitting whereas the adpcm_decoder is 
not optimized further with path splitting. In adpcm_decoder
the join node is duplicated into its predecessors and with the duplication of 
join node the code is not optimized further.

In adpcm_coder with path splitting the following optimization is triggered with 
path splitting.

1. /* Output last step, if needed */
    if ( !bufferstep )
      *outp++ = outputbuffer;

     IF-THEN inside the loop will be triggered with bufferstep is 1.  Then the 
flip happens and bufferstep is 0. For the exit branch if the bufferstep
Is 1 the flip convert it to 0  and above IF-THEN generate store to assign 
outputbuffer to outp.

The above sequence is optimized with path splitting, if the bufferstep is 1 
then exit branch of the loop branches to the above store. This does not require 
the flip of
bufferstep using xor with immediate 1. With this optimization there is one 
level of exit branch for the bufferstep 1 path. This lead to scheduling the
exit branch to the store with a meaningful instruction instead of xor with 
immediate 1.

Without Path Splitting if the bufferstep is 1  the exit branch of the loop 
branches to piece of branch flipping it to zero and the above IF-THEN outside 
the
loop does the store to assign outputbuffer to outp. Thus without path splitting 
there is two level of branch in the case of exit branch in the path where 
bufferstep is 1 inside the loop generating non optimized. Also without path 
splitting the two level of exit branch of the loop is scheduled with xor 
immediate with 1.

 Thanks & Regards
Ajit

jeff

Reply via email to