-----Original Message----- From: Jeff Law [mailto:l...@redhat.com] Sent: Saturday, January 16, 2016 4:33 AM To: Ajit Kumar Agarwal; Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation
On 01/14/2016 01:55 AM, Jeff Law wrote: [ Replying to myself again, mostly to make sure we've got these thoughts in the archives. ] > > Anyway, going back to adpcm_decode, we do end up splitting this path: > > # vpdiff_12 = PHI <vpdiff_11(12), vpdiff_50(13)> > if (sign_41 != 0) > goto <bb 15>; > else > goto <bb 16>; > ;; succ: 15 > ;; 16 > > ;; basic block 15, loop depth 1 > ;; pred: 14 > valpred_51 = valpred_76 - vpdiff_12; > goto <bb 17>; > ;; succ: 17 > > ;; basic block 16, loop depth 1 > ;; pred: 14 > valpred_52 = vpdiff_12 + valpred_76; > ;; succ: 17 > > ;; basic block 17, loop depth 1 > ;; pred: 15 > ;; 16 > # valpred_7 = PHI <valpred_51(15), valpred_52(16)> > _85 = MAX_EXPR <valpred_7, -32768>; > valpred_13 = MIN_EXPR <_85, 32767>; > step_53 = stepsizeTable[index_62]; > outp_54 = outp_69 + 2; > _55 = (short int) valpred_13; > MEM[base: outp_54, offset: -2B] = _55; > if (outp_54 != _74) > goto <bb 20>; > else > goto <bb 18>; > > This doesn't result in anything particularly interesting/good AFAICT. > We propagate valpred_51/52 into the use in the MAX_EXPR in the > duplicate paths, but that doesn't allow any further simplification. >>So with the heuristic I'm poking at, this gets rejected. Essentially it >>doesn't think it's likely to expose CSE/DCE opportunities (and it's correct). >> The number of statements in predecessor >>blocks that feed operands in the >>to-be-copied-block is too small relative to the size of the >>to-be-copied-block. > > Ajit, can you confirm which of adpcm_code or adpcm_decode where path > splitting is showing a gain? I suspect it's the former but would like > to make sure so that I can adjust the heuristics properly. >>I'd still like to have this answered when you can Ajit, just to be 100% >> that it's the path splitting in adpcm_code that's responsible for the >> improvements you're seeing in adpcm. The adpcm_coder get optimized with path splitting whereas the adpcm_decoder is not optimized further with path splitting. In adpcm_decoder the join node is duplicated into its predecessors and with the duplication of join node the code is not optimized further. In adpcm_coder with path splitting the following optimization is triggered with path splitting. 1. /* Output last step, if needed */ if ( !bufferstep ) *outp++ = outputbuffer; IF-THEN inside the loop will be triggered with bufferstep is 1. Then the flip happens and bufferstep is 0. For the exit branch if the bufferstep Is 1 the flip convert it to 0 and above IF-THEN generate store to assign outputbuffer to outp. The above sequence is optimized with path splitting, if the bufferstep is 1 then exit branch of the loop branches to the above store. This does not require the flip of bufferstep using xor with immediate 1. With this optimization there is one level of exit branch for the bufferstep 1 path. This lead to scheduling the exit branch to the store with a meaningful instruction instead of xor with immediate 1. Without Path Splitting if the bufferstep is 1 the exit branch of the loop branches to piece of branch flipping it to zero and the above IF-THEN outside the loop does the store to assign outputbuffer to outp. Thus without path splitting there is two level of branch in the case of exit branch in the path where bufferstep is 1 inside the loop generating non optimized. Also without path splitting the two level of exit branch of the loop is scheduled with xor immediate with 1. Thanks & Regards Ajit jeff