Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

Jeff Law Wed, 13 Jan 2016 00:11:54 -0800

On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:


I am consistently getting gains for office_ispell, office_stringsearch1 and
telcom_adpcm_d. I ran it again today and we see gains in the same benchmark
tests with the split-path changes.

What functions are being affected that in turn impact performance?


For office_ispell, the functions are:
    linit (linit, funcdef_no=0, decl_uid=2535, cgraph_uid=0, symbol_order=2) in lookup.c
    checkfile (checkfile, funcdef_no=1, decl_uid=2478, cgraph_uid=1, symbol_order=4) in correct.c
    correct (correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5) in correct.c
    askmode (askmode, funcdef_no=24, decl_uid=2464, cgraph_uid=24, symbol_order=27) in correct.c

For office_stringsearch1, the function is bmhi_search (bmhi_search,
funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5) in bmhisrch.c.

So I can see split-paths affecting adpcm & lookup. I don't see it affecting correct.c or bmhisrch.c.

That's progress though. It's likely that one or more of the flags is critical, so thanks for passing those along.

I'm going to focus on adpcm for the moment, in particular adpcm_coder. It appears the key blocks are:



;;   basic block 14, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 13, next block 15, flags: (NEW, REACHABLE)
;;    pred:       12 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                13 [100.0%]  (FALLTHRU,EXECUTABLE)
  # valpred_12 = PHI <valpred_54(12), valpred_55(13)>
  _112 = MAX_EXPR <valpred_12, -32768>;
  valpred_18 = MIN_EXPR <_112, 32767>;
  delta_56 = delta_7 | iftmp.1_114;
  _57 = indexTable[delta_56];
  index_58 = _57 + index_107;
  _113 = MIN_EXPR <index_58, 88>;
  index_111 = MAX_EXPR <_113, 0>;
  step_59 = stepsizeTable[index_111];
  if (bufferstep_93 != 0)
    goto <bb 15>;
  else
    goto <bb 16>;
;;    succ:       15 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                16 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 15, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 14, next block 16, flags: (NEW, REACHABLE)
;;    pred:       14 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  _60 = delta_56 << 4;
  goto <bb 17>;
;;    succ:       17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 16, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 15, next block 17, flags: (NEW, REACHABLE)
;;    pred:       14 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  outp_62 = outp_83 + 1;
  _63 = (signed char) delta_56;
  _65 = (signed char) outputbuffer_90;
  _66 = _63 | _65;
  *outp_83 = _66;
;;    succ:       17 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 17, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 16, next block 18, flags: (NEW, REACHABLE)
;;    pred:       15 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                16 [100.0%]  (FALLTHRU,EXECUTABLE)
  # outp_3 = PHI <outp_83(15), outp_62(16)>
  # outputbuffer_21 = PHI <_60(15), outputbuffer_90(16)>
  _109 = bufferstep_93 ^ 1;
  _98 = _109 & 1;
  ivtmp.11_68 = ivtmp.11_105 + 2;
  if (ivtmp.11_68 != _116)
    goto <bb 4>;
  else
    goto <bb 18>;

Block #17 is the join point that we're going to effectively copy into blocks #15 and #16. Doing so in turn exposes bufferstep_93 as the constant 0 in block #16, which in turn allows elimination of a couple statements in the extended version of block #16, and we propagate the constant 1 for bufferstep_93 to the top of the loop when reached via block #16. So we save a few instructions. However, I think we're actually doing a fairly poor job here.
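To make the transformation concrete, here is a hypothetical, simplified C model of the tail of the adpcm_coder loop, before and after the join block is copied into its predecessors. It assumes deltas[] already holds the 4-bit codes computed earlier in the loop, that bufferstep starts at 1 (as in the reference adpcm.c), and it omits the valpred/index/step updates; the names pack_joined and pack_split are made up for illustration and are not GCC output.

```c
#include <stddef.h>

/* Hypothetical model: deltas[] stands in for the per-sample 4-bit codes.
   Before path splitting, both arms fall into a join block (like bb 17)
   that flips bufferstep without knowing which arm was taken. */
size_t pack_joined(const unsigned char *deltas, size_t n, unsigned char *out)
{
    unsigned char *outp = out;
    int outputbuffer = 0;
    int bufferstep = 1;                     /* assumed initial value */

    for (size_t i = 0; i < n; i++) {
        int delta = deltas[i] & 0x0f;
        if (bufferstep)                     /* like bb 15 */
            outputbuffer = (delta << 4) & 0xf0;
        else                                /* like bb 16 */
            *outp++ = (unsigned char)(delta | outputbuffer);
        bufferstep = !bufferstep;           /* join point, like bb 17 */
    }
    if (!bufferstep)                        /* flush a pending high nibble */
        *outp++ = (unsigned char)outputbuffer;
    return (size_t)(outp - out);
}

/* After path splitting: the join block is copied into each arm. Inside
   each copy the value of bufferstep is known, so !bufferstep folds to a
   constant: 0 on the bb 15 path, 1 on the bb 16 path. */
size_t pack_split(const unsigned char *deltas, size_t n, unsigned char *out)
{
    unsigned char *outp = out;
    int outputbuffer = 0;
    int bufferstep = 1;                     /* assumed initial value */

    for (size_t i = 0; i < n; i++) {
        int delta = deltas[i] & 0x0f;
        if (bufferstep) {
            outputbuffer = (delta << 4) & 0xf0;
            bufferstep = 0;                 /* was !bufferstep; bufferstep != 0 here */
        } else {
            *outp++ = (unsigned char)(delta | outputbuffer);
            bufferstep = 1;                 /* was !bufferstep; bufferstep == 0 here */
        }
    }
    if (!bufferstep)                        /* flush a pending high nibble */
        *outp++ = (unsigned char)outputbuffer;
    return (size_t)(outp - out);
}
```

Both versions write the same bytes; for deltas {1, 2, 3, 4, 5} each produces 0x12, 0x34, 0x50. The only difference is that the split version no longer computes bufferstep ^ 1 at run time.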

bufferstep is a great example of a flip-flop variable and its value is statically computable based on the path from the prior loop iteration which, if exploited, would allow the FSM threader to eliminate the conditional at the end of bb14. I'm going to have to play with that.
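As a rough illustration of where exploiting the flip-flop could lead, the hypothetical hand-unrolled loop below processes two samples per trip, so the test on bufferstep disappears entirely. This is only a sketch of the end state under stated assumptions (deltas[] holds precomputed 4-bit codes, bufferstep starts at 1, the valpred/index/step updates are omitted); it is not what the FSM threader literally emits, and pack_unrolled is a made-up name.

```c
#include <stddef.h>

/* Hypothetical sketch: with bufferstep starting at 1 and flipping every
   iteration, it is 1 on even iterations and 0 on odd ones. Pairing the
   iterations makes both values known, so no conditional remains in the
   loop body. */
size_t pack_unrolled(const unsigned char *deltas, size_t n, unsigned char *out)
{
    unsigned char *outp = out;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        int hi = deltas[i] & 0x0f;          /* bufferstep == 1 iteration */
        int lo = deltas[i + 1] & 0x0f;      /* bufferstep == 0 iteration */
        *outp++ = (unsigned char)((hi << 4) | lo);
    }
    /* Odd count: the original loop exits with bufferstep == 0, so its
       trailing flush writes the pending high nibble. */
    if (i < n)
        *outp++ = (unsigned char)((deltas[i] & 0x0f) << 4);
    return (size_t)(outp - out);
}
```

For deltas {1, 2, 3, 4, 5} this writes 0x12, 0x34, 0x50, the same bytes as the original loop shape would under these assumptions.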

Anyway, it's late and I want to rip this test apart a bit more and see how it interacts with the heuristic that I've cobbled together, as well as see what it would take to have DOM or VRP get data on bufferstep_93 on the true path out of BB14 after a path-split.


Jeff
