Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

Jeff Law Thu, 14 Jan 2016 00:56:23 -0800

On 01/13/2016 01:10 AM, Jeff Law wrote:


I'm going to focus on adpcm for the moment, in particular adpcm_coder.
It appears the key blocks are:

Looking at adpcm_decoder we have the same idiom as in adpcm_coder:

  if (bufferstep_79 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;
;;    succ:       6 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                7 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 6, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 5, next block 7, flags: (NEW, REACHABLE)
;;    pred:       5 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  delta_31 = inputbuffer_65 & 15;
  goto <bb 8>;
;;    succ:       8 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 7, loop depth 1, count 0, freq 4550, maybe hot
;;    prev block 6, next block 8, flags: (NEW, REACHABLE)
;;    pred:       5 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  inp_32 = inp_70 + 1;
  _33 = *inp_70;
  inputbuffer_34 = (int) _33;
  _35 = inputbuffer_34 >> 4;
  delta_36 = _35 & 15;
;;    succ:       8 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 8, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 7, next block 9, flags: (NEW, REACHABLE)
;;    pred:       6 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                7 [100.0%]  (FALLTHRU,EXECUTABLE)
  # inp_2 = PHI <inp_70(6), inp_32(7)>
  # delta_5 = PHI <delta_31(6), delta_36(7)>
  # inputbuffer_16 = PHI <inputbuffer_65(6), inputbuffer_34(7)>
  _83 = bufferstep_79 ^ 1;
  _1 = _83 & 1;
  _39 = indexTable[delta_5];
  index_40 = _39 + index_81;
  _84 = MIN_EXPR <index_40, 88>;
  index_62 = MAX_EXPR <_84, 0>;
  sign_41 = delta_5 & 8;
  vpdiff_42 = step_71 >> 3;
  _43 = delta_5 & 4;
  if (_43 != 0)
    goto <bb 9>;
  else
    goto <bb 10>;

The difference is there's all kinds of code between BB8 and the latch.So it doesn't trigger path splitting, which in turn doesn't expose theCSE/DCE opportunity for adpcm_decoder. We're likely leaving someperformance on the table for this code.

This brings up a deeper question. Are we approaching path splitting ina fundamentally backward way?

Right now path splitting is guided by finding the right shape in the CFGimmediately preceding the latch block of a loop. Plus some heuristicsI'm working on to predict when the duplication is more likely to lead toCSE/DCE opportunities. But really it's driven by finding the right CFGshape before the latch block.

Should path splitting really be driven by partitions of PHI argumentsand other incoming values (such as the implicit sets from conditionals).I've been pondering this for a long time as it's a natural extensionof what we do in the erroneous path isolation pass.

ie at any join point in the CFG, look for partitions of the arguments inany PHI nodes and split paths based on that.


So for example if we had a PHI like

x5 = phi (x4, 0, 0, 1, 1)

Look at how x5 is used. If propagation of any of those PHI argumentvalues would result in simplifications at the use points of x5, then weshould consider splitting off all the paths with the beneficial PHIargument.

So in the case above, assume that the value 0 results insimplifications. We'd create two paths. One for the paths where x5would get the value zero, another for everything else.

This is *a lot* like the path isolation done erroneous paths in terms ofthe CFG & SSA manipulations. Essentially in both cases we want toisolate a path so that we can manipulate it based on known inputswithout affecting paths with uninteresting inputs. All that reallychanges is how we drive the selection of paths to isolate and whether ornot we insert the __builtin_trap or not.



Anyway, going back to adpcm_decode, we do end up splitting this path:

 # vpdiff_12 = PHI <vpdiff_11(12), vpdiff_50(13)>
  if (sign_41 != 0)
    goto <bb 15>;
  else
    goto <bb 16>;
;;    succ:       15
;;                16

;;   basic block 15, loop depth 1
;;    pred:       14
  valpred_51 = valpred_76 - vpdiff_12;
  goto <bb 17>;
;;    succ:       17

;;   basic block 16, loop depth 1
;;    pred:       14
  valpred_52 = vpdiff_12 + valpred_76;
;;    succ:       17

;;   basic block 17, loop depth 1
;;    pred:       15
;;                16
  # valpred_7 = PHI <valpred_51(15), valpred_52(16)>
  _85 = MAX_EXPR <valpred_7, -32768>;
  valpred_13 = MIN_EXPR <_85, 32767>;
  step_53 = stepsizeTable[index_62];
  outp_54 = outp_69 + 2;
  _55 = (short int) valpred_13;
  MEM[base: outp_54, offset: -2B] = _55;
  if (outp_54 != _74)
    goto <bb 20>;
  else
    goto <bb 18>;

This doesn't result in anything particularly interesting/good AFAICT.We propagate valpred_51/52 into the use in the MAX_EXPR in the duplicatepaths, but that doesn't allow any further simplification.

Ajit, can you confirm which of adpcm_code or adpcm_decode where pathsplitting is showing a gain? I suspect it's the former but would liketo make sure so that I can adjust the heuristics properly.



jeff

Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation

Reply via email to