On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:
I am consistently getting the gains for office_ispell and office_stringsearch1,
telcom_adpcm_d. I ran it again today and we see gains in the same bench mark
tests
with the split path changes.
What functions are being affected that in turn impact performance?
For office_ispell: The function are Function "linit (linit, funcdef_no=0,
decl_uid=2535, cgraph_uid=0, symbol_order=2) for lookup.c file".
"Function checkfile (checkfile, funcdef_no=1,
decl_uid=2478, cgraph_uid=1, symbol_order=4)"
" Function correct (correct, funcdef_no=2,
decl_uid=2503, cgraph_uid=2, symbol_order=5)"
" Function askmode (askmode, funcdef_no=24,
decl_uid=2464, cgraph_uid=24, symbol_order=27)"
for correct.c file.
For office_stringsearch1: The function is Function "bmhi_search (bmhi_search,
funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5)"
for bmhisrch.c file.
So I can see split-paths affecting adpcm & lookup. I don't see it
affecting correct.c or bmhisrch.c.
That's progress though. It's likely one of one or more of the flags is
critical, so thanks for passing those along.
I'm going to focus on adpcm for the moment, in particular adpcm_coder.
It appears the key blocks are:
;; basic block 14, loop depth 1, count 0, freq 9100, maybe hot
;; prev block 13, next block 15, flags: (NEW, REACHABLE)
;; pred: 12 [100.0%] (FALLTHRU,EXECUTABLE)
;; 13 [100.0%] (FALLTHRU,EXECUTABLE)
# valpred_12 = PHI <valpred_54(12), valpred_55(13)>
_112 = MAX_EXPR <valpred_12, -32768>;
valpred_18 = MIN_EXPR <_112, 32767>;
delta_56 = delta_7 | iftmp.1_114;
_57 = indexTable[delta_56];
index_58 = _57 + index_107;
_113 = MIN_EXPR <index_58, 88>;
index_111 = MAX_EXPR <_113, 0>;
step_59 = stepsizeTable[index_111];
if (bufferstep_93 != 0)
goto <bb 15>;
else
goto <bb 16>;
;; succ: 15 [50.0%] (TRUE_VALUE,EXECUTABLE)
;; 16 [50.0%] (FALSE_VALUE,EXECUTABLE)
;; basic block 15, loop depth 1, count 0, freq 4550, maybe hot
;; prev block 14, next block 16, flags: (NEW, REACHABLE)
;; pred: 14 [50.0%] (TRUE_VALUE,EXECUTABLE)
_60 = delta_56 << 4;
goto <bb 17>;
;; succ: 17 [100.0%] (FALLTHRU,EXECUTABLE)
;; basic block 16, loop depth 1, count 0, freq 4550, maybe hot
;; prev block 15, next block 17, flags: (NEW, REACHABLE)
;; pred: 14 [50.0%] (FALSE_VALUE,EXECUTABLE)
outp_62 = outp_83 + 1;
_63 = (signed char) delta_56;
_65 = (signed char) outputbuffer_90;
_66 = _63 | _65;
*outp_83 = _66;
;; succ: 17 [100.0%] (FALLTHRU,EXECUTABLE)
;; basic block 17, loop depth 1, count 0, freq 9100, maybe hot
;; prev block 16, next block 18, flags: (NEW, REACHABLE)
;; pred: 15 [100.0%] (FALLTHRU,EXECUTABLE)
;; 16 [100.0%] (FALLTHRU,EXECUTABLE)
# outp_3 = PHI <outp_83(15), outp_62(16)>
# outputbuffer_21 = PHI <_60(15), outputbuffer_90(16)>
_109 = bufferstep_93 ^ 1;
_98 = _109 & 1;
ivtmp.11_68 = ivtmp.11_105 + 2;
if (ivtmp.11_68 != _116)
goto <bb 4>;
else
goto <bb 18>;
Block #17 is the join point that we're going to effectively copy into
blocks #15 and #16. Doing so in turn exposes bufferstep_93 as the
constant 0 in block #16, which in turn allows elimination of a couple
statements in the extended version of block #16 and we propagate the
constant 1 for bufferstep_93 to the top of the loop when reached via
block #16. So we save a few instructions. However, I think we're
actually doing a fairly poor job here.
bufferstep is a great example of a flip-flop variable and its value is
statically computable based on the path from the prior loop iteration
which, if exploited would allow the FSM threader to eliminate the
conditional at the end of bb14. I'm going to have to play with that.
Anyway, it's late and I want to rip this test apart a bit more and see
how it interacts with the heuristic that I've cobbled together as well
as see what it would take to have DOM or VRP get data on bufferstep_93
on the true path out of BB14 after a path-split.
Jeff