On 01/04/2016 07:32 AM, Ajit Kumar Agarwal wrote:
-----Original Message-----
From: Jeff Law [mailto:l...@redhat.com]
Sent: Wednesday, December 23, 2015 12:06 PM
To: Ajit Kumar Agarwal; Richard Biener
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida;
Nagaraju Mekala
Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa
representation
On 12/11/2015 02:11 AM, Ajit Kumar Agarwal wrote:
Mibench/EEMBC benchmarks (Target Microblaze):
Automotive_qsort1 (4.03%), Office_ispell (4.29%), Office_stringsearch1 (3.5%),
Telecom_adpcm_d (1.37%), ospfv2_lite (1.35%).
I'm having a real tough time reproducing any of these results. In fact, I'm having a
tough time seeing cases where path splitting even applies to the Mibench/EEMBC
benchmarks mentioned above.
In the very few cases where split-paths might apply, the net resulting assembly
code I get is the same with and without split-paths.
How consistent are these results?
I am consistently getting gains for office_ispell, office_stringsearch1 and
telecom_adpcm_d. I ran the benchmarks again today and still see gains in the same
tests with the split-paths changes.
What functions are being affected that in turn impact performance?
For office_ispell, the affected functions are linit (linit, funcdef_no=0,
decl_uid=2535, cgraph_uid=0, symbol_order=2) in lookup.c, and checkfile
(checkfile, funcdef_no=1, decl_uid=2478, cgraph_uid=1, symbol_order=4),
correct (correct, funcdef_no=2, decl_uid=2503, cgraph_uid=2, symbol_order=5)
and askmode (askmode, funcdef_no=24, decl_uid=2464, cgraph_uid=24,
symbol_order=27) in correct.c.
For office_stringsearch1, the affected function is bmhi_search (bmhi_search,
funcdef_no=1, decl_uid=2178, cgraph_uid=1, symbol_order=5) in bmhisrch.c.
In linit there are two path splitting opportunities. Neither of them
is a case where path splitting exposes any CSE or DCE opportunities at
the tree level. In fact, in both cases there are no operands from the
predecessors that feed into the join block (the block we duplicate to
split the path).
There's a path splitting opportunity in correct.c::givehelp which AFAICT
is totally uninteresting from a performance standpoint. However, it is
one of the few cases where path splitting actually results in something
that is better optimized at the tree level. How ironic. We'd easily get
the same result by sinking a statement down through a PHI in a manner
similar to what's been suggested for 64700.
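To make that concrete, here's roughly the shape I mean (a made-up example, not
the actual lookup.c/correct.c source, and not necessarily a block the pass
would split as written):

/* Made-up diamond whose join block becomes simplifiable once it is
   duplicated into each arm.  */
int
example (int cond, int x)
{
  int t;
  if (cond)
    t = 0;        /* arm A */
  else
    t = x;        /* arm B */
  /* Join block.  If it were duplicated into both arms, the copy reached
     from arm A sees t == 0 and folds to "return 1"; sinking "t + 1" down
     through the PHI and folding per-arm gives the same result.  */
  return t + 1;
}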
In correct.c::checkfile, correct.c::correct and correct.c::askmode
the path splitting opportunities do not lead to any further
simplifications at the gimple level.
Similarly for bmhisrch.c::bmhi_search.
So when I look across all these examples, the only one where path
splitting exposes a CSE/DCE opportunity that matters for performance is
adpcm_code.
The rest, AFAICT, benefit at a much lower level -- a diamond in the CFG
will require an unconditional branch from the end of one arm around the
other arm to the join point. With path splitting that unconditional
branch is eliminated. So there's a small gain for that. That gain may
be even larger on the microblaze because of its exposed delay slot
architecture -- one less slot to fill. It may also result in better
code layouts which help simplistic branch predictors.
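Roughly the shape I have in mind (again a made-up example, not benchmark code):

void
loop_body (int *a, int n, int flag)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (flag)
        a[i] += 1;    /* arm A: normally ends with a jump over arm B
                         to reach the join block.  */
      else
        a[i] -= 1;    /* arm B: falls through to the join block.  */
      a[i] <<= 1;     /* Join block / latch.  Duplicating it into both
                         arms lets each arm branch straight back to the
                         loop header, so the unconditional jump around
                         arm B disappears -- and on MicroBlaze that is
                         one less delay slot to fill.  */
    }
}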
So I find myself wondering if the primary effect we're looking for most
of the time is really elimination of that unconditional branch. If so,
then we're looking at a very different costing heuristic -- one that
favors very small join blocks rather than larger ones (which supposedly
help expose CSE/DCE opportunities, but in reality don't for the
benchmarks I've looked at).
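Something along these lines, as a sketch of the idea only (not a patch against
the pass; the threshold is made up):

/* Hypothetical costing sketch: if the usual win is just the eliminated
   unconditional branch, duplication only pays off when the join block is
   tiny, so copying it into both arms stays cheap.  The "4" is an arbitrary
   made-up threshold, not an existing GCC param.  */
static int
worth_splitting_p (int join_block_stmts, int exposes_cse_or_dce)
{
  if (exposes_cse_or_dce)
    return 1;                   /* the adpcm_code-style case */
  return join_block_stmts <= 4;
}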
And if that's the case, then we may really be looking at something that
belongs at the RTL level rather than at the tree/gimple level. Sadly,
it's harder to do things like duplicate blocks at the RTL level.
Anyway, I'm going to ponder some more.
jeff