On 18.01.2018 16:08, Sebastian Perta wrote:
Hello,
I am interested in implementing a new pass in gcc to merge identical
sequences of code in GCC to be used mainly for RL78.
The commercial RL78 compilers have such algorithms implemented and they make
quite good use of it.
Opportunities arise from the limited capabilities of RL78, for other targets
this might be a lot less useful.
A while ago I found the following:
https://www.gnu.org/software/gcc/projects/cfo.html
And I ported all algorithms to gcc 4.9.2 and tried it on RL78 and RX and
this is what I found out:
For RX: no visible improvements with any of them
For RL78: some minor improvements only with -frtl-seqbastr:
Compiling all the C files from gcc/testsuite/gcc.c-torture/execute/*c with
"-Os" and "-Os -frtl-seqabstr" (using the modified gcc 4.9.2)
The algorithm was effective only in 60 files(out of 1643 files, that's only
0.03% of the files currently present in gcc/testsuite/gcc.c-torture/execute)
On those 60 files I got an average of 6.5% improvement with the best
improvement for pr58574.c (36.4%).
What do you think: is it worthwhile porting this to the trunk or I will just
waste my time?
For that particular implementation, it makes no sense to put time into
it, IMO, for the following reasons:
* It causes compile-time hogs. I saw modules that take ~seconds to
compile to consume ~30min with that feature activated. Presumably, it's
just a proof-of-concept but brute force approach with quadratic or even
exponential run-time complexity.
* Parts of it jump in after register allocation, hence when register
allocation uses different registers, the gains where sometimes
negligible, even though the code had good factoring opportunities.
* Adding calls after reload turned out to be problematic and would
regularly ICE (e.g. for avr).
I spend some time with this testing for avr and some 32-bit target, but
gave up soon.
IMO without a sound analysis, e.g. of expected host complexity, any such
attempts are doomed to fail.
From a general perspective, it is desirable to add respective path
before passes that might shred factoring opportunities, in particular:
instruction scheduling, forward propagation (same code ending up in
different chunks depending on context), instruction combination (dito),
likely many other passes. But this makes it harder to precisely
estimate costs, in particular register pressure effects or turning leafs
into non-leafs.
Johann