On 18.01.2018 16:08, Sebastian Perta wrote:
Hello,

I am interested in implementing a new pass in gcc to merge identical
sequences of code in GCC to be used mainly for RL78.
The commercial RL78 compilers have such algorithms implemented and they make
quite good use of it.
Opportunities arise from the limited capabilities of RL78, for other targets
this might be a lot less useful.

A while ago I found the following:
https://www.gnu.org/software/gcc/projects/cfo.html
And I ported all algorithms to gcc 4.9.2 and tried it on RL78 and RX and
this is what I found out:
For RX: no visible improvements with any of them
For RL78: some minor improvements only with -frtl-seqbastr:
Compiling all the C files from gcc/testsuite/gcc.c-torture/execute/*c  with
"-Os" and "-Os  -frtl-seqabstr" (using the modified gcc 4.9.2)
The algorithm was effective only in 60 files(out of 1643 files, that's only
0.03% of the files currently present in gcc/testsuite/gcc.c-torture/execute)
On those 60 files I got an average of 6.5% improvement with the best
improvement for pr58574.c (36.4%).

What do you think: is it worthwhile porting this to the trunk or I will just
waste my time?

For that particular implementation, it makes no sense to put time into it, IMO, for the following reasons:

* It causes compile-time hogs. I saw modules that take ~seconds to compile to consume ~30min with that feature activated. Presumably, it's just a proof-of-concept but brute force approach with quadratic or even exponential run-time complexity.

* Parts of it jump in after register allocation, hence when register allocation uses different registers, the gains where sometimes negligible, even though the code had good factoring opportunities.

* Adding calls after reload turned out to be problematic and would regularly ICE (e.g. for avr).

I spend some time with this testing for avr and some 32-bit target, but gave up soon.

IMO without a sound analysis, e.g. of expected host complexity, any such attempts are doomed to fail.

From a general perspective, it is desirable to add respective path before passes that might shred factoring opportunities, in particular: instruction scheduling, forward propagation (same code ending up in different chunks depending on context), instruction combination (dito), likely many other passes. But this makes it harder to precisely estimate costs, in particular register pressure effects or turning leafs into non-leafs.

Johann

Reply via email to