Hello All,
my MELT branch http://gcc.gnu.org/wiki/MiddleEndLispTranslator contains a
big source file, warm-basilys-0.c. It is "self" generated, about
14 Mbytes and almost 280 KLOC (in rev136334). It ends with a big
initialization routine of 100 KLOC, which mostly fills a 5000-member
structure (each member being itself a small structure) and calls a few
routines. This initialization routine has a simple control structure (no
deeply nested blocks or loops).
gcc (either gcc-4.1, 4.2 or 4.3 from Debian, or the bootstrapped
trunk rev136331) can compile this file without any optimisation, i.e. with
-O0 -g3, in about 16 seconds and less than 1 GB of RAM.
But on my 6 GB machine (Core2, 2400MHz, Debian/Sid/AMD64) the cc1
process with -O2 (either 4.2, 4.3 or the trunk) eats nearly 10 GB of
virtual memory and thrashes (using 4.8 GB of RAM and 1% CPU time, waiting
for swap I/O). The same happens with -O1; -Os is a bit better.
The time to run the
./built-melt-cc-script warm-basilys-0.c warm-basilys-0.so
which compiles warm-basilys-0.c with -O2 -fPIC is
(you can set the MELT_EXTRACFLAGS environment variable to pass
real 84m23.594s
user 6m23.496s
sys 1m5.032s
I am attaching the -ftime-report output for information. One of the most
demanding passes in user time is tree operand scan (80.93 s, 21%); in
wall-clock time, tree PRE dominates (4339.32 s, 86%).
I find this report misleading on the total memory consumption (1591718 kB
= 1.6 GB). According to the top command, cc1 needs nearly 10 GB of process
space and uses nearly 5 GB of RAM (and thrashes).
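
As a rough aid for reading the attached report, here is a minimal sketch
that parses -ftime-report entries and ranks the passes by a chosen
column. The function names (parse_time_report, rank) and the regular
expression are my own assumptions, not part of gcc; the sample lines are
copied from the report. As far as I understand, the ggc column only
counts garbage-collected allocations, which may be one reason its total
differs from what top shows.

```python
import re

# One entry looks like (possibly wrapped over two lines):
#   name : U ( p%) usr S ( p%) sys W ( p%) wall
#   G kB ( p%) ggc
ENTRY = re.compile(
    r"(?P<name>[^:\n]+?)\s*:\s*"
    r"(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr\s*"
    r"(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys\s*"
    r"(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall\s*"
    r"(?P<ggc>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc"
)


def parse_time_report(text):
    """Return one dict per pass found in the report text."""
    return [
        {
            "name": m["name"].strip(),
            "usr": float(m["usr"]),
            "sys": float(m["sys"]),
            "wall": float(m["wall"]),
            "ggc_kb": int(m["ggc"]),
        }
        for m in ENTRY.finditer(text)
    ]


def rank(entries, column):
    """Sort passes by the given column, most expensive first."""
    return sorted(entries, key=lambda e: e[column], reverse=True)


# Three entries copied from the attached report.
SAMPLE = """\
tree operand scan : 80.93 (21%) usr 0.92 ( 1%) sys 82.92 ( 2%) wall
71551 kB ( 4%) ggc
tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys 4339.32 (86%) wall
109776 kB ( 7%) ggc
tree SSA to normal : 52.85 (14%) usr 0.27 ( 0%) sys 53.12 ( 1%) wall
25180 kB ( 2%) ggc
"""

entries = parse_time_report(SAMPLE)
for e in rank(entries, "wall"):
    print(f'{e["name"]:<20} {e["wall"]:>9.2f} s wall {e["ggc_kb"]:>8} kB ggc')
```

Ranking by "usr" instead of "wall" puts tree operand scan first, which
shows how much the picture depends on the column chosen.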
I won't be annoyed for long by this, since I'll soon split the
warm-basilys.bysl file (and hence the generated files) into several
distinct files. Until then, -O0 is enough for me.
Are there any specific flags to pass to gcc to lower the RAM consumption
(even at the expense of generated code quality)?
Are there any pragmas to disable (or lower) optimisation of a single
routine?
My intuition (and experience) is that the time and space consumption of
gcc -O2 (or even -O1) is nearly quadratic in the size of the longest
routine.
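
One way to check this intuition might be to generate synthetic C files
with a single flat routine of increasing length and time each compile.
The sketch below is only that: the names make_init_source and
time_compile, the struct layout and the "gcc" command name are
assumptions for illustration, and the generated routine is far simpler
than the real warm-basilys-0.c initialization.

```python
import subprocess
import time


def make_init_source(n_members):
    """Return C source whose single routine fills n_members small structs."""
    decl = ["struct item { int tag; const char *name; };", "struct big {"]
    decl += [f"  struct item m{i};" for i in range(n_members)]
    decl += ["} big;", "", "void init_all(void)", "{"]
    body = []
    for i in range(n_members):
        # Straight-line code only: no loops, no nested blocks.
        body.append(f"  big.m{i}.tag = {i};")
        body.append(f'  big.m{i}.name = "item{i}";')
    return "\n".join(decl + body + ["}"]) + "\n"


def time_compile(c_path, opt_flag, cc="gcc"):
    """Compile c_path with opt_flag and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([cc, opt_flag, "-c", c_path, "-o", "/dev/null"], check=True)
    return time.perf_counter() - start
```

Writing make_init_source(n) to a file for n = 1000, 2000, 4000, ... and
comparing time_compile(path, "-O2") between successive sizes gives a
crude scaling estimate: a ratio near 4 per doubling would be consistent
with quadratic growth, a ratio near 2 with linear growth.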
Thanks for reading.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Execution times (seconds)
garbage collection : 7.16 ( 2%) usr 0.45 ( 1%) sys 47.16 ( 1%) wall
0 kB ( 0%) ggc
callgraph construction: 16.83 ( 4%) usr 0.10 ( 0%) sys 16.87 ( 0%) wall
41478 kB ( 3%) ggc
callgraph optimization: 9.82 ( 3%) usr 0.11 ( 0%) sys 9.95 ( 0%) wall
9184 kB ( 1%) ggc
ipa reference : 0.25 ( 0%) usr 0.02 ( 0%) sys 0.26 ( 0%) wall
52 kB ( 0%) ggc
ipa pure const : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
cfg cleanup : 2.76 ( 1%) usr 0.03 ( 0%) sys 2.91 ( 0%) wall
5120 kB ( 0%) ggc
CFG verifier : 11.22 ( 3%) usr 0.69 ( 1%) sys 177.08 ( 3%) wall
0 kB ( 0%) ggc
trivially dead code : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.80 ( 0%) wall
0 kB ( 0%) ggc
df reaching defs : 3.01 ( 1%) usr 0.49 ( 1%) sys 34.85 ( 1%) wall
0 kB ( 0%) ggc
df live regs : 3.46 ( 1%) usr 0.06 ( 0%) sys 3.57 ( 0%) wall
0 kB ( 0%) ggc
df live&initialized regs: 2.12 ( 1%) usr 0.00 ( 0%) sys 2.16 ( 0%) wall
0 kB ( 0%) ggc
df use-def / def-use chains: 1.61 ( 0%) usr 0.02 ( 0%) sys 1.75 ( 0%) wall
0 kB ( 0%) ggc
df reg dead/unused notes: 1.07 ( 0%) usr 0.04 ( 0%) sys 1.10 ( 0%) wall
15075 kB ( 1%) ggc
register information : 0.51 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall
0 kB ( 0%) ggc
alias analysis : 1.05 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall
19781 kB ( 1%) ggc
register scan : 0.25 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall
163 kB ( 0%) ggc
rebuild jump labels : 0.53 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall
0 kB ( 0%) ggc
preprocessing : 1.24 ( 0%) usr 0.56 ( 1%) sys 1.93 ( 0%) wall
46597 kB ( 3%) ggc
lexical analysis : 0.30 ( 0%) usr 0.81 ( 1%) sys 1.29 ( 0%) wall
0 kB ( 0%) ggc
parser : 1.70 ( 0%) usr 0.49 ( 1%) sys 2.24 ( 0%) wall
123365 kB ( 8%) ggc
inline heuristics : 0.63 ( 0%) usr 0.01 ( 0%) sys 0.62 ( 0%) wall
5491 kB ( 0%) ggc
integration : 2.11 ( 1%) usr 0.22 ( 0%) sys 2.25 ( 0%) wall
168932 kB (11%) ggc
tree gimplify : 1.86 ( 0%) usr 0.05 ( 0%) sys 1.78 ( 0%) wall
109046 kB ( 7%) ggc
tree eh : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
tree CFG construction : 0.22 ( 0%) usr 0.01 ( 0%) sys 0.23 ( 0%) wall
69444 kB ( 4%) ggc
tree CFG cleanup : 3.42 ( 1%) usr 0.03 ( 0%) sys 4.15 ( 0%) wall
7307 kB ( 0%) ggc
tree VRP : 3.69 ( 1%) usr 0.24 ( 0%) sys 11.89 ( 0%) wall
115325 kB ( 7%) ggc
tree copy propagation : 1.80 ( 0%) usr 0.05 ( 0%) sys 3.50 ( 0%) wall
3511 kB ( 0%) ggc
tree find ref. vars : 0.12 ( 0%) usr 0.01 ( 0%) sys 0.12 ( 0%) wall
9570 kB ( 1%) ggc
tree PTA : 2.59 ( 1%) usr 0.61 ( 1%) sys 57.50 ( 1%) wall
17158 kB ( 1%) ggc
tree alias analysis : 1.13 ( 0%) usr 0.33 ( 1%) sys 26.66 ( 1%) wall
2461 kB ( 0%) ggc
tree call clobbering : 0.20 ( 0%) usr 0.00 ( 0%) sys 0.22 ( 0%) wall
10 kB ( 0%) ggc
tree flow sensitive alias: 0.46 ( 0%) usr 0.00 ( 0%) sys 0.53 ( 0%) wall
10992 kB ( 1%) ggc
tree flow insensitive alias: 8.41 ( 2%) usr 0.06 ( 0%) sys 8.96 ( 0%) wall
0 kB ( 0%) ggc
tree memory partitioning: 0.38 ( 0%) usr 0.01 ( 0%) sys 0.41 ( 0%) wall
111 kB ( 0%) ggc
tree PHI insertion : 0.05 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
119 kB ( 0%) ggc
tree SSA rewrite : 1.44 ( 0%) usr 0.03 ( 0%) sys 1.46 ( 0%) wall
44376 kB ( 3%) ggc
tree SSA other : 0.09 ( 0%) usr 0.09 ( 0%) sys 0.27 ( 0%) wall
0 kB ( 0%) ggc
tree SSA incremental : 2.11 ( 1%) usr 0.14 ( 0%) sys 4.59 ( 0%) wall
4795 kB ( 0%) ggc
tree operand scan : 80.93 (21%) usr 0.92 ( 1%) sys 82.92 ( 2%) wall
71551 kB ( 4%) ggc
dominator optimization: 3.97 ( 1%) usr 0.06 ( 0%) sys 3.92 ( 0%) wall
84156 kB ( 5%) ggc
tree SRA : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
0 kB ( 0%) ggc
tree STORE-CCP : 0.47 ( 0%) usr 0.05 ( 0%) sys 0.69 ( 0%) wall
992 kB ( 0%) ggc
tree CCP : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall
1205 kB ( 0%) ggc
tree PHI const/copy prop: 0.06 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
77 kB ( 0%) ggc
tree split crit edges : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall
21401 kB ( 1%) ggc
tree reassociation : 0.43 ( 0%) usr 0.01 ( 0%) sys 0.45 ( 0%) wall
236 kB ( 0%) ggc
tree PRE : 13.92 ( 4%) usr 52.21 (81%) sys 4339.32 (86%) wall
109776 kB ( 7%) ggc
tree FRE : 4.18 ( 1%) usr 2.51 ( 4%) sys 6.69 ( 0%) wall
61570 kB ( 4%) ggc
tree code sinking : 0.53 ( 0%) usr 0.03 ( 0%) sys 1.54 ( 0%) wall
1578 kB ( 0%) ggc
tree linearize phis : 0.16 ( 0%) usr 0.01 ( 0%) sys 0.14 ( 0%) wall
0 kB ( 0%) ggc
tree forward propagate: 0.36 ( 0%) usr 0.03 ( 0%) sys 0.35 ( 0%) wall
2466 kB ( 0%) ggc
tree phiprop : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
tree conservative DCE : 0.93 ( 0%) usr 0.01 ( 0%) sys 0.91 ( 0%) wall
20 kB ( 0%) ggc
tree aggressive DCE : 0.28 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
0 kB ( 0%) ggc
tree DSE : 0.35 ( 0%) usr 0.01 ( 0%) sys 0.33 ( 0%) wall
562 kB ( 0%) ggc
PHI merge : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
0 kB ( 0%) ggc
loop invariant motion : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
6 kB ( 0%) ggc
complete unrolling : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
316 kB ( 0%) ggc
tree iv optimization : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
7 kB ( 0%) ggc
tree loop init : 0.29 ( 0%) usr 0.00 ( 0%) sys 0.33 ( 0%) wall
281 kB ( 0%) ggc
tree loop fini : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
tree copy headers : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.09 ( 0%) wall
524 kB ( 0%) ggc
tree SSA uncprop : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.08 ( 0%) wall
0 kB ( 0%) ggc
tree SSA to normal : 52.85 (14%) usr 0.27 ( 0%) sys 53.12 ( 1%) wall
25180 kB ( 2%) ggc
tree rename SSA copies: 0.22 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
0 kB ( 0%) ggc
tree SSA verifier : 21.08 ( 6%) usr 0.19 ( 0%) sys 21.67 ( 0%) wall
4603 kB ( 0%) ggc
tree STMT verifier : 47.77 (12%) usr 1.47 ( 2%) sys 49.16 ( 1%) wall
0 kB ( 0%) ggc
callgraph verifier : 0.86 ( 0%) usr 0.00 ( 0%) sys 0.93 ( 0%) wall
2891 kB ( 0%) ggc
dominance frontiers : 0.13 ( 0%) usr 0.00 ( 0%) sys 0.25 ( 0%) wall
0 kB ( 0%) ggc
dominance computation : 3.59 ( 1%) usr 0.04 ( 0%) sys 3.55 ( 0%) wall
0 kB ( 0%) ggc
control dependences : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
expand : 11.91 ( 3%) usr 0.31 ( 0%) sys 21.34 ( 0%) wall
172552 kB (11%) ggc
lower subreg : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
0 kB ( 0%) ggc
jump : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
0 kB ( 0%) ggc
forward prop : 0.71 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall
18126 kB ( 1%) ggc
CSE : 4.33 ( 1%) usr 0.03 ( 0%) sys 4.51 ( 0%) wall
7344 kB ( 0%) ggc
dead code elimination : 0.63 ( 0%) usr 0.00 ( 0%) sys 0.58 ( 0%) wall
0 kB ( 0%) ggc
dead store elim1 : 1.24 ( 0%) usr 0.00 ( 0%) sys 1.27 ( 0%) wall
14629 kB ( 1%) ggc
dead store elim2 : 0.65 ( 0%) usr 0.01 ( 0%) sys 0.65 ( 0%) wall
11488 kB ( 1%) ggc
loop analysis : 0.18 ( 0%) usr 0.00 ( 0%) sys 0.21 ( 0%) wall
278 kB ( 0%) ggc
global CSE : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall
0 kB ( 0%) ggc
CPROP 1 : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
4114 kB ( 0%) ggc
PRE : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.46 ( 0%) wall
3000 kB ( 0%) ggc
CPROP 2 : 0.31 ( 0%) usr 0.00 ( 0%) sys 0.30 ( 0%) wall
3110 kB ( 0%) ggc
bypass jumps : 0.21 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
2539 kB ( 0%) ggc
CSE 2 : 4.29 ( 1%) usr 0.02 ( 0%) sys 4.21 ( 0%) wall
5306 kB ( 0%) ggc
branch prediction : 0.66 ( 0%) usr 0.01 ( 0%) sys 0.67 ( 0%) wall
3048 kB ( 0%) ggc
combiner : 1.60 ( 0%) usr 0.01 ( 0%) sys 1.72 ( 0%) wall
22097 kB ( 1%) ggc
if-conversion : 0.70 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall
456 kB ( 0%) ggc
regmove : 0.91 ( 0%) usr 0.01 ( 0%) sys 0.87 ( 0%) wall
118 kB ( 0%) ggc
local alloc : 4.45 ( 1%) usr 0.01 ( 0%) sys 4.49 ( 0%) wall
11555 kB ( 1%) ggc
global alloc : 9.35 ( 2%) usr 0.03 ( 0%) sys 9.42 ( 0%) wall
37993 kB ( 2%) ggc
reload CSE regs : 1.83 ( 0%) usr 0.02 ( 0%) sys 1.90 ( 0%) wall
30852 kB ( 2%) ggc
thread pro- & epilogue: 0.24 ( 0%) usr 0.00 ( 0%) sys 0.19 ( 0%) wall
1494 kB ( 0%) ggc
if-conversion 2 : 0.17 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall
143 kB ( 0%) ggc
peephole 2 : 0.27 ( 0%) usr 0.00 ( 0%) sys 0.28 ( 0%) wall
2505 kB ( 0%) ggc
rename registers : 0.93 ( 0%) usr 0.00 ( 0%) sys 0.94 ( 0%) wall
93 kB ( 0%) ggc
scheduling 2 : 2.72 ( 1%) usr 0.01 ( 0%) sys 2.75 ( 0%) wall
1617 kB ( 0%) ggc
machine dep reorg : 0.34 ( 0%) usr 0.00 ( 0%) sys 0.40 ( 0%) wall
385 kB ( 0%) ggc
reorder blocks : 0.72 ( 0%) usr 0.00 ( 0%) sys 0.66 ( 0%) wall
6485 kB ( 0%) ggc
final : 1.07 ( 0%) usr 0.02 ( 0%) sys 1.16 ( 0%) wall
8151 kB ( 1%) ggc
symout : 0.03 ( 0%) usr 0.01 ( 0%) sys 0.04 ( 0%) wall
2181 kB ( 0%) ggc
tree if-combine : 0.04 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall
0 kB ( 0%) ggc
TOTAL : 382.44 64.16 5061.26
1591718 kB