http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56113



Steven Bosscher <steven at gcc dot gnu.org> changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-01-27
     Ever Confirmed|0                           |1



--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2013-01-27 
00:16:35 UTC ---

I do not believe this kind of test case can ever be compiled by any
reasonable compiler. The problem is not the labels, but the computed
gotos: they produce an exceptionally dense CFG with on the order of
N_LABELS**2 edges. GCC tries to factor these computed gotos early in
the pass pipeline and only expands them again near its end, but the
function is still simply too large to handle. Anyway, let's all
pretend for a moment that this test case is sane...
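
For reference, here is a hand-written miniature of the kind of test
case I assume goto_gen.py generates (the script is not quoted in this
report, so treat this only as an illustrative sketch). Each computed
goto may jump to any label whose address is taken, which is why N
labels and N computed gotos give a CFG with on the order of N*N edges:

/* Miniature computed-goto test case (GNU C labels-as-values).  */
#include <stdio.h>

static int f (int x)
{
  /* Addresses of all labels; every "goto *" below can reach any of
     them, so the CFG gets an edge from each computed goto to each
     label.  */
  void *tab[] = { &&l0, &&l1, &&l2, &&l3 };
  int acc = 0;

  goto *tab[x & 3];                          /* computed goto #1 */
l0: acc += 1; goto *tab[(x + acc) & 3];      /* computed goto #2 */
l1: acc += 2; goto *tab[(x + acc) & 3];      /* computed goto #3 */
l2: acc += 3; goto *tab[(x + acc) & 3];      /* computed goto #4 */
l3: return acc;
}

int main (void)
{
  printf ("%d\n", f (1));
  return 0;
}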





On powerpc64-unknown-linux-gnu, with GCC 4.8 trunk r195103, compile
time grows nicely linearly with the number of computed gotos in the
function, but memory usage appears to grow quadratically:



$ for n in 1000 2000 3000 4000 5000 10000 20000 40000 ; do
>     ./goto_gen.py ${n} t.c
>     ~/bin/compacttime ./cc1 -quiet -O1 t.c
> done

DBG: 1000
29.48user 0.39system 0:29.88elapsed 99%CPU (0avgtext+0avgdata 172352maxresident)k
DBG: 2000
70.07user 0.56system 1:10.64elapsed 99%CPU (0avgtext+0avgdata 386496maxresident)k
DBG: 3000
137.31user 1.45system 2:18.80elapsed 99%CPU (0avgtext+0avgdata 664960maxresident)k
DBG: 4000
190.45user 2.43system 3:12.92elapsed 99%CPU (0avgtext+0avgdata 1050560maxresident)k
DBG: 5000
318.18user 2.90system 5:21.16elapsed 99%CPU (0avgtext+0avgdata 629760maxresident)k
DBG: 10000
1374.55user 13.03system 23:07.66elapsed 99%CPU (0avgtext+0avgdata 1701120maxresident)k
DBG: 20000
3587.06user 39.71system 1:00:27.19elapsed 99%CPU (0avgtext+0avgdata 6343296maxresident)k
DBG: 40000
(...still running... top says:
 4567 stevenb   20   0 23.4g  23g  17m R 100.1 37.6  26:20.65 cc1
so it's good this machine has 64GB plus some swap -- it might just
make it all the way through to the end, some day... :-)



(Above 5000 computed gotos in the test case, there appears to be a
tipping point where some transformation decides the function is too
big to handle, causing memory usage to actually drop at that point.
I'm guessing that's IRA's conflict table toggle, but I'm not sure.)





From the point of view of compile time, this is not great but not
unreasonable either. According to -ftime-report, no pass stands out
as scaling badly for compile time on this test case. Everything looks
reasonable and as expected: parsing, gimplify, expand, combine, and
IRA top the GC memory lists (being IR rewriting passes), and the top
time consumers for n=20000 are:



 tree PTA                :1093.69 (30%) usr  6250 kB ( 0%) ggc
 tree SSA incremental    : 858.29 (24%) usr     0 kB ( 0%) ggc
 forward prop            : 316.48 ( 9%) usr 64614 kB ( 4%) ggc





The quadratic increase of the memory footprint is a problem, though...



The test case with 20000 computed gotos has 360025 lines of code,
which expand to ~10 instructions per line of code. Memory peaks at
~6.3GB. That doesn't look reasonable if you do a bit of math (the
arithmetic is replayed in the small program after these bullet
points):



* There are ~3 gimple_statement_with_memory_ops per input line, with
typically 3 operands. These are 128 bytes each, consuming an
estimated 3*360000*128=131MB. On top of that there are many small
PHIs (and a few large PHIs for the factored computed goto and for the
return statement) that consume a significant amount of memory. Let's
assume we need about 6 times the 131MB to account for all stmt
operands and for PHI nodes. That's ~1GB total for the function body.



* Everything else is relatively small. Per computed goto there are
initially 7 basic blocks, 8 edges, and 8 label_decls. On a 64-bit
host, sizeof(struct basic_block_def)=108, sizeof(struct edge_def)=56,
and sizeof(struct tree_label_decl)=128; with alignment that's 128,
64, and 128. That's 20000*(7*128+8*64+8*128)=46MB for the CFG and
labels. Nothing else of any significance lives in GC memory early in
the compilation pipeline.



* Function bodies as RTL tend to be much smaller than GIMPLE,
accounting for maybe 200MB or so in this case (1/5th of the size of
the GIMPLE body is typical for pre-processed GCC itself).



* The above estimates are supported by -ftime-report's total
allocated GC memory counter: 1476435kB. That means the other 4.8GB
is allocated in non-GC space, and that non-GC memory dominates the
memory footprint.
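
For reference, here is the same back-of-the-envelope arithmetic
written out as a small standalone program. The object sizes and the
factor of 6 are the rough assumptions quoted above, not measured
values:

/* Replay of the size estimates above for the n=20000 test case.  */
#include <stdio.h>

int main (void)
{
  const double MB = 1024.0 * 1024.0;
  const long lines = 360000;   /* input lines for n=20000          */
  const long gotos = 20000;    /* computed gotos in the test case  */

  /* ~3 gimple_statement_with_memory_ops per line, 128 bytes each.  */
  double stmts = 3.0 * lines * 128;
  /* Rough factor of 6 to cover statement operands and PHI nodes.  */
  double gimple_body = 6.0 * stmts;
  /* Per computed goto: 7 basic blocks, 8 edges, 8 label_decls,
     rounded to 128, 64 and 128 bytes after alignment.  */
  double cfg = gotos * (7.0 * 128 + 8 * 64 + 8 * 128);
  /* RTL body, roughly 1/5th of the GIMPLE body.  */
  double rtl = gimple_body / 5.0;

  printf ("stmts        ~ %.0f MB\n", stmts / MB);        /* ~132MB; quoted as 131MB above */
  printf ("GIMPLE body  ~ %.0f MB\n", gimple_body / MB);  /* ~791MB; the "~1GB" above      */
  printf ("CFG + labels ~ %.0f MB\n", cfg / MB);          /* ~46MB                         */
  printf ("RTL body     ~ %.0f MB\n", rtl / MB);          /* ~158MB; the "maybe 200MB"     */
  return 0;
}

Summed, these estimates stay well under the ~1.4GB of GC memory that
-ftime-report reports, while peak RSS is several times larger.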





For n=10000, max. GC=741582kB, and max. resident is 1.7GB. So here,
too, ~1GB of non-GC memory dominates the memory footprint. For
n=40000 peak memory is ~23.4GB so far. So tabulated:



n      max. mem    max. GC mem    max. non-GC mem
10000    1.7GB        741582kB      ~1GB
20000    6.3GB       1476435kB      ~4.8GB
40000   23.4GB      ~2900000kB     ~20.5GB



A compiler built with --enable-gather-detailed-mem-stats should be
able to tell where and for what that memory is allocated. The peak
memory usage happens pretty early in the compilation process, so I'm
guessing the memory is allocated for PTA.
