http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56113
Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2013-01-27
     Ever Confirmed|0           |1

--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2013-01-27 00:16:35 UTC ---

I do not believe this kind of test case can ever be compiled by any
reasonable compiler. The problem is not the labels, but the computed
gotos, which cause an exceptionally dense CFG with N_LABELS**2 edges.
GCC factors these computed gotos early in the pass pipeline and expands
them again near its end, but the function is still simply too large to
handle.

But anyway, let's all pretend for a moment that this test case is sane...

On powerpc64-unknown-linux-gnu, with GCC 4.8 trunk r195103, compile time
grows nicely linearly with the number of computed gotos in the function,
but memory usage appears to grow quadratically:

$ for n in 1000 2000 3000 4000 5000 10000 20000 40000 ; do
> ./goto_gen.py ${n} t.c
> ~/bin/compacttime ./cc1 -quiet -O1 t.c
> done
DBG: 1000 29.48user 0.39system 0:29.88elapsed 99%CPU (0avgtext+0avgdata 172352maxresident)k
DBG: 2000 70.07user 0.56system 1:10.64elapsed 99%CPU (0avgtext+0avgdata 386496maxresident)k
DBG: 3000 137.31user 1.45system 2:18.80elapsed 99%CPU (0avgtext+0avgdata 664960maxresident)k
DBG: 4000 190.45user 2.43system 3:12.92elapsed 99%CPU (0avgtext+0avgdata 1050560maxresident)k
DBG: 5000 318.18user 2.90system 5:21.16elapsed 99%CPU (0avgtext+0avgdata 629760maxresident)k
DBG: 10000 1374.55user 13.03system 23:07.66elapsed 99%CPU (0avgtext+0avgdata 1701120maxresident)k
DBG: 20000 3587.06user 39.71system 1:00:27.19elapsed 99%CPU (0avgtext+0avgdata 6343296maxresident)k
DBG: 40000 (...still running... top says:
 4567 stevenb   20   0 23.4g  23g  17m R 100.1 37.6  26:20.65 cc1
so it's good this machine has 64GB plus some swap -- it might just make
it all the way through to the end, some day...
:-)

(Above 5000 computed gotos in the test case, there appears to be a
tipping point where some transformation decides the function is too big
to handle, causing memory usage to actually drop at that point. I'm
guessing that's IRA's conflict table toggle, but I'm not sure.)

From the point of view of compile time, this is not great, but not
unreasonable either. According to -ftime-report, no pass stands out as
scaling badly for compile time on this test case. Everything looks
reasonable and as expected: parsing, gimplify, expand, combine, and IRA
top the GC memory lists (being IR rewriting passes), and the top time
consumers for n=20000 are:

 tree PTA             : 1093.69 (30%) usr    6250 kB ( 0%) ggc
 tree SSA incremental :  858.29 (24%) usr       0 kB ( 0%) ggc
 forward prop         :  316.48 ( 9%) usr   64614 kB ( 4%) ggc

The quadratic increase of the memory footprint is a problem, though...
The test case with 20000 computed gotos has 360025 lines of code, which
expand to ~10 instructions per line of code. Memory peaks at ~6.3GB.
That doesn't look reasonable if you do a bit of math:

* There are ~3 gimple_statement_with_memory_ops per input line, with
  typically 3 operands each. These are 128 bytes, consuming an estimated
  3*360000*128 = ~131MB. Beyond that, there are many small PHIs (and a
  few large PHIs for the factored computed goto and for the return
  statement) that consume a significant amount of memory. Let's assume
  we need about 6 times the 131MB to account for all stmt operands and
  for PHI nodes. That's ~1GB total for the function body.

* Everything else is relatively small. Per computed goto there are
  initially 7 basic blocks, 8 edges, and 8 label_decls. On a 64-bit
  host, sizeof(struct basic_block_def)=108, sizeof(struct edge_def)=56,
  and sizeof(struct tree_label_decl)=128; with alignment that's 128, 64,
  and 128. That's 20000*(7*128+8*64+8*128) = ~46MB for the CFG and
  labels. Nothing else of any significance lives in GC memory early in
  the compilation pipeline.
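As a sanity check, the two estimates above can be reproduced with a few
lines of arithmetic (all sizes are the ones quoted in this comment, not
re-measured):

```python
# Back-of-envelope reproduction of the GC memory estimates above,
# using the sizes quoted in this comment.
MB = 1024 * 1024

# ~3 gimple_statement_with_memory_ops per input line at 128 bytes each,
# for the ~360000-line n=20000 test case:
stmt_bytes = 3 * 360000 * 128
print(stmt_bytes // MB)        # ~131 MB of statements

# Times ~6 to cover stmt operands and PHI nodes:
print(6 * stmt_bytes // MB)    # ~791 MB, i.e. roughly 1GB for the body

# CFG and labels: per computed goto, 7 basic blocks (128B aligned),
# 8 edges (64B aligned), and 8 label_decls (128B aligned):
cfg_bytes = 20000 * (7 * 128 + 8 * 64 + 8 * 128)
print(cfg_bytes // MB)         # ~46 MB
```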
* Function bodies as RTL tend to be much smaller than GIMPLE, accounting
  for maybe 200MB or so in this case (1/5th of the size of the GIMPLE
  body is typical for pre-processed GCC itself).

* The above estimates are supported by -ftime-report's total allocated
  GC memory counter: 1476435kB. That means the other ~4.8GB is allocated
  in non-GC space, and that non-GC memory dominates the memory
  footprint. For n=10000, max. GC is 741582kB and max. resident is
  1.7GB, so here, too, ~1GB of non-GC memory dominates the memory
  footprint. For n=40000, peak memory is ~23.4GB so far.

So tabulated:

  n      max. mem  max. GC mem  max. non-GC mem
  10000  1.7GB     741582kB     ~1GB
  20000  6.3GB     1476435kB    ~4.8GB
  40000  23.4GB    ~2900000kB   ~20.5GB

A compiler built with --enable-gather-detailed-mem-stats should be able
to tell where and for what that memory is allocated. The peak memory
usage happens pretty early in the compilation process, so I'm guessing
the memory is allocated for PTA.
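The goto_gen.py script itself is not attached to this comment; a minimal
generator in the same spirit (the name, argument convention, and the
exact shape of the emitted C are my guesses, not the original script)
could look like:

```python
#!/usr/bin/env python3
# Hypothetical sketch of a generator like the goto_gen.py used above --
# the real script is not shown in this report. It writes a C function
# containing N computed gotos via GNU C's labels-as-values extension
# (&&label to take a label's address, "goto *expr" to jump to it).
import sys

def generate(n):
    out = ["int f(int x) {", "  static void *tbl[] = {"]
    out += ["    &&l%d," % i for i in range(n)]
    out += ["  };", "  goto *tbl[x %% %d];" % n]
    # Each label ends in another computed goto, so every label is a
    # potential target of every dispatch: the dense CFG described above.
    for i in range(n):
        out.append("l%d: x += %d; if (x & 1) goto *tbl[x %% %d];" % (i, i, n))
    out += ["  return x;", "}"]
    return "\n".join(out) + "\n"

if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], "w") as f:
        f.write(generate(int(sys.argv[1])))
```

Run as `./goto_gen.py 20000 t.c`, this produces one computed goto per
label plus the initial dispatch; the real test case presumably differs
in detail (the comment's CFG numbers imply 8 label_decls per computed
goto), but it exercises the same construct.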