> 
> Hello,
>   taking latest trunk gcc, I built Firefox and Chromium. Both
> projects compiled without debugging symbols and -O2 on an 8-core
> machine.
> 
> Firefox:
> -flto=9, peak memory usage (in LTRANS): 11GB
> 
> Chromium:
> -flto=6, peak memory usage (in parallel WPA phase ): 16.5GB

I see. Later on, however, the LTRANS memory use is about the same.
> 
> For details please see attached with graphs. The attachment contains
> also -fmem-report and -fmem-report-wpa.
> I think reduced memory footprint to ~3.5GB is a bit optimistic:
> http://gcc.gnu.org/gcc-4.9/changes.html

I will need to re-measure my setup; that is what I got last time with basically
the same configuration.  It depends on parallelism: you should get a sub-4GB peak
with -flto=1, right? We should clarify this in changes.html.
> 
> Is there any way we can reduce the memory footprint?

Looking at the memreport we get for ggc memory:

Chromium:
source location                                  Garbage             Freed              Leak          Overhead           Times
cgraph.c:869 (cgraph_create_edge_1)                    0: 0.0%           0: 0.0%   274319552: 4.8%           0: 0.0%   2637688
cgraph.c:510 (cgraph_allocate_node)                    0: 0.0%           0: 0.0%   426228128: 7.5%           0: 0.0%   1299476
toplev.c:960 (realloc_for_line_map)                    0: 0.0%   357908640: 3.8%  1073743896:18.8%         184: 0.0%        10
tree-streamer-in.c:621 (streamer_alloc_tree)   216054000:86.6%  7623611824:80.2%  2536849136:44.5%    57818592:36.0%  69421368
Total                                          249562346        9504578411        5700671942         160593619        97146243

Firefox:
source location                                  Garbage             Freed              Leak          Overhead           Times
cgraph.c:869 (cgraph_create_edge_1)                    0: 0.0%           0: 0.0%   130358176: 6.9%           0: 0.0%   1253444
cgraph.c:510 (cgraph_allocate_node)                    0: 0.0%           0: 0.0%   182236800: 9.7%           0: 0.0%    555600
toplev.c:960 (realloc_for_line_map)                    0: 0.0%    89503888: 5.5%   268468240:14.3%         160: 0.0%        13
tree-streamer-in.c:621 (streamer_alloc_tree)    93089976:77.5%   972848816:59.6%   639230248:33.9%    21332480:32.3%  13496198
Total                                          120076578        1632997043        1883064062          65981723        24732501

So Chromium uses quite a lot more trees and also seems to have about twice as
many functions.
Next time, it would be useful to include -Q while collecting the data: it shows
individual GGC runs and also memory usage accounted per pass.  That way we would
know whether there are a lot more functions to start with, or just more inlining
going on.
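
As a rough check of the "about twice as many functions" estimate, the counts in
the Times column of the two reports above can be compared directly.  This is only
an illustrative sketch: the numbers are copied by hand from the reports, and the
dictionary and function names are made up for the example.

```python
# Allocation counts copied from the "Times" column of the -fmem-report
# output quoted above.  The names here are illustrative, not part of GCC.
TIMES = {
    "cgraph_allocate_node": {"chromium": 1299476, "firefox": 555600},
    "cgraph_create_edge_1": {"chromium": 2637688, "firefox": 1253444},
    "streamer_alloc_tree": {"chromium": 69421368, "firefox": 13496198},
}


def chromium_to_firefox_ratio(site):
    """Return how many times more allocations Chromium made at this site."""
    counts = TIMES[site]
    return counts["chromium"] / counts["firefox"]


if __name__ == "__main__":
    for site in TIMES:
        print(f"{site}: {chromium_to_firefox_ratio(site):.2f}x")
```

Node and edge allocations come out a bit over 2x, while tree allocations are
roughly 5x.  These are allocation counts only, so they cannot tell original
functions apart from extra inlining; that still needs the -Q data.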

I have an older patch that introduces a cache to line map streaming, reducing
its size quite a bit; that should save some of the memory used by
realloc_for_line_map.
I will dig it out and update it to the current tree.

I also wonder where the rest of the memory goes, since the graphs show about
10GB for Firefox.
Some is probably accounting of mmapped files, and also gold's memory usage.
We collect only some of the memory usage that is not in GGC. Vectors:

Chromium:
ipa-cp.c:2421 (grow_edge_clone_vectors)           17225752: 6.9%    17225752         1: 0.0%
vec.h:1393 (copy)                                 17291228: 6.9%   100465316   1499009: 3.7%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      30436272:12.2%    53192752      1460: 0.0%
passes.c:2254 (execute_one_pass)                  53853360:21.6%    83885960   1426939: 3.5%
ipa-inline-analysis.c:974 (inline_summary_alloc)  84406056:33.8%   137806000    484472: 1.2%
Total                                            249721648                    40747241

Firefox:
ipa-cp.c:2421 (grow_edge_clone_vectors)            7753312: 6.1%     7753312         1: 0.0%
ipa-inline-analysis.c:4053 (read_inline_edge_sum   8758216: 6.9%    26420804    909584: 4.9%
ipa-ref.c:54 (ipa_record_reference)               10747880: 8.4%    20943288    371083: 2.0%
lto-cgraph.c:141 (lto_symtab_encoder_encode)      19756008:15.5%    23548272      1335: 0.0%
passes.c:2254 (execute_one_pass)                  26769688:21.0%    41942904    716378: 3.9%
ipa-inline-analysis.c:974 (inline_summary_alloc)  40110248:31.5%    62026480    284283: 1.5%
Total                                            127480444                    18430703

That looks as usual; 249MB seems acceptable.

Bitmaps seem to be dominated by ipa-reference.  On Chromium this pass seems to
go crazy, having about 800000MB of bitmaps.  Perhaps you could try to get data
with -fno-ipa-reference?

We ought to get stats on hash tables, since these probably consume quite some
memory during LTO streaming.
Could you perhaps also get -flto-report?
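
To keep the requested re-runs in one place, the extra options could be collected
along these lines.  This is just a sketch: the helper name and the idea of
appending to an existing flag list are assumptions, and how the flags actually
reach the link step depends on each project's build system.

```python
# Diagnostic options asked for in this thread.  The helper below is a
# hypothetical convenience, not something provided by GCC or either project.
REQUESTED_FLAGS = [
    "-Q",                # shows individual GGC runs and per-pass memory usage
    "-fmem-report",      # already collected in the first measurement
    "-fmem-report-wpa",  # already collected in the first measurement
    "-flto-report",      # report on the workings of the link-time optimizer
]


def lto_diagnostic_flags(base_flags, disable_ipa_reference=False):
    """Return base_flags plus the diagnostics requested above.

    Set disable_ipa_reference=True for the separate -fno-ipa-reference run.
    """
    flags = list(base_flags) + REQUESTED_FLAGS
    if disable_ipa_reference:
        flags.append("-fno-ipa-reference")
    return flags


if __name__ == "__main__":
    # -flto=1 is the serial setting suggested earlier for the peak comparison.
    print(" ".join(lto_diagnostic_flags(["-O2", "-flto=1"], True)))
```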

Honza
> 
> Attachment (due to size restriction): 
> https://drive.google.com/file/d/0B0pisUJ80pO1bnV5V0RtWXJkaVU/edit?usp=sharing
> 
> Thank you,
> Martin
