> I am resending the mail because I got a 502 error.
> 
> On 04/03/2014 12:43 AM, Jan Hubicka wrote:
> >>Hello,
> >>   taking latest trunk gcc, I built Firefox and Chromium. Both
> >>projects compiled without debugging symbols and -O2 on an 8-core
> >>machine.
> >>
> >>Firefox:
> >>-flto=9, peak memory usage (in LTRANS): 11GB
> >>
> >>Chromium:
> >>-flto=6, peak memory usage (in parallel WPA phase ): 16.5GB
> >I see, the ltrans memory use is however about the same later in the game.
> >>For details please see attached with graphs. The attachment contains
> >>also -fmem-report and -fmem-report-wpa.
> >>I think reduced memory footprint to ~3.5GB is a bit optimistic:
> >>http://gcc.gnu.org/gcc-4.9/changes.html
> >I will need to re-measure my setup - it is what I got last time with
> >basically the same configuration.  It depends on parallelism, you should
> >get sub 4GB peak with -flto=1, right? We should clarify this in changes.html.
> >>Is there any way we can reduce the memory footprint?
> >Looking at the memreport we get for ggc memory:
> >
> >Chromium:
> >cgraph.c:869 (cgraph_create_edge_1)                     0: 0.0%          0: 0.0%  274319552: 4.8%          0: 0.0%    2637688
> >cgraph.c:510 (cgraph_allocate_node)                     0: 0.0%          0: 0.0%  426228128: 7.5%          0: 0.0%    1299476
> >toplev.c:960 (realloc_for_line_map)                     0: 0.0%  357908640: 3.8% 1073743896:18.8%        184: 0.0%         10
> >tree-streamer-in.c:621 (streamer_alloc_tree)    216054000:86.6% 7623611824:80.2% 2536849136:44.5%   57818592:36.0%   69421368
> >Total                                           249562346       9504578411       5700671942        160593619          97146243
> >source location                                         Garbage            Freed             Leak         Overhead      Times
> >
> >Firefox:
> >cgraph.c:869 (cgraph_create_edge_1)                     0: 0.0%          0: 0.0%  130358176: 6.9%          0: 0.0%    1253444
> >cgraph.c:510 (cgraph_allocate_node)                     0: 0.0%          0: 0.0%  182236800: 9.7%          0: 0.0%     555600
> >toplev.c:960 (realloc_for_line_map)                     0: 0.0%   89503888: 5.5%  268468240:14.3%        160: 0.0%         13
> >tree-streamer-in.c:621 (streamer_alloc_tree)     93089976:77.5%  972848816:59.6%  639230248:33.9%   21332480:32.3%   13496198
> >Total                                           120076578       1632997043       1883064062         65981723          24732501
> >source location                                         Garbage            Freed             Leak         Overhead      Times
> >
> >So chromium uses quite a lot more trees and also seems to have about twice
> >as many functions.
> >Next time, it is useful to include -Q while collecting the data - it shows
> >individual GGC runs and also memory usage accounted per pass.  That way we
> >would know if there are a lot more functions to start with, or just more
> >inlining going on.
> >
> >I have an older patch that introduces a cache to line map streaming, reducing
> >its size quite a bit; that should save some memory of realloc_for_line_map.
> >I will dig it out and update it to the current tree.
> >
> >I also wonder where the rest of the memory goes, since the graphs show about
> >10GB for Firefox.
> >Some is probably accounting of mmap files, also gold's memory usage.
> >We collect only some of the memory usage that is not in ggc. Vectors:
> >
> >Chromium:
> >ipa-cp.c:2421 (grow_edge_clone_vectors)            17225752: 6.9%   17225752          1: 0.0%
> >vec.h:1393 (copy)                                  17291228: 6.9%  100465316    1499009: 3.7%
> >lto-cgraph.c:141 (lto_symtab_encoder_encode)       30436272:12.2%   53192752       1460: 0.0%
> >passes.c:2254 (execute_one_pass)                   53853360:21.6%   83885960    1426939: 3.5%
> >ipa-inline-analysis.c:974 (inline_summary_alloc)   84406056:33.8%  137806000     484472: 1.2%

Actually, one way to save some memory would also be to free inline summaries
(they ought to cost more than the 84MB reported here, with predicates and other
stuff hooked to them) and only record function sizes that are actually used by
the partitioner. Will give it a try tomorrow.

The passes.c use would probably benefit from disabling the default vector
growth.
Somewhere I have a patch to make vec.h properly account for copy operations.
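
For reference, the "about twice as many functions" remark can be rechecked
from the quoted ggc reports. The following is only a minimal sketch in Python:
the numbers are copied from the Leak and Times columns above, the dictionary
layout and helper names are hypothetical, and it assumes that Times counts
allocations at each source location.

```python
# Figures copied from the quoted -fmem-report excerpts.
# Tuple layout (hypothetical, for illustration only): (leak_bytes, times)
reports = {
    "chromium": {
        "cgraph_create_edge_1": (274319552, 2637688),
        "cgraph_allocate_node": (426228128, 1299476),
        "streamer_alloc_tree": (2536849136, 69421368),
    },
    "firefox": {
        "cgraph_create_edge_1": (130358176, 1253444),
        "cgraph_allocate_node": (182236800, 555600),
        "streamer_alloc_tree": (639230248, 13496198),
    },
}


def times_ratio(site):
    """Chromium-to-Firefox ratio of the Times column for one site."""
    return reports["chromium"][site][1] / reports["firefox"][site][1]


def leak_per_alloc(project, site):
    """Leaked bytes divided by Times; only meaningful for sites where
    Garbage and Freed are zero (the two cgraph.c rows here)."""
    leak, times = reports[project][site]
    return leak / times


for site in reports["chromium"]:
    print(f"{site}: {times_ratio(site):.2f}x as many allocations in Chromium")

for site in ("cgraph_create_edge_1", "cgraph_allocate_node"):
    print(f"{site}: {leak_per_alloc('chromium', site):.0f} bytes per allocation")
```

The per-allocation figure is the same for both projects on the cgraph.c rows,
so the difference there comes from the number of nodes and edges rather than
from their size.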

Honza
