On 09/27/2014 07:59 AM, Markus Trippelsdorf wrote:
> On 2014.09.27 at 01:27 +0200, Jan Hubicka wrote:
>>> While a plain Firefox -flto build works fine, the LTO/PGO build fails with:
>>>
>>> lto1: internal compiler error: in ipa_merge_profiles, at ipa-utils.c:540
>>> 0x7d6165 ipa_merge_profiles(cgraph_node*, cgraph_node*)
>>>         ../../gcc/gcc/ipa-utils.c:540
>>> 0xf10c41 ipa_icf::sem_function::merge(ipa_icf::sem_item*)
>>>         ../../gcc/gcc/ipa-icf.c:753
>>> 0xf15206 ipa_icf::sem_item_optimizer::merge_classes(unsigned int)
>>>         ../../gcc/gcc/ipa-icf.c:2706
>>> 0xf1c1f4 ipa_icf::sem_item_optimizer::execute()
>>>         ../../gcc/gcc/ipa-icf.c:2098
>>> 0xf1d3f1 ipa_icf_driver
>>>         ../../gcc/gcc/ipa-icf.c:2784
>>> 0xf1d3f1 ipa_icf::pass_ipa_icf::execute(function*)
>>>         ../../gcc/gcc/ipa-icf.c:2831
>>>
>>>
>>> The pass is also very memory hungry (from 3GB without ICF to 4GB during
>>> libxul link), while the code size savings are in the 1% range.
>>
>> Thanks for checking. I was just thinking about doing that myself.  Would
>> you mind posting the -ftime-report of the Firefox WPA stage?
> 
> (without ICF)
> Execution times (seconds)
>  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
>  phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%) wall  403069 kB (12%) ggc
>  phase stream in         :  30.24 (33%) usr   0.97 (23%) sys  33.90 (22%) wall 2944210 kB (88%) ggc
>  phase stream out        :   4.29 ( 5%) usr   1.32 (31%) sys  57.32 (38%) wall       0 kB ( 0%) ggc
>  phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.13 ( 0%) wall       0 kB ( 0%) ggc
>  garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%) wall       0 kB ( 0%) ggc
>  callgraph optimization  :   0.50 ( 1%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
>  ipa dead code removal   :   6.91 ( 7%) usr   0.08 ( 2%) sys   7.25 ( 5%) wall       0 kB ( 0%) ggc
>  ipa virtual call target :   7.08 ( 8%) usr   0.04 ( 1%) sys   6.93 ( 5%) wall       0 kB ( 0%) ggc
>  ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall   10365 kB ( 0%) ggc
>  ipa cp                  :   1.81 ( 2%) usr   0.06 ( 1%) sys   3.40 ( 2%) wall  173701 kB ( 5%) ggc
>  ipa inlining heuristics :  16.60 (18%) usr   0.27 ( 6%) sys  17.48 (12%) wall  532704 kB (16%) ggc
>  ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple out      :   0.21 ( 0%) usr   0.04 ( 1%) sys   0.97 ( 1%) wall       0 kB ( 0%) ggc
>  ipa lto decl in         :  18.29 (20%) usr   0.54 (13%) sys  18.96 (12%) wall 2226088 kB (66%) ggc
>  ipa lto decl out        :   3.93 ( 4%) usr   0.13 ( 3%) sys   4.06 ( 3%) wall       0 kB ( 0%) ggc
>  ipa lto constructors in :   0.24 ( 0%) usr   0.03 ( 1%) sys   0.59 ( 0%) wall   14226 kB ( 0%) ggc
>  ipa lto constructors out:   0.08 ( 0%) usr   0.04 ( 1%) sys   0.15 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.89 ( 1%) usr   0.12 ( 3%) sys   1.02 ( 1%) wall  364151 kB (11%) ggc
>  ipa lto decl merge      :   2.14 ( 2%) usr   0.01 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
>  ipa lto cgraph merge    :   1.59 ( 2%) usr   0.00 ( 0%) sys   1.60 ( 1%) wall   12716 kB ( 0%) ggc
>  whopr wpa               :   1.54 ( 2%) usr   0.03 ( 1%) sys   1.55 ( 1%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O           :   0.04 ( 0%) usr   1.11 (26%) sys  52.10 (34%) wall       0 kB ( 0%) ggc
>  whopr partitioning      :   5.02 ( 5%) usr   0.01 ( 0%) sys   5.03 ( 3%) wall    4938 kB ( 0%) ggc
>  ipa reference           :   2.04 ( 2%) usr   0.02 ( 0%) sys   2.08 ( 1%) wall       0 kB ( 0%) ggc
>  ipa profile             :   0.32 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   2.43 ( 3%) usr   0.02 ( 0%) sys   2.49 ( 2%) wall       0 kB ( 0%) ggc
>  tree STMT verifier      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
>  callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%) wall       0 kB ( 0%) ggc
>  dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
>  varconst                :   0.01 ( 0%) usr   0.03 ( 1%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo        :   0.69 ( 1%) usr   0.00 ( 0%) sys   0.69 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 :  92.91             4.29           151.73            3348693 kB
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
> 
> (with ICF)
> Execution times (seconds)
>  phase setup             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1412 kB ( 0%) ggc
>  phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%) wall 1468975 kB (33%) ggc
>  phase stream in         :  30.46 (26%) usr   1.02 (16%) sys  31.48 (16%) wall 2944210 kB (67%) ggc
>  phase stream out        :   4.52 ( 4%) usr   1.90 (30%) sys  73.47 (38%) wall      12 kB ( 0%) ggc
>  phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
>  garbage collection      :   7.01 ( 6%) usr   0.00 ( 0%) sys   6.99 ( 4%) wall       0 kB ( 0%) ggc
>  callgraph optimization  :   0.49 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 0%) wall     166 kB ( 0%) ggc
>  ipa dead code removal   :   6.98 ( 6%) usr   0.13 ( 2%) sys   6.89 ( 4%) wall       0 kB ( 0%) ggc
>  ipa virtual call target :   6.93 ( 6%) usr   0.03 ( 0%) sys   7.20 ( 4%) wall       6 kB ( 0%) ggc
>  ipa devirtualization    :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall   10365 kB ( 0%) ggc
>  ipa cp                  :   1.87 ( 2%) usr   0.11 ( 2%) sys   2.00 ( 1%) wall  167204 kB ( 4%) ggc
>  ipa inlining heuristics :  17.15 (15%) usr   0.21 ( 3%) sys  17.35 ( 9%) wall  512636 kB (12%) ggc
>  ipa comdats             :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.19 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto gimple in       :   5.17 ( 4%) usr   1.04 (17%) sys   6.51 ( 3%) wall  855058 kB (19%) ggc
>  ipa lto gimple out      :   0.38 ( 0%) usr   0.08 ( 1%) sys   3.07 ( 2%) wall      12 kB ( 0%) ggc
>  ipa lto decl in         :  18.38 (16%) usr   0.56 ( 9%) sys  18.95 (10%) wall 2226088 kB (50%) ggc
>  ipa lto decl out        :   3.95 ( 3%) usr   0.08 ( 1%) sys   4.03 ( 2%) wall       0 kB ( 0%) ggc
>  ipa lto constructors in :   0.29 ( 0%) usr   0.01 ( 0%) sys   0.29 ( 0%) wall   14389 kB ( 0%) ggc
>  ipa lto constructors out:   0.10 ( 0%) usr   0.03 ( 0%) sys   0.58 ( 0%) wall       0 kB ( 0%) ggc
>  ipa lto cgraph I/O      :   0.91 ( 1%) usr   0.10 ( 2%) sys   1.02 ( 1%) wall  364151 kB ( 8%) ggc
>  ipa lto decl merge      :   2.14 ( 2%) usr   0.00 ( 0%) sys   2.14 ( 1%) wall    8196 kB ( 0%) ggc
>  ipa lto cgraph merge    :   1.65 ( 1%) usr   0.01 ( 0%) sys   1.66 ( 1%) wall   12716 kB ( 0%) ggc
>  whopr wpa               :   1.81 ( 2%) usr   0.01 ( 0%) sys   1.85 ( 1%) wall       1 kB ( 0%) ggc
>  whopr wpa I/O           :   0.05 ( 0%) usr   1.71 (27%) sys  65.75 (34%) wall       0 kB ( 0%) ggc
>  whopr partitioning      :   5.05 ( 4%) usr   0.00 ( 0%) sys   5.06 ( 3%) wall    5012 kB ( 0%) ggc
>  ipa reference           :   2.13 ( 2%) usr   0.03 ( 0%) sys   2.16 ( 1%) wall       0 kB ( 0%) ggc
>  ipa profile             :   0.32 ( 0%) usr   0.01 ( 0%) sys   0.33 ( 0%) wall       0 kB ( 0%) ggc
>  ipa pure const          :   2.57 ( 2%) usr   0.00 ( 0%) sys   2.56 ( 1%) wall       0 kB ( 0%) ggc
>  ipa icf                 :   6.88 ( 6%) usr   0.08 ( 1%) sys   7.01 ( 4%) wall     855 kB ( 0%) ggc
>  tree SSA rewrite        :   0.23 ( 0%) usr   0.06 ( 1%) sys   0.28 ( 0%) wall   33946 kB ( 1%) ggc
>  tree SSA incremental    :   0.42 ( 0%) usr   0.05 ( 1%) sys   0.53 ( 0%) wall   21099 kB ( 0%) ggc
>  tree operand scan       :   0.47 ( 0%) usr   0.08 ( 1%) sys   0.34 ( 0%) wall  181275 kB ( 4%) ggc
>  tree STMT verifier      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  callgraph verifier      :  22.76 (19%) usr   1.68 (27%) sys  24.44 (13%) wall       0 kB ( 0%) ggc
>  dominance frontiers     :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
>  dominance computation   :   0.19 ( 0%) usr   0.05 ( 1%) sys   0.25 ( 0%) wall       0 kB ( 0%) ggc
>  varconst                :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
>  loop fini               :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
>  unaccounted todo        :   0.82 ( 1%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall       0 kB ( 0%) ggc
>  TOTAL                 : 117.68             6.23           191.15            4414612 kB
> Extra diagnostic checks enabled; compiler may run slowly.
> Configure with --enable-checking=release to disable checks.
> 
>> It seems that in this case we reject too many of the equality candidates?
>> I think the original numbers were about 4-5%, but later some equivalences
>> were disabled because of devirt/aliasing issues. Did you compare with gold
>> ICF enabled? There are quite a few obvious improvements to the analysis that
>> can be done, but I guess we need to analyze the interesting cases one by one.
> 
> Gold ICF was enabled (-Wl,--icf=all,--icf-iterations=3).
> 

Hi.

Thank you, Markus, for presenting the numbers; they correspond with what I
measured. If I read them correctly, the IPA ICF pass itself takes about
7 seconds, and the rest of the increase is split between the verifier (not
interesting for a release build of the compiler) and 'phase opt and
generate'. I have no idea what makes the difference there.
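
To narrow that down, the two reports can be diffed timer by timer. Below is a minimal Python sketch, not part of GCC: the helper names parse_usr and usr_deltas are made up for illustration, and the sample rows are a handful copied from the two reports quoted above. It only looks at the usr column. The 'phase ...' rows appear to be totals that include the individual timers, so they should not be added to the per-pass rows.

```python
import re

# Matches the timer name and the usr column of a -ftime-report row, e.g.
# " ipa icf                 :   6.88 ( 6%) usr ...".
# Leading "> " quote markers are skipped; rows without a "(N%) usr" column
# (wrapped "wall ..." continuation lines, the TOTAL row) do not match.
ROW = re.compile(
    r"^[>\s]*(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr"
)


def parse_usr(report):
    """Return {timer name: usr seconds} for every row with a usr column."""
    times = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            times[m.group("name")] = float(m.group("usr"))
    return times


def usr_deltas(before, after):
    """Per-timer usr-time change (after - before), largest increase first.

    A timer missing from one report is treated as 0.0 seconds there.
    """
    names = set(before) | set(after)
    deltas = {
        n: round(after.get(n, 0.0) - before.get(n, 0.0), 2) for n in names
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# A few rows copied from the two reports in this thread.
WITHOUT_ICF = """
>  phase opt and generate  :  58.38 (63%) usr   2.00 (47%) sys  60.37 (40%)
>  garbage collection      :   3.68 ( 4%) usr   0.00 ( 0%) sys   3.68 ( 2%)
>  callgraph verifier      :  16.31 (18%) usr   1.69 (39%) sys  17.96 (12%)
"""

WITH_ICF = """
>  phase opt and generate  :  82.70 (70%) usr   3.31 (53%) sys  86.17 (45%)
>  garbage collection      :   7.01 ( 6%) usr   0.00 ( 0%) sys   6.99 ( 4%)
>  ipa lto gimple in       :   5.17 ( 4%) usr   1.04 (17%) sys   6.51 ( 3%)
>  ipa icf                 :   6.88 ( 6%) usr   0.08 ( 1%) sys   7.01 ( 4%)
>  callgraph verifier      :  22.76 (19%) usr   1.68 (27%) sys  24.44 (13%)
"""

if __name__ == "__main__":
    rows = usr_deltas(parse_usr(WITHOUT_ICF), parse_usr(WITH_ICF))
    for name, delta in rows:
        print(f"{name:<26} {delta:+7.2f} s usr")
```

On these sample rows, 'ipa lto gimple in' and 'garbage collection' show up next to 'ipa icf' and the verifier as contributors to the larger 'phase opt and generate' figure. Feeding in the full reports would show the smaller tree passes as well.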

Martin
