On 10/30/18 10:46 PM, Jan Hubicka wrote:
> Hi,
> this patch increases lto-partitions to 128.  This makes the ltrans.o file sizes
> grow from 458MB to 651MB, which is still not perfect but a lot better than
> previously.  On Firefox the growth is smaller (only about 10%), which is
> probably caused by the "unified build" they use, where they merge multiple
> sources via #include to reduce the number of objects "only" to about 8000.
> I will do testing w/o unified build this week as well.

Hi.

That sounds promising!

> 
> What is interesting, however, is that even on my 8-core 16-hyperthread Bulldozer
> machine this reduces both overall time and user time:
> 
> partitions    real              user               sys      
> 16:           4m25.586s         30m0.760s          0m21.772s
> 32:           4m16.163s         28m58.992s         0m28.996s
> 32:           3m17.889s         28m57.012s         0m29.084s
> 64:           2m55.663s         27m46.344s         0m39.568s
> 64:           2m57.010s         27m48.812s         0m39.192s
> 128:          2m52.978s         27m43.616s         0m47.964s
> 256:          2m54.915s         27m56.324s         1m2.272s 
> 512:          3m2.762s          28m20.696s         1m25.616s
> 512:          3m1.851s          28m20.124s         1m23.812s
> 
> 1to1:         4m34.263s         31m49.760s         1m56.804s
> 
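To put the timing table above into ratios, here is a minimal sketch in C. The helper names `parse_time` and `speedup` are made up for illustration; the only inputs are the `real` and `user` strings copied from the table.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a time(1)-style string such as "4m25.586s" to seconds.
   Returns a negative value if the string does not have that shape.
   (Hypothetical helper, not part of GCC.) */
double parse_time(const char *s)
{
    int minutes = 0;
    double seconds = 0.0;

    assert(s != NULL);
    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio baseline / candidate: a value above 1.0 means the candidate
   configuration finished faster than the baseline. */
double speedup(const char *baseline, const char *candidate)
{
    double b = parse_time(baseline);
    double c = parse_time(candidate);

    assert(b > 0.0 && c > 0.0);
    return b / c;
}
```

Calling `speedup("4m25.586s", "2m52.978s")` compares the 16-partition and 128-partition wall-clock times and gives a ratio of roughly 1.5, while the same comparison on user time (`"30m0.760s"` against `"27m43.616s"`) comes out below 1.1.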
> Firefox actually prefers even more partitions: it seems that the ideal size for
> partition memory use is about 80MB, which is probably hard to achieve
> generally.
> I plan to fine-tune this at the beginning of stage3 but I want to increase
> partitioning now so we hit possible negative performance effects earlier.
> 
> The WPA stage has some obvious bottlenecks:
> Time variable                                   usr           sys          wall               GGC
>  phase opt and generate             :  39.34 ( 75%)   0.62 (  6%)  39.98 ( 65%)  360751 kB ( 26%)
>  phase stream in                    :  11.88 ( 23%)   0.46 (  5%)  12.36 ( 20%) 1050929 kB ( 74%)
>  ipa function summary               :   0.17 (  0%)   0.03 (  0%)   0.23 (  0%)   68036 kB (  5%)
>  ipa cp                             :   0.83 (  2%)   0.07 (  1%)   0.98 (  2%)  127680 kB (  9%)
>  ipa inlining heuristics            :  30.90 ( 59%)   0.05 (  1%)  30.96 ( 50%)  118731 kB (  8%)
>  lto stream inflate                 :   2.94 (  6%)   0.15 (  2%)   2.95 (  5%)       0 kB (  0%)
>  ipa lto gimple in                  :   1.10 (  2%)   0.32 (  3%)   1.32 (  2%)  162967 kB ( 12%)
>  ipa lto decl in                    :   7.51 ( 14%)   0.18 (  2%)   7.77 ( 13%)  748707 kB ( 53%)
>  whopr partitioning                 :   1.45 (  3%)   0.02 (  0%)   1.48 (  2%)    5451 kB (  0%)
>  ipa icf                            :   2.71 (  5%)   0.07 (  1%)   2.76 (  4%)   12571 kB (  1%)
>  TOTAL                              :  52.15          9.62         61.86        1413731 kB
> 
>  - we may be in a position to look for a faster compression library (to save 6% of WPA)
>  - icf and profile merging still bring in too many function bodies (to save 12% of GGC memory)

I will take a look at ICF; maybe we can make the hash function more fine-grained.

>  - inliner got slower. Reason is twofold. It now spends about 15% in the
>    hashtable mapping summaries to symbol nodes (we used to have an array which
>    was removed by Martin)

Is the problematic one ipa_call_summaries?

>    and we do spend a lot of time in sreal computation.
>    This can be micro-optimized + I have some patches to speed it up noticeably
>    by getting function contexts handled better.

I can also help with that if you guide me.

Martin
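The array-versus-hashtable point about summary lookup can be illustrated with a small sketch. Everything here is a hypothetical stand-in (`struct node`, `struct summary`, the table sizes, the hash function); it is not GCC's summary code, only a picture of why a uid-indexed array lookup does less work per query than a pointer-keyed hash probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES  1024   /* hypothetical upper bound on node uids */
#define TABLE_SIZE 2048   /* power of two; assumed never to fill up */

struct node    { int uid; };    /* stand-in for a symbol node */
struct summary { int size; };   /* stand-in for a per-function summary */

/* Variant 1: array indexed by uid, one bounds check and one load. */
struct summary *by_uid[MAX_NODES];

struct summary *array_get(const struct node *n)
{
    assert(n->uid >= 0 && n->uid < MAX_NODES);
    return by_uid[n->uid];
}

/* Variant 2: open-addressing hash table keyed by the node pointer.
   Each lookup hashes the pointer and may probe several slots. */
struct slot { const struct node *key; struct summary *value; };
struct slot table[TABLE_SIZE];

static size_t hash_ptr(const struct node *n)
{
    return (size_t)(((uintptr_t)n >> 3) * 2654435761u) & (TABLE_SIZE - 1);
}

void hash_put(const struct node *n, struct summary *s)
{
    size_t i = hash_ptr(n);

    /* Linear probing: stop at an empty slot or at the same key. */
    while (table[i].key != NULL && table[i].key != n)
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i].key = n;
    table[i].value = s;
}

struct summary *hash_get(const struct node *n)
{
    size_t i = hash_ptr(n);

    while (table[i].key != NULL) {
        if (table[i].key == n)
            return table[i].value;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;   /* no summary recorded for this node */
}
```

Both variants return the same summary for a given node; the difference is the per-lookup cost, which matters when the inliner queries summaries in a hot loop.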

>  - I have noticed that ltrans spends an absurd amount of time in
>    lookup_external_ref (up to 20% in large partitions), which may affect the
>    table above in favour of more partitioning.
> 
> Still we could get important wins by reducing the amount of decl streaming
> (I will do some tests on simplifying function types, arrays and enums to see
> if there is low-hanging fruit left) but we do a lot better than ever before.
> 
> Bootstrapped/regtested x86_64-linux, committed.
> 
> Honza
> 
> 
>       * params.def (lto-partitions): Set to 128 (instead of 32).
> Index: params.def
> ===================================================================
> --- params.def        (revision 265573)
> +++ params.def        (working copy)
> @@ -1103,7 +1103,7 @@ DEFPARAM (PARAM_IPA_MAX_AA_STEPS,
>  DEFPARAM (PARAM_LTO_PARTITIONS,
>         "lto-partitions",
>         "Number of partitions the program should be split to.",
> -       32, 1, 0)
> +       128, 1, 0)
>  
>  DEFPARAM (MIN_PARTITION_SIZE,
>         "lto-min-partition",
> 
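For readers unfamiliar with the `DEFPARAM` style in the diff above, here is a rough, self-contained sketch of the X-macro pattern such a `.def` file typically relies on. The `SKETCH_*` names and the struct layout are invented for illustration and the help string is dropped for brevity; only the `"lto-partitions"` name and the `128, 1, 0` triple come from the patch. Reading `1, 0` as "minimum 1, no upper bound" is my assumption about the convention, marked as such in the code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a params.def-style list: each entry is
   (enum id, option name, default, minimum, maximum). */
#define SKETCH_PARAMS(X) \
    X(SKETCH_LTO_PARTITIONS, "lto-partitions", 128, 1, 0)

/* First expansion: one enum constant per entry. */
#define AS_ENUM(id, name, def, min, max) id,
enum sketch_param { SKETCH_PARAMS(AS_ENUM) SKETCH_PARAM_COUNT };

struct sketch_param_info {
    const char *name;
    int def, min, max;
};

/* Second expansion: one table row per entry, in the same order. */
#define AS_INFO(id, name, def, min, max) { name, def, min, max },
const struct sketch_param_info sketch_params[] = { SKETCH_PARAMS(AS_INFO) };

/* Returns 1 if VALUE is acceptable for parameter P, 0 otherwise.
   Assumption: a maximum that is not greater than the minimum
   (as in "1, 0") means there is no upper bound. */
int sketch_param_valid(enum sketch_param p, int value)
{
    assert((int)p >= 0 && (int)p < SKETCH_PARAM_COUNT);

    const struct sketch_param_info *info = &sketch_params[p];
    if (value < info->min)
        return 0;
    if (info->max > info->min && value > info->max)
        return 0;
    return 1;
}
```

With this layout, changing a default is a one-token edit in the list, which is what the patch does. A user who wants a different value for a particular build can still override it on the command line with `--param lto-partitions=N`.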
