On 3/13/20 4:11 PM, Jan Hubicka wrote:
$ time g++ -O2 /tmp/gimple-match.ii -c -flto -fno-checking
real    0m8.709s
user    0m8.543s

WPA+LTRANS:

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=4 -fno-checking
real    0m11.220s
user    0m33.067s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=6 -fno-checking
real    0m9.880s
user    0m35.599s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o --param lto-partitions=8 -fno-checking
real    0m6.681s
user    0m39.746s

default:
$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -r -o gimple-match2.o -fno-checking
real    0m6.065s
user    1m22.698s
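
To compare these runs at a glance, the user/real ratio gives a rough idea of how many cores were kept busy on average. A minimal sketch, assuming POSIX sh and awk; the helper names to_seconds and busy_cores are made up for illustration, and the numbers fed in are the timings quoted above:

```shell
# Convert a `time` value such as "1m22.698s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# busy_cores REAL USER: user time divided by real time, i.e. roughly
# the average number of cores in use during the run.
busy_cores() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f", u / r }'
}

# Columns: lto-partitions value, real, user (copied from the runs above).
while read -r label real user; do
    printf '%-10s %s\n' "$label" "$(busy_cores "$real" "$user")"
done <<'EOF'
4 0m11.220s 0m33.067s
6 0m9.880s 0m35.599s
8 0m6.681s 0m39.746s
default 0m6.065s 1m22.698s
EOF
```

A ratio that keeps growing while real time stops improving means the extra partitions mostly add CPU work rather than shorten the build.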

I did
/aux/hubicka/trunk-git/build2/./prev-gcc/xg++ 
-B/aux/hubicka/trunk-git/build2/./prev-gcc/ 
-B/usr/local/x86_64-pc-linux-gnu/bin/ -nostdinc++ 
-B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-B/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
 
-I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu
 -I/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/include 
-I/aux/hubicka/trunk-git/libstdc++-v3/libsupc++ 
-L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/src/.libs 
-L/aux/hubicka/trunk-git/build2/prev-x86_64-pc-linux-gnu/libstdc++-v3/libsupc++/.libs
 -fno-PIE -c   -g -O2 -fchecking=0  -DIN_GCC     -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
-I../../gcc -I../../gcc/.  -I../../gcc/../include -I../../gcc/../libcpp/include 
-I/aux/hubicka/trunk-git/build2/./gmp -I/aux/hubicka/trunk-git/gmp 
-I/aux/hubicka/trunk-git/build2/./mpfr/src -I/aux/hubicka/trunk-git/mpfr/src 
-I/aux/hubicka/trunk-git/mpc/src -I../../gcc/../libdecnumber 
-I../../gcc/../libdecnumber/bid -I../libdecnumber -I../../gcc/../libbacktrace 
-I/aux/hubicka/trunk-git/build2/./isl/include 
-I/aux/hubicka/trunk-git/isl/include  -o gimple-match.o -MT gimple-match.o -MMD 
-MP -MF ./.deps/gimple-match.TPo gimple-match.c -flto

(copied from the build, disabling checking and adding -flto) and I get:
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time 
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel 
gimple-match.o -fno-checking --param lto-partitions=128 -r

real    0m10.394s
user    2m13.809s
sys     0m3.896s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time 
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel 
gimple-match.o -fno-checking --param lto-partitions=8 -r

real    0m21.033s
user    2m3.063s
sys     0m2.539s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time 
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel 
gimple-match.o -fno-checking --param lto-partitions=6 -r

real    0m23.975s
user    1m56.139s
sys     0m2.595s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time 
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel 
gimple-match.o -fno-checking --param lto-partitions=4 -r

real    0m32.383s
user    1m39.411s
sys     0m2.213s

With debug info disabled (like you do, but I guess in a less realistic
setting) I get:

hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=128 -r

real    0m10.905s
user    1m55.065s
sys     0m2.956s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=8 -r

real    0m17.297s
user    1m26.513s
sys     0m1.626s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=6 -r

real    0m22.365s
user    1m30.969s
sys     0m1.386s
hubicka@lomikamen-jh:/aux/hubicka/trunk-git/build2/gcc$ time
/aux/hubicka/trunk-install/bin/gcc -flto=auto -flinker-output=nolto-rel
gimple-match.o -fno-checking --param lto-partitions=4 -r

real    0m26.534s
user    1m21.593s
sys     0m0.902s

So I do not see such a notable difference in user times (but they are
consistently worse than yours). Perhaps you could try to perf it,
including the system profile? It may give us some idea why things behave
differently.

That's strange. So let's take my gimple-match.ii:
https://drive.google.com/file/d/1B8d3bIvz1KA_ksIo8h-JgkaJTCRiSPR4/view?usp=sharing

For gcc9 package (LTO+PGO) I get:

$ time g++ -O2 gimple-match.ii -c -flto
real    0m8.180s
user    0m7.992s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=4 -r

real    0m9.041s
user    0m28.157s
sys     0m0.493s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=128 -r

real    0m6.011s
user    1m20.326s
sys     0m2.147s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking -r

real    0m6.303s
user    1m18.789s
sys     0m2.244s

$ time gcc -flto=auto -flinker-output=nolto-rel gimple-match.o -fno-checking --param lto-partitions=8 -r

real    0m5.875s
user    0m38.938s
sys     0m0.784s

For default I get:

perf report --stdio | head -n30
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 351K of event 'cycles:u'
# Event count (approx.): 341558047686
#
# Overhead  Command          Shared Object                Symbol
# ........  ...............  ...........................  ............................................................................
#
     3.61%  lto1-ltrans      lto1                         [.] df_worklist_dataflow
     1.93%  lto1-ltrans      lto1                         [.] cleanup_cfg
     1.15%  lto1-ltrans      lto1                         [.] init_alias_analysis
     1.02%  lto1-ltrans      lto1                         [.] pre_and_rev_post_order_compute_fn
     0.93%  lto1-ltrans      lto1                         [.] calculate_dominance_info
     0.84%  lto1-ltrans      lto1                         [.] inverted_post_order_compute
     0.75%  lto1-ltrans      lto1                         [.] post_order_compute
     0.71%  lto1-ltrans      libc-2.31.so                 [.] _int_malloc
     0.69%  lto1-ltrans      lto1                         [.] constrain_operands
     0.68%  lto1-ltrans      lto1                         [.] df_bb_refs_record
     0.59%  lto1-ltrans      lto1                         [.] side_effects_p
     0.53%  lto1-ltrans      lto1                         [.] delete_unreachable_blocks
     0.53%  lto1-ltrans      lto1                         [.] rewrite_update_dom_walker::before_dom_children
     0.49%  lto1-ltrans      lto1                         [.] bitmap_set_bit
     0.47%  lto1-ltrans      lto1                         [.] record_temporary_equivalences
     0.46%  lto1-ltrans      lto1                         [.] single_def_use_dom_walker::before_dom_children
     0.46%  lto1-ltrans      lto1                         [.] df_compact_blocks
     0.45%  lto1-ltrans      lto1                         [.] substitute_and_fold_engine::substitute_and_fold
     0.45%  lto1-ltrans      libc-2.31.so                 [.] _int_free


Martin


The compiler binary I use is profiledbootstrapped with LTO.

Honza

So I would recommend setting the param value to 75000, which leads to 6 
partitions. That would be:

9+10s = 19s vs. 40s (total real time 44s). That seems reasonable to me.
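
The proposed value can presumably be tried without rebuilding the compiler by passing the param on the command line. A minimal sketch, assuming the same gimple-match.o as in the runs above; lto_cmd is a made-up helper that only assembles the command line:

```shell
# Build the timing command for a given --param lto-min-partition value.
# The remaining flags are the ones used for the measurements in this thread.
lto_cmd() {
    printf '%s' "gcc -flto=auto -flinker-output=nolto-rel gimple-match.o" \
                " -fno-checking --param lto-min-partition=$1 -r"
}

# Print the command for the proposed default.
echo "$(lto_cmd 75000)"

# To actually time it (needs gcc and gimple-match.o in the current directory):
#   time $(lto_cmd 75000)
```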

Thoughts?
Thanks,
Martin

gcc/ChangeLog:

2020-03-13  Martin Liska  <mli...@suse.cz>

        * params.opt: Bump lto-min-partition in order to not create
        too many LTRANS.
---
  gcc/params.opt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



diff --git a/gcc/params.opt b/gcc/params.opt
index e39216aa7d0..49fafac20af 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -363,7 +363,7 @@ Common Joined UInteger Var(param_max_lto_streaming_parallelism) Init(32) Integer
  maximal number of LTO partitions streamed in parallel.
-param=lto-min-partition=
-Common Joined UInteger Var(param_min_partition_size) Init(10000) Param
+Common Joined UInteger Var(param_min_partition_size) Init(75000) Param
  Minimal size of a partition for LTO (in estimated instructions).
-param=lto-partitions=


