http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57706
Bug ID: 57706
Summary: LRA is bottleneck while compiling LTO firefox
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
One of the ltrans partitions gets stuck while building Firefox, with the
following profile:
CPU: AMD64 family10, speed 2100 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask
of 0x00 (No unit mask) count 750000
samples  %        image name      app name        symbol name
84432    27.1889  lto1            lto1            ggc_internal_alloc_stat(unsigned long)
5490     1.7679   libc-2.11.1.so  libc-2.11.1.so  _int_malloc
4746     1.5283   lto1            lto1            bitmap_set_bit(bitmap_head_def*, int)
4155     1.3380   libc-2.11.1.so  libc-2.11.1.so  memset
3190     1.0272   lto1            lto1            hash_table_mod1(unsigned int, unsigned int)
3029     0.9754   lto1            lto1            for_each_rtx_1(rtx_def*, int, int (*)(rtx_def**, void*), void*)
2860     0.9210   lto1            lto1            bitmap_bit_p(bitmap_head_def*, int)
2325     0.7487   lto1            lto1            df_note_compute(bitmap_head_def*)
2173     0.6998   as              as              hash_lookup
2102     0.6769   lto1            lto1            record_reg_classes(int, int, rtx_def**, machine_mode*, char const**, rtx_def*, reg_class*)
1859     0.5986   lto1            lto1            constrain_operands(int)
1804     0.5809   lto1            lto1            hash_table<variable_hasher, xcallocator>::find_slot_with_hash(void const*, unsigned int, insert_option)
1674     0.5391   libc-2.11.1.so  libc-2.11.1.so  malloc
1660     0.5346   lto1            lto1            operand_equal_p(tree_node const*, tree_node const*, unsigned int)
1653     0.5323   lto1            lto1            htab_find_slot_with_hash
1543     0.4969   libc-2.11.1.so  libc-2.11.1.so  _int_free
1538     0.4953   lto1            lto1            get_attr_enabled(rtx_def*)
1511     0.4866   lto1            lto1            mem_attrs_eq_p(mem_attrs const*, mem_attrs const*)
1376     0.4431   libc-2.11.1.so  libc-2.11.1.so  malloc_consolidate
integrated RA : 57.28 (11%) usr 0.21 ( 3%) sys 57.51 (11%) wall 382450 kB (106%) ggc
LRA non-specific : 5.35 ( 1%) usr 0.02 ( 0%) sys 5.43 ( 1%) wall 24447 kB ( 7%) ggc
LRA virtuals elimination: 0.35 ( 0%) usr 0.01 ( 0%) sys 0.35 ( 0%) wall 8263 kB ( 2%) ggc
LRA reload inheritance : 0.64 ( 0%) usr 0.01 ( 0%) sys 0.78 ( 0%) wall 11556 kB ( 3%) ggc
LRA create live ranges : 1.11 ( 0%) usr 0.00 ( 0%) sys 0.89 ( 0%) wall 2973 kB ( 1%) ggc
LRA hard reg assignment : 166.89 (33%) usr 0.03 ( 0%) sys 166.96 (33%) wall 0 kB ( 0%) ggc
LRA coalesce pseudo regs: 0.02 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
reload : 0.13 ( 0%) usr 0.01 ( 0%) sys 0.18 ( 0%) wall 0 kB ( 0%) ggc
reload CSE regs : 10.24 ( 2%) usr 0.04 ( 1%) sys 10.31 ( 2%) wall 51758 kB (14%) ggc
load CSE after reload : 2.02 ( 0%) usr 0.01 ( 0%) sys 2.10 ( 0%) wall 185 kB ( 0%) ggc
ree : 0.21 ( 0%) usr 0.02 ( 0%) sys 0.19 ( 0%) wall 696 kB ( 0%) ggc
thread pro- & epilogue : 0.78 ( 0%) usr 0.00 ( 0%) sys 0.76 ( 0%) wall 21050 kB ( 6%) ggc
if-conversion 2 : 0.10 ( 0%) usr 0.02 ( 0%) sys 0.16 ( 0%) wall 214 kB ( 0%) ggc
combine stack adjustments: 0.13 ( 0%) usr 0.02 ( 0%) sys 0.14 ( 0%) wall 0 kB ( 0%) ggc
peephole 2 : 0.77 ( 0%) usr 0.01 ( 0%) sys 0.70 ( 0%) wall 2982 kB ( 1%) ggc
rename registers : 3.87 ( 1%) usr 0.00 ( 0%) sys 3.55 ( 1%) wall 16083 kB ( 4%) ggc
hard reg cprop : 1.61 ( 0%) usr 0.01 ( 0%) sys 1.61 ( 0%) wall 821 kB ( 0%) ggc
scheduling 2 : 11.50 ( 2%) usr 0.03 ( 0%) sys 11.47 ( 2%) wall 15888 kB ( 4%) ggc
machine dep reorg : 1.81 ( 0%) usr 0.01 ( 0%) sys 1.71 ( 0%) wall 590 kB ( 0%) ggc
reorder blocks : 1.26 ( 0%) usr 0.03 ( 0%) sys 1.12 ( 0%) wall 15841 kB ( 4%) ggc
shorten branches : 0.96 ( 0%) usr 0.00 ( 0%) sys 1.13 ( 0%) wall 0 kB ( 0%) ggc
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 69 kB ( 0%) ggc
final : 6.98 ( 1%) usr 0.46 ( 7%) sys 7.09 ( 1%) wall 129826 kB (36%) ggc
variable output : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.17 ( 0%) wall 669 kB ( 0%) ggc
symout : 15.92 ( 3%) usr 0.14 ( 2%) sys 26.70 ( 5%) wall 406238 kB (113%) ggc
variable tracking : 14.50 ( 3%) usr 0.03 ( 0%) sys 14.71 ( 3%) wall 103487 kB (29%) ggc
var-tracking dataflow : 11.07 ( 2%) usr 0.01 ( 0%) sys 10.80 ( 2%) wall 2108 kB ( 1%) ggc
var-tracking emit : 9.11 ( 2%) usr 0.02 ( 0%) sys 9.26 ( 2%) wall 119939 kB (33%) ggc
tree if-combine : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 66 kB ( 0%) ggc
straight-line strength reduction: 0.34 ( 0%) usr 0.01 ( 0%) sys 0.27 ( 0%) wall 1583 kB ( 0%) ggc
unaccounted optimizations: 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
rest of compilation : 4.49 ( 1%) usr 1.21 (17%) sys 5.41 ( 1%) wall 56815 kB (16%) ggc
remove unused locals : 0.33 ( 0%) usr 0.00 ( 0%) sys 0.37 ( 0%) wall 17 kB ( 0%) ggc
address taken : 0.23 ( 0%) usr 0.00 ( 0%) sys 0.27 ( 0%) wall 3 kB ( 0%) ggc
unaccounted todo : 2.71 ( 1%) usr 0.42 ( 6%) sys 3.19 ( 1%) wall 225 kB ( 0%) ggc
rebuild frequencies : 0.15 ( 0%) usr 0.00 ( 0%) sys 0.15 ( 0%) wall 18 kB ( 0%) ggc
repair loop structures : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall 0 kB ( 0%) ggc
TOTAL : 499.43 7.04 512.56 360511 kB
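
For triage, the per-pass timing rows above can be ranked by user time. The following is a minimal sketch in Python, assuming each row starts with the "name : usr (pct%) usr" layout seen in this report. The names top_passes, ROW and SAMPLE are hypothetical helpers, not part of GCC, and SAMPLE is only a short excerpt of the rows quoted above.

```python
import re

# Matches the user-time part of a timing row, e.g.
# "LRA hard reg assignment : 166.89 (33%) usr ..."
# Continuation lines ("0 kB ( 0%) ggc") and the TOTAL row do not match.
ROW = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*(?P<usr>\d+\.\d+)\s*\(\s*(?P<pct>\d+)%\)\s*usr"
)

def top_passes(report, n=3):
    """Return the n rows with the largest user time as (seconds, name)."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((float(m.group("usr")), m.group("name")))
    return sorted(rows, reverse=True)[:n]

# Excerpt of the rows quoted in this report (assumption: same layout throughout).
SAMPLE = """\
integrated RA : 57.28 (11%) usr 0.21 ( 3%) sys 57.51 (11%) wall
382450 kB (106%) ggc
LRA hard reg assignment : 166.89 (33%) usr 0.03 ( 0%) sys 166.96 (33%) wall
0 kB ( 0%) ggc
symout : 15.92 ( 3%) usr 0.14 ( 2%) sys 26.70 ( 5%) wall
406238 kB (113%) ggc
TOTAL : 499.43 7.04 512.56
"""

if __name__ == "__main__":
    for usr, name in top_passes(SAMPLE):
        print(f"{usr:8.2f}s  {name}")
```

On the excerpt, "LRA hard reg assignment" ranks first, matching the 33% share reported for it above.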