> > I am giving the patch brief benchmarking on profiledbootstrap and if it
> > won't cause major regression, I think we should go ahead with the patch.

Uhm, I profiledbootstrapped and we are a bit too fast to get a reasonable
oprofile.  What I get is:

7443  9.4372  lto1            lto1            lto_end_uncompression(lto_compression_stream*)
4438  5.6271  lto1            lto1            _ZL14DFS_write_treeP12output_blockP4sccsP9tree_nodebb.lto_priv.4993
2351  2.9809  lto1            lto1            lto_output_tree(output_block*, tree_node*, bool, bool)
2179  2.7628  lto1            lto1            _ZL30linemap_macro_loc_to_exp_pointP9line_mapsjPPK8line_map.lto_priv.7860
1910  2.4217  lto1            lto1            _ZL19unpack_value_fieldsP7data_inP9bitpack_dP9tree_node.lto_priv.7292
1855  2.3520  libc-2.11.1.so  libc-2.11.1.so  msort_with_tmp
1531  1.9412  lto1            lto1            streamer_string_index(output_block*, char const*, unsigned int, bool)
1530  1.9399  libc-2.11.1.so  libc-2.11.1.so  _int_malloc
1471  1.8651  lto1            lto1            do_estimate_growth(cgraph_node*)
1306  1.6559  lto1            lto1            pointer_map_insert(pointer_map_t*, void const*)
1238  1.5697  lto1            lto1            _Z28streamer_pack_tree_bitfieldsP12output_blockP9bitpack_dP9tree_node.constprop.1086
1138  1.4429  lto1            lto1            compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
1082  1.3719  lto1            lto1            streamer_write_tree_body(output_block*, tree_node*, bool)
1044  1.3237  lto1            lto1            _ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_j3vecIP9tree_node7va_heap6vl_ptrES7_S2_IP21ipa_agg_jump_function

We take 12 seconds of WPA on GCC (with my fork patch):

Execution times (seconds)
phase setup             :  0.01 ( 0%) usr  0.00 ( 0%) sys  0.02 ( 0%) wall    1412 kB ( 0%) ggc
phase opt and generate  :  4.48 (37%) usr  0.05 ( 6%) sys  4.57 (34%) wall   42983 kB ( 7%) ggc
phase stream in         :  7.21 (60%) usr  0.26 (32%) sys  7.47 (56%) wall  565102 kB (93%) ggc
phase stream out        :  0.38 ( 3%) usr  0.50 (62%) sys  1.37 (10%) wall     623 kB ( 0%) ggc
callgraph optimization  :  0.04 ( 0%) usr  0.00 ( 0%) sys  0.03 ( 0%) wall       6 kB ( 0%) ggc
ipa dead code removal   :  0.46 ( 4%) usr  0.00 ( 0%) sys  0.46 ( 3%) wall       0 kB ( 0%) ggc
ipa cp                  :  0.36 ( 3%) usr  0.01 ( 1%) sys  0.41 ( 3%) wall   38261 kB ( 6%) ggc
ipa inlining heuristics :  2.84 (24%) usr  0.05 ( 6%) sys  2.87 (21%) wall   60263 kB (10%) ggc
ipa lto gimple in       :  0.02 ( 0%) usr  0.00 ( 0%) sys  0.01 ( 0%) wall       0 kB ( 0%) ggc
ipa lto gimple out      :  0.04 ( 0%) usr  0.02 ( 2%) sys  0.06 ( 0%) wall       0 kB ( 0%) ggc
ipa lto decl in         :  6.23 (52%) usr  0.18 (22%) sys  6.40 (48%) wall  425731 kB (70%) ggc
ipa lto decl out        :  0.09 ( 1%) usr  0.01 ( 1%) sys  0.10 ( 1%) wall       0 kB ( 0%) ggc
ipa lto cgraph I/O      :  0.22 ( 2%) usr  0.02 ( 2%) sys  0.25 ( 2%) wall   60840 kB (10%) ggc
ipa lto decl merge      :  0.20 ( 2%) usr  0.00 ( 0%) sys  0.20 ( 1%) wall    1051 kB ( 0%) ggc
ipa lto cgraph merge    :  0.22 ( 2%) usr  0.01 ( 1%) sys  0.25 ( 2%) wall   17676 kB ( 3%) ggc
whopr wpa               :  0.38 ( 3%) usr  0.00 ( 0%) sys  0.35 ( 3%) wall     626 kB ( 0%) ggc
whopr wpa I/O           :  0.01 ( 0%) usr  0.47 (58%) sys  0.98 ( 7%) wall       0 kB ( 0%) ggc
whopr partitioning      :  0.18 ( 1%) usr  0.00 ( 0%) sys  0.19 ( 1%) wall       0 kB ( 0%) ggc
ipa reference           :  0.31 ( 3%) usr  0.01 ( 1%) sys  0.33 ( 2%) wall       0 kB ( 0%) ggc
ipa profile             :  0.09 ( 1%) usr  0.01 ( 1%) sys  0.10 ( 1%) wall     150 kB ( 0%) ggc
ipa pure const          :  0.29 ( 2%) usr  0.00 ( 0%) sys  0.30 ( 2%) wall       0 kB ( 0%) ggc
tree SSA incremental    :  0.02 ( 0%) usr  0.00 ( 0%) sys  0.00 ( 0%) wall     203 kB ( 0%) ggc
tree operand scan       :  0.00 ( 0%) usr  0.01 ( 1%) sys  0.00 ( 0%) wall    3512 kB ( 1%) ggc
dominance computation   :  0.01 ( 0%) usr  0.00 ( 0%) sys  0.02 ( 0%) wall       0 kB ( 0%) ggc
varconst                :  0.00 ( 0%) usr  0.00 ( 0%) sys  0.01 ( 0%) wall       0 kB ( 0%) ggc
unaccounted todo        :  0.06 ( 0%) usr  0.01 ( 1%) sys  0.09 ( 1%) wall       0 kB ( 0%) ggc
TOTAL                   : 12.08            0.81           13.43             610123 kB

Inlining heuristics was also around 25% w/o your change.  Timing matches my
experience with Firefox: growth estimation tends to be the hot function, and
with caching, badness is off the radar.

As such I think the patch is safe to go.  Thank you!

> > I was never really happy about the double use there and in fact the whole
> > fixed point arithmetic in badness computation is a mess.
> > If we had a template-based Fibonacci heap and sreal fast enough, turning
> > it all to reals would save quite some maintenance burden.
>
> Yeah, well.
>
> Richard.

Honza