Hi!

For a tramp3d -O2 compile we still have

Flat profile:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
5.10 4.53 4.53 129961 0.00 0.00 cgraph_remove_node
2.93 7.13 2.60 28401225 0.00 0.00 ggc_alloc_stat
1.65 8.60 1.47 11131801 0.00 0.00 splay_tree_splay_helper
1.38 9.83 1.23 17375270 0.00 0.00 get_stmt_operands
1.34 11.02 1.19 13906935 0.00 0.00 htab_find_slot_with_hash


with is probably the loop(s) over all clones in cgraph_remove_node which
results in all this being O(n^2) inlining all of the clones.

The obvious fix for the first loop, the removal of the node from the
clones list, is to make this list doubly-linked, there is another loop
just a few lines below with the same complexity:

for (n = *slot; n; n = n->next_clone)
if (n->global.inlined_to
|| (!n->global.inlined_to
&& !TREE_ASM_WRITTEN (n->decl) && !DECL_EXTERNAL (n->decl)))
break;
if (!n && !dump_enabled_p (TDI_tree_all))
{
DECL_SAVED_TREE (node->decl) = NULL;
DECL_STRUCT_FUNCTION (node->decl) = NULL;
DECL_INITIAL (node->decl) = error_mark_node;
}


which checks if we still need the function body. Why is this loop there
in the first place? Wouldn't it be enough to check this if the node is
the last clone? Why do we need the list of clones anyway? It looks like its purpose can be achieved with some reference counting, too?


Any ideas how to fix this remaining O(N^2) complexity for 4.0?

Thanks,
Richard.

Reply via email to