On Tuesday 01 March 2005 01:33, Jan Hubicka wrote:
> > On Monday 28 February 2005 10:25, Richard Guenther wrote:
> > > > I can only wonder why we are having this discussion just after GCC
> > > > 4.0 was branched, while it was obvious already two years ago that
> > > > inlining heuristics were going to be a difficult item with tree-ssa.
> > >
> > > There were of course complaints and discussions about this, and I even
> > > tried to tweak inlining parameters once.  See the audit trails of
> > > PR7863 and PR8704.  There were people telling me "well in branch XYZ we
> > > do so much better", as always, so I was not encouraged to pursue this
> > > further.
> > >
> > > Anyway, I think we should try the patch on mainline and I'll plan to
> > > re-submit it together with a 10% lowering of the inlining parameters
> > > compared to 3.4 (this is conservative for the mean size change for C
> > > code, for C++ we're still too high).  I personally cannot afford to do
> > > so much testing to please everyone.
> >
> > I tested your -fobey-inline patch a bit using the test case from PR8361.
> > The run was still going after 3 minutes (without the flag it takes 20s)
> > so I terminated it and took the following oprofile:
> >
> > CPU: Hammer, speed 1394.98 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> > unit mask of 0x00 (No unit mask) count 4000
> > Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask
> > of 0x00 (No unit mask) count 1000
> > samples   %        samples  %        image name  symbol name
> > 4607300   78.7190  98784    79.4179  cc1plus     cgraph_remove_edge
> > 861258    14.7152  15308    12.3070  cc1plus     cgraph_remove_node
> > 60871      1.0400  999       0.8032  cc1plus     ggc_set_mark
> > 56907      0.9723  2054      1.6513  cc1plus     cgraph_optimize
> > 36513      0.6239  1132      0.9101  cc1plus     cgraph_clone_inlined_nodes
> > 29570      0.5052  843       0.6777  cc1plus     cgraph_postorder
> > 16187      0.2766  367       0.2951  cc1plus     ggc_alloc_stat
> > 7787       0.1330  97        0.0780  cc1plus     gt_ggc_mx_cgraph_node
> > 6851       0.1171  138       0.1109  cc1plus     cgraph_edge
> > 6671       0.1140  305       0.2452  cc1plus     comptypes
> > 5776       0.0987  95        0.0764  cc1plus     gt_ggc_mx_cgraph_edge
> > 5243       0.0896  93        0.0748  cc1plus     gt_ggc_mx_lang_tree_node
> >
> > Honza, it seems the cgraph code needs whipping here.
>
> I think I can shoot down the cgraph_remove_node laziness by simple
> reference counting, but concerning removal of edges, the only alternative
> I see is going for vectors / doubly linked lists.
Doubly linked lists would mean a doubly-doubly-linked list, for the
caller and the callee, no?  Does not sound attractive.  With VECs, on
the other hand, you'd only need two integers on the cgraph edges
(the index in the caller's and the callee's edge vectors).  Sounds like
the fastest way to do this to me.  Of course, I'm high on VECs after
the edge-vector-branch work...

> I would still expect this
> time to be dominated by later inlining/compaction explosion so I would
> not take that too seriously (unless proved otherwise by
> cgraph_remove_edge being top on overall profile ;)

After half an hour, I had this:

35490975 82.9876  779873  85.0702  cc1plus  cgraph_remove_edge
 6706686 15.6821  123588  13.4812  cc1plus  cgraph_remove_node

and cc1plus was still not even into the tree optimizers by then, so I
think we can safely say this is a serious bottleneck.  Note that I've
seen these cgraph functions show up in less unrealistic test runs, too.

Gr.
Steven
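
P.S. To make the VEC idea a bit more concrete, here is a toy sketch of
what I have in mind -- untested, and all names and the plain-array
vector representation are made up, not the real cgraph or VEC code.
Each edge remembers its index in the caller's and the callee's edge
vectors, so removing it is an O(1) swap-with-last instead of a walk
over a singly linked list:

#include <stdlib.h>

struct edge;

struct node
{
  struct edge **callees;   /* edges where this node is the caller */
  size_t n_callees;
  struct edge **callers;   /* edges where this node is the callee */
  size_t n_callers;
};

struct edge
{
  struct node *caller, *callee;
  size_t callee_index;     /* position in caller->callees */
  size_t caller_index;     /* position in callee->callers */
};

/* Remove the edge at INDEX from VEC[0 .. *N) in O(1): move the last
   element into its slot and update that element's stored index.
   Order is not preserved, assuming nothing relies on edge order.  */

static void
remove_from_vec (struct edge **vec, size_t *n, size_t index,
                 int is_callee_vec)
{
  struct edge *last = vec[--*n];
  vec[index] = last;
  if (is_callee_vec)
    last->callee_index = index;
  else
    last->caller_index = index;
}

void
remove_edge (struct edge *e)
{
  remove_from_vec (e->caller->callees, &e->caller->n_callees,
                   e->callee_index, 1);
  remove_from_vec (e->callee->callers, &e->callee->n_callers,
                   e->caller_index, 0);
  free (e);
}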