On Tuesday 01 March 2005 01:33, Jan Hubicka wrote:
> > On Monday 28 February 2005 10:25, Richard Guenther wrote:
> > > > I can only wonder why we are having this discussion just after GCC
> > > > 4.0 was branched, while it was obvious already two years ago that
> > > > inlining heuristics were going to be a difficult item with tree-ssa.
> > >
> > > There were of course complaints and discussions about this, and I even
> > > tried to tweak inlining parameters once.  See the audit trails of
> > > PR7863 and PR8704.  There were people telling me "well in branch XYZ we
> > > do so much better", as always, so I was not encouraged to pursue this
> > > further.
> > >
> > > Anyway, I think we should try the patch on mainline and I'll plan to
> > > re-submit it together with a 10% lowering of the inlining parameters
> > > compared to 3.4 (this is conservative for the mean size change for C
> > > code, for C++ we're still too high).  I personally cannot afford to do
> > > so much testing to please everyone.
> >
> > I tested your -fobey-inline patch a bit using the test case from PR8361.
> > The run was still going after 3 minutes (without the flag it takes 20s)
> > so I terminated it and took the following oprofile:
> >
> > CPU: Hammer, speed 1394.98 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
> > unit mask of 0x00 (No unit mask) count 4000
> > Counted DATA_CACHE_MISSES events (Data cache misses) with a unit mask
> > of 0x00 (No unit mask) count 1000
> > samples  %        samples  %        image name  symbol name
> > 4607300  78.7190  98784    79.4179  cc1plus     cgraph_remove_edge
> > 861258   14.7152  15308    12.3070  cc1plus     cgraph_remove_node
> > 60871     1.0400  999       0.8032  cc1plus     ggc_set_mark
> > 56907     0.9723  2054      1.6513  cc1plus     cgraph_optimize
> > 36513     0.6239  1132      0.9101  cc1plus     cgraph_clone_inlined_nodes
> > 29570     0.5052  843       0.6777  cc1plus     cgraph_postorder
> > 16187     0.2766  367       0.2951  cc1plus     ggc_alloc_stat
> > 7787      0.1330  97        0.0780  cc1plus     gt_ggc_mx_cgraph_node
> > 6851      0.1171  138       0.1109  cc1plus     cgraph_edge
> > 6671      0.1140  305       0.2452  cc1plus     comptypes
> > 5776      0.0987  95        0.0764  cc1plus     gt_ggc_mx_cgraph_edge
> > 5243      0.0896  93        0.0748  cc1plus     gt_ggc_mx_lang_tree_node
> >
> > Honza, it seems the cgraph code needs whipping here.
>
> I think I can shoot down the cgraph_remove_node laziness by simple
> reference counting, but concerning removal of edges, the only alternative
> I see is going for vectors/doubly linked lists.

Doubly linked lists would mean doubly-doubly-linked list, for the caller
and the callee, no?  Does not sound attractive.  With VECs, on the other
hand, you'd only need two integers on the cgraph edges (index in the
caller and callee edge vectors).  Sounds like the fastest way to do this
to me.  Of course, I'm high on VECs after the edge-vector-branch work...
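
To make the VEC idea concrete, here is a rough sketch of what I have in
mind. It is not the real cgraph or VEC code: every name in it (struct node,
struct edge, edge_vec, add_edge, remove_edge) is made up for illustration,
and it uses a plain growable array instead of the VEC API. Each edge
remembers its slot in the caller's and the callee's edge vector, so removal
moves the last element into the hole and never walks a list.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for cgraph nodes and edges; not GCC's types.  */
struct edge;

struct edge_vec { struct edge **v; size_t len, cap; };

struct node {
  struct edge_vec callees;  /* edges for which this node is the caller */
  struct edge_vec callers;  /* edges for which this node is the callee */
};

struct edge {
  struct node *caller, *callee;
  size_t caller_idx;        /* slot in caller->callees */
  size_t callee_idx;        /* slot in callee->callers */
};

/* Append E to VEC, growing it if needed; return the slot used.  */
static size_t vec_push (struct edge_vec *vec, struct edge *e)
{
  if (vec->len == vec->cap)
    {
      size_t ncap = vec->cap ? vec->cap * 2 : 4;
      struct edge **nv = realloc (vec->v, ncap * sizeof *nv);
      if (!nv)
        abort ();
      vec->v = nv;
      vec->cap = ncap;
    }
  vec->v[vec->len] = e;
  return vec->len++;
}

/* Create an edge and record its index in both vectors.  */
static struct edge *add_edge (struct node *caller, struct node *callee)
{
  struct edge *e = malloc (sizeof *e);
  if (!e)
    abort ();
  e->caller = caller;
  e->callee = callee;
  e->caller_idx = vec_push (&caller->callees, e);
  e->callee_idx = vec_push (&callee->callers, e);
  return e;
}

/* Remove E in constant time: move the last edge of each vector into
   E's slot and update that edge's stored index.  */
static void remove_edge (struct edge *e)
{
  struct edge_vec *out = &e->caller->callees;
  struct edge *last = out->v[--out->len];
  out->v[e->caller_idx] = last;
  last->caller_idx = e->caller_idx;

  struct edge_vec *in = &e->callee->callers;
  last = in->v[--in->len];
  in->v[e->callee_idx] = last;
  last->callee_idx = e->callee_idx;

  free (e);
}
```

The catch is that this does not preserve the order of edges within a
vector, so it only works if nothing depends on that order.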

> I would still expect this
> time to be dominated by later inlining/compilation explosion so I would
> not take that too seriously (unless proved otherwise by
> cgraph_remove_edge being top on overall profile ;)

After half an hour, I had this:

35490975 82.9876  779873   85.0702  cc1plus                cgraph_remove_edge
6706686  15.6821  123588   13.4812  cc1plus                cgraph_remove_node

and cc1plus was still not into even the tree optimizers by then.
So I think we can safely say this is a serious bottleneck.

Note that I've seen these cgraph functions show up in less unrealistic
test runs also.

Gr.
Steven
