Richard Biener <rguent...@suse.de> writes:
> On Fri, 21 Feb 2014, Richard Biener wrote:
>
>> On Fri, 21 Feb 2014, Richard Biener wrote:
>>
>> > On Fri, 21 Feb 2014, Richard Biener wrote:
>> >
>> > > This fixes the slowness of RTL expansion in PR60291, which is caused
>> > > by excessive collisions in mem-attr sharing.  The issue here is
>> > > that sharing attempts happen across functions and we have a _lot_
>> > > of functions in this testcase referencing the same lexically
>> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
>> > > means those get the same hash value.  But they don't compare
>> > > equal, because an SSA name _5 from function A is of course not equal
>> > > to one from function B.
>> > >
>> > > The following fixes that by not doing mem-attr sharing across
>> > > functions, by clearing the mem-attrs hashtable in rest_of_clean_state.
>> > >
>> > > Another fix may be to do what the comment in iterative_hash_expr
>> > > says for SSA names:
>> > >
>> > >     case SSA_NAME:
>> > >       /* We can just compare by pointer.  */
>> > >       return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > >
>> > > (probably blame me for changing that to hashing the SSA version).
>> >
>> > It was lxo.
>> >
>> > > But I'm not sure that doesn't uncover issues with various hashtables
>> > > and walking them, generating code dependent on the order.  It's IMHO
>> > > just not expected that you compare function-local expressions from
>> > > different functions.
>> >
>> > Same speedup result from
>> >
>> > Index: gcc/tree.c
>> > ===================================================================
>> > --- gcc/tree.c  (revision 207960)
>> > +++ gcc/tree.c  (working copy)
>> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
>> >        }
>> >      case SSA_NAME:
>> >        /* We can just compare by pointer.  */
>> > -      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > +      return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
>> >      case PLACEHOLDER_EXPR:
>> >        /* The node itself doesn't matter.  */
>> >        return val;
>> >
>> > and from
>> >
>> > Index: gcc/tree.c
>> > ===================================================================
>> > --- gcc/tree.c  (revision 207960)
>> > +++ gcc/tree.c  (working copy)
>> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
>> >        }
>> >      case SSA_NAME:
>> >        /* We can just compare by pointer.  */
>> > -      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
>> > +      return iterative_hash_host_wide_int
>> > +        (DECL_UID (cfun->decl),
>> > +         iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
>> >      case PLACEHOLDER_EXPR:
>> >        /* The node itself doesn't matter.  */
>> >        return val;
>> >
>> > better than hashing pointers, but requiring cfun != NULL in this
>> > function isn't good either.
>> >
>> > At this point I'm more comfortable with clearing the hashtable
>> > than with changing iterative_hash_expr in any way.  It's also
>> > along the way to getting rid of the hash completely.
>> >
>> > Oh, btw, the speedup is going from
>> >
>> >  expand : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) wall  293891 kB (15%) ggc
>> >
>> > to
>> >
>> >  expand :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) wall  262544 kB (13%) ggc
>> >
>> > at -O0 (less dramatic slowness for -On).
>> >
>> > > The other thing would be to discard mem-attr sharing altogether,
>> > > but that doesn't seem appropriate at this stage (but it would
>> > > also simplify quite some code).  With only one function in RTL
>> > > at a time that shouldn't be too bad (see several suggestions
>> > > along that line, even with statistics).
>>
>> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html
>
> With the patch below to get some statistics, we see that one important
> piece of sharing not covered by the above measurements is RTX copying(?).
>
> On the testcase for this PR I get at -O1, and without the patch
> to clear the hashtable after each function:
>
>   142489 mem_attrs created (142439 for new, 50 for modification)
>   1983225 mem_attrs shared (4044 for new, 820241 for modification,
>   1158940 by rtx copying)
>   0 mem_attrs dropped
>
> and with the patch to clear after each function:
>
>   364411 mem_attrs created (144478 for new, 219933 for modification)
>   1761303 mem_attrs shared (2005 for new, 600358 for modification,
>   1158940 by rtx copying)
>   0 mem_attrs dropped
>
> while for dwarf2out.c I see, without the clearing:
>
>   24399 mem_attrs created (6929 for new, 17470 for modification)
>   102676 mem_attrs shared (10878 for new, 29265 for modification,
>   62533 by rtx copying)
>   16 mem_attrs dropped
>
> which means that completely dropping the sharing would result
> in creating 6929 + 17807 + 62533(!) vs. 24399 mem-attrs.
> That's still not a lot of overhead, given that mem-attrs take 40 bytes
> (3MB vs. 950kB).  There is also always the possibility of
> explicitly ref-counting mem-attrs to handle sharing by rtx
> copying (at least cse, fwprop, combine, ira and reload copy MEMs,
> probably some for no good reason, because MEMs are not shared),
> thus making mem-attrs unshare-on-modify.
In a thread a few years ago you talked about the possibility of going
further and folding the attributes into the MEM itself, so avoiding the
indirection and the separate allocation:

    http://thread.gmane.org/gmane.comp.gcc.patches/244464/focus=244538

(and earlier posts in the thread).  Would that still be OK?  I might
have a go if so.

Thanks,
Richard