On Fri, 21 Feb 2014, Richard Biener wrote:

> 
> This fixes the slowness of RTL expansion in PR60291 which is caused
> by excessive collisions in mem-attr sharing.  The issue here is
> that sharing attempts happens across functions and we have a _lot_
> of functions in this testcase referencing the same lexically
> equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> means those get the same hash value.  But they don't compare
> equal because an SSA name _5 from function A is of course not equal
> to one from function B.
> 
> The following fixes that by not doing mem-attr sharing across functions
> by clearing the mem-attrs hashtable in rest_of_clean_state.
> 
> Another fix may be to do what the comment in iterative_hash_expr
> says for SSA names:
> 
>     case SSA_NAME:
>       /* We can just compare by pointer.  */
>       return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> 
> (probably blame me for changing that to hashing the SSA version).

It was lxo.

> But I'm not sure that doesn't uncover issues with various hashtables and
> walking them, generating code dependent on the order.  It's IMHO just not
> expected that you compare function-local expressions from different
> functions.

Same speedup result from

Index: gcc/tree.c
===================================================================
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
       }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

and from

Index: gcc/tree.c
===================================================================
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
       }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_host_wide_int
+             (DECL_UID (cfun->decl),
+              iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

better than hashing pointers but requring cfun != NULL in this
function isn't good either.

At this point I'm more comfortable with clearing the hashtable
than with changing iterative_hash_expr in any way.  It's also
along the way to get rid of the hash completely.

Oh, btw, the speedup is going from

 expand                  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
wall  293891 kB (15%) ggc

to

 expand                  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
wall  262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

> The other thing would be to discard mem-attr sharing alltogether,
> but that doesn't seem appropriate at this stage (but it would
> also simplify quite some code).  With only one function in RTL
> at a time that shouldn't be too bad (see several suggestions
> along that line, even with statistics).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
> trunk and 4.8 branch?
> 
> Thanks,
> Richard.
> 
> 2014-02-21  Richard Biener  <rguent...@suse.de>
> 
>       PR middle-end/60291
>       * rtl.h (clear_mem_attrs_htab): Declare.
>       * emit-rtl.c (clear_mem_attrs_htab): New function.
>       * final.c (rest_of_clean_state): Call clear_mem_attrs_htab
>       to avoid sharing mem-attrs between functions.
> 
> Index: gcc/rtl.h
> ===================================================================
> *** gcc/rtl.h (revision 207960)
> --- gcc/rtl.h (working copy)
> *************** extern int in_sequence_p (void);
> *** 2546,2551 ****
> --- 2546,2552 ----
>   extern void init_emit (void);
>   extern void init_emit_regs (void);
>   extern void init_emit_once (void);
> + extern void clear_mem_attrs_htab (void);
>   extern void push_topmost_sequence (void);
>   extern void pop_topmost_sequence (void);
>   extern void set_new_first_and_last_insn (rtx, rtx);
> Index: gcc/emit-rtl.c
> ===================================================================
> *** gcc/emit-rtl.c    (revision 207960)
> --- gcc/emit-rtl.c    (working copy)
> *************** init_emit_once (void)
> *** 5913,5918 ****
> --- 5913,5926 ----
>     simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
>     cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
>   }
> + 
> + /* Clear the mem-attrs sharing hash table.  */
> + 
> + void
> + clear_mem_attrs_htab (void)
> + {
> +   htab_empty (mem_attrs_htab);
> + }
>   
>   /* Produce exact duplicate of insn INSN after AFTER.
>      Care updating of libcall regions if present.  */
> Index: gcc/final.c
> ===================================================================
> *** gcc/final.c       (revision 207960)
> --- gcc/final.c       (working copy)
> *************** rest_of_clean_state (void)
> *** 4678,4683 ****
> --- 4678,4686 ----
>   
>     init_recog_no_volatile ();
>   
> +   /* Reset mem-attrs sharing.  */
> +   clear_mem_attrs_htab ();
> + 
>     /* We're done with this function.  Free up memory if we can.  */
>     free_after_parsing (cfun);
>     free_after_compilation (cfun);
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer

Reply via email to