https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943

--- Comment #41 from Andrew Macleod <amacleod at redhat dot com> ---

> 
> so it's still by far jump-threading/VRP dominating compile-times (I wonder
> if we should separate "old" and "new" [E]VRP timevars).  Given that VRP
> shows up as well it's more likely the underlying ranger infrastructure?

Yeah, I'd be tempted to just label them vrp1 (evrp), vrp2 (current vrp1), and
vrp3 (current vrp2) and track them separately.  I have noticed significant
behavioural differences between the code we see at VRP2 time and at EVRP time.
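For reference, something along these lines in timevar.def would give each
instance its own timevar (the names here are purely illustrative, not
committed ones):

  DEFTIMEVAR (TV_TREE_VRP1 , "tree VRP1")   /* today's EVRP  */
  DEFTIMEVAR (TV_TREE_VRP2 , "tree VRP2")   /* today's VRP1  */
  DEFTIMEVAR (TV_TREE_VRP3 , "tree VRP3")   /* today's VRP2  */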


> 
> perf thrown on ltrans22 shows
> 
> Samples: 302K of event 'cycles', Event count (approx.): 331301505627        
> 
> Overhead       Samples  Command      Shared Object     Symbol               
> 
>   10.34%         31299  lto1-ltrans  lto1              [.]
> bitmap_get_aligned_chunk
>    7.44%         22540  lto1-ltrans  lto1              [.] bitmap_bit_p
>    3.17%          9593  lto1-ltrans  lto1              [.]

> 
> callgraph info in perf is a mixed bag, but maybe it helps to pinpoint things:
> 
> -   10.20%    10.18%         30364  lto1-ltrans  lto1              [.]
> bitmap_get_aligned_chunk                                                    
> #
>    - 10.18% 0xffffffffffffffff                                              
> #
>       + 9.16% ranger_cache::propagate_cache                                 
> #
>       + 1.01% ranger_cache::fill_block_cache               
> 
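
bitmap_get_aligned_chunk dominating the profile suggests the cost is in the
sheer number of cache probes rather than any one expensive operation.  As a
hypothetical model (not the actual GCC bitmap API), an aligned-chunk read is
just one word load plus a shift and mask:

  // Hypothetical model of an aligned-chunk bitmap read: CHUNK_SIZE bits
  // per entry, so entry IDX lives at bit offset IDX * CHUNK_SIZE.  With
  // CHUNK_SIZE dividing 64, a chunk never straddles a word boundary.
  #include <cstdint>
  #include <vector>

  constexpr unsigned CHUNK_SIZE = 4;   // bits per cached entry (assumed)

  unsigned
  get_aligned_chunk (const std::vector<uint64_t> &bits, unsigned idx)
  {
    unsigned bit = idx * CHUNK_SIZE;
    uint64_t word = bits[bit / 64];
    return (word >> (bit % 64)) & ((1u << CHUNK_SIZE) - 1);
  }

Each probe is cheap in isolation, so when it tops the profile the probe
count itself, i.e. how often propagation revisits blocks, is the likely
problem.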

I am currently looking at reworking the cache again so that propagation is
limited to actual changes.  It can still get out of hand in massive CFGs, and
that's already using the sparse representation.  There may be some minor
tweaks that can make a big difference here.  I'll have a look over the next
couple of days.
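
To illustrate the idea, here is a minimal sketch of change-driven
propagation; the names and structure are invented for illustration, not the
actual ranger_cache code.  A block's successors are only requeued when its
cached range actually changed, so untouched regions of a massive CFG are
never revisited:

  // Sketch: propagate cached ranges, revisiting a block only when its
  // cache entry changes.  Purely illustrative, not GCC code.
  #include <algorithm>
  #include <deque>
  #include <vector>

  struct range_entry { int lo, hi; bool valid; };

  struct sketch_cache
  {
    std::vector<range_entry> on_entry;         // per-BB cached range
    std::vector<std::vector<int>> succs;       // CFG successor lists

    // Merge NEW_R into BB's cached range; return true only on change.
    bool update (int bb, const range_entry &new_r)
    {
      range_entry &cur = on_entry[bb];
      if (!cur.valid)
        { cur = new_r; return true; }
      range_entry merged = { std::min (cur.lo, new_r.lo),
                             std::max (cur.hi, new_r.hi), true };
      if (merged.lo == cur.lo && merged.hi == cur.hi)
        return false;                          // no change: stop here
      cur = merged;
      return true;
    }

    // Propagate from SRC; only changed blocks go back on the worklist.
    void propagate (int src)
    {
      std::deque<int> worklist = { src };
      while (!worklist.empty ())
        {
          int bb = worklist.front ();
          worklist.pop_front ();
          for (int succ : succs[bb])
            if (update (succ, on_entry[bb]))   // changed -> requeue
              worklist.push_back (succ);
        }
    }
  };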

It's probably safe to assume the threading performance is directly related to
this as well.

