https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102943
--- Comment #41 from Andrew Macleod <amacleod at redhat dot com> ---

> so it's still by far jump-threading/VRP dominating compile-times (I wonder
> if we should separate "old" and "new" [E]VRP timevars).  Given that VRP
> shows up as well it's more likely the underlying ranger infrastructure?

Yeah, I'd be tempted to just label them vrp1 (evrp), vrp2 (current vrp1) and
vrp3 (current vrp2) and track them separately.  I have noticed significant
behaviour differences between the code we see at VRP2 time and the code EVRP
sees.

> perf thrown on ltrans22 shows
>
> Samples: 302K of event 'cycles', Event count (approx.): 331301505627
> Overhead       Samples  Command      Shared Object  Symbol
>   10.34%         31299  lto1-ltrans  lto1           [.] bitmap_get_aligned_chunk
>    7.44%         22540  lto1-ltrans  lto1           [.] bitmap_bit_p
>    3.17%          9593  lto1-ltrans  lto1           [.]
>
> callgraph info in perf is a mixed bag, but maybe it helps to pinpoint things:
>
> - 10.20%  10.18%  30364  lto1-ltrans  lto1  [.] bitmap_get_aligned_chunk
>    - 10.18% 0xffffffffffffffff
>       + 9.16% ranger_cache::propagate_cache
>       + 1.01% ranger_cache::fill_block_cache

I am currently looking at reworking the cache again so that propagation is
limited to actual changes only.  It can still get out of hand in massive
CFGs, and that's already using the sparse representation.  There may be some
minor tweaks that can make a big difference here.  I'll have a look over the
next couple of days.

It's probably safe to assume the threading performance is directly related
to this as well.
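
To make the "limited to actual changes" idea concrete, here is a minimal
standalone C++ sketch of change-driven worklist propagation.  This is
illustration only, not ranger's code: the block and range types, the union
semantics and propagate () itself are all invented for the example; irange
and ranger_cache work differently.

  /* Standalone sketch (not GCC code) of "requeue only on actual change"
     worklist propagation over a CFG.  */

  #include <vector>
  #include <deque>
  #include <algorithm>
  #include <cstdio>

  /* Toy range with a union (merge) operation standing in for a real
     value range.  */
  struct range
  {
    int lo, hi;
    bool operator== (const range &o) const
    { return lo == o.lo && hi == o.hi; }
  };

  static range
  range_union (const range &a, const range &b)
  {
    return { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
  }

  struct block
  {
    std::vector<int> preds, succs;
  };

  /* Iterate to a fixed point, but push a block's successors back onto the
     worklist only when the block's cached range actually changed, so
     unaffected regions of a massive CFG are never revisited.  */
  static void
  propagate (const std::vector<block> &cfg, std::vector<range> &cache)
  {
    std::deque<int> worklist;
    std::vector<bool> queued (cfg.size (), true);
    for (int bb = 0; bb < (int) cfg.size (); ++bb)
      worklist.push_back (bb);

    while (!worklist.empty ())
      {
        int bb = worklist.front ();
        worklist.pop_front ();
        queued[bb] = false;

        /* Recompute this block's range from its predecessors.  */
        range r = cache[bb];
        for (int p : cfg[bb].preds)
          r = range_union (r, cache[p]);

        if (r == cache[bb])
          continue;   /* No change: successors are not requeued.  */

        cache[bb] = r;
        for (int s : cfg[bb].succs)
          if (!queued[s])
            {
              queued[s] = true;
              worklist.push_back (s);
            }
      }
  }

  int
  main ()
  {
    /* Diamond CFG: 0 -> {1,2} -> 3.  */
    std::vector<block> cfg (4);
    cfg[0].succs = { 1, 2 };
    cfg[1].preds = { 0 };  cfg[1].succs = { 3 };
    cfg[2].preds = { 0 };  cfg[2].succs = { 3 };
    cfg[3].preds = { 1, 2 };

    std::vector<range> cache = { { 0, 0 }, { 1, 1 }, { 5, 5 }, { 0, 0 } };
    propagate (cfg, cache);
    std::printf ("bb3 range: [%d, %d]\n", cache[3].lo, cache[3].hi);
  }

The whole point is the early continue when the recomputed range equals the
cached one: successors are requeued only when a block's cached value really
changed, so quiescent regions of the CFG stay untouched.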
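
As for tracking them separately, a sketch of what the split could look like
in gcc/timevar.def (DEFTIMEVAR is the real macro; the TV_TREE_VRP1/2/3 names
and strings are just made up for illustration):

  DEFTIMEVAR (TV_TREE_VRP1             , "tree VRP1 (early)")
  DEFTIMEVAR (TV_TREE_VRP2             , "tree VRP2")
  DEFTIMEVAR (TV_TREE_VRP3             , "tree VRP3")

Each pass instance would then point the tv_id field in its pass_data at its
own timevar, and -ftime-report would show the three passes on separate lines.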