https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for the alternate testcase we have
Kind Nodes Bytes
ssa names 753988 54287136
GIMPLE statements
Kind Stmts Bytes
phi nodes 747999 381049752
Bitmaps Leak Peak
Times N searches Search iter Type
tree-ssa-coalesce.c:705 (new_live_track) 91800: 0.0% 91800
6469037: 1.8% 1238519 248500 heap
tree-ssa-coalesce.c:586 (ssa_conflicts_add_one) 9916176: 3.5%4977905016
341148023: 94.8% 364290728 601796535 heap
tree-ssa-live.c:937 (new_tree_live_info) 119177600: 42.5% 119177600
2979440: 0.8% 2230467 0 heap
tree-ssa-live.c:938 (new_tree_live_info) 149077680: 53.1% 149077680
4476926: 1.2% 4467438 1379139 heap
as expected the conflict bitmaps are the issue.
We can shave off some more compile-time by noticing the asymmetry in
live_track_process_def and use bitmap_ior_into for one half. That should
be more cache friendly as well.
before:
> /usr/bin/time ./cc1 -quiet t2.c
38.98user 1.40system 0:40.39elapsed 99%CPU (0avgtext+0avgdata
5626832maxresident)k
0inputs+440outputs (0major+1448898minor)pagefaults 0swaps
after:
> /usr/bin/time ./cc1 -quiet t2.c
35.81user 1.48system 0:37.30elapsed 99%CPU (0avgtext+0avgdata
5628100maxresident)k
0inputs+440outputs (0major+1440532minor)pagefaults 0swaps
not as much as with the previous patch but still...
Moves the above to
tree-ssa-coalesce.c:812 (live_track_process_def) 9896264: 3.5%4958012960
339784598: 94.4% 301208226 660769345 heap
and then we can of course avoid touching bitmaps with already recorded
conflicts. Doing that improves things to
33.18user 1.43system 0:34.62elapsed 99%CPU (0avgtext+0avgdata
5626660maxresident)k
0inputs+440outputs (0major+1390642minor)pagefaults 0swaps
tree-ssa-coalesce.c:803 (live_track_process_def) 19912: 0.0% 19892656
1363425: 0.4% 960492 1915158 heap
tree-ssa-coalesce.c:810 (live_track_process_def) 9896264: 3.5%4958012960
339784598: 94.4% 301208226 660769345 heap
note it all seems to be a bit noisy in the end...
@@ -818,8 +794,20 @@ live_track_process_def (live_track *ptr,
if (bitmap_bit_p (ptr->live_base_var, root))
{
b = ptr->live_base_partitions[root];
- EXECUTE_IF_SET_IN_BITMAP (b, 0, x, bi)
- ssa_conflicts_add (graph, p, x);
+ bitmap bp = graph->conflicts[p];
+ if (!bp)
+ bp = graph->conflicts[p] = BITMAP_ALLOC (&graph->obstack);
+ /* Add a conflict to P to each live base partition. */
+ EXECUTE_IF_AND_COMPL_IN_BITMAP (b, bp, 0, x, bi)
+ {
+ gcc_checking_assert (x != (unsigned)p);
+ bitmap bx = graph->conflicts[x];
+ if (!bx)
+ bx = graph->conflicts[x] = BITMAP_ALLOC (&graph->obstack);
+ bitmap_set_bit (bx, p);
+ }
+ /* Add conflicts to each live base partition to P. */
+ bitmap_ior_into (bp, b);
}
}
ideally we'd be able to combine the ior_into with the EXECUTE_IF.. walk.
Manually jump-threading the case of !bp might be worth it as well, since
we are micro-optimizing this...