https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109695

--- Comment #23 from Aldy Hernandez <aldyh at gcc dot gnu.org> ---
An update on the int_range_max memory bloat work.

As Andrew mentioned, having int_range<25> solves the problem, but it just
kicks the can down the road.  I ran some stats on what we actually need during a
bootstrap: 99.7% of ranges fit in 3 sub-ranges, but we need more to
represent switches, etc.

There's no clear winner for <N>, as the distribution for anything past
<3> is rather random.  What I did see was that at no point do we need more than
125 sub-ranges (on a set of .ii files from a bootstrap).

I've implemented various alternatives using a dynamic approach similar to what
we do for auto_vec.  I played with allocating 2x as much as needed, allocating
10 or 20 more than needed, and going from N to 255 in one go.  All of them
required some shuffling to keep the penalty from virtual calls, etc. small, but
I think the dynamic approach is the way to go.

The question is how much of a performance hit we are willing to take in order to
reduce the memory footprint.  Memory vs. speed is a linear trade-off here, so
we just have to pick a number we're happy with.

Here are some numbers for various values of N (sub-ranges grow automatically
in union, intersect, invert, and assignment, which are the methods that can
increase the number of sub-ranges):

Representation            Size           VRP penalty
trunk (wide_ints <255>)   40912 bytes    -
GCC 12 (trees <255>)       4112 bytes    -
auto_int_range<2>           432 bytes    5.14%
auto_int_range<3>           592 bytes    4.01%
auto_int_range<8>          1392 bytes    2.68%
auto_int_range<10>         1712 bytes    2.14%

As you can see, even at N=10, we're still 24X smaller than trunk and 2.4X
smaller than GCC12 for a 2.14% performance drop.

I'm tempted to just pick a number and tweak it later, as we have complete
flexibility here.  We could also revert to a very small N and have passes
that care about switches use their own temporaries (auto_int_range<20> or
such).

Note that we got a 13.22% improvement from the wide_int+legacy work, so even the
absolute worst case of a 5.14% penalty would leave us with a net 8.76%
improvement over GCC12.

Bike shedding welcome ;-)
