On Mon, May 15, 2023 at 5:03 PM Aldy Hernandez <al...@redhat.com> wrote:
>
> On 5/15/23 13:08, Richard Biener wrote:
> > On Mon, May 15, 2023 at 12:35 PM Aldy Hernandez <al...@redhat.com> wrote:
> >>
> >> <tldr>
> >> We can now have int_range<N, RESIZABLE=false> for automatically
> >> resizable ranges.  int_range_max is now int_range<3, true>, for a 69X
> >> reduction in size from current trunk and a 6.9X reduction from GCC12.
> >> This incurs a 5% performance penalty for VRP that is more than covered
> >> by our > 13% improvements recently.
> >> </tldr>
> >>
> >> int_range_max is the temporary range object we use in the ranger for
> >> integers.  With the conversion to wide_int, this structure bloated up
> >> significantly because wide_ints are huge (80 bytes apiece) and are
> >> about 10 times as big as a plain tree.  Since the temporary object
> >> requires 255 sub-ranges, that's 255 * 80 * 2, plus the control word.
> >> This means the structure grew from 4112 bytes to 40912 bytes.
> >>
> >> This patch adds the ability to resize ranges as needed, defaulting to
> >> no resizing, while int_range_max now defaults to 3 sub-ranges (instead
> >> of 255) and grows to 255 when the range being calculated does not fit.
> >>
> >> For example:
> >>
> >>   int_range<1> foo;        // 1 sub-range with no resizing.
> >>   int_range<5> foo;        // 5 sub-ranges with no resizing.
> >>   int_range<5, true> foo;  // 5 sub-ranges with resizing.
> >>
> >> I ran some tests and found that 3 sub-ranges cover 99% of cases, so
> >> I've set the int_range_max default to that:
> >>
> >>   typedef int_range<3, /*RESIZABLE=*/true> int_range_max;
> >>
> >> We don't bother growing incrementally, since the default covers most
> >> cases and we have a 255 hard limit.  This hard limit could be reduced
> >> to 128, since my tests never saw a range needing more than 124, but we
> >> could do that as a follow-up if needed.
> >>
> >> With 3 sub-ranges, int_range_max is now 592 bytes versus 40912 for
> >> trunk, and versus 4112 bytes for GCC12!  The penalty is 5.04% for VRP
> >> and 3.02% for threading, with no noticeable change in overall
> >> compilation (0.27%).  This is more than covered by our 13.26%
> >> improvements for the legacy removal + wide_int conversion.
> >
> > Thanks for doing this.
> >
> >> I think this approach is a good alternative, while providing us with
> >> flexibility going forward.  For example, we could try defaulting to
> >> 8 sub-ranges for a noticeable improvement in VRP.  We could also use
> >> a large number of sub-ranges for switch analysis to avoid resizing.
> >>
> >> Another approach I tried was always resizing.  With this, we could
> >> drop the whole int_range<N> nonsense, and have irange just hold a
> >> resizable range.  This simplified things, but incurred a 7% penalty
> >> on ipa_cp.  This was hard to pinpoint, and I'm not entirely convinced
> >> this wasn't some artifact of valgrind.  However, until we're sure,
> >> let's avoid massive changes, especially since IPA changes are coming
> >> up.
> >>
> >> For the curious, a particular hot spot for IPA in this area was:
> >>
> >>   ipcp_vr_lattice::meet_with_1 (const value_range *other_vr)
> >>   {
> >>     ...
> >>     ...
> >>     value_range save (m_vr);
> >>     m_vr.union_ (*other_vr);
> >>     return m_vr != save;
> >>   }
> >>
> >> The problem isn't the resizing (since we do that at most once) but the
> >> fact that for some functions with lots of callers we end up with a huge
> >> range that gets copied and compared for every meet operation.  Maybe
> >> the IPA algorithm could be adjusted somehow?
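For readers who want the resizing idea made concrete, here is a minimal,
self-contained sketch.  It is not the actual irange/int_range code; the
names (simple_range, add_pair, HARD_MAX) and the use of plain longs are
made up for illustration, while the real implementation stores wide_ints
and a type.  The point is just: keep N sub-ranges inline for the common
case, and switch once to a heap buffer sized for the hard maximum when a
range does not fit.

  // Illustrative sketch only -- not the GCC implementation.
  #include <algorithm>

  template <unsigned N, bool RESIZABLE = false>
  class simple_range
  {
    static const unsigned HARD_MAX = 255;   // hard limit on sub-ranges
  public:
    simple_range () : m_base (m_inline), m_max (N), m_num (0) {}
    ~simple_range () { if (m_base != m_inline) delete[] m_base; }

    // Append a sub-range [lo, hi], growing the buffer once if allowed.
    bool add_pair (long lo, long hi)
    {
      if (m_num == m_max)
        {
          if (!RESIZABLE || m_max == HARD_MAX)
            return false;               // caller must truncate conservatively
          long *bigger = new long[2 * HARD_MAX];
          std::copy (m_base, m_base + 2 * m_num, bigger);
          m_base = bigger;
          m_max = HARD_MAX;
        }
      m_base[2 * m_num] = lo;
      m_base[2 * m_num + 1] = hi;
      ++m_num;
      return true;
    }

  private:
    long m_inline[2 * N];       // inline storage for the common case
    long *m_base;               // points to m_inline or the heap buffer
    unsigned m_max;             // current capacity in sub-ranges
    unsigned m_num;             // sub-ranges in use
  };

  // Mirrors the patch's typedef: small by default, grows on demand.
  typedef simple_range<3, /*RESIZABLE=*/true> simple_range_max;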
> > Well, the above just wants to know whether the union_ operation changed
> > the range.  I suppose that would be an interesting (and easy to compute?)
> > secondary output of union_, and it seems it already computes that (but
> > maybe not correctly?).  So I suggest changing the above to
>
> union_ returns a value specifically for that, which Andrew uses for
> cache optimization.  For that matter, your suggestion was my first
> approach, but I quickly found out we were being overly pessimistic in
> some cases, and I was too lazy to figure out why.
>
> >   bool res;
> >   if (flag_checking)
> >     {
> >       value_range save (m_vr);
> >       res = m_vr.union_ (*other_vr);
> >       gcc_assert (res == (m_vr != save));
> >     }
> >   else
> >     res = m_vr.union_ (*other_vr);
> >   return res;
>
> With your suggested sanity check I chased the problem to a minor
> inconsistency when unioning nonzero masks.  The issue wasn't a bug, just
> a pessimization.  I'm attaching a patch that corrects the oversight
> (well, not oversight, everything was more expensive with trees)... It
> yields a 6.89% improvement to the ipa-cp pass!!!  Thanks.
>
> I'll push it if it passes tests.
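To make the nonzero-mask pessimization concrete, here is a hypothetical
sketch of the general idea; toy_range and union_nonzero_mask are invented
names, not the real irange bitmask code.  The union of two nonzero masks
is their bitwise OR, and "changed" should only be reported when that OR
actually differs from the old mask.

  // Hypothetical sketch -- not the actual GCC fix.
  #include <cstdint>

  struct toy_range
  {
    uint64_t nonzero_mask;      // bits that may be nonzero

    // Union the masks; return true only if our mask really changed.
    bool union_nonzero_mask (const toy_range &other)
    {
      uint64_t new_mask = nonzero_mask | other.nonzero_mask;
      if (new_mask == nonzero_mask)
        return false;           // avoid a spurious "changed" result
      nonzero_mask = new_mask;
      return true;
    }
  };

In the ipcp_vr_lattice::meet_with_1 hot spot quoted earlier, a spurious
"changed" result forces extra propagation work, which is presumably where
an improvement like the 6.89% comes from.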
Tests passed.  Pushed patch.

I've also pushed the original patch in this email.  We can address
anything else as a follow-up.

Thanks for everyone's feedback.
Aldy