On Tue, Oct 29, 2024 at 3:04 PM Andi Kleen <a...@linux.intel.com> wrote: > > On Tue, Oct 29, 2024 at 01:50:57PM +0100, Richard Biener wrote: > > On Mon, Oct 28, 2024 at 9:58 PM Andi Kleen <a...@linux.intel.com> wrote: > > > > > > From: Andi Kleen <a...@gcc.gnu.org> > > > > > > The current switch bit test clustering enumerates all possible case > > > clusters combinations to find ones that fit the bit test constrains > > > best. This causes performance problems with very large switches. > > > > > > For bit test clustering which happens naturally in word sized chunks > > > I don't think such an expensive algorithm is really needed. > > > > > > This patch implements a simple greedy algorithm that walks > > > the sorted list and examines word sized windows and tries > > > to cluster them. > > > > > > Surprisingly the new algorithm gives consistly better clusters > > > for the examples I tried. > > > > > > For example from the gcc bootstrap: > > > > > > old: 0-15 16-31 96-175 > > > new: 0-31 96-175 > > > > > > I'm not fully sure why that is, probably some bug in the old > > > algorithm? This shows even up in the test suite where if-to-switch-6 > > > now can generate a switch, as well as a case in switch-1.c > > > > > > I don't have a proof that the new algorithm is always as good or better, > > > but so far at least I don't see any counter examples. > > > > > > It also fixes the excessive compile time in PR117091, > > > however this was already fixed by an earlier patch > > > that doesn't run clustering when no targets have multiple > > > values. > > > > OK if you add a comment (as part of the function comment for example) > > explaining the idea of the algorithm. > > > I added the comment. > > I will commit it with this change. I also had to add a few more > -fno-bit-tests to make the Linaro tester happy.
Those should have been xfailed instead of adding -fno-bit-tests. > > However this exposes PR117352 which is a negative interaction of the > more aggressive bit test conversion. I don't think it's a show stopper, > this can be sorted out later. I think it is a show stopper for GCC 15 because it is a pretty big performance regression with targets that have ccmp (which now includes x86_64). Thanks, Andrew > > -Andi