On Tue, Oct 29, 2024 at 3:04 PM Andi Kleen <a...@linux.intel.com> wrote:
>
> On Tue, Oct 29, 2024 at 01:50:57PM +0100, Richard Biener wrote:
> > On Mon, Oct 28, 2024 at 9:58 PM Andi Kleen <a...@linux.intel.com> wrote:
> > >
> > > From: Andi Kleen <a...@gcc.gnu.org>
> > >
> > > The current switch bit test clustering enumerates all possible case
> > > clusters combinations to find ones that fit the bit test constrains
> > > best.  This causes performance problems with very large switches.
> > >
> > > For bit test clustering which happens naturally in word sized chunks
> > > I don't think such an expensive algorithm is really needed.
> > >
> > > This patch implements a simple greedy algorithm that walks
> > > the sorted list and examines word sized windows and tries
> > > to cluster them.
> > >
> > > Surprisingly the new algorithm gives consistly better clusters
> > > for the examples I tried.
> > >
> > > For example from the gcc bootstrap:
> > >
> > > old: 0-15 16-31 96-175
> > > new: 0-31 96-175
> > >
> > > I'm not fully sure why that is, probably some bug in the old
> > > algorithm? This shows even up in the test suite where if-to-switch-6
> > > now can generate a switch, as well as a case in switch-1.c
> > >
> > > I don't have a proof that the new algorithm is always as good or better,
> > > but so far at least I don't see any counter examples.
> > >
> > > It also fixes the excessive compile time in PR117091,
> > > however this was already fixed by an earlier patch
> > > that doesn't run clustering when no targets have multiple
> > > values.
> >
> > OK if you add a comment (as part of the function comment for example)
> > explaining the idea of the algorithm.
>
>
> I added the comment.
>
> I will commit it with this change. I also had to add a few more
> -fno-bit-tests to make the Linaro tester happy.

Those should have been xfailed instead of adding -fno-bit-tests.

>
> However this exposes PR117352 which is a negative interaction of the
> more aggressive bit test conversion.  I don't think it's a show stopper,
> this can be sorted out later.

I think it is a show stopper for GCC 15 because it is a pretty big
performance regression with targets that have ccmp (which now includes
x86_64).

Thanks,
Andrew

>
> -Andi

Reply via email to