Hi,

Thank you for looking into it.

On Fri, Sep 06, 2019 at 12:23:40PM +0200, Richard Biener wrote:
> On Thu, Sep 5, 2019 at 5:35 PM Dmitrij Pochepko
> <dmitrij.poche...@bell-sw.com> wrote:
> >
> > This patch adds matching for Hamming weight (popcount) implementation. The 
> > following sources:
> >
> > int
> > foo64 (unsigned long long a)
> > {
> >     unsigned long long b = a;
> >     b -= ((b>>1) & 0x5555555555555555ULL);
> >     b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
> >     b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
> >     b *= 0x0101010101010101ULL;
> >     return (int)(b >> 56);
> > }
> >
> > and
> >
> > int
> > foo32 (unsigned int a)
> > {
> >     unsigned long b = a;
> >     b -= ((b>>1) & 0x55555555UL);
> >     b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
> >     b = ((b>>4) + b) & 0x0F0F0F0FUL;
> >     b *= 0x01010101UL;
> >     return (int)(b >> 24);
> > }
> >
> > and equivalents are now recognized as popcount for platforms with hw 
> > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and 
> > aarch64-linux-gnu systems with no regressions.
> >
> > (I have no write access to repo)
> 
> +(simplify
> +  (convert
> +    (rshift
> +      (mult
> 
> is the outer convert really necessary?  That is, if we change
> the simplification result to
> 
>  (convert (BUILT_IN_POPCOUNT @0))
> 
> wouldn't that be correct as well?

Yes, this is better. I fixed it in the new version.

> 
> Is the Hamming weight popcount
> faster than the libgcc table-based approach?  I wonder if we really
> need to restrict this conversion to the case where the target
> has an expander.
> 
> +      (mult
> +       (bit_and:c
> 
> this doesn't need :c (second operand is a constant).

Yes, agreed: the :c is redundant there, since the second operand is a constant.

> 
> +       int shift = TYPE_PRECISION (long_long_unsigned_type_node) - prec;
> +       const unsigned long long c1 = 0x0101010101010101ULL >> shift,
> 
> I think this mixes host and target properties.  I guess intead of
> 'const unsigned long long' you want to use 'const uint64_t' and
> instead of TYPE_PRECISION (long_long_unsigned_type_node) 64?
> Since you are later comparing with unsigned HOST_WIDE_INT
> eventually unsigned HOST_WIDE_INT is better (that's always 64bit as well).

Agreed; unsigned HOST_WIDE_INT is the better choice here.

> 
> You are using tree_to_uhwi but nowhere verifying if @0 is unsigned.
> What happens if 'prec' is > 64?  (__int128 ...).  Ah, I guess the
> final selection will simply select nothing...
> 
> Otherwise the patch looks reasonable, even if the pattern
> is a bit unwieldly... ;)
> 
> Does it work for targets where 'unsigned int' is smaller than 32bit?

Yes. The only architecture with a 16-bit int and hardware popcount support is avr.
I built GCC for avr and verified that the 16-bit popcount algorithm is recognized
successfully.

Thanks,
Dmitrij

> 
> Thanks,
> Richard.
> >
> > Thanks,
> > Dmitrij
> >
> >
> > gcc/ChangeLog:
> >
> >         PR tree-optimization/90836
> >
> >         * match.pd (popcount): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR tree-optimization/90836
> >
> >         * lib/target-supports.exp (check_effective_target_popcount)
> >         (check_effective_target_popcountll): New effective targets.
> >         * gcc.dg/tree-ssa/popcount4.c: New test.
> >         * gcc.dg/tree-ssa/popcount4l.c: New test.
> >         * gcc.dg/tree-ssa/popcount4ll.c: New test.
