Hi, thank you for looking into it.
On Fri, Sep 06, 2019 at 12:23:40PM +0200, Richard Biener wrote: > On Thu, Sep 5, 2019 at 5:35 PM Dmitrij Pochepko > <dmitrij.poche...@bell-sw.com> wrote: > > > > This patch adds matching for Hamming weight (popcount) implementation. The > > following sources: > > > > int > > foo64 (unsigned long long a) > > { > > unsigned long long b = a; > > b -= ((b>>1) & 0x5555555555555555ULL); > > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL); > > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL; > > b *= 0x0101010101010101ULL; > > return (int)(b >> 56); > > } > > > > and > > > > int > > foo32 (unsigned int a) > > { > > unsigned long b = a; > > b -= ((b>>1) & 0x55555555UL); > > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL); > > b = ((b>>4) + b) & 0x0F0F0F0FUL; > > b *= 0x01010101UL; > > return (int)(b >> 24); > > } > > > > and equivalents are now recognized as popcount for platforms with hw > > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and > > aarch64-linux-gnu systems with no regressions. > > > > (I have no write access to repo) > > +(simplify > + (convert > + (rshift > + (mult > > is the outer convert really necessary? That is, if we change > the simplification result to > > (convert (BUILT_IN_POPCOUNT @0)) > > wouldn't that be correct as well? Yes, this is better. I fixed it in the new version. > > Is the Hamming weight popcount > faster than the libgcc table-based approach? I wonder if we really > need to restrict this conversion to the case where the target > has an expander. > > + (mult > + (bit_and:c > > this doesn't need :c (second operand is a constant). Yes. Agree, this is redundant. > > + int shift = TYPE_PRECISION (long_long_unsigned_type_node) - prec; > + const unsigned long long c1 = 0x0101010101010101ULL >> shift, > > I think this mixes host and target properties. I guess intead of > 'const unsigned long long' you want to use 'const uint64_t' and > instead of TYPE_PRECISION (long_long_unsigned_type_node) 64? > Since you are later comparing with unsigned HOST_WIDE_INT > eventually unsigned HOST_WIDE_INT is better (that's always 64bit as well). Agree. It is better to use HOST_WIDE_INT. > > You are using tree_to_uhwi but nowhere verifying if @0 is unsigned. > What happens if 'prec' is > 64? (__int128 ...). Ah, I guess the > final selection will simply select nothing... > > Otherwise the patch looks reasonable, even if the pattern > is a bit unwieldly... ;) > > Does it work for targets where 'unsigned int' is smaller than 32bit? Yes. The only 16-bit-int architecture with popcount support on hw level is avr. I built gcc for avr and checked that 16-bit popcount algorithm is recognized successfully. Thanks, Dmitrij > > Thanks, > Richard. > > > > Thanks, > > Dmitrij > > > > > > gcc/ChangeLog: > > > > PR tree-optimization/90836 > > > > * gcc/match.pd (popcount): New pattern. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/90836 > > > > * lib/target-supports.exp (check_effective_target_popcount) > > (check_effective_target_popcountll): New effective targets. > > * gcc.dg/tree-ssa/popcount4.c: New test. > > * gcc.dg/tree-ssa/popcount4l.c: New test. > > * gcc.dg/tree-ssa/popcount4ll.c: New test.