On Sat, 15 Oct 2016, Prathamesh Kulkarni wrote: > On 13 October 2016 at 13:22, Marc Glisse <marc.gli...@inria.fr> wrote: > > On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote: > > > >> On 12 October 2016 at 14:43, Richard Biener <rguent...@suse.de> wrote: > >>> > >>> On Wed, 12 Oct 2016, Marc Glisse wrote: > >>> > >>>> On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote: > >>>> > >>>>> I was having a look at PR71636 and added the following pattern to > >>>>> match.pd: > >>>>> x & ((1U << b) - 1) -> x & ~(~0U << b) > >>>>> However the transform is useful only if the target supports "andnot" > >>>>> instruction. > >>>> > >>>> > >>>> rth was selling the transformation as a canonicalization, which is > >>>> beneficial > >>>> when there is an andnot instruction, and neutral otherwise, so it could > >>>> be > >>>> done always. > >>> > >>> > >>> Well, its three instructions to three instructions and a more expensive > >>> constant(?). ~0U might not be available as immediate for the shift > >>> instruction and 1U << b might be available as a bit-set instruction ... > >>> (vs. the andnot). > > > > > > True, I hadn't thought of bit-set. > > > >>> So yes, we might decide to canonicalize to andnot (and decide that > >>> three binary to two binary and one unary op is "better"). > >>> > >>> So no excuse to explore the target specific .pd fragment idea ... :/ > >> > >> Hi, > >> I have attached patch that adds the transform. > >> Does that look OK ? > > > > > > Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested in > > the PR? If we only do the transformation when (1<<b)-1 is fed to a bit_and, > > then we probably want to require that it has a single use (maybe even the > > shift). > > > >> I am not sure how to write test-cases for it though. > >> For the test-case: > >> unsigned f(unsigned x, unsigned b) > >> { > >> unsigned t1 = 1U << b; > >> unsigned t2 = t1 - 1; > >> unsigned t3 = x & t2; > >> return t3; > >> } > >> > >> forwprop dump shows: > >> Applying pattern match.pd:523, gimple-match.c:47419 > >> gimple_simplified to _6 = 4294967295 << b_1(D); > >> _8 = ~_6; > >> t3_5 = x_4(D) & _8; > >> > >> I could scan for "_6 = 4294967295 << b_1(D);" however I suppose > >> ~0 would depend on width of int and not always be 4294967295 ? > >> Or should I scan for "_6 = 4294967295 << b_1(D);" > >> and add /* { dg-require-effective int32 } */ to the test-case ? > > > > > > You could check that you have ~, or that you don't have " 1 << ". > Thanks for the suggestions. > Does the attached patch look OK ? > > For test-cases, scan-tree-dump-not "1 <<" works well for pr71636-1.c > which tests GENERIC folding, > however for GIMPLE folding, "1 << " still remains in the forwprop dump > because dce isn't > run to remove unused values. > > For the test-case: > unsigned f(unsigned x, unsigned b) > { > unsigned t1 = 1U << b; > unsigned t2 = t1 - 1; > unsigned t3 = x & t2; > return t3; > } > > forwprop dump shows: > Applying pattern match.pd:523, gimple-match.c:47418 > gimple_simplified to _6 = 4294967295 << b_1(D); > _8 = ~_6; > t3_5 = x_4(D) & _8; > f (unsigned int x, unsigned int b) > { > unsigned int t3; > unsigned int t2; > unsigned int t1; > unsigned int _6; > unsigned int _8; > > <bb 2>: > t1_2 = 1 << b_1(D); > t2_3 = t1_2 + 4294967295; > _6 = 4294967295 << b_1(D); > _8 = ~_6; > t3_5 = x_4(D) & _8; > return t3_5; > > } > > Instead I scanned for _8 = ~_6 with: > /* { dg-final { scan-tree-dump "_\[0-9\] = ~_\[0-9\]" "forwprop1" } } */ > because rhs has bit_not and lhs doesn't. > Is that OK ?
That's ok -- note I usually scan cddce1 instead of forwprop to have DCE run on the IL. Ok (with or without changing to scan cddce1 instead for not-1<<). Thanks, Richard. > Bootstrap+tested on x86_64-unknown-linux-gnu. > Cross-tested on arm*-*-*, aarch64*-*-* > > Thanks, > Prathamesh > > > > -- > > Marc Glisse > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)