On Fri, Aug 17, 2012 at 2:12 PM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Fri, Aug 17, 2012 at 2:04 PM, Steven Bosscher <stevenb....@gmail.com> > wrote: >> On Fri, Aug 17, 2012 at 1:54 PM, Richard Guenther >> <richard.guent...@gmail.com> wrote: >>> Well, another effect of reducing the size of BITMAP_WORD is that >>> operations are not performed in a mode optimally using CPU regs >>> (did you check code generation differences on a 64bit host?). >> >> I did, on x86_64 and on powerpc64. The effect is not dramatic, most of >> these machines can perform 32 bits operations just fine (I think the >> only exception would be alpha, maybe?). > > I wonder how bad code gets when we unconditionally use GCCs generic > vector support to do > > Index: gcc/bitmap.c > =================================================================== > --- gcc/bitmap.c (revision 190469) > +++ gcc/bitmap.c (working copy) > @@ -1446,6 +1446,17 @@ bitmap_compl_and_into (bitmap a, const_b > unsigned ix; > BITMAP_WORD ior = 0; > > +#if BITMAP_ELEMENT_WORDS == 5 > + typedef v4si unsigned int __attribute__((vector_size((16)))); > + v4si cleared4 = *(v4si *)&a_elt->bits[0] & *(v4si *)&b_elt->bits[0]; > + BITMAP_WORD cleared5 = a_elt->bits[4] & b_elt->bits[4]; > + v4si r4 = *(v4si *)&b_elt->bits[0] ^ cleared4; > + BITMAP_WORD r5 = b_elt->bits[4] ^ cleared5; > + *(v4si *)&a_elt->bits[0] = r4; > + a_elt->bits[4] = r5; > + ior4 |= r4; > + ior5 |= r5; > +#else > for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++) > > of course with a proper #if GCC_VERSION. The theory is we should be > able to lower it to the same scalar code or even v2si operations where > only those are available ... > > Just an idea and eventually an opportunity to improve generic vector > lowering if the above really does not work out.
Or figure out if or why not the vectorizer does catch this (of course we do not enable that with -O2 which we eventually should in a very conservative mode). Richard. > Richard. > >> Ciao! >> Steven