On Fri, Aug 17, 2012 at 1:46 PM, Steven Bosscher <stevenb....@gmail.com> wrote:
> Hello,
>
> On a 64-bits host we leave 32 bits of each bitmap_element unused. And
> actually, on a 32-bits host we do that too for GGC-allocated bitmaps
> (due to alignment).
>
> With this patch, we use those 32 bits by making BITMAP_WORD an
> unsigned int (instead of unsigned long) and having 5 elts per
> bitmap_element (instead of 2 or 4).
>
> The effect is that computing the bit position gets more expensive.
> Currently this only needs a div/mod by a power-of-two, but with this
> patch it's a div/mod 5. But because most bitmaps are sparse in blocks,
> bitmaps with 160 bits bitmap_elements have fewer searches. And I think
> that in general the call overhead to the bitmap functions
> (bitmap_bit_p/bitmap_set_bit are not inline functions, they're
> relatively large) and the pointer chasing is more expensive than
> requiring div/mod 5. Compile time for a set of cc1-i files doesn't
> change, but compile time for the small test case of PR54146 goes down
> by ~10%, and peak memory goes down a bit too (harder to measure so I
> don't know exactly by how much).
>
> Looking for your thoughts and comments,

Well, another effect of reducing the size of BITMAP_WORD is that
operations are not performed in a mode optimally using CPU regs
(did you check code generation differences on a 64bit host?).
I think another idea was to even actually increase BITMAP_WORD size
to be a vector type where supported to benefit from that fact, eventually
getting rid of all inner loops (though we unroll them completely anyway I hope).

I suppose using vector registers could be achieved in by specializing as well,
in your proposed case we'd have one xmm op and a 32bit reg op.

I guess that's one vote in favor of not wasting those 32 bits which is a shame.

Looking for other comments.

Thanks,
Richard.

> Ciao!
> Steven

Reply via email to