On Fri, Aug 17, 2012 at 1:46 PM, Steven Bosscher <stevenb....@gmail.com> wrote: > Hello, > > On a 64-bits host we leave 32 bits of each bitmap_element unused. And > actually, on a 32-bits host we do that too for GGC-allocated bitmaps > (due to alignment). > > With this patch, we use those 32 bits by making BITMAP_WORD an > unsigned int (instead of unsigned long) and having 5 elts per > bitmap_element (instead of 2 or 4). > > The effect is that computing the bit position gets more expensive. > Currently this only needs a div/mod by a power-of-two, but with this > patch it's a div/mod 5. But because most bitmaps are sparse in blocks, > bitmaps with 160 bits bitmap_elements have fewer searches. And I think > that in general the call overhead to the bitmap functions > (bitmap_bit_p/bitmap_set_bit are not inline functions, they're > relatively large) and the pointer chasing is more expensive than > requiring div/mod 5. Compile time for a set of cc1-i files doesn't > change, but compile time for the small test case of PR54146 goes down > by ~10%, and peak memory goes down a bit too (harder to measure so I > don't know exactly by how much). > > Looking for your thoughts and comments,
Well, another effect of reducing the size of BITMAP_WORD is that operations are not performed in a mode optimally using CPU regs (did you check code generation differences on a 64bit host?). I think another idea was to even actually increase BITMAP_WORD size to be a vector type where supported to benefit from that fact, eventually getting rid of all inner loops (though we unroll them completely anyway I hope). I suppose using vector registers could be achieved in by specializing as well, in your proposed case we'd have one xmm op and a 32bit reg op. I guess that's one vote in favor of not wasting those 32 bits which is a shame. Looking for other comments. Thanks, Richard. > Ciao! > Steven