On Fri, 19 Oct 2018, Steven Bosscher wrote: > On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote: > > Yeah. I also noticed some 'obvious' shortcomings in the heuristics... > > I guess in the end well predicted branches in the out of line code are > > important...
I specifically meant the fact that we happily update ->current to ->first and we do not check ->first->index == indx. We also decide when we want to walk backwards from current via head->indx / 2 < indx rather than (head->indx - head->first->indx) / 2 + head->first->indx < indx - but that's a heuristic assuming evenly distributed set bits anyway. > What also would help is to put bitmaps on their own obstack to improve > cache locality. That's true. I guess increasing bitmap_element size to cover a whole cache-line would be excessive ;) On lp64 targets it's size is currently 40 bytes which makes multiple ones not a perfect fit for a usual cacheline (128 bytes). > > As for the patch, I never hacked it with "production code" in mind, it > was just a proof of concept. Not all of it is optimal or even safe > as-is. For example you probably should add > "gcc_checking_assert(!(BITMAP)->tree-form)" tests in the > bmp_iter_*_init functions. Those are already there. > And perhaps semi-splaying trees work better > for the use cases of GCC (x.f. "Rehabilitation of an unloved child: > semi-splaying"). I implemented classic splay trees because I could not > find a semi-splay tree implementation in any of the usual text books > while classic splay tree implementations were given in all of those > books ;-) I think the classic splay tree mimics best the existing behavior of the linked-list implementation (in find_element). Another thing to note would be that the ->current cache is basically unused for the tree representation (it should always equal ->first), but we do neither document that fact nor exploit it by not updating ->indx or ->current. Anyway, I'll give the paper a read. Overall it's clear that there are places in GCC and testcases that make it necessary to address the linar-complexity of our bitmap implementation... Thanks, Richard.