More fallout from looking at the code generation...

for_each_set_bit() forces it's bitmap parameter out into memory.  For an
arbitrary sized bitmap, this is fine - and likely preferable as it's an
in-memory to begin with.

However, more than half the current users of for_each_set_bit() are
operating over an single int/long, and this too is spilled to the
stack.  Worse, x86 seems to be the only architecture which (tries, but
not very well) to optimise find_{first,next}_bit() for GPR-sized
quantities, meaning that for_each_set_bit() hides 2 backing function calls.

The ARM (v)GIC code in particular suffers horribly because of this.

We also have several interesting opencoded forms:
* evtchn_check_pollers() is a (preprocessor identical) opencoding.
* hvm_emulate_writeback() is equivalent.
* for_each_vp() exists just to hardcode a constant and swap the other
two parameters.

and several others forms which I think could be expressed more cleanly
as for_each_set_bit().

We also have the while()/ffs() forms which are "just" for_each_set_bit()
and some even manage to not spill their main variable to memory.


I want to get to a position where there is one clear API to use, and
that the compiler will handle nicely.  Xen's code generation will
definitely improve as a consequence.


Sadly, transforming the ideal while()/ffs() form into a for() loop is a
bit tricky.  This works:

for ( unsigned int v = (val), (bit);
      v;
      v &= v - 1 )
if ( 1 )
{
    (bit) = ffs(v) - 1;
    goto body;
}
else
    body:

which is a C metaprogramming trick borrowed from PuTTY to make:

for_each_BLAH ( bit, val )
{
    // nice loop body
}

work, while having the ffs() calculated logically within the loop body.

The first issue I expect people to have with the above is the raw 'body'
label, although with a macro that can be fixed using body_ ## __COUNTER__.

A full example is https://godbolt.org/z/oMGfah696 although a real
example in Xen is going to have to be variadic for at least ffs() and
ffsl().


Now, from an API point of view, it would be lovely if we could make a
single for_each_set_bit() which covers both cases, and while I can
distinguish the two forms by whether there are 2 or 3 args, I expect
MISRA is going to have a fit at that.  Also there's a difference based
on the scope of 'bit' and also whether modifications to 'val' in the
loop body take effect on the loop condition (they don't because a copy
is taken).

So I expect everyone is going to want a new API to use here.  But what
to call it?

More than half of the callers in Xen really want the GPR form, so we
could introduce a new bitmap_for_each_set_bit(), move all the callers
over, then introduce a "new" for_each_set_bit() which is only of the GPR
form.

Or does anyone want to suggest an alternative name?

Thoughts?

~Andrew

Reply via email to