On 14/03/2024 2:30 pm, Jan Beulich wrote:
> On 13.03.2024 18:27, Andrew Cooper wrote:
>> --- a/xen/drivers/passthrough/x86/iommu.c
>> +++ b/xen/drivers/passthrough/x86/iommu.c
>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct 
>> domain_iommu *hd,
>>      if ( contig_mask )
>>      {
>>          /* See pt-contig-markers.h for a description of the marker scheme. 
>> */
>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
> introduced).

It's sad that there are competing APIs with different bit-labelling, but
the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
ARM that I studied in detail).

I firmly believe that fewer APIs which are fully well defined (and can
optimise based on the compiler's idea of safety) is still better than a
maze of APIs with different behaviours.

> But what I first of all would like to have clarification on is what your
> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
> Potential side-by-side uses would be odd now, and would continue to be odd
> if the difference in bit labeling was retained. Since we're switching to
> a consolidated set of basic helpers, such an anomaly would better not
> survive imo.

I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
the existing mess so RISC-V wasn't making it yet-worse.

But yes - it warrants thinking about.


I was intending to do the fls() next then popcnt().   The latter has
quite a lot of cleanup wanting to come with it, and is more
architecturally invasive, and I know I've got a years-old outstanding
piece of work to try and do popcnt more nicely on x86.

I have wanted ffz() in the past.  I think I just went with explicit ~
because I didn't want to continue this debate at the time.

However, I (very much more) do not want a situation where ffs() and
ffz() have different bit-labellings.


There are no builtins, and having now studied the architectures we care
about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
leading/trailing zeros" instruction.

So using ffs(~val) really will get you the best code generation
available, and seeing as it halves the number of bitops to maintain, I
think this is the best tradeoff overall.

I intend to put ffz() and __ffs() into linux-compat.h and leave them
there to discourage their use generally.

~Andrew

Reply via email to