On 14/03/2024 6:47 pm, Andrew Cooper wrote:
> On 14/03/2024 2:30 pm, Jan Beulich wrote:
>> On 13.03.2024 18:27, Andrew Cooper wrote:
>>> --- a/xen/drivers/passthrough/x86/iommu.c
>>> +++ b/xen/drivers/passthrough/x86/iommu.c
>>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct 
>>> domain_iommu *hd,
>>>      if ( contig_mask )
>>>      {
>>>          /* See pt-contig-markers.h for a description of the marker scheme. 
>>> */
>>> -        unsigned int i, shift = find_first_set_bit(contig_mask);
>>> +        unsigned int i, shift = ffsl(contig_mask) - 1;
>> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and
>> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were
>> introduced).
> It's sad that there are competing APIs with different bit-labelling, but
> the optimiser does cancel the -1 with arch_ffs() (for at least x86 and
> ARM that I studied in detail).
>
> I firmly believe that fewer APIs which are fully well defined (and can
> optimise based on the compiler's idea of safety) is still better than a
> maze of APIs with different behaviours.
>
>> But what I first of all would like to have clarification on is what your
>> (perhaps just abstract at this point) plans are wrt ffz() / ffzl().
>> Potential side-by-side uses would be odd now, and would continue to be odd
>> if the difference in bit labeling was retained. Since we're switching to
>> a consolidated set of basic helpers, such an anomaly would better not
>> survive imo.
> I honestly hadn't got that far yet.  I was mainly trying to dis-entangle
> the existing mess so RISC-V wasn't making it yet-worse.
>
> But yes - it warrants thinking about.
>
>
> I was intending to do the fls() next then popcnt().   The latter has
> quite a lot of cleanup wanting to come with it, and is more
> architecturally invasive, and I know I've got a years-old outstanding
> piece of work to try and do popcnt more nicely on x86.
>
> I have wanted ffz() in the past.  I think I just went with explicit ~
> because I didn't want to continue this debate at the time.
>
> However, I (very much more) do not want a situation where ffs() and
> ffz() have different bit-labellings.
>
>
> There are no builtins, and having now studied the architectures we care
> about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count
> leading/trailing zeros" instruction.

Hopefully obviously, I meant ones here.   My point is that the compiler
emitted code always has a NOT in it somewhere.

>
> So using ffs(~val) really will get you the best code generation
> available, and seeing as it halves the number of bitops to maintain, I
> think this is the best tradeoff overall.
>
> I intend to put ffz() and __ffs() into linux-compat.h and leave them
> there to discourage their use generally.
>
> ~Andrew


Reply via email to