On 14/03/2024 2:30 pm, Jan Beulich wrote: > On 13.03.2024 18:27, Andrew Cooper wrote: >> --- a/xen/drivers/passthrough/x86/iommu.c >> +++ b/xen/drivers/passthrough/x86/iommu.c >> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct >> domain_iommu *hd, >> if ( contig_mask ) >> { >> /* See pt-contig-markers.h for a description of the marker scheme. >> */ >> - unsigned int i, shift = find_first_set_bit(contig_mask); >> + unsigned int i, shift = ffsl(contig_mask) - 1; > The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and > why I think find_first_set_bit() and __ffs() (but no __ffsl()) were > introduced).
It's sad that there are competing APIs with different bit-labelling, but the optimiser does cancel the -1 with arch_ffs() (for at least x86 and ARM that I studied in detail). I firmly believe that fewer APIs which are fully well defined (and can optimise based on the compiler's idea of safety) is still better than a maze of APIs with different behaviours. > But what I first of all would like to have clarification on is what your > (perhaps just abstract at this point) plans are wrt ffz() / ffzl(). > Potential side-by-side uses would be odd now, and would continue to be odd > if the difference in bit labeling was retained. Since we're switching to > a consolidated set of basic helpers, such an anomaly would better not > survive imo. I honestly hadn't got that far yet. I was mainly trying to dis-entangle the existing mess so RISC-V wasn't making it yet-worse. But yes - it warrants thinking about. I was intending to do the fls() next then popcnt(). The latter has quite a lot of cleanup wanting to come with it, and is more architecturally invasive, and I know I've got a years-old outstanding piece of work to try and do popcnt more nicely on x86. I have wanted ffz() in the past. I think I just went with explicit ~ because I didn't want to continue this debate at the time. However, I (very much more) do not want a situation where ffs() and ffz() have different bit-labellings. There are no builtins, and having now studied the architectures we care about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count leading/trailing zeros" instruction. So using ffs(~val) really will get you the best code generation available, and seeing as it halves the number of bitops to maintain, I think this is the best tradeoff overall. I intend to put ffz() and __ffs() into linux-compat.h and leave them there to discourage their use generally. ~Andrew