On 14/03/2024 6:47 pm, Andrew Cooper wrote: > On 14/03/2024 2:30 pm, Jan Beulich wrote: >> On 13.03.2024 18:27, Andrew Cooper wrote: >>> --- a/xen/drivers/passthrough/x86/iommu.c >>> +++ b/xen/drivers/passthrough/x86/iommu.c >>> @@ -641,7 +641,7 @@ struct page_info *iommu_alloc_pgtable(struct >>> domain_iommu *hd, >>> if ( contig_mask ) >>> { >>> /* See pt-contig-markers.h for a description of the marker scheme. >>> */ >>> - unsigned int i, shift = find_first_set_bit(contig_mask); >>> + unsigned int i, shift = ffsl(contig_mask) - 1; >> The need for subtracting 1 is why personally I dislike ffs() / ffsl() (and >> why I think find_first_set_bit() and __ffs() (but no __ffsl()) were >> introduced). > It's sad that there are competing APIs with different bit-labelling, but > the optimiser does cancel the -1 with arch_ffs() (for at least x86 and > ARM that I studied in detail). > > I firmly believe that fewer APIs which are fully well defined (and can > optimise based on the compiler's idea of safety) is still better than a > maze of APIs with different behaviours. > >> But what I first of all would like to have clarification on is what your >> (perhaps just abstract at this point) plans are wrt ffz() / ffzl(). >> Potential side-by-side uses would be odd now, and would continue to be odd >> if the difference in bit labeling was retained. Since we're switching to >> a consolidated set of basic helpers, such an anomaly would better not >> survive imo. > I honestly hadn't got that far yet. I was mainly trying to dis-entangle > the existing mess so RISC-V wasn't making it yet-worse. > > But yes - it warrants thinking about. > > > I was intending to do the fls() next then popcnt(). The latter has > quite a lot of cleanup wanting to come with it, and is more > architecturally invasive, and I know I've got a years-old outstanding > piece of work to try and do popcnt more nicely on x86. > > I have wanted ffz() in the past. I think I just went with explicit ~ > because I didn't want to continue this debate at the time. > > However, I (very much more) do not want a situation where ffs() and > ffz() have different bit-labellings. > > > There are no builtins, and having now studied the architectures we care > about... https://godbolt.org/z/KasP41n1e ...not even x86 has a "count > leading/trailing zeros" instruction.
Hopefully obviously, I meant ones here. My point is that the compiler emitted code always has a NOT in it somewhere. > > So using ffs(~val) really will get you the best code generation > available, and seeing as it halves the number of bitops to maintain, I > think this is the best tradeoff overall. > > I intend to put ffz() and __ffs() into linux-compat.h and leave them > there to discourage their use generally. > > ~Andrew