On Fri, Jul 24, 2020 at 12:19:59AM -0700, Ian Rogers wrote:
> for_each_set_bit, or similar functions like for_each_cpu, may be hot
> within the kernel. If many bits were set then one could imagine on
> Intel a "bt" instruction with every bit may be faster than the function
> call and word length find_next_bit logic. Add a benchmark to measure
> this.
> This benchmark on AMD rome and Intel skylakex shows "bt" is not a good
> option except for very small bitmaps.

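For readers following along, the two iteration strategies compared in the quoted text can be sketched in userspace C roughly as below. This is an illustration only: it is neither the kernel's for_each_set_bit nor the benchmark from the patch. The names visit_test_each and visit_word_scan are hypothetical, and __builtin_ctzl assumes GCC or Clang.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Strategy 1 (hypothetical helper): probe every bit position in turn,
 * loosely analogous to issuing one "bt" per bit. Cost grows with nbits,
 * not with the number of set bits.
 */
static size_t visit_test_each(const unsigned long *map, size_t nbits,
                              size_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nbits; i++)
        if ((map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1UL)
            out[n++] = i;
    return n;
}

/*
 * Strategy 2 (hypothetical helper): load one word at a time and jump
 * straight to each set bit, in the spirit of find_next_bit. Zero words
 * are skipped after a single compare.
 */
static size_t visit_word_scan(const unsigned long *map, size_t nbits,
                              size_t *out)
{
    size_t n = 0;

    for (size_t base = 0; base < nbits; base += BITS_PER_LONG) {
        unsigned long w = map[base / BITS_PER_LONG];
        size_t left = nbits - base;

        /* Mask off bits beyond nbits in the final, partial word. */
        if (left < BITS_PER_LONG)
            w &= (1UL << left) - 1;

        while (w) {
            out[n++] = base + (size_t)__builtin_ctzl(w);
            w &= w - 1; /* clear the lowest set bit */
        }
    }
    return n;
}
```

Both helpers write the indices of the set bits into out in ascending order and return how many they found. Which one is faster depends on bitmap size and density, which is what the benchmark in the patch sets out to measure.
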
Small bitmaps are a common case in the kernel (e.g. cpu bitmaps).

But the current code isn't that great for small bitmaps. It always looks
horrific when I look at PT traces or brstackinsn, especially since it was
optimized purely for code size at some point.

It would probably be better to have different implementations for
different sizes.

-Andi

