On 27/08/2024 2:57 pm, Andrew Cooper wrote:
> There are two issues.  First, pi_test_and_clear_on() pulls the cache-line to
> the CPU and dirties it even if there's nothing outstanding, but the final
> for_each_set_bit() is O(256) when O(8) would do, and would avoid multiple
> atomic updates to the same IRR word.
>
> Rewrite it from scratch, explaining what's going on at each step.
>
> Bloat-o-meter reports 177 -> 145 (net -32), but the better aspect is the
> removal calls to __find_{first,next}_bit() hidden behind for_each_set_bit().
>
> No functional change.
>
> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
> ---
> CC: Jan Beulich <jbeul...@suse.com>
> CC: Roger Pau Monné <roger....@citrix.com>
> CC: Stefano Stabellini <sstabell...@kernel.org>
> CC: Julien Grall <jul...@xen.org>
> CC: Volodymyr Babchuk <volodymyr_babc...@epam.com>
> CC: Bertrand Marquis <bertrand.marq...@arm.com>
> CC: Michal Orzel <michal.or...@amd.com>
> CC: Oleksii Kurochko <oleksii.kuroc...@gmail.com>
>
> The main purpose of this is to get rid of bitmap_for_each().
>
> v2:
>  * Extend the comments

FWIW, Gitlab CI has gained one reliable failure for this series (which
includes the hweight series too, because of how I've got my branch
arranged).

It is a timeout (domU not reporting in after boot), and as it is
specific to the AlderLake runner, it's very likely to be this patch.

I guess I need to triple-check the IRR scatter logic...

~Andrew
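
For anyone following along, the PIR-to-IRR scatter described in the quoted commit message can be sketched roughly as below. This is a standalone illustration using C11 atomics, not the code from the patch: the struct layouts and every name ending in `_sketch` are assumptions made up for the example, and the real vLAPIC register page is laid out differently. It only shows the two ideas from the description: peek at the ON flag with a plain read before touching it atomically, and move pending bits a whole word at a time so that each of the 8 IRR words sees at most one atomic update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical stand-in for the posted-interrupt descriptor. */
struct pi_desc_sketch {
    _Atomic uint64_t pir[NR_VECTORS / 64]; /* 4 x 64-bit pending words */
    _Atomic uint32_t on;                   /* outstanding-notification flag */
};

/* Hypothetical stand-in for the IRR: 8 x 32-bit words, packed. */
struct irr_sketch {
    _Atomic uint32_t word[NR_VECTORS / 32];
};

/* Returns true if an outstanding notification was consumed. */
static bool sync_pir_to_irr_sketch(struct pi_desc_sketch *pi,
                                   struct irr_sketch *irr)
{
    uint64_t pending[NR_VECTORS / 64];
    unsigned int i;

    /* Plain read first: don't dirty the cache line if nothing is pending. */
    if ( !atomic_load(&pi->on) )
        return false;

    /* Something looked pending; now claim it atomically. */
    if ( !atomic_exchange(&pi->on, 0) )
        return false;

    /* Drain each PIR word, skipping the atomic exchange for empty words. */
    for ( i = 0; i < NR_VECTORS / 64; i++ )
        pending[i] = atomic_load(&pi->pir[i])
                     ? atomic_exchange(&pi->pir[i], 0) : 0;

    /* Scatter: each 64-bit PIR word feeds two 32-bit IRR words. */
    for ( i = 0; i < NR_VECTORS / 64; i++ )
    {
        uint32_t lo = (uint32_t)pending[i];
        uint32_t hi = (uint32_t)(pending[i] >> 32);

        if ( lo )
            atomic_fetch_or(&irr->word[2 * i], lo);
        if ( hi )
            atomic_fetch_or(&irr->word[2 * i + 1], hi);
    }

    return true;
}
```

In this layout vector N lands in IRR word N / 32 at bit N % 32, so getting the low/high split or the word index wrong in the second loop would silently drop or misroute interrupts, which is the kind of slip the scatter step invites.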