Re: [PATCH] sched: Optimize __calc_delta.

2021-03-03, Josh Don

On Wed, Mar 3, 2021 at 2:02 AM Peter Zijlstra wrote:
>
> On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> > On gcc, the asm versions of `fls` are about the same speed as the
> > builtin. On clang, the versions that use fls (fls,fls64) are more than
> > twice as slow as the builtin. This

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-03, Josh Don

> you made fact_hi u32, why can't we unconditionally use fls() ?

Thanks for clarifying with ILP32; will remove this macro and simplify to just fls().

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-03, Peter Zijlstra

On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use `fls` instead of iterating.
>
> This is ~7x faster on benchmarks.
>
> The generic `fls` implementation (`generic

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-03, Peter Zijlstra

On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> On gcc, the asm versions of `fls` are about the same speed as the
> builtin. On clang, the versions that use fls (fls,fls64) are more than
> twice as slow as the builtin. This is because the way the `fls` function
> is written, clang puts

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-03, Peter Zijlstra

On Tue, Mar 02, 2021 at 12:55:07PM -0800, Josh Don wrote:
> On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra wrote:
> >
> > On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > > From: Clement Courbet
> > >
> > > A significant portion of __calc_delta time is spent in the loop
> > > shiftin

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-02, Josh Don

From: Clement Courbet

A significant portion of __calc_delta time is spent in the loop
shifting a u64 by 32 bits. Use `fls` instead of iterating.

This is ~7x faster on benchmarks.

The generic `fls` implementation (`generic_fls`) is still ~4x faster than the loop. Architectures that have a better

Re: [PATCH] sched: Optimize __calc_delta.

2021-03-02, Josh Don

On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra wrote:
>
> On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > From: Clement Courbet
> >
> > A significant portion of __calc_delta time is spent in the loop
> > shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
> >
> > Th

Re: [PATCH] sched: Optimize __calc_delta.

2021-02-26, Peter Zijlstra

On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
>
> This is ~7x faster on benchmarks.

Have you tried on hardware without s

[PATCH] sched: Optimize __calc_delta.

2021-02-26, Josh Don

From: Clement Courbet

A significant portion of __calc_delta time is spent in the loop
shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.

This is ~7x faster on benchmarks.

Signed-off-by: Clement Courbet
Signed-off-by: Josh Don
---
 kernel/sched/fair.c | 30 +++