On Wed, Mar 3, 2021 at 2:02 AM Peter Zijlstra wrote:
>
> On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> > On gcc, the asm versions of `fls` are about the same speed as the
> > builtin. On clang, the versions that use fls (fls,fls64) are more than
> > twice as slow as the builtin. This
> you made fact_hi u32, why can't we unconditionally use fls() ?
Thanks for clarifying with ILP32; will remove this macro and simplify
to just fls().
On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use `fls` instead of iterating.
>
> This is ~7x faster on benchmarks.
>
> The generic `fls` implementation (`generic
On Tue, Mar 02, 2021 at 12:57:37PM -0800, Josh Don wrote:
> On gcc, the asm versions of `fls` are about the same speed as the
> builtin. On clang, the versions that use fls (fls,fls64) are more than
> twice as slow as the builtin. This is because the way the `fls` function
> is written, clang puts
On Tue, Mar 02, 2021 at 12:55:07PM -0800, Josh Don wrote:
> On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra wrote:
> >
> > On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > > From: Clement Courbet
> > >
> > > A significant portion of __calc_delta time is spent in the loop
> > > shiftin
From: Clement Courbet
A significant portion of __calc_delta time is spent in the loop
shifting a u64 by 32 bits. Use `fls` instead of iterating.
This is ~7x faster on benchmarks.
The generic `fls` implementation (`generic_fls`) is still ~4x faster
than the loop.
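
For illustration, a minimal user-space sketch of the idea described above, not the actual kernel/sched/fair.c code. The helper names (fls32_sketch, normalize_loop, normalize_fls, check_variants_agree) are hypothetical, and fls32_sketch is a portable binary-search stand-in assumed to behave like the kernel's fls(): it returns the 1-based index of the highest set bit, and 0 for an input of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for fls(): 1-based index of the highest set bit,
 * 0 when x == 0. Written as a binary search, no per-bit loop. */
static int fls32_sketch(uint32_t x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }
    return r;
}

/* Loop variant: shift right one bit at a time until fact fits in 32 bits,
 * decrementing shift once per iteration. */
static void normalize_loop(uint64_t *fact, int *shift)
{
    while (*fact >> 32) {
        *fact >>= 1;
        (*shift)--;
    }
}

/* fls variant: compute the number of excess bits in the high half once,
 * then apply a single shift. */
static void normalize_fls(uint64_t *fact, int *shift)
{
    uint32_t fact_hi = (uint32_t)(*fact >> 32);

    if (fact_hi) {
        int fs = fls32_sketch(fact_hi);

        *shift -= fs;
        *fact >>= fs;
    }
}

/* Asserts that both variants produce the same fact and shift. */
static void check_variants_agree(uint64_t fact)
{
    uint64_t a = fact, b = fact;
    int sa = 32, sb = 32; /* starting shift of 32 is an assumption */

    normalize_loop(&a, &sa);
    normalize_fls(&b, &sb);
    assert(a == b && sa == sb);
}
```

Both variants leave fact below 2^32: if the highest set bit of fact is at 0-based position p >= 32, the loop runs p - 31 times, and fls of the high half returns the same p - 31.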
Architectures that have a better
On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra wrote:
>
> On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > From: Clement Courbet
> >
> > A significant portion of __calc_delta time is spent in the loop
> > shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
> >
> > Th
On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
>
> This is ~7x faster on benchmarks.
Have you tried on hardware without s
From: Clement Courbet
A significant portion of __calc_delta time is spent in the loop
shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
This is ~7x faster on benchmarks.
Signed-off-by: Clement Courbet
Signed-off-by: Josh Don
---
kernel/sched/fair.c | 30 +++
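
As a rough illustration of how the __builtin_clz form in this version relates to the fls form discussed later in the thread, the sketch below expresses one in terms of the other. It assumes a GCC- or clang-compatible compiler and a 32-bit unsigned int; the name fls_via_clz is hypothetical and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the highest set bit, computed from the count of
 * leading zeros. __builtin_clz(0) is undefined, so zero is handled
 * explicitly and mapped to 0. */
static int fls_via_clz(uint32_t x)
{
    /* Assumes unsigned int is 32 bits wide. */
    return x ? 32 - __builtin_clz(x) : 0;
}
```

The explicit zero check matters: a caller that already knows the input is nonzero could drop it, but the builtin alone gives no defined result for 0.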