0x 1.18
0x 0x 1.17
0x 0x 1.25
-----
geo.mean 2.06
Signed-off-by: Clement Courbet
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit, as suggested
by Yury Norov. This has no adverse effects on the performance side,
Thanks for the comments Yury.
> But I'd like also to keep _find_next_bit() consistent with
> _find_next_bit_le()
Not sure I understand what you're suggesting here: do you want a
find_next_and_bit_le() or do you want to make _find_next_bit_le() more
like _find_next_bit()? In the latter case we mi
Hi Alexey,
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
I have not actually marked _find_next_bit() inline, it just turns out
that my compiler inlines it.
I've tried out marking the function inline and OPTIMIZE_INLINING does
not un-inlin
Hi Yury,
I've tried your benchmark on x86-64 (Haswell). Inlining causes a pretty
small increase in binary size: 48B (2%).
In terms of speed, results are not very stable from one run to another
(I've included two runs to give you an idea), but overall there seems
to be a small improvement on the random-
On Tue, Aug 18, 2020 at 9:58 PM Nick Desaulniers wrote:
On Tue, Aug 18, 2020 at 12:25 PM Nick Desaulniers wrote:
>
> On Tue, Aug 18, 2020 at 12:19 PM Linus Torvalds wrote:
> >
> > And honestly, a compiler that uses 'bcmp' is just broken. WTH? It's
> > the year 2020, we don't use bcmp. It's th
s
[ 267.342627] Start testing find_next_and_bit() with sparse bitmap
[ 267.349992] find_next_and_bit_ref: 193 cycles, 1 iterations
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
Signed-off-by: Clement Courbet
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit,
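
For readers unfamiliar with the function being benchmarked above: as its
name suggests, find_next_and_bit() looks for the next bit that is set in
both of two bitmaps. The following is a minimal userspace sketch of that
idea, not the code from the patch. The name sketch_find_next_and_bit, the
BITS_PER_LONG definition and the use of the GCC/Clang builtin
__builtin_ctzl are assumptions made for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace sketch (not the kernel implementation):
 * return the index of the first bit >= start that is set in both
 * addr1 and addr2, or nbits if there is no such bit.
 */
static unsigned long sketch_find_next_and_bit(const unsigned long *addr1,
					      const unsigned long *addr2,
					      unsigned long nbits,
					      unsigned long start)
{
	unsigned long word;

	assert(addr1 && addr2);

	if (start >= nbits)
		return nbits;

	/* AND the two words containing 'start', then drop the bits below it. */
	word = addr1[start / BITS_PER_LONG] & addr2[start / BITS_PER_LONG];
	word &= ~0UL << (start % BITS_PER_LONG);
	start -= start % BITS_PER_LONG;	/* round down to a word boundary */

	/* Walk one word at a time until a common bit shows up. */
	while (!word) {
		start += BITS_PER_LONG;
		if (start >= nbits)
			return nbits;
		word = addr1[start / BITS_PER_LONG] &
		       addr2[start / BITS_PER_LONG];
	}

	/* Count trailing zeros to locate the lowest common bit. */
	start += (unsigned long)__builtin_ctzl(word);

	/* A hit beyond nbits (in the tail of the last word) does not count. */
	return start < nbits ? start : nbits;
}
```

The point of the sketch is that both bitmaps are combined one word at a
time, so a sparse bitmap does not force a separate lookup per candidate
bit.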
0x 1.18
0x 0x 1.17
0x 0x 1.25
-----
geo.mean 2.06
Signed-off-by: Clement Courbet
---
include/asm-generic/bitops/find.h | 16
include/linux/bitmap.h | 2 ++
lib/cpumask.c
Reviewed-by: Clement Courbet
Thanks for the addition, Yury! I've used a modified version of v1
for measuring improvements from find_next_and_bit() on x86 and arm
and found it very useful.
> > Note that on Arm (), the new c implementation still outperforms the
> > old one that uses c+ the asm implementation of `find_next_bit` [3].
> What is 'c+'? Is it typo?
I meant "a mix of C and asm" ~(C + asm). Rephrased.
> If you find generic find_bit() on arm faster than asm one, we'd
> defin
t: 4118 cycles, 66 iterations
[ 267.342627] Start testing find_next_and_bit() with sparse bitmap
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
Signed-off-by: Clement Courbet
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit, as suggested
by Yury Norov. T
> So I think it is really worth a separate patch. Really, it's
> completely nontrivial why adding a new function in lib/find_bit.c
> requires including asm-generic/bitops/find.h in the arm and unicore32
> asm/bitops.h headers (bug?). And why doing that makes you guard
> find_first_bit and find_first_zer
The following commit has been merged into the sched/core branch of tip:
Commit-ID:     1e17fb8edc5ad6587e9303ccdebce853bc8cf30c
Gitweb:        https://git.kernel.org/tip/1e17fb8edc5ad6587e9303ccdebce853bc8cf30c
Author:        Clement Courbet
AuthorDate:    Wed, 03 Mar 2021 14:46:53 -08:00