[powerpc/merge] PMU: Kernel warning while running pmu/ebb selftests

2021-12-18 Thread Sachin Sant
While running kernel selftests (lost_exception_test) against the latest powerpc merge/next branch code (5.16.0-rc5-03218-g798527287598), the following warning is seen: [ 172.851380] ------------[ cut here ]------------ [ 172.851391] WARNING: CPU: 8 PID: 2901 at arch/powerpc/include/asm/hw_irq.h:246 powe

Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-18 Thread Thomas Gleixner
On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote: > On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: > I just bisected a boot failure on my AMD test desktop to this patch as > commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()") in > -next. It looks like there is a

Re: [PATCH/RFC] mm: add and use batched version of __tlb_remove_table()

2021-12-18 Thread Nikita Yushchenko
Oh gawd, that's terrible. Never, ever duplicate code like that. What the patch does is: - formally shift the loop one level down in the call graph, adding instances of __tlb_remove_tables() exactly to locations where instances of __tlb_remove_table() already exist, - on architectures where __tm

Re: [PATCH/RFC] mm: add and use batched version of __tlb_remove_table()

2021-12-18 Thread Nikita Yushchenko
17.12.2021 21:39, Sam Ravnborg wrote: Hi Nikita, How about adding the following to tlb.h: #ifndef __tlb_remove_tables static void __tlb_remove_tables(...) { } #endif And then the few archs that want to override __tlb_remove_tables needs to do a #define __tlb_remove_tables __tlb_re
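The override pattern suggested in this message can be sketched in user space as follows. This is only an illustration under stated assumptions: the `demo_*` names, the call counter, and the looping fallback body are hypothetical and are not taken from the patch (the snippet elides the real arguments and body).

```c
#include <stddef.h>

/* Hypothetical counter so the effect is observable in user space. */
static int demo_free_calls;

/* Hypothetical stand-in for a routine that frees a single table. */
static void demo_remove_table(void *table)
{
    (void)table;
    demo_free_calls++;
}

/*
 * An architecture with a faster batched routine would define it before
 * this point and announce it with a same-named macro:
 *
 *     #define demo_remove_tables demo_remove_tables
 *
 * The generic code then only supplies a fallback when nothing was defined.
 */
#ifndef demo_remove_tables
/* Assumed generic fallback: free the tables one at a time. */
static void demo_remove_tables(void **tables, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        demo_remove_table(tables[i]);
}
#endif
```

With this layout the callers always use one name, and only the architectures that want a batched version carry extra code.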

Re: [PATCH/RFC] mm: add and use batched version of __tlb_remove_table()

2021-12-18 Thread Nikita Yushchenko
This allows archs to optimize it, by freeing multiple tables in a single release_pages() call. This is faster than individual put_page() calls, especially with memcg accounting enabled. Could we quantify "faster"? There's a non-trivial amount of code being added here and it would be nice to bac

Re: [PATCH v1 0/5] Implement livepatch on PPC32

2021-12-18 Thread Christophe Leroy
Le 14/12/2021 à 15:01, Steven Rostedt a écrit : > On Tue, 14 Dec 2021 08:35:14 +0100 > Christophe Leroy wrote: > >>> Will continue investigating. >>> >> >> trace_selftest_startup_function_graph() calls register_ftrace_direct() >> which returns -ENOSUPP because powerpc doesn't select >> CONF

Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-18 Thread Nathan Chancellor
On Sat, Dec 18, 2021 at 11:25:14AM +0100, Thomas Gleixner wrote: > On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote: > > On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: > > I just bisected a boot failure on my AMD test desktop to this patch as > > commit f48235900182 ("PCI/MSI

Re: [patch V3 28/35] PCI/MSI: Simplify pci_irq_get_affinity()

2021-12-18 Thread Cédric Le Goater
On 12/18/21 11:25, Thomas Gleixner wrote: On Fri, Dec 17 2021 at 15:30, Nathan Chancellor wrote: On Fri, Dec 10, 2021 at 11:19:26PM +0100, Thomas Gleixner wrote: I just bisected a boot failure on my AMD test desktop to this patch as commit f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()"

[PATCH 01/17] all: don't use bitmap_weight() where possible

2021-12-18 Thread Yury Norov
Don't call bitmap_weight() if the following code can get by without it. Signed-off-by: Yury Norov --- drivers/net/dsa/b53/b53_common.c | 6 +- drivers/net/ethernet/broadcom/bcmsysport.c | 6 +- drivers/thermal/intel/intel_powerclamp.c | 9 +++-- 3 files changed, 5 inserti

[PATCH v2 00/17] lib/bitmap: optimize bitmap_weight() usage

2021-12-18 Thread Yury Norov
In many cases people use bitmap_weight()-based functions to compare the result against a number or expression: if (cpumask_weight(...) > 1) do_something(); This may take a considerable amount of time on many-CPU machines because cpumask_weight(...) will traverse every word

[PATCH 02/17] drivers: rename num_*_cpus variables

2021-12-18 Thread Yury Norov
Some drivers declare num_active_cpus and num_present_cpus, even though the kernel has macros with the corresponding names in linux/cpumask.h, and the drivers include cpumask.h. The following patches switch num_*_cpus() to real functions, which causes build failures for the drivers. Signed-off-by: Yury No

[PATCH 03/17] fix open-coded for_each_set_bit()

2021-12-18 Thread Yury Norov
The Mellanox driver has an open-coded for_each_set_bit(). Fix it. Signed-off-by: Yury Norov --- drivers/net/ethernet/mellanox/mlx4/cmd.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mella
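To illustrate what replacing an open-coded loop with a set-bit iterator looks like, here is a minimal user-space sketch. The `demo_*` helpers are hypothetical re-implementations written for this example; they are not the kernel's find_next_bit() or for_each_set_bit(), and the mlx4 code itself is not reproduced.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Index of the first set bit at or after 'start', or 'size' if there is none. */
static size_t demo_find_next_bit(const unsigned long *addr, size_t size,
                                 size_t start)
{
    for (size_t i = start; i < size; i++)
        if (addr[i / DEMO_BITS_PER_LONG] & (1UL << (i % DEMO_BITS_PER_LONG)))
            return i;
    return size;
}

/* Hypothetical iterator macro: visits only the set bits of the bitmap. */
#define demo_for_each_set_bit(bit, addr, size)                      \
    for ((bit) = demo_find_next_bit((addr), (size), 0);             \
         (bit) < (size);                                            \
         (bit) = demo_find_next_bit((addr), (size), (bit) + 1))

/* Open-coded variant: test every bit by hand and skip the clear ones. */
static size_t demo_sum_open_coded(const unsigned long *addr, size_t size)
{
    size_t sum = 0;

    for (size_t i = 0; i < size; i++) {
        if (!(addr[i / DEMO_BITS_PER_LONG] & (1UL << (i % DEMO_BITS_PER_LONG))))
            continue;
        sum += i;
    }
    return sum;
}

/* Same result, with the bit test hidden behind the iterator. */
static size_t demo_sum_iterator(const unsigned long *addr, size_t size)
{
    size_t bit, sum = 0;

    demo_for_each_set_bit(bit, addr, size)
        sum += bit;
    return sum;
}
```

Both functions add up the indices of the set bits; the second one keeps the loop body free of bit-testing boilerplate, which is the point of the cleanup.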

[PATCH 04/17] all: replace bitmap_weight with bitmap_empty where appropriate

2021-12-18 Thread Yury Norov
In many cases, kernel code calls bitmap_weight() to check if any bit of a given bitmap is set. It's better to use bitmap_empty() in that case because bitmap_empty() stops traversing the bitmap as soon as it finds the first set bit, while bitmap_weight() counts all bits unconditionally. Signed-off-by:
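The difference in traversal cost can be shown with a small user-space sketch. The `demo_*` functions and the `demo_words_visited` counter are hypothetical and exist only to make the early exit visible; they are not the kernel implementations.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumentation: how many words the last calls looked at. */
static size_t demo_words_visited;

/* Count the set bits in one word by clearing the lowest set bit each round. */
static unsigned int demo_popcount(unsigned long w)
{
    unsigned int n = 0;

    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/* Weight: always walks the whole bitmap. */
static unsigned int demo_weight(const unsigned long *map, size_t nwords)
{
    unsigned int total = 0;

    for (size_t i = 0; i < nwords; i++) {
        demo_words_visited++;
        total += demo_popcount(map[i]);
    }
    return total;
}

/* Emptiness check: stops at the first non-zero word. */
static bool demo_empty(const unsigned long *map, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        demo_words_visited++;
        if (map[i])
            return false;
    }
    return true;
}
```

For a bitmap whose first word is non-zero, `demo_empty()` inspects one word while `demo_weight()` inspects all of them, so `!demo_empty(map, n)` is the cheaper way to ask "is any bit set?".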

[PATCH 05/17] all: replace cpumask_weight with cpumask_empty where appropriate

2021-12-18 Thread Yury Norov
In many cases, kernel code calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds the first set bit, while cpumask_weight() counts all bits unconditionally. Signed

[PATCH 06/17] all: replace nodes_weight with nodes_empty where appropriate

2021-12-18 Thread Yury Norov
Kernel code calls nodes_weight() to check if any bit of a given nodemask is set. We can do it more efficiently with nodes_empty() because nodes_empty() stops traversing the nodemask as soon as it finds the first set bit, while nodes_weight() counts all bits unconditionally. Signed-off-by: Yury Norov

[PATCH 07/17] lib/bitmap: add bitmap_weight_{cmp, eq, gt, ge, lt, le} functions

2021-12-18 Thread Yury Norov
Many kernel users use bitmap_weight() to compare the result against some number or expression: if (bitmap_weight(...) > 1) do_something(); It works OK, but may be significantly improved for large bitmaps: if the set bits in the first few words already add up to more than the given number, we
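A minimal user-space sketch of the early-exit comparison described here, assuming a "greater than" variant. The `sketch_*` names and the `visited` out-parameter are hypothetical and only serve the illustration; the series' actual bitmap_weight_gt() is not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the set bits in one word by clearing the lowest set bit each round. */
static unsigned int sketch_popcount(unsigned long w)
{
    unsigned int n = 0;

    while (w) {
        w &= w - 1;
        n++;
    }
    return n;
}

/*
 * Return true if the bitmap has more than 'num' set bits. The walk stops as
 * soon as the running total exceeds 'num'. '*visited' reports how many words
 * were inspected (instrumentation for this example only).
 */
static bool sketch_weight_gt(const unsigned long *map, size_t nwords,
                             unsigned int num, size_t *visited)
{
    unsigned int total = 0;

    *visited = 0;
    for (size_t i = 0; i < nwords; i++) {
        (*visited)++;
        total += sketch_popcount(map[i]);
        if (total > num)
            return true;    /* answer is known, skip the remaining words */
    }
    return false;
}
```

The saving only appears when the answer is "yes" early in the bitmap; when the threshold is never exceeded, every word is still inspected.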

[PATCH 08/17] all: replace bitmap_weight with bitmap_weight_{eq, gt, ge, lt, le} where appropriate

2021-12-18 Thread Yury Norov
Kernel code calls bitmap_weight() to compare the weight of a bitmap with a given number. We can do it more efficiently with bitmap_weight_{eq, ...} because conditional bitmap_weight may stop traversing the bitmap earlier, as soon as the condition is met. This patch replaces bitmap_weight with conditiona

[PATCH 09/17] lib/cpumask: add cpumask_weight_{eq,gt,ge,lt,le}

2021-12-18 Thread Yury Norov
Kernel code calls cpumask_weight() to compare the weight of a cpumask with a given number. We can do it more efficiently with cpumask_weight_{eq, ...} because conditional cpumask_weight may stop traversing the cpumask earlier, as soon as the condition is met. Signed-off-by: Yury Norov --- arch/ia64/mm

[PATCH 10/17] lib/nodemask: add nodemask_weight_{eq,gt,ge,lt,le}

2021-12-18 Thread Yury Norov
Kernel code calls nodes_weight() to compare the weight of a nodemask with a given number. We can do it more efficiently with nodes_weight_{eq, ...} because conditional nodes_weight may stop traversing the nodemask earlier, as soon as the condition is met. Signed-off-by: Yury Norov --- drivers/acpi/num

[PATCH 11/17] lib/nodemask: add num_node_state_eq()

2021-12-18 Thread Yury Norov
Kernel code calls num_node_state() to compare the number of nodes with a given number. The underlying code calls bitmap_weight(), and we can do it more efficiently with num_node_state_eq because conditional nodes_weight may stop traversing the nodemask earlier, as soon as the condition is met. Signed-off-

[PATCH 12/17] kernel/cpu.c: fix init_cpu_online

2021-12-18 Thread Yury Norov
cpu_online_mask has an associated counter of online cpus, which should be initialized in init_cpu_online(). Fixes: 0c09ab96fc82010 (cpu/hotplug: Cache number of online CPUs) Signed-off-by: Yury Norov --- kernel/cpu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/cpu.c b/kernel/cpu.c i

[PATCH 13/17] kernel/cpu: add num_possible_cpus counter

2021-12-18 Thread Yury Norov
Similarly to the online cpus, the cpu_possible_mask is actively used in the kernel. This patch adds a counter for possible cpus, so that users that call num_possible_cpus() would know the result immediately, instead of calling bitmap_weight() for the underlying mask. Suggested-by: Nicholas Piggi
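The idea of keeping a counter next to a mask can be sketched in user space like this. Everything here is an assumption made for illustration: the `demo_cpu_set` structure, `DEMO_NR_CPUS`, and the helper names are hypothetical, and the sketch is single-threaded, whereas a real kernel counter would have to be safe against concurrent updates.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS       128  /* hypothetical upper bound */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_WORDS \
    ((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* A mask together with a cached count of its set bits. */
struct demo_cpu_set {
    unsigned long bits[DEMO_WORDS];
    unsigned int count;
};

/* Set or clear one CPU; the counter changes only when the bit really flips. */
static void demo_set_cpu(struct demo_cpu_set *s, unsigned int cpu, bool present)
{
    unsigned long mask = 1UL << (cpu % DEMO_BITS_PER_LONG);
    unsigned long *word = &s->bits[cpu / DEMO_BITS_PER_LONG];
    bool was_set = (*word & mask) != 0;

    if (present && !was_set) {
        *word |= mask;
        s->count++;
    } else if (!present && was_set) {
        *word &= ~mask;
        s->count--;
    }
}

/* Constant-time query: no walk over the bitmap is needed. */
static unsigned int demo_num_cpus(const struct demo_cpu_set *s)
{
    return s->count;
}
```

The trade-off is a little extra work on every update in exchange for a query that no longer depends on the size of the mask.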

[PATCH 14/17] kernel/cpu: add num_present_cpu counter

2021-12-18 Thread Yury Norov
Similarly to the online cpus, the cpu_present_mask is actively used in the kernel. This patch adds a counter for present cpus, so that users that call num_present_cpus() would know the result immediately, instead of calling the bitmap_weight for the mask. Suggested-by: Nicholas Piggin Signed-off-

[PATCH 15/17] kernel/cpu: add num_active_cpu counter

2021-12-18 Thread Yury Norov
Similarly to the online cpus, the cpu_active_mask is actively used in the kernel. This patch adds a counter for active cpus, so that users that call num_active_cpus() would know the result immediately, instead of calling the bitmap_weight for the mask. Suggested-by: Nicholas Piggin Signed-off-by:

[PATCH 16/17] tools/bitmap: sync bitmap_weight

2021-12-18 Thread Yury Norov
Pull bitmap_weight_{cmp,eq,gt,ge,lt,le} from the main kernel tree and use them where applicable. Signed-off-by: Yury Norov --- tools/include/linux/bitmap.h | 44 tools/lib/bitmap.c | 20 tools/perf/util/pmu.c| 2 +- 3 files changed, 65

[PATCH 17/17] MAINTAINERS: add cpumask and nodemask files to BITMAP_API

2021-12-18 Thread Yury Norov
cpumask and nodemask APIs are thin wrappers around the basic bitmap API, and the corresponding files are not formally maintained. This patch adds them to the BITMAP_API section, so that bitmap folks would have a closer look at them. Signed-off-by: Yury Norov --- MAINTAINERS | 4 1 file changed, 4 insertion

Re: [PATCH 01/17] all: don't use bitmap_weight() where possible

2021-12-18 Thread Yury Norov
On Sat, Dec 18, 2021 at 2:16 PM Michał Mirosław wrote: > > On Sat, Dec 18, 2021 at 01:19:57PM -0800, Yury Norov wrote: > > Don't call bitmap_weight() if the following code can get by > > without it. > > > > Signed-off-by: Yury Norov > > --- > > drivers/net/dsa/b53/b53_common.c | 6 +---

Re: [PATCH/RFC] mm: add and use batched version of __tlb_remove_table()

2021-12-18 Thread Dave Hansen
On 12/18/21 6:31 AM, Nikita Yushchenko wrote: >>> This allows archs to optimize it, by >>> freeing multiple tables in a single release_pages() call. This is >>> faster than individual put_page() calls, especially with memcg >>> accounting enabled. >> >> Could we quantify "faster"?  There's a non-tr

[GIT PULL] Please pull powerpc/linux.git powerpc-5.16-4 tag

2021-12-18 Thread Michael Ellerman
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Linus, Please pull some more powerpc fixes for 5.16: The following changes since commit 5bb60ea611db1e04814426ed4bd1c95d1487678e: powerpc/32: Fix hardlockup on vmap stack overflow (2021-11-24 21:00:51 +1100) are available in the git reposito