[PATCH v8 3/3] ARM: ftrace: Add MODULE_PLTS support

2021-03-30 Thread Alexander A Sverdlin
From: Alexander Sverdlin Teach ftrace_make_call() and ftrace_make_nop() about PLTs. Teach PLT code about FTRACE and all its callbacks. Otherwise the following might happen: [ cut here ] WARNING: CPU: 14 PID: 2265 at .../arch/arm/kernel/insn.c:14 __arm_gen_branch+0x83/0x8

[PATCH v8 2/3] ARM: Add warn suppress parameter to arm_gen_branch_link()

2021-03-30 Thread Alexander A Sverdlin
From: Alexander Sverdlin Will be used in the following patch. No functional change. Signed-off-by: Alexander Sverdlin --- arch/arm/include/asm/insn.h | 8 arch/arm/kernel/ftrace.c | 2 +- arch/arm/kernel/insn.c | 19 ++- 3 files changed, 15 insertions(+), 14

[PATCH v8 1/3] ARM: PLT: Move struct plt_entries definition to header

2021-03-30 Thread Alexander A Sverdlin
From: Alexander Sverdlin No functional change, later it will be re-used in several files. Signed-off-by: Alexander Sverdlin --- arch/arm/include/asm/module.h | 9 + arch/arm/kernel/module-plts.c | 9 - 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm/incl

[PATCH v8 0/3] ARM: Implement MODULE_PLT support in FTRACE

2021-03-30 Thread Alexander A Sverdlin
From: Alexander Sverdlin FTRACE's function tracer currently doesn't always work on ARM with MODULE_PLT option enabled. If the module is loaded too far, FTRACE's code modifier cannot cope with introduced veneers and turns the function tracer off globally. ARM64 already has a solution for the prob

[PATCH v3] gpio: pl061: Support implementations without GPIOINTR line

2021-03-19 Thread Alexander A Sverdlin
From: Alexander Sverdlin There are several implementations of PL061 which lack GPIOINTR signal in hardware and only have individual GPIOMIS[7:0] interrupts. Use the hierarchical interrupt support of the gpiolib in these cases (if at least 8 IRQs are configured for the PL061). One in-tree example

[PATCH v2] gpio: pl061: Support implementations without GPIOINTR line

2021-03-18 Thread Alexander A Sverdlin
From: Alexander Sverdlin There are several implementations of PL061 which lack GPIOINTR signal in hardware and only have individual GPIOMIS[7:0] interrupts. Use the hierarchical interrupt support of the gpiolib in these cases (if at least 8 IRQs are configured for the PL061). One in-tree example

[PATCH] rapidio/mport_cdev: Fix race in mport_cdev_release()

2021-03-17 Thread Alexander A Sverdlin
From: Alexander Sverdlin While get_dma_channel() is protected against concurrent calls, there is a race against kref_put() in mport_cdev_release(): CPU0 CPU1 get_dma_channel() kref_init(&priv->md->dma_ref); ... mport_cdev_release_dma() kref_put(&md->dma

[PATCH 2/2] mtd: char: Get rid of Big MTD Lock

2021-02-17 Thread Alexander A Sverdlin
From: Alexander Sverdlin Get rid of central chrdev MTD lock, which prevents simultaneous operations on completely independent physical MTD chips. Replace it with newly introduced per-master mutex. Signed-off-by: Alexander Sverdlin --- drivers/mtd/mtdchar.c | 14 -- drivers/mtd/mt

[PATCH 1/2] mtd: char: Drop mtd_mutex usage from mtdchar_open()

2021-02-17 Thread Alexander A Sverdlin
From: Alexander Sverdlin The mutex looks unnecessary in this function; remove it here, with a view to removing it completely. Signed-off-by: Alexander Sverdlin --- drivers/mtd/mtdchar.c | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/mtd/mtdchar.c b/

[PATCH 2/6] MIPS: Implement atomic_cmpxchg_relaxed()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin This will save one SYNCW on Octeon and improve tight uncontended spinlock loop performance by 17%. Signed-off-by: Alexander Sverdlin --- arch/mips/include/asm/atomic.h | 3 +++ arch/mips/include/asm/cmpxchg.h | 2 ++ 2 files changed, 5 insertions(+) diff --git a/arch

[PATCH 4/6] MIPS: Octeon: qspinlock: Exclude mmiowb()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin On Octeon mmiowb() is SYNCW, which is already contained in smp_store_release(). Removing superfluous barrier brings around 10% performance on uncontended tight spinlock loops. Signed-off-by: Alexander Sverdlin --- arch/mips/include/asm/spinlock.h | 2 ++ 1 file changed

[PATCH 0/6] MIPS: qspinlock: Try to reduce the spinlock regression

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin The switch to qspinlock brought a massive regression in spinlocks on Octeon. Even after applying this series (and a patch in the ARCH-independent code [1]), a tight contended (6 cores, 1 thread per core) spinlock loop is still 50% slower than the previous ticket-based implementati

[PATCH 6/6] MIPS: cmpxchg: Use cmpxchg_local() for {cmp_}xchg_small()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin It makes no sense to fold smp_mb__before_llsc()/smp_llsc_mb() again and again, leave only one barrier pair in the outer function. This removes one SYNCW from __xchg_small() and brings around 10% performance improvement in a tight spinlock loop with 6 threads on a 6 core

[PATCH 1/6] MIPS: Octeon: Implement __smp_store_release()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin On Octeon smp_mb() translates to SYNC while wmb+rmb translates to SYNCW only. This brings around 10% performance on tight uncontended spinlock loops. Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link below. On 6-core Octeon machine: sysbench --test

[PATCH 5/6] MIPS: Provide {atomic_}xchg_relaxed()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin This has the effect of removing one redundant SYNCW from queued_spin_lock_slowpath() on Octeon. Signed-off-by: Alexander Sverdlin --- arch/mips/include/asm/atomic.h | 2 ++ arch/mips/include/asm/cmpxchg.h | 4 2 files changed, 6 insertions(+) diff --git a/arch/m

[PATCH 3/6] MIPS: Octeon: qspinlock: Flush write buffer

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Flushing the write buffer brings around 10% performance on the tight uncontended spinlock loops on Octeon. Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks."). Signed-off-by: Alexander Sverdlin --- arch/mips/include/asm/spinlock.h | 3 +++ 1 file changed, 3 inser

[PATCH 2/2] ARM: mcs_spinlock: Drop smp_wmb in arch_mcs_spin_lock_contended()

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Drop smp_wmb in arch_mcs_spin_lock_contended() after adding it into ARCH-independent code. Signed-off-by: Alexander Sverdlin --- arch/arm/include/asm/mcs_spinlock.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/arm/include/asm/mcs_spinlock.h b/arch/arm/inc

[PATCH 1/2] qspinlock: Ensure writes are pushed out of core write buffer

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Ensure writes are pushed out of core write buffer to prevent waiting code on other cores from spinning longer than necessary. 6 threads running tight spinlock loop competing for the same lock on 6 cores on MIPS/Octeon do 100 iterations... before the patch in:4

[PATCH v7 2/2] ARM: ftrace: Add MODULE_PLTS support

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Teach ftrace_make_call() and ftrace_make_nop() about PLTs. Teach PLT code about FTRACE and all its callbacks. Otherwise the following might happen: [ cut here ] WARNING: CPU: 14 PID: 2265 at .../arch/arm/kernel/insn.c:14 __arm_gen_branch+0x83/0x8

[PATCH v7 0/2] ARM: Implement MODULE_PLT support in FTRACE

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin FTRACE's function tracer currently doesn't always work on ARM with MODULE_PLT option enabled. If the module is loaded too far, FTRACE's code modifier cannot cope with introduced veneers and turns the function tracer off globally. ARM64 already has a solution for the prob

[PATCH v7 1/2] ARM: PLT: Move struct plt_entries definition to header

2021-01-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin No functional change, later it will be re-used in several files. Signed-off-by: Alexander Sverdlin --- arch/arm/include/asm/module.h | 9 + arch/arm/kernel/module-plts.c | 9 - 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm/incl

[PATCH 2/2] MIPS: OCTEON: Don't add kernel sections into memblock allocator

2020-12-03 Thread Alexander A Sverdlin
From: Alexander Sverdlin Because check_kernel_sections_mem() does exactly this for all platforms. Signed-off-by: Alexander Sverdlin --- arch/mips/cavium-octeon/setup.c | 9 - 1 file changed, 9 deletions(-) diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c

[PATCH 1/2] MIPS: Don't round up kernel sections size for memblock_add()

2020-12-03 Thread Alexander A Sverdlin
From: Alexander Sverdlin Linux doesn't own the memory immediately after the kernel image. On Octeon, the bootloader places a shared structure right after the kernel _end, refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. If check_kernel_sections_mem() rounds the PFNs up

[PATCH] MIPS: Octeon: irq: Alloc desc before configuring IRQ

2020-11-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Allocate the IRQ descriptors where necessary before configuring them via irq_set_chip_and_handler(). Fixes the following soft lockup: watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [modprobe:72] Modules linked in: irq event stamp: 33288 hardirqs last enabled at (3328

[PATCH] tty: serial: uartlite: Support probe deferral

2020-11-27 Thread Alexander A Sverdlin
From: Alexander Sverdlin Give uartlite a chance to be probed when the IRQ controller finally becomes available and return potential -EPROBE_DEFER as-is. The condition "<=" has been changed to "<" to follow the recommendation in the header of platform_get_irq(). Signed-off-by: Alexander Sverdlin ---

[PATCH] MIPS: reserve the memblock right after the kernel

2020-11-06 Thread Alexander A Sverdlin
From: Alexander Sverdlin Linux doesn't own the memory immediately after the kernel image. On Octeon, the bootloader places a shared structure right after the kernel _end, refer to "struct cvmx_bootinfo *octeon_bootinfo" in cavium-octeon/setup.c. If check_kernel_sections_mem() rounds the PFNs up

[PATCH] mtd: spi-nor: Don't copy self-pointing struct around

2020-10-05 Thread Alexander A Sverdlin
From: Alexander Sverdlin spi_nor_parse_sfdp() modifies the passed structure so that it points to itself (params.erase_map.regions to params.erase_map.uniform_region). This makes it impossible to copy the local struct anywhere else. Therefore only use memcpy() in backup-restore scenario. The bug

[PATCH] net: octeon: mgmt: Repair filling of RX ring

2020-05-29 Thread Alexander A Sverdlin
From: Alexander Sverdlin The removal of mips_swiotlb_ops exposed a problem in the octeon_mgmt Ethernet driver. mips_swiotlb_ops had an mb() after most of the operations, and the removal of the ops broke the receive functionality of the driver. My code inspection has shown no other places except o

[PATCH] macvlan: Skip loopback packets in RX handler

2020-05-26 Thread Alexander A Sverdlin
From: Alexander Sverdlin Ignore loopback-originating packets soon enough and don't try to process L2 header where it doesn't exist. The very similar br_handle_frame() in bridge code performs exactly the same check. This is an example of such ICMPv6 packet: skb len=96 headroom=40 headlen=96 tailr