[PATCH] uprobes: Remove redundant spinlock in uprobe_deny_signal

2024-07-31 Thread Liao Chang
Since clearing a bit in thread_info is an atomic operation, the spinlock is redundant and can be removed, reducing lock contention is good for performance. Signed-off-by: Liao Chang --- kernel/events/uprobes.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel

[PATCH] uprobes: Improve scalability by reducing the contention on siglock

2024-08-01 Thread Liao Chang
f which are from 4.5M/s to 6.4M/s and 3.3M/s to 5.1M/s individually. [1] https://lore.kernel.org/all/20240731214256.3588718-1-and...@kernel.org [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com Signed-off-by: Liao Chang --- include/linux/uprobes.h | 1 + ke

Re: [PATCH 3/8] uprobes: protected uprobe lifetime with SRCU

2024-08-01 Thread Liao, Chang
vaddr, &is_swbp); > if (!uprobe) { > if (is_swbp > 0) { > /* No matching uprobe; signal SIGTRAP. */ > @@ -2223,6 +2239,7 @@ static void handle_swbp(struct pt_regs *regs) > */ > instruction_pointer_set(regs, bp_vaddr); > } > + srcu_read_unlock(&uprobes_srcu, srcu_idx); > return; > } > > @@ -2258,12 +2275,12 @@ static void handle_swbp(struct pt_regs *regs) > if (arch_uprobe_skip_sstep(&uprobe->arch, regs)) > goto out; > > - if (!pre_ssout(uprobe, regs, bp_vaddr)) > - return; > + if (pre_ssout(uprobe, regs, bp_vaddr)) > + goto out; > Regardless what pre_ssout() returns, it always reach the label 'out', so the if block is unnecessary. > - /* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */ > out: > - put_uprobe(uprobe); > + /* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */ > + srcu_read_unlock(&uprobes_srcu, srcu_idx); > } > > /* -- BR Liao, Chang

Re: [PATCH 3/8] uprobes: protected uprobe lifetime with SRCU

2024-08-01 Thread Liao, Chang
在 2024/8/2 0:49, Andrii Nakryiko 写道: > On Thu, Aug 1, 2024 at 5:23 AM Liao, Chang wrote: >> >> >> >> 在 2024/8/1 5:42, Andrii Nakryiko 写道: >>> To avoid unnecessarily taking a (brief) refcount on uprobe during >>> breakpoint handling in handle_swbp

Re: [PATCH] uprobes: Improve scalability by reducing the contention on siglock

2024-08-01 Thread Liao, Chang
在 2024/8/1 22:06, Oleg Nesterov 写道: > On 08/01, Liao Chang wrote: >> >> @@ -2276,22 +2277,25 @@ static void handle_singlestep(struct uprobe_task >> *utask, struct pt_regs *regs) >> int err = 0; >> >> uprobe = utask->active_uprobe; &

Re: [PATCH 6/8] perf/uprobe: split uprobe_unregister()

2024-08-01 Thread Liao, Chang
nt_enable(struct trace_event_call *call, > diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c > b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c > index 73a6b041bcce..928c73cde32e 100644 > --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c > +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c > @@ -478,7 +478,8 @@ static void testmod_unregister_uprobe(void) > mutex_lock(&testmod_uprobe_mutex); > > if (uprobe.uprobe) { > - uprobe_unregister(uprobe.uprobe, &uprobe.consumer); > + uprobe_unregister_nosync(uprobe.uprobe, &uprobe.consumer); > + uprobe_unregister_sync(); > uprobe.offset = 0; > uprobe.uprobe = NULL; > } -- BR Liao, Chang

Re: [PATCH 6/8] perf/uprobe: split uprobe_unregister()

2024-08-05 Thread Liao, Chang
在 2024/8/6 4:01, Andrii Nakryiko 写道: > On Fri, Aug 2, 2024 at 8:05 AM Andrii Nakryiko > wrote: >> >> On Thu, Aug 1, 2024 at 7:41 PM Liao, Chang wrote: >>> >>> >>> >>> 在 2024/8/1 5:42, Andrii Nakryiko 写道: >>>> From:

Re: [PATCH] uprobes: Improve scalability by reducing the contention on siglock

2024-08-05 Thread Liao, Chang
在 2024/8/2 17:24, Oleg Nesterov 写道: > On 08/02, Liao, Chang wrote: >> >> >> 在 2024/8/1 22:06, Oleg Nesterov 写道: >>> On 08/01, Liao Chang wrote: >>>> >>>> @@ -2276,22 +2277,25 @@ static void handle_singlestep(struct uprobe_task

Re: [PATCH] uprobes: Improve scalability by reducing the contention on siglock

2024-08-08 Thread Liao, Chang
nal flag when TIF_SIGPENDING is cleared. - handle_singlestep() check the deny signal flag and restore TIF_SIGPENDING if necessary. Does this approach look correct to you,do do you have any other way to implement the "flag"? Thanks. > > Oleg. > > On 08/06, Oleg Nestero

Re: [PATCH 0/8] uprobes: RCU-protected hot path optimizations

2024-08-08 Thread Liao, Chang
e. > [...] >> >> (To clarify. In fact I think that a new TIF_ or even PF_ flag makes more >> sense, >> afaics it can have more users. But I don't think that uprobes can provide >> enough >> justification for that right now) I also face the same choice when Oleg suggested me to add new flag to track the denied flag, due to I haven't encountered scenarios outside of uprobe that would deny signal, so I'm not confident of introduce new TIF_ flag without a fully understanding of potential potential impacts. >> >> Oleg. >> -- BR Liao, Chang

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-08 Thread Liao, Chang
he CC list for broader visibility and potential collaboration. Thanks. 在 2024/7/27 17:44, Liao Chang 写道: > The profiling result of single-thread model of selftests bench reveals > performance bottlenecks in find_uprobe() and caches_clean_inval_pou() on > ARM64. On my local testing mach

Re: [PATCH] uprobes: Improve scalability by reducing the contention on siglock

2024-08-08 Thread Liao, Chang
在 2024/8/8 18:28, Oleg Nesterov 写道: > On 08/08, Liao, Chang wrote: >> >> - pre_ssout() resets the deny signal flag >> >> - uprobe_deny_signal() sets the deny signal flag when TIF_SIGPENDING is >> cleared. >> >> - handle_single

[PATCH v2 1/2] uprobes: Remove redundant spinlock in uprobe_deny_signal()

2024-08-08 Thread Liao Chang
Since clearing a bit in thread_info is an atomic operation, the spinlock is redundant and can be removed, reducing lock contention is good for performance. Acked-by: Oleg Nesterov Signed-off-by: Liao Chang --- kernel/events/uprobes.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel

[PATCH v2 2/2] uprobes: Remove the spinlock within handle_singlestep()

2024-08-08 Thread Liao Chang
rg [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com Signed-off-by: Liao Chang --- include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 8 +--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h

[PATCH v2 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-08-08 Thread Liao Chang
ps://lore.kernel.org/all/20240801082407.1618451-1-liaocha...@huawei.com Liao Chang (2): uprobes: Remove redundant spinlock in uprobe_deny_signal() uprobes: Remove the spinlock within handle_singlestep() include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 10 +- 2 files changed

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-08 Thread Liao, Chang
ing the patch until Andrii's changes are settle down. > > Oleg. > > > On 08/08, Liao, Chang wrote: >> >> Hi Andrii and Oleg. >> >> This patch sent by me two weeks ago also aim to optimize the performance of >> uprobe >> on arm64. I

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-09 Thread Liao, Chang
在 2024/8/9 2:26, Andrii Nakryiko 写道: > On Thu, Aug 8, 2024 at 1:45 AM Liao, Chang wrote: >> >> Hi Andrii and Oleg. >> >> This patch sent by me two weeks ago also aim to optimize the performance of >> uprobe >> on arm64. I notice recent discussion

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-12 Thread Liao, Chang
在 2024/8/9 2:26, Andrii Nakryiko 写道: > On Thu, Aug 8, 2024 at 1:45 AM Liao, Chang wrote: >> >> Hi Andrii and Oleg. >> >> This patch sent by me two weeks ago also aim to optimize the performance of >> uprobe >> on arm64. I notice recent discussion

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-12 Thread Liao, Chang
在 2024/8/10 2:40, Andrii Nakryiko 写道: > On Fri, Aug 9, 2024 at 11:34 AM Andrii Nakryiko > wrote: >> >> On Fri, Aug 9, 2024 at 12:16 AM Liao, Chang wrote: >>> >>> >>> >>> 在 2024/8/9 2:26, Andrii Nakryiko 写道: >>>> On Thu, Aug 8,

Re: [PATCH v2 2/2] uprobes: Remove the spinlock within handle_singlestep()

2024-08-13 Thread Liao, Chang
在 2024/8/12 19:29, Oleg Nesterov 写道: > On 08/09, Liao Chang wrote: >> >> --- a/include/linux/uprobes.h >> +++ b/include/linux/uprobes.h >> @@ -75,6 +75,7 @@ struct uprobe_task { >> >> struct uprobe *active_uprobe; >>

Re: [PATCH v2 1/2] uprobes: Remove redundant spinlock in uprobe_deny_signal()

2024-08-13 Thread Liao, Chang
在 2024/8/12 20:07, Oleg Nesterov 写道: > On 08/09, Liao Chang wrote: >> >> Since clearing a bit in thread_info is an atomic operation, the spinlock >> is redundant and can be removed, reducing lock contention is good for >> performance. > > My ack still

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-13 Thread Liao, Chang
在 2024/8/13 1:49, Andrii Nakryiko 写道: > On Mon, Aug 12, 2024 at 4:11 AM Liao, Chang wrote: >> >> >> >> 在 2024/8/9 2:26, Andrii Nakryiko 写道: >>> On Thu, Aug 8, 2024 at 1:45 AM Liao, Chang wrote: >>>> >>>> Hi Andrii and Oleg. >&g

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-13 Thread Liao, Chang
在 2024/8/13 1:57, Andrii Nakryiko 写道: > On Mon, Aug 12, 2024 at 5:05 AM Liao, Chang wrote: >> >> >> >> 在 2024/8/10 2:40, Andrii Nakryiko 写道: >>> On Fri, Aug 9, 2024 at 11:34 AM Andrii Nakryiko >>> wrote: >>>> >>>> On Fri,

[PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-08-14 Thread Liao Chang
and 'ret' variants has been significantly reduced. Due to the emulation of 'push' instruction needs to access userspace memory, it spent more cycles than the other. [0] https://lore.kernel.org/all/caef4bzao4eg6hr2hzxypn+7uer4chs0r99zln02ezz5yruv...@mail.gmail.com/ Signed-off-by:

[PATCH v3 1/2] uprobes: Remove redundant spinlock in uprobe_deny_signal()

2024-08-14 Thread Liao Chang
Since clearing a bit in thread_info is an atomic operation, the spinlock is redundant and can be removed, reducing lock contention is good for performance. Acked-by: Oleg Nesterov Signed-off-by: Liao Chang --- kernel/events/uprobes.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel

[PATCH v3 2/2] uprobes: Remove the spinlock within handle_singlestep()

2024-08-14 Thread Liao Chang
rg [2] https://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com Acked-by: Oleg Nesterov Signed-off-by: Liao Chang --- include/linux/uprobes.h | 1 + kernel/events/uprobes.c | 8 +--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/include/linux/uprobes.h b/in

[PATCH v3 0/2] uprobes: Improve scalability by reducing the contention on siglock

2024-08-14 Thread Liao Chang
s://lore.kernel.org/all/20240727094405.1362496-1-liaocha...@huawei.com [3] https://lore.kernel.org/all/20240801082407.1618451-1-liaocha...@huawei.com Liao Chang (2): uprobes: Remove redundant spinlock in uprobe_deny_signal() uprobes: Remove the spinlock within handle_singlestep() include/linux/u

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-14 Thread Liao, Chang
在 2024/8/15 2:42, Andrii Nakryiko 写道: > On Tue, Aug 13, 2024 at 9:17 PM Liao, Chang wrote: >> >> >> >> 在 2024/8/13 1:49, Andrii Nakryiko 写道: >>> On Mon, Aug 12, 2024 at 4:11 AM Liao, Chang wrote: >>>> >>>> >>>> >&g

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-15 Thread Liao, Chang
在 2024/8/15 0:57, Andrii Nakryiko 写道: > On Tue, Aug 13, 2024 at 9:17 PM Liao, Chang wrote: >> >> >> >> 在 2024/8/13 1:49, Andrii Nakryiko 写道: >>> On Mon, Aug 12, 2024 at 4:11 AM Liao, Chang wrote: >>>> >>>> >>>> >&g

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-08-21 Thread Liao, Chang
Hi, Mark My bad for taking so long to rely, I generally agree with your suggestions to STP emulation. 在 2024/8/15 17:58, Mark Rutland 写道: > On Wed, Aug 14, 2024 at 08:03:56AM +0000, Liao Chang wrote: >> As Andrii pointed out, the uprobe/uretprobe selftest bench run into a >> c

Re: [PATCH] uprobes: Optimize the allocation of insn_slot for performance

2024-08-21 Thread Liao, Chang
在 2024/8/16 0:53, Andrii Nakryiko 写道: > On Wed, Aug 14, 2024 at 7:58 PM Liao, Chang wrote: >> >> >> >> 在 2024/8/15 2:42, Andrii Nakryiko 写道: >>> On Tue, Aug 13, 2024 at 9:17 PM Liao, Chang wrote: >>>> >>>> >>>> >>

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-08-27 Thread Liao, Chang
Hi, Mark Would you like to discuss this patch further, or do you still believe emulating STP to push FP/LR into the stack in kernel is not a good idea? Thanks. 在 2024/8/21 15:55, Liao, Chang 写道: > Hi, Mark > > My bad for taking so long to rely, I generally agree with your suggestions

Re: [PATCH] arm64: insn: Simulate nop and push instruction for better uprobe performance

2024-08-30 Thread Liao, Chang
在 2024/8/30 3:26, Andrii Nakryiko 写道: > On Tue, Aug 27, 2024 at 4:34 AM Liao, Chang wrote: >> >> Hi, Mark >> >> Would you like to discuss this patch further, or do you still believe >> emulating >> STP to push FP/LR into the stack in kernel is not a go

[PATCH] riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe

2021-03-30 Thread Liao Chang
'SIE', and reach __find_get_block where it requires the interrupt must be enabled. Fix this is very trivial, just restore the value of 'sstatus' in pt_regs with backup one at 2) when the instruction being single stepped cause a page fault. Fixes: c22b0bcb1dd02 ("riscv: A

[PATCH v2] riscv/kprobe: Restore local irqflag if kprobe is cancelled

2021-04-16 Thread Liao Chang
'SIE' is masked in this new saved irqflag. After kprobe is serviced, the CPU 'sstatus' is restored with 'SIE' masked. This overwritten 'sstatus' cause BUG_ON() in __find_get_block. This bug is already fixed on arm64 by Jisheng Zhang. Fixes: c22b0bcb1dd