Re: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2023-12-31 Thread Waiman Long
On 12/31/23 16:55, David Laight wrote: per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number. This requires the cpu number be 64bit. However the value in osq_lock() comes from a 32bit xchg() and there isn't a way of telling gcc the high bits are zero (they are) so there will always be an

Re: [PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

2023-12-31 Thread Waiman Long
On 12/31/23 16:54, David Laight wrote: When osq_lock() returns false or osq_unlock() returns, static analysis shows that node->next should always be NULL. This means that it isn't necessary to explicitly set it to NULL prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock(). Just in case

Re: [PATCH next v2 3/5] locking/osq_lock: Use node->prev_cpu instead of saving node->prev.

2023-12-31 Thread Waiman Long
On 12/31/23 16:54, David Laight wrote: node->prev is only used to update 'prev' in the unlikely case of concurrent unqueues. This can be replaced by a check for node->prev_cpu changing and then calling decode_cpu() to get the changed 'prev' pointer. node->cpu (or more particularly) prev->cpu is

Re: [PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

2023-12-31 Thread Waiman Long
On 12/31/23 16:52, David Laight wrote: The vcpu_is_preempted() test stops osq_lock() spinning if a virtual cpu is no longer running. Although patched out for bare-metal the code still needs the cpu number. Reading this from 'prev->cpu' is pretty much guaranteed to have a cache miss when osq_unl

Re: [PATCH next v2 1/5] locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.

2023-12-31 Thread Waiman Long
On 12/31/23 16:51, David Laight wrote: Since node->locked cannot be set before the assignment to prev->next it is safe to clear it in the slow path. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/locking/

[PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().

2023-12-31 Thread David Laight
per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number. This requires the cpu number be 64bit. However the value in osq_lock() comes from a 32bit xchg() and there isn't a way of telling gcc the high bits are zero (they are) so there will always be an instruction to clear the high bits. The c

[PATCH next v2 4/5] locking/osq_lock: Avoid writing to node->next in the osq_lock() fast path.

2023-12-31 Thread David Laight
When osq_lock() returns false or osq_unlock() returns, static analysis shows that node->next should always be NULL. This means that it isn't necessary to explicitly set it to NULL prior to atomic_xchg(&lock->tail, curr) on entry to osq_lock(). Just in case there is a non-obvious race condition that ca

[PATCH next v2 3/5] locking/osq_lock: Use node->prev_cpu instead of saving node->prev.

2023-12-31 Thread David Laight
node->prev is only used to update 'prev' in the unlikely case of concurrent unqueues. This can be replaced by a check for node->prev_cpu changing and then calling decode_cpu() to get the changed 'prev' pointer. node->cpu (or more particularly) prev->cpu is only used for the osq_wait_next() call in

[PATCH next v2 2/5] locking/osq_lock: Optimise the vcpu_is_preempted() check.

2023-12-31 Thread David Laight
The vcpu_is_preempted() test stops osq_lock() spinning if a virtual cpu is no longer running. Although patched out for bare-metal the code still needs the cpu number. Reading this from 'prev->cpu' is pretty much guaranteed to have a cache miss when osq_unlock() is waking up the next cpu. Instead s

[PATCH next v2 1/5] locking/osq_lock: Defer clearing node->locked until the slow osq_lock() path.

2023-12-31 Thread David Laight
Since node->locked cannot be set before the assignment to prev->next it is safe to clear it in the slow path. Signed-off-by: David Laight --- kernel/locking/osq_lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c index

[PATCH next v2 0/5] locking/osq_lock: Optimisations to osq_lock code.

2023-12-31 Thread David Laight
This is an updated series of optimisations to osq_lock.c. Patches #1 and #3 from v1 have been applied by Linus. Some of the generated code issues I was getting were caused by CONFIG_DEBUG_PREEMPT being set. No idea why, it isn't any more. Patch #1 is the node->locked part of the old #2. Patch #2 r

[PATCH 3/3] arm64: dts: qcom: qcs404: Use specific compatible for hfpll

2023-12-31 Thread Luca Weiss
Follow the updated bindings and use a QCS404-specific compatible for the HFPLL. Signed-off-by: Luca Weiss --- arch/arm64/boot/dts/qcom/qcs404.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi b/arch/arm64/boot/dts/qcom/qcs404.dtsi inde

[PATCH 2/3] clk: qcom: hfpll: Add QCS404-specific compatible

2023-12-31 Thread Luca Weiss
It doesn't appear that the configuration for the HFPLL is generic, so add a qcs404-specific compatible and rename the existing struct to qcs404. Signed-off-by: Luca Weiss --- drivers/clk/qcom/hfpll.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/clk/qcom/hf

[PATCH 1/3] dt-bindings: clock: qcom,hfpll: Convert to YAML

2023-12-31 Thread Luca Weiss
Convert the .txt documentation to .yaml. Take the liberty to change the compatibles for ipq8064, apq8064, msm8974 and msm8960 to follow the updated naming schema. These compatibles are not used upstream yet. Also add a compatible for QCS404 since that SoC upstream already uses qcom,hfpll compatib

[PATCH 0/3] Convert qcom,hfpll documentation to yaml + related changes

2023-12-31 Thread Luca Weiss
files changed, 87 insertions(+), 66 deletions(-) --- base-commit: 39676dfe52331dba909c617f213fdb21015c8d10 change-id: 20231231-hfpll-yaml-9266f012365c Best regards, -- Luca Weiss

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:59 > > On Sat, 30 Dec 2023 at 12:41, Linus Torvalds > wrote: > > > > UNTESTED patch to just do the "this_cpu_write()" parts attached. > > Again, note how we do end up doing that this_cpu_ptr conversion later > > anyway, but at least it's off the cr

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Linus Torvalds > Sent: 30 December 2023 20:41 > > On Fri, 29 Dec 2023 at 12:57, David Laight wrote: > > > > this_cpu_ptr() is rather more expensive than raw_cpu_read() since > > the latter can use an 'offset from register' (%gs for x86-64). > > > > Add a 'self' field to 'struct optimistic_s

RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data accesses.

2023-12-31 Thread David Laight
From: Waiman Long > Sent: 31 December 2023 03:04 > The presence of debug_smp_processor_id in your compiled code is likely > due to the setting of CONFIG_DEBUG_PREEMPT in your kernel config. > > #ifdef CONFIG_DEBUG_PREEMPT >   extern unsigned int debug_smp_processor_id(void); > # define smp_p