On 11/01/2017 06:01 PM, Boris Ostrovsky wrote:
> On 11/01/2017 04:58 PM, Waiman Long wrote:
>> +/* TODO: To be removed in a future kernel version */
>> static __init int xen_parse_nopvspin(char *arg)
>> {
>> -xen_pvspin = false;
>> +pr_warn("xen_n
Patch 2 deprecates Xen's xen_nopvspin parameter as it is no longer
needed.
Waiman Long (2):
x86/paravirt: Add kernel parameter to choose paravirt lock type
x86/xen: Deprecate xen_nopvspin
Documentation/admin-guide/kernel-parameters.txt | 11 ---
arch/x86/include/asm/paravirt.h
this new parameter
in determining if pvqspinlock should be used. The parameter, however,
will override Xen's xen_nopvspin in terms of disabling unfair locks.
Signed-off-by: Waiman Long
---
Documentation/admin-guide/kernel-parameters.txt | 7 +
arch/x86/include/asm/paravirt.h
With the new pvlock_type kernel parameter, xen_nopvspin is no longer
needed. This patch deprecates the xen_nopvspin parameter by removing
its documentation and treating it as an alias of "pvlock_type=queued".
Signed-off-by: Waiman Long
---
Documentation/admin-guide/kernel-parameter
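A minimal sketch of what such a deprecation shim could look like, assuming patch 1 exposes some hook for forcing the queued (native) lock type; the hook name and the warning text below are illustrative, not taken from the actual patch:

/* Kept only as an alias for "pvlock_type=queued"; to be removed later. */
static __init int xen_parse_nopvspin(char *arg)
{
        pr_warn("xen_nopvspin is deprecated, use pvlock_type=queued instead\n");
        pvlock_set_queued();    /* hypothetical helper from patch 1 */
        return 0;
}
early_param("xen_nopvspin", xen_parse_nopvspin);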
On 11/01/2017 03:01 PM, Boris Ostrovsky wrote:
> On 11/01/2017 12:28 PM, Waiman Long wrote:
>> On 11/01/2017 11:51 AM, Juergen Gross wrote:
>>> On 01/11/17 16:32, Waiman Long wrote:
>>>> Currently, there are 3 different lock types that can be chosen
On 11/01/2017 11:51 AM, Juergen Gross wrote:
> On 01/11/17 16:32, Waiman Long wrote:
>> Currently, there are 3 different lock types that can be chosen for
>> the x86 architecture:
>>
>> - qspinlock
>> - pvqspinlock
>> - unfair lock
>>
>> One
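Concretely, the proposed pvlock_type= parameter would let the administrator pick one of the three lock types listed above at boot time. A sketch of such a parser is below; only the "queued" value is confirmed by this thread, the other value names and the enum are assumptions:

enum pv_locktype {
        LOCKTYPE_AUTO,          /* let the hypervisor code decide */
        LOCKTYPE_QUEUED,        /* native qspinlock */
        LOCKTYPE_PARAVIRT,      /* pvqspinlock */
        LOCKTYPE_UNFAIR,        /* test-and-set fallback */
};

static enum pv_locktype pvlock_type __initdata = LOCKTYPE_AUTO;

static __init int parse_pvlock_type(char *arg)
{
        if (!arg)
                return -EINVAL;
        if (!strcmp(arg, "queued"))
                pvlock_type = LOCKTYPE_QUEUED;
        else if (!strcmp(arg, "pv"))
                pvlock_type = LOCKTYPE_PARAVIRT;
        else if (!strcmp(arg, "unfair"))
                pvlock_type = LOCKTYPE_UNFAIR;
        return 0;
}
early_param("pvlock_type", parse_pvlock_type);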
t can decide whether to use paravirtualized
>> spinlocks, the current fallback to the unfair test-and-set scheme, or
>> to mimic the bare metal behavior.
>>
>> V3:
>> - remove test for hypervisor environment from virt_spin_lock() as
>> suggested by Waiman Long
>
On 09/06/2017 12:04 PM, Peter Zijlstra wrote:
> On Wed, Sep 06, 2017 at 11:49:49AM -0400, Waiman Long wrote:
>>> #define virt_spin_lock virt_spin_lock
>>> static inline bool virt_spin_lock(struct qspinlock *lock)
>>> {
>>> + if (!s
)
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 294294c71ba4..838d235b87ef 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -76,6 +76,10 @@
> #define MAX_NODES  4
> #endif
>
> +#ifdef CONFIG_PARAVIRT
> +DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
> +#endif
> +
> /*
> * Per-CPU queue node structures; we can never have more than 4 nested
> * contexts: task, softirq, hardirq, nmi.
Acked-by: Waiman Long
)
>
> if (!xen_pvspin) {
> printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
> + static_branch_disable(&virt_spin_lock_key);
> return;
> }
> printk(KERN_DEBUG "xen: PV spinlocks enabled\n"
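Putting the truncated hunks together, the idea is that virt_spin_lock() only takes the test-and-set path while virt_spin_lock_key is enabled; a sketch of how the x86 header could use the key, assuming the usual static-branch API:

#define virt_spin_lock virt_spin_lock
static inline bool virt_spin_lock(struct qspinlock *lock)
{
        /* Key disabled (bare metal, or guest opted out): use the fair slowpath. */
        if (!static_branch_likely(&virt_spin_lock_key))
                return false;

        /* Otherwise fall back to a simple, unfair test-and-set lock. */
        do {
                while (atomic_read(&lock->val) != 0)
                        cpu_relax();
        } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);

        return true;
}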
On 09/06/2017 09:06 AM, Peter Zijlstra wrote:
> On Wed, Sep 06, 2017 at 08:44:09AM -0400, Waiman Long wrote:
>> On 09/06/2017 03:08 AM, Peter Zijlstra wrote:
>>> Guys, please trim email.
>>>
>>> On Tue, Sep 05, 2017 at 10:31:46AM -0400, Waiman Long wrote:
>
On 09/06/2017 03:08 AM, Peter Zijlstra wrote:
> Guys, please trim email.
>
> On Tue, Sep 05, 2017 at 10:31:46AM -0400, Waiman Long wrote:
>> For clarification, I was actually asking if you consider just adding one
>> more jump label to skip it for Xen/KVM instead of making
>
On 09/05/2017 10:24 AM, Waiman Long wrote:
> On 09/05/2017 10:18 AM, Juergen Gross wrote:
>> On 05/09/17 16:10, Waiman Long wrote:
>>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>>>> There are cases where a guest tries to switch spinlocks to bare metal
On 09/05/2017 10:18 AM, Juergen Gross wrote:
> On 05/09/17 16:10, Waiman Long wrote:
>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>>> There are cases where a guest tries to switch spinlocks to bare metal
>>> behavior (e.g. by setting "xen_nopvspin" bo
On 09/05/2017 10:08 AM, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 10:02:57AM -0400, Waiman Long wrote:
>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>>> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
>>> +{
>>> + if (!s
On 09/05/2017 09:24 AM, Juergen Gross wrote:
> There are cases where a guest tries to switch spinlocks to bare metal
> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
> has the downside of falling back to unfair test and set scheme for
> qspinlocks due to virt_spin_lock() detec
intends to reduce this performance
overhead by replacing the C __kvm_vcpu_is_preempted() function by
an optimized version of __raw_callee_save___kvm_vcpu_is_preempted()
written in assembly.
Waiman Long (2):
x86/paravirt: Change vcpu_is_preempted() arg type to long
x86/kvm: Provide opt
__raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.
Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
arch/x86/kernel/asm-offsets_64.c | 9 +
arch/x86/kernel/kvm.c | 24
2 files changed, 33 insertions(+)
diff --git a/arch/x86
number won't exceed
32 bits.
Signed-off-by: Waiman Long
---
arch/x86/include/asm/paravirt.h | 2 +-
arch/x86/include/asm/qspinlock.h | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/paravirt-spinlocks.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
On 02/16/2017 11:09 AM, Peter Zijlstra wrote:
> On Wed, Feb 15, 2017 at 04:37:49PM -0500, Waiman Long wrote:
>> The cpu argument in the function prototype of vcpu_is_preempted()
>> is changed from int to long. That makes it easier to provide a better
>> optimized assembly ver
On 02/16/2017 11:48 AM, Peter Zijlstra wrote:
> On Wed, Feb 15, 2017 at 04:37:50PM -0500, Waiman Long wrote:
>> +/*
>> + * Hand-optimize version for x86-64 to avoid 8 64-bit register saving and
>> + * restoring to/from the stack. It is assumed that the preempted value
>>
reempted()
can have some impact on system performance on a VM guest, especially
for x86-64 guests, this patch set intends to reduce this performance
overhead by replacing the C __kvm_vcpu_is_preempted() function by
an optimized version of __raw_callee_save___kvm_vcpu_is_preempted()
written in assembly
__raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.
Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
arch/x86/kernel/kvm.c | 30 ++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 85ed343
performance on a VM guest, especially
for x86-64 guests, this patch set intends to reduce this performance
overhead by replacing the C __kvm_vcpu_is_preempted() function by
an optimized version of __raw_callee_save___kvm_vcpu_is_preempted()
written in assembly.
Waiman Long (2):
x86/parav
__raw_callee_save___kvm_vcpu_is_preempt() to verify that they matched.
Suggested-by: Peter Zijlstra
Signed-off-by: Waiman Long
---
arch/x86/kernel/kvm.c | 28
1 file changed, 28 insertions(+)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 85ed343
On 02/14/2017 04:39 AM, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 05:34:01PM -0500, Waiman Long wrote:
>> It is the address of &steal_time that will exceed the 32-bit limit.
> That seems extremely unlikely. That would mean we have more than 4G
> worth of per-cpu variab
On 02/13/2017 04:52 PM, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 03:12:45PM -0500, Waiman Long wrote:
>> On 02/13/2017 02:42 PM, Waiman Long wrote:
>>> On 02/13/2017 05:53 AM, Peter Zijlstra wrote:
>>>> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra
On 02/13/2017 03:06 PM, h...@zytor.com wrote:
> On February 13, 2017 2:53:43 AM PST, Peter Zijlstra
> wrote:
>> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
>>> That way we'd end up with something like:
>>>
>>> asm("
>>> push %rdi;
>>> movslq %edi, %rdi;
>>> movq __per_cpu_offs
On 02/13/2017 02:42 PM, Waiman Long wrote:
> On 02/13/2017 05:53 AM, Peter Zijlstra wrote:
>> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
>>> That way we'd end up with something like:
>>>
>>> asm("
>>> push %rdi;
>>
On 02/13/2017 05:53 AM, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 11:47:16AM +0100, Peter Zijlstra wrote:
>> That way we'd end up with something like:
>>
>> asm("
>> push %rdi;
>> movslq %edi, %rdi;
>> movq __per_cpu_offset(,%rdi,8), %rax;
>> cmpb $0, %[offset](%rax);
>> setne %al;
>> pop %rd
On 02/13/2017 05:47 AM, Peter Zijlstra wrote:
> On Fri, Feb 10, 2017 at 12:00:43PM -0500, Waiman Long wrote:
>
>>>> +asm(
>>>> +".pushsection .text;"
>>>> +".global __raw_callee_save___kvm_vcpu_is_preempted;"
>
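Piecing the quoted fragments together, the hand-written callee-save helper under discussion looks roughly like the sketch below. With the argument type widened to long, %rdi already holds the CPU number; the KVM_STEAL_TIME_preempted offset is assumed to come from asm-offsets, and the exact directives may differ from the posted patch:

asm(
".pushsection .text;"
".global __raw_callee_save___kvm_vcpu_is_preempted;"
".type __raw_callee_save___kvm_vcpu_is_preempted, @function;"
"__raw_callee_save___kvm_vcpu_is_preempted:"
/* %rax is the only register clobbered, so nothing needs to be saved. */
"movq   __per_cpu_offset(,%rdi,8), %rax;"
/* Test steal_time.preempted for the target CPU. */
"cmpb   $0, " __stringify(KVM_STEAL_TIME_preempted) "+steal_time(%rax);"
"setne  %al;"
"ret;"
".popsection");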
On 02/10/2017 11:35 AM, Waiman Long wrote:
> On 02/10/2017 11:19 AM, Peter Zijlstra wrote:
>> On Fri, Feb 10, 2017 at 10:43:09AM -0500, Waiman Long wrote:
>>> It was found that when running the fio sequential write test with an XFS ramdisk
>>> on a VM running on a 2-socket x86-6
On 02/10/2017 11:19 AM, Peter Zijlstra wrote:
> On Fri, Feb 10, 2017 at 10:43:09AM -0500, Waiman Long wrote:
>> It was found that when running the fio sequential write test with an XFS ramdisk
>> on a VM running on a 2-socket x86-64 system, the %CPU times as reported
>> by perf were as
fio [k] osq_lock
10.14% 10.14% fio [k] __kvm_vcpu_is_preempted
On bare metal, the patch doesn't introduce any performance
regression. On KVM guest, it produces noticeable performance
improvement (up to 7%).
Signed-off-by: Waiman Long
---
v1->v2:
- Rerun the fio test on a differe
On 02/08/2017 02:05 PM, Peter Zijlstra wrote:
> On Wed, Feb 08, 2017 at 01:00:24PM -0500, Waiman Long wrote:
>> It was found that when running the fio sequential write test with an XFS ramdisk
>> on a 2-socket x86-64 system, the %CPU times as reported by perf were
>> as follows:
>
On 02/08/2017 02:05 PM, Peter Zijlstra wrote:
> On Wed, Feb 08, 2017 at 01:00:25PM -0500, Waiman Long wrote:
>> As the vcpu_is_preempted() call is pretty costly compared with other
>> checks within mutex_spin_on_owner() and rwsem_spin_on_owner(), it is
>> now done at a red
As the vcpu_is_preempted() call is pretty costly compared with other
checks within mutex_spin_on_owner() and rwsem_spin_on_owner(), it is
now done at a reduced frequency of once every 256 iterations.
Signed-off-by: Waiman Long
---
kernel/locking/mutex.c | 5 -
kernel/locking/rwsem-xadd.c
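The "once every 256 iterations" throttling amounts to a pattern like the one below inside the owner-spin loops; this is a sketch, not the exact diff, and the surrounding checks are paraphrased from mutex_spin_on_owner():

        unsigned int spin = 0;
        bool ret = true;

        while (__mutex_owner(lock) == owner) {
                /*
                 * vcpu_is_preempted() is relatively expensive, so only
                 * sample it once every 256 iterations of the spin loop.
                 */
                if (!(++spin & 0xff) && vcpu_is_preempted(task_cpu(owner))) {
                        ret = false;
                        break;
                }

                if (need_resched() || !owner->on_cpu) {
                        ret = false;
                        break;
                }

                cpu_relax();
        }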
and rwsem slowpaths, there isn't much to gain by making
it callee-save. So it is now changed to a normal function call instead.
With this patch applied, the aggregate bandwidth of the fio sequential
write test increased slightly from 2563.3MB/s to 2588.1MB/s (about 1%).
Signed-off-by: W
jump() are also removed.
A simple build and boot test was done to verify it.
Signed-off-by: Waiman Long
---
v1->v2:
- Remove init functions kvm_spinlock_init_jump() and
xen_init_spinlocks_jump().
arch/x86/include/asm/spinlock.h | 3 ---
arch/x86/kernel/kvm.c
This is a follow-up of commit cfd8983f03c7b2 ("x86, locking/spinlocks:
Remove ticket (spin)lock implementation"). The static_key structure
paravirt_ticketlocks_enabled is now removed as it is no longer used.
A simple build and boot test was done to verify it.
Signed-off-by: W
On 05/04/2015 10:05 AM, Peter Zijlstra wrote:
On Thu, Apr 30, 2015 at 02:49:26PM -0400, Waiman Long wrote:
On 04/29/2015 02:11 PM, Peter Zijlstra wrote:
On Fri, Apr 24, 2015 at 02:56:42PM -0400, Waiman Long wrote:
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed
'/'kick' order for both node and head.
In any case, like I just wrote on the other email, I've stuck some
things in my queue (up to and including patch 11) and if it all works
out we can continue from there.
---
Subject: pvqspinlock: Implement simple paravirt support for the qspin
On 04/29/2015 02:27 PM, Linus Torvalds wrote:
On Wed, Apr 29, 2015 at 11:11 AM, Peter Zijlstra wrote:
On Fri, Apr 24, 2015 at 02:56:42PM -0400, Waiman Long wrote:
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed even if the other CPU is not even close to being
On 04/29/2015 02:11 PM, Peter Zijlstra wrote:
On Fri, Apr 24, 2015 at 02:56:42PM -0400, Waiman Long wrote:
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed even if the other CPU is not even close to being halted. This
extra cmpxchg can harm slowpath performance
linear feedback shift register.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock.c | 68 +++-
kernel/locking/qspinlock_paravirt.h | 324 +++
2 files changed, 391 insertions(+), 1 deletions(-)
create mode 100644 kernel/locking
From: Peter Zijlstra (Intel)
When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
arch/x86/include/asm/qspinlock.h
fs under the
pv-qspinlock directory.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock_paravirt.h | 100 ++-
1 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/qspinlock_paravirt.h
b/kernel/locking/qspinlock_paravirt.h
index
()
so as to do the pv_kick() only if it is really necessary.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock.c | 10 ++--
kernel/locking/qspinlock_paravirt.h | 76 +-
2 files changed, 61 insertions(+), 25 deletions(-)
diff --git a/kernel/locking
From: David Vrabel
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
Signed-off-by: David Vrabel
Signed-off-by: Waiman Long
---
arch/x86/xen/spinlock.c | 64
is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.
All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
include/asm-gene
locked bit
into a new clear_pending_set_locked() function.
This patch also simplifies the trylock operation before queuing by
calling queue_spin_trylock() directly.
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
---
include/asm-generic/qspinlock_types.h |2 +
kernel
lock is acquired, the queue node can be released to
be used later.
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
---
include/asm-generic/qspinlock.h | 132 +
include/asm-generic/qspinlock_types.h | 58 +
kernel/Kconfig.locks
qspinlock: Add pending bit
qspinlock: Optimize for smaller NR_CPUS
qspinlock: Revert to test-and-set on hypervisors
pvqspinlock, x86: Implement the paravirt qspinlock call patching
Waiman Long (9):
qspinlock: A simple generic 4-byte queue spinlock
qspinlock, x86: Enable x86-64 to use queue spinl
significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
arch/x86/Kconfig |2 +-
arch/x86/include/asm/paravirt.h | 29 +++
optimization which will make the queue spinlock code perform
better than the generic implementation.
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
---
arch/x86/Kconfig |1 +
arch/x86/include/asm/qspinlock.h | 20
arch/x86/include
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
Signed-off-by: Waiman Long
---
arch/x86/kernel/kvm.c | 43 +++
kernel/Kconfig.locks |2 +-
2 files
imeUsr Time
-- -
ticketlock 2075 10.00 216.35 3.49
qspinlock 3023 10.00 198.20 4.80
Signed-off-by: Waiman Long
Signed-off-by: Peter Zijlstra (Intel)
---
kernel/locking/qsp
without using atomic op.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock_paravirt.h | 28 +---
1 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/kernel/locking/qspinlock_paravirt.h
b/kernel/locking/qspinlock_paravirt.h
index 9b4ac3d..41ee033 100644
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
include/asm-generic/qspinlock_types.h | 12 +++-
kernel/locking/qspinlock.c| 119 +++--
2 files changed, 107 insertions(+), 24 deletions(-)
diff --git a/include/asm-generic/qspinlock_types.h
b/include/a
On 04/13/2015 11:09 AM, Peter Zijlstra wrote:
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
+__visible void __pv_queue_spin_unlock(struct qspinlock *lock)
+{
+ struct __qspinlock *l = (void *)lock;
+ struct pv_node *node;
+
+ if (likely(cmpxchg(&l->
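Reconstructed from the quoted fragment, the unlock path under review is roughly the following, a sketch using the struct __qspinlock overlay and the pv_unhash()/pv_kick() helpers discussed in this series:

__visible void __pv_queue_spin_unlock(struct qspinlock *lock)
{
        struct __qspinlock *l = (void *)lock;
        struct pv_node *node;

        /* Fast path: nobody hashed the lock, a plain release is enough. */
        if (likely(cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL))
                return;

        /*
         * Slow path: the queue head vCPU halted and set _Q_SLOW_VAL, so
         * look up its node, release the lock, then kick the halted vCPU.
         */
        node = pv_unhash(lock);
        smp_store_release(&l->locked, 0);
        pv_kick(node->cpu);
}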
On 04/13/2015 11:08 AM, Peter Zijlstra wrote:
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
+static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
+{
+ struct __qspinlock *l = (void *)lock;
+ struct qspinlock **lp = NULL;
+ struct pv_node
On 04/13/2015 10:47 AM, Peter Zijlstra wrote:
On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
+void __init __pv_init_lock_hash(void)
+{
+ int pv_hash_size = 4 * num_possible_cpus();
+
+ if (pv_hash_size < (1U << LFSR_MIN_BITS))
+ pv_hash_s
On 04/09/2015 02:23 PM, Peter Zijlstra wrote:
On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote:
On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote:
+#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket))
+static struct qspinlock **pv_hash(struct
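For reference, the hashing step stores (lock, node) pairs in cache-line-sized buckets indexed by a hash of the lock pointer. A simplified sketch of the insert follows; the bucket iterator and pv_lock_hash_bits are assumptions, not the posted code:

static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
{
        unsigned long offset, hash = hash_ptr(lock, pv_lock_hash_bits);
        struct pv_hash_bucket *hb;

        for_each_hash_bucket(hb, offset, hash) {
                /* Claim the first free slot, then record the waiting node. */
                if (!cmpxchg(&hb->lock, NULL, lock)) {
                        WRITE_ONCE(hb->node, node);
                        return &hb->lock;
                }
        }
        /* Table is sized at 4 * num_possible_cpus(), so this must not happen. */
        BUG();
}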
performance benefit of qspinlock versus
ticket spinlock which got reduced in VM3 due to the overhead of
constant vCPUs halting and kicking.
Signed-off-by: Waiman Long
---
arch/x86/include/asm/qspinlock.h | 15 +--
kernel/locking/qspinlock.c| 94 +--
kernel/locking/qspinlock_unf
On 04/08/2015 08:01 AM, David Vrabel wrote:
On 07/04/15 03:55, Waiman Long wrote:
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
This basically looks the same as the version I wrote, except I
vCPU state (vcpu_hashed) which enables the code
to delay CPU kicking until at unlock time. Once this state is set,
the new lock holder will set _Q_SLOW_VAL and fill in the hash table
on behalf of the halted queue head vCPU.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock.c
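A sketch of that "hash on behalf of the halted queue head" step, following the description above; the state names come from the thread, and the exact code may differ:

static void pv_kick_node(struct qspinlock *lock, struct mcs_spinlock *node)
{
        struct pv_node *pn = (struct pv_node *)node;
        struct __qspinlock *l = (void *)lock;

        /*
         * Only act if the queue head vCPU is (about to be) halted; move it
         * to vcpu_hashed so the actual kick is deferred to unlock time.
         */
        if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
                return;

        /*
         * Set _Q_SLOW_VAL and fill in the hash table on behalf of the
         * halted queue head, so the eventual unlock knows to kick it.
         */
        WRITE_ONCE(l->locked, _Q_SLOW_VAL);
        (void)pv_hash(lock, pn);
}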
Peter Zijlstra (Intel) (4):
qspinlock: Add pending bit
qspinlock: Optimize for smaller NR_CPUS
qspinlock: Revert to test-and-set on hypervisors
pvqspinlock: Implement the paravirt qspinlock for x86
Waiman Long (11):
qspinlock: A simple generic 4-byte queue spinlock
qspinlock, x86: Enable x86-64 to use queue
without using atomic op.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock_paravirt.h | 28 +---
1 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/kernel/locking/qspinlock_paravirt.h
b/kernel/locking/qspinlock_paravirt.h
index a210061..a9fe10d 100644
significantly lowers the overhead of having
CONFIG_PARAVIRT_SPINLOCKS enabled, even for native code.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
arch/x86/Kconfig |2 +-
arch/x86/include/asm/paravirt.h | 28 +++-
to do that which
will only be enabled if CONFIG_DEBUG_SPINLOCK is defined because of
the performance overhead it introduces.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock_paravirt.h | 58 +++
1 files changed, 58 insertions(+), 0 deletions(-)
diff --
value 0 in a somewhat random fashion depending
on the LFSR taps that are being used. Callers can provide their own
taps value or use the default.
Signed-off-by: Waiman Long
---
include/linux/lfsr.h | 80 ++
1 files changed, 80 insertions(+), 0
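For background, a maximal-period linear feedback shift register steps through every non-zero n-bit value exactly once before repeating, which is what makes it usable as a cheap probing sequence here. A minimal Galois-form sketch; the taps table and API of the posted lfsr.h are not reproduced:

/* One Galois LFSR step; taps must be a maximal-period polynomial for the width. */
static inline u32 lfsr_step(u32 val, u32 taps)
{
        u32 lsb = val & 1;

        val >>= 1;
        if (lsb)
                val ^= taps;
        return val;
}

/* Example: a 16-bit register with taps 0xb400 cycles through all 65535 nonzero values. */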
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
Signed-off-by: Waiman Long
---
arch/x86/xen/spinlock.c | 63 ---
kernel/Kconfig.locks | 2
linear feedback shift register.
Signed-off-by: Waiman Long
---
kernel/locking/qspinlock.c | 69 -
kernel/locking/qspinlock_paravirt.h | 321 +++
2 files changed, 389 insertions(+), 1 deletions(-)
create mode 100644 kernel/locking
On 04/02/2015 03:48 PM, Peter Zijlstra wrote:
On Thu, Apr 02, 2015 at 07:20:57PM +0200, Peter Zijlstra wrote:
pv_wait_head():
pv_hash()
/* MB as per cmpxchg */
cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
VS
__pv_queue_spin_unlock():
if (xchg(&l->locked, 0
On 04/01/2015 05:03 PM, Peter Zijlstra wrote:
On Wed, Apr 01, 2015 at 03:58:58PM -0400, Waiman Long wrote:
On 04/01/2015 02:48 PM, Peter Zijlstra wrote:
I am sorry that I don't quite get what you mean here. My point is that in
the hashing step, a cpu will need to scan an empty bucket to pu
On 04/01/2015 01:12 PM, Peter Zijlstra wrote:
On Wed, Apr 01, 2015 at 12:20:30PM -0400, Waiman Long wrote:
After more careful reading, I think the assumption that the presence of an
unused bucket means there is no match is not true. Consider the scenario:
1. cpu 0 puts lock1 into hb[0]
2. cpu
On 04/01/2015 02:48 PM, Peter Zijlstra wrote:
On Wed, Apr 01, 2015 at 02:54:45PM -0400, Waiman Long wrote:
On 04/01/2015 02:17 PM, Peter Zijlstra wrote:
On Wed, Apr 01, 2015 at 07:42:39PM +0200, Peter Zijlstra wrote:
Hohumm.. time to think more I think ;-)
So bear with me, I've not r
On 04/01/2015 02:17 PM, Peter Zijlstra wrote:
On Wed, Apr 01, 2015 at 07:42:39PM +0200, Peter Zijlstra wrote:
Hohumm.. time to think more I think ;-)
So bear with me, I've not really pondered this well so it could be full
of holes (again).
After the cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_V
On 03/19/2015 08:25 AM, Peter Zijlstra wrote:
On Thu, Mar 19, 2015 at 11:12:42AM +0100, Peter Zijlstra wrote:
So I was now thinking of hashing the lock pointer; let me go and quickly
put something together.
A little something like so; ideally we'd allocate the hashtable since
NR_CPUS is kinda b
On 03/30/2015 12:29 PM, Peter Zijlstra wrote:
On Mon, Mar 30, 2015 at 12:25:12PM -0400, Waiman Long wrote:
I did it differently in my PV portion of the qspinlock patch. Instead of
just waking up the CPU, the new lock holder will check if the new queue head
has been halted. If so, it will set
On 03/27/2015 10:07 AM, Konrad Rzeszutek Wilk wrote:
On Thu, Mar 26, 2015 at 09:21:53PM +0100, Peter Zijlstra wrote:
On Wed, Mar 25, 2015 at 03:47:39PM -0400, Konrad Rzeszutek Wilk wrote:
Ah nice. That could be spun out as a separate patch to optimize the existing
ticket locks I presume.
Yes I
On 03/25/2015 03:47 PM, Konrad Rzeszutek Wilk wrote:
On Mon, Mar 16, 2015 at 02:16:13PM +0100, Peter Zijlstra wrote:
Hi Waiman,
As promised; here is the paravirt stuff I did during the trip to BOS last week.
All the !paravirt patches are more or less the same as before (the only real
change is
On 03/19/2015 06:01 AM, Peter Zijlstra wrote:
On Wed, Mar 18, 2015 at 10:45:55PM -0400, Waiman Long wrote:
On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
I do have some concern about this call site patching mechanism as the
modification is not atomic. The spin_unlock() calls are in many places
On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
Implement the paravirt qspinlock for x86-kvm.
We use the regular paravirt call patching to switch between:
native_queue_spin_lock_slowpath()   __pv_queue_spin_lock_slowpath()
native_queue_spin_unlock() __pv_queue_spin_unlock()
We u
+extern void __pv_queue_spin_lock_slowpath(struct qspinlock *lock, u32 val);
+extern void __pv_queue_spin_unlock(struct qspinlock *lock);
+
/*
* Initializier
*/
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -18,6 +18,9 @@
* Authors: Waiman Long
* Peter
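The patching boils down to an indirection in the x86 qspinlock header that the paravirt machinery rewrites into a direct call at boot; roughly, as a sketch following the names quoted above:

#ifdef CONFIG_PARAVIRT_SPINLOCKS
static inline void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
        /* Patched to call either native_queue_spin_lock_slowpath() or the __pv_ one. */
        pv_queue_spin_lock_slowpath(lock, val);
}

static inline void queue_spin_unlock(struct qspinlock *lock)
{
        /* Likewise patched: a plain byte store on native, __pv_queue_spin_unlock() on PV. */
        pv_queue_spin_unlock(lock);
}
#else
static inline void queue_spin_unlock(struct qspinlock *lock)
{
        smp_store_release((u8 *)lock, 0);
}
#endif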
On 03/16/2015 09:16 AM, Peter Zijlstra wrote:
Hi Waiman,
As promised; here is the paravirt stuff I did during the trip to BOS last week.
All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).
The paravirt stuff is 'simple
probably caused
by the fact that contended qspinlock produces much less cacheline
contention than contended ticket spinlock and the test system is an
8-socket server.
Signed-off-by: Waiman Long
---
arch/x86/kernel/kvm.c | 143 -
kernel/Kconfig.locks |