On Thu, 23 Apr 2015 21:27:19 +0200
Thomas Huth wrote:
> On Thu, 23 Apr 2015 17:29:06 +0200
> Greg Kurz wrote:
>
> > The current memory accessors logic is:
> > - little endian if little_endian
> > - native endian (i.e. no byteswap) if !little_endian
> >
> > If we want to fully support cross-end
On Thu, 23 Apr 2015 17:29:42 +0200
Greg Kurz wrote:
> This patch brings cross-endian support to vhost when used to implement
> legacy virtio devices. Since it is a relatively rare situation, the
> feature availability is controlled by a kernel config option (not set
> by default).
>
> The vq->is
On 24/04/2015 03:16, Zhang, Yang Z wrote:
>> This is interesting since previous measurements on KVM have had
>> the exact opposite results. I think we need to understand this a
>> lot more.
>
> What I can tell is that vmexit is heavy. So it is reasonable to see
> the improvement under some case
Paolo Bonzini wrote on 2015-04-24:
>
>
> On 24/04/2015 03:16, Zhang, Yang Z wrote:
>>> This is interesting since previous measurements on KVM have had the
>>> exact opposite results. I think we need to understand this a lot
>>> more.
>>
>> What I can tell is that vmexit is heavy. So it is reaso
On 24/04/2015 09:46, Zhang, Yang Z wrote:
> > On the other hand vmexit is lighter and lighter on newer processors; a
> > Sandy Bridge has less than half the vmexit cost of a Core 2 (IIRC 1000
> > vs. 2500 clock cycles approximately).
>
> 1000 cycles? I remember it takes about 4000 cycle even in
On Fri, 24 Apr 2015 09:04:21 +0200
Cornelia Huck wrote:
> On Thu, 23 Apr 2015 21:27:19 +0200
> Thomas Huth wrote:
>
Thomas's e-mail did not make it to my mailbox... weird. :-\
> > On Thu, 23 Apr 2015 17:29:06 +0200
> > Greg Kurz wrote:
> >
> > > The current memory accessors logic is:
> > > -
On Fri, 24 Apr 2015 09:19:26 +0200
Cornelia Huck wrote:
> On Thu, 23 Apr 2015 17:29:42 +0200
> Greg Kurz wrote:
>
> > This patch brings cross-endian support to vhost when used to implement
> > legacy virtio devices. Since it is a relatively rare situation, the
> > feature availability is contro
On Fri, 24 Apr 2015 10:06:19 +0200
Greg Kurz wrote:
> On Fri, 24 Apr 2015 09:19:26 +0200
> Cornelia Huck wrote:
>
> > On Thu, 23 Apr 2015 17:29:42 +0200
> > Greg Kurz wrote:
> > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > index bb6a5b4..b980b53 100644
> > > --
Paolo Bonzini wrote on 2015-04-24:
>
>
> On 24/04/2015 09:46, Zhang, Yang Z wrote:
>>> On the other hand vmexit is lighter and lighter on newer
>>> processors; a Sandy Bridge has less than half the vmexit cost of a
>>> Core 2 (IIRC
>>> 1000 vs. 2500 clock cycles approximately).
>>
>> 1000 cycles
A couple of small fixes for accessing 32-bit KVM registers on big
endian, and to sign extend struct kvm_regs registers so as to work on
MIPS64 hosts.
James Hogan (2):
mips/kvm: Fix Big endian 32-bit register access
mips/kvm: Sign extend registers written to KVM
target-mips/kvm.c | 21 +++
Fix access to 32-bit registers on big endian targets. The pointer passed
to the kernel must be for the actual 32-bit value, not a temporary
64-bit value, otherwise on big endian systems the kernel will only
interpret the upper half.
Signed-off-by: James Hogan
Cc: Paolo Bonzini
Cc: Leon Alrae
Cc
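A minimal sketch of the access pattern being fixed, with a hypothetical helper name; the actual change lives in target-mips/kvm.c per the diffstat above:

    static int kvm_mips_put_one_reg32(int vcpu_fd, uint64_t id, uint32_t val)
    {
        /* The kernel dereferences .addr with the register's real width.
         * On big-endian hosts, pointing it at a 64-bit temporary would
         * make the kernel see only the upper half, so hand it a pointer
         * to an actual 32-bit object. */
        struct kvm_one_reg reg = {
            .id   = id,
            .addr = (uintptr_t)&val,
        };

        return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
    }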
In case we're running on a 64-bit host, be sure to sign extend the
general purpose registers and hi/lo/pc before writing them to KVM, so as
to take advantage of MIPS32/MIPS64 compatibility.
Signed-off-by: James Hogan
Cc: Paolo Bonzini
Cc: Leon Alrae
Cc: Aurelien Jarno
Cc: kvm@vger.kernel.org
C
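A minimal sketch of the widening, assuming the target-mips env->active_tc.gpr layout; the double cast performs the sign extension:

    /* MIPS64 keeps 32-bit values sign-extended in 64-bit registers, so
     * widen each GPR through int32_t instead of copying the raw word. */
    regs.gpr[i] = (int64_t)(int32_t)env->active_tc.gpr[i];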
Signed-off-by: Greg Kurz
---
drivers/vhost/vhost.h |   17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8c1c792..6a49960 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -173,34 +173,39 @@ sta
The current memory accessors logic is:
- little endian if little_endian
- native endian (i.e. no byteswap) if !little_endian
If we want to fully support cross-endian vhost, we also need to be
able to convert to big endian.
Instead of changing the little_endian argument to some 3-value enum, this
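A minimal sketch of the accessor shape this description implies (not the exact patch text); native-endian callers keep their no-byteswap behaviour by passing "true on LE hosts, false on BE hosts":

    static inline u16 __virtio16_to_cpu(bool little_endian, __virtio16 val)
    {
        if (little_endian)
            return le16_to_cpu((__force __le16)val);
        else
            return be16_to_cpu((__force __be16)val);  /* new: BE, not native */
    }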
Signed-off-by: Greg Kurz
---
drivers/net/macvtap.c |    9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 27ecc5c..a2f2958 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -49,14 +49,19 @@ struct macvtap
Signed-off-by: Greg Kurz
---
include/linux/vringh.h |   17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index a3fa537..3ed62ef 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -226,33 +226,38 @
Signed-off-by: Greg Kurz
---
drivers/net/tun.c |    9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 857dca4..3c3d6c0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -206,14 +206,19 @@ struct tun_struct {
u32
This patch brings cross-endian support to vhost when used to implement
legacy virtio devices. Since it is a relatively rare situation, the
feature availability is controlled by a kernel config option (not set
by default).
The vq->is_le boolean field is added to cache the endianness to be
used for
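A hedged sketch of how the cached flag can be computed; vq->user_be and the helper name are assumptions beyond this excerpt:

    /* virtio 1.0 is always little-endian; legacy devices follow the
     * (possibly user-overridden) legacy endianness cached in user_be. */
    static void vhost_init_is_le(struct vhost_virtqueue *vq)
    {
        vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) || !vq->user_be;
    }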
Signed-off-by: Greg Kurz
---
include/linux/virtio_config.h |   17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index ca3ed78..bd1a582 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/v
Only cosmetic and documentation changes since v5.
---
Greg Kurz (8):
virtio: introduce virtio_is_little_endian() helper
tun: add tun_is_little_endian() helper
macvtap: introduce macvtap_is_little_endian() helper
vringh: introduce vringh_is_little_endian() helper
vhos
The VNET_LE flag was introduced to fix accesses to virtio 1.0 headers
that are always little-endian. It can also be used to handle the special
case of a legacy little-endian device implemented by a big-endian host.
Let's add a flag and ioctls for big-endian devices as well. If both flags
are set,
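A minimal sketch of how such a helper can combine the two flags, assuming the LE flag takes precedence (virtio 1.0 headers are always little-endian); names follow the series titles above:

    static inline bool tun_is_little_endian(struct tun_struct *tun)
    {
        if (tun->flags & TUN_VNET_LE)    /* virtio 1.0, or legacy LE on a BE host */
            return true;
        if (tun->flags & TUN_VNET_BE)    /* legacy BE device on an LE host */
            return false;
        return virtio_legacy_is_little_endian();    /* plain legacy: native */
    }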
On Fri, Apr 24, 2015 at 02:24:15PM +0200, Greg Kurz wrote:
> Only cosmetic and documentation changes since v5.
>
> ---
Looks sane to me. I plan to review and apply next week.
> Greg Kurz (8):
> virtio: introduce virtio_is_little_endian() helper
> tun: add tun_is_little_endian() helpe
Hi
Please, send any topic that you are interested in covering.
Call details:
By popular demand, a google calendar public entry with it
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
(Let me know if you have any problems wi
In the pv_scan_next() function, the slow cmpxchg atomic operation is
performed even if the other CPU is not even close to being halted. This
extra cmpxchg can harm slowpath performance.
This patch introduces the new mayhalt flag to indicate if the other
spinning CPU is close to being halted or not
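A hedged sketch of the check this adds; the pv_node fields and state values are assumptions based on the rest of the series:

    /* Only pay for the cmpxchg when the spinning CPU has announced via
     * mayhalt that it is about to halt; otherwise skip the atomic. */
    if (READ_ONCE(pn->mayhalt))
        cmpxchg(&pn->state, vcpu_halted, vcpu_hashed);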
Before this patch, a CPU may have been kicked twice before getting
the lock - one before it becomes queue head and once before it gets
the lock. All these CPU kicking and halting (VMEXIT) can be expensive
and slow down system performance, especially in an overcommitted guest.
This patch adds a new
This patch enables the accumulation of PV qspinlock statistics
when either one of the following three sets of CONFIG parameters
are enabled:
1) CONFIG_LOCK_STAT && CONFIG_DEBUG_FS
2) CONFIG_KVM_DEBUG_FS
3) CONFIG_XEN_DEBUG_FS
The accumulated lock statistics will be reported in debugfs under th
From: Peter Zijlstra (Intel)
We use the regular paravirt call patching to switch between:
native_queue_spin_lock_slowpath()    __pv_queue_spin_lock_slowpath()
native_queue_spin_unlock()           __pv_queue_spin_unlock()
We use a callee saved call for the unlock function which reduces the
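A hedged sketch of the indirection being patched, reusing the function names quoted above; pv_queue_spin_unlock stands for the patched call site:

    /* Patched at boot: bare metal gets native_queue_spin_unlock(), PV
     * guests get __pv_queue_spin_unlock(). The callee-saved convention
     * keeps the unlock fast path from spilling caller-saved registers. */
    static __always_inline void queue_spin_unlock(struct qspinlock *lock)
    {
        pv_queue_spin_unlock(lock);
    }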
From: David Vrabel
This patch adds the necessary Xen specific code to allow Xen to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
Signed-off-by: David Vrabel
Signed-off-by: Waiman Long
---
arch/x86/xen/spinlock.c | 64 +++
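A hedged sketch of the two hooks Xen provides, modelled on the existing Xen pvticket IPI code; details such as the uninitialized-IRQ check are elided:

    static void xen_qlock_kick(int cpu)
    {
        /* Wake the halted vCPU with the spinlock wakeup IPI. */
        xen_send_IPI_one(cpu, XEN_SPINLOCK_VECTOR);
    }

    static void xen_qlock_wait(u8 *byte, u8 val)
    {
        int irq = __this_cpu_read(lock_kicker_irq);

        xen_clear_irq_pending(irq);
        if (READ_ONCE(*byte) == val)    /* re-check to close the wakeup race */
            xen_poll_irq(irq);          /* block until kicked */
    }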
Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head doesn't need to reset the tail code. For that case,
a simple write to set the lock bit is enough as the queue head will
be the only one eligible to
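A minimal sketch of the resulting unconditional store, assuming the generic qspinlock word layout:

    /* With more than one waiter queued, the tail cannot point at us, so
     * the queue head is the only CPU that may set the locked byte: a
     * plain store replaces the atomic_cmpxchg(). */
    static __always_inline void set_locked(struct qspinlock *lock)
    {
        struct __qspinlock *l = (void *)lock;

        WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
    }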
This patch adds the necessary KVM specific code to allow KVM to
support the CPU halting and kicking operations needed by the queue
spinlock PV code.
Signed-off-by: Waiman Long
---
arch/x86/kernel/kvm.c | 43 +++
kernel/Kconfig.locks |    2 +-
2 files c
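A hedged sketch of the kick side, reusing the KVM PV unhalt hypercall that the pvticket code already relies on:

    static void kvm_kick_cpu(int cpu)
    {
        unsigned long flags = 0;
        u32 apicid = per_cpu(x86_cpu_to_apicid, cpu);

        /* Ask the host to wake the halted vCPU. */
        kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
    }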
From: Peter Zijlstra (Intel)
When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.
By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxch
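A minimal sketch of the 16-bit tail exchange this makes possible; the byte/short overlay of the lock word is the assumption here:

    /* The tail (CPU index + node index) now fits in 16 bits, so the
     * enqueue becomes one unconditional xchg instead of a cmpxchg loop. */
    static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
    {
        struct __qspinlock *l = (void *)lock;

        return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
    }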
Provide a separate (second) version of the spin_lock_slowpath for
paravirt along with a special unlock path.
The second slowpath is generated by adding a few pv hooks to the
normal slowpath, but where those will compile away for the native
case, they expand into special wait/wake code for the pv v
This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.
1) the logic for the exchange of new and previous tail code words
into a new xchg_tail() function.
2) the logic for clearing the pending bit and setting the loc
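A minimal sketch of the second extracted helper, matching the generic qspinlock bit layout (xchg_tail() is sketched under the NR_CPUS entry above):

    /* pending -> locked handoff: clear our pending bit and grab the lock
     * in a single atomic step. */
    static __always_inline void clear_pending_set_locked(struct qspinlock *lock)
    {
        atomic_add(-_Q_PENDING_VAL + _Q_LOCKED_VAL, &lock->val);
    }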
From: Peter Zijlstra (Intel)
When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Waiman Long
---
arch/x86/include/asm/qspinlock.h |
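A minimal sketch of the fallback, close to the shape of the x86 hook this patch adds; the exact hook name is an assumption:

    static inline bool virt_queue_spin_lock(struct qspinlock *lock)
    {
        if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
            return false;

        /* Simple test-and-set: no queueing, so a preempted waiter cannot
         * stall every vCPU queued behind it. */
        while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
            cpu_relax();

        return true;
    }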
This patch introduces a new generic queue spinlock implementation that
can serve as an alternative to the default ticket spinlock. Compared
with the ticket spinlock, this queue spinlock should be almost as fair
as the ticket spinlock. It has about the same speed in single-thread
and it can be much
v15->v16:
- Remove the lfsr patch and use linear probing as lfsr is not really
necessary in most cases.
- Move the paravirt PV_CALLEE_SAVE_REGS_THUNK code to an asm header.
- Add a patch to collect PV qspinlock statistics which also
supersedes the PV lock hash debug patch.
- Add PV qspinl
This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queue spinlock for x86-64. As
x86-32 machines are typically not multi-socket, the benefit of queue
spinlock may not be apparent, so queue spinlock is not enabled there.
Currently, there is some incompatibi
From: Peter Zijlstra (Intel)
Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]); add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.
It is possible to observe the pending bit without the locked bit when
the last owner has ju
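A minimal sketch of the in-word wait this buys, using clear_pending_set_locked() from the preparatory-patch sketch above; the surrounding slowpath is elided:

    /* The single in-word spinner waits on the lock word itself and never
     * touches the per-cpu mcs_nodes[] cacheline. */
    while (atomic_read(&lock->val) & _Q_LOCKED_MASK)
        cpu_relax();

    clear_pending_set_locked(lock);    /* pending -> locked handoff */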
On Thu, Apr 23, 2015 at 01:46:55PM +0200, Paolo Bonzini wrote:
> From: Radim Krčmář
>
> The kvmclock spec says that the host will increment a version field to
> an odd number, then update stuff, then increment it to an even number.
> The host is buggy and doesn't do this, and the result is observ
Ping, Paolo, did this slip through the cracks?
Bandan Das writes:
> This extends the sanity checks done on known common Qemu binary
> paths when the user supplies a QEMU= on the command line
>
> Fixes: b895b967db94937d5b593c51b95eb32d2889a764
>
> Signed-off-by: Bandan Das
> ---
> x86/run | 4
Somehow these GPUs manage not to respond to a PCI bus reset, removing
our primary mechanism for resetting graphics cards. The result is
that these devices typically work well for a single VM boot. If the
VM is rebooted or restarted, the guest driver is not able to init the
card from the dirty sta
Drop unnecessary rdtsc_barrier(), as has been determined empirically,
see 057e6a8c660e95c3f4e7162e00e2fee1fc90c50d for details.
Noticed by Andy Lutomirski.
Improves clock_gettime() by approximately 15% on
Intel i7-3520M @ 2.90GHz.
Signed-off-by: Marcelo Tosatti
diff --git a/arch/x86/include/
On 04/24/2015 09:36 PM, Marcelo Tosatti wrote:
>
> Drop unnecessary rdtsc_barrier(), as has been determined empirically,
> see 057e6a8c660e95c3f4e7162e00e2fee1fc90c50d for details.
>
> Noticed by Andy Lutomirski.
>
> Improves clock_gettime() by approximately 15% on
> Intel i7-3520M @ 2.90GHz.
>