Re: [Qemu-devel] [PATCH V8 0/5] VMXNET3 paravirtual NIC device implementation

2012-12-26 Thread Dmitry Fleytman
Hi Everyone

Is there any progress with these patches?
Is there a chance they will be committed any time soon?

Thanks,
Dmitry


On Fri, Dec 7, 2012 at 1:15 PM, Dmitry Fleytman  wrote:

> This set of patches implements VMWare VMXNET3 paravirtual NIC device.
> The device supports of all the device features including offload
> capabilties,
> VLANs and etc.
> The device is tested on different OSes:
> Fedora 15
> Ubuntu 10.4
> Centos 6.2
> Windows 2008R2
> Windows 2008 64bit
> Windows 2008 32bit
> Windows 2003 64bit
> Windows 2003 32bit
>
> Changes in V8:
>Reported-by: Stefan Hajnoczi 
>Issues reported by Stefan Hajnoczi reviewed and mostly fixed:
>
> > +}
> > +curr_src_off += src[i].iov_len;
> > +}
> > +return j;
> > +}
>
> The existing iov_copy() function provides equivalent functionality.  I
> don't think iov_rebuild() is needed.
>
> [DF] Done. Thanks, missed it.
>
>
> > +size -= len;
> > +}
> > +iovec_off += iov[i].iov_len;
> > +}
> > +return res;
> > +}
>
> Rename this net_checksum_add_iov() and place it in net/checksum.c,
> then the new dependency on net from block can be dropped.
>
> [DF] Done.
>
> > +vmw_shmem_read(hwaddr addr, void *buf, int len)
> >  {
> >  VMW_SHPRN("SHMEM r: %" PRIx64 ", len: %d to %p", addr, len, buf);
> >  cpu_physical_memory_read(addr, buf, len);
> >  }
>
> All changes to this file should be squashed with the previous patch.
>
> [DF] Done
>
> > +#ifdef VMXNET_DEBUG_SHMEM_ACCESS
> > +#define VMW_SHPRN(fmt, ...)
> >  \
> > +do {
> >  \
> > +printf("[%s][SH][%s]: " fmt "\n", VMXNET_DEVICE_NAME, __func__,
> >  \
> > +## __VA_ARGS__);
> >  \
> > +} while (0)
> > +#else
> > +#define VMW_SHPRN(fmt, ...) do {} while (0)
> > +#endif
>
> Please use QEMU tracing.  It eliminates all this boilerplate and
> conditional compilation.  Tracing can be enabled/disabled at runtime
> and works with SystemTap/DTrace.  See docs/tracing.txt.
>
> [DF] We'd like to stick with compile time logic in this case becase of 2
> reasons:
> [DF] 1. These printouts are intended for reverse engineering/development
> only and there is
> [DF]no need to enable them at run time
> [DF] 2. There is a big number of printouts, all driver-device
> communication is traced,
> [DF]they hit performance even on strongest x86 in case of run-time
> logic
>
>
> > +struct eth_header {
> > +uint8_t  h_dest[ETH_ALEN];   /* destination eth addr */
> > +uint8_t  h_source[ETH_ALEN]; /* source ether addr*/
> > +uint16_t h_proto;/* packet type ID field */
> > +};
>
> Looks like it's copy-pasted stuff from /usr/include/linux/if_*.h,
> /usr/include/netinet/*.h, and friends.  If the system-wide headers are
> included names will collide for some of the macros at least.
>
> Did you check if the slirp/ definitions can be reused?
>
> [DF] Yes, you are right. This is copy-pasted from different places.
> [DF] Slips definishing do not fully cover our needs.
>
>
> I'd rather we import network header definitions once in a generic
> place into the source tree.  That way vmxnet and other components
> don't need to redefine these structs.
>
> [DF] Exaclty! Our intention is to create generic header with network
> definitions and make everyone use it.
> [DF] We can move our header to some shared place if you want, however I'd
> do it in parallel with cleanup
> [DF] of similar definitions in existing code and this is a big change that
> os out of scope of these patches.
>
> > +
> >
> *===*/
>
> Is this huge comment box a sign that the code should be split into a
> foo_tx.c and an foo_rx.c file?
>
> [DF] As for me this file is not that big to be splitted (<800 lines),
> however I'll do this if you insist :)
>
> > +size_t vmxnet_tx_pkt_send(VmxnetTxPktH pkt, NetClientState *vc)
>
> 'vc' is an old name that was used for VLANClientState.  The struct has
> since been renamed to NetClientState and the rest of QEMU uses 'nc'
> instead of 'vc'.
>
> [DF] Fixed. Thanks.
>
> > +/* tx module context handle */
> > +typedef void *VmxnetTxPktH;
>
> Forward-declaring the struct is nicer:
>
> typedef struct VmxnetTxPkt VmxnetTxPkt;
>
> The definition of VmxnetTxPkt is still hidden from the caller but you
> avoid the void* and casting.  In vmxnet_pkt.c define using:
>
> struct VmxnetTxPkt {
> ...
> };
>
> [DF] Agree, fixed. Thanks.
>
>
> > diff --git a/hw/vmxnet_utils.h b/hw/vmxnet_utils.h
> > index 7fd9a01..fac3b7b 100644
> > --- a/hw/vmxnet_utils.h
> > +++ b/hw/vmxnet_utils.h
>
> Please squash these fixes into the previous patch.
>
>
> [DF] Oops, my bad. Fixed.
>
>
> > +uint8_t msix_used;
> > +/* Whether MSI support was installed successfully */
> > +uint8_t msi_used;
>
> These two fields should be bool.
>
> > +/* Whether automatic interrupts masking enabled */
> > +uint8_t auto_int_masking

[Qemu-devel] [PATCH] fix include file in hw/loader.h

2012-12-26 Thread Wenchao Xia
  Now header files have gone into includes, so include file path
needs change also to avoid build break.

Signed-off-by: Wenchao Xia 
---
 hw/loader.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/loader.h b/hw/loader.h
index 26480ad..5e61c95 100644
--- a/hw/loader.h
+++ b/hw/loader.h
@@ -1,5 +1,6 @@
 #ifndef LOADER_H
 #define LOADER_H
+#include "qapi/qmp/qdict.h"
 
 /* loader.c */
 int get_image_size(const char *filename);
@@ -30,7 +31,7 @@ int rom_load_all(void);
 void rom_set_fw(void *f);
 int rom_copy(uint8_t *dest, hwaddr addr, size_t size);
 void *rom_ptr(hwaddr addr);
-void do_info_roms(Monitor *mon);
+void do_info_roms(Monitor *mon, const QDict *qdict);
 
 #define rom_add_file_fixed(_f, _a, _i)  \
 rom_add_file(_f, NULL, _a, _i)
-- 
1.7.1





[Qemu-devel] [PULL] pci,virtio

2012-12-26 Thread Michael S. Tsirkin
Included here is v3 virtio typesafety change - no comments
were made to v3 - I made my best to address all comment
and got no response to v3 so I assume it's OK now.

There are more optimizations in my tree but
they are a bit more scary - I'll let them
stay there as I'll be away for a week.

The following changes since commit 27dd7730582be85c7d4f680f5f71146629809c86:

  Merge remote-tracking branch 'bonzini/header-dirs' into staging (2012-12-19 
17:15:39 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony

for you to fetch changes up to 89d62be9f4fb538db7f919a2be7df2544ffc02c5:

  virtio-pci: don't poll masked vectors (2012-12-26 11:49:29 +0200)


pci,virtio

This optimizes MSIX handling in virtio-pci.
Also included is pci express capability bugfix.

Signed-off-by: Michael S. Tsirkin 


Knut Omang (1):
  pcie: Fix bug in pcie_ext_cap_set_next

Michael S. Tsirkin (4):
  virtio: make bindings typesafe
  msi: add API to get notified about pending bit poll
  msix: expose access to masked/pending state
  virtio-pci: don't poll masked vectors

 hw/pci/msix.c|  19 +++--
 hw/pci/msix.h|   6 ++-
 hw/pci/pci.h |   4 ++
 hw/pci/pcie.c|   2 +-
 hw/s390-virtio-bus.c |  24 ---
 hw/vfio_pci.c|   2 +-
 hw/virtio-pci.c  | 112 +++
 hw/virtio.c  |   2 +-
 hw/virtio.h  |  26 ++--
 9 files changed, 136 insertions(+), 61 deletions(-)



[Qemu-devel] [PATCH 0/8] virtio-pci: msix masking optimizations

2012-12-26 Thread Michael S. Tsirkin
This patchset implements two msix masking optimizations.
It works fine for me but I did not have the time to do
performance testing yet so I do not know
whether it helps and which workloads.
Sending out now as I'll be on vacation for a week.
Please review and comment.

Thanks!

Michael S. Tsirkin (8):
  virtio: don't waste irqfds on control vqs
  msix: add api to access msix message
  kvm: add stub for update msi route
  virtio-pci: cache msix messages
  virtio: backend virtqueue notifier masking
  virtio-net: set/clear vhost_started in reverse order
  vhost: set started flag while start is in progress
  vhost: backend masking support

 hw/pci/msix.c   |   2 +-
 hw/pci/msix.h   |   1 +
 hw/vhost.c  | 112 ++-
 hw/vhost.h  |  10 +++
 hw/vhost_net.c  |  27 +++-
 hw/vhost_net.h  |   3 +
 hw/virtio-net.c |  22 +-
 hw/virtio-pci.c | 203 
 hw/virtio-pci.h |   2 +
 hw/virtio.h |  15 -
 kvm-stub.c  |   5 ++
 11 files changed, 350 insertions(+), 52 deletions(-)

-- 
MST



[Qemu-devel] [PATCH 2/8] msix: add api to access msix message

2012-12-26 Thread Michael S. Tsirkin
Will be used by virtio pci.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/msix.c | 2 +-
 hw/pci/msix.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 9eee657..e231a0d 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -27,7 +27,7 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
-static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
+MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 {
 uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
 MSIMessage msg;
diff --git a/hw/pci/msix.h b/hw/pci/msix.h
index d0c4429..e648410 100644
--- a/hw/pci/msix.h
+++ b/hw/pci/msix.h
@@ -5,6 +5,7 @@
 #include "hw/pci/pci.h"
 
 void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
+MSIMessage msix_get_message(PCIDevice *dev, unsigned int vector);
 int msix_init(PCIDevice *dev, unsigned short nentries,
   MemoryRegion *table_bar, uint8_t table_bar_nr,
   unsigned table_offset, MemoryRegion *pba_bar,
-- 
MST




[Qemu-devel] [PATCH 4/8] virtio-pci: cache msix messages

2012-12-26 Thread Michael S. Tsirkin
Some guests mask a vector then unmask without changing it.
Store vectors to avoid kvm system calls in this case.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-pci.c | 119 ++--
 hw/virtio-pci.h |   1 +
 2 files changed, 100 insertions(+), 20 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 3c258a4..1d77146 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -487,8 +487,6 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
*proxy,
 unsigned int vector,
 MSIMessage msg)
 {
-VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
-EventNotifier *n = virtio_queue_get_guest_notifier(vq);
 VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
 int ret;
 
@@ -500,18 +498,94 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy 
*proxy,
 irqfd->virq = ret;
 }
 irqfd->users++;
+return 0;
+}
 
-ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd->virq);
-if (ret < 0) {
-if (--irqfd->users == 0) {
-kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
+ unsigned int vector)
+{
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+if (--irqfd->users == 0) {
+kvm_irqchip_release_virq(kvm_state, irqfd->virq);
+}
+}
+
+static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
+{
+PCIDevice *dev = &proxy->pci_dev;
+VirtIODevice *vdev = proxy->vdev;
+unsigned int vector;
+int ret, queue_no;
+MSIMessage msg;
+
+for (queue_no = 0; queue_no < nvqs; queue_no++) {
+if (!virtio_queue_get_num(vdev, queue_no)) {
+break;
+}
+vector = virtio_queue_vector(vdev, queue_no);
+if (vector >= msix_nr_vectors_allocated(dev)) {
+continue;
+}
+msg = msix_get_message(dev, vector);
+ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector, msg);
+if (ret < 0) {
+goto undo;
 }
-return ret;
 }
 return 0;
+
+undo:
+while (--queue_no >= 0) {
+vector = virtio_queue_vector(vdev, queue_no);
+if (vector >= msix_nr_vectors_allocated(dev)) {
+continue;
+}
+kvm_virtio_pci_vq_vector_release(proxy, vector);
+}
+return ret;
 }
 
-static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
+static void kvm_virtio_pci_vector_release(VirtIOPCIProxy *proxy, int nvqs)
+{
+PCIDevice *dev = &proxy->pci_dev;
+VirtIODevice *vdev = proxy->vdev;
+unsigned int vector;
+int queue_no;
+
+for (queue_no = 0; queue_no < nvqs; queue_no++) {
+if (!virtio_queue_get_num(vdev, queue_no)) {
+break;
+}
+vector = virtio_queue_vector(vdev, queue_no);
+if (vector >= msix_nr_vectors_allocated(dev)) {
+continue;
+}
+kvm_virtio_pci_vq_vector_release(proxy, vector);
+}
+}
+
+static int kvm_virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
+unsigned int queue_no,
+unsigned int vector,
+MSIMessage msg)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
+EventNotifier *n = virtio_queue_get_guest_notifier(vq);
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+int ret;
+
+if (irqfd->msg.data != msg.data || irqfd->msg.address != msg.address) {
+ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, msg);
+if (ret < 0) {
+return ret;
+}
+}
+
+ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd->virq);
+return ret;
+}
+
+static void kvm_virtio_pci_vq_vector_mask(VirtIOPCIProxy *proxy,
  unsigned int queue_no,
  unsigned int vector)
 {
@@ -522,13 +596,9 @@ static void 
kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
 
 ret = kvm_irqchip_remove_irqfd_notifier(kvm_state, n, irqfd->virq);
 assert(ret == 0);
-
-if (--irqfd->users == 0) {
-kvm_irqchip_release_virq(kvm_state, irqfd->virq);
-}
 }
 
-static int kvm_virtio_pci_vector_use(PCIDevice *dev, unsigned vector,
+static int kvm_virtio_pci_vector_unmask(PCIDevice *dev, unsigned vector,
  MSIMessage msg)
 {
 VirtIOPCIProxy *proxy = container_of(dev, VirtIOPCIProxy, pci_dev);
@@ -542,7 +612,7 @@ static int kvm_virtio_pci_vector_use(PCIDevice *dev, 
unsigned vector,
 if (virtio_queue_vector(vdev, queue_no) != vector) {
 continue;
 }
-ret = kvm_virtio_pci_vq_vector_use(proxy, queue_no, vector, msg);
+ret = kvm_virtio_pci_vq_vector_unmask(proxy, queue_no, vector, msg);
   

[Qemu-devel] [PATCH 8/8] vhost: backend masking support

2012-12-26 Thread Michael S. Tsirkin
Support backend guest notifier masking in vhost-net:
create eventfd at device init, when masked,
make vhost use that as eventfd instead of
sending an interrupt.

Signed-off-by: Michael S. Tsirkin 
---
 hw/vhost.c  | 95 +
 hw/vhost.h  | 10 ++
 hw/vhost_net.c  | 27 ++--
 hw/vhost_net.h  |  3 ++
 hw/virtio-net.c | 18 +++
 5 files changed, 137 insertions(+), 16 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 4fa5007..cee8aad 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -612,7 +612,7 @@ static void vhost_log_stop(MemoryListener *listener,
 /* FIXME: implement */
 }
 
-static int vhost_virtqueue_init(struct vhost_dev *dev,
+static int vhost_virtqueue_start(struct vhost_dev *dev,
 struct VirtIODevice *vdev,
 struct vhost_virtqueue *vq,
 unsigned idx)
@@ -681,16 +681,11 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 goto fail_kick;
 }
 
-file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
-r = ioctl(dev->control, VHOST_SET_VRING_CALL, &file);
-if (r) {
-r = -errno;
-goto fail_call;
-}
+/* Clear and discard previous events if any. */
+event_notifier_test_and_clear(&vq->masked_notifier);
 
 return 0;
 
-fail_call:
 fail_kick:
 fail_alloc:
 cpu_physical_memory_unmap(vq->ring, virtio_queue_get_ring_size(vdev, idx),
@@ -708,7 +703,7 @@ fail_alloc_desc:
 return r;
 }
 
-static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
+static void vhost_virtqueue_stop(struct vhost_dev *dev,
 struct VirtIODevice *vdev,
 struct vhost_virtqueue *vq,
 unsigned idx)
@@ -746,11 +741,39 @@ static void vhost_eventfd_del(MemoryListener *listener,
 {
 }
 
+static int vhost_virtqueue_init(struct vhost_dev *dev,
+struct vhost_virtqueue *vq, int n)
+{
+struct vhost_vring_file file = {
+.index = n,
+};
+int r = event_notifier_init(&vq->masked_notifier, 0);
+if (r < 0) {
+return r;
+}
+
+file.fd = event_notifier_get_fd(&vq->masked_notifier);
+r = ioctl(dev->control, VHOST_SET_VRING_CALL, &file);
+if (r) {
+r = -errno;
+goto fail_call;
+}
+return 0;
+fail_call:
+event_notifier_cleanup(&vq->masked_notifier);
+return r;
+}
+
+static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
+{
+event_notifier_cleanup(&vq->masked_notifier);
+}
+
 int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
bool force)
 {
 uint64_t features;
-int r;
+int i, r;
 if (devfd >= 0) {
 hdev->control = devfd;
 } else {
@@ -768,6 +791,13 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd, 
const char *devpath,
 if (r < 0) {
 goto fail;
 }
+
+for (i = 0; i < hdev->nvqs; ++i) {
+r = vhost_virtqueue_init(hdev, hdev->vqs + i, i);
+if (r < 0) {
+goto fail_vq;
+}
+}
 hdev->features = features;
 
 hdev->memory_listener = (MemoryListener) {
@@ -795,6 +825,10 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd, 
const char *devpath,
 memory_listener_register(&hdev->memory_listener, &address_space_memory);
 hdev->force = force;
 return 0;
+fail_vq:
+while (--i >= 0) {
+vhost_virtqueue_cleanup(hdev->vqs + i);
+}
 fail:
 r = -errno;
 close(hdev->control);
@@ -803,6 +837,10 @@ fail:
 
 void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
+int i;
+for (i = 0; i < hdev->nvqs; ++i) {
+vhost_virtqueue_cleanup(hdev->vqs + i);
+}
 memory_listener_unregister(&hdev->memory_listener);
 g_free(hdev->mem);
 g_free(hdev->mem_sections);
@@ -869,6 +907,37 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 }
 }
 
+/* Test and clear event pending status.
+ * Should be called after unmask to avoid losing events.
+ */
+bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
+{
+struct vhost_virtqueue *vq = hdev->vqs + n;
+assert(hdev->started);
+return event_notifier_test_and_clear(&vq->masked_notifier);
+}
+
+/* Mask/unmask events from this vq. */
+void vhost_virtqueue_mask(struct vhost_dev *hdev, VirtIODevice *vdev, int n,
+ bool mask)
+{
+struct VirtQueue *vvq = virtio_get_queue(vdev, n);
+int r;
+
+assert(hdev->started);
+
+struct vhost_vring_file file = {
+.index = n,
+};
+if (mask) {
+file.fd = event_notifier_get_fd(&hdev->vqs[n].masked_notifier);
+} else {
+file.fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vvq));
+}
+r = ioctl(hdev->control, VHOST_SET_VRING_CALL, &file);
+assert(r >= 0);
+}
+
 /* Host notifiers must be enabl

[Qemu-devel] [PATCH 7/8] vhost: set started flag while start is in progress

2012-12-26 Thread Michael S. Tsirkin
This makes it possible to use started flag for sanity checking
of callbacks that happen during start/stop.

Signed-off-by: Michael S. Tsirkin 
---
 hw/vhost.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index b6d73ca..4fa5007 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -873,6 +873,9 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
 int i, r;
+
+hdev->started = true;
+
 if (!vdev->binding->set_guest_notifiers) {
 fprintf(stderr, "binding does not support guest notifiers\n");
 r = -ENOSYS;
@@ -918,8 +921,6 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
 }
 }
 
-hdev->started = true;
-
 return 0;
 fail_log:
 fail_vq:
@@ -934,6 +935,8 @@ fail_features:
 vdev->binding->set_guest_notifiers(vdev->binding_opaque, hdev->nvqs, 
false);
 fail_notifiers:
 fail:
+
+hdev->started = false;
 return r;
 }
 
-- 
MST




[Qemu-devel] [PATCH 5/8] virtio: backend virtqueue notifier masking

2012-12-26 Thread Michael S. Tsirkin
some backends (notably vhost) can mask events
at their source in a way that is more efficient
than masking through kvm.
Specifically
- masking in kvm uses rcu write side so it has high latency
- in kvm on unmask we always send an interrupt
masking at source does not have these issues.

Add such support in virtio.h and use in virtio-pci.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-pci.c | 79 ++---
 hw/virtio.h | 13 ++
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 1d77146..aad9d6d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -510,6 +510,31 @@ static void 
kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy,
 }
 }
 
+static int kvm_virtio_pci_irqfd_use(VirtIOPCIProxy *proxy,
+ unsigned int queue_no,
+ unsigned int vector)
+{
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
+EventNotifier *n = virtio_queue_get_guest_notifier(vq);
+int ret;
+ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd->virq);
+return ret;
+}
+
+static void kvm_virtio_pci_irqfd_release(VirtIOPCIProxy *proxy,
+  unsigned int queue_no,
+  unsigned int vector)
+{
+VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
+EventNotifier *n = virtio_queue_get_guest_notifier(vq);
+VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
+int ret;
+
+ret = kvm_irqchip_remove_irqfd_notifier(kvm_state, n, irqfd->virq);
+assert(ret == 0);
+}
+
 static int kvm_virtio_pci_vector_use(VirtIOPCIProxy *proxy, int nvqs)
 {
 PCIDevice *dev = &proxy->pci_dev;
@@ -531,6 +556,16 @@ static int kvm_virtio_pci_vector_use(VirtIOPCIProxy 
*proxy, int nvqs)
 if (ret < 0) {
 goto undo;
 }
+/* If guest supports masking, set up irqfd now.
+ * Otherwise, delay until unmasked in the frontend.
+ */
+if (proxy->vdev->guest_notifier_mask) {
+ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
+if (ret < 0) {
+kvm_virtio_pci_vq_vector_release(proxy, vector);
+goto undo;
+}
+}
 }
 return 0;
 
@@ -540,6 +575,9 @@ undo:
 if (vector >= msix_nr_vectors_allocated(dev)) {
 continue;
 }
+if (proxy->vdev->guest_notifier_mask) {
+kvm_virtio_pci_irqfd_release(proxy, vector, queue_no);
+}
 kvm_virtio_pci_vq_vector_release(proxy, vector);
 }
 return ret;
@@ -560,6 +598,12 @@ static void kvm_virtio_pci_vector_release(VirtIOPCIProxy 
*proxy, int nvqs)
 if (vector >= msix_nr_vectors_allocated(dev)) {
 continue;
 }
+/* If guest supports masking, clean up irqfd now.
+ * Otherwise, it was cleaned when masked in the frontend.
+ */
+if (proxy->vdev->guest_notifier_mask) {
+kvm_virtio_pci_irqfd_release(proxy, vector, queue_no);
+}
 kvm_virtio_pci_vq_vector_release(proxy, vector);
 }
 }
@@ -581,7 +625,19 @@ static int kvm_virtio_pci_vq_vector_unmask(VirtIOPCIProxy 
*proxy,
 }
 }
 
-ret = kvm_irqchip_add_irqfd_notifier(kvm_state, n, irqfd->virq);
+/* If guest supports masking, irqfd is already setup, unmask it.
+ * Otherwise, set it up now.
+ */
+if (proxy->vdev->guest_notifier_mask) {
+proxy->vdev->guest_notifier_mask(proxy->vdev, queue_no, false);
+/* Test after unmasking to avoid losing events. */
+if (proxy->vdev->guest_notifier_pending &&
+proxy->vdev->guest_notifier_pending(proxy->vdev, queue_no)) {
+event_notifier_set(n);
+}
+} else {
+ret = kvm_virtio_pci_irqfd_use(proxy, queue_no, vector);
+}
 return ret;
 }
 
@@ -589,13 +645,14 @@ static void kvm_virtio_pci_vq_vector_mask(VirtIOPCIProxy 
*proxy,
  unsigned int queue_no,
  unsigned int vector)
 {
-VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
-EventNotifier *n = virtio_queue_get_guest_notifier(vq);
-VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector];
-int ret;
-
-ret = kvm_irqchip_remove_irqfd_notifier(kvm_state, n, irqfd->virq);
-assert(ret == 0);
+/* If guest supports masking, keep irqfd but mask it.
+ * Otherwise, clean it up now.
+ */ 
+if (proxy->vdev->guest_notifier_mask) {
+proxy->vdev->guest_notifier_mask(proxy->vdev, queue_no, true);
+} else {
+kvm_virtio_pci_irqfd_release(proxy, vector, queue_no);
+}
 }
 
 static int kvm_virtio_pci_vector_unmask(PCIDevice *dev, unsigned vector,
@@ -668,7 +725,11 @@ static void kvm_virtio_pci_vector_poll(PCIDevice *dev,
 }
   

[Qemu-devel] [PATCH 1/8] virtio: don't waste irqfds on control vqs

2012-12-26 Thread Michael S. Tsirkin
Pass nvqs to set_guest_notifiers. This makes it possible to
save on irqfds by not allocating one for the control vq
for virtio-net.

Signed-off-by: Michael S. Tsirkin 
---
 hw/vhost.c  | 10 +++---
 hw/virtio-pci.c | 19 ++-
 hw/virtio-pci.h |  1 +
 hw/virtio.h |  2 +-
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 4e1cb47..b6d73ca 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -879,7 +879,9 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice 
*vdev)
 goto fail;
 }
 
-r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
+r = vdev->binding->set_guest_notifiers(vdev->binding_opaque,
+   hdev->nvqs,
+   true);
 if (r < 0) {
 fprintf(stderr, "Error binding guest notifier: %d\n", -r);
 goto fail_notifiers;
@@ -929,7 +931,7 @@ fail_vq:
 }
 fail_mem:
 fail_features:
-vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+vdev->binding->set_guest_notifiers(vdev->binding_opaque, hdev->nvqs, 
false);
 fail_notifiers:
 fail:
 return r;
@@ -950,7 +952,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice 
*vdev)
 vhost_sync_dirty_bitmap(hdev, &hdev->mem_sections[i],
 0, (hwaddr)~0x0ull);
 }
-r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, false);
+r = vdev->binding->set_guest_notifiers(vdev->binding_opaque,
+   hdev->nvqs,
+   false);
 if (r < 0) {
 fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
 fflush(stderr);
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index af9a56c..3c258a4 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -535,7 +535,7 @@ static int kvm_virtio_pci_vector_use(PCIDevice *dev, 
unsigned vector,
 VirtIODevice *vdev = proxy->vdev;
 int ret, queue_no;
 
-for (queue_no = 0; queue_no < VIRTIO_PCI_QUEUE_MAX; queue_no++) {
+for (queue_no = 0; queue_no < proxy->nvqs_with_notifiers; queue_no++) {
 if (!virtio_queue_get_num(vdev, queue_no)) {
 break;
 }
@@ -565,7 +565,7 @@ static void kvm_virtio_pci_vector_release(PCIDevice *dev, 
unsigned vector)
 VirtIODevice *vdev = proxy->vdev;
 int queue_no;
 
-for (queue_no = 0; queue_no < VIRTIO_PCI_QUEUE_MAX; queue_no++) {
+for (queue_no = 0; queue_no < proxy->nvqs_with_notifiers; queue_no++) {
 if (!virtio_queue_get_num(vdev, queue_no)) {
 break;
 }
@@ -587,7 +587,7 @@ static void kvm_virtio_pci_vector_poll(PCIDevice *dev,
 EventNotifier *notifier;
 VirtQueue *vq;
 
-for (queue_no = 0; queue_no < VIRTIO_PCI_QUEUE_MAX; queue_no++) {
+for (queue_no = 0; queue_no < proxy->nvqs_with_notifiers; queue_no++) {
 if (!virtio_queue_get_num(vdev, queue_no)) {
 break;
 }
@@ -631,7 +631,7 @@ static bool virtio_pci_query_guest_notifiers(DeviceState *d)
 return msix_enabled(&proxy->pci_dev);
 }
 
-static int virtio_pci_set_guest_notifiers(DeviceState *d, bool assign)
+static int virtio_pci_set_guest_notifiers(DeviceState *d, int nvqs, bool 
assign)
 {
 VirtIOPCIProxy *proxy = to_virtio_pci_proxy(d);
 VirtIODevice *vdev = proxy->vdev;
@@ -639,6 +639,15 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, 
bool assign)
 bool with_irqfd = msix_enabled(&proxy->pci_dev) &&
 kvm_msi_via_irqfd_enabled();
 
+nvqs = MIN(nvqs, VIRTIO_PCI_QUEUE_MAX);
+
+/* When deassigning, pass a consistent nvqs value
+ * to avoid leaking notifiers.
+ */
+assert(assign || nvqs == proxy->nvqs_with_notifiers);
+
+proxy->nvqs_with_notifiers = nvqs;
+
 /* Must unset vector notifier while guest notifier is still assigned */
 if (proxy->vector_irqfd && !assign) {
 msix_unset_vector_notifiers(&proxy->pci_dev);
@@ -646,7 +655,7 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, 
bool assign)
 proxy->vector_irqfd = NULL;
 }
 
-for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) {
+for (n = 0; n < nvqs; n++) {
 if (!virtio_queue_get_num(vdev, n)) {
 break;
 }
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index b58d9a2..b0f17e2 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -51,6 +51,7 @@ typedef struct {
 bool ioeventfd_disabled;
 bool ioeventfd_started;
 VirtIOIRQFD *vector_irqfd;
+int nvqs_with_notifiers;
 } VirtIOPCIProxy;
 
 void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev);
diff --git a/hw/virtio.h b/hw/virtio.h
index 1dec9dc..329b426 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -99,7 +99,7 @@ typedef struct {
 int (*load_done)(DeviceState *d, QEMUFile *f);
 unsigned (*get_features)(DeviceState *d);
 bool (*query_guest_notifiers)(DeviceState *d);
-int (*set_guest_notifiers)(Device

[Qemu-devel] [PATCH 6/8] virtio-net: set/clear vhost_started in reverse order

2012-12-26 Thread Michael S. Tsirkin
As vhost started is cleared last thing on stop,
set it first things on start. This makes it
possible to use vhost_started while start is in
progress which is used by follow-up patches.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 5d03b31..b756d57 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -126,12 +126,12 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t 
status)
 if (!vhost_net_query(tap_get_vhost_net(n->nic->nc.peer), &n->vdev)) {
 return;
 }
+n->vhost_started = 1;
 r = vhost_net_start(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
 if (r < 0) {
 error_report("unable to start vhost net: %d: "
  "falling back on userspace virtio", -r);
-} else {
-n->vhost_started = 1;
+n->vhost_started = 0;
 }
 } else {
 vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
-- 
MST




[Qemu-devel] [PATCH 3/8] kvm: add stub for update msi route

2012-12-26 Thread Michael S. Tsirkin
Will be used by virtio-pci.

Signed-off-by: Michael S. Tsirkin 
---
 kvm-stub.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kvm-stub.c b/kvm-stub.c
index 5b97152..81f8967 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -131,6 +131,11 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 {
 }
 
+int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
+{
+return -ENOSYS;
+}
+
 int kvm_irqchip_add_irqfd_notifier(KVMState *s, EventNotifier *n, int virq)
 {
 return -ENOSYS;
-- 
MST




[Qemu-devel] Graphics performance

2012-12-26 Thread Erik Rull

Hi all,

which is the graphics emulation with the lowest CPU usage for 2D-only GUIs? 
(e.g. Win XP without Direct3D usage)? I just need to drive a virtual 
graphics display with 1024x768 (@16bit colors). At the moment I use the 
cirrus graphics card emulation. Is there something more efficient? 
Terminal/Console is either a real display or VNC - maybe for the two 
versions different adaptors bring the best performance for each of them?


Thanks.

Best regards,

Erik



Re: [Qemu-devel] [PATCH 6/6] usb-tablet: Allow connecting to ehci

2012-12-26 Thread Erik Rull

Hi Gerd, hi Hans,

is my assumption correct that if I check out and compile this version from 
GIT master that the usb-tablet device is automatically routed to ehci 
without changing anything else in the qemu call arguments? (And the 
performance enhancement takes place automatically)

If not - what has to be changed to get it working?

Best regards,

Erik



Gerd Hoffmann wrote:

From: Hans de Goede 

Our ehci code has is capable of significantly lowering the wakeup rate
for the hcd emulation while the device is idle. It is possible to add
similar code ot the uhci emulation, but that simply is not there atm,
and there is no reason why a (virtual) usb-tablet can not be a USB-2 device.

Making usb-hid devices connect to the emulated ehci controller instead
of the emulated uhci controller on vms which have both lowers the cpuload
for a fully idle vm from 20% to 2-3% (on my laptop).

An alternative implementation to using a property to select the tablet
type, would be simply making it a new device type, ie usb-tablet2, but the
downside of that is that this will require libvirt changes to be available
through libvirt at all, and then management tools changes to become the
default for new vms, where as using a property will automatically get
any pc-1.3 type vms the lower cpuload.

[ kraxel: adapt compat property for post-1.3 merge ]

Signed-off-by: Hans de Goede 
Signed-off-by: Gerd Hoffmann 

tablet compat fixup

Signed-off-by: Gerd Hoffmann 





Re: [Qemu-devel] Graphics performance

2012-12-26 Thread Dunrong Huang
On Wed, Dec 26, 2012 at 8:04 PM, Erik Rull  wrote:
> Hi all,
>
> which is the graphics emulation with the lowest CPU usage for 2D-only GUIs?
> (e.g. Win XP without Direct3D usage)? I just need to drive a virtual
> graphics display with 1024x768 (@16bit colors). At the moment I use the
> cirrus graphics card emulation. Is there something more efficient?
> Terminal/Console is either a real display or VNC - maybe for the two
> versions different adaptors bring the best performance for each of them?
>
you shoud try spice. spice depends on qxl video card which is a
paravirtual graphics card.
It means you must install qxl driver in your guest to make this card work.

More details, please refer:
http://www.linux-kvm.org/page/SPICE
> Thanks.
>
> Best regards,
>
> Erik
>



-- 
Best Regards,

Dunrong Huang



Re: [Qemu-devel] [PATCH 0/2 v2] introduce visitor for parsing suffixed integer

2012-12-26 Thread Eduardo Habkost
On Sun, Dec 23, 2012 at 02:34:21PM -0600, Anthony Liguori wrote:
> Igor Mammedov  writes:
> 
> > v2:
> >  * Naming changes:
> > - s/visit_type_uint_suffixed_int/visit_type_suffixed_int/
> > - use 'suffix_factor' instead of 'unit'
> >  * Added  documentation to visit_type_suffixed_int()
> >  * Fixed errp check.
> >  * Style fixes
> 
> This is not how visitors are supposed to be used.
> 
> Just treat tsc_freq as a string property and parse it in the setter.

So, why visit_type_size() does exist? Should we work to eliminate it? If
not, why is it different from a "frequency" type?

-- 
Eduardo



Re: [Qemu-devel] Graphics performance

2012-12-26 Thread Erik Rull

Hi,

thanks for your quick reply.

Dunrong Huang wrote:

On Wed, Dec 26, 2012 at 8:04 PM, Erik Rull  wrote:

Hi all,

which is the graphics emulation with the lowest CPU usage for 2D-only GUIs?
(e.g. Win XP without Direct3D usage)? I just need to drive a virtual
graphics display with 1024x768 (@16bit colors). At the moment I use the
cirrus graphics card emulation. Is there something more efficient?
Terminal/Console is either a real display or VNC - maybe for the two
versions different adaptors bring the best performance for each of them?


you shoud try spice. spice depends on qxl video card which is a
paravirtual graphics card.
It means you must install qxl driver in your guest to make this card work.

More details, please refer:
http://www.linux-kvm.org/page/SPICE


Hi just had a look at this page - but it looks as if I cannot use this for 
a real hardware display:

""
Client

To connect to a virtual machine using SPICE, you need a client application.
""

Is a hardware monitor capable to display the QXL format? And for my remote 
clients - can I still use VNC?
A change in the guest would be okay, but I still must be able to use the 
existing client environment...


Best regards,

Erik




[Qemu-devel] [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue

2012-12-26 Thread Gleb Natapov
The following changes since commit e376a788ae130454ad5e797f60cb70d0308babb6:

  Merge remote-tracking branch 'kwolf/for-anthony' into staging (2012-12-13 
14:32:28 -0600)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

for you to fetch changes up to 0a2a59d35cbabf63c91340a1c62038e3e60538c1:

  qemu-kvm/pci-assign: 64 bits bar emulation (2012-12-25 14:37:52 +0200)


Will Auld (1):
  target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

Xudong Hao (1):
  qemu-kvm/pci-assign: 64 bits bar emulation

 hw/kvm/pci-assign.c   |   14 ++
 target-i386/cpu.h |2 ++
 target-i386/kvm.c |   14 ++
 target-i386/machine.c |   21 +
 4 files changed, 47 insertions(+), 4 deletions(-)



[Qemu-devel] [PATCH 2/2] qemu-kvm/pci-assign: 64 bits bar emulation

2012-12-26 Thread Gleb Natapov
From: Xudong Hao 

Enable 64 bits bar emulation.

Test pass with the current seabios which already support 64bit pci bars.

Signed-off-by: Xudong Hao 
Reviewed-by: Alex Williamson 
Signed-off-by: Gleb Natapov 
---
 hw/kvm/pci-assign.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index e80dad0..addc205 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -46,6 +46,7 @@
 #define IORESOURCE_IRQ  0x0400
 #define IORESOURCE_DMA  0x0800
 #define IORESOURCE_PREFETCH 0x2000  /* No side effects */
+#define IORESOURCE_MEM_64   0x0010
 
 //#define DEVICE_ASSIGNMENT_DEBUG
 
@@ -442,9 +443,13 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 
 /* handle memory io regions */
 if (cur_region->type & IORESOURCE_MEM) {
-int t = cur_region->type & IORESOURCE_PREFETCH
-? PCI_BASE_ADDRESS_MEM_PREFETCH
-: PCI_BASE_ADDRESS_SPACE_MEMORY;
+int t = PCI_BASE_ADDRESS_SPACE_MEMORY;
+if (cur_region->type & IORESOURCE_PREFETCH) {
+t |= PCI_BASE_ADDRESS_MEM_PREFETCH;
+}
+if (cur_region->type & IORESOURCE_MEM_64) {
+t |= PCI_BASE_ADDRESS_MEM_TYPE_64;
+}
 
 /* map physical memory */
 pci_dev->v_addrs[i].u.r_virtbase = mmap(NULL, cur_region->size,
@@ -632,7 +637,8 @@ again:
 rp->valid = 0;
 rp->resource_fd = -1;
 size = end - start + 1;
-flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH;
+flags &= IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH
+ | IORESOURCE_MEM_64;
 if (size == 0 || (flags & ~IORESOURCE_PREFETCH) == 0) {
 continue;
 }
-- 
1.7.10.4




[Qemu-devel] [PATCH 1/2] target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs

2012-12-26 Thread Gleb Natapov
From: Will Auld 

CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to the
hypervisor vcpu specific locations to store the value of the emulated MSRs.
In this way the IA32_TSC_ADJUST value will be included in all reads to
the TSC MSR whether through rdmsr or rdtsc.

As this is a new MSR that the guest may access and modify its value needs
to be migrated along with the other MRSs. The changes here are specifically
for recognizing when IA32_TSC_ADJUST is enabled in CPUID and code added
for migrating its value.

Signed-off-by: Will Auld 
Reviewed-by: Andreas Färber 
Signed-off-by: Marcelo Tosatti 
---
 target-i386/cpu.h |2 ++
 target-i386/kvm.c |   14 ++
 target-i386/machine.c |   21 +
 3 files changed, 37 insertions(+)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 386c4f6..477da33 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -295,6 +295,7 @@
 #define MSR_IA32_APICBASE_BSP   (1<<8)
 #define MSR_IA32_APICBASE_ENABLE(1<<11)
 #define MSR_IA32_APICBASE_BASE  (0xf<<12)
+#define MSR_TSC_ADJUST  0x003b
 #define MSR_IA32_TSCDEADLINE0x6e0
 
 #define MSR_MTRRcap0xfe
@@ -774,6 +775,7 @@ typedef struct CPUX86State {
 uint64_t pv_eoi_en_msr;
 
 uint64_t tsc;
+uint64_t tsc_adjust;
 uint64_t tsc_deadline;
 
 uint64_t mcg_status;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f669281..ae6ce1f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -62,6 +62,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
+static bool has_msr_tsc_adjust;
 static bool has_msr_tsc_deadline;
 static bool has_msr_async_pf_en;
 static bool has_msr_pv_eoi_en;
@@ -676,6 +677,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hsave_pa = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == MSR_TSC_ADJUST) {
+has_msr_tsc_adjust = true;
+continue;
+}
 if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) {
 has_msr_tsc_deadline = true;
 continue;
@@ -1013,6 +1018,9 @@ static int kvm_put_msrs(CPUX86State *env, int level)
 if (has_msr_hsave_pa) {
 kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
 }
+if (has_msr_tsc_adjust) {
+kvm_msr_entry_set(&msrs[n++], MSR_TSC_ADJUST, env->tsc_adjust);
+}
 if (has_msr_tsc_deadline) {
 kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
 }
@@ -1273,6 +1281,9 @@ static int kvm_get_msrs(CPUX86State *env)
 if (has_msr_hsave_pa) {
 msrs[n++].index = MSR_VM_HSAVE_PA;
 }
+if (has_msr_tsc_adjust) {
+msrs[n++].index = MSR_TSC_ADJUST;
+}
 if (has_msr_tsc_deadline) {
 msrs[n++].index = MSR_IA32_TSCDEADLINE;
 }
@@ -1350,6 +1361,9 @@ static int kvm_get_msrs(CPUX86State *env)
 case MSR_IA32_TSC:
 env->tsc = msrs[i].data;
 break;
+case MSR_TSC_ADJUST:
+env->tsc_adjust = msrs[i].data;
+break;
 case MSR_IA32_TSCDEADLINE:
 env->tsc_deadline = msrs[i].data;
 break;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 4771508..4229dde 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -328,6 +328,24 @@ static const VMStateDescription vmstate_fpop_ip_dp = {
 }
 };
 
+static bool tsc_adjust_needed(void *opaque)
+{
+CPUX86State *env = opaque;
+
+return env->tsc_adjust != 0;
+}
+
+static const VMStateDescription vmstate_msr_tsc_adjust = {
+.name = "cpu/msr_tsc_adjust",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT64(tsc_adjust, CPUX86State),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static bool tscdeadline_needed(void *opaque)
 {
 CPUX86State *env = opaque;
@@ -478,6 +496,9 @@ static const VMStateDescription vmstate_cpu = {
 .vmsd = &vmstate_fpop_ip_dp,
 .needed = fpop_ip_dp_needed,
 }, {
+.vmsd = &vmstate_msr_tsc_adjust,
+.needed = tsc_adjust_needed,
+}, {
 .vmsd = &vmstate_msr_tscdeadline,
 .needed = tscdeadline_needed,
 }, {
-- 
1.7.10.4




Re: [Qemu-devel] [PATCH] fix include file in hw/loader.h

2012-12-26 Thread Wenchao Xia
  please ignore this mail, sorry for trouble.

>Now header files have gone into includes, so include file path
> needs change also to avoid build break.
> 
> Signed-off-by: Wenchao Xia 
> ---
>   hw/loader.h |3 ++-
>   1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/loader.h b/hw/loader.h
> index 26480ad..5e61c95 100644
> --- a/hw/loader.h
> +++ b/hw/loader.h
> @@ -1,5 +1,6 @@
>   #ifndef LOADER_H
>   #define LOADER_H
> +#include "qapi/qmp/qdict.h"
> 
>   /* loader.c */
>   int get_image_size(const char *filename);
> @@ -30,7 +31,7 @@ int rom_load_all(void);
>   void rom_set_fw(void *f);
>   int rom_copy(uint8_t *dest, hwaddr addr, size_t size);
>   void *rom_ptr(hwaddr addr);
> -void do_info_roms(Monitor *mon);
> +void do_info_roms(Monitor *mon, const QDict *qdict);
> 
>   #define rom_add_file_fixed(_f, _a, _i)  \
>   rom_add_file(_f, NULL, _a, _i)
> 


-- 
Best Regards

Wenchao Xia




Re: [Qemu-devel] [PATCH 6/6] usb-tablet: Allow connecting to ehci

2012-12-26 Thread Hao Luo
Erik,

You can take a look at my recent post about usb tablet peforamnce over ehci, in 
which I mentioned a libvirt config of how to let usb tablet connect to ehci.

2012-12-26



Hao Luo



发件人:Erik Rull
发送时间:2012-12-26 20:08
主题:Re: [Qemu-devel] [PATCH 6/6] usb-tablet: Allow connecting to ehci
收件人:"Gerd Hoffmann"
抄送:"Hans de Goede","qemu-devel"

Hi Gerd, hi Hans, 

is my assumption correct that if I check out and compile this version from  
GIT master that the usb-tablet device is automatically routed to ehci  
without changing anything else in the qemu call arguments? (And the  
performance enhancement takes place automatically) 
If not - what has to be changed to get it working? 

Best regards, 

Erik 



Gerd Hoffmann wrote: 
> From: Hans de Goede  
> 
> Our ehci code has is capable of significantly lowering the wakeup rate 
> for the hcd emulation while the device is idle. It is possible to add 
> similar code ot the uhci emulation, but that simply is not there atm, 
> and there is no reason why a (virtual) usb-tablet can not be a USB-2 device. 
> 
> Making usb-hid devices connect to the emulated ehci controller instead 
> of the emulated uhci controller on vms which have both lowers the cpuload 
> for a fully idle vm from 20% to 2-3% (on my laptop). 
> 
> An alternative implementation to using a property to select the tablet 
> type, would be simply making it a new device type, ie usb-tablet2, but the 
> downside of that is that this will require libvirt changes to be available 
> through libvirt at all, and then management tools changes to become the 
> default for new vms, where as using a property will automatically get 
> any pc-1.3 type vms the lower cpuload. 
> 
> [ kraxel: adapt compat property for post-1.3 merge ] 
> 
> Signed-off-by: Hans de Goede  
> Signed-off-by: Gerd Hoffmann  
> 
> tablet compat fixup 
> 
> Signed-off-by: Gerd Hoffmann  

Re: [Qemu-devel] Graphics performance

2012-12-26 Thread Dunrong Huang
On Wed, Dec 26, 2012 at 9:22 PM, Erik Rull  wrote:

>>> which is the graphics emulation with the lowest CPU usage for 2D-only
>>> GUIs?
>>> (e.g. Win XP without Direct3D usage)? I just need to drive a virtual
>>> graphics display with 1024x768 (@16bit colors). At the moment I use the
>>> cirrus graphics card emulation. Is there something more efficient?
>>> Terminal/Console is either a real display or VNC - maybe for the two
>>> versions different adaptors bring the best performance for each of them?
>>>
>> you shoud try spice. spice depends on qxl video card which is a
>> paravirtual graphics card.
>> It means you must install qxl driver in your guest to make this card work.
>>
>> More details, please refer:
>> http://www.linux-kvm.org/page/SPICE
>
>
> Hi just had a look at this page - but it looks as if I cannot use this for a
> real hardware display:
> ""
> Client
>
> To connect to a virtual machine using SPICE, you need a client application.
> ""
which os you are using? window or linux? If you use linux, you can
install a spice client easily by e.g.:
sudo apt-get install spice-client
or
sudo emerge spice-gtk

Then connect to your guest with:
spicy -h host_ip -p port

If you use windows, download a binary from:
http://spice-space.org/download.html
>
> Is a hardware monitor capable to display the QXL format? And for my remote
> clients - can I still use VNC?
Any monitor is capable QXL format, actually spice display is not
relevant to hardware monitor.

> A change in the guest would be okay, but I still must be able to use the
> existing client environment...
>
If you want to use  VNC for display, pass "-vnc :1" to qemu command line.
This way, VNC server is running in QEMU, not guest, you donot need to change
anything in guest.




-- 
Best Regards,

Dunrong Huang



Re: [Qemu-devel] [Bug 1093691] [NEW] QEMU build fails on OpenBSD/mips64

2012-12-26 Thread Richard Henderson
On 12/25/2012 01:12 PM, Brad Smith wrote:
> Public bug reported:
> 
> Building QEMU 1.2.1 on OpenBSD/mips64 fails as follows although I
> believe QEMU was also broken with 1.1.x as well..
...
> In file included from /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg.c:50:
> /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg-op.h: In function 
> 'tcg_gen_div_i64':
> /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg-op.h:1229: error: 
> 'TCG_TARGET_HAS_div_i64' undeclared (first use in this function)

The tcg/mips target only supports mips32.

In order to build on mips64 you'll need to use the interpreter backend,
configurable with --enable-tcg-interpreter.

We could be more clever and enable this by default for mips64...


r~



Re: [Qemu-devel] [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru devices assigned to KVM guests

2012-12-26 Thread Bjorn Helgaas
On Mon, Nov 26, 2012 at 11:46 PM, Gleb Natapov  wrote:
> On Mon, Nov 26, 2012 at 09:46:12PM -0200, Marcelo Tosatti wrote:
>> On Tue, Nov 20, 2012 at 02:09:46PM +, Pandarathil, Vijaymohan R wrote:
>> >
>> >
>> > > -Original Message-
>> > > From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
>> > > Sent: Tuesday, November 20, 2012 5:41 AM
>> > > To: Pandarathil, Vijaymohan R
>> > > Cc: k...@vger.kernel.org; linux-...@vger.kernel.org; 
>> > > qemu-devel@nongnu.org;
>> > > linux-ker...@vger.kernel.org
>> > > Subject: Re: [PATCH 0/4] AER-KVM: Error containment of PCI pass-thru
>> > > devices assigned to KVM guests
>> > >
>> > > On Tue, Nov 20, 2012 at 06:31:48AM +, Pandarathil, Vijaymohan R 
>> > > wrote:
>> > > > Add support for error containment when a PCI pass-thru device assigned 
>> > > > to
>> > > a KVM
>> > > > guest encounters an error. This is for PCIe devices/drivers that 
>> > > > support
>> > > AER
>> > > > functionality. When the OS is notified of an error in a device either
>> > > > through the firmware first approach or through an interrupt handled by
>> > > the AER
>> > > > root port driver, concerned subsystems are notified by invoking 
>> > > > callbacks
>> > > > registered by these subsystems. The device is also marked as tainted 
>> > > > till
>> > > the
>> > > > corresponding driver recovery routines are successful.
>> > > >
>> > > > KVM module registers for a notification of such errors. In the KVM
>> > > callback
>> > > > routine, a global counter is incremented to keep track of the error
>> > > > notification. Before each CPU enters guest mode to execute guest code,
>> > > > appropriate checks are done to see if the impacted device belongs to 
>> > > > the
>> > > guest
>> > > > or not. If the device belongs to the guest, qemu hypervisor for the 
>> > > > guest
>> > > is
>> > > > informed and the guest is immediately brought down, thus preventing or
>> > > > minimizing chances of any bad data being written out by the guest 
>> > > > driver
>> > > > after the device has encountered an error.
>> > >
>> > > I'm surprised that the hypervisor would shut down the guest when PCIe
>> > > AER kicks in for a pass-through device.  Shouldn't we pass the AER event
>> > > into the guest and deal with it there?
>> >
>> > Agreed. That would be the ideal behavior and is planned in a future patch.
>> > Lack of control over the capabilities/type of the OS/drivers running in
>> > the guest is also a concern in passing along the event to the guest.
>> >
>> > My understanding is that in the current implementation of Linux/KVM, these
>> > errors are not handled at all and can potentially cause a guest hang or
>> > crash or even data corruption depending on the implementation of the guest
>> > driver for the device. As a first step, these patches make the behavior
>> > better by doing error containment with a predictable behavior when such
>> > errors occur.
>>
>> For both ACPI notifications and Linux PCI AER driver there is a way for
>> the PCI driver to receive a notification, correct?
>>
>> Can just have virt/kvm/assigned-dev.c code register such a notifier (as
>> a "PCI driver") and then perform appropriate action?
>>
>> Also the semantics of "tainted driver" is not entirely clear.
>>
>> Is there any reason for not having this feature for VFIO only, as KVM
>> device assigment is being phased out?
>>
> Exactly. We shouldn't add checks to guest entry code and introduce new
> userspace ABI to add minor feature to deprecated code. New userspace ABI
> means that QEMU changes are needed, so the feature will be fully functional
> only with latest QEMU which is capable of using VFIO anyway.

I'm ignoring these patches for now.  Please address the review
comments if you think we still need to do something here.

Bjorn



Re: [Qemu-devel] [RFC] lively write vmstate with predictable size

2012-12-26 Thread Wenchao Xia
> Hi, Juan
>Thank u for reviewing on this, have some questions below.
> 
>> Wenchao Xia  wrote:
>>> resent the mail to mail-list.
>>> ---
>>>
>>> Hi, Paolo and Juan
>>>   Currently savevm needs pause vm, and I am working on that make it
>>>lively. Considering the flexibility I'd like to split out the
>>>functions apart as following:
>>>1) snapshot lively internal/external
>>>2) save vmstate lively internal/external
>>>3) assemble them as will
>>>
>>>1) was sent at
>>>http://lists.nongnu.org/archive/html/qemu-devel/2012-12/msg02393.html
>>>
>>>but for 2), I think it have problem because file size may grow to a size
>>>out of control. Considering the migration code, I'd like to propose a
>>>way to fix it as following:
>>>
>>>Migration logic:
>>>Src send->dest recv->data analysis->copy data
>>>Savevm logic:
>>>Src send->write data to qcow2.
>>>
>>> My suggestion:
>>>Savevm logic:
>>>Src send->dest recv->data analysis->write data to qcow2/external with
>>>addr.
>>>
>>>  The idea is do the write operation after data analysis, and overwrite
>>>old data if address overlaps. So this will need qcow2 support
>>>write snapshot data at "address", and also change some savevm logic.
>>
>> We could change the code to do it, but it is not going to be
>> trivial. Furthermore we should disable xbzrle (in this case, it just
>> makes no sense)
>>
>> The easiest way that I can think of is changing the
>> arch_init.c:
>>
>> We can:
>> a- reserve a big enough area in the qcow2 image/external to store the
>> whole ram
>> b- change ram_save_block() on that file to directly write to the "right"
>>  position inside the image.
>> c- change ram_load() to understand the new format.
>>
>I agree with your way for that it saves computation for
> data analysis. But why a big area should be reserved and new format
> is needed? I thought every thing would be same as old vmstate save
> case with vm stopped, if qemu can overwrite the contents in the right
> place and leaves remaining tag unchanged.
> 
Hi, Juan
  After reconsideration, I think preallocation is indeed needed, or a
file format is needed which make write to a position larger than the
file should extend the file to the same size of postion(maybe some
filesystem support hole can make it)?
  I suppose a solution to make things relative easy:
1) at start qemu make sure space is preallocated or a file support
 holes.
2) calculate the vmstate size, and arrange the contents lineiar with
communication headers, that is what is in save_block_hdr() and other key
stamps.
3) write those data lively with headers to the arranged position.

  In this way the qcow2 internal snapshot format need not be changed,
but sacrifice some space used to store communication stamps, which I
think would not be too many. And if the coding in this approach is
carefully and predict the remove the write stamps, it would be easy
to remove the stamps later as a new snapshot format, if a flag is
added in the code. But before all a good arrangement of vmstate
is needed.
  what do you think about it? :)

>how about write a new function replace save_block_hdr() and following
> buffer writing, which would be an abstract function used to write data
> with addr? It still encapsulate it as save_block_hdr() logic, but in
> savevm case it will directly write data to a right place. In this way
> at vmstate contents level, nothing changes, it still stores the same
> stream as if vm is stopped.
> 
>> Notice that this would have to be a different implementation that
>> sending data over tcp, as there we want (something) similar to the
>> current code.
>>
>I think it would be a choice between performance and code unifying,
> personally I guess performance is the first to consider for qemu.
> 
>>
>>>  Could u give some some comments on this to see if it is workable?
>>
>> It needs lots of code rearragement as far as I can think, it is doable
>> but it is quite a bit of work.
>>
>> Later, Juan.
>>
> 
> 


-- 
Best Regards

Wenchao Xia




[Qemu-devel] Translate GFN to ram_addr

2012-12-26 Thread YoungJoon Lee
Hello,

I'm learning about qemu migration. My qemu version is 0.13.0

I want to translate GFN(which is Guest physical address >> 12) to ram_addr.

I found cpu_get_physical_page_desc function, which i guess this translation.
I tried next code but it's result is not i want to.(wrong address returned)

target_phys_addr_t start_addr=(gfn)<

Re: [Qemu-devel] [Bug 1093691] [NEW] QEMU build fails on OpenBSD/mips64

2012-12-26 Thread Brad Smith
On Wed, Dec 26, 2012 at 09:55:38AM -0800, Richard Henderson wrote:
> On 12/25/2012 01:12 PM, Brad Smith wrote:
> > Public bug reported:
> > 
> > Building QEMU 1.2.1 on OpenBSD/mips64 fails as follows although I
> > believe QEMU was also broken with 1.1.x as well..
> ...
> > In file included from /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg.c:50:
> > /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg-op.h: In function 
> > 'tcg_gen_div_i64':
> > /usr/obj/ports/qemu-1.2.1/qemu-1.2.1/tcg/tcg-op.h:1229: error: 
> > 'TCG_TARGET_HAS_div_i64' undeclared (first use in this function)
> 
> The tcg/mips target only supports mips32.
> 
> In order to build on mips64 you'll need to use the interpreter backend,
> configurable with --enable-tcg-interpreter.
> 
> We could be more clever and enable this by default for mips64...

Strange. It was building with older releases and I'm fairly certain what
was built actually ran Ok. Looking at the MIPS TCG code it looks as if
there is some code in there to deal with 32-bit vs 64-bit MIPS.

Anyway, if the MIPS TCG code does not officially support 64-bit MIPS
then the build infrastructure should be fixed to not shoot people in
the feet trying to build QEMU. The 32-bit SPARC support needs to be
fixed too.

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.




Re: [Qemu-devel] [Bug 1087114] [NEW] assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

2012-12-26 Thread Brad Smith
On Thu, Dec 13, 2012 at 04:26:50PM +0800, Zhi Yong Wu wrote:
> On Thu, Dec 6, 2012 at 12:02 PM, Brad Smith <1087...@bugs.launchpad.net> 
> wrote:
> > Public bug reported:
> >
> > QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the
> > command line params do not seem to matter.
> >
> > assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c",
> > line 1220, function "bdrv_drain_all"
> Just i hit the same issue on my large scale perf testing, mayb i
> should try virtio-blk to work around before it is fixed by some guy.

What OS are you using to host QEMU?

-- 
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.




[Qemu-devel] [Bug 1087114] Re: assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

2012-12-26 Thread Brad Smith
Paolo,

As you wrote the fallback code which is used when sem_timedwait() is
missing could you please take a look at this when you have some time? I
can test any patches you might come up with.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1087114

Title:
  assertion "QLIST_EMPTY(&bs->tracked_requests)" failed

Status in QEMU:
  New

Bug description:
  QEMU 1.3.0 on OpenBSD now crashes with an error as shown below and the
  command line params do not seem to matter.

  assertion "QLIST_EMPTY(&bs->tracked_requests)" failed: file "block.c",
  line 1220, function "bdrv_drain_all"

  #1  0x030d1bce24aa in abort () at /usr/src/lib/libc/stdlib/abort.c:70
  p = (struct atexit *) 0x30d11897000
  mask = 4294967263
  cleanup_called = 1
  #2  0x030d1bc5ff44 in __assert2 (file=Variable "file" is not available.
  ) at /usr/src/lib/libc/gen/assert.c:52
  No locals.
  #3  0x030b0d383a03 in bdrv_drain_all () at block.c:1220
  bs = (BlockDriverState *) 0x30d13f3b630
  busy = false
  __func__ = "bdrv_drain_all"
  #4  0x030b0d43acfc in bmdma_cmd_writeb (bm=0x30d0f5f56a8, val=8) at 
hw/ide/pci.c:312
  __func__ = "bmdma_cmd_writeb"
  #5  0x030b0d43b450 in bmdma_write (opaque=0x30d0f5f56a8, addr=0, val=8, 
size=1) at hw/ide/piix.c:76
  bm = (BMDMAState *) 0x30d0f5f56a8
  #6  0x030b0d5c2ce6 in memory_region_write_accessor (opaque=0x30d0f5f57d0, 
addr=0, value=0x30d18c288f0, size=1, shift=0, mask=255)
  at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:334
  mr = (MemoryRegion *) 0x30d0f5f57d0
  tmp = 8
  #7  0x030b0d5c2dc5 in access_with_adjusted_size (addr=0, 
value=0x30d18c288f0, size=1, access_size_min=1, access_size_max=4, 
  access=0x30b0d5c2c6b , 
opaque=0x30d0f5f57d0) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:364
  access_mask = 255
  access_size = 1
  i = 0
  #8  0x030b0d5c3222 in memory_region_iorange_write (iorange=0x30d1d5e7400, 
offset=0, width=1, data=8)
  at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/memory.c:439
  mrio = (MemoryRegionIORange *) 0x30d1d5e7400
  mr = (MemoryRegion *) 0x30d0f5f57d0
  __func__ = "memory_region_iorange_write"
  #9  0x030b0d5c019a in ioport_writeb_thunk (opaque=0x30d1d5e7400, 
addr=49216, data=8) at /home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:212
  ioport = (IORange *) 0x30d1d5e7400
  #10 0x030b0d5bfb65 in ioport_write (index=0, address=49216, data=8) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:83
  func = (IOPortWriteFunc *) 0x30b0d5c0148 
  default_func = {0x30b0d5bfbbc , 0x30b0d5bfc61 
, 0x30b0d5bfd0c }
  #11 0x030b0d5c0704 in cpu_outb (addr=49216, val=8 '\b') at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/ioport.c:289
  No locals.
  #12 0x030b0d6067dd in helper_outb (port=49216, data=8) at 
/home/ports/pobj/qemu-1.3.0-debug/qemu-1.3.0/target-i386/misc_helper.c:72
  No locals.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1087114/+subscriptions



[Qemu-devel] [PATCH KVM v2 4/4] KVM: i8259: refactor pic_set_irq level logic

2012-12-26 Thread Matthew Ogilvie
No change in functionality.

Clarify that the only difference between level triggered and
edge triggered interrupts is on the leading edge.

Signed-off-by: Matthew Ogilvie 
---
 arch/x86/kvm/i8259.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 76d8dc1..63f731b 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -95,26 +95,20 @@ static inline int pic_set_irq1(struct kvm_kpic_state *s, 
int irq, int level)
 {
int mask, ret = 1;
mask = 1 << irq;
-   if (s->elcr & mask) /* level triggered */
-   if (level) {
+   if (level) {
+   if ((s->last_irr & mask) == 0 ||  /* edge for edge-triggered */
+   (s->elcr & mask)) {   /* or level triggered */
ret = !(s->irr & mask);
s->irr |= mask;
-   s->last_irr |= mask;
-   } else {
-   s->irr &= ~mask;
-   s->last_irr &= ~mask;
-   }
-   else/* edge triggered */
-   if (level) {
-   if ((s->last_irr & mask) == 0) {
-   ret = !(s->irr & mask);
-   s->irr |= mask;
-   }
-   s->last_irr |= mask;
-   } else {
-   s->irr &= ~mask;
-   s->last_irr &= ~mask;
}
+   s->last_irr |= mask;
+   } else {
+   /* Dropping level clears the interrupt regardless of edge
+* trigger vs level trigger.
+*/
+   s->irr &= ~mask;
+   s->last_irr &= ~mask;
+   }
 
return (s->imr & mask) ? -1 : ret;
 }
-- 
1.7.10.2.484.gcd07cc5




[Qemu-devel] [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high

2012-12-26 Thread Matthew Ogilvie
Reading the spec, it is clear that most modes normally leave the IRQ
output line high, and only pulse it low to generate a leading edge.
Especially the most commonly used mode 2.

The KVM i8254 model does not try to emulate the duration of the pulse at
all, so just swap the high/low settings it to leave it high most of
the time.

This fix is a prerequisite to improving the i8259 model to handle
the trailing edge of an interupt request as indicated in its spec:
If it gets a trailing edge of an IRQ line before it starts to service
the interrupt, the request should be canceled.

See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
or search the net for 23124406.pdf.

Risks:

There is a risk that migrating a running guest between versions
with and without this patch will lose or gain a single timer
interrupt during the migration process.  The only case where
this is likely to be serious is probably losing a single-shot (mode 4)
interrupt, but if my understanding of how things work is good, then
that should only be possible if a whole slew of conditions are
all met:

 1. The guest is configured to run in a "tickless" mode (like
modern Linux).
 2. The guest is for some reason still using the i8254 rather
than something more modern like an HPET.  (The combination
of 1 and 2 should be rare.)
 3. The migration is going from a fixed version back to the
old version.  (Not sure how common this is, but it should
be rarer than migrating from old to new.)
 4. There are not going to be any "timely" events/interrupts
(keyboard, network, process sleeps, etc) that cause the guest
to reset the PIT mode 4 one-shot counter "soon enough".

This combination should be rare enough that more complicated
solutions are not worth the effort.

Signed-off-by: Matthew Ogilvie 
---
 arch/x86/kvm/i8254.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index c1d30b2..cd4ec60 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
}
spin_unlock(&ps->inject_lock);
if (inject) {
-   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
+   /* Clear previous interrupt, then create a rising
+* edge to request another interupt, and leave it at
+* level=1 until time to inject another one.
+*/
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
+   kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 
/*
 * Provides NMI watchdog support via Virtual Wire mode.
-- 
1.7.10.2.484.gcd07cc5




[Qemu-devel] [PATCH KVM v2 0/4] fix KVM i8259 IRQ trailing-edge logic

2012-12-26 Thread Matthew Ogilvie
Changes since version 1 (from Sep 9):
   * Split off patch 1; this is the critical prerequisite to
 make the i8254 work with the fixed i8259.
   * Add patch 2, to make additional changes to the i8254
 to make it consistent with the spec and with proposed changes
 to qemu's i8254 model.

Background:

According to the spec, the i8259 will cancel an interrupt if
the line goes low before it starts servicing it, even when
it is considered edge-triggered.  This is a problem
with an old Microport UNIX guest (ca 1987), where the
guest masks off IRQ14 in the slave i8259, but the host's
master i8259 model will still try to deliver an interrupt even after
IRQ2 has been brought low, resulting in a spurious interrupt (IRQ15).

Before the i8259 can be fixed, the i8254 model needs to be fixed
so it doesn't depend on the broken i8259.

Alternative: This could be narrowly fixed for IRQ2 only with no
risk at all (and no need to touch the i8254), but if possible it
seems reasonable to fix it for all lines at the same time.

But there may be some risk:

First, I think I've convinced myself that between the i8254 and i8259,
the only substantial migration breakage should be when a
whole series of conditions are met, and the combination
should be rare enough not to worry about it.  See writeup
in my qemu patch series (version 8; Dec 16).

Second, there is also the possibility that other devices are relying
on the broken model.  I'm especially concerned with various users
of a function called

qemu_irq_pulse()

in the qemu source tree, which immediately lowers IRQ line after
raising it.  If any of those calls are routed straight into
the i8259 (as opposed to an APIC or some other chip), then it
probably won't behave as desired.

I'll probably dig into qemu_irq_pulse() callers more carefully
eventually, but there are lot of them, and any high-level incite
anyone can provide into them would be helpful.

See the Dec 16 patch series I sent to the qemu mailing list for
more information.
http://search.gmane.org/?query=ogilvie&group=gmane.comp.emulators.qemu

Matthew Ogilvie (4):
  KVM: fix i8254 IRQ0 to be normally high
  KVM: additional i8254 output fixes
  KVM: fix i8259 interrupt high to low transition logic
  KVM: i8259: refactor pic_set_irq level logic

 arch/x86/kvm/i8254.c | 50 +++---
 arch/x86/kvm/i8259.c | 36 ++--
 2 files changed, 53 insertions(+), 33 deletions(-)

-- 
1.7.10.2.484.gcd07cc5




[Qemu-devel] [PATCH KVM v2 3/4] KVM: fix i8259 interrupt high to low transition logic

2012-12-26 Thread Matthew Ogilvie
Intel's definition of "edge triggered" means: "asserted with a
low-to-high transition at the time an interrupt is registered
and then kept high until the interrupt is served via one of the
EOI mechanisms or goes away unhandled."

So the only difference between edge triggered and level triggered
is in the leading edge, with no difference in the trailing edge.

This bug manifested itself when the guest was Microport UNIX
System V/386 v2.1 (ca. 1987), because it would sometimes mask
off IRQ14 in the slave IMR after it had already been asserted.
The master would still try to deliver an interrupt to the CPU even
though IRQ2 had dropped again, resulting in a spurious interupt
(IRQ15) and a panicked UNIX kernel.

Signed-off-by: Matthew Ogilvie 
---
 arch/x86/kvm/i8259.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index cc31f7c..76d8dc1 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -111,8 +111,10 @@ static inline int pic_set_irq1(struct kvm_kpic_state *s, 
int irq, int level)
s->irr |= mask;
}
s->last_irr |= mask;
-   } else
+   } else {
+   s->irr &= ~mask;
s->last_irr &= ~mask;
+   }
 
return (s->imr & mask) ? -1 : ret;
 }
@@ -169,14 +171,10 @@ static void pic_update_irq(struct kvm_pic *s)
 {
int irq2, irq;
 
+   /* slave PIC notifies master PIC via IRQ2 */
irq2 = pic_get_irq(&s->pics[1]);
-   if (irq2 >= 0) {
-   /*
-* if irq request by slave pic, signal master PIC
-*/
-   pic_set_irq1(&s->pics[0], 2, 1);
-   pic_set_irq1(&s->pics[0], 2, 0);
-   }
+   pic_set_irq1(&s->pics[0], 2, irq2 >= 0);
+
irq = pic_get_irq(&s->pics[0]);
pic_irq_request(s->kvm, irq >= 0);
 }
-- 
1.7.10.2.484.gcd07cc5




[Qemu-devel] [PATCH KVM v2 2/4] KVM: additional i8254 output fixes

2012-12-26 Thread Matthew Ogilvie
Make git_get_out() consistent with spec.  Currently pit_get_out()
doesn't affect IRQ0, but it can be read by the guest in other ways.
This makes it consistent with proposed changes in qemu's i8254 model
as well.

See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
or search the net for 23124406.pdf.

Signed-off-by: Matthew Ogilvie 
---
 arch/x86/kvm/i8254.c | 44 ++--
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index cd4ec60..fd38938 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
 
WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
 
+   /* FIXME: Add some way to represent a paused timer and return
+*   the paused-at counter value, to better model gate pausing,
+*   "wait until next CLK pulse to load counter" logic, etc.
+*/
t = kpit_elapsed(kvm, c, channel);
d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
@@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
counter = (c->count - d) & 0x;
break;
case 3:
-   /* XXX: may be incorrect for odd counts */
-   counter = c->count - (mod_64((2 * d), c->count));
+   counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
break;
default:
counter = c->count - mod_64(d, c->count);
@@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
switch (c->mode) {
default:
case 0:
-   out = (d >= c->count);
-   break;
case 1:
-   out = (d < c->count);
+   out = (d >= c->count);
break;
case 2:
-   out = ((mod_64(d, c->count) == 0) && (d != 0));
+   out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
break;
case 3:
-   out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
+   out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate 
== 0);
break;
case 4:
case 5:
-   out = (d == c->count);
+   out = (d != c->count);
break;
}
 
@@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
 
/*
 * The largest possible initial count is 0; this is equivalent
-* to 216 for binary counting and 104 for BCD counting.
+* to pow(2,16) for binary counting and pow(10,4) for BCD counting.
 */
if (val == 0)
val = 0x1;
@@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
 
if (channel != 0) {
ps->channels[channel].count_load_time = ktime_get();
+
+   /* In gate-triggered one-shot modes,
+* indirectly model some pit_get_out()
+* cases by setting the load time way
+* back until gate-triggered.
+* (Generally only affects reading status
+* from channel 2 speaker,
+* due to hard-wired gates on other
+* channels.)
+*
+* FIXME: This might be redesigned if a paused
+* timer state is added for pit_get_count().
+*/
+   if (ps->channels[channel].mode == 1 ||
+   ps->channels[channel].mode == 5) {
+   u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ);
+   ps->channels[channel].count_load_time =
+ktime_sub(ps->channels[channel].count_load_time,
+  ns_to_ktime(delta));
+   }
return;
}
 
@@ -383,7 +404,6 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
 * mode 1 is one shot, mode 2 is period, otherwise del timer */
switch (ps->channels[0].mode) {
case 0:
-   case 1:
 /* FIXME: enhance mode 4 precision */
case 4:
create_pit_timer(kvm, val, 0);
@@ -393,6 +413,10 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
create_pit_timer(kvm, val, 1);
break;
default:
+   /* Modes 1 and 5 are triggered by gate leading edge,
+* but channel 0's gate is hard-wired high and has
+* no edges (on normal real hardware).
+*/
destroy_pit_timer(kvm->arch.vpit);
}
 }
-- 
1.7.10.2.484.gcd07cc5