Re: [Qemu-devel] [Qemu-arm] [PATCH 2/2] block: m25p80: Introduce Die Erase command

2016-12-19 Thread Krzeminski, Marcin (Nokia - PL/Wroclaw)


> -----Original Message-----
> From: Edgar E. Iglesias [mailto:edgar.igles...@gmail.com]
> Sent: Monday, December 19, 2016 8:28 AM
> To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> 
> Cc: qemu-devel@nongnu.org; peter.mayd...@linaro.org; rfsw-
> patc...@mlist.nokia.com; qemu-...@nongnu.org; c...@kaod.org
> Subject: Re: [Qemu-arm] [PATCH 2/2] block: m25p80: Introduce Die Erase
> command
> 
> On Mon, Dec 19, 2016 at 06:21:13AM +, Krzeminski, Marcin (Nokia -
> PL/Wroclaw) wrote:
> >
> >
> > > -----Original Message-----
> > > From: Edgar E. Iglesias [mailto:edgar.igles...@gmail.com]
> > > Sent: Friday, December 16, 2016 5:36 PM
> > > To: Krzeminski, Marcin (Nokia - PL/Wroclaw)
> > > 
> > > Cc: qemu-devel@nongnu.org; peter.mayd...@linaro.org; rfsw-
> > > patc...@mlist.nokia.com; qemu-...@nongnu.org; c...@kaod.org
> > > Subject: Re: [Qemu-arm] [PATCH 2/2] block: m25p80: Introduce Die
> > > Erase command
> > >
> > > On Fri, Dec 16, 2016 at 02:27:42PM +0100,
> > > marcin.krzemin...@nokia.com
> > > wrote:
> > > > From: Marcin Krzeminski 
> > > >
> > > > > Big flash chips (like mt25qu01g) consist of multiple dies.
> > > > > Because of that, some manufacturers remove support for Chip Erase
> > > > > and provide a Die Erase command instead. To avoid unnecessary code
> > > > > complication, support for chip erase for mt25qu01g is not removed.
> > > >
> > > > Signed-off-by: Marcin Krzeminski 
> > > > ---
> > > >  hw/block/m25p80.c | 33 +
> > > >  1 file changed, 29 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index
> > > > 2bc7028..0bc1fbf 100644
> > > > --- a/hw/block/m25p80.c
> > > > +++ b/hw/block/m25p80.c
> > > > @@ -216,8 +216,8 @@ static const FlashPartInfo known_devices[] = {
> > > >  { INFO("n25q128", 0x20ba18,  0,  64 << 10, 256, 0) },
> > > >  { INFO("n25q256a",0x20ba19,  0,  64 << 10, 512, ER_4K) },
> > > >  { INFO("n25q512a",0x20ba20,  0,  64 << 10, 1024, ER_4K) },
> > > > -{ INFO("mt25ql01g",   0x20ba21,  0,  64 << 10, 2048, ER_4K) },
> > > > -{ INFO("mt25qu01g",   0x20bb21,  0,  64 << 10, 2048, ER_4K) },
> > > > +{ INFO("mt25ql01g",   0x20ba21, 0x1040,  64 << 10, 2048, ER_4K) },
> > > > +{ INFO("mt25qu01g",   0x20bb21, 0x1040,  64 << 10, 2048, ER_4K) },
> > > >
> > > >  /* Spansion -- single (large) sector size only, at least
> > > >   * for the chips listed here (without boot sectors).
> > > > @@ -358,6 +358,8 @@ typedef enum {
> > > >
> > > >  REVCR = 0x65,
> > > >  WEVCR = 0x61,
> > > > +
> > > > +DIE_ERASE = 0xC4,
> > > >  } FlashCMD;
> > > >
> > > >  typedef enum {
> > > > @@ -411,6 +413,7 @@ typedef struct Flash {
> > > >  bool reset_enable;
> > > >  bool quad_enable;
> > > >  uint8_t ear;
> > > > +uint32_t die_cnt;
> > > >
> > > >  int64_t dirty_page;
> > > >
> > > > @@ -492,7 +495,7 @@ static inline void flash_sync_area(Flash *s,
> > > > int64_t off, int64_t len)
> > > >
> > > >  static void flash_erase(Flash *s, int offset, FlashCMD cmd)  {
> > > > -uint32_t len;
> > > > +uint32_t len = 0;
> > >
> > > Do you really need this?
> >
> > Compilation warning.
> 
> Of course, because you are logging len in a code path that doesn't use the
> variable.
> 
> Let me be more clear below:
> 
> > >
> > > >  uint8_t capa_to_assert = 0;
> > > >
> > > >  switch (cmd) {
> > > > @@ -513,6 +516,16 @@ static void flash_erase(Flash *s, int offset,
> > > FlashCMD cmd)
> > > >  case BULK_ERASE:
> > > >  len = s->size;
> > > >  break;
> > > > +case DIE_ERASE:
> > > > +if (s->die_cnt) {
> > > > +len = s->size / s->die_cnt;
> > > > +offset = offset & (~(len-1));
> > > > +} else {
> > > > +qemu_log_mask(LOG_GUEST_ERROR, "M25P80: %d die erase
> > > > + not
> > > supported by"
> > > > +  " device\n", len);
> 
> Don't log len here...

Sure, I did not get your intention :)

Thanks,
Marcin
> 
> 
> 
> > > > +return;
> > > > +}
> > > > +break;
> > > >  default:
> > > >  abort();
> > > >  }
> > > > @@ -634,6 +647,7 @@ static void complete_collecting_data(Flash *s)
> > > >  case ERASE4_32K:
> > > >  case ERASE_SECTOR:
> > > >  case ERASE4_SECTOR:
> > > > +case DIE_ERASE:
> > > >  flash_erase(s, s->cur_addr, s->cmd_in_progress);
> > > >  break;
> > > >  case WRSR:
> > > > @@ -684,6 +698,7 @@ static void reset_memory(Flash *s)
> > > >  s->write_enable = false;
> > > >  s->reset_enable = false;
> > > >  s->quad_enable = false;
> > > > +s->die_cnt = 0;
> > > >
> > > >  switch (get_man(s)) {
> > > >  case MAN_NUMONYX:
> > > > @@ -716,7 +731,15 @@ static void reset_memory(Flash *s)
> > > >  s->four_bytes_address_mode = true;
> > > >  }
> > > >  if (!(s->nonvolatile_cfg & NVCFG_LOWER_SEGMENT_MASK)) {
> > > > -s->ear = s
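
For readers following the DIE_ERASE hunk quoted above: the erase length is the flash size divided by the die count, and the offset is rounded down to the start of the die that contains it. The following is a minimal standalone sketch of that arithmetic, not the QEMU code itself. The names `EraseRange` and `die_erase_range` are hypothetical, and the mask-based rounding assumes the die size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only. */
typedef struct {
    uint32_t start; /* first byte to erase */
    uint32_t len;   /* number of bytes to erase */
} EraseRange;

/*
 * Compute the range a die erase would cover.
 * Returns false when the part has no dies configured (die_cnt == 0),
 * which the quoted patch treats as "command not supported".
 */
static bool die_erase_range(uint32_t flash_size, uint32_t die_cnt,
                            uint32_t offset, EraseRange *out)
{
    if (die_cnt == 0) {
        return false;
    }

    uint32_t die_len = flash_size / die_cnt;

    /* The mask below only rounds correctly for power-of-two die sizes. */
    assert(die_len != 0 && (die_len & (die_len - 1)) == 0);

    out->start = offset & ~(die_len - 1);
    out->len = die_len;
    return true;
}
```

With a hypothetical 128 MiB part split into two dies, an offset of 0x4123456 falls in the second die, so the computed range starts at 0x4000000 and spans 0x4000000 bytes.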

Re: [Qemu-devel] [RESEND Patch v1 36/37] vhost-user/msg: handling VHOST_USER_SET_FEATURES

2016-12-19 Thread Wei Wang

On 12/19/2016 01:59 PM, Wei Wang wrote:

If the feature bits sent by the slave are not equal to the ones that
were sent by the master, perform a reset of the master device.

Signed-off-by: Wei Wang 
---
  hw/net/vhost_net.c   |  2 ++
  hw/virtio/vhost-user.c   | 20 
  hw/virtio/virtio-pci.c   | 20 
  hw/virtio/virtio-pci.h   |  2 ++
  include/net/vhost-user.h | 14 ++
  net/vhost-user.c | 14 +-
  6 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 8256018..e8a2d4f 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
  
  /* virtio-net-pci */
  
+void master_reset_virtio_net(VirtIODevice *vdev)

+{
+VirtIONet *net = VIRTIO_NET(vdev);
+VirtIONetPCI *net_pci = container_of(net, VirtIONetPCI, vdev);
+VirtIOPCIProxy *proxy = &net_pci->parent_obj;
+DeviceState *qdev = DEVICE(proxy);
+DeviceState *qdev_new;
+Error *err = NULL;
+
+virtio_pci_reset(qdev);
+qdev_unplug(qdev, &err);
+qdev->realized = false;
+qdev_new = qdev_device_add(qdev->opts, &err);
+if (!qdev_new) {
+qemu_opts_del(qdev->opts);
+}
+object_unref(OBJECT(qdev));
+}
+


I still have a problem with this patch. It looks like the virtio reset here
only clears the registers and queue-related state. Do we have a power
reset for virtio that has the same effect as re-plugging the virtio
device (the driver's probe() is re-invoked and the feature bits are
re-negotiated)? Thanks.


Best,
Wei



[Qemu-devel] [PATCH RFC v2 2/4] block/qapi: reduce the coupling between the bdrv_query_stats and bdrv_query_bds_stats

2016-12-19 Thread Dou Liyang
The bdrv_query_stats and bdrv_query_bds_stats functions call each
other, which increases coupling. It also complicates the code and
introduces some unnecessary checks.

Remove the call from bdrv_query_bds_stats to bdrv_query_stats and let
bdrv_query_bds_stats recurse directly, which makes the flow clearer.

Avoid checking whether blk is NULL while querying the bds stats;
it is unnecessary.

Signed-off-by: Dou Liyang 
---
 block/qapi.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index a62e862..bc622cd 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -357,10 +357,6 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo 
**p_info,
 qapi_free_BlockInfo(info);
 }
 
-static BlockStats *bdrv_query_stats(BlockBackend *blk,
-const BlockDriverState *bs,
-bool query_backing);
-
 static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 {
 BlockAcctStats *stats = blk_get_stats(blk);
@@ -428,9 +424,18 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, 
BlockBackend *blk)
 }
 }
 
-static void bdrv_query_bds_stats(BlockStats *s, const BlockDriverState *bs,
+static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
  bool query_backing)
 {
+BlockStats *s = NULL;
+
+s = g_malloc0(sizeof(*s));
+s->stats = g_malloc0(sizeof(*s->stats));
+
+if (!bs) {
+return s;
+}
+
 if (bdrv_get_node_name(bs)[0]) {
 s->has_node_name = true;
 s->node_name = g_strdup(bdrv_get_node_name(bs));
@@ -440,14 +445,15 @@ static void bdrv_query_bds_stats(BlockStats *s, const 
BlockDriverState *bs,
 
 if (bs->file) {
 s->has_parent = true;
-s->parent = bdrv_query_stats(NULL, bs->file->bs, query_backing);
+s->parent = bdrv_query_bds_stats(bs->file->bs, query_backing);
 }
 
 if (query_backing && bs->backing) {
 s->has_backing = true;
-s->backing = bdrv_query_stats(NULL, bs->backing->bs, query_backing);
+s->backing = bdrv_query_bds_stats(bs->backing->bs, query_backing);
 }
 
+return s;
 }
 
 static BlockStats *bdrv_query_stats(BlockBackend *blk,
@@ -456,17 +462,13 @@ static BlockStats *bdrv_query_stats(BlockBackend *blk,
 {
 BlockStats *s;
 
-s = g_malloc0(sizeof(*s));
-s->stats = g_malloc0(sizeof(*s->stats));
+s = bdrv_query_bds_stats(bs, query_backing);
 
 if (blk) {
 s->has_device = true;
 s->device = g_strdup(blk_name(blk));
 bdrv_query_blk_stats(s->stats, blk);
 }
-if (bs) {
-bdrv_query_bds_stats(s, bs, query_backing);
-}
 
 return s;
 }
-- 
2.5.5
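
The restructuring in this patch makes the per-node query allocate and return its own result and recurse directly for the parent and backing nodes. Below is a minimal standalone sketch of that shape. It uses hypothetical `Node` and `Stats` types as stand-ins for BlockDriverState and BlockStats, and plain calloc() where QEMU uses g_malloc0(), so it is an illustration of the call structure only.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for BlockDriverState and BlockStats. */
typedef struct Node {
    const char *name;
    struct Node *file;     /* parent (protocol) node, may be NULL */
    struct Node *backing;  /* backing node, may be NULL */
} Node;

typedef struct Stats {
    const char *node_name;
    struct Stats *parent;
    struct Stats *backing;
} Stats;

/*
 * Allocate and fill the stats for one node, recursing directly into its
 * children. A NULL node yields an empty (zeroed) result, mirroring the
 * early return in the patch.
 */
static Stats *query_node_stats(const Node *n, bool query_backing)
{
    Stats *s = calloc(1, sizeof(*s));

    if (!s || !n) {
        return s;
    }
    s->node_name = n->name;
    if (n->file) {
        s->parent = query_node_stats(n->file, query_backing);
    }
    if (query_backing && n->backing) {
        s->backing = query_node_stats(n->backing, query_backing);
    }
    return s;
}

/* Free a result tree produced by query_node_stats(). */
static void free_stats(Stats *s)
{
    if (!s) {
        return;
    }
    free_stats(s->parent);
    free_stats(s->backing);
    free(s);
}
```

The caller that owns a device name can then decorate the returned result, which is what the remaining bdrv_query_stats wrapper does in the patch.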






[Qemu-devel] [PATCH RFC v2 0/4] block/qapi: refactor and optimize the qmp_query_blockstats()

2016-12-19 Thread Dou Liyang
These patches refactor qmp_query_blockstats() and improve its
performance by reducing its running time.

qmp_query_blockstats() is used to monitor the blockstats; it
queries all the graph_bdrv_states or monitor_block_backends.

The series does two things:

1 For the performance:

1.1 Time taken per query (ns):

number of disks      | 10    | 500
---------------------+-------+--------
before these patches | 19429 | 667722
after these patches  | 17516 | 557044

1.2 I/O performance degradation (%) while monitoring:

number of disks      | 10    | 500
---------------------+-------+--------
before these patches | 1.3   | 14.2
after these patches  | 0.8   | 9.1

Tested with a dd command like this:
dd if=date_1.dat of=date_2.dat conv=fsync oflag=direct bs=1k count=100k
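
To put the per-query timings from 1.1 in relative terms, the reduction can be computed directly from the reported numbers. This is a small sketch; `reduction_pct` is a hypothetical helper, not part of the series.

```c
/* Relative reduction, in percent, going from `before` to `after`. */
static double reduction_pct(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

For the reported values, reduction_pct(19429, 17516) is about 9.8 and reduction_pct(667722, 557044) is about 16.6, so the relative gain is larger with 500 disks than with 10.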

2 refactor qmp_query_blockstats():

From:

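       +----------------+      +----------------------+
       |       1.       |      |          4.          |
       | next_query_bds |      | bdrv_query_bds_stats +---+
       |                |      |                      |   |
       +-------^--------+      +----------^-----------+   |
               |                          |               |
+--------------+-------+      +-----------+------+        |
|          0.          |      |        2.        |        |
| qmp_query_blockstats +------> bdrv_query_stats <--------+
|                      |      |                  |
+----------------------+      +-----------+------+
                                          |
                              +-----------v----------+
                              |          3.          |
                              | bdrv_query_blk_stats |
                              |                      |
                              +----------------------+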

To:

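                                          +---------------+
                                          |               |
                               +----------v-----------+   |
                           +--->          3.          |   |
+----------------------+   |   | bdrv_query_bds_stats +---+
|          1.          +---+   |                      |
|                      |       +----------------------+
| qmp_query_blockstats +---+
|                      |   |
+----------------------+   |   +----------------------+
                           |   |          2.          |
                           +---> bdrv_query_blk_stats |
                               |                      |
                               +----------------------+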


Dou Liyang (4):
  block: refactor the bdrv_next_node and add some comments
  block/qapi: reduce the coupling between the bdrv_query_stats and
bdrv_query_bds_stats
  block/qapi: acquire a reference instead of a lock during querying
blockstats
  block/qapi: optimize the query function of the blockstats

 block.c  | 16 ---
 block/qapi.c | 92 +---
 2 files changed, 50 insertions(+), 58 deletions(-)

-- 
2.5.5






[Qemu-devel] [PATCH RFC v2 1/4] block: refactor bdrv_next_node for readability

2016-12-19 Thread Dou Liyang
Make bdrv_next_node() clearer and add some comments.

Signed-off-by: Dou Liyang 
---
 block.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 39ddea3..01c9e51 100644
--- a/block.c
+++ b/block.c
@@ -2931,12 +2931,20 @@ bool bdrv_chain_contains(BlockDriverState *top, 
BlockDriverState *base)
 return top != NULL;
 }
 
+/*
+ * Return the BlockDriverStates of all the named nodes.
+ * If @bs is null, return the first one.
+ * Else, return @bs's next sibling, which may be null.
+ *
+ * To iterate over all BlockDriverStates, do
+ * for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
+ * ...
+ * }
+ */
 BlockDriverState *bdrv_next_node(BlockDriverState *bs)
 {
-if (!bs) {
-return QTAILQ_FIRST(&graph_bdrv_states);
-}
-return QTAILQ_NEXT(bs, node_list);
+return bs ? QTAILQ_NEXT(bs, node_list)
+: QTAILQ_FIRST(&graph_bdrv_states);
 }
 
 const char *bdrv_get_node_name(const BlockDriverState *bs)
-- 
2.5.5
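
The iteration convention described in the new comment (NULL yields the first node, a node yields its successor) can be sketched without QEMU's QTAILQ macros. The example below uses a hypothetical singly linked list; `Node`, `node_list_head`, `next_node` and `count_nodes` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical node type; QEMU links BlockDriverStates with QTAILQ. */
typedef struct Node {
    int id;
    struct Node *next;
} Node;

/* Head of the hypothetical global list. */
static Node *node_list_head;

/* NULL returns the first node; otherwise the successor, possibly NULL. */
static Node *next_node(Node *n)
{
    return n ? n->next : node_list_head;
}

/* Count all nodes using the loop shape from the comment. */
static size_t count_nodes(void)
{
    size_t count = 0;

    for (Node *n = next_node(NULL); n; n = next_node(n)) {
        count++;
    }
    return count;
}
```

The loop variable is passed back into the iterator on each step, so the same function serves as both "first" and "next".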






[Qemu-devel] [PATCH RFC v2 4/4] block/qapi: optimize the query function of the blockstats

2016-12-19 Thread Dou Liyang
This patch optimizes qmp_query_blockstats() by removing the
additional overhead of next_query_bds and bdrv_query_stats.

It removes those two functions and also makes the structure of the
code clearer.

Signed-off-by: Dou Liyang 
---
 block/qapi.c | 69 +++-
 1 file changed, 26 insertions(+), 43 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 2262918..d561945 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -456,27 +456,6 @@ static BlockStats *bdrv_query_bds_stats(const 
BlockDriverState *bs,
 return s;
 }
 
-static BlockStats *bdrv_query_stats(BlockBackend *blk,
-BlockDriverState *bs,
-bool query_backing)
-{
-BlockStats *s;
-
-bdrv_ref(bs);
-s = bdrv_query_bds_stats(bs, query_backing);
-bdrv_unref(bs);
-
-if (blk) {
-blk_ref(blk);
-s->has_device = true;
-s->device = g_strdup(blk_name(blk));
-bdrv_query_blk_stats(s->stats, blk);
-blk_unref(blk);
-}
-
-return s;
-}
-
 BlockInfoList *qmp_query_block(Error **errp)
 {
 BlockInfoList *head = NULL, **p_next = &head;
@@ -500,37 +479,41 @@ BlockInfoList *qmp_query_block(Error **errp)
 return head;
 }
 
-static bool next_query_bds(BlockBackend **blk, BlockDriverState **bs,
-   bool query_nodes)
-{
-if (query_nodes) {
-*bs = bdrv_next_node(*bs);
-return !!*bs;
-}
-
-*blk = blk_next(*blk);
-*bs = *blk ? blk_bs(*blk) : NULL;
-
-return !!*blk;
-}
-
 BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
  bool query_nodes,
  Error **errp)
 {
 BlockStatsList *head = NULL, **p_next = &head;
-BlockBackend *blk = NULL;
-BlockDriverState *bs = NULL;
+BlockBackend *blk;
+BlockDriverState *bs;
 
 /* Just to be safe if query_nodes is not always initialized */
-query_nodes = has_query_nodes && query_nodes;
+if (has_query_nodes && query_nodes) {
+for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
+BlockStatsList *info = g_malloc0(sizeof(*info));
 
-while (next_query_bds(&blk, &bs, query_nodes)) {
-BlockStatsList *info = g_malloc0(sizeof(*info));
+bdrv_ref(bs);
+info->value = bdrv_query_bds_stats(bs, false);
+bdrv_unref(bs);
 
-info->value = bdrv_query_stats(blk, bs, !query_nodes);
-*p_next = info;
-p_next = &info->next;
+*p_next = info;
+p_next = &info->next;
+}
+} else {
+for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+BlockStatsList *info = g_malloc0(sizeof(*info));
+
+blk_ref(blk);
+BlockStats *s = bdrv_query_bds_stats(blk_bs(blk), true);
+s->has_device = true;
+s->device = g_strdup(blk_name(blk));
+bdrv_query_blk_stats(s->stats, blk);
+blk_unref(blk);
+
+info->value = s;
+*p_next = info;
+p_next = &info->next;
+}
 }
 
 return head;
-- 
2.5.5






Re: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup

2016-12-19 Thread Liu, Yi L
On Tue, Dec 06, 2016 at 06:36:15PM +0800, Peter Xu wrote:
> This RFC series continues the work of Aviv B.D.'s vfio enablement
> series with vt-d. Aviv has done a great job there, and what we still
> lack there is mostly the following:
> 
> (1) VFIO gets duplicated IOTLB notifications due to the split VT-d IOMMU
> memory region.
> 
> (2) VT-d still doesn't provide a correct replay() mechanism (e.g.,
> when the IOMMU domain switches, things will break).
> 
> Here I'm trying to solve the above two issues.
> 
> (1) is solved by patch 7, (2) is solved by patch 11-12.
> 
> Basically it contains the following:
> 
> patch 1:picked up from Jason's vhost DMAR series, which is a bugfix
> 
> patch 2-6:  Cleanups/Enhancements for existing vt-d codes (please see
> specific commit message for details, there are patches
> that I thought may be suitable for 2.8 as well, but looks
> like it's too late)
> 
> patch 7:Solve the issue that vfio is notified more than once for
> IOTLB notifications with Aviv's patches
> 
> patch 8-10: Some trivial memory APIs added for further patches, and
> add customize replay() support for MemoryRegion (I see
> Aviv's latest v7 contains similar replay, I can rebase
> onto that, merely the same thing)
> 
> patch 11:   Provide a valid vt-d replay() callback, using page walk
> 
Peter,
Is your patch set based on Aviv's patches? I found that the patches
cannot be applied on my side.

BTW, it may be better if you can split the patches for misc cleanup
from the patches for replay/"fix duplicate notify".

Thanks,
Yi L
> patch 12:   Enable the domain switch support - we replay() when
> context entry got invalidated
> 
> patch 13:   Enhancement for existing invalidation notification,
> instead of using translate() for each page, we leverage
> the new vtd_page_walk() interface, which should be faster.
> 
> I would be glad to hear any review comments on the above patches
> (especially patches 8-13, which are the main part of this series),
> and about any issue I missed in the series.
> 
> =========
> Test Done
> =========
> 
> Build test passed for x86_64/arm/ppc64.
> 
> Simply tested with x86_64, assigning two PCI devices to a single VM,
> boot the VM using:
> 
> bin=x86_64-softmmu/qemu-system-x86_64
> $bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
>  -device intel-iommu,intremap=on,eim=off,cache-mode=on \
>  -netdev user,id=net0,hostfwd=tcp::-:22 \
>  -device virtio-net-pci,netdev=net0 \
>  -device vfio-pci,host=03:00.0 \
>  -device vfio-pci,host=02:00.0 \
>  -trace events=".trace.vfio" \
>  /var/lib/libvirt/images/vm1.qcow2
> 
> pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
> vtd_page_walk*
> vtd_replay*
> vtd_inv_desc*
> 
> Then, in the guest, run the following tool:
> 
>   
> https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c
> 
> With parameter:
> 
>   ./vfio-bind-group 00:03.0 00:04.0
> 
> Check host side trace log, I can see pages are replayed and mapped in
> 00:04.0 device address space, like:
> 
> ...
> vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x301 lo 
> 0x3be77001
> vtd_page_walk Page walk for ce (0x301, 0x3be77001) iova range 0x0 - 
> 0x80
> vtd_page_walk_level Page walk (base=0x3be77000, level=3) iova range 0x0 - 
> 0x80
> vtd_page_walk_level Page walk (base=0x3c88a000, level=2) iova range 0x0 - 
> 0x4000
> vtd_page_walk_level Page walk (base=0x366cb000, level=1) iova range 0x0 - 
> 0x20
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x366cb000 
> mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0xb000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map level 0x1 iova 0xc000 -> gpa 
> 0x366cb000 mask 0xfff perm 3
> vtd_page_walk_one Page walk detected map

Re: [Qemu-devel] [RFC PATCH 03/13] intel_iommu: renaming gpa to iova where proper

2016-12-19 Thread Liu, Yi L
On Tue, Dec 06, 2016 at 06:36:18PM +0800, Peter Xu wrote:
> There are lots of places in the current intel_iommu.c code that name
> an "iova" as "gpa". It is really confusing to use the name "gpa" in these
> places (it is very easily understood as "Guest Physical
> Address", which it is not). To make the code (much) easier to read, I
> decided to do this once and for all.
> 
> No functional change is made. Only literal ones.
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c | 46 +++---
>  1 file changed, 23 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index f19a8b3..3d98797 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -279,7 +279,7 @@ static void vtd_update_iotlb(IntelIOMMUState *s, uint16_t 
> source_id,
>  uint64_t *key = g_malloc(sizeof(*key));
>  uint64_t gfn = vtd_get_iotlb_gfn(addr, level);
>  
> -VTD_DPRINTF(CACHE, "update iotlb sid 0x%"PRIx16 " gpa 0x%"PRIx64
> +VTD_DPRINTF(CACHE, "update iotlb sid 0x%"PRIx16 " iova 0x%"PRIx64
>  " slpte 0x%"PRIx64 " did 0x%"PRIx16, source_id, addr, slpte,
>  domain_id);
>  if (g_hash_table_size(s->iotlb) >= VTD_IOTLB_MAX_SIZE) {
> @@ -595,12 +595,12 @@ static uint64_t vtd_get_slpte(dma_addr_t base_addr, 
> uint32_t index)
>  return slpte;
>  }
>  
> -/* Given a gpa and the level of paging structure, return the offset of 
> current
> - * level.
> +/* Given a iova and the level of paging structure, return the offset
maybe "an iova" instead of "a iova"
> + * of current level.
>   */
> -static inline uint32_t vtd_gpa_level_offset(uint64_t gpa, uint32_t level)
> +static inline uint32_t vtd_iova_level_offset(uint64_t iova, uint32_t level)
>  {
> -return (gpa >> vtd_slpt_level_shift(level)) &
> +return (iova >> vtd_slpt_level_shift(level)) &
>  ((1ULL << VTD_SL_LEVEL_BITS) - 1);
>  }
>  
> @@ -648,13 +648,13 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, 
> uint32_t level)
>  }
>  }
>  
> -/* Given the @gpa, get relevant @slptep. @slpte_level will be the last level
> +/* Given the @iova, get relevant @slptep. @slpte_level will be the last level
>   * of the translation, can be used for deciding the size of large page.
>   */
> -static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t gpa,
> -IOMMUAccessFlags flags,
> -uint64_t *slptep, uint32_t *slpte_level,
> -bool *reads, bool *writes)
> +static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova,
> + IOMMUAccessFlags flags,
> + uint64_t *slptep, uint32_t *slpte_level,
> + bool *reads, bool *writes)
>  {
>  dma_addr_t addr = vtd_get_slpt_base_from_context(ce);
>  uint32_t level = vtd_get_level_from_context_entry(ce);
> @@ -663,11 +663,11 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, 
> uint64_t gpa,
>  uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
>  uint64_t access_right_check = 0;
>  
> -/* Check if @gpa is above 2^X-1, where X is the minimum of MGAW in 
> CAP_REG
> - * and AW in context-entry.
> +/* Check if @iova is above 2^X-1, where X is the minimum of MGAW
> + * in CAP_REG and AW in context-entry.
>   */
> -if (gpa & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
> -VTD_DPRINTF(GENERAL, "error: gpa 0x%"PRIx64 " exceeds limits", gpa);
> +if (iova & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
> +VTD_DPRINTF(GENERAL, "error: iova 0x%"PRIx64 " exceeds limits", 
> iova);
>  return -VTD_FR_ADDR_BEYOND_MGAW;
>  }
>  
> @@ -683,13 +683,13 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, 
> uint64_t gpa,
>  }
>  
>  while (true) {
> -offset = vtd_gpa_level_offset(gpa, level);
> +offset = vtd_iova_level_offset(iova, level);
>  slpte = vtd_get_slpte(addr, offset);
>  
>  if (slpte == (uint64_t)-1) {
>  VTD_DPRINTF(GENERAL, "error: fail to access second-level paging "
> -"entry at level %"PRIu32 " for gpa 0x%"PRIx64,
> -level, gpa);
> +"entry at level %"PRIu32 " for iova 0x%"PRIx64,
> +level, iova);
>  if (level == vtd_get_level_from_context_entry(ce)) {
>  /* Invalid programming of context-entry */
>  return -VTD_FR_CONTEXT_ENTRY_INV;
> @@ -701,8 +701,8 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, uint64_t 
> gpa,
>  *writes = (*writes) && (slpte & VTD_SL_W);
>  if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
>  VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> -"gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> -(flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
> + 
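
As a reading aid for the renamed helpers quoted above, here is a minimal standalone sketch of the per-level index computation and the address-width check. It is not the QEMU code: the constants `PAGE_SHIFT_4K` and `SL_LEVEL_BITS` and the function names are hypothetical mirrors, and the values assume 4 KiB base pages with 9 index bits per level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12  /* assumed: 4 KiB base page */
#define SL_LEVEL_BITS 9   /* assumed: 512 entries per table */

/* Bit position where the index for `level` starts (level 1 = leaf). */
static inline uint32_t level_shift(uint32_t level)
{
    assert(level >= 1);
    return PAGE_SHIFT_4K + (level - 1) * SL_LEVEL_BITS;
}

/* Index into the paging structure at `level` for a given IOVA. */
static inline uint32_t iova_level_offset(uint64_t iova, uint32_t level)
{
    return (iova >> level_shift(level)) & ((1ULL << SL_LEVEL_BITS) - 1);
}

/* True if the IOVA has bits set at or above the address width. */
static inline bool iova_exceeds_width(uint64_t iova, uint32_t addr_bits)
{
    assert(addr_bits < 64);
    return (iova & ~((1ULL << addr_bits) - 1)) != 0;
}
```

Under these assumptions, the IOVA 0x40201000 selects entry 1 at each of levels 1, 2 and 3, because bits 12, 21 and 30 are the only ones set.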

[Qemu-devel] [PATCH RFC v2 3/4] block/qapi: acquire a reference instead of a lock during querying blockstats

2016-12-19 Thread Dou Liyang
This patch improves the performance of query requests.

Commit 13344f3a added a lock, taken with aio_context_acquire(), to make
query-blockstats safe. The qmp_query_blockstats function
acquires/releases the AioContext lock, which takes some time and
blocks I/O processing. This affects performance, especially
in guests with many disks.

Since these are low-level details of block statistics inside QEMU, we can
acquire a reference instead of the lock.

Signed-off-by: Dou Liyang 
---
 block/qapi.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index bc622cd..2262918 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -457,17 +457,21 @@ static BlockStats *bdrv_query_bds_stats(const 
BlockDriverState *bs,
 }
 
 static BlockStats *bdrv_query_stats(BlockBackend *blk,
-const BlockDriverState *bs,
+BlockDriverState *bs,
 bool query_backing)
 {
 BlockStats *s;
 
+bdrv_ref(bs);
 s = bdrv_query_bds_stats(bs, query_backing);
+bdrv_unref(bs);
 
 if (blk) {
+blk_ref(blk);
 s->has_device = true;
 s->device = g_strdup(blk_name(blk));
 bdrv_query_blk_stats(s->stats, blk);
+blk_unref(blk);
 }
 
 return s;
@@ -523,13 +527,8 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
 
 while (next_query_bds(&blk, &bs, query_nodes)) {
 BlockStatsList *info = g_malloc0(sizeof(*info));
-AioContext *ctx = blk ? blk_get_aio_context(blk)
-  : bdrv_get_aio_context(bs);
 
-aio_context_acquire(ctx);
 info->value = bdrv_query_stats(blk, bs, !query_nodes);
-aio_context_release(ctx);
-
 *p_next = info;
 p_next = &info->next;
 }
-- 
2.5.5
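
The general idea of holding a reference rather than a lock while reading statistics can be sketched as follows. A reference keeps the object from being freed during the read but, unlike a lock, does not exclude other users. Everything here is hypothetical (`Obj`, `obj_ref`, `obj_unref`, `obj_query_reads`); the counter is a plain int, so this sketch is single-threaded only and says nothing about whether the approach is sufficient for QEMU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a block node. */
typedef struct Obj {
    int refcnt;
    unsigned long reads; /* example statistic */
} Obj;

/* Create an object owned by the caller (refcnt starts at 1). */
static Obj *obj_new(void)
{
    Obj *o = calloc(1, sizeof(*o));

    if (o) {
        o->refcnt = 1;
    }
    return o;
}

static void obj_ref(Obj *o)
{
    o->refcnt++;
}

/* Drop one reference; returns 1 if the object was freed. */
static int obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* Read a statistic while holding a temporary reference. */
static unsigned long obj_query_reads(Obj *o)
{
    unsigned long v;

    obj_ref(o);
    v = o->reads;
    obj_unref(o);
    return v;
}
```

The temporary reference is balanced inside the query, so the caller's own reference count is unchanged afterwards.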






[Qemu-devel] Strange/wrong behavior with iSCSI Tape devices in QEMU 2.8.0-rc4

2016-12-19 Thread Holger Schranz

# Strange/wrong behavior in QEMU 2.8.0-rc4

After updating from QEMU 2.6.2 to 2.8.0-rc4, the tape devices
and the corresponding medium changer are no longer available
in the VM guest system.

The tape devices and the medium changer are declared in the
XML file for libvirt. Both the tape drives and the medium changer
are available via iSCSI.

In a first rough investigation, the iSCSI login works fine,
but QEMU does not seem to report the devices to the VM guest.

--

Best regards

Holger

=

XML declaration:

.
.
.



  




  name='iqn.2008-09.net.fsc:server.LT260_61003/4'>

  
  
  



  name='iqn.2008-09.net.fsc:server.LT260_61003/5'>

  
  
  




  name='iqn.2008-09.net.fsc:server.LT260_61003/6'>

  
  
  




  name='iqn.2008-09.net.fsc:server.LT60_61005/3'>

  
  
  



  name='iqn.2008-09.net.fsc:server.LT60_61005/4'>

  
  
  

.
.
.

===

Correct behavior/result inside the VM guest with QEMU/KVM 2.6.2 / 2.7.0

--
(VCSTCS82:A)IUP1:~ # lsscsi -g
[0:2:0:0]    disk    FTS      PRAID EP420i     4.25  /dev/sda  /dev/sg0
[3:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0  /dev/sg1
[8:0:2:1]    tape    HP       Ultrium 5-SCSI   0001  /dev/st3  /dev/sg6
[8:0:2:2]    tape    HP       Ultrium 5-SCSI   0001  /dev/st2  /dev/sg5
[8:0:2:3]    mediumx FUJITSU  ETERNUS LT260    6.20  -         /dev/sg4
[8:0:2:4]    tape    HP       Ultrium 6-SCSI   23AB  /dev/st1  /dev/sg3
[8:0:2:5]    tape    HP       Ultrium 6-SCSI   23AB  /dev/st0  /dev/sg2
[11:0:0:0]   disk    FUJITSU  ETERNUS_DXL            /dev/sdb  /dev/sg7
[11:0:0:2]   disk    FUJITSU  ETERNUS_DXL            /dev/sdh  /dev/sg13
[11:0:0:3]   disk    FUJITSU  ETERNUS_DXL            /dev/sdg  /dev/sg12
[11:0:0:4]   disk    FUJITSU  ETERNUS_DXL            /dev/sdf  /dev/sg11
[11:0:0:5]   disk    FUJITSU  ETERNUS_DXL            /dev/sde  /dev/sg10
[11:0:0:6]   disk    FUJITSU  ETERNUS_DXL            /dev/sdd  /dev/sg9
[11:0:0:7]   disk    FUJITSU  ETERNUS_DXL            /dev/sdc  /dev/sg8
[11:0:0:15]  disk    FUJITSU  ETERNUS_DXL            /dev/sdi  /dev/sg14
(VCSTCS82:A)IUP1:~ #



Wrong behavior/result inside the VM guest with QEMU/KVM 2.8.0-rc4
--
(VCSTCS82:A)IUP1:~ # lsscsi -g
[0:2:0:0]    disk    FTS      PRAID EP420i     4.25  /dev/sda  /dev/sg0
[3:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0  /dev/sg1
[11:0:1:0]   disk    FUJITSU  ETERNUS_DXL            /dev/sdj  /dev/sg10
[11:0:1:2]   disk    FUJITSU  ETERNUS_DXL            /dev/sdp  /dev/sg16
[11:0:1:3]   disk    FUJITSU  ETERNUS_DXL            /dev/sdo  /dev/sg15
[11:0:1:4]   disk    FUJITSU  ETERNUS_DXL            /dev/sdn  /dev/sg14
[11:0:1:5]   disk    FUJITSU  ETERNUS_DXL            /dev/sdm  /dev/sg13
[11:0:1:6]   disk    FUJITSU  ETERNUS_DXL            /dev/sdl  /dev/sg12
[11:0:1:7]   disk    FUJITSU  ETERNUS_DXL            /dev/sdk  /dev/sg11
[11:0:1:15]  disk    FUJITSU  ETERNUS_DXL            /dev/sdq  /dev/sg17
(VCSTCS82:A)IUP1:~ #



[Qemu-devel] [Bug 1267520] Re: Keyboard input not working when the "-k en-us" argument is specified.

2016-12-19 Thread skovalev
I confirm this too: Qemu 2.6.1.

I have tried to install Fedora with a kickstart file through Packer with
`"boot_command": [ "text
ks=http://{{.HTTPIP}}:{{.HTTPPort}}/ks.cfg"]`. But no symbols from
`["", " ", "[:alpha:]"]` were printed. Only `["=", ":", "/", "."]`
could be seen if I manually pressed "".

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1267520

Title:
  Keyboard input not working when the "-k en-us" argument is specified.

Status in QEMU:
  New

Bug description:
  This bug occurs on qemu compiled with i386_softmmu and x86-64_softmmu on 
linux kernel 3.5.0 (64-bit). (Haven't confirmed this for other targets).
  Whenever I run qemu (both i386 and x86_64) to use the en-us language (even 
though it is the default), I get "Warning: no scancode found for keysym X" (X 
is an integer).
  In the disk image I need qemu to run, I had a shell set up.  The shell 
doesn't register keyboard input when the '-k en-us' command line argument is 
set to run qemu. I did not have this problem with earlier versions of qemu.

  Additional information:
  Setting keymaps directory on command line -L doesn't resolve this.
  Bug occurs with both curses and sdl VGA output.
  I am running qemu on Ubuntu 12.04 and I have not been able to see if the bug is 
distribution-specific. However, I am also experiencing the bug on Kali Linux, 
another Debian-based distribution.
  It turns out that all languages reproduce the bug, not just 'en-us'.

  Update: I have narrowed the bug to be attributable to versions later
  than qemu-1.1.2.

  Here's a listing of key being mapped:

  Setting keysym exclam (33) to 258
  Setting keysym at (64) to 259
  Setting keysym numbersign (35) to 260
  Setting keysym dollar (36) to 261
  Setting keysym percent (37) to 262
  Setting keysym asciicircum (94) to 263
  Setting keysym ampersand (38) to 264
  Setting keysym asterisk (42) to 265
  Setting keysym parenleft (40) to 266
  Setting keysym parenright (41) to 267
  Setting keysym minus (45) to 12
  Setting keysym underscore (95) to 268
  Setting keysym equal (61) to 13
  Setting keysym plus (43) to 269
  Setting keysym bracketleft (91) to 26
  Setting keysym braceleft (123) to 282
  Setting keysym bracketright (93) to 27
  Setting keysym braceright (125) to 283
  Setting keysym semicolon (59) to 39
  Setting keysym colon (58) to 295
  Setting keysym apostrophe (39) to 40
  Setting keysym quotedbl (34) to 296
  Setting keysym grave (96) to 41
  Setting keysym asciitilde (126) to 297
  Setting keysym backslash (92) to 43
  Setting keysym bar (124) to 299
  Setting keysym comma (44) to 51
  Setting keysym less (60) to 307
  Setting keysym period (46) to 52
  Setting keysym greater (62) to 308
  Setting keysym slash (47) to 53
  Setting keysym question (63) to 309

  As one can see, the pc-bios/keymaps/common, containing the QWERTY
  keys, is not processed in parse_init_keyboard at ui/keymaps.c even
  though the XKB map (keymaps/en-us) includes the file.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1267520/+subscriptions



Re: [Qemu-devel] [PATCH v5 2/3] utils: Add helper to read arm MIDR_EL1 register

2016-12-19 Thread Vijay Kilari
On Fri, Dec 16, 2016 at 7:34 PM, Peter Maydell  wrote:
> On 7 December 2016 at 17:06,   wrote:
>> From: Vijaya Kumar K 
>>
>> Add helper API to read the MIDR_EL1 register to fetch
>> CPU identification information. This helps in
>> adding errata and architecture-specific features.
>>
>> This is implemented only for arm architecture.
>>
>> Signed-off-by: Vijaya Kumar K 
>> ---
>>  include/qemu/aarch64-cpuid.h | 38 
>>  util/Makefile.objs   |  1 +
>>  util/aarch64-cpuid.c | 52 
>> 
>>  3 files changed, 91 insertions(+)
>>
>> diff --git a/include/qemu/aarch64-cpuid.h b/include/qemu/aarch64-cpuid.h
>> new file mode 100644
>> index 000..fb88ed8
>> --- /dev/null
>> +++ b/include/qemu/aarch64-cpuid.h
>> @@ -0,0 +1,38 @@
>> +#ifndef QEMU_AARCH64_CPUID_H
>> +#define QEMU_AARCH64_CPUID_H
>> +
>> +#if defined(__aarch64__) && defined(CONFIG_LINUX)
>> +#define MIDR_IMPLEMENTER_SHIFT  24
>> +#define MIDR_IMPLEMENTER_MASK   (0xffULL << MIDR_IMPLEMENTER_SHIFT)
>> +#define MIDR_ARCHITECTURE_SHIFT 16
>> +#define MIDR_ARCHITECTURE_MASK  (0xf << MIDR_ARCHITECTURE_SHIFT)
>> +#define MIDR_PARTNUM_SHIFT  4
>> +#define MIDR_PARTNUM_MASK   (0xfff << MIDR_PARTNUM_SHIFT)
>> +
>> +#define MIDR_CPU_PART(imp, partnum) \
>> +(((imp) << MIDR_IMPLEMENTER_SHIFT)  | \
>> +(0xf<< MIDR_ARCHITECTURE_SHIFT) | \
>> +((partnum)  << MIDR_PARTNUM_SHIFT))
>> +
>> +#define ARM_CPU_IMP_CAVIUM0x43
>> +#define CAVIUM_CPU_PART_THUNDERX  0x0A1
>> +
>> +#define MIDR_THUNDERX_PASS2  \
>> +   MIDR_CPU_PART(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
>> +#define CPU_MODEL_MASK  (MIDR_IMPLEMENTER_MASK | MIDR_ARCHITECTURE_MASK | \
>> + MIDR_PARTNUM_MASK)
>> +
>> +uint64_t get_aarch64_cpu_id(void);
>> +bool is_thunderx_pass2_cpu(void);
>> +#else
>> +static inline uint64_t get_aarch64_cpu_id(void)
>> +{
>> +return 0;
>> +}
>> +
>> +static inline bool is_thunderx_pass2_cpu(void)
>> +{
>> +return false;
>> +}
>> +#endif
>> +#endif
>> diff --git a/util/Makefile.objs b/util/Makefile.objs
>> index ad0f9c7..a9585c9 100644
>> --- a/util/Makefile.objs
>> +++ b/util/Makefile.objs
>> @@ -36,3 +36,4 @@ util-obj-y += log.o
>>  util-obj-y += qdist.o
>>  util-obj-y += qht.o
>>  util-obj-y += range.o
>> +util-obj-$(CONFIG_LINUX) += aarch64-cpuid.o
>> diff --git a/util/aarch64-cpuid.c b/util/aarch64-cpuid.c
>> new file mode 100644
>> index 000..575f52e
>> --- /dev/null
>> +++ b/util/aarch64-cpuid.c
>> @@ -0,0 +1,52 @@
>> +/*
>> + * Dealing with arm cpu identification information.
>> + *
>> + * Copyright (C) 2016 Cavium, Inc.
>> + *
>> + * Authors:
>> + *  Vijaya Kumar K 
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2.1
>> + * or later.  See the COPYING.LIB file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/aarch64-cpuid.h"
>> +
>> +#if defined(__aarch64__)
>> +static uint64_t qemu_read_aarch64_midr_el1(void)
>> +{
>> +const char *file = 
>> "/sys/devices/system/cpu/cpu0/regs/identification/midr_el1";
>
> If CPU0 happens to be offline (eg hot-unplugged) then this file
> won't exist, and we'll fail to identify any MIDR value.

I wrongly assumed that cpu0 cannot be hot-unplugged on arm64.
At least on our platform, it is not allowed.

One solution I can think of is to get the currently running cpu using
sched_getcpu() and fetch the midr from that cpu's path,
OR to read /sys/devices/system/cpu/online and pick an online cpu.
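
The second option (reading /sys/devices/system/cpu/online) could look
roughly like the sketch below. It is only an illustration, not part of the
patch: the helper names first_online_cpu() and midr_path_for_cpu() are made
up here, and the parser assumes the usual cpulist format such as "0-3" or
"2,4-7".

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return the first CPU number found in a cpulist
 * string such as "0-3\n" or "2,4-7\n", or -1 if the string does not
 * start with a valid non-negative number.
 */
static int first_online_cpu(const char *cpulist)
{
    char *end;
    long cpu;

    assert(cpulist != NULL);
    errno = 0;
    cpu = strtol(cpulist, &end, 10);
    if (end == cpulist || errno != 0 || cpu < 0 || cpu > INT_MAX) {
        return -1;
    }
    return (int)cpu;
}

/*
 * Hypothetical helper: format the sysfs path of midr_el1 for a given CPU.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int midr_path_for_cpu(int cpu, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len,
                 "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",
                 cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would read the contents of /sys/devices/system/cpu/online, pass
them to first_online_cpu(), and open the path produced by
midr_path_for_cpu() instead of the hard-coded cpu0 path.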

>
> The API as designed here also doesn't seem to consider
> the idea of big.LITTLE systems -- if there are multiple
> CPUs with different MIDRs, which one should we return here?

Yes, this is a limitation: the helper does not handle big.LITTLE configurations.
It was discussed in the initial version of this patch series.

https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg01221.html

(From the use-case point of view, we only require the Implementer ID, which
 won't differ within a big.LITTLE configuration. I agree that this generic
 function should work for other use cases as well.)

So I will add a comment here.
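
For reference, extracting only the Implementer ID from a MIDR value can be
sketched as follows. The shifts and masks are copied from the macros in the
quoted patch; the helper names midr_from_fields(), midr_implementer() and
midr_is_cavium() are invented for this illustration and do not exist in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as defined by the macros in the quoted patch. */
#define MIDR_IMPLEMENTER_SHIFT  24
#define MIDR_IMPLEMENTER_MASK   (0xffULL << MIDR_IMPLEMENTER_SHIFT)
#define MIDR_ARCHITECTURE_SHIFT 16
#define MIDR_PARTNUM_SHIFT      4

#define ARM_CPU_IMP_CAVIUM      0x43

/* Hypothetical helper mirroring MIDR_CPU_PART() as a checked function. */
static uint64_t midr_from_fields(unsigned imp, unsigned partnum)
{
    assert(imp <= 0xff);
    assert(partnum <= 0xfff);
    return ((uint64_t)imp << MIDR_IMPLEMENTER_SHIFT) |
           (0xfULL << MIDR_ARCHITECTURE_SHIFT) |
           ((uint64_t)partnum << MIDR_PARTNUM_SHIFT);
}

/* Hypothetical helper: extract the 8-bit implementer field. */
static unsigned midr_implementer(uint64_t midr)
{
    return (unsigned)((midr & MIDR_IMPLEMENTER_MASK) >> MIDR_IMPLEMENTER_SHIFT);
}

/* Hypothetical helper: true if the implementer is Cavium (0x43). */
static bool midr_is_cavium(uint64_t midr)
{
    return midr_implementer(midr) == ARM_CPU_IMP_CAVIUM;
}
```

A check based only on the implementer field ignores the part number, so it
does not depend on which core of the system the MIDR was read from, as long
as all cores share the same implementer.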

>
>> +char *buf;
>> +uint64_t midr = 0;
>> +
>> +if (!g_file_get_contents(file, &buf, 0, NULL)) {
>> +goto out;
>> +}
>> +
>> +if (qemu_strtoull(buf, NULL, 0, &midr) < 0) {
>> +midr = 0;
>> +goto out;
>> +}
>> +
>> +out:
>> +g_free(buf);
>> +
>> +return midr;
>
> thanks
> -- PMM



Re: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup

2016-12-19 Thread Peter Xu
On Sun, Dec 18, 2016 at 04:42:50PM +0800, Liu, Yi L wrote:
> On Tue, Dec 06, 2016 at 06:36:15PM +0800, Peter Xu wrote:
> > This RFC series is a continue work for Aviv B.D.'s vfio enablement
> > series with vt-d. Aviv has done a great job there, and what we still
> > lack there are mostly the following:
> > 
> > (1) VFIO got duplicated IOTLB notifications due to splitted VT-d IOMMU
> > memory region.
> > 
> > (2) VT-d still haven't provide a correct replay() mechanism (e.g.,
> > when IOMMU domain switches, things will broke).
> > 
> > Here I'm trying to solve the above two issues.
> > 
> > (1) is solved by patch 7, (2) is solved by patch 11-12.
> > 
> > Basically it contains the following:
> > 
> > patch 1:picked up from Jason's vhost DMAR series, which is a bugfix
> > 
> > patch 2-6:  Cleanups/Enhancements for existing vt-d codes (please see
> > specific commit message for details, there are patches
> > that I thought may be suitable for 2.8 as well, but looks
> > like it's too late)
> > 
> > patch 7:Solve the issue that vfio is notified more than once for
> > IOTLB notifications with Aviv's patches
> > 
> > patch 8-10: Some trivial memory APIs added for further patches, and
> > add customize replay() support for MemoryRegion (I see
> > Aviv's latest v7 contains similar replay, I can rebase
> > onto that, merely the same thing)
> > 
> > patch 11:   Provide a valid vt-d replay() callback, using page walk
> > 
> Peter,
> Does your patch set based on Aviv's patch? I found the page cannot be
> applied in my side.

Hi, Yi,

This series is based on Aviv's v6 series. If you wanna try it, you may
want to fetch the tree from:

 https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement

That way you won't need to bother with applying the patches.

> 
> BTW. it may be better if you can split the patches for mis cleanup
> and the patches for replay/"fix duplicate notify".

Yes. Here I just want to make sure things stick together (e.g., to
test the replay, I will need to use the traces). And I find it awkward
to maintain several series upstream while they interact with each
other. Sorry for the trouble.

Thanks,

-- peterx



Re: [Qemu-devel] [RFC PATCH 03/13] intel_iommu: renaming gpa to iova where proper

2016-12-19 Thread Peter Xu
On Sun, Dec 18, 2016 at 04:39:11PM +0800, Liu, Yi L wrote:

[...]

> > @@ -595,12 +595,12 @@ static uint64_t vtd_get_slpte(dma_addr_t base_addr, 
> > uint32_t index)
> >  return slpte;
> >  }
> >  
> > -/* Given a gpa and the level of paging structure, return the offset of 
> > current
> > - * level.
> > +/* Given a iova and the level of paging structure, return the offset
> maybe "an iova" instead of "a iova"

Will fix. Thanks,

-- peterx



Re: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup

2016-12-19 Thread Peter Xu
On Tue, Dec 06, 2016 at 06:36:15PM +0800, Peter Xu wrote:
> This RFC series is a continue work for Aviv B.D.'s vfio enablement
> series with vt-d. Aviv has done a great job there, and what we still
> lack there are mostly the following:
> 
> (1) VFIO got duplicated IOTLB notifications due to splitted VT-d IOMMU
> memory region.
> 
> (2) VT-d still haven't provide a correct replay() mechanism (e.g.,
> when IOMMU domain switches, things will broke).
> 
> Here I'm trying to solve the above two issues.
> 
> (1) is solved by patch 7, (2) is solved by patch 11-12.
> 
> Basically it contains the following:
> 
> patch 1:picked up from Jason's vhost DMAR series, which is a bugfix
> 
> patch 2-6:  Cleanups/Enhancements for existing vt-d codes (please see
> specific commit message for details, there are patches
> that I thought may be suitable for 2.8 as well, but looks
> like it's too late)
> 
> patch 7:Solve the issue that vfio is notified more than once for
> IOTLB notifications with Aviv's patches
> 
> patch 8-10: Some trivial memory APIs added for further patches, and
> add customize replay() support for MemoryRegion (I see
> Aviv's latest v7 contains similar replay, I can rebase
> onto that, merely the same thing)
> 
> patch 11:   Provide a valid vt-d replay() callback, using page walk
> 
> patch 12:   Enable the domain switch support - we replay() when
> context entry got invalidated
> 
> patch 13:   Enhancement for existing invalidation notification,
> instead of using translate() for each page, we leverage
> the new vtd_page_walk() interface, which should be faster.
> 
> I would glad to hear about any review comments for above patches
> (especially patch 8-13, which is the main part of this series),
> especially any issue I missed in the series.

Hi, Michael,

Could you help take a look at this series? I hope we can move this
series forward a bit since it still lacks review.

Btw, IMHO we can merge patch 1 now since people might encounter issues
without it (and it has been dangling quite a long time upstream).

Thanks,

-- peterx



Re: [Qemu-devel] [PATCHv3] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-19 Thread Vlad Lungu


On 12/18/2016 10:25 PM, Eduardo Habkost wrote:
> On Thu, Dec 15, 2016 at 02:32:04PM +0200, Vlad Lungu wrote:
>> get_opt_value() truncates the value at the first comma
>> Use memcpy() instead
>> Unescape the module filename and parameters with get_opt_value()
>> before calling mb_add_cmdline()
>>
>> Signed-off-by: Vlad Lungu 
>> ---
>>  hw/i386/multiboot.c | 19 +--
>>  1 file changed, 9 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
>> index 387caa6..6b7b5a9 100644
>> --- a/hw/i386/multiboot.c
>> +++ b/hw/i386/multiboot.c
>> @@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
>> char *cmdline)
>>  hwaddr p = s->offset_cmdlines;
>>  char *b = (char *)s->mb_buf + p;
>>  
>> -get_opt_value(b, strlen(cmdline) + 1, cmdline);
>> +memcpy(b, cmdline, strlen(cmdline) + 1);
>>  s->offset_cmdlines += strlen(b) + 1;
>>  return s->mb_buf_phys + p;
>>  }
>> @@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
>>  mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
>>  
>>  if (initrd_filename) {
>> -char *next_initrd, not_last;
>> +char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
>>  
>>  mbs.offset_mods = mbs.mb_buf_size;
>>  
>> @@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg,
>>  int mb_mod_length;
>>  uint32_t offs = mbs.mb_buf_size;
>>  
>> -next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
>> +next_initrd = (char *)get_opt_value(tmpbuf, 
>> strlen(initrd_filename) + 1, initrd_filename);
> I would prefer to use sizeof(initrd_filename) like Paolo
> suggested.
sizeof(initrd_filename) is 8 (on my machine, x86_64). Maybe sizeof(tmpbuf) 
would be a better idea :-)
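
The difference can be shown with a small stand-alone sketch. The function
names size_via_pointer() and size_via_array() are made up for this
illustration; the variable names follow the patch. It assumes a compiler
with variable-length array support (e.g. gcc or clang), as the patch itself
does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof applied to a pointer yields the pointer size, not the string size. */
static size_t size_via_pointer(const char *initrd_filename)
{
    assert(initrd_filename != NULL);
    return sizeof(initrd_filename);
}

/* sizeof applied to a (variable-length) char array yields its byte count. */
static size_t size_via_array(const char *initrd_filename)
{
    assert(initrd_filename != NULL);
    char tmpbuf[strlen(initrd_filename) + 1];
    return sizeof(tmpbuf);
}
```

With the first form the buffer size passed to get_opt_value() would be the
pointer size regardless of the file name length, which is why
sizeof(tmpbuf) is the safer spelling.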

Regards,
Vlad




[Qemu-devel] [PATCHv4] multiboot: copy the cmdline verbatim, unescape module strings

2016-12-19 Thread Vlad Lungu
get_opt_value() truncates the value at the first comma
Use memcpy() instead
Unescape the module filename and parameters with get_opt_value()
before calling mb_add_cmdline()

Signed-off-by: Vlad Lungu 
---
 hw/i386/multiboot.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/i386/multiboot.c b/hw/i386/multiboot.c
index 387caa6..efe11ae 100644
--- a/hw/i386/multiboot.c
+++ b/hw/i386/multiboot.c
@@ -109,7 +109,7 @@ static uint32_t mb_add_cmdline(MultibootState *s, const 
char *cmdline)
 hwaddr p = s->offset_cmdlines;
 char *b = (char *)s->mb_buf + p;
 
-get_opt_value(b, strlen(cmdline) + 1, cmdline);
+memcpy(b, cmdline, strlen(cmdline) + 1);
 s->offset_cmdlines += strlen(b) + 1;
 return s->mb_buf_phys + p;
 }
@@ -287,7 +287,7 @@ int load_multiboot(FWCfgState *fw_cfg,
 mbs.offset_bootloader = mbs.offset_cmdlines + cmdline_len;
 
 if (initrd_filename) {
-char *next_initrd, not_last;
+char *next_initrd, not_last, tmpbuf[strlen(initrd_filename) + 1];
 
 mbs.offset_mods = mbs.mb_buf_size;
 
@@ -296,25 +296,24 @@ int load_multiboot(FWCfgState *fw_cfg,
 int mb_mod_length;
 uint32_t offs = mbs.mb_buf_size;
 
-next_initrd = (char *)get_opt_value(NULL, 0, initrd_filename);
+next_initrd = (char *)get_opt_value(tmpbuf, sizeof(tmpbuf), 
initrd_filename);
 not_last = *next_initrd;
-*next_initrd = '\0';
 /* if a space comes after the module filename, treat everything
after that as parameters */
-hwaddr c = mb_add_cmdline(&mbs, initrd_filename);
-if ((next_space = strchr(initrd_filename, ' ')))
+hwaddr c = mb_add_cmdline(&mbs, tmpbuf);
+if ((next_space = strchr(tmpbuf, ' ')))
 *next_space = '\0';
-mb_debug("multiboot loading module: %s\n", initrd_filename);
-mb_mod_length = get_image_size(initrd_filename);
+mb_debug("multiboot loading module: %s\n", tmpbuf);
+mb_mod_length = get_image_size(tmpbuf);
 if (mb_mod_length < 0) {
-fprintf(stderr, "Failed to open file '%s'\n", initrd_filename);
+fprintf(stderr, "Failed to open file '%s'\n", tmpbuf);
 exit(1);
 }
 
 mbs.mb_buf_size = TARGET_PAGE_ALIGN(mb_mod_length + 
mbs.mb_buf_size);
 mbs.mb_buf = g_realloc(mbs.mb_buf, mbs.mb_buf_size);
 
-load_image(initrd_filename, (unsigned char *)mbs.mb_buf + offs);
+load_image(tmpbuf, (unsigned char *)mbs.mb_buf + offs);
 mb_add_mod(&mbs, mbs.mb_buf_phys + offs,
mbs.mb_buf_phys + offs + mb_mod_length, c);
 
-- 
1.9.1




[Qemu-devel] qemu-2.8-rc4 is broken

2016-12-19 Thread Pavel Dovgalyuk
Hi!

I encountered the following bug with the latest version of QEMU.
I use a Windows host and start QEMU with the following command line:

qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none

The guest system is 32-bit Windows XP. It finds new hardware (including the
audio controller) and I start playing an mp3 file.
After some seconds of playing, QEMU fails with an exception.

I tried to bisect between 2.7 and 2.8, but the bug is not stable.
It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
"6a928d25b6d8bc3729c3d28326c6db13b9481059".

Pavel Dovgalyuk



Re: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map and unmap notifiers

2016-12-19 Thread Peter Xu
On Fri, Dec 16, 2016 at 09:12:05AM +, Liu, Yi L wrote:
> > From: "Aviv Ben-David" 
> > 
> > Adds a list of registered vtd_as's to intel iommu state to save
> > iteration over each PCI device in a search of the corrosponding domain.
> > 
> > Signed-off-by: Aviv Ben-David 
> > ---
> >  hw/i386/intel_iommu.c  | 94 
> > ++
> >  hw/i386/intel_iommu_internal.h |  2 +
> >  include/hw/i386/intel_iommu.h  |  9 
> >  3 files changed, 98 insertions(+), 7 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 05973b9..d872969 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -679,7 +679,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce, 
> > uint64_t 
> > gpa,
> >  }
> >  *reads = (*reads) && (slpte & VTD_SL_R);
> >  *writes = (*writes) && (slpte & VTD_SL_W);
> > -if (!(slpte & access_right_check)) {
> > +if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
> >  VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> >  "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> >  (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
> > @@ -978,6 +978,23 @@ static VTDBus 
> > *vtd_find_as_from_bus_num(IntelIOMMUState 
> > *s, uint8_t bus_num)
> >  return vtd_bus;
> >  }
> >  
> > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t 
> > devfn,
> > +   uint16_t *domain_id)
> > +{
> > +VTDContextEntry ce;
> > +int ret_fr;
> > +
> > +assert(domain_id);
> > +
> > +ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > +if (ret_fr) {
> > +return -1;
> > +}
> > +
> > +*domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> > +return 0;
> > +}
> > +
> >  /* Do a context-cache device-selective invalidation.
> >   * @func_mask: FM field after shifting
> >   */
> > @@ -1064,6 +1081,45 @@ static void 
> > vtd_iotlb_domain_invalidate(IntelIOMMUState 
> > *s, uint16_t domain_id)
> >  &domain_id);
> >  }
> >  
> > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > +   uint16_t domain_id, hwaddr addr,
> > +   uint8_t am)
> > +{
> > +IntelIOMMUNotifierNode *node;
> > +
> > +QLIST_FOREACH(node, &(s->notifiers_list), next) {
> Aviv,
> 
> Regards to the s->notifiers_list, I didn't see the init op to it. Does it 
> happen
> in another patch? If so, it may be better to move it in this patch since this 
> patch introduces both the definition and usage of notifiers_list.
> 
> If it is already clarified, then ignore it.

I think it was missing. IMHO it worked by accident, since QLIST_INIT()
just sets the head to NULL and that is already the case when we create
the IntelIOMMUState object.

And what's worse - I found this approach may not work if we do
QLIST_INSERT() in the changed() hook, since if we have more than one
assigned device we will only register the first one and not the rest. A
better approach may be traversing the vt-d buses via
IntelIOMMUState.vtd_as_by_busptr.
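
The "accidentally worked" part can be shown with a small stand-alone
sketch. It does not use QEMU's QLIST macros; the demo_* types and functions
are invented for this illustration, and it assumes the state object is
allocated zero-filled on a platform where an all-zero pointer is NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal singly linked list standing in for a list head. */
struct demo_node {
    int id;
    struct demo_node *next;
};

struct demo_list {
    struct demo_node *first;
};

/* Stand-in for a device state that embeds a notifier list. */
struct demo_state {
    struct demo_list notifiers;
};

/* Explicit initialisation: all it does is set the head to NULL. */
static void demo_list_init(struct demo_list *l)
{
    l->first = NULL;
}

static void demo_list_insert_head(struct demo_list *l, struct demo_node *n)
{
    n->next = l->first;
    l->first = n;
}

static int demo_list_length(const struct demo_list *l)
{
    int count = 0;

    for (const struct demo_node *n = l->first; n != NULL; n = n->next) {
        count++;
    }
    return count;
}

/* Zero-filled allocation: the embedded head is NULL without any init call. */
static struct demo_state *demo_state_new_zeroed(void)
{
    struct demo_state *s = calloc(1, sizeof(*s));

    assert(s != NULL);
    return s;
}
```

Relying on the zero fill works, but calling the init function explicitly
documents the intent and keeps working if the allocation strategy changes.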

Thanks,

-- peterx



Re: [Qemu-devel] [PATCH for-2.9 v2] qom: Make all interface types abstract

2016-12-19 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 14/12/2016 18:47, Markus Armbruster wrote:
>> Paolo Bonzini  writes:
>> 
>>> On 14/12/2016 14:48, Eduardo Habkost wrote:
> How do you find all abstract TypeInfo in the source?  The uninitiated
> might grep for .abstract = true, and be misled.  The initiated will be
> annoyed instead, because grepping for *absence* of .instance_size = is
> bothersome.
>
> I suspect life could be easier going forward if we instead required
> .abstract = true for interfaces, and enforced it with
> assert(ti->instance_size || ti->abstract) here.
 I was doing that before deciding to change type_initialize(). I
 think I still have the commit in my git reflog, I will recover it
 and submit it as v3.
>>>
>>> I think it's worse.
>>>
>>> Interfaces are abstract by definition.  Requiring ".abstract = true"
>>> makes things less intuitive.  v2 seems good.
>> 
>> What makes a TypeInfo declaration an interface?  Whatever it is, it
>> better be *locally* obvious.
>
> The fact that the superclass is an interface:
>
> 1) most interface names are (or should be) "interfacey".  Compare
> device, memory backend, console (all classes) with user-creatable, fw
> path provider, hotplug handler.  A few others simply end with "_IF".  Of
> course naming is the hardest problem in computer science so there are
> some interfaces whose name might apply just as well to a class (stream
> slave, ISA DMA).  However...
>
> 2) ... currently we don't have a single case of an interface that
> doesn't inherit from TYPE_INTERFACE, so all interfaces are declared with
> ".parent = TYPE_INTERFACE".  That does make a TypeInfo obviously an
> interface.
>
> If we ever have a case of interface inheritance, the supertype had
> better have a good name.

I see.

The choice is between a complex default for .abstract that permits us to
elide .abstract = true for interface types, and a simple default that
requires us to spell it out explicitly.

Given that we have the grand total of thirteen interface types, I prefer
simple & explicit.  Thirteen obvious .abstract = true are less of a
mental burden than a complex default.  Even 25 would be for me.
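
The "simple & explicit" rule amounts to the check quoted above,
assert(ti->instance_size || ti->abstract). A stand-alone sketch of that
rule follows; DemoTypeInfo and demo_type_info_ok() are simplified stand-ins
invented here, not QOM's TypeInfo, and the sketch ignores any inheritance
of instance_size from a parent type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a type declaration (assumption, see above). */
struct DemoTypeInfo {
    const char *name;
    const char *parent;
    size_t instance_size;
    bool abstract;
};

/*
 * The explicit rule: a type either has instances (non-zero size) or is
 * marked abstract. An interface with neither is rejected.
 */
static bool demo_type_info_ok(const struct DemoTypeInfo *ti)
{
    assert(ti != NULL);
    return ti->instance_size != 0 || ti->abstract;
}
```

Under this rule an interface declaration has to spell out .abstract = true,
so grepping for that string finds every abstract type.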



[Qemu-devel] [PATCH v4 0/4] Add HAX support

2016-12-19 Thread Vincent Palatin
I took a stab at trying to rebase/upstream the support for Intel HAXM
(Hardware Accelerated Execution Manager).
Intel HAX is a kernel-based hardware acceleration module for Windows and MacOSX.

I have based my work on the last version of the source code I found:
the emu-2.2-release branch in the external/qemu-android repository as used by
the Android emulator.
In patch 2/4, I have forward-ported the core HAX code from there.
It has been modified to build and run along with the current code base.
It has been simplified by removing non-UG hardware support / Darwin support /
Android-specific leftovers.

Intel nicely fixed the 2 remaining issues on the kernel side:
- the spurious request to emulate MMIO access in un-paged mode no longer
  happens (as seen in iPXE).
- the kernel API now provides a way to remove a memory mapping, so we can
  do a proper MemoryListener implementation.
They will soon publish a new version 6.1.0 of the HAX kernel module including
the fixes (once their QA cycle is completed).
Thanks Yu Ning for making this happen.

In patch 3/4, I have put the plumbing into the QEMU code base, I did some clean
up there and it is reasonably intrusive: i.e.
 Makefile.target   |  1 +
 configure | 18 ++
 cpus.c| 87 ++-
 exec.c| 16 +
 hw/intc/apic_common.c |  3 +-
 include/qom/cpu.h |  5 +++
 include/sysemu/hw_accel.h |  9 +
 qemu-options.hx   | 11 ++
 target-i386/Makefile.objs |  4 +++
 vl.c  | 15 ++--
 10 files changed, 164 insertions(+), 5 deletions(-)

Patch 1/4 just extracts the cpu_synchronize_ functions, which HAX also
uses, from the KVM-specific header.

The patch 4/4 is the Darwin support. This part is only lightly tested for now,
so it can be considered as 'experimental'.

I have tested the end result on a Windows 10 Pro machine (with UG support)
with the Intel HAXM module dev version and a large ChromiumOS x86_64 image to
exercise various code paths. It looks stable.
I also did a quick regression testing of the integration by running a Linux
build with KVM enabled.

Changes from v3 to v4:
- add RAM unmapping in the MemoryListener thanks to new API in HAX module 6.1.0
  and re-wrote the memory mappings management to deal with this.
- marked no longer used MMIO emulation as unsupported.
- clean-up a few left-overs from removed code.
- re-add an experimental version of the Darwin support.

Changes from v2 to v3:
- fix saving/restoring FPU registers as suggested by Paolo.
- fix Windows build on all targets as contributed by Stefan Weil.
- clean-up IO / MMIO emulation.
- more clean-up of emulation leftovers.

Changes from v1 to v2:
- fix all style issues in the original code to get it through checkpatch.pl.
- remove Darwin support, it was barely tested and not fully functional.
- remove the support for CPU without UG mode.
- fix most review comments

Vincent Palatin (4):
  kvm: move cpu synchronization code
  target-i386: Add Intel HAX files
  Plumb the HAXM-based hardware acceleration support
  hax: add Darwin support

 Makefile.target |1 +
 configure   |   18 +
 cpus.c  |   93 +++-
 exec.c  |   16 +
 gdbstub.c   |1 +
 hax-stub.c  |   39 ++
 hw/i386/kvm/apic.c  |1 +
 hw/i386/kvmvapic.c  |1 +
 hw/intc/apic_common.c   |3 +-
 hw/misc/vmport.c|2 +-
 hw/ppc/pnv_xscom.c  |2 +-
 hw/ppc/ppce500_spin.c   |4 +-
 hw/ppc/spapr.c  |2 +-
 hw/ppc/spapr_hcall.c|2 +-
 hw/s390x/s390-pci-inst.c|1 +
 include/qom/cpu.h   |5 +
 include/sysemu/hax.h|   56 +++
 include/sysemu/hw_accel.h   |   48 ++
 include/sysemu/kvm.h|   23 -
 monitor.c   |2 +-
 qemu-options.hx |   11 +
 qom/cpu.c   |2 +-
 target-arm/cpu.c|2 +-
 target-i386/Makefile.objs   |7 +
 target-i386/hax-all.c   | 1138 +++
 target-i386/hax-darwin.c|  316 
 target-i386/hax-darwin.h|   63 +++
 target-i386/hax-i386.h  |   94 
 target-i386/hax-interface.h |  358 ++
 target-i386/hax-mem.c   |  271 +++
 target-i386/hax-windows.c   |  479 ++
 target-i386/hax-windows.h   |   89 
 target-i386/helper.c|1 +
 target-i386/kvm.c   |1 +
 target-ppc/mmu-hash64.c |2 +-
 target-ppc/translate_init.c |2 +-
 target-s390x/gdbstub.c  |1 +
 vl.c|   15 +-
 38 files changed, 3133 insertions(+), 39 deletions(-)
 create mode 100644 hax-stub.c
 create mode 100644 include/sysemu/hax.h
 create mode 100644 include/sysemu/hw_accel.h
 create mode 100644 target-i386/hax-all.c
 create mode 100644 target-i386/hax-darwin.c
 create mo

[Qemu-devel] [PATCH v4 1/4] kvm: move cpu synchronization code

2016-12-19 Thread Vincent Palatin
Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.

Signed-off-by: Stefan Weil 
Signed-off-by: Vincent Palatin 
---
 cpus.c  |  1 +
 gdbstub.c   |  1 +
 hw/i386/kvm/apic.c  |  1 +
 hw/i386/kvmvapic.c  |  1 +
 hw/misc/vmport.c|  2 +-
 hw/ppc/pnv_xscom.c  |  2 +-
 hw/ppc/ppce500_spin.c   |  4 ++--
 hw/ppc/spapr.c  |  2 +-
 hw/ppc/spapr_hcall.c|  2 +-
 hw/s390x/s390-pci-inst.c|  1 +
 include/sysemu/hw_accel.h   | 39 +++
 include/sysemu/kvm.h| 23 ---
 monitor.c   |  2 +-
 qom/cpu.c   |  2 +-
 target-arm/cpu.c|  2 +-
 target-i386/helper.c|  1 +
 target-i386/kvm.c   |  1 +
 target-ppc/mmu-hash64.c |  2 +-
 target-ppc/translate_init.c |  2 +-
 target-s390x/gdbstub.c  |  1 +
 20 files changed, 58 insertions(+), 34 deletions(-)
 create mode 100644 include/sysemu/hw_accel.h

diff --git a/cpus.c b/cpus.c
index 5213351..fc78502 100644
--- a/cpus.c
+++ b/cpus.c
@@ -33,6 +33,7 @@
 #include "sysemu/block-backend.h"
 #include "exec/gdbstub.h"
 #include "sysemu/dma.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "qmp-commands.h"
 #include "exec/exec-all.h"
diff --git a/gdbstub.c b/gdbstub.c
index de62d26..de9b62b 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -32,6 +32,7 @@
 #define MAX_PACKET_LENGTH 4096
 
 #include "qemu/sockets.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "exec/semihost.h"
 #include "exec/exec-all.h"
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 01cbaa8..328f80c 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -14,6 +14,7 @@
 #include "cpu.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/pci/msi.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "target-i386/kvm_i386.h"
 
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index b30d1b9..2f767b6 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -14,6 +14,7 @@
 #include "exec/exec-all.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
diff --git a/hw/misc/vmport.c b/hw/misc/vmport.c
index c763811..be40930 100644
--- a/hw/misc/vmport.c
+++ b/hw/misc/vmport.c
@@ -25,7 +25,7 @@
 #include "hw/hw.h"
 #include "hw/isa/isa.h"
 #include "hw/i386/pc.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "hw/qdev.h"
 
 //#define VMPORT_DEBUG
diff --git a/hw/ppc/pnv_xscom.c b/hw/ppc/pnv_xscom.c
index 8da2718..cd5c2b8 100644
--- a/hw/ppc/pnv_xscom.c
+++ b/hw/ppc/pnv_xscom.c
@@ -20,7 +20,7 @@
 #include "qapi/error.h"
 #include "hw/hw.h"
 #include "qemu/log.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "target-ppc/cpu.h"
 #include "hw/sysbus.h"
 
diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
index cf958a9..eb219ab 100644
--- a/hw/ppc/ppce500_spin.c
+++ b/hw/ppc/ppce500_spin.c
@@ -29,9 +29,9 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "sysemu/sysemu.h"
 #include "hw/sysbus.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
+#include "sysemu/sysemu.h"
 #include "e500.h"
 
 #define MAX_CPUS 32
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 208ef7b..a642e66 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -36,7 +36,7 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/cpus.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "kvm_ppc.h"
 #include "migration/migration.h"
 #include "mmu-hash64.h"
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9a9bedf..b2a8e48 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1,5 +1,6 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
 #include "cpu.h"
@@ -9,7 +10,6 @@
 #include "mmu-hash64.h"
 #include "cpu-models.h"
 #include "trace.h"
-#include "sysemu/kvm.h"
 #include "kvm_ppc.h"
 #include "hw/ppc/spapr_ovec.h"
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 0864d9b..4d0775c 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,7 @@
 #include "s390-pci-bus.h"
 #include "exec/memory-internal.h"
 #include "qemu/error-report.h"
+#include "sysemu/hw_accel.h"
 
 /* #define DEBUG_S390PCI_INST */
 #ifdef DEBUG_S390PCI_INST
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
new file mode 100644
index 000..03812cf
--- /dev/null
+++ b/include/sysemu/hw_accel.h
@@ -0,0 +1,39 @@
+/*
+ * QEMU Hardware accelerators support
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifnd

[Qemu-devel] [PATCH v4 4/4] hax: add Darwin support

2016-12-19 Thread Vincent Palatin
Re-add the MacOSX/Darwin support:
Use Intel HAX, a kernel-based hardware acceleration module
(similar to KVM on Linux).

Based on the original "target-i386: Add Intel HAX to android emulator" patch
from David Chou, from the emu-2.2-release branch in
the external/qemu-android repository.

Signed-off-by: Vincent Palatin 
---
 cpus.c|   5 +
 target-i386/Makefile.objs |   3 +
 target-i386/hax-darwin.c  | 316 ++
 target-i386/hax-darwin.h  |  63 +
 target-i386/hax-i386.h|   8 ++
 5 files changed, 395 insertions(+)
 create mode 100644 target-i386/hax-darwin.c
 create mode 100644 target-i386/hax-darwin.h

diff --git a/cpus.c b/cpus.c
index 0e01791..b8db313 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1264,6 +1264,11 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 return;
 }
 cpu->thread_kicked = true;
+#ifdef CONFIG_DARWIN
+if (hax_enabled()) {
+cpu->exit_request = 1;
+}
+#endif
 err = pthread_kill(cpu->thread->thread, SIG_IPI);
 if (err) {
 fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index acbe7b0..4fcb7f3 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -9,3 +9,6 @@ obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 ifdef CONFIG_WIN32
 obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-windows.o
 endif
+ifdef CONFIG_DARWIN
+obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-darwin.o
+endif
diff --git a/target-i386/hax-darwin.c b/target-i386/hax-darwin.c
new file mode 100644
index 000..240d8d3
--- /dev/null
+++ b/target-i386/hax-darwin.c
@@ -0,0 +1,316 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* HAX module interface - darwin version */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu/osdep.h"
+#include "target-i386/hax-i386.h"
+
+hax_fd hax_mod_open(void)
+{
+int fd = open("/dev/HAX", O_RDWR);
+if (fd == -1) {
+fprintf(stderr, "Failed to open the hax module\n");
+}
+
+fcntl(fd, F_SETFD, FD_CLOEXEC);
+
+return fd;
+}
+
+int hax_populate_ram(uint64_t va, uint32_t size)
+{
+int ret;
+struct hax_alloc_ram_info info;
+
+if (!hax_global.vm || !hax_global.vm->fd) {
+fprintf(stderr, "Allocate memory before vm create?\n");
+return -EINVAL;
+}
+
+info.size = size;
+info.va = va;
+ret = ioctl(hax_global.vm->fd, HAX_VM_IOCTL_ALLOC_RAM, &info);
+if (ret < 0) {
+fprintf(stderr, "Failed to allocate %x memory\n", size);
+return ret;
+}
+return 0;
+}
+
+int hax_set_ram(uint64_t start_pa, uint32_t size, uint64_t host_va, int flags)
+{
+struct hax_set_ram_info info;
+int ret;
+
+info.pa_start = start_pa;
+info.size = size;
+info.va = host_va;
+info.flags = (uint8_t) flags;
+
+ret = ioctl(hax_global.vm->fd, HAX_VM_IOCTL_SET_RAM, &info);
+if (ret < 0) {
+return -errno;
+}
+return 0;
+}
+
+int hax_capability(struct hax_state *hax, struct hax_capabilityinfo *cap)
+{
+int ret;
+
+ret = ioctl(hax->fd, HAX_IOCTL_CAPABILITY, cap);
+if (ret == -1) {
+fprintf(stderr, "Failed to get HAX capability\n");
+return -errno;
+}
+
+return 0;
+}
+
+int hax_mod_version(struct hax_state *hax, struct hax_module_version *version)
+{
+int ret;
+
+ret = ioctl(hax->fd, HAX_IOCTL_VERSION, version);
+if (ret == -1) {
+fprintf(stderr, "Failed to get HAX version\n");
+return -errno;
+}
+
+return 0;
+}
+
+static char *hax_vm_devfs_string(int vm_id)
+{
+char *name;
+
+if (vm_id > MAX_VM_ID) {
+fprintf(stderr, "Too big VM id\n");
+return NULL;
+}
+
+#define HAX_VM_DEVFS "/dev/hax_vm/vmxx"
+name = g_strdup(HAX_VM_DEVFS);
+if (!name) {
+return NULL;
+}
+
+snprintf(name, sizeof HAX_VM_DEVFS, "/dev/hax_vm/vm%02d", vm_id);
+return name;
+}
+
+static char *hax_vcpu_devfs_string(int vm_id, int vcpu_id)
+{
+char *name;
+
+if (vm_id > MAX_VM_ID || vcpu_id > MAX_VCPU_ID) {
+fprintf(stderr, "Too big vm id %x or vcpu id %x\n", vm_id, vcpu_id);
+return NULL;
+}
+
+#define HAX_VCPU_DEVFS "/dev/hax_vmxx/vcpuxx"
+name = g_strdup(HAX_VCPU_DEVFS);
+if (!name) {
+return NULL;
+}
+
+snprintf(name, sizeof HAX_VCPU_DEVFS, "/dev/hax_vm%02d/vcpu%02d",
+ vm_id, vcpu_id);
+return name;
+}
+
+int hax_host_create_vm(struct hax_state *hax, int *vmid)
+{
+int ret;
+int vm_id = 0;
+
+if (hax_invalid_fd(hax->fd)) {
+return -EINVAL;
+}
+
+if (hax->vm) {
+return 0;
+}
+
+ret = ioctl(hax->fd, HAX_IOCTL_CREATE_VM, &vm_id);
+*vmid = vm_id;
+return ret;
+}
+
+hax_fd hax

[Qemu-devel] [PATCH v4 3/4] Plumb the HAXM-based hardware acceleration support

2016-12-19 Thread Vincent Palatin
Use Intel HAX, a kernel-based hardware acceleration module for
Windows (similar to KVM on Linux).

Based on the "target-i386: Add Intel HAX to android emulator" patch
from David Chou 

Signed-off-by: Vincent Palatin 
---
 Makefile.target   |  1 +
 configure | 18 ++
 cpus.c| 87 ++-
 exec.c| 16 +
 hw/intc/apic_common.c |  3 +-
 include/qom/cpu.h |  5 +++
 include/sysemu/hw_accel.h |  9 +
 qemu-options.hx   | 11 ++
 target-i386/Makefile.objs |  4 +++
 vl.c  | 15 ++--
 10 files changed, 164 insertions(+), 5 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 7a5080e..dab81e7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -96,6 +96,7 @@ obj-y += target-$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-y += tcg-runtime.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
+obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 
 obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o
diff --git a/configure b/configure
index 3770d7c..ba32bea 100755
--- a/configure
+++ b/configure
@@ -230,6 +230,7 @@ vhost_net="no"
 vhost_scsi="no"
 vhost_vsock="no"
 kvm="no"
+hax="no"
 colo="yes"
 rdma=""
 gprof="no"
@@ -563,6 +564,7 @@ CYGWIN*)
 ;;
 MINGW32*)
   mingw32="yes"
+  hax="yes"
   audio_possible_drivers="dsound sdl"
   if check_include dsound.h; then
 audio_drv_list="dsound"
@@ -612,6 +614,7 @@ OpenBSD)
 Darwin)
   bsd="yes"
   darwin="yes"
+  hax="yes"
   LDFLAGS_SHARED="-bundle -undefined dynamic_lookup"
   if [ "$cpu" = "x86_64" ] ; then
 QEMU_CFLAGS="-arch x86_64 $QEMU_CFLAGS"
@@ -921,6 +924,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-hax) hax="no"
+  ;;
+  --enable-hax) hax="yes"
+  ;;
   --disable-colo) colo="no"
   ;;
   --enable-colo) colo="yes"
@@ -1373,6 +1380,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   fdt fdt device tree
   bluez   bluez stack connectivity
   kvm KVM acceleration support
+  hax HAX acceleration support
   coloCOarse-grain LOck-stepping VM for Non-stop Service
   rdmaRDMA-based migration support
   vde support for vde network
@@ -5051,6 +5059,7 @@ echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
 echo "COLO support  $colo"
+echo "HAX support   $hax"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -6035,6 +6044,15 @@ case "$target_name" in
   fi
 fi
 esac
+if test "$hax" = "yes" ; then
+  if test "$target_softmmu" = "yes" ; then
+case "$target_name" in
+i386|x86_64)
+  echo "CONFIG_HAX=y" >> $config_target_mak
+;;
+esac
+  fi
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/cpus.c b/cpus.c
index fc78502..0e01791 100644
--- a/cpus.c
+++ b/cpus.c
@@ -35,6 +35,7 @@
 #include "sysemu/dma.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
+#include "sysemu/hax.h"
 #include "qmp-commands.h"
 #include "exec/exec-all.h"
 
@@ -1221,6 +1222,39 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 return NULL;
 }
 
+static void *qemu_hax_cpu_thread_fn(void *arg)
+{
+CPUState *cpu = arg;
+int r;
+qemu_thread_get_self(cpu->thread);
+qemu_mutex_lock(&qemu_global_mutex);
+
+cpu->thread_id = qemu_get_thread_id();
+cpu->created = true;
+cpu->halted = 0;
+current_cpu = cpu;
+
+hax_init_vcpu(cpu);
+qemu_cond_signal(&qemu_cpu_cond);
+
+while (1) {
+if (cpu_can_run(cpu)) {
+r = hax_smp_cpu_exec(cpu);
+if (r == EXCP_DEBUG) {
+cpu_handle_guest_debug(cpu);
+}
+}
+
+while (cpu_thread_is_idle(cpu)) {
+qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+}
+
+qemu_wait_io_event_common(cpu);
+}
+return NULL;
+}
+
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1236,7 +1270,33 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 exit(1);
 }
 #else /* _WIN32 */
-abort();
+if (!qemu_cpu_is_self(cpu)) {
+CONTEXT context;
+
+if (SuspendThread(cpu->hThread) == (DWORD)(-1)) {
+fprintf(stderr, "qemu:%s: GetLastError:%lu\n", __func__,
+GetLastError());
+exit(1);
+}
+
+/* On multi-core systems, we are not sure that the thread is actually
+ * suspended until we can get the context.
+ */
+context.ContextFlags = CONTEXT_CONTROL;
+while (GetThreadContext(cpu->hThread, &context) != 0) {
+continue;
+}
+
+if (hax_enabled()) {
+cpu->exit_request = 1;
+}
+
+if (ResumeThread(cpu->hTh

[Qemu-devel] [PATCH v4 2/4] target-i386: Add Intel HAX files

2016-12-19 Thread Vincent Palatin
That's a forward port of the core HAX interface code from the
emu-2.2-release branch in the external/qemu-android repository as used by
the Android emulator.

The original commit was "target-i386: Add Intel HAX to android emulator"
saying:
"""
  Backport of 2b3098ff27bab079caab9b46b58546b5036f5c0c
  from studio-1.4-dev into emu-master-dev

Intel HAX (hardware acceleration) will enhance android emulator performance
in Windows and Mac OS X in the systems powered by Intel processors with
"Intel Hardware Accelerated Execution Manager" package installed when
user runs android emulator with Intel target.

Signed-off-by: David Chou 
"""

It has been modified to build and run along with the current code base.
The formatting has been fixed to go through scripts/checkpatch.pl,
and the DPRINTF macros have been updated to get the instantiations checked by
the compiler.

The FPU registers saving/restoring has been updated to match the current
QEMU registers layout.

The implementation has been simplified by doing the following modifications:
- removing the code for supporting hardware without Unrestricted Guest (UG)
  mode (including all the code to fall back to TCG emulation).
- not including the Darwin support (which is not yet debugged/tested).
- simplifying the initialization by removing the leftovers from the Android
  specific code, then trimming down the remaining logic.
- removing the unused MemoryListener callbacks.

Signed-off-by: Vincent Palatin 
---
 hax-stub.c  |   39 ++
 include/sysemu/hax.h|   56 +++
 target-i386/hax-all.c   | 1138 +++
 target-i386/hax-i386.h  |   86 
 target-i386/hax-interface.h |  358 ++
 target-i386/hax-mem.c   |  271 +++
 target-i386/hax-windows.c   |  479 ++
 target-i386/hax-windows.h   |   89 
 8 files changed, 2516 insertions(+)
 create mode 100644 hax-stub.c
 create mode 100644 include/sysemu/hax.h
 create mode 100644 target-i386/hax-all.c
 create mode 100644 target-i386/hax-i386.h
 create mode 100644 target-i386/hax-interface.h
 create mode 100644 target-i386/hax-mem.c
 create mode 100644 target-i386/hax-windows.c
 create mode 100644 target-i386/hax-windows.h

diff --git a/hax-stub.c b/hax-stub.c
new file mode 100644
index 000..a532dba
--- /dev/null
+++ b/hax-stub.c
@@ -0,0 +1,39 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright (c) 2015, Intel Corporation
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/hax.h"
+
+int hax_sync_vcpus(void)
+{
+return 0;
+}
+
+int hax_populate_ram(uint64_t va, uint32_t size)
+{
+return -ENOSYS;
+}
+
+int hax_init_vcpu(CPUState *cpu)
+{
+return -ENOSYS;
+}
+
+int hax_smp_cpu_exec(CPUState *cpu)
+{
+return -ENOSYS;
+}
diff --git a/include/sysemu/hax.h b/include/sysemu/hax.h
new file mode 100644
index 000..51c8fd5
--- /dev/null
+++ b/include/sysemu/hax.h
@@ -0,0 +1,56 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *  Xin Xiaohui
+ *  Zhang Xiantao
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_HAX_H
+#define QEMU_HAX_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int hax_sync_vcpus(void);
+int hax_init_vcpu(CPUState *cpu);
+int hax_smp_cpu_exec(CPUState *cpu);
+int hax_populate_ram(uint64_t va, uint32_t size);
+
+void hax_cpu_synchronize_state(CPUState *cpu);
+void hax_cpu_synchronize_post_reset(CPUState *cpu);
+void hax_cpu_synchronize_post_init(CPUState *cpu);
+
+#ifdef CONFIG_HAX
+
+int hax_enabled(void);
+
+#include "hw/hw.h"
+#include "qemu/bitops.h"
+#include "exec/memory.h"
+int hax_vcpu_destroy(CPUState *cpu);
+void hax_raise_event(CPUState *cpu);
+void hax_reset_vcpu_state(void *opaque);
+#include "target-i386/hax-interface.h"
+#include "target-i386/hax-i386.h"
+
+#else /* CONFIG_HAX */
+
+#define hax_enabled() (0)
+
+#endif /* CONFIG_HAX */
+
+#endif /* QEMU_HAX_H */
diff --git a/target-i386/hax-all.c b/target-i386/hax-all.c
new file mode 100644
index 000..1f0ef7c
--- /dev/null
+++ b/target-i386/hax-all.c
@@ -0,0 +1,1138 @@
+/*
+ * QEMU HAX support
+ *
+ * Copyright IBM, Corp. 2008
+ *   Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *  Glauber Costa 
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *  Xin Xiaohui
+ *  Zhang Xiantao
+ *
+ * This work is licensed under the t

Re: [Qemu-devel] Lock contention in QEMU

2016-12-19 Thread Stefan Hajnoczi
On Fri, Dec 16, 2016 at 04:42:54PM -0500, Weiwei Jia wrote:
> Has x-data-plane been used (or accepted) widely in systems. I have
> this concern since if it hasn't been widely accepted, it may
> have/cause some problems we don't know. Do you know some hidden
> problems which may caused by QEMU x-data-plane feature in systems?

By now virtio-blk dataplane has been tested a fair amount.  virtio-scsi
dataplane is more recent.

I'm not aware of known issues with virtio-blk dataplane in modern QEMU
but you seem to be using an older QEMU.  Originally dataplane had a lot
of limitations (no live migration, no image file formats, no block
jobs) so be sure to check that your QEMU supports these things with
dataplane if you need the features.

Stefan




Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?

2016-12-19 Thread Stefan Hajnoczi
On Fri, Dec 16, 2016 at 10:00:36PM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 15.12.2016 um 07:46 schrieb Alexandre DERUMIER:
> > does rollbacking the kernel to previous version fix the problem ?
> 
> The culprit is the used tuned agent from Redhat
> (https://github.com/redhat-performance/tuned). The used profile
> virtual-host results in these problems. Stopping tuned or using another
> profile like throughput-performance everything is fine again.

Interesting discovery.  Have you filed a bug report about it?

> after upgrading a cluster OS, Qemu, ... i'm experiencing slow and
> volatile network speeds inside my VMs.
>
> Currently I've no idea what causes this but it's related to the host
> upgrades. Before i was running Qemu 2.6.2.
>
> I'm using virtio for the network cards.

Stefan




[Qemu-devel] [PATCH v3 0/3] Vhost-user Spec: extension for vhost-pci

2016-12-19 Thread Wei Wang
This spec patch series extends the vhost-user protocol to support the vhost-pci
based inter-VM communication.

v2->v3 changes:
1) replace VHOST_USER_SET_DEV_INFO with VHOST_USER_SET_DEVICE_ID 
2) replace VHOST_USER_SET_PEER_CONNECTION with VHOST_USER_SET_VHOST_PCI

v1->v2 changes:
1) start from the simpler case - change "1-slave-N-master" to "1-slave-1-master"
configuration plane. Accordingly, the previous "uuid", "conn_id" are removed;
2) add the _CREATE_ and _DESTROY_ commands to the VHOST_USER_SET_PEER_CONNECTION
message; and
3) fix the VHOST_USER prefix.

Wei Wang (3):
  spec/vhost-user: extend vhost-user to support the vhost-pci based
inter-vm communiaction
  spec/vhost-user: add VHOST_USER_PROTOCOL_F_SET_DEVICE_ID
  spec/vhost-user: add the VHOST_USER_SET_VHOST_PCI message

 docs/specs/vhost-user.txt | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v3 1/3] spec/vhost-user: extend vhost-user to support the vhost-pci based inter-vm communiaction

2016-12-19 Thread Wei Wang
The protocol feature VHOST_USER_PROTOCOL_F_VHOST_PCI indicates
support for vhost-pci. The vhost-pci extension requires the master side
implementation to support an asynchronous socket read method. This is
used when the slave side vhost-pci device and driver finish the
feature bits negotiation. The negotiated feature bits are sent to the
master. If the feature bits sent by the slave are a subset of the ones
that were sent by the master, the master should perform a reset of the
master device (e.g. virtio_net), to re-negotiate the feature bits using
the ones sent by the slave.

Signed-off-by: Wei Wang 
---
 docs/specs/vhost-user.txt | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index d70bd83..18e49d0 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -17,12 +17,15 @@ The protocol defines 2 sides of the communication, master and slave. Master is
 the application that shares its virtqueues, in our case QEMU. Slave is the
 consumer of the virtqueues.
 
-In the current implementation QEMU is the Master, and the Slave is intended to
+In the traditional implementation QEMU is the master, and the slave is intended to
 be a software Ethernet switch running in user space, such as Snabbswitch.
 
 Master and slave can be either a client (i.e. connecting) or server (listening)
 in the socket communication.
 
+The current vhost-user protocol is extended to support the vhost-pci based inter-VM
+communication. In this case, both the slave and master are QEMU instances.
+
 Message Specification
 -
 
@@ -36,7 +39,7 @@ consists of 3 header fields and a payload:
  * Request: 32-bit type of the request
  * Flags: 32-bit bit field:
- Lower 2 bits are the version (currently 0x01)
-   - Bit 2 is the reply flag - needs to be sent on each reply from the slave
+   - Bit 2 is the reply flag - needs to be sent on each reply
- Bit 3 is the need_reply flag - see VHOST_USER_PROTOCOL_F_REPLY_ACK for
  details.
  * Size - 32-bit size of the payload
@@ -119,9 +122,9 @@ The protocol for vhost-user is based on the existing implementation of vhost
 for the Linux Kernel. Most messages that can be sent via the Unix domain socket
 implementing vhost-user have an equivalent ioctl to the kernel implementation.
 
-The communication consists of master sending message requests and slave sending
-message replies. Most of the requests don't require replies. Here is a list of
-the ones that do:
+Traditionally, the communication consists of master sending message requests and
+slave sending message replies. Most of the requests don't require replies. Here
+is a list of the ones that do:
 
  * VHOST_USER_GET_FEATURES
  * VHOST_USER_GET_PROTOCOL_FEATURES
@@ -130,6 +133,10 @@ the ones that do:
 
 [ Also see the section on REPLY_ACK protocol extension. ]
 
+Currently, the communication also supports the slave actively sending messages
+to the master. Here is a list of them:
+ * VHOST_USER_SET_FEATURES
+
 There are several messages that the master sends with file descriptors passed
 in the ancillary data:
 
@@ -259,6 +266,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD  1
 #define VHOST_USER_PROTOCOL_F_RARP   2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK  3
+#define VHOST_USER_PROTOCOL_F_VHOST_PCI  4
 
 Message types
 -
@@ -279,8 +287,9 @@ Message types
   Id: 2
   Ioctl: VHOST_SET_FEATURES
   Master payload: u64
+  Slave payload: u64
 
-  Enable features in the underlying vhost implementation using a bitmask.
+  Feature bits negotiation between the master and slave using a bitmask.
   Feature bit VHOST_USER_F_PROTOCOL_FEATURES signals slave support for
   VHOST_USER_GET_PROTOCOL_FEATURES and VHOST_USER_SET_PROTOCOL_FEATURES.
 
-- 
2.7.4
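
The flags layout quoted in the spec hunk and the feature-bit rule from the
commit message can be sketched in C as follows. This is only an illustration
under stated assumptions, not code from the series: every SKETCH_/sketch_ name
is hypothetical, and the subset test treats equal feature sets as a subset,
since the commit message does not say whether a reset is needed in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags field layout as described in the spec text (names are hypothetical). */
#define SKETCH_VERSION_MASK     0x3u        /* lower 2 bits: version */
#define SKETCH_REPLY_FLAG       (1u << 2)   /* bit 2: reply flag */
#define SKETCH_NEED_REPLY_FLAG  (1u << 3)   /* bit 3: need_reply flag */

/* Build a flags word from a version and the two flag bits. */
static inline uint32_t sketch_make_flags(uint32_t version, bool reply,
                                         bool need_reply)
{
    assert(version <= SKETCH_VERSION_MASK);
    return version
           | (reply ? SKETCH_REPLY_FLAG : 0u)
           | (need_reply ? SKETCH_NEED_REPLY_FLAG : 0u);
}

/* Extract the protocol version from a flags word. */
static inline uint32_t sketch_flags_version(uint32_t flags)
{
    return flags & SKETCH_VERSION_MASK;
}

/* True when the reply bit is set. */
static inline bool sketch_flags_is_reply(uint32_t flags)
{
    return (flags & SKETCH_REPLY_FLAG) != 0;
}

/*
 * True when every feature bit set by the slave was also offered by the
 * master. Per the commit message, this is the condition under which the
 * master resets its device to re-negotiate with the slave's bits.
 */
static inline bool sketch_features_is_subset(uint64_t slave, uint64_t master)
{
    return (slave & ~master) == 0;
}
```

A master-side handler would call sketch_features_is_subset() on the u64
payload received from the slave and trigger the device reset only when it
returns true.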




[Qemu-devel] [PATCH v3 3/3] spec/vhost-user: add the VHOST_USER_SET_VHOST_PCI message

2016-12-19 Thread Wei Wang
The VHOST_USER_SET_VHOST_PCI message is introduced to start/stop
the vhost-pci based inter-VM communication by the master.

Signed-off-by: Wei Wang 
---
 docs/specs/vhost-user.txt | 12 
 1 file changed, 12 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 80dcfc1..4e2ce60 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -490,6 +490,18 @@ Message types
   This request should be sent only when VHOST_USER_PROTOCOL_F_SET_DEVICE_ID
   has been negotiated.
 
+ * VHOST_USER_SET_VHOST_PCI
+
+  Id: 21
+  Equivalent ioctl: N/A
+  Master payload: u64
+
+  The master requests the slave to start or stop the vhost-pci device for
+  the inter-VM communication.
+  This request should be sent only when VHOST_USER_PROTOCOL_F_VHOST_PCI has
+  been negotiated.
+
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 ---
 The original vhost-user specification only demands replies for certain
-- 
2.7.4




[Qemu-devel] [PATCH v3 2/3] spec/vhost-user: add VHOST_USER_PROTOCOL_F_SET_DEVICE_ID

2016-12-19 Thread Wei Wang
The VHOST_USER_PROTOCOL_F_SET_DEVICE_ID protocol feature indicates
that the slave side implementation supports different types of devices.
The master tells the slave what type of device to create by sending
a VHOST_USER_SET_DEVICE_ID message.

Signed-off-by: Wei Wang 
---
 docs/specs/vhost-user.txt | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 18e49d0..80dcfc1 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -267,6 +267,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_RARP   2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK  3
 #define VHOST_USER_PROTOCOL_F_VHOST_PCI  4
+#define VHOST_USER_PROTOCOL_F_SET_DEVICE_ID  5
 
 Message types
 -
@@ -479,6 +480,16 @@ Message types
   The first 6 bytes of the payload contain the mac address of the guest to
   allow the vhost user backend to construct and broadcast the fake RARP.
 
+ * VHOST_USER_SET_DEVICE_ID
+  Id: 20
+  Equivalent ioctl: N/A
+  Master payload: u64
+
+  The master sends the virtio device id to the slave. The virtio device id
+  indicates the device type of the master device.
+  This request should be sent only when VHOST_USER_PROTOCOL_F_SET_DEVICE_ID
+  has been negotiated.
+
 VHOST_USER_PROTOCOL_F_REPLY_ACK:
 ---
 The original vhost-user specification only demands replies for certain
-- 
2.7.4
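
The gating rule stated for the two new requests ("should be sent only when
... has been negotiated") can be sketched as below. The numeric values are
the ones given in this series; the helper names are hypothetical, and the
sketch assumes each VHOST_USER_PROTOCOL_F_* value is a bit index into the
u64 protocol features mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Protocol feature bit numbers as listed in the spec hunks of this series. */
enum {
    VHOST_USER_PROTOCOL_F_LOG_SHMFD     = 1,
    VHOST_USER_PROTOCOL_F_RARP          = 2,
    VHOST_USER_PROTOCOL_F_REPLY_ACK     = 3,
    VHOST_USER_PROTOCOL_F_VHOST_PCI     = 4,
    VHOST_USER_PROTOCOL_F_SET_DEVICE_ID = 5,
};

/* Request ids introduced by patches 2/3 and 3/3. */
enum {
    VHOST_USER_SET_DEVICE_ID = 20,
    VHOST_USER_SET_VHOST_PCI = 21,
};

/* Hypothetical helper: is bit number `bit` set in the negotiated mask? */
static inline bool sketch_feature_negotiated(uint64_t negotiated, unsigned bit)
{
    assert(bit < 64);
    return ((negotiated >> bit) & 1u) != 0;
}

/*
 * Hypothetical helper: may the master send `request` given the negotiated
 * protocol features? Only the two new requests are gated here; every other
 * request is outside the scope of this sketch and is let through.
 */
static inline bool sketch_master_may_send(uint32_t request, uint64_t negotiated)
{
    switch (request) {
    case VHOST_USER_SET_DEVICE_ID:
        return sketch_feature_negotiated(negotiated,
                                         VHOST_USER_PROTOCOL_F_SET_DEVICE_ID);
    case VHOST_USER_SET_VHOST_PCI:
        return sketch_feature_negotiated(negotiated,
                                         VHOST_USER_PROTOCOL_F_VHOST_PCI);
    default:
        return true;
    }
}
```

Checking at the send site keeps the rule in one place instead of scattering
feature tests across the callers.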




Re: [Qemu-devel] [RFC/POC PATCH 0/4] Building TCG tests with emdebian cross compilers

2016-12-19 Thread Marc-André Lureau
Hi

- Original Message -
> Hi Pranith,
> 
> Here is a proof-of-concept series for you to consider rolling into the TCG
> tests
> cleanup. It uses the existing docker make machinery to build a Debian
> image which has arm, arm64 and ppc64el cross compilers in it. Now if
> you run:
> 
>   make arm-tcg-tests
> 
> It will do the requisite build of the docker image and then use that
> to build the TCG tests in the appropriate build directory.
> 
> These apply on top of your existing series. There is also a quick hack
> to disable the running of the tests by default. I think we need two
> stages, maybe a build-FOO-tcg-tests and run-FOO-tcg-tests.
> 
> What do you think?

I like the idea, as long as you can also run the TCG tests without docker. How
many of the QEMU archs do the Debian cross tools support? Last time I looked
there were annoying limitations, but I can't remember the details.

crosstool-ng perhaps has more potential, especially if devs would share their
config/samples (we could then have cross-distro packaging with flatpak?)

> Alex Bennée (4):
>   tests/docker: add basic user mapping support
>   new tests/docker/dockerfiles/debian-multiarch-cross.docker
>   tests/tcg: don't run tests by default
>   tests/tcg/Makefile: use docker target for arm-tcg-tests
> 
>  tests/docker/docker.py | 19 +++
>  tests/docker/dockerfiles/debian-bootstrap.docker   |  3 ++
>  .../dockerfiles/debian-multiarch-cross.docker  | 39
>  ++
>  tests/tcg/Makefile.include | 20 +--
>  tests/tcg/arm/Makefile |  2 +-
>  tests/tcg/misc/Makefile|  4 +--
>  6 files changed, 81 insertions(+), 6 deletions(-)
>  create mode 100644 tests/docker/dockerfiles/debian-multiarch-cross.docker
> 
> --
> 2.11.0
> 
> 



Re: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup

2016-12-19 Thread Liu, Yi L
> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Monday, December 19, 2016 5:23 PM
> To: Liu, Yi L 
> Cc: Tian, Kevin ; Lan, Tianyu ;
> jasow...@redhat.com; qemu-devel@nongnu.org; Liu, Yi L 
> Subject: Re: [Qemu-devel] [RFC PATCH 00/13] VT-d replay and misc cleanup
> 
> On Sun, Dec 18, 2016 at 04:42:50PM +0800, Liu, Yi L wrote:
> > On Tue, Dec 06, 2016 at 06:36:15PM +0800, Peter Xu wrote:
> > > This RFC series is a continue work for Aviv B.D.'s vfio enablement
> > > series with vt-d. Aviv has done a great job there, and what we still
> > > lack there are mostly the following:
> > >
> > > (1) VFIO got duplicated IOTLB notifications due to splitted VT-d IOMMU
> > > memory region.
> > >
> > > (2) VT-d still haven't provide a correct replay() mechanism (e.g.,
> > > when IOMMU domain switches, things will broke).
> > >
> > > Here I'm trying to solve the above two issues.
> > >
> > > (1) is solved by patch 7, (2) is solved by patch 11-12.
> > >
> > > Basically it contains the following:
> > >
> > > patch 1:picked up from Jason's vhost DMAR series, which is a bugfix
> > >
> > > patch 2-6:  Cleanups/Enhancements for existing vt-d codes (please see
> > > specific commit message for details, there are patches
> > > that I thought may be suitable for 2.8 as well, but looks
> > > like it's too late)
> > >
> > > patch 7:Solve the issue that vfio is notified more than once for
> > > IOTLB notifications with Aviv's patches
> > >
> > > patch 8-10: Some trivial memory APIs added for further patches, and
> > > add customize replay() support for MemoryRegion (I see
> > > Aviv's latest v7 contains similar replay, I can rebase
> > > onto that, merely the same thing)
> > >
> > > patch 11:   Provide a valid vt-d replay() callback, using page walk
> > >
> > Peter,
> > Does your patch set based on Aviv's patch? I found the page cannot be
> > applied in my side.
> 
> Hi, Yi,
> 
> This series is based on Aviv's v6 series. If you wanna try it, you may
> want to fetch the tree from:
> 
>  https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement
> 
> So you won't need to bother on the applying.
> 
Aha, looks like you mentioned your GitHub link previously; I forgot it. I will
try it then. I may rebase my current SVM work. Thanks for your contribution. ^_^

> >
> > BTW. it may be better if you can split the patches for mis cleanup
> > and the patches for replay/"fix duplicate notify".
> 
> Yes. Here I just want to make sure things are stick together (e.g., to
> test the replay, I will need to use the traces). And I feel it awkward
> to maintain several series upstream while they interact with each
> other. Sorry for the troublesome.
> 
I totally understand what bothers you. It'll be fine. Follow your plan~

Thanks,
Yi L

> Thanks,
> 
> -- peterx


Re: [Qemu-devel] [qemu patch V4 2/2] kvmclock: reduce kvmclock difference on migration

2016-12-19 Thread Marcelo Tosatti
On Fri, Dec 16, 2016 at 11:41:36AM -0200, Eduardo Habkost wrote:
> On Fri, Dec 16, 2016 at 11:03:33AM +0100, Paolo Bonzini wrote:
> > I'd like to make a few cleanups and add more documentation:
> > 
> 
> Looks good to me.
> 
> Reviewed-by: Eduardo Habkost 

+1




Re: [Qemu-devel] [PATCH v7 08/11] x86, kvm/x86.c: support vcpu preempted check

2016-12-19 Thread Andrea Arcangeli
Hello,

On Wed, Nov 02, 2016 at 05:08:35AM -0400, Pan Xinhui wrote:
> Support the vcpu_is_preempted() functionality under KVM. This will
> enhance lock performance on overcommitted hosts (more runnable vcpus
> than physical cpus in the system) as doing busy waits for preempted
> vcpus will hurt system performance far worse than early yielding.
> 
> Use one field of struct kvm_steal_time ::preempted to indicate that if
> one vcpu is running or not.
> 
> Signed-off-by: Pan Xinhui 
> Acked-by: Paolo Bonzini 
> ---
>  arch/x86/include/uapi/asm/kvm_para.h |  4 +++-
>  arch/x86/kvm/x86.c   | 16 
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
[..]
> +static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
> +{
> + if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
> + return;
> +
> + vcpu->arch.st.steal.preempted = 1;
> +
> + kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
> + &vcpu->arch.st.steal.preempted,
> + offsetof(struct kvm_steal_time, preempted),
> + sizeof(vcpu->arch.st.steal.preempted));
> +}
> +
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> + kvm_steal_time_set_preempted(vcpu);
>   kvm_x86_ops->vcpu_put(vcpu);
>   kvm_put_guest_fpu(vcpu);
>   vcpu->arch.last_host_tsc = rdtsc();

You can't call kvm_steal_time_set_preempted in atomic context (neither
in the sched_out notifier nor in vcpu_put() after
preempt_disable()). __copy_to_user in kvm_write_guest_offset_cached
schedules and locks up the host.

kvm->srcu (or kvm->slots_lock) is also not taken and
kvm_write_guest_offset_cached needs to call kvm_memslots which
requires it.

This I think is why postcopy live migration locks up with current
upstream, and it doesn't seem related to userfaultfd at all (initially
I suspected the vmf conversion but it wasn't that) and in theory it
can happen with heavy swapping or page migration too.

It's just that the page is written so frequently that it's unlikely to be
swapped out. The page being written so frequently also means it's very likely
to be found re-dirtied when postcopy starts, and that pretty much
guarantees a userfault will trigger a scheduling event in
kvm_steal_time_set_preempted on the destination. So this is unlikely to
reproduce with swapping and very likely to reproduce with postcopy live
migration.

For now I applied the below two patches, but this will just skip the
write and only prevent the host instability, as nobody checks the
retval of __copy_to_user (what happens to the guest after the write is
skipped is not as clear and should be investigated, but at least the
host will survive and not all guests will care about this flag being
updated). For this to be fully safe the preempted information should
be just a hint and not fundamental for correct functionality of the
guest pv spinlock code.
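
The effect of disabling page faults around the write can be illustrated with
a toy user-space model. Nothing here is kernel code: every model_ name is
hypothetical, and the only behaviour borrowed from the kernel is the
convention that the copy routine returns the number of bytes it could not
copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Nesting counter standing in for the pagefault_disable() state. */
static int model_pagefault_disabled;

static inline void model_pagefault_disable(void)
{
    model_pagefault_disabled++;
}

static inline void model_pagefault_enable(void)
{
    assert(model_pagefault_disabled > 0);
    model_pagefault_disabled--;
}

/* A guest page that may or may not be resident in host memory. */
struct model_page {
    bool resident;              /* false: access needs a page fault */
    unsigned char data[64];
};

/*
 * Returns the number of bytes NOT copied. *slept is set when the fault had
 * to be serviced, which is the step that may sleep and is therefore illegal
 * in atomic context.
 */
static inline size_t model_copy_to_user(struct model_page *dst, size_t off,
                                        const void *src, size_t len,
                                        bool *slept)
{
    *slept = false;
    if (off > sizeof(dst->data) || len > sizeof(dst->data) - off) {
        return len;             /* out of range: nothing copied */
    }
    if (!dst->resident) {
        if (model_pagefault_disabled > 0) {
            return len;         /* fault not serviced: write is skipped */
        }
        *slept = true;          /* fault serviced: would schedule */
        dst->resident = true;
    }
    memcpy(dst->data + off, src, len);
    return 0;
}
```

With faults disabled and a non-resident page, the model returns a non-zero
count and leaves the page untouched, which corresponds to the skipped write
described above; a caller that ignores the return value never learns the
flag was not updated.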

This bug was introduced in commit
0b9f6c4615c993d2b552e0d2bd1ade49b56e5beb in v4.9-rc7.

From 458897fd44aa9b91459a006caa4051a7d1628a23 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli 
Date: Sat, 17 Dec 2016 18:43:52 +0100
Subject: [PATCH 1/2] kvm: fix schedule in atomic in
 kvm_steal_time_set_preempted()

kvm_steal_time_set_preempted() doesn't disable page faults before
calling __copy_to_user, and the kernel debug checks notice.

Signed-off-by: Andrea Arcangeli 
---
 arch/x86/kvm/x86.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0d238..2dabaeb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2844,7 +2844,17 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /*
+* Disable page faults because we're in atomic context here.
+* kvm_write_guest_offset_cached() would call might_fault()
+* that relies on pagefault_disable() to tell if there's a
+* bug. NOTE: the write to guest memory may not go through if
+* during postcopy live migration or if there's heavy guest
+* paging.
+*/
+   pagefault_disable();
kvm_steal_time_set_preempted(vcpu);
+   pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();


From 2845eba22ac74c5e313e3b590f9dac33e1b3cfef Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli 
Date: Sat, 17 Dec 2016 19:13:32 +0100
Subject: [PATCH 2/2] kvm: take srcu lock around kvm_steal_time_set_preempted()

kvm_memslots() will be called by kvm_write_guest_offset_cached() so
take the srcu lock.

Signed-off-by: Andrea Arcangeli 
---
 arch/x86/kvm/x86.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2dabaeb..02e6ab4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2844,6 +2844,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   int idx;
/

Re: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map and unmap notifiers

2016-12-19 Thread Liu, Yi L
> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Monday, December 19, 2016 6:01 PM
> To: Liu, Yi L 
> Cc: bd.a...@gmail.com; qemu-devel@nongnu.org; Michael S. Tsirkin
> ; , Jan Kiszka ; , Alex Williamson
> ; , Jason Wang ; Lan,
> Tianyu ; Tian, Kevin 
> Subject: Re: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map
> and unmap notifiers
> 
> On Fri, Dec 16, 2016 at 09:12:05AM +, Liu, Yi L wrote:
> > > From: "Aviv Ben-David" 
> > >
> > > Adds a list of registered vtd_as's to intel iommu state to save
> > > iteration over each PCI device in a search of the corrosponding domain.
> > >
> > > Signed-off-by: Aviv Ben-David 
> > > ---
> > >  hw/i386/intel_iommu.c  | 94
> ++
> > >  hw/i386/intel_iommu_internal.h |  2 +
> > >  include/hw/i386/intel_iommu.h  |  9 
> > >  3 files changed, 98 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > index 05973b9..d872969 100644
> > > --- a/hw/i386/intel_iommu.c
> > > +++ b/hw/i386/intel_iommu.c
> > > @@ -679,7 +679,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce,
> uint64_t
> > > gpa,
> > >  }
> > >  *reads = (*reads) && (slpte & VTD_SL_R);
> > >  *writes = (*writes) && (slpte & VTD_SL_W);
> > > -if (!(slpte & access_right_check)) {
> > > +if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
> > >  VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> > >  "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> > >  (flags & IOMMU_WO ? "write" : "read"), gpa, 
> > > slpte);
> > > @@ -978,6 +978,23 @@ static VTDBus
> *vtd_find_as_from_bus_num(IntelIOMMUState
> > > *s, uint8_t bus_num)
> > >  return vtd_bus;
> > >  }
> > >
> > > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t
> devfn,
> > > +   uint16_t *domain_id)
> > > +{
> > > +VTDContextEntry ce;
> > > +int ret_fr;
> > > +
> > > +assert(domain_id);
> > > +
> > > +ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > > +if (ret_fr) {
> > > +return -1;
> > > +}
> > > +
> > > +*domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> > > +return 0;
> > > +}
> > > +
> > >  /* Do a context-cache device-selective invalidation.
> > >   * @func_mask: FM field after shifting
> > >   */
> > > @@ -1064,6 +1081,45 @@ static void
> vtd_iotlb_domain_invalidate(IntelIOMMUState
> > > *s, uint16_t domain_id)
> > >  &domain_id);
> > >  }
> > >
> > > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > > +   uint16_t domain_id, hwaddr 
> > > addr,
> > > +   uint8_t am)
> > > +{
> > > +IntelIOMMUNotifierNode *node;
> > > +
> > > +QLIST_FOREACH(node, &(s->notifiers_list), next) {
> > Aviv,
> >
> > Regards to the s->notifiers_list, I didn't see the init op to it. Does it 
> > happen
> > in another patch? If so, it may be better to move it in this patch since 
> > this
> > patch introduces both the definition and usage of notifiers_list.
> >
> > If it is already clarified, then ignore it.
> 
> I think it was missing. It IMHO accidentally worked since QLIST_INIT()
> just set the head to NULL and that's what we did when we create the
> IntelIOMMUState object.
> 
> And what's worse - I found this approach may not work if we do
> QLIST_INSERT() in the changed() hook, since if we have more than one
> assigned devices we will only register the first one not the rest. A
> better approach may be traversing the vt-d buses via
> IntelIOMMUState.vtd_as_by_busptr.
> 
Peter,

In October, I also mailed Aviv about using IntelIOMMUState.vtd_as_by_busptr
when trying to connect the vfio notifiers (map/unmap) and the vIOMMU.
However, I reconsidered it later. If I remember correctly,
IntelIOMMUState.vtd_as_by_busptr includes vtd_as not only for assigned
devices but also for virtual devices. When an iotlb invalidation comes to
the vIOMMU, iotlb_inv_desc gives no indication of which device it is for.
So we still need a list that records the vtd_as entries which have to be
looped over. After that thought I kept silent on it.

Now you mention that it may not work in the multi-assigned scenario. Maybe
it's time to reconsider it.
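
To make the trade-off concrete, here is a minimal sketch in plain C of the
dedicated-list approach: only address spaces that registered a notifier are
recorded, and an invalidation for a domain walks just that list. All names
(AddressSpace, NotifierNode, IommuState, register_notifier, notify_domain)
are hypothetical stand-ins and not the QEMU API; the real code would use the
QLIST_* macros and VTDAddressSpace. The sketch assumes that every assigned
device registers its own node, which is exactly the multi-device case that
needs checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-device address space. */
typedef struct AddressSpace {
    uint16_t domain_id;   /* domain the device currently belongs to */
    int notified;         /* number of invalidations delivered so far */
} AddressSpace;

/* One node per registered notifier (assumption: one per assigned device). */
typedef struct NotifierNode {
    AddressSpace *as;
    struct NotifierNode *next;
} NotifierNode;

typedef struct IommuState {
    NotifierNode *notifiers;  /* NULL means "empty list" */
} IommuState;

/* Explicit init; a zero-filled state would look the same, which is why a
 * missing init call can go unnoticed. */
static void iommu_init(IommuState *s)
{
    s->notifiers = NULL;
}

/* Record an address space that wants map/unmap notifications. */
static void register_notifier(IommuState *s, NotifierNode *node,
                              AddressSpace *as)
{
    assert(s && node && as);
    node->as = as;
    node->next = s->notifiers;
    s->notifiers = node;
}

/* Deliver an invalidation to every registered address space in the domain.
 * Returns how many address spaces were notified. */
static int notify_domain(IommuState *s, uint16_t domain_id)
{
    int hits = 0;

    for (NotifierNode *n = s->notifiers; n != NULL; n = n->next) {
        if (n->as->domain_id == domain_id) {
            n->as->notified++;
            hits++;
        }
    }
    return hits;
}
```

With two devices registered in the same domain, notify_domain() reaches both
of them; if only the first device had been inserted into the list, the second
one would silently miss its invalidations.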

Regards,
Yi L


Re: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map and unmap notifiers

2016-12-19 Thread Liu, Yi L
> -Original Message-
> From: Liu, Yi L
> Sent: Friday, December 16, 2016 5:12 PM
> To: bd.a...@gmail.com; qemu-devel@nongnu.org
> Cc: Michael S. Tsirkin ; , Jan Kiszka
> ; , Peter Xu ; , Alex Williamson
> ; , Jason Wang ; Lan,
> Tianyu ; Tian, Kevin ; Liu, Yi L
> 
> Subject: RE: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map
> and unmap notifiers
> 
> > From: "Aviv Ben-David" 
> >
> > Adds a list of registered vtd_as's to intel iommu state to save
> > iteration over each PCI device in a search of the corrosponding domain.
> >
> > Signed-off-by: Aviv Ben-David 
> > ---
> >  hw/i386/intel_iommu.c  | 94
> ++
> >  hw/i386/intel_iommu_internal.h |  2 +
> >  include/hw/i386/intel_iommu.h  |  9 
> >  3 files changed, 98 insertions(+), 7 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 05973b9..d872969 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -679,7 +679,7 @@ static int vtd_gpa_to_slpte(VTDContextEntry *ce,
> uint64_t
> > gpa,
> >  }
> >  *reads = (*reads) && (slpte & VTD_SL_R);
> >  *writes = (*writes) && (slpte & VTD_SL_W);
> > -if (!(slpte & access_right_check)) {
> > +if (!(slpte & access_right_check) && !(flags & IOMMU_NO_FAIL)) {
> >  VTD_DPRINTF(GENERAL, "error: lack of %s permission for "
> >  "gpa 0x%"PRIx64 " slpte 0x%"PRIx64,
> >  (flags & IOMMU_WO ? "write" : "read"), gpa, slpte);
> > @@ -978,6 +978,23 @@ static VTDBus
> *vtd_find_as_from_bus_num(IntelIOMMUState
> > *s, uint8_t bus_num)
> >  return vtd_bus;
> >  }
> >
> > +static int vtd_get_did_dev(IntelIOMMUState *s, uint8_t bus_num, uint8_t
> devfn,
> > +   uint16_t *domain_id)
> > +{
> > +VTDContextEntry ce;
> > +int ret_fr;
> > +
> > +assert(domain_id);
> > +
> > +ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > +if (ret_fr) {
> > +return -1;
> > +}
> > +
> > +*domain_id =  VTD_CONTEXT_ENTRY_DID(ce.hi);
> > +return 0;
> > +}
> > +
> >  /* Do a context-cache device-selective invalidation.
> >   * @func_mask: FM field after shifting
> >   */
> > @@ -1064,6 +1081,45 @@ static void
> vtd_iotlb_domain_invalidate(IntelIOMMUState
> > *s, uint16_t domain_id)
> >  &domain_id);
> >  }
> >
> > +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> > +   uint16_t domain_id, hwaddr addr,
> > +   uint8_t am)
> > +{
> > +IntelIOMMUNotifierNode *node;
> > +
> > +QLIST_FOREACH(node, &(s->notifiers_list), next) {
> Aviv,
> 
> Regards to the s->notifiers_list, I didn't see the init op to it. Does it 
> happen
> in another patch? If so, it may be better to move it in this patch since this
> patch introduces both the definition and usage of notifiers_list.
> 
> If it is already clarified, then ignore it.
> 
> Thanks,
> Yi L
> > +VTDAddressSpace *vtd_as = node->vtd_as;
> > +uint16_t vfio_domain_id;
> > +int ret = vtd_get_did_dev(s, pci_bus_num(vtd_as->bus), 
> > vtd_as->devfn,
> > +  &vfio_domain_id);
> > +
> > +if (!ret && domain_id == vfio_domain_id) {
> > +hwaddr original_addr = addr;
> > +
> > +while (addr < original_addr + (1 << am) * VTD_PAGE_SIZE) {
> > +IOMMUTLBEntry entry = s->iommu_ops.translate(
> > + 
> > &node->vtd_as->iommu,
> > + addr,
> > + IOMMU_NO_FAIL);
> > +
> > +if (entry.perm == IOMMU_NONE &&
> > +node->notifier_flag & IOMMU_NOTIFIER_UNMAP) {
> > +entry.target_as = &address_space_memory;
> > +entry.iova = addr & VTD_PAGE_MASK_4K;
> > +entry.translated_addr = 0;
> > +entry.addr_mask = ~VTD_PAGE_MASK(VTD_PAGE_SHIFT);
> > +memory_region_notify_iommu(&node->vtd_as->iommu, 
> > entry);
> > +addr += VTD_PAGE_SIZE;
> > +} else if (node->notifier_flag & IOMMU_NOTIFIER_MAP) {
> > +memory_region_notify_iommu(&node->vtd_as->iommu,
> > entry);
> > +addr += entry.addr_mask + 1;
> > +}
> > +}
> > +}
> > +}
> > +}
> > +
> >  static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t
> domain_id,
> >hwaddr addr, uint8_t am)
> >  {
> > @@ -1074,6 +1130,8 @@ static void
> vtd_iotlb_page_invalidate(IntelIOMMUState *s,
> > uint16_t domain_id,
> >  info.addr = addr;
> >  info.mask = ~((1 << am) - 1);
> >  g_hash_table_foreach_re

Re: [Qemu-devel] [PATCH 09/21] qcow2: add .bdrv_load_autoloading_dirty_bitmaps

2016-12-19 Thread Vladimir Sementsov-Ogievskiy

16.12.2016 17:37, Max Reitz wrote:

On 14.12.2016 16:54, Vladimir Sementsov-Ogievskiy wrote:

07.12.2016 23:51, Max Reitz wrote:

On 22.11.2016 18:26, Vladimir Sementsov-Ogievskiy wrote:

Auto loading bitmaps are bitmaps in Qcow2, with the AUTO flag set. They
are loaded when the image is opened and become BdrvDirtyBitmaps for the
corresponding drive.

Extra data in bitmaps is not supported for now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   block/Makefile.objs  |   2 +-
   block/qcow2-bitmap.c | 663
+++
   block/qcow2.c|   2 +
   block/qcow2.h|   3 +
   4 files changed, 669 insertions(+), 1 deletion(-)
   create mode 100644 block/qcow2-bitmap.c


[...]


diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
new file mode 100644
index 000..0f797e6
--- /dev/null
+++ b/block/qcow2-bitmap.c
@@ -0,0 +1,663 @@

[...]


+/* Check table entry specification constraints. If cluster_size is
0, offset
+ * alignment is not checked. */
+static int check_table_entry(uint64_t entry, int cluster_size)
+{
+uint64_t offset;
+
+if (entry & BME_TABLE_ENTRY_RESERVED_MASK) {
+return -EINVAL;
+}
+
+offset = entry & BME_TABLE_ENTRY_OFFSET_MASK;
+if (offset != 0) {
+/* if offset specified, bit 0 must is reserved */

-must


+if (entry & 1) {
+return -EINVAL;
+}
+
+if ((cluster_size != 0) && (entry % cluster_size != 0)) {

Why would cluster_size be 0? Also, shouldn't it be offset instead of
entry?

the comment says: "If cluster_size is 0, offset alignment is not checked"

Oops, right. Is there any place where this function is called with
cluster_size being 0, though?


Hmm, I can't find it. Will remove this extra feature.
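
For reference, a minimal sketch of how the check could look once both review
points are applied (the alignment test uses offset instead of entry, and the
cluster_size == 0 special case is gone). The mask values are assumptions
modeled on the qcow2 bitmap table entry layout (bit 0 is the all-ones flag,
bits 1-8 and 56-63 are reserved, bits 9-55 hold the offset); they are not
copied from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed masks, see the note above; the patch may define them differently. */
#define BME_TABLE_ENTRY_RESERVED_MASK 0xff000000000001feULL
#define BME_TABLE_ENTRY_OFFSET_MASK   0x00fffffffffffe00ULL
#define BME_TABLE_ENTRY_FLAG_ALL_ONES (1ULL << 0)

/* Check table entry specification constraints.
 * cluster_size must be positive; offset alignment is always checked. */
static int check_table_entry(uint64_t entry, int cluster_size)
{
    uint64_t offset;

    assert(cluster_size > 0);

    if (entry & BME_TABLE_ENTRY_RESERVED_MASK) {
        return -EINVAL;
    }

    offset = entry & BME_TABLE_ENTRY_OFFSET_MASK;
    if (offset != 0) {
        /* if an offset is specified, bit 0 is reserved */
        if (entry & BME_TABLE_ENTRY_FLAG_ALL_ONES) {
            return -EINVAL;
        }

        /* the offset, not the raw entry, must be cluster aligned */
        if (offset % (uint64_t)cluster_size != 0) {
            return -EINVAL;
        }
    }

    return 0;
}
```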

[...]


--
Best regards,
Vladimir




Re: [Qemu-devel] Is qemu-img amend an atomic operation?

2016-12-19 Thread Maor Lipchuk
On Mon, Dec 19, 2016 at 2:47 PM, Maor Lipchuk  wrote:

> Hi All,
>
> Is amend considered an atomic operation, or should we mark a volume as
> ILLEGAL once the amend operation fails?
>
> Also, if I call amend but downgrade the QCOW volume compatibility level
> from 1.1 to 0.10, is that atomic as well (or not, depending on the answer
> to the previous question)?
>
> Regards,
> Maor
>

Adding also Nir and Kevin to the thread.


[Qemu-devel] Is qemu-img amend an atomic operation?

2016-12-19 Thread Maor Lipchuk
Hi All,

Is amend considered an atomic operation, or should we mark a volume as
ILLEGAL once the amend operation fails?

Also, if I call amend but downgrade the QCOW volume compatibility level
from 1.1 to 0.10, is that atomic as well (or not, depending on the answer
to the previous question)?

Regards,
Maor


Re: [Qemu-devel] Is qemu-img amend an atomic operation?

2016-12-19 Thread Kevin Wolf
Am 19.12.2016 um 13:49 hat Maor Lipchuk geschrieben:
> 
> On Mon, Dec 19, 2016 at 2:47 PM, Maor Lipchuk  wrote:
> 
> Hi All,
> 
> Is amend considered an atomic operation, or should we mark a volume as
> ILLEGAL once the amend operation fails?
> 
> Also, if I call amend but downgrade the QCOW volume compatibility level
> from 1.1 to 0.10, is that atomic as well (or not, depending on the answer
> to the previous question)?
> 
> Regards,
> Maor
> 
> Adding also Nir and Kevin to the thread.

Like with every other operation, the image is supposed to stay
consistent at all times, even if the host crashes in the middle.
Leaked clusters are the worst that should be possible, anything
else would be a bug.

Kevin



[Qemu-devel] [PATCH v6 0/9] replay additions

2016-12-19 Thread Pavel Dovgalyuk
This set of patches includes several fixes for replay and vmstate.

These patches add the rrsnapshot option for icount. The rrsnapshot option
creates an initial snapshot when recording and loads it when replaying. This
preserves the state of the disk images used by the virtual machine. The
saved VM state can also be used to roll back the execution while replaying.

This set of patches includes fixes and additions for icount and
record/replay implementation:
 - VM start/stop in replay mode
 - overlay creation for blkreplay filter
 - rrsnapshot option for record/replay
 - vmstate fix for integratorcp ARM platform
 - vmstate fixes for apic and rtc

v6 changes:
 - Added overlay creation for blkreplay driver
 - Fixed vmstate loading for apic and rtc
 - Fixed instruction counting for apic instruction patching

v5 changes:
 - Recording is stopped when initial snapshot cannot be created
 - Minor changes

v4 changes:
 - Overlay option is removed from blkreplay driver (as suggested by Paolo Bonzini)
 - Minor changes

v3 changes:
 - Added rrsnapshot option for specifying the initial snapshot name (as suggested by Paolo Bonzini)
 - Minor changes

---

Pavel Dovgalyuk (9):
  icount: update instruction counter on apic patching
  replay: improve interrupt handling
  apic: save apic_delivered flag
  replay: don't use rtc clock on loadvm phase
  integratorcp: adding vmstate for save/restore
  savevm: add public save_vmstate function
  replay: save/load initial state
  block: implement bdrv_snapshot_goto for blkreplay
  blkreplay: create temporary overlay for underlaying devices


 block/blkreplay.c   |   84 +++
 cpu-exec.c  |2 -
 docs/replay.txt |   16 +++
 hw/arm/integratorcp.c   |   62 +
 hw/i386/kvmvapic.c  |6 +++
 hw/intc/apic_common.c   |   32 +++
 hw/timer/mc146818rtc.c  |   14 +--
 include/hw/i386/apic_internal.h |2 +
 include/sysemu/replay.h |9 
 include/sysemu/sysemu.h |1 
 migration/savevm.c  |   33 ++-
 qemu-options.hx |8 +++-
 replay/replay-snapshot.c|   17 
 replay/replay.c |5 ++
 stubs/replay.c  |1 
 target-i386/seg_helper.c|1 
 vl.c|9 +++-
 17 files changed, 282 insertions(+), 20 deletions(-)

-- 
Pavel Dovgalyuk



[Qemu-devel] [PATCH v6 1/9] icount: update instruction counter on apic patching

2016-12-19 Thread Pavel Dovgalyuk
kvmvapic patches the code when some instructions are executed.
E.g. mov 0xff, 0xfffe0080 is interpreted as push 0xff/call ...
This patching also has side effects (it changes the apic and guest
memory state). Therefore deterministic execution has to take this
operation into account. This patch decreases icount when the original
mov instruction is trying to execute. Patching thus becomes
deterministic and can be replayed correctly.

Signed-off-by: Pavel Dovgalyuk 
---
 hw/i386/kvmvapic.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index b30d1b9..146d47c 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -412,6 +412,12 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
 if (!kvm_enabled()) {
 cpu_get_tb_cpu_state(env, &current_pc, &current_cs_base,
  &current_flags);
+/* Account this instruction, because we will exit the tb.
+   This is the first instruction in the block. Therefore
+   there is no need in restoring CPU state. */
+if (use_icount) {
+--cs->icount_decr.u16.low;
+}
 }
 
 pause_all_vcpus();




[Qemu-devel] [PATCH v6 2/9] replay: improve interrupt handling

2016-12-19 Thread Pavel Dovgalyuk
This patch improves interrupt handling in record/replay mode.
Now the "interrupt" event is saved only when cc->cpu_exec_interrupt returns
true. This patch also sets the missing return value in the
x86_cpu_exec_interrupt function.
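
The ordering change can be illustrated with a toy model: the event is
written to the log only after the target hook reports that it really handled
the request. The names below (handle_interrupt, log_interrupt_event,
accept_odd, logged_events) are hypothetical and only mirror the shape of the
change; this is not the cpu-exec.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical target hook: returns true if the request was handled. */
typedef bool (*InterruptHandler)(int request);

/* Stand-in for the replay log: counts recorded "interrupt" events. */
static int logged_events;

static void log_interrupt_event(void)
{
    logged_events++;
}

/* Log the event only when the hook actually took the interrupt, so that
 * rejected requests leave no trace in the log. */
static bool handle_interrupt(InterruptHandler handler, int request)
{
    assert(handler != NULL);

    if (handler(request)) {
        log_interrupt_event();
        return true;
    }
    return false;
}

/* Example hook for the sketch: accepts only odd request values. */
static bool accept_odd(int request)
{
    return (request & 1) != 0;
}
```

If the hook forgets to return true for a request it did handle, the event is
never logged, which is why the return value has to be correct as well.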

Signed-off-by: Pavel Dovgalyuk 
---
 cpu-exec.c   |2 +-
 target-i386/seg_helper.c |1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed..fa08c73 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -508,8 +508,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
True when it is, and we should restart on a new TB,
and via longjmp via cpu_loop_exit.  */
 else {
-replay_interrupt();
 if (cc->cpu_exec_interrupt(cpu, interrupt_request)) {
+replay_interrupt();
 *last_tb = NULL;
 }
 /* The target hook may have updated the 'cpu->interrupt_request';
diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
index fb79f31..d24574d 100644
--- a/target-i386/seg_helper.c
+++ b/target-i386/seg_helper.c
@@ -1331,6 +1331,7 @@ bool x86_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 #endif
 if (interrupt_request & CPU_INTERRUPT_SIPI) {
 do_cpu_sipi(cpu);
+ret = true;
 } else if (env->hflags2 & HF2_GIF_MASK) {
 if ((interrupt_request & CPU_INTERRUPT_SMI) &&
 !(env->hflags & HF_SMM_MASK)) {




[Qemu-devel] [PATCH v6 7/9] replay: save/load initial state

2016-12-19 Thread Pavel Dovgalyuk
This patch implements initial vmstate creation or loading at the start
of record/replay. It is needed for rewinding the execution in the replay mode.

v4 changes:
 - snapshots are not created by default anymore

v3 changes:
 - added rrsnapshot option

Signed-off-by: Pavel Dovgalyuk 
---
 docs/replay.txt  |   16 
 include/sysemu/replay.h  |9 +
 qemu-options.hx  |8 ++--
 replay/replay-snapshot.c |   17 +
 replay/replay.c  |5 +
 vl.c |7 ++-
 6 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/docs/replay.txt b/docs/replay.txt
index 347b2ff..03e1931 100644
--- a/docs/replay.txt
+++ b/docs/replay.txt
@@ -196,6 +196,22 @@ is recorded to the log. In replay phase the queue is matched with
 events read from the log. Therefore block devices requests are processed
 deterministically.
 
+Snapshotting
+
+
+New VM snapshots may be created in replay mode. They can be used later
+to recover the desired VM state. All VM states created in replay mode
+are associated with the moment of time in the replay scenario.
+After recovering the VM state replay will start from that position.
+
+Default starting snapshot name may be specified with icount field
+rrsnapshot as follows:
+ -icount shift=7,rr=record,rrfile=replay.bin,rrsnapshot=snapshot_name
+
+This snapshot is created at start of recording and restored at start
+of replaying. It also can be loaded while replaying to roll back
+the execution.
+
 Network devices
 ---
 
diff --git a/include/sysemu/replay.h b/include/sysemu/replay.h
index abb35ca..740b425 100644
--- a/include/sysemu/replay.h
+++ b/include/sysemu/replay.h
@@ -43,6 +43,9 @@ typedef struct ReplayNetState ReplayNetState;
 
 extern ReplayMode replay_mode;
 
+/* Name of the initial VM snapshot */
+extern char *replay_snapshot;
+
 /* Replay process control functions */
 
 /*! Enables recording or saving event log with specified parameters */
@@ -149,4 +152,10 @@ void replay_unregister_net(ReplayNetState *rns);
 void replay_net_packet_event(ReplayNetState *rns, unsigned flags,
  const struct iovec *iov, int iovcnt);
 
+/* VM state operations */
+
+/*! Called at the start of execution.
+Loads or saves initial vmstate depending on execution mode. */
+void replay_vmstate_init(void);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index c534a2f..32c7d2b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3390,12 +3390,12 @@ re-inject them.
 ETEXI
 
 DEF("icount", HAS_ARG, QEMU_OPTION_icount, \
-"-icount [shift=N|auto][,align=on|off][,sleep=on|off,rr=record|replay,rrfile=]\n" \
+"-icount [shift=N|auto][,align=on|off][,sleep=on|off,rr=record|replay,rrfile=,rrsnapshot=]\n" \
 "enable virtual instruction counter with 2^N clock ticks per\n" \
 "instruction, enable aligning the host and virtual clocks\n" \
 "or disable real time cpu sleeping\n", QEMU_ARCH_ALL)
 STEXI
-@item -icount [shift=@var{N}|auto][,rr=record|replay,rrfile=@var{filename}]
+@item -icount [shift=@var{N}|auto][,rr=record|replay,rrfile=@var{filename},rrsnapshot=@var{snapshot}]
 @findex -icount
 Enable virtual instruction counter.  The virtual cpu will execute one
 instruction every 2^@var{N} ns of virtual time.  If @code{auto} is specified
@@ -3428,6 +3428,10 @@ when the shift value is high (how high depends on the host machine).
 When @option{rr} option is specified deterministic record/replay is enabled.
 Replay log is written into @var{filename} file in record mode and
 read from this file in replay mode.
+
+Option rrsnapshot is used to create new vm snapshot named @var{snapshot}
+at the start of execution recording. In replay mode this option is used
+to load the initial VM state.
 ETEXI
 
 DEF("watchdog", HAS_ARG, QEMU_OPTION_watchdog, \
diff --git a/replay/replay-snapshot.c b/replay/replay-snapshot.c
index 4980597..f2cf748 100644
--- a/replay/replay-snapshot.c
+++ b/replay/replay-snapshot.c
@@ -59,3 +59,20 @@ void replay_vmstate_register(void)
 {
 vmstate_register(NULL, 0, &vmstate_replay, &replay_state);
 }
+
+void replay_vmstate_init(void)
+{
+if (replay_snapshot) {
+if (replay_mode == REPLAY_MODE_RECORD) {
+if (save_vmstate(cur_mon, replay_snapshot) != 0) {
+error_report("Could not create snapshot for icount record");
+exit(1);
+}
+} else if (replay_mode == REPLAY_MODE_PLAY) {
+if (load_vmstate(replay_snapshot) != 0) {
+error_report("Could not load snapshot for icount replay");
+exit(1);
+}
+}
+}
+}
diff --git a/replay/replay.c b/replay/replay.c
index 7f27cf1..1835b99 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -26,6 +26,7 @@
 #define HEADER_SIZE (sizeof(uint32_t) + sizeof(uint64_t))
 
 ReplayMode replay_mode = REPLAY

[Qemu-devel] [PATCH v6 3/9] apic: save apic_delivered flag

2016-12-19 Thread Pavel Dovgalyuk
This patch implements saving/restoring of static apic_delivered variable.

Signed-off-by: Pavel Dovgalyuk 
---
 hw/intc/apic_common.c   |   32 
 include/hw/i386/apic_internal.h |2 ++
 2 files changed, 34 insertions(+)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index d78c885..ac6cc67 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -384,6 +384,24 @@ static bool apic_common_sipi_needed(void *opaque)
 return s->wait_for_sipi != 0;
 }
 
+static bool apic_irq_delivered_needed(void *opaque)
+{
+return true;
+}
+
+static void apic_irq_delivered_pre_save(void *opaque)
+{
+APICCommonState *s = APIC_COMMON(opaque);
+s->apic_irq_delivered = apic_irq_delivered;
+}
+
+static int apic_irq_delivered_post_load(void *opaque, int version_id)
+{
+APICCommonState *s = APIC_COMMON(opaque);
+apic_irq_delivered = s->apic_irq_delivered;
+return 0;
+}
+
 static const VMStateDescription vmstate_apic_common_sipi = {
 .name = "apic_sipi",
 .version_id = 1,
@@ -396,6 +414,19 @@ static const VMStateDescription vmstate_apic_common_sipi = {
 }
 };
 
+static const VMStateDescription vmstate_apic_irq_delivered = {
+.name = "apic_irq_delivered",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = apic_irq_delivered_needed,
+.pre_save = apic_irq_delivered_pre_save,
+.post_load = apic_irq_delivered_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_INT32(apic_irq_delivered, APICCommonState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_apic_common = {
 .name = "apic",
 .version_id = 3,
@@ -430,6 +461,7 @@ static const VMStateDescription vmstate_apic_common = {
 },
 .subsections = (const VMStateDescription*[]) {
 &vmstate_apic_common_sipi,
+&vmstate_apic_irq_delivered,
 NULL
 }
 };
diff --git a/include/hw/i386/apic_internal.h b/include/hw/i386/apic_internal.h
index 1209eb4..20ad28c 100644
--- a/include/hw/i386/apic_internal.h
+++ b/include/hw/i386/apic_internal.h
@@ -189,6 +189,8 @@ struct APICCommonState {
 DeviceState *vapic;
 hwaddr vapic_paddr; /* note: persistence via kvmvapic */
 bool legacy_instance_id;
+
+int apic_irq_delivered; /* for saving static variable */
 };
 
 typedef struct VAPICState {




[Qemu-devel] [PATCH v6 4/9] replay: don't use rtc clock on loadvm phase

2016-12-19 Thread Pavel Dovgalyuk
This patch disables the update of the periodic timer of mc146818rtc
in record/replay mode. The state of this timer is saved and therefore
does not need to be updated in record/replay mode.
Reading the RTC breaks the replay because all RTC reads have to be the
same as in record mode.
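
A toy model may help to see why an additional clock read is harmful. In the
sketch below every read made while recording is appended to a log, and every
read made while replaying consumes the next logged value. The names (Mode,
ClockLog, read_clock) are hypothetical and this is not QEMU's replay
implementation; it only assumes that replayed reads must match recorded reads
one to one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MODE_NONE, MODE_RECORD, MODE_PLAY } Mode;

#define LOG_CAPACITY 16

typedef struct ClockLog {
    int64_t values[LOG_CAPACITY];
    size_t count;   /* entries written while recording */
    size_t pos;     /* next entry to consume while replaying */
} ClockLog;

/* Read the clock according to the mode.
 * Returns false when the log is full (record) or exhausted (replay);
 * the latter means that replay has diverged from the recording. */
static bool read_clock(Mode mode, ClockLog *rec, int64_t host_now,
                       int64_t *out)
{
    assert(rec != NULL && out != NULL);

    switch (mode) {
    case MODE_RECORD:
        if (rec->count == LOG_CAPACITY) {
            return false;
        }
        rec->values[rec->count++] = host_now;
        *out = host_now;
        return true;
    case MODE_PLAY:
        if (rec->pos == rec->count) {
            return false;   /* a read that was never recorded */
        }
        *out = rec->values[rec->pos++];
        return true;
    default:
        *out = host_now;    /* no record/replay: use the host clock */
        return true;
    }
}
```

A read that happens only on the replay side, such as one issued while loading
a vmstate, either fails or steals a value that belongs to a later read.
Skipping the read when replay_mode is not REPLAY_MODE_NONE avoids both cases.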

Signed-off-by: Pavel Dovgalyuk 
---
 hw/timer/mc146818rtc.c |   14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
index da209d0..5638777 100644
--- a/hw/timer/mc146818rtc.c
+++ b/hw/timer/mc146818rtc.c
@@ -27,6 +27,7 @@
 #include "hw/hw.h"
 #include "qemu/timer.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/replay.h"
 #include "hw/timer/mc146818rtc.h"
 #include "qapi/visitor.h"
 #include "qapi-event.h"
@@ -734,10 +735,15 @@ static int rtc_post_load(void *opaque, int version_id)
 check_update_timer(s);
 }
 
-uint64_t now = qemu_clock_get_ns(rtc_clock);
-if (now < s->next_periodic_time ||
-now > (s->next_periodic_time + get_max_clock_jump())) {
-periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+/* Periodic timer is deterministic in record/replay mode.
+   No need to update it after loading the vmstate.
+   Reading RTC here may break the execution. */
+if (replay_mode == REPLAY_MODE_NONE) {
+uint64_t now = qemu_clock_get_ns(rtc_clock);
+if (now < s->next_periodic_time ||
+now > (s->next_periodic_time + get_max_clock_jump())) {
+periodic_timer_update(s, qemu_clock_get_ns(rtc_clock));
+}
 }
 
 #ifdef TARGET_I386




[Qemu-devel] [PATCH v6 5/9] integratorcp: adding vmstate for save/restore

2016-12-19 Thread Pavel Dovgalyuk
From: Pavel Dovgalyuk 

The VMState added by this patch allows the integratorcp device state
to be saved and loaded correctly.

Signed-off-by: Pavel Dovgalyuk 
---
 hw/arm/integratorcp.c |   62 +
 1 file changed, 62 insertions(+)

diff --git a/hw/arm/integratorcp.c b/hw/arm/integratorcp.c
index 039812a..ca06e1b 100644
--- a/hw/arm/integratorcp.c
+++ b/hw/arm/integratorcp.c
@@ -53,6 +53,27 @@ static uint8_t integrator_spd[128] = {
0xe, 4, 0x1c, 1, 2, 0x20, 0xc0, 0, 0, 0, 0, 0x30, 0x28, 0x30, 0x28, 0x40
 };
 
+static const VMStateDescription vmstate_integratorcm = {
+.name = "integratorcm",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(cm_osc, IntegratorCMState),
+VMSTATE_UINT32(cm_ctrl, IntegratorCMState),
+VMSTATE_UINT32(cm_lock, IntegratorCMState),
+VMSTATE_UINT32(cm_auxosc, IntegratorCMState),
+VMSTATE_UINT32(cm_sdram, IntegratorCMState),
+VMSTATE_UINT32(cm_init, IntegratorCMState),
+VMSTATE_UINT32(cm_flags, IntegratorCMState),
+VMSTATE_UINT32(cm_nvflags, IntegratorCMState),
+VMSTATE_UINT32(int_level, IntegratorCMState),
+VMSTATE_UINT32(irq_enabled, IntegratorCMState),
+VMSTATE_UINT32(fiq_enabled, IntegratorCMState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static uint64_t integratorcm_read(void *opaque, hwaddr offset,
   unsigned size)
 {
@@ -309,6 +330,19 @@ typedef struct icp_pic_state {
 qemu_irq parent_fiq;
 } icp_pic_state;
 
+static const VMStateDescription vmstate_icp_pic = {
+.name = "icp_pic",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(level, icp_pic_state),
+VMSTATE_UINT32(irq_enabled, icp_pic_state),
+VMSTATE_UINT32(fiq_enabled, icp_pic_state),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static void icp_pic_update(icp_pic_state *s)
 {
 uint32_t flags;
@@ -438,6 +472,17 @@ typedef struct ICPCtrlRegsState {
 #define ICP_INTREG_WPROT(1 << 0)
 #define ICP_INTREG_CARDIN   (1 << 3)
 
+static const VMStateDescription vmstate_icp_control = {
+.name = "icp_control",
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(intreg_state, ICPCtrlRegsState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static uint64_t icp_control_read(void *opaque, hwaddr offset,
  unsigned size)
 {
@@ -640,6 +685,21 @@ static void core_class_init(ObjectClass *klass, void *data)
 
 dc->props = core_properties;
 dc->realize = integratorcm_realize;
+dc->vmsd = &vmstate_integratorcm;
+}
+
+static void icp_pic_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_icp_pic;
+}
+
+static void icp_control_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &vmstate_icp_control;
 }
 
 static const TypeInfo core_info = {
@@ -655,6 +715,7 @@ static const TypeInfo icp_pic_info = {
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(icp_pic_state),
 .instance_init = icp_pic_init,
+.class_init= icp_pic_class_init,
 };
 
 static const TypeInfo icp_ctrl_regs_info = {
@@ -662,6 +723,7 @@ static const TypeInfo icp_ctrl_regs_info = {
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(ICPCtrlRegsState),
 .instance_init = icp_control_init,
+.class_init= icp_control_class_init,
 };
 
 static void integratorcp_register_types(void)




[Qemu-devel] [PATCH v6 8/9] block: implement bdrv_snapshot_goto for blkreplay

2016-12-19 Thread Pavel Dovgalyuk
This patch enables making snapshots with blkreplay used in
block devices.

Signed-off-by: Pavel Dovgalyuk 
---
 block/blkreplay.c |8 
 1 file changed, 8 insertions(+)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index a741654..8a03d62 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -130,6 +130,12 @@ static int coroutine_fn blkreplay_co_flush(BlockDriverState *bs)
 return ret;
 }
 
+static int blkreplay_snapshot_goto(BlockDriverState *bs,
+   const char *snapshot_id)
+{
+return bdrv_snapshot_goto(bs->file->bs, snapshot_id);
+}
+
 static BlockDriver bdrv_blkreplay = {
 .format_name= "blkreplay",
 .protocol_name  = "blkreplay",
@@ -145,6 +151,8 @@ static BlockDriver bdrv_blkreplay = {
 .bdrv_co_pwrite_zeroes  = blkreplay_co_pwrite_zeroes,
 .bdrv_co_pdiscard   = blkreplay_co_pdiscard,
 .bdrv_co_flush  = blkreplay_co_flush,
+
+.bdrv_snapshot_goto = blkreplay_snapshot_goto,
 };
 
 static void bdrv_blkreplay_init(void)




[Qemu-devel] [PATCH v6 9/9] blkreplay: create temporary overlay for underlaying devices

2016-12-19 Thread Pavel Dovgalyuk
This patch allows using '-snapshot' behavior in record/replay mode.
The blkreplay layer creates temporary overlays on top of the underlying
disk images. This is needed because creating an overlay on top of
blkreplay breaks determinism.

Signed-off-by: Pavel Dovgalyuk 
---
 block/blkreplay.c |   76 +
 stubs/replay.c|1 +
 vl.c  |2 +
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/block/blkreplay.c b/block/blkreplay.c
index 8a03d62..172642f 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -14,12 +14,76 @@
 #include "block/block_int.h"
 #include "sysemu/replay.h"
 #include "qapi/error.h"
+#include "qapi/qmp/qstring.h"
 
 typedef struct Request {
 Coroutine *co;
 QEMUBH *bh;
 } Request;
 
+static BlockDriverState *blkreplay_append_snapshot(BlockDriverState *bs,
+   Error **errp)
+{
+int ret;
+BlockDriverState *bs_snapshot;
+int64_t total_size;
+QemuOpts *opts = NULL;
+char tmp_filename[PATH_MAX + 1];
+QDict *snapshot_options = qdict_new();
+
+/* Prepare options QDict for the overlay file */
+qdict_put(snapshot_options, "file.driver",
+  qstring_from_str("file"));
+qdict_put(snapshot_options, "driver",
+  qstring_from_str("qcow2"));
+
+/* Create temporary file */
+ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not get temporary filename");
+goto out;
+}
+qdict_put(snapshot_options, "file.filename",
+  qstring_from_str(tmp_filename));
+
+/* Get the required size from the image */
+total_size = bdrv_getlength(bs);
+if (total_size < 0) {
+error_setg_errno(errp, -total_size, "Could not get image size");
+goto out;
+}
+
+opts = qemu_opts_create(bdrv_qcow2.create_opts, NULL, 0, &error_abort);
+qemu_opt_set_number(opts, BLOCK_OPT_SIZE, total_size, &error_abort);
+ret = bdrv_create(&bdrv_qcow2, tmp_filename, opts, errp);
+qemu_opts_del(opts);
+if (ret < 0) {
+error_prepend(errp, "Could not create temporary overlay '%s': ",
+  tmp_filename);
+goto out;
+}
+
+bs_snapshot = bdrv_open(NULL, NULL, snapshot_options,
+BDRV_O_RDWR | BDRV_O_TEMPORARY, errp);
+snapshot_options = NULL;
+if (!bs_snapshot) {
+ret = -EINVAL;
+goto out;
+}
+
+/* bdrv_append() consumes a strong reference to bs_snapshot (i.e. it will
+ * call bdrv_unref() on it), so in order to be able to return one, we have
+ * to increase bs_snapshot's refcount here */
+bdrv_ref(bs_snapshot);
+bdrv_append(bs_snapshot, bs);
+
+return bs_snapshot;
+
+out:
+QDECREF(snapshot_options);
+return NULL;
+}
+
 static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -35,6 +99,14 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
 goto fail;
 }
 
+/* Add temporary snapshot to preserve the image */
+if (!replay_snapshot
+&& !blkreplay_append_snapshot(bs->file->bs, &local_err)) {
+ret = -EINVAL;
+error_propagate(errp, local_err);
+goto fail;
+}
+
 ret = 0;
 fail:
 if (ret < 0) {
@@ -45,6 +117,10 @@ fail:
 
 static void blkreplay_close(BlockDriverState *bs)
 {
+if (!replay_snapshot) {
+/* Unref created snapshot file */
+bdrv_unref(bs->file->bs);
+}
 }
 
 static int64_t blkreplay_getlength(BlockDriverState *bs)
diff --git a/stubs/replay.c b/stubs/replay.c
index d9a6da9..e43e467 100644
--- a/stubs/replay.c
+++ b/stubs/replay.c
@@ -3,6 +3,7 @@
 #include "sysemu/sysemu.h"
 
 ReplayMode replay_mode;
+char *replay_snapshot;
 
 int64_t replay_save_clock(unsigned int kind, int64_t clock)
 {
diff --git a/vl.c b/vl.c
index f2cb4cf..ebc32e0 100644
--- a/vl.c
+++ b/vl.c
@@ -4479,7 +4479,7 @@ int main(int argc, char **argv, char **envp)
 }
 
 /* open the virtual block devices */
-if (snapshot || replay_mode != REPLAY_MODE_NONE) {
+if (snapshot) {
 qemu_opts_foreach(qemu_find_opts("drive"), drive_enable_snapshot,
   NULL, NULL);
 }




[Qemu-devel] [PATCH v6 6/9] savevm: add public save_vmstate function

2016-12-19 Thread Pavel Dovgalyuk
This patch introduces save_vmstate function to allow saving and loading
vmstates from the replay module.

Signed-off-by: Pavel Dovgalyuk 
---
 include/sysemu/sysemu.h |1 +
 migration/savevm.c  |   33 ++---
 2 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 66c6f15..5b1788f 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -75,6 +75,7 @@ void qemu_add_machine_init_done_notifier(Notifier *notify);
 void qemu_remove_machine_init_done_notifier(Notifier *notify);
 
 void hmp_savevm(Monitor *mon, const QDict *qdict);
+int save_vmstate(Monitor *mon, const char *name);
 int load_vmstate(const char *name);
 void hmp_delvm(Monitor *mon, const QDict *qdict);
 void hmp_info_snapshots(Monitor *mon, const QDict *qdict);
diff --git a/migration/savevm.c b/migration/savevm.c
index 0363372..62c8a40 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2013,38 +2013,40 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
-void hmp_savevm(Monitor *mon, const QDict *qdict)
+int save_vmstate(Monitor *mon, const char *name)
 {
 BlockDriverState *bs, *bs1;
 QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
-int ret;
+int ret = -1;
 QEMUFile *f;
 int saved_vm_running;
 uint64_t vm_state_size;
 qemu_timeval tv;
 struct tm tm;
-const char *name = qdict_get_try_str(qdict, "name");
 Error *local_err = NULL;
 AioContext *aio_context;
 
 if (!bdrv_all_can_snapshot(&bs)) {
 monitor_printf(mon, "Device '%s' is writable but does not "
"support snapshots.\n", bdrv_get_device_name(bs));
-return;
+return ret;
 }
 
 /* Delete old snapshots of the same name */
-if (name && bdrv_all_delete_snapshot(name, &bs1, &local_err) < 0) {
-error_reportf_err(local_err,
-  "Error while deleting snapshot on device '%s': ",
-  bdrv_get_device_name(bs1));
-return;
+if (name) {
+ret = bdrv_all_delete_snapshot(name, &bs1, &local_err);
+if (ret < 0) {
+error_reportf_err(local_err,
+  "Error while deleting snapshot on device '%s': ",
+  bdrv_get_device_name(bs1));
+return ret;
+}
 }
 
 bs = bdrv_all_find_vmstate_bs();
 if (bs == NULL) {
 monitor_printf(mon, "No block device can accept snapshots\n");
-return;
+return ret;
 }
 aio_context = bdrv_get_aio_context(bs);
 
@@ -2053,7 +2055,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 ret = global_state_store();
 if (ret) {
 monitor_printf(mon, "Error saving global state\n");
-return;
+return ret;
 }
 vm_stop(RUN_STATE_SAVE_VM);
 
@@ -2099,13 +2101,22 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 if (ret < 0) {
 monitor_printf(mon, "Error while creating snapshot on '%s'\n",
bdrv_get_device_name(bs));
+goto the_end;
 }
 
+ret = 0;
+
  the_end:
 aio_context_release(aio_context);
 if (saved_vm_running) {
 vm_start();
 }
+return ret;
+}
+
+void hmp_savevm(Monitor *mon, const QDict *qdict)
+{
+save_vmstate(mon, qdict_get_try_str(qdict, "name"));
 }
 
 void qmp_xen_save_devices_state(const char *filename, Error **errp)




Re: [Qemu-devel] [PATCH v7 3/5] IOMMU: enable intel_iommu map and unmap notifiers

2016-12-19 Thread Peter Xu
On Mon, Dec 19, 2016 at 11:53:32AM +, Liu, Yi L wrote:

[...]

> > > Regarding s->notifiers_list, I didn't see it being initialized. Does
> > > that happen in another patch? If so, it may be better to move it into
> > > this patch, since this patch introduces both the definition and the
> > > usage of notifiers_list.
> > >
> > > If it is already clarified, then ignore it.
> > 
> > I think it was missing. It IMHO accidentally worked since QLIST_INIT()
> > just set the head to NULL and that's what we did when we create the
> > IntelIOMMUState object.
> > 
> > And what's worse - I found this approach may not work if we do
> > QLIST_INSERT() in the changed() hook, since if we have more than one
> > assigned devices we will only register the first one not the rest. A
> > better approach may be traversing the vt-d buses via
> > IntelIOMMUState.vtd_as_by_busptr.
> > 
> Peter,
> 
> In October, I also mailed Aviv about using IntelIOMMUState.vtd_as_by_busptr
> when trying to connect the vfio notifiers (map/unmap) and the vIOMMU.
> However, I reconsidered it later. If I remember correctly,
> IntelIOMMUState.vtd_as_by_busptr includes vtd_as not only for assigned
> devices but also for virtual devices. When an iotlb invalidation reaches
> the vIOMMU, iotlb_inv_desc gives no indication of which device it is for,
> so we still need a list to record the vtd_as entries that must be looped
> over. So I kept silent on it after that thought.
> 
> Now, you mentioned it may not work in the multi-assigned scenario. Maybe
> it's time to reconsider it.

Hmm, first parameter of vtd_iommu_notify_flag_changed() is memory
region, and that's per-device. So current approach should work even
with multiple devices. Looks like I made a mistake, sorry. :) 

Thanks,

-- peterx



Re: [Qemu-devel] Is qemu-img amend an atomic operation?

2016-12-19 Thread Maor Lipchuk
On Mon, Dec 19, 2016 at 3:10 PM, Kevin Wolf  wrote:

> On 19.12.2016 at 13:49, Maor Lipchuk wrote:
> >
> > On Mon, Dec 19, 2016 at 2:47 PM, Maor Lipchuk wrote:
> >
> > Hi All,
> >
> > Is amend considered an atomic operation, or should we mark a volume as
> > ILLEGAL once the amend operation fails?
> >
> > Also, if I call amend but downgrade the QCOW volume compatibility level
> > from 1.1 to 0.10, is that atomic as well (or not, based on the answer to
> > the previous question)?
> >
> > Regards,
> > Maor
> >
> > Adding also Nir and Kevin to the thread.
>
> Like with every other operation, the image is supposed to stay
> consistent at all times, even if the host crashes in the middle.
> Leaked clusters are the worst that should be possible, anything
> else would be a bug.
>
> Kevin
>


Great, thank you!


Re: [Qemu-devel] [PATCH] virtio: fix vring->inuse recalc after migr

2016-12-19 Thread Stefan Hajnoczi
On Fri, Dec 16, 2016 at 05:43:29PM +0100, Halil Pasic wrote:
> 
> 
> On 12/16/2016 05:12 PM, Stefan Hajnoczi wrote:
> >> You are not the first one complaining, so the sentence is definitely
> >> bad. What disturbs me regarding your formulation is that we do not use
> >> uint16_t to represent either the ring size or inuse.
> >>
> >> How about "Since max ring size < UINT16_MAX it's safe to use modulo
> >> UINT16_MAX + 1 subtraction."?
> > That doesn't mention "representing the size of the ring" so it's
> > unclear what "safe" means.
> > 
> > Stefan
> > 
> 
> IMHO it is not about representation but about correct arithmetic.
> We introduce the cast not because representing the ring size as
> int is necessarily an issue, but because we ended up with a wrong
> result. In my opinion, how 'inuse' can be represented correctly and
> efficiently is a matter for the member of struct VirtQueue.

Fair enough, the type of VirtQueue.inuse doesn't need to be justified at
this point in the code.

> Here the important point is how conversions between signed and unsigned
> integer types work in C.
> 
> """
> 6.3.1.3 Signed and unsigned integers
> 1 When a value with integer type is converted to another integer
> type other than _Bool, if the value can be represented by the new
> type, it is unchanged.
> 2 Otherwise, if the new type is unsigned, the value is converted by
> repeatedly adding or subtracting one more than the maximum value that
> can be represented in the new type until the value is in the range
> of the new type.
> """
> 
> That is, we get subtraction modulo UINT16_MAX + 1, which is what we
> need if we want to calculate the difference between the two counters
> under the assumption that the actual conceptual difference (that
> is, if the counters were arbitrary natural numbers) is less than or equal
> to the queue size, which is less than UINT16_MAX.

-vdev->vq[i].inuse = vdev->vq[i].last_avail_idx -
-vdev->vq[i].used_idx;
+vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
+vdev->vq[i].used_idx);

Looking at C99 I learnt the cause of the bug is the integer promotion
performed on the uint16_t subtraction operands.  Previously I was only
aware of integer promotion on varargs and function arguments where no
prototype is declared.

So the original statement behaves like:

vdev->vq[i].inuse = (int)vdev->vq[i].last_avail_idx -
(int)vdev->vq[i].used_idx;

This is the real gotcha for me.

If you feel it helps to explain the signed -> unsigned cast behavior,
feel free.  I don't think it's necessary.

Stefan
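
The promotion behavior discussed above can be reproduced in isolation. The following is a minimal sketch, assuming a host where int is wider than 16 bits; struct ring_counters and both helper names are hypothetical stand-ins, not QEMU code.

```c
#include <stdint.h>

/* Hypothetical stand-ins for VirtQueue.last_avail_idx and
 * VirtQueue.used_idx. */
struct ring_counters {
    uint16_t last_avail_idx;
    uint16_t used_idx;
};

/* Both uint16_t operands are promoted to int before the subtraction,
 * so the result is negative once last_avail_idx has wrapped around
 * and used_idx has not. */
int inuse_promoted(const struct ring_counters *c)
{
    return c->last_avail_idx - c->used_idx;
}

/* Converting the int difference back to uint16_t reduces it modulo
 * UINT16_MAX + 1, which gives the real distance between the counters. */
unsigned int inuse_wrapped(const struct ring_counters *c)
{
    return (uint16_t)(c->last_avail_idx - c->used_idx);
}
```

With last_avail_idx = 2 and used_idx = 65534 the promoted form yields -65532, while the cast form yields 4, the number of entries actually in flight.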




Re: [Qemu-devel] [PATCH v1 0/2] Add Atmel I2C TPM AT97SC3204T emulated device

2016-12-19 Thread Corey Minyard

On 12/18/2016 07:47 PM, Alastair D'Silva wrote:

On Fri, 2016-12-16 at 17:35 +, Peter Maydell wrote:

(added a couple of people to cc who might have an opinion on the i2c
protocol questions below)

I'm certainly no expert, but I'll try :)


I know a little bit and I've implemented some stuff, so I'll try, too :).


On 29 November 2016 at 19:30, Fabio Urquiza wrote:



 some more

For recv I'm less sure how it ought to work, so if you can explain
in terms of the i2c protocol what slave h/w behaviour we're trying
to emulate that would help. At what points in the protocol can
the slave return a NAK?

Our current API seems to envisage that the slave can return a
negative value from I2CSlaveClass::recv instead of a data byte,
but I'm not sure what this means in the i2c protocol.

Negative values are propagated upwards, where they are treated as
errors, eg, in hw/i2c/aspeed_i2c.c:aspeed_i2c_bus_handle_cmd():

int ret = i2c_recv(bus->bus);
if (ret < 0) {
 qemu_log_mask(LOG_GUEST_ERROR, "%s: read failed\n", __func__);
 ret = 0xff;
}

The call to i2c_recv is too late to issue the NAK, I believe they occur
during the start_transfer() call.


Indeed, it is.  On the wire, a NAK after the address bits have been sent
terminates the transaction normally.




If I understand your patch correctly, this is adding support
for the slave refusing to ACK when the master sends out the
slave address and r/w bit. I think that makes sense, but rather
than having a state flag in the I2CSlave struct, we should
change the prototype of the I2CSlaveClass event method so that
it can return a value indicating ack or nak.


Hmm, this could end up being quite an invasive change, but ultimately
more elegant. I'm not sure which way the community prefers.


I have a patch that adds a check_event() handler alongside the event()
handler.  If a device wants to send a NAK, it can implement check_event()
instead of event() and return non-zero to NAK.

I toyed with just changing all the event() calls, but there are a bunch.
This seemed like the better approach.  I can send it if you like.

-corey
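
The check_event() idea described above could look roughly like the sketch below. The patch itself was not posted in this thread, so every type and function name here is hypothetical and only illustrates the dispatch rule: prefer the hook that can return a NAK, fall back to the legacy hook otherwise.

```c
#include <stddef.h>

/* All names are hypothetical; this is not QEMU's I2CSlaveClass. */
enum demo_i2c_event {
    DEMO_I2C_START_RECV,
    DEMO_I2C_START_SEND,
    DEMO_I2C_FINISH
};

struct demo_i2c_slave;

struct demo_i2c_slave_ops {
    /* Legacy hook: has no way to report a NAK. */
    void (*event)(struct demo_i2c_slave *s, enum demo_i2c_event ev);
    /* Optional hook: a non-zero return value means NAK. */
    int (*check_event)(struct demo_i2c_slave *s, enum demo_i2c_event ev);
};

struct demo_i2c_slave {
    const struct demo_i2c_slave_ops *ops;
    int busy;
};

/* Deliver an event to the slave. Returns 0 for ACK, non-zero for NAK. */
int demo_i2c_send_event(struct demo_i2c_slave *s, enum demo_i2c_event ev)
{
    if (s->ops->check_event) {
        return s->ops->check_event(s, ev);
    }
    if (s->ops->event) {
        s->ops->event(s, ev);
    }
    return 0;
}

/* Example device: refuses to start a transfer while it is busy. */
int demo_busy_check_event(struct demo_i2c_slave *s, enum demo_i2c_event ev)
{
    return (ev != DEMO_I2C_FINISH && s->busy) ? 1 : 0;
}

const struct demo_i2c_slave_ops demo_busy_ops = {
    .event = NULL,
    .check_event = demo_busy_check_event,
};
```

Keeping event() untouched means existing devices need no changes, which matches the motivation given above for not rewriting all the event() callers.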


thanks
-- PMM






Re: [Qemu-devel] Can qemu reopen image files?

2016-12-19 Thread Stefan Hajnoczi
On Mon, Dec 19, 2016 at 09:07:43AM +0800, Fam Zheng wrote:
> On Sun, 12/18 20:52, Christopher Pereira wrote:
> > Hi,
> > 
> > We are doing a "qemu-img convert" operation (qcow2, with compression) to
> > shorten the backing-chain (in the middle of the backing-chain).
> > In order to force qemu to reopen files, we do a save and restore operation.
> > Is there a faster way to reopen image files using virsh or qemu?
> 
> No, don't use qemu-img when the image is in use by QEMU. You want to use
> "block-commit" command provided by QMP.

It's worth being more explicit here:

You will corrupt the image file if you access it with another program
while QEMU is using it!

Stefan




Re: [Qemu-devel] [PATCH v4 2/4] target-i386: Add Intel HAX files

2016-12-19 Thread Vincent Palatin
On Mon, Dec 19, 2016 at 11:29 AM, Vincent Palatin  wrote:
> That's a forward port of the core HAX interface code from the
> emu-2.2-release branch in the external/qemu-android repository as used by
> the Android emulator.
>
> The original commit was "target-i386: Add Intel HAX to android emulator"
> saying:
> """
>   Backport of 2b3098ff27bab079caab9b46b58546b5036f5c0c
>   from studio-1.4-dev into emu-master-dev
>
> Intel HAX (hardware acceleration) will enhance android emulator performance
> in Windows and Mac OS X in the systems powered by Intel processors with
> "Intel Hardware Accelerated Execution Manager" package installed when
> user runs android emulator with Intel target.
>
> Signed-off-by: David Chou 
> """
>
> It has been modified to build and run along with the current code base.
> The formatting has been fixed to go through scripts/checkpatch.pl,
> and the DPRINTF macros have been updated to get the instantiations checked by
> the compiler.
>
> The FPU registers saving/restoring has been updated to match the current
> QEMU registers layout.
>
> The implementation has been simplified by doing the following modifications:
> - removing the code for supporting the hardware without Unrestricted Guest
>   (UG) mode (including all the code to fallback on TCG emulation).
> - not including the Darwin support (which is not yet debugged/tested).
> - simplifying the initialization by removing the leftovers from the Android
>   specific code, then trimming down the remaining logic.
> - removing the unused MemoryListener callbacks.
>
> Signed-off-by: Vincent Palatin 
> ---
>  hax-stub.c  |   39 ++
>  include/sysemu/hax.h|   56 +++
>  target-i386/hax-all.c   | 1138 +++
>  target-i386/hax-i386.h  |   86 
>  target-i386/hax-interface.h |  358 ++
>  target-i386/hax-mem.c   |  271 +++
>  target-i386/hax-windows.c   |  479 ++
>  target-i386/hax-windows.h   |   89 
>  8 files changed, 2516 insertions(+)
>  create mode 100644 hax-stub.c
>  create mode 100644 include/sysemu/hax.h
>  create mode 100644 target-i386/hax-all.c
>  create mode 100644 target-i386/hax-i386.h
>  create mode 100644 target-i386/hax-interface.h
>  create mode 100644 target-i386/hax-mem.c
>  create mode 100644 target-i386/hax-windows.c
>  create mode 100644 target-i386/hax-windows.h
>
> diff --git a/hax-stub.c b/hax-stub.c
> new file mode 100644
> index 000..a532dba
> --- /dev/null
> +++ b/hax-stub.c
> @@ -0,0 +1,39 @@
> +/*
> + * QEMU HAXM support
> + *
> + * Copyright (c) 2015, Intel Corporation
> + *
> + * Copyright 2016 Google, Inc.
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +#include "cpu.h"
> +#include "sysemu/hax.h"
> +
> +int hax_sync_vcpus(void)
> +{
> +return 0;
> +}
> +
> +int hax_populate_ram(uint64_t va, uint32_t size)
> +{
> +return -ENOSYS;
> +}
> +
> +int hax_init_vcpu(CPUState *cpu)
> +{
> +return -ENOSYS;
> +}
> +
> +int hax_smp_cpu_exec(CPUState *cpu)
> +{
> +return -ENOSYS;
> +}
> diff --git a/include/sysemu/hax.h b/include/sysemu/hax.h
> new file mode 100644
> index 000..51c8fd5
> --- /dev/null
> +++ b/include/sysemu/hax.h
> @@ -0,0 +1,56 @@
> +/*
> + * QEMU HAXM support
> + *
> + * Copyright IBM, Corp. 2008
> + *
> + * Authors:
> + *  Anthony Liguori   
> + *
> + * Copyright (c) 2011 Intel Corporation
> + *  Written by:
> + *  Jiang Yunhong
> + *  Xin Xiaohui
> + *  Zhang Xiantao
> + *
> + * Copyright 2016 Google, Inc.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef QEMU_HAX_H
> +#define QEMU_HAX_H
> +
> +#include "config-host.h"
> +#include "qemu-common.h"
> +
> +int hax_sync_vcpus(void);
> +int hax_init_vcpu(CPUState *cpu);
> +int hax_smp_cpu_exec(CPUState *cpu);
> +int hax_populate_ram(uint64_t va, uint32_t size);
> +
> +void hax_cpu_synchronize_state(CPUState *cpu);
> +void hax_cpu_synchronize_post_reset(CPUState *cpu);
> +void hax_cpu_synchronize_post_init(CPUState *cpu);
> +
> +#ifdef CONFIG_HAX
> +
> +int hax_enabled(void);
> +
> +#include "hw/hw.h"
> +#include "qemu/bitops.h"
> +#include "exec/memory.h"
> +int hax_vcpu_destroy(CPUState *cpu);
> +void hax_raise_event(CPUState *cpu);
> +void hax_reset_vcpu_state(void *opaque);
> +#include "target-i386/hax-interface.h"
> +#include "target-i386/hax-i386.h"
> +
> +#else /* CONFIG_HAX */
> +
> +#define hax_enabled() (0)
> +
> +#endif /* CONFIG_HAX */
> +
> +#endif /* QEMU_HAX_H */
> diff --git a/target-i386/hax-all.c b/target-i386/hax-all.c
> new file mode 100644
> index 000..

[Qemu-devel] [PATCH 1/2] memory: provide common macros for mtree_print_mr()

2016-12-19 Thread Peter Xu
mtree_print_mr() contains some duplicated code. Factor it out into macros.

Signed-off-by: Peter Xu 
---
 memory.c | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/memory.c b/memory.c
index 33110e9..5dcc2e1 100644
--- a/memory.c
+++ b/memory.c
@@ -2450,6 +2450,13 @@ struct MemoryRegionList {
 
 typedef QTAILQ_HEAD(queue, MemoryRegionList) MemoryRegionListHead;
 
+#define MR_CHAR_RD(mr) ((mr)->romd_mode ? 'R' : '-')
+#define MR_CHAR_WR(mr) (!(mr)->readonly && !((mr)->rom_device && \
+ (mr)->romd_mode) ? 'W' : '-')
+#define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
+   int128_sub((size), int128_one())) : 0)
+#define MTREE_INDENT "  "
+
 static void mtree_print_mr(fprintf_function mon_printf, void *f,
const MemoryRegion *mr, unsigned int level,
hwaddr base,
@@ -2465,7 +2472,7 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
 }
 
 for (i = 0; i < level; i++) {
-mon_printf(f, "  ");
+mon_printf(f, MTREE_INDENT);
 }
 
 if (mr->alias) {
@@ -2488,34 +2495,23 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
" (prio %d, %c%c): alias %s @%s " TARGET_FMT_plx
"-" TARGET_FMT_plx "%s\n",
base + mr->addr,
-   base + mr->addr
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   base + mr->addr + MR_SIZE(mr->size),
mr->priority,
-   mr->romd_mode ? 'R' : '-',
-   !mr->readonly && !(mr->rom_device && mr->romd_mode) ? 'W'
-   : '-',
+   MR_CHAR_RD(mr),
+   MR_CHAR_WR(mr),
memory_region_name(mr),
memory_region_name(mr->alias),
mr->alias_offset,
-   mr->alias_offset
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   mr->alias_offset + MR_SIZE(mr->size),
mr->enabled ? "" : " [disabled]");
 } else {
 mon_printf(f,
TARGET_FMT_plx "-" TARGET_FMT_plx " (prio %d, %c%c): %s%s\n",
base + mr->addr,
-   base + mr->addr
-   + (int128_nz(mr->size) ?
-  (hwaddr)int128_get64(int128_sub(mr->size,
-  int128_one())) : 0),
+   base + mr->addr + MR_SIZE(mr->size),
mr->priority,
-   mr->romd_mode ? 'R' : '-',
-   !mr->readonly && !(mr->rom_device && mr->romd_mode) ? 'W'
-   : '-',
+   MR_CHAR_RD(mr),
+   MR_CHAR_WR(mr),
memory_region_name(mr),
mr->enabled ? "" : " [disabled]");
 }
-- 
2.7.4




[Qemu-devel] [PATCH 2/2] memory: hmp: dump flat view for 'info mtree'

2016-12-19 Thread Peter Xu
Dumping the flat view is useful for debugging the memory rendering logic,
and it makes it much easier to see which memory region handles which
address range.

Signed-off-by: Peter Xu 
---
 memory.c | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/memory.c b/memory.c
index 5dcc2e1..a9154aa 100644
--- a/memory.c
+++ b/memory.c
@@ -2545,6 +2545,36 @@ static void mtree_print_mr(fprintf_function mon_printf, void *f,
 }
 }
 
+static void mtree_print_flatview(fprintf_function p, void *f,
+ AddressSpace *as)
+{
+FlatView *view = address_space_get_flatview(as);
+FlatRange *range = &view->ranges[0];
+MemoryRegion *mr;
+int n = view->nr;
+
+if (n <= 0) {
+p(f, MTREE_INDENT "No rendered FlatView for "
+  "address space '%s'\n", as->name);
+return;
+}
+
+p(f, MTREE_INDENT "FlatView (address space '%s'):\n", as->name);
+
+while (n--) {
+mr = range->mr;
+p(f, MTREE_INDENT MTREE_INDENT TARGET_FMT_plx "-"
+  TARGET_FMT_plx " (prio %d, %c%c): %s\n",
+  int128_get64(range->addr.start),
+  int128_get64(range->addr.start) + MR_SIZE(mr->size),
+  mr->priority, MR_CHAR_RD(mr), MR_CHAR_WR(mr),
+  memory_region_name(mr));
+range++;
+}
+
+flatview_unref(view);
+}
+
 void mtree_info(fprintf_function mon_printf, void *f)
 {
 MemoryRegionListHead ml_head;
@@ -2556,6 +2586,7 @@ void mtree_info(fprintf_function mon_printf, void *f)
 QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
 mon_printf(f, "address-space: %s\n", as->name);
 mtree_print_mr(mon_printf, f, as->root, 1, 0, &ml_head);
+mtree_print_flatview(mon_printf, f, as);
 mon_printf(f, "\n");
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH 0/2] memory: extend "info mtree" with flat view dump

2016-12-19 Thread Peter Xu
Each address space has its own flatview. It's another way to observe
memory info besides the default memory region hierarchy. For example,
if we want to know which memory region will handle a write to a
specific address, a flatview suits this better than the default
hierarchical dump.

I used it to debug a vt-d memory region overlap issue. Do we need
this? I think we can at least consider patch 1, which is a cleanup of
existing code. :)

Please review. Thanks,

Peter Xu (2):
  memory: provide common macros for mtree_print_mr()
  memory: hmp: dump flat view for 'info mtree'

 memory.c | 65 +---
 1 file changed, 46 insertions(+), 19 deletions(-)

-- 
2.7.4
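
To illustrate why a flat view answers "which region handles this address" more directly than the hierarchy, here is a minimal sketch of a lookup over sorted, non-overlapping ranges. The struct and function are hypothetical simplifications and are not QEMU's FlatView or FlatRange.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one entry of a flat view. */
struct demo_flat_range {
    uint64_t start;     /* first address covered */
    uint64_t size;      /* number of bytes covered, non-zero */
    const char *owner;  /* name of the region that handles the range */
};

/* Binary search over ranges that are sorted by start address and do
 * not overlap. Returns the range containing addr, or NULL if addr
 * falls into a hole between ranges. */
const struct demo_flat_range *
demo_flat_lookup(const struct demo_flat_range *ranges, size_t n,
                 uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct demo_flat_range *r = &ranges[mid];

        if (addr < r->start) {
            hi = mid;
        } else if (addr - r->start >= r->size) {
            lo = mid + 1;
        } else {
            return r;
        }
    }
    return NULL;
}
```

Because overlaps and priorities are already resolved in the flat list, a single search is enough; with the hierarchical dump the reader has to resolve them by hand.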




Re: [Qemu-devel] [PATCH RFC v2 1/4] block: refactor bdrv_next_node for readability

2016-12-19 Thread Stefan Hajnoczi
On Mon, Dec 19, 2016 at 04:51:23PM +0800, Dou Liyang wrote:
> Make bdrv_next_node() clearer and add some comments.
> 
> Signed-off-by: Dou Liyang 
> ---
>  block.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 39ddea3..01c9e51 100644
> --- a/block.c
> +++ b/block.c
> @@ -2931,12 +2931,20 @@ bool bdrv_chain_contains(BlockDriverState *top, BlockDriverState *base)
>  return top != NULL;
>  }
>  
> +/*
> + * Return the BlockDriverStates of all the named nodes.

This sentence describes how this function is used in a loop.  The
semantics of one call to this function are different: only a *single*
named node's BlockDriverState is returned.

> + * If @bs is null, return the first one.
> + * Else, return @bs's next sibling, which may be null.
> + *
> + * To iterate over all BlockDriverStates, do
> + * for (bs = bdrv_next_node(NULL); bs; bs = bdrv_next_node(bs)) {
> + * ...
> + * }
> + */
>  BlockDriverState *bdrv_next_node(BlockDriverState *bs)
>  {
> -if (!bs) {
> -return QTAILQ_FIRST(&graph_bdrv_states);
> -}
> -return QTAILQ_NEXT(bs, node_list);
> +return bs ? QTAILQ_NEXT(bs, node_list)
> +: QTAILQ_FIRST(&graph_bdrv_states);

The conditional operator (?:) is often considered harder to read than an
if statement.  I see no reason to modify the code.
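
The distinction drawn above between one call and the whole loop can be shown with a small stand-alone iterator. This is a sketch using a hypothetical singly linked list; QEMU keeps its BlockDriverStates on a QTAILQ instead, and none of these names exist there.

```c
#include <stddef.h>

/* Hypothetical node type used only for this illustration. */
struct demo_node {
    const char *name;
    struct demo_node *next;
};

static struct demo_node *demo_first_node;  /* list head, may be NULL */

/* One call returns a single node: the first one if cur is NULL,
 * otherwise the node after cur (NULL at the end of the list). */
struct demo_node *demo_next_node(struct demo_node *cur)
{
    if (!cur) {
        return demo_first_node;
    }
    return cur->next;
}

/* Only the loop around the iterator visits every node. */
size_t demo_count_nodes(void)
{
    size_t n = 0;
    struct demo_node *node;

    for (node = demo_next_node(NULL); node; node = demo_next_node(node)) {
        n++;
    }
    return n;
}
```

Note that the loop passes the current node back into the iterator on each step.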




[Qemu-devel] [PATCH] 9pfs: fix crash when fsdev is missing

2016-12-19 Thread Greg Kurz
If the user passes -device virtio-9p without the corresponding -fsdev, QEMU
dereferences a NULL pointer and crashes.

This is a 2.8 regression introduced by commit 702dbcc274e2c.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index faebd91f5fab..68725b7a1c97 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -3521,7 +3521,7 @@ int v9fs_device_realize_common(V9fsState *s, Error **errp)
 rc = 0;
 out:
 if (rc) {
-if (s->ops->cleanup && s->ctx.private) {
+if (s->ops && s->ops->cleanup && s->ctx.private) {
 s->ops->cleanup(&s->ctx);
 }
 g_free(s->tag);
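
The shape of the fix can be exercised outside QEMU. The sketch below uses hypothetical types that only mirror the pointer chain in the check; it is not the 9pfs code.

```c
#include <stddef.h>

/* Hypothetical types; they only mirror the shape of the check. */
struct demo_ctx {
    void *private_data;
};

struct demo_ops {
    void (*cleanup)(struct demo_ctx *ctx);
};

struct demo_state {
    const struct demo_ops *ops;  /* NULL when no backend was found */
    struct demo_ctx ctx;
};

static int demo_cleanup_calls;

void demo_cleanup(struct demo_ctx *ctx)
{
    ctx->private_data = NULL;
    demo_cleanup_calls++;
}

/* Error path: each pointer in the chain is tested before it is
 * dereferenced, so a state without a backend is handled gracefully. */
void demo_realize_fail(struct demo_state *s)
{
    if (s->ops && s->ops->cleanup && s->ctx.private_data) {
        s->ops->cleanup(&s->ctx);
    }
}
```

Without the leading s->ops test, the first case in which ops is NULL would dereference a null pointer, which is the crash the commit message describes.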




Re: [Qemu-devel] [PATCH v7 08/11] x86, kvm/x86.c: support vcpu preempted check

2016-12-19 Thread Paolo Bonzini


On 19/12/2016 14:56, Pan Xinhui wrote:
> hi, Andrea
> thanks for your reply. :)
> 
> On 2016/12/19 19:42, Andrea Arcangeli wrote:
>> Hello,
>>
>> On Wed, Nov 02, 2016 at 05:08:35AM -0400, Pan Xinhui wrote:
>>> Support the vcpu_is_preempted() functionality under KVM. This will
>>> enhance lock performance on overcommitted hosts (more runnable vcpus
>>> than physical cpus in the system) as doing busy waits for preempted
>>> vcpus will hurt system performance far worse than early yielding.
>>>
>>> Use the ::preempted field of struct kvm_steal_time to indicate whether
>>> a vcpu is running or not.
>>>
>>> Signed-off-by: Pan Xinhui 
>>> Acked-by: Paolo Bonzini 
>>> ---
>>>  arch/x86/include/uapi/asm/kvm_para.h |  4 +++-
>>>  arch/x86/kvm/x86.c   | 16 
>>>  2 files changed, 19 insertions(+), 1 deletion(-)
>>>
>> [..]
>>> +static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>>> +{
>>> +if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
>>> +return;
>>> +
>>> +vcpu->arch.st.steal.preempted = 1;
>>> +
>>> +kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
>>> +&vcpu->arch.st.steal.preempted,
>>> +offsetof(struct kvm_steal_time, preempted),
>>> +sizeof(vcpu->arch.st.steal.preempted));
>>> +}
>>> +
>>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>  {
>>> +kvm_steal_time_set_preempted(vcpu);
>>>  kvm_x86_ops->vcpu_put(vcpu);
>>>  kvm_put_guest_fpu(vcpu);
>>>  vcpu->arch.last_host_tsc = rdtsc();
>>
>> You can't call kvm_steal_time_set_preempted in atomic context (neither
>> in the sched_out notifier nor in vcpu_put() after
>> preempt_disable()). __copy_to_user in kvm_write_guest_offset_cached
>> schedules and locks up the host.
>>
> yes, you are right! :) we have known the problems.
> I am going to introduce something like kvm_write_guest_XXX_atomic and
> use them instead of kvm_write_guest_offset_cached.
> within pagefault_disable()/enable(), we can not call __copy_to_user I
> think.

Since I have already botched the revert and need to resend the fix, I'm
going to apply Andrea's patches.  The preempted flag is only advisory
anyway.

Thanks,

Paolo

>> kvm->srcu (or kvm->slots_lock) is also not taken and
>> kvm_write_guest_offset_cached needs to call kvm_memslots which
>> requires it.
>>
> let me check the details later. thanks for pointing it out.
> 
>> This I think is why postcopy live migration locks up with current
>> upstream, and it doesn't seem related to userfaultfd at all (initially
>> I suspected the vmf conversion but it wasn't that) and in theory it
>> can happen with heavy swapping or page migration too.
>>
>> Just the page is written so frequently it's unlikely to be swapped
>> out. The page being written so frequently also means it's very likely
>> found as re-dirtied when postcopy starts and that pretty much
>> guarantees an userfault will trigger a scheduling event in
>> kvm_steal_time_set_preempted in destination. There are opposite
>> probabilities of reproducing this with swapping vs postcopy live
>> migration.
>>
> 
> Good analysis. :)
> 
>> For now I applied the below two patches, but this just will skip the
>> write and only prevent the host instability as nobody checks the
>> retval of __copy_to_user (what happens to guest after the write is
>> skipped is not as clear and should be investigated, but at least the
>> host will survive and not all guests will care about this flag being
>> updated). For this to be fully safe the preempted information should
>> be just an hint and not fundamental for correct functionality of the
>> guest pv spinlock code.
>>
>> This bug was introduced in commit
>> 0b9f6c4615c993d2b552e0d2bd1ade49b56e5beb in v4.9-rc7.
>>
>> From 458897fd44aa9b91459a006caa4051a7d1628a23 Mon Sep 17 00:00:00 2001
>> From: Andrea Arcangeli 
>> Date: Sat, 17 Dec 2016 18:43:52 +0100
>> Subject: [PATCH 1/2] kvm: fix schedule in atomic in
>>  kvm_steal_time_set_preempted()
>>
>> kvm_steal_time_set_preempted() isn't disabling the pagefaults before
>> calling __copy_to_user and the kernel debug notices.
>>
>> Signed-off-by: Andrea Arcangeli 
>> ---
>>  arch/x86/kvm/x86.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 1f0d238..2dabaeb 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2844,7 +2844,17 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
>>
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>> +/*
>> + * Disable page faults because we're in atomic context here.
>> + * kvm_write_guest_offset_cached() would call might_fault()
>> + * that relies on pagefault_disable() to tell if there's a
>> + * bug. NOTE: the write to guest memory may not go through if
>> + * during postcopy live migration or if there's heavy guest
>> + * paging.
>> + */
>> +pagefault_disable();
>>  kvm_steal_time_set_preempted(vcpu);
>>

[Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region

2016-12-19 Thread Peter Xu
This is preparation work to finally enable dynamic switching ON/OFF of
VT-d protection. The old VT-d code uses a static IOMMU region, and
that won't satisfy vfio-pci device listeners.

Let me explain.

vfio-pci devices depend on the memory region listener and IOMMU replay
mechanism to make sure the device mapping is coherent with the guest
even if there are domain switches. And there are two kinds of domain
switches:

  (1) switch from domain A -> B
  (2) switch from domain A -> no domain (e.g., turn DMAR off)

Case (1) is handled by the context entry invalidation handling by the
VT-d replay logic. What the replay function should do here is to replay
the existing page mappings in domain B.

However for case (2), we don't want to replay any domain mappings - we
just need the default GPA->HPA mappings (the address_space_memory
mapping). And this patch helps on case (2) to build up the mapping
automatically by leveraging the vfio-pci memory listeners.

Another important thing that this patch does is to separate
IR (Interrupt Remapping) from DMAR (DMA Remapping). The IR region should
not depend on the DMAR region (as it did before this patch). It should be
a standalone region, and it should be possible to activate it without
DMAR (which is common behavior for the Linux kernel: by default it enables
IR while leaving DMAR disabled).
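
The switching idea can be modelled in a few lines: the translating region is attached with a higher priority than the system-memory alias, so attaching or detaching it decides which of the two is visible. The sketch below is a hypothetical model, not the QEMU memory API; the priorities 1 and 2 follow the comment in the patch, and the interrupt remapping region is left out for brevity.

```c
#include <stddef.h>

/* Hypothetical model of a subregion inside a container region. */
struct demo_subregion {
    const char *name;
    int priority;
    int present;   /* non-zero while the subregion is attached */
};

/* Among overlapping attached subregions, the highest priority wins.
 * Returns NULL if nothing is attached. */
const struct demo_subregion *
demo_resolve(const struct demo_subregion *subs, size_t n)
{
    const struct demo_subregion *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (subs[i].present &&
            (!best || subs[i].priority > best->priority)) {
            best = &subs[i];
        }
    }
    return best;
}

/* Mirrors the idea of vtd_switch_address_space(): attach or detach the
 * translating region; the system-memory alias stays attached. */
void demo_switch(struct demo_subregion *iommu, int enabled)
{
    iommu->present = enabled;
}
```

With the alias at priority 1 and the IOMMU region at priority 2, turning translation off leaves only the alias, which corresponds to the default GPA->HPA mapping mentioned for case (2).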

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 75 ---
 hw/i386/trace-events  |  3 ++
 include/hw/i386/intel_iommu.h |  2 ++
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5f3e351..75a3f4e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1179,9 +1179,42 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
 vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
 }
 
+static void vtd_switch_address_space(IntelIOMMUState *s, bool enabled)
+{
+GHashTableIter iter;
+VTDBus *vtd_bus;
+VTDAddressSpace *as;
+int i;
+
+g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
+while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
+for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) {
+as = vtd_bus->dev_as[i];
+if (as == NULL) {
+continue;
+}
+trace_vtd_switch_address_space(pci_bus_num(vtd_bus->bus),
+   VTD_PCI_SLOT(i), VTD_PCI_FUNC(i),
+   enabled);
+if (enabled) {
+memory_region_add_subregion_overlap(&as->root, 0,
+&as->iommu, 2);
+} else {
+memory_region_del_subregion(&as->root, &as->iommu);
+}
+}
+}
+}
+
 /* Handle Translation Enable/Disable */
 static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 {
+bool old = s->dmar_enabled;
+
+if (old == en) {
+return;
+}
+
 VTD_DPRINTF(CSR, "Translation Enable %s", (en ? "on" : "off"));
 
 if (en) {
@@ -1196,6 +1229,8 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 /* Ok - report back to driver */
 vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0);
 }
+
+vtd_switch_address_space(s, en);
 }
 
 /* Handle Interrupt Remap Enable/Disable */
@@ -2343,15 +2378,47 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 vtd_dev_as->devfn = (uint8_t)devfn;
 vtd_dev_as->iommu_state = s;
 vtd_dev_as->context_cache_entry.context_cache_gen = 0;
+
+/*
+ * When DMAR is disabled, memory region relationships look
+ * like:
+ *
+ * - (prio 0, RW): vtd_root
+ *  - (prio 1, RW): vtd_sys_alias
+ *  fee0-feef (prio 64, RW): intel_iommu_ir
+ *
+ * When DMAR is enabled, it becomes:
+ *
+ * - (prio 0, RW): vtd_root
+ *  - (prio 2, RW): intel_iommu
+ *  - (prio 1, RW): vtd_sys_alias
+ *  fee0-feef (prio 64, RW): intel_iommu_ir
+ *
+ * The intel_iommu region is dynamically added/removed.
+ */
 memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s),
  &s->iommu_ops, "intel_iommu", UINT64_MAX);
+memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s),
+ "vtd_sys_alias", get_system_memory(),
+ 0, memory_region_size(get_system_memory()));
 memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s),
   &vtd_mem_ir_ops, s, "intel_iommu_ir",
   VTD_INTERRUPT_ADDR_SIZE);
-memory_region_

Re: [Qemu-devel] [PATCH RFC v2 3/4] block/qapi: acquire a reference instead of a lock during querying blockstats

2016-12-19 Thread Stefan Hajnoczi
On Mon, Dec 19, 2016 at 04:51:25PM +0800, Dou Liyang wrote:
> This patch works to improve the performance of the query requests.
> 
> From the commit 13344f3a, it adds a lock to make query-blockstats
> safe by the aio_context_acquire(). the qmp_query_blockstats func
> requires/releases the AioContext lock, which takes some time and
> blocks the I/O processing. It affects the performance, especially
> in the multi-disks guests.
> 
> As the low-level details of block statistics inside QEMU, we can
> acquire a reference instead of the lock.
> 
> Signed-off-by: Dou Liyang 
> ---
>  block/qapi.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)

This patch changes the locking rules for blk_get_stats() (this covers a
lot of fields), bdrv_get_node_name(), and blk_name().

You must document the new locking rules for these fields in
block-backend.h, block_int.h, etc.

> diff --git a/block/qapi.c b/block/qapi.c
> index bc622cd..2262918 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -457,17 +457,21 @@ static BlockStats *bdrv_query_bds_stats(const BlockDriverState *bs,
>  }
>  
>  static BlockStats *bdrv_query_stats(BlockBackend *blk,
> -const BlockDriverState *bs,
> +BlockDriverState *bs,
>  bool query_backing)
>  {
>  BlockStats *s;
>  
> +bdrv_ref(bs);
>  s = bdrv_query_bds_stats(bs, query_backing);
> +bdrv_unref(bs);
>  
>  if (blk) {
> +blk_ref(blk);
>  s->has_device = true;
>  s->device = g_strdup(blk_name(blk));
>  bdrv_query_blk_stats(s->stats, blk);
> +blk_unref(blk);

This does not look correct.  The caller passed in bs and blk so they
must already have a reference.  If not, then what protects bs and blk
from deletion before/after this function is called?

>  }
>  
>  return s;
> @@ -523,13 +527,8 @@ BlockStatsList *qmp_query_blockstats(bool has_query_nodes,
>  
>  while (next_query_bds(&blk, &bs, query_nodes)) {
>  BlockStatsList *info = g_malloc0(sizeof(*info));
> -AioContext *ctx = blk ? blk_get_aio_context(blk)
> -  : bdrv_get_aio_context(bs);
>  
> -aio_context_acquire(ctx);
>  info->value = bdrv_query_stats(blk, bs, !query_nodes);
> -aio_context_release(ctx);
> -
>  *p_next = info;
>  p_next = &info->next;
>  }
> -- 
> 2.5.5
> 
> 
> 
> 




Re: [Qemu-devel] [PATCH RFC v2 0/4] block/qapi: refactor and optimize the qmp_query_blockstats()

2016-12-19 Thread Stefan Hajnoczi
On Mon, Dec 19, 2016 at 04:51:22PM +0800, Dou Liyang wrote:
> These patches aim to refactor the qmp_query_blockstats() and
> improve the performance by reducing the running time of it.
> 
> qmp_query_blockstats() is used to monitor the blockstats, it
> querys all the graph_bdrv_states or monitor_block_backends.
> 
> There are the two jobs:
> 
> 1 For the performance:
> 
> 1.1 the time it takes(ns) in each time:
> the disk numbers | 10| 500
> -
> before these patches | 19429 | 667722 
> after these patches  | 17516 | 557044
> 
> 1.2 the I/O performance is degraded(%) during the monitor:
> 
> the disk numbers | 10| 500
> -
> before these patches | 1.3   | 14.2
> after these patches  | 0.8   | 9.1

Do you know what is consuming the remaining 9.1%?

I'm surprised to see such a high performance impact caused by a QMP
command.

Please post your QEMU command-line.




Re: [Qemu-devel] Can qemu reopen image files?

2016-12-19 Thread Christopher Pereira

Hi Fam, Stefan,

Thanks for answering.

We use "qemu-img convert" to convert an image in the middle of the chain, 
not the active one.
Those images (and the previous ones in the chain) are read-only and 
there should be no risk in converting them:


E.g.: for the following chain:

   base --> snap1 ---> snap2 ---> snap3 (active)

we do "qemu-img convert" on snap2 (readonly), generating a snap2' with 
the same content as snap2.


Then we do the rebase while the VM is suspended to make sure the image 
files are reopened.


Please confirm if I'm missing something here.

We are not using block-commit since we want to have more control (keep 
the base snapshot unmodified, use compression, etc).


Best regards,
Christopher

On 19-Dec-16 10:55, Stefan Hajnoczi wrote:

On Mon, Dec 19, 2016 at 09:07:43AM +0800, Fam Zheng wrote:

On Sun, 12/18 20:52, Christopher Pereira wrote:

Hi,

We are doing a "qemu-img convert" operation (qcow2, with compression) to
shorten the backing-chain (in the middle of the backing-chain).
In order to force qemu to reopen files, we do a save and restore operation.
Is there a faster way to reopen image files using virsh or qemu?

No, don't use qemu-img when the image is in use by QEMU. You want to use
"block-commit" command provided by QMP.

It's worth being more explicit here:

You will corrupt the image file if you access it with another program
while QEMU is using it!

Stefan




Re: [Qemu-devel] [PATCH v7 08/11] x86, kvm/x86.c: support vcpu preempted check

2016-12-19 Thread Pan Xinhui

hi, Andrea
thanks for your reply. :)

On 2016/12/19 19:42, Andrea Arcangeli wrote:

Hello,

On Wed, Nov 02, 2016 at 05:08:35AM -0400, Pan Xinhui wrote:

Support the vcpu_is_preempted() functionality under KVM. This will
enhance lock performance on overcommitted hosts (more runnable vcpus
than physical cpus in the system) as doing busy waits for preempted
vcpus will hurt system performance far worse than early yielding.

Use one field of struct kvm_steal_time ::preempted to indicate that if
one vcpu is running or not.

Signed-off-by: Pan Xinhui 
Acked-by: Paolo Bonzini 
---
 arch/x86/include/uapi/asm/kvm_para.h |  4 +++-
 arch/x86/kvm/x86.c   | 16 
 2 files changed, 19 insertions(+), 1 deletion(-)


[..]

+static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
+{
+   if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+   return;
+
+   vcpu->arch.st.steal.preempted = 1;
+
+   kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.st.stime,
+   &vcpu->arch.st.steal.preempted,
+   offsetof(struct kvm_steal_time, preempted),
+   sizeof(vcpu->arch.st.steal.preempted));
+}
+
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   kvm_steal_time_set_preempted(vcpu);
kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();


You can't call kvm_steal_time_set_preempted in atomic context (neither
in sched_out notifier nor in vcpu_put() after
preempt_disable)). __copy_to_user in kvm_write_guest_offset_cached
schedules and locks up the host.


Yes, you are right! :) We are aware of the problem.
I am going to introduce something like kvm_write_guest_XXX_atomic and use it 
instead of kvm_write_guest_offset_cached.
Within pagefault_disable()/enable(), I think we cannot call __copy_to_user.


kvm->srcu (or kvm->slots_lock) is also not taken and
kvm_write_guest_offset_cached needs to call kvm_memslots which
requires it.


let me check the details later. thanks for pointing it out.


This I think is why postcopy live migration locks up with current
upstream, and it doesn't seem related to userfaultfd at all (initially
I suspected the vmf conversion but it wasn't that) and in theory it
can happen with heavy swapping or page migration too.

Just the page is written so frequently it's unlikely to be swapped
out. The page being written so frequently also means it's very likely
found as re-dirtied when postcopy starts and that pretty much
guarantees an userfault will trigger a scheduling event in
kvm_steal_time_set_preempted in destination. There are opposite
probabilities of reproducing this with swapping vs postcopy live
migration.



Good analysis. :)


For now I applied the below two patches, but this just will skip the
write and only prevent the host instability as nobody checks the
retval of __copy_to_user (what happens to guest after the write is
skipped is not as clear and should be investigated, but at least the
host will survive and not all guests will care about this flag being
updated). For this to be fully safe the preempted information should
be just an hint and not fundamental for correct functionality of the
guest pv spinlock code.

This bug was introduced in commit
0b9f6c4615c993d2b552e0d2bd1ade49b56e5beb in v4.9-rc7.

From 458897fd44aa9b91459a006caa4051a7d1628a23 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli 
Date: Sat, 17 Dec 2016 18:43:52 +0100
Subject: [PATCH 1/2] kvm: fix schedule in atomic in
 kvm_steal_time_set_preempted()

kvm_steal_time_set_preempted() isn't disabling the pagefaults before
calling __copy_to_user and the kernel debug notices.

Signed-off-by: Andrea Arcangeli 
---
 arch/x86/kvm/x86.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f0d238..2dabaeb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2844,7 +2844,17 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)

 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   /*
+* Disable page faults because we're in atomic context here.
+* kvm_write_guest_offset_cached() would call might_fault()
+* that relies on pagefault_disable() to tell if there's a
+* bug. NOTE: the write to guest memory may not go through if
+* during postcopy live migration or if there's heavy guest
+* paging.
+*/
+   pagefault_disable();
kvm_steal_time_set_preempted(vcpu);
+   pagefault_enable();

can we just add this?
I think it is better to modify kvm_steal_time_set_preempted() and let it run 
correctly in atomic context.

thanks
xinhui


kvm_x86_ops->vcpu_put(vcpu);
kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();


From 2845eba22ac74c5e313e3b590f9dac33e1b3cfef Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli 
Date: Sat, 17 Dec 2016 19:13:32 +0100
Subject: [PATCH 2/2] kvm: take srcu lock a

[Qemu-devel] [PATCH] mirror: prevent 'top' mode mirroring when no backing file specified on the destination

2016-12-19 Thread sochin jiang
 Mirroring in 'top' mode without a backing file specified on the target can
 succeed, but ends in disaster.

 For example:
   Migration can succeed in this situation while the virtual machine on the
 destination is no longer usable because its backing file is lost.

 Warn the user earlier and return an error in case of misoperation.

Signed-off-by: sochin jiang 
---
 block/mirror.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index 301ba92..3476696 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1038,6 +1038,12 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
 error_setg(errp, "Sync mode 'incremental' not supported");
 return;
 }
+if (mode == MIRROR_SYNC_MODE_TOP && !backing_bs(target))
+{
+error_setg(errp, "Target Backing required using Sync mode 'top'");
+return;
+}
+
 is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
 base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
 mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 13/21] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2016-12-19 Thread Max Reitz
On 17.12.2016 15:58, Vladimir Sementsov-Ogievskiy wrote:
> 09.12.2016 20:05, Max Reitz wrote:
>> On 22.11.2016 18:26, Vladimir Sementsov-Ogievskiy wrote:
>>> Realize block bitmap storing interface, to allow qcow2 images store
>>> persistent bitmaps.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>  block/qcow2-bitmap.c | 451 
>>> +++
>>>  block/qcow2.c|   1 +
> 
> [...]
> 
>>> +
>>> +/* store_bitmap_data()
>>> + * Store bitmap to image, filling bitmap table accordingly.
>>> + */
>>> +static uint64_t *store_bitmap_data(BlockDriverState *bs,
>>> +   BdrvDirtyBitmap *bitmap,
>>> +   uint32_t *bitmap_table_size, Error 
>>> **errp)
>>> +{
>>> +int ret;
>>> +BDRVQcow2State *s = bs->opaque;
>>> +int64_t sector;
>>> +uint64_t dsc;
>>> +uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
>>> +const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
>>> +uint8_t *buf = NULL;
>>> +BdrvDirtyBitmapIter *dbi;
>>> +uint64_t *tb;
>>> +uint64_t tb_size =
>>> +size_to_clusters(s,
>>> +bdrv_dirty_bitmap_serialization_size(bitmap, 0, bm_size));
>>> +
>>> +if (tb_size > BME_MAX_TABLE_SIZE ||
>>> +tb_size * s->cluster_size > BME_MAX_PHYS_SIZE) {
>> Alignment to the opening parenthesis, please.
>>
>>> +error_setg(errp, "Bitmap '%s' is too big", bm_name);
>>> +return NULL;
>>> +}
>>> +
>>> +tb = g_try_new0(uint64_t, tb_size);
>>> +if (tb == NULL) {
>>> +error_setg(errp, "No memory");
>>> +return NULL;
>>> +}
>>> +
>>> +dbi = bdrv_dirty_iter_new(bitmap, 0);
>>> +buf = g_malloc(s->cluster_size);
>>> +dsc = disk_sectors_in_bitmap_cluster(s, bitmap);
>>> +
>>> +while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
>>> +uint64_t cluster = sector / dsc;
>>> +uint64_t end, write_size;
>>> +int64_t off;
>>> +
>>> +sector = cluster * dsc;
>>> +end = MIN(bm_size, sector + dsc);
>>> +write_size =
>>> +bdrv_dirty_bitmap_serialization_size(bitmap, sector, end - 
>>> sector);
>>> +
>>> +off = qcow2_alloc_clusters(bs, s->cluster_size);
>>> +if (off < 0) {
>>> +error_setg_errno(errp, -off,
>>> + "Failed to allocate clusters for bitmap '%s'",
>>> + bm_name);
>>> +goto fail;
>>> +}
>>> +tb[cluster] = off;
>> Somehow I would feel better with either an assert(cluster < tb_size);
>> here or an assert(bdrv_nb_sectors(bs) / dsc == tb_size); (plus the error
>> handling for bdrv_nb_sectors()) above the loop.
> 
> assert((bm_size - 1) / dsc == tb_size - 1) seems ok. and no additional
> error handling. Right?

Right, bm_size is already equal to bdrv_nb_sectors(bs), and it's not
necessarily a multiple of dsc. So that should be good. Alternatively, I
think the following would be slightly easier to read:

assert(DIV_ROUND_UP(bm_size, dsc) == tb_size);
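
A minimal sketch of why the two assertion forms discussed above are interchangeable, assuming bm_size > 0 and dsc > 0. The helper names are hypothetical and not part of the patch, and DIV_ROUND_UP is defined locally on the assumption that it matches QEMU's usual round-up division macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definition; assumed to behave like QEMU's macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Form proposed in the reply: index of the last cluster vs. last table slot. */
static bool assert_form_a(uint64_t bm_size, uint64_t dsc, uint64_t tb_size)
{
    return (bm_size - 1) / dsc == tb_size - 1;
}

/* Alternative form: number of clusters vs. number of table slots. */
static bool assert_form_b(uint64_t bm_size, uint64_t dsc, uint64_t tb_size)
{
    return DIV_ROUND_UP(bm_size, dsc) == tb_size;
}

/*
 * Exhaustively compare both forms over a small range (hypothetical bounds,
 * chosen only to keep the check cheap). Returns false on the first mismatch.
 */
static bool forms_agree_up_to(uint64_t max_size, uint64_t max_dsc)
{
    for (uint64_t dsc = 1; dsc <= max_dsc; dsc++) {
        for (uint64_t bm_size = 1; bm_size <= max_size; bm_size++) {
            for (uint64_t tb_size = 0; tb_size <= max_size + 1; tb_size++) {
                if (assert_form_a(bm_size, dsc, tb_size) !=
                    assert_form_b(bm_size, dsc, tb_size)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Both forms hold exactly when tb_size is the number of dsc-sized clusters needed to cover bm_size sectors, which also implies cluster < tb_size for every sector below bm_size.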

> 
>>> +
>>> +bdrv_dirty_bitmap_serialize_part(bitmap, buf, sector, end - 
>>> sector);
>>> +if (write_size < s->cluster_size) {
>>> +memset(buf + write_size, 0, s->cluster_size - write_size);
>>> +}
>> Should we assert that write_size <= s->cluster_size?
> 
> Ok
> 
> [...].
> 
>>
>>> +const char *name = bdrv_dirty_bitmap_name(bitmap);
>>> +uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
>>> +Qcow2Bitmap *bm;
>>> +
>>> +if (!bdrv_dirty_bitmap_get_persistance(bitmap)) {
>>> +continue;
>>> +}
>>> +
>>> +if (++new_nb_bitmaps > QCOW2_MAX_BITMAPS) {
>>> +error_setg(errp, "Too many persistent bitmaps");
>>> +goto fail;
>>> +}
>>> +
>>> +new_dir_size += calc_dir_entry_size(strlen(name), 0);
>>> +if (new_dir_size > QCOW2_MAX_BITMAP_DIRECTORY_SIZE) {
>>> +error_setg(errp, "Too large bitmap directory");
>>> +goto fail;
>>> +}
>> You only need to increment new_nb_bitmaps and increase new_dir_size if
>> the bitmap does not already exist in the image (i.e. if
>> find_bitmap_by_name() below returns NULL).
> 
> Why? No, I need to check the whole sum and the whole size.

If the bitmap already exists, you don't create a new directory entry but
reuse the existing one. Therefore, the number of bitmaps in the image
and the directory size will not grow then.

Max

>>> +
>>> +if (check_constraints_on_bitmap(bs, name, granularity) < 0) {
>>> +error_setg(errp, "Bitmap '%s' doesn't satisfy the constraints",
>>> +   name);
>>> +goto fail;
>>> +}
>>> +
>>> +bm = find_bitmap_by_name(bm_list, name);
>>> +if (bm == NULL) {
>>> +bm = g_new0(Qcow2Bitmap, 1);
>>> +bm->name = g_strdup(name);
>>> +QSIMPLEQ_

Re: [Qemu-devel] [PATCH 13/21] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2016-12-19 Thread Vladimir Sementsov-Ogievskiy

19.12.2016 18:14, Max Reitz wrote:

On 17.12.2016 15:58, Vladimir Sementsov-Ogievskiy wrote:

09.12.2016 20:05, Max Reitz wrote:

On 22.11.2016 18:26, Vladimir Sementsov-Ogievskiy wrote:

Realize block bitmap storing interface, to allow qcow2 images store
persistent bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-bitmap.c | 451 +++
  block/qcow2.c|   1 +

[...]


+
+/* store_bitmap_data()
+ * Store bitmap to image, filling bitmap table accordingly.
+ */
+static uint64_t *store_bitmap_data(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap,
+   uint32_t *bitmap_table_size, Error **errp)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+int64_t sector;
+uint64_t dsc;
+uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
+const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
+uint8_t *buf = NULL;
+BdrvDirtyBitmapIter *dbi;
+uint64_t *tb;
+uint64_t tb_size =
+size_to_clusters(s,
+bdrv_dirty_bitmap_serialization_size(bitmap, 0, bm_size));
+
+if (tb_size > BME_MAX_TABLE_SIZE ||
+tb_size * s->cluster_size > BME_MAX_PHYS_SIZE) {

Alignment to the opening parenthesis, please.


+error_setg(errp, "Bitmap '%s' is too big", bm_name);
+return NULL;
+}
+
+tb = g_try_new0(uint64_t, tb_size);
+if (tb == NULL) {
+error_setg(errp, "No memory");
+return NULL;
+}
+
+dbi = bdrv_dirty_iter_new(bitmap, 0);
+buf = g_malloc(s->cluster_size);
+dsc = disk_sectors_in_bitmap_cluster(s, bitmap);
+
+while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
+uint64_t cluster = sector / dsc;
+uint64_t end, write_size;
+int64_t off;
+
+sector = cluster * dsc;
+end = MIN(bm_size, sector + dsc);
+write_size =
+bdrv_dirty_bitmap_serialization_size(bitmap, sector, end - sector);
+
+off = qcow2_alloc_clusters(bs, s->cluster_size);
+if (off < 0) {
+error_setg_errno(errp, -off,
+ "Failed to allocate clusters for bitmap '%s'",
+ bm_name);
+goto fail;
+}
+tb[cluster] = off;

Somehow I would feel better with either an assert(cluster < tb_size);
here or an assert(bdrv_nb_sectors(bs) / dsc == tb_size); (plus the error
handling for bdrv_nb_sectors()) above the loop.

assert((bm_size - 1) / dsc == tb_size - 1) seems ok. and no additional
error handling. Right?

Right, bm_size is already equal to bdrv_nb_sectors(bs), and it's not
necessarily a multiple of dsc. So that should be good. Alternatively, I
think the following would be slightly easier to read:

assert(DIV_ROUND_UP(bm_size, dsc) == tb_size);


+
+bdrv_dirty_bitmap_serialize_part(bitmap, buf, sector, end - sector);
+if (write_size < s->cluster_size) {
+memset(buf + write_size, 0, s->cluster_size - write_size);
+}

Should we assert that write_size <= s->cluster_size?

Ok

[...].


+const char *name = bdrv_dirty_bitmap_name(bitmap);
+uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
+Qcow2Bitmap *bm;
+
+if (!bdrv_dirty_bitmap_get_persistance(bitmap)) {
+continue;
+}
+
+if (++new_nb_bitmaps > QCOW2_MAX_BITMAPS) {
+error_setg(errp, "Too many persistent bitmaps");
+goto fail;
+}
+
+new_dir_size += calc_dir_entry_size(strlen(name), 0);
+if (new_dir_size > QCOW2_MAX_BITMAP_DIRECTORY_SIZE) {
+error_setg(errp, "Too large bitmap directory");
+goto fail;
+}

You only need to increment new_nb_bitmaps and increase new_dir_size if
the bitmap does not already exist in the image (i.e. if
find_bitmap_by_name() below returns NULL).

Why? No, I need to check the whole sum and the whole size.

If the bitmap already exists, you don't create a new directory entry but
reuse the existing one. Therefore, the number of bitmaps in the image
and the directory size will not grow then.


new_nb_bitmaps is not the number of "newly created bitmaps", but just the new 
value of the nb_bitmaps field, so all bitmaps, old and new, are counted 
in new_nb_bitmaps. Anyway, this misunderstanding shows that the variable 
name is bad.




Max


+
+if (check_constraints_on_bitmap(bs, name, granularity) < 0) {
+error_setg(errp, "Bitmap '%s' doesn't satisfy the constraints",
+   name);
+goto fail;
+}
+
+bm = find_bitmap_by_name(bm_list, name);
+if (bm == NULL) {
+bm = g_new0(Qcow2Bitmap, 1);
+bm->name = g_strdup(name);
+QSIMPLEQ_INSERT_TAIL(bm_list, bm, entry);
+} else {
+if (!(bm->flags & BME_FLAG_IN_USE) && can_write(bs)) {

Shouldn't we error out right at the beginning of this function i

Re: [Qemu-devel] [PATCH] mirror: prevent 'top' mode mirroring when no backing file specified on the destination

2016-12-19 Thread Eric Blake
On 12/19/2016 04:38 PM, sochin jiang wrote:
>  Mirroring using 'top' mode without backing file specified on the target can 
> be success,
>  but end with a disaster.
> 
>  For example:
>Migration can be success in this situation while the virtual machine on 
> the destination
>  is no longer usable because of backing lost.
> 

I think this is a premature policy decision.  Even though the user can
abuse it to lose data, I think there is still enough technical reason
for why a user might have a valid use case for doing this (perhaps for
creating incremental backups, where the user plans on re-chaining data
back together using 'qemu-img rebase -u'), so I don't think we should
forbid it at the qemu level.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v1 0/2] Add Atmel I2C TPM AT97SC3204T emulated device

2016-12-19 Thread Peter Maydell
On 19 December 2016 at 13:55, Corey Minyard  wrote:
> On 12/18/2016 07:47 PM, Alastair D'Silva wrote:
>>
>> On Fri, 2016-12-16 at 17:35 +, Peter Maydell wrote:
>>> Our current API seems to envisage that the slave can return a
>>> negative value from I2CSlaveClass::recv instead of a data byte,
>>> but I'm not sure what this means in the i2c protocol.
>>
>> Negative values are propagated upwards, where they are treated as
>> errors, eg, in hw/i2c/aspeed_i2c.c:aspeed_i2c_bus_handle_cmd():
>>
>> int ret = i2c_recv(bus->bus);
>> if (ret < 0) {
>>  qemu_log_mask(LOG_GUEST_ERROR, "%s: read failed\n", __func__);
>>  ret = 0xff;
>> }
>>
>> The call to i2c_recv is too late to issue the NAK, I believe they occur
>> during the start_transfer() call.

OK, so if returning negative values from i2c_recv() isn't
the device saying "I am NAKing this", what *do* they mean?

>>> If I understand your patch correctly, this is adding support
>>> for the slave refusing to ACK when the master sends out the
>>> slave address and r/w bit. I think that makes sense, but rather
>>> than having a state flag in the I2CSlave struct, we should
>>> change the prototype of the I2CSlaveClass event method so that
>>> it can return a value indicating ack or nak.
>>>
>> Hmm, this could end up being quite an invasive change, but ultimately
>> more elegant. I'm not sure which way the community prefers.
>
>
> I have a patch that adds a check_event() handler along side the event()
> handler.
> If a device wants to send a NAK, it can implement check_event() instead of
> event() and return non-zero to NAK.
>
> I toyed with just changing all the event() calls, but there are a bunch.
> This seemed
> like the better approach.  I can send if you like.

It looks like there are only a dozen or so. I think it would
be better for the long term just to change the event calls.
We should also document in the comments in the I2CSlaveClass
struct definition exactly what the semantics of the various
functions are.

thanks
-- PMM
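
A rough sketch of the idea discussed above, where the event callback itself returns ACK or NAK instead of the slave keeping a state flag. Every name here (demo_i2c_slave, demo_start_transfer, and so on) is hypothetical and only illustrates the shape of such an interface; it is not QEMU's I2CSlaveClass API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus events, loosely modelled on start/finish notifications. */
enum demo_i2c_event {
    DEMO_I2C_START_RECV,
    DEMO_I2C_START_SEND,
    DEMO_I2C_FINISH
};

struct demo_i2c_slave;

/* Event callback: returns 0 to ACK, non-zero to NAK. */
typedef int (*demo_event_fn)(struct demo_i2c_slave *s, enum demo_i2c_event ev);

struct demo_i2c_slave {
    unsigned char address;  /* 7-bit bus address */
    bool busy;              /* device-specific reason to refuse a transfer */
    demo_event_fn event;
};

/* Example device: refuses to start a transfer while it is busy. */
static int demo_dev_event(struct demo_i2c_slave *s, enum demo_i2c_event ev)
{
    if (ev != DEMO_I2C_FINISH && s->busy) {
        return -1; /* NAK the address byte */
    }
    return 0;      /* ACK */
}

/*
 * Bus side: find the addressed device and propagate its ACK/NAK decision.
 * Returns 0 on ACK, non-zero if no device answers or the device NAKs.
 */
static int demo_start_transfer(struct demo_i2c_slave *devs, size_t n,
                               unsigned char addr, bool is_recv)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].address == addr) {
            return devs[i].event(&devs[i], is_recv ? DEMO_I2C_START_RECV
                                                   : DEMO_I2C_START_SEND);
        }
    }
    return -1; /* nobody at this address: the master sees a NAK */
}
```

With this shape the NAK is reported at the start of the transfer, which is where the thread says it has to happen, rather than from the later receive call.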



Re: [Qemu-devel] [PATCH] mirror: prevent 'top' mode mirroring when no backing file specified on the destination

2016-12-19 Thread Max Reitz
On 19.12.2016 23:38, sochin jiang wrote:
>  Mirroring using 'top' mode without backing file specified on the target can 
> be success,
>  but end with a disaster.
> 
>  For example:
>Migration can be success in this situation while the virtual machine on 
> the destination
>  is no longer usable because of backing lost.
> 
>  Remind the user earlier and return error in case of misoperation.
> 
> Signed-off-by: sochin jiang 
> ---
>  block/mirror.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 301ba92..3476696 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1038,6 +1038,12 @@ void mirror_start(const char *job_id, BlockDriverState 
> *bs,
>  error_setg(errp, "Sync mode 'incremental' not supported");
>  return;
>  }
> +if (mode == MIRROR_SYNC_MODE_TOP && !backing_bs(target))
> +{

Syntactic issue: The opening brace should be on the same line as the "if".

> +error_setg(errp, "Target Backing required using Sync mode 'top'");
> +return;
> +}
> +
>  is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
>  base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
>  mirror_start_job(job_id, bs, BLOCK_JOB_DEFAULT, target, replaces,

General issue: For blockdev-mirror, I think this is a legitimate use
case. As far as I'm aware, libvirt uses the mirror block job for all
backups -- they do so by cancelling the block job after the
BLOCK_JOB_READY event instead of letting it complete. So a user might
want to mirror some drive somewhere else (in sync=top mode, with the new
location not yet having assigned a backing file to it), then cancel the
block job after BLOCK_JOB_READY and only assign the backing file at some
later point.

An even greater issue is that qmp_drive_mirror() opens the target BDS
with BDRV_O_NO_BACKING. Therefore, this will always error out with
drive-mirror and sync=top (unless the source image does not have a
backing file, in which case the sync=top will silently be converted to
sync=full).

For drive-mirror, the target's backing chain will not be set up until
mirror_complete().

Max





Re: [Qemu-devel] [PATCH 13/21] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2016-12-19 Thread Max Reitz
On 19.12.2016 16:26, Vladimir Sementsov-Ogievskiy wrote:
> 19.12.2016 18:14, Max Reitz wrote:
>> On 17.12.2016 15:58, Vladimir Sementsov-Ogievskiy wrote:
>>> 09.12.2016 20:05, Max Reitz wrote:
>>>> On 22.11.2016 18:26, Vladimir Sementsov-Ogievskiy wrote:
> Realize block bitmap storing interface, to allow qcow2 images store
> persistent bitmaps.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>   block/qcow2-bitmap.c | 451
> +++
>   block/qcow2.c|   1 +
>>> [...]
>>>
> +
> +/* store_bitmap_data()
> + * Store bitmap to image, filling bitmap table accordingly.
> + */
> +static uint64_t *store_bitmap_data(BlockDriverState *bs,
> +   BdrvDirtyBitmap *bitmap,
> +   uint32_t *bitmap_table_size,
> Error **errp)
> +{
> +int ret;
> +BDRVQcow2State *s = bs->opaque;
> +int64_t sector;
> +uint64_t dsc;
> +uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
> +const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
> +uint8_t *buf = NULL;
> +BdrvDirtyBitmapIter *dbi;
> +uint64_t *tb;
> +uint64_t tb_size =
> +size_to_clusters(s,
> +bdrv_dirty_bitmap_serialization_size(bitmap, 0,
> bm_size));
> +
> +if (tb_size > BME_MAX_TABLE_SIZE ||
> +tb_size * s->cluster_size > BME_MAX_PHYS_SIZE) {
>>>> Alignment to the opening parenthesis, please.

> +error_setg(errp, "Bitmap '%s' is too big", bm_name);
> +return NULL;
> +}
> +
> +tb = g_try_new0(uint64_t, tb_size);
> +if (tb == NULL) {
> +error_setg(errp, "No memory");
> +return NULL;
> +}
> +
> +dbi = bdrv_dirty_iter_new(bitmap, 0);
> +buf = g_malloc(s->cluster_size);
> +dsc = disk_sectors_in_bitmap_cluster(s, bitmap);
> +
> +while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
> +uint64_t cluster = sector / dsc;
> +uint64_t end, write_size;
> +int64_t off;
> +
> +sector = cluster * dsc;
> +end = MIN(bm_size, sector + dsc);
> +write_size =
> +bdrv_dirty_bitmap_serialization_size(bitmap, sector,
> end - sector);
> +
> +off = qcow2_alloc_clusters(bs, s->cluster_size);
> +if (off < 0) {
> +error_setg_errno(errp, -off,
> + "Failed to allocate clusters for
> bitmap '%s'",
> + bm_name);
> +goto fail;
> +}
> +tb[cluster] = off;
>>>> Somehow I would feel better with either an assert(cluster < tb_size);
>>>> here or an assert(bdrv_nb_sectors(bs) / dsc == tb_size); (plus the error
>>>> handling for bdrv_nb_sectors()) above the loop.
>>> assert((bm_size - 1) / dsc == tb_size - 1) seems ok. and no additional
>>> error handling. Right?
>> Right, bm_size is already equal to bdrv_nb_sectors(bs), and it's not
>> necessarily a multiple of dsc. So that should be good. Alternatively, I
>> think the following would be slightly easier to read:
>>
>> assert(DIV_ROUND_UP(bm_size, dsc) == tb_size);
>>
> +
> +bdrv_dirty_bitmap_serialize_part(bitmap, buf, sector, end
> - sector);
> +if (write_size < s->cluster_size) {
> +memset(buf + write_size, 0, s->cluster_size -
> write_size);
> +}
>>>> Should we assert that write_size <= s->cluster_size?
>>> Ok
>>>
>>> [...].
>>>
> +const char *name = bdrv_dirty_bitmap_name(bitmap);
> +uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
> +Qcow2Bitmap *bm;
> +
> +if (!bdrv_dirty_bitmap_get_persistance(bitmap)) {
> +continue;
> +}
> +
> +if (++new_nb_bitmaps > QCOW2_MAX_BITMAPS) {
> +error_setg(errp, "Too many persistent bitmaps");
> +goto fail;
> +}
> +
> +new_dir_size += calc_dir_entry_size(strlen(name), 0);
> +if (new_dir_size > QCOW2_MAX_BITMAP_DIRECTORY_SIZE) {
> +error_setg(errp, "Too large bitmap directory");
> +goto fail;
> +}
>>>> You only need to increment new_nb_bitmaps and increase new_dir_size if
>>>> the bitmap does not already exist in the image (i.e. if
>>>> find_bitmap_by_name() below returns NULL).
>>> Why? No, I need to check the whole sum and the whole size.
>> If the bitmap already exists, you don't create a new directory entry but
>> reuse the existing one. Therefore, the number of bitmaps in the image
>> and the directory size will not grow then.
> 
> new_nb_bitmaps is not number of "newly created bitmaps", but just new
> value of field nb_bitmaps, so, all bit

[Qemu-devel] [PATCH v2 1/1] virtio: fix vq->inuse recalc after migr

2016-12-19 Thread Halil Pasic
Correct recalculation of vq->inuse after migration for the corner case
where the avail_idx has already wrapped but used_idx not yet.

Also change the type of the VirtQueue.inuse to unsigned int. This is
done to be consistent with other members representing sizes (VRing.num),
and because C99 guarantees max ring size < UINT_MAX but does not
guarantee max ring size < INT_MAX.

Signed-off-by: Halil Pasic 
Fixes: bccdef6b ("virtio: recalculate vq->inuse after migration")
CC: qemu-sta...@nongnu.org
---
v1 -> v2:
* Reworded comment explaining the cast. (thanks Stefan)
* Changed type of vq->inuse from signed to unsigned
* Fixed misnomer %s/vring->inuse/vq->inuse/
---
 hw/virtio/virtio.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 1af2de2..e37641a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -92,7 +92,7 @@ struct VirtQueue
 
 uint16_t queue_index;
 
-int inuse;
+unsigned int inuse;
 
 uint16_t vector;
 VirtIOHandleOutput handle_output;
@@ -1855,9 +1855,11 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
 /*
  * Some devices migrate VirtQueueElements that have been popped
  * from the avail ring but not yet returned to the used ring.
+ * Since max ring size < UINT16_MAX it's safe to use modulo
+ * UINT16_MAX + 1 subtraction.
  */
-vdev->vq[i].inuse = vdev->vq[i].last_avail_idx -
-vdev->vq[i].used_idx;
+vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
+vdev->vq[i].used_idx);
 if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
 error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
  "used_idx 0x%x",
-- 
2.8.4
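
A small standalone sketch of the corner case described in the commit message, assuming both indices are free-running 16-bit counters and that int is wider than 16 bits. The function names are hypothetical and do not appear in hw/virtio/virtio.c:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Without the cast: both uint16_t operands are promoted to int, so the
 * difference is negative once last_avail_idx has wrapped and used_idx has
 * not. Converted to unsigned int, that becomes a huge bogus count.
 */
static unsigned int inuse_naive(uint16_t last_avail_idx, uint16_t used_idx)
{
    return last_avail_idx - used_idx;
}

/*
 * With the cast: the result is reduced modulo UINT16_MAX + 1, which yields
 * the true number of in-flight elements as long as the ring holds fewer
 * than UINT16_MAX + 1 entries.
 */
static unsigned int inuse_wrapped(uint16_t last_avail_idx, uint16_t used_idx)
{
    return (uint16_t)(last_avail_idx - used_idx);
}
```

For example, with last_avail_idx at 2 (already wrapped) and used_idx at 65534, the wrapped form reports 4 elements in flight, while the naive form yields a value far above any valid ring size and would trip the size check that follows in virtio_load().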




Re: [Qemu-devel] Can qemu reopen image files?

2016-12-19 Thread Eric Blake
On 12/19/2016 09:03 AM, Christopher Pereira wrote:
> Hi Fam, Stefan,
> 
> Thanks for answering.
> 
> We use "qemu-img convert" to convert a image in the middle of the chain,
> not the active one.
> Those images (and the previous ones in the chain) are read-only and
> there should be no risk in converting them:
> 
> E.g.: for the following chain:
> 
>base --> snap1 ---> snap2 ---> snap3 (active)

We typically write that as:

base <- snap1 <- snap2 <- snap3

where the <- operator can be pronounced "backs", and where the direction
of the arrow shows that it is snap1 that depends on base (and not base
that depends on snap1).

> 
> we do "qemu-img convert" on snap2 (readonly), generating a snap2' with
> the same content as snap2.

That part is fine, but why not use QMP to let qemu generate a single
file with the same contents, instead of delegating to qemu-img?

> 
> Then we do the rebase while the VM is suspended to make sure the image
> files are reopened.
> 
> Please confirm if I'm missing something here.

That part is where you are liable to break things.  Qemu does NOT have a
graceful way to reopen the backing chain, so rebasing snap3 to point to
snap2' behind qemu's back is asking for problems.  Since qemu may be
caching things it has already learned about snap2, you have invalidated
that cached data by making snap3 point to snap2', but have no way to
force qemu to reread the backing chain to start reading from snap2'.

But if you would use qemu block-commit to merge "base <- snap1 <- snap2"
into "base'", then the block-commit command will gracefully take care of
rewriting the backing image of snap3 to now point to base.  You achieve
the same result, but without the need for an external qemu-img call,
without the need to pause the guest, and the only thing you have to be
careful of is dealing with the difference in file names.

Or, if you don't want to merge into "base'", you can use block-stream to
merge the other direction, so that "base <- snap1 <- snap2" is converted
into "snap2'" - but that depends on patches that were only barely added
in qemu 2.8 (intermediate block-commit has existed a lot longer than
intermediate block-stream).  But the point remains that you are still
using qemu to do the work, and therefore with no external qemu-img
process interfering with the chain, you don't need any guest downtime or
any risk of breaking qemu operation by invalidating data it may have cached.

> 
> We are not using block-commit since we want to have more control (keep
> the base snapshot unmodified, use compression, etc).

If block-commit and block-stream don't have enough power to do what you
want, then we should patch them to expose that power, rather than
worrying about how to use qemu-img to modify the backing chain behind
qemu's back.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 13/21] qcow2: add .bdrv_store_persistent_dirty_bitmaps()

2016-12-19 Thread Vladimir Sementsov-Ogievskiy

19.12.2016 18:34, Max Reitz wrote:

On 19.12.2016 16:26, Vladimir Sementsov-Ogievskiy wrote:

19.12.2016 18:14, Max Reitz wrote:

On 17.12.2016 15:58, Vladimir Sementsov-Ogievskiy wrote:

09.12.2016 20:05, Max Reitz wrote:

On 22.11.2016 18:26, Vladimir Sementsov-Ogievskiy wrote:

Implement the block bitmap storing interface, to allow qcow2 images to store
persistent bitmaps.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   block/qcow2-bitmap.c | 451
+++
   block/qcow2.c|   1 +

[...]


+
+/* store_bitmap_data()
+ * Store bitmap to image, filling bitmap table accordingly.
+ */
+static uint64_t *store_bitmap_data(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap,
+   uint32_t *bitmap_table_size,
Error **errp)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+int64_t sector;
+uint64_t dsc;
+uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);
+const char *bm_name = bdrv_dirty_bitmap_name(bitmap);
+uint8_t *buf = NULL;
+BdrvDirtyBitmapIter *dbi;
+uint64_t *tb;
+uint64_t tb_size =
+size_to_clusters(s,
+bdrv_dirty_bitmap_serialization_size(bitmap, 0,
bm_size));
+
+if (tb_size > BME_MAX_TABLE_SIZE ||
+tb_size * s->cluster_size > BME_MAX_PHYS_SIZE) {

Alignment to the opening parenthesis, please.


+error_setg(errp, "Bitmap '%s' is too big", bm_name);
+return NULL;
+}
+
+tb = g_try_new0(uint64_t, tb_size);
+if (tb == NULL) {
+error_setg(errp, "No memory");
+return NULL;
+}
+
+dbi = bdrv_dirty_iter_new(bitmap, 0);
+buf = g_malloc(s->cluster_size);
+dsc = disk_sectors_in_bitmap_cluster(s, bitmap);
+
+while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
+uint64_t cluster = sector / dsc;
+uint64_t end, write_size;
+int64_t off;
+
+sector = cluster * dsc;
+end = MIN(bm_size, sector + dsc);
+write_size =
+bdrv_dirty_bitmap_serialization_size(bitmap, sector,
end - sector);
+
+off = qcow2_alloc_clusters(bs, s->cluster_size);
+if (off < 0) {
+error_setg_errno(errp, -off,
+ "Failed to allocate clusters for
bitmap '%s'",
+ bm_name);
+goto fail;
+}
+tb[cluster] = off;

Somehow I would feel better with either an assert(cluster < tb_size);
here or an assert(bdrv_nb_sectors(bs) / dsc == tb_size); (plus the error
handling for bdrv_nb_sectors()) above the loop.

assert((bm_size - 1) / dsc == tb_size - 1) seems OK, with no additional
error handling. Right?

Right, bm_size is already equal to bdrv_nb_sectors(bs), and it's not
necessarily a multiple of dsc. So that should be good. Alternatively, I
think the following would be slightly easier to read:

assert(DIV_ROUND_UP(bm_size, dsc) == tb_size);
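
As a sanity check on the two assertion forms discussed above, here is a small sketch. DIV_ROUND_UP is defined locally using the usual round-up idiom, and the function names are hypothetical; it assumes bm_size > 0 and dsc > 0.

```c
#include <assert.h>
#include <stdint.h>

/* Defined locally for the sketch: the usual round-up division idiom. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Form 1: the index of the last cluster equals tb_size - 1. */
static int check_last_index(uint64_t bm_size, uint64_t dsc, uint64_t tb_size)
{
    assert(bm_size > 0 && dsc > 0);
    return (bm_size - 1) / dsc == tb_size - 1;
}

/* Form 2: the number of clusters, rounded up, equals tb_size. */
static int check_round_up(uint64_t bm_size, uint64_t dsc, uint64_t tb_size)
{
    assert(dsc > 0);
    return DIV_ROUND_UP(bm_size, dsc) == tb_size;
}
```

For positive sizes both functions agree, since (n - 1) / d + 1 is the round-up quotient of n by d; the second form simply states the intent more directly.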


+
+bdrv_dirty_bitmap_serialize_part(bitmap, buf, sector, end
- sector);
+if (write_size < s->cluster_size) {
+memset(buf + write_size, 0, s->cluster_size -
write_size);
+}

Should we assert that write_size <= s->cluster_size?

Ok

[...].


+const char *name = bdrv_dirty_bitmap_name(bitmap);
+uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
+Qcow2Bitmap *bm;
+
+if (!bdrv_dirty_bitmap_get_persistance(bitmap)) {
+continue;
+}
+
+if (++new_nb_bitmaps > QCOW2_MAX_BITMAPS) {
+error_setg(errp, "Too many persistent bitmaps");
+goto fail;
+}
+
+new_dir_size += calc_dir_entry_size(strlen(name), 0);
+if (new_dir_size > QCOW2_MAX_BITMAP_DIRECTORY_SIZE) {
+error_setg(errp, "Too large bitmap directory");
+goto fail;
+}

You only need to increment new_nb_bitmaps and increase new_dir_size if
the bitmap does not already exist in the image (i.e. if
find_bitmap_by_name() below returns NULL).

Why? No, I need to check the whole sum and the whole size.

If the bitmap already exists, you don't create a new directory entry but
reuse the existing one. Therefore, the number of bitmaps in the image
and the directory size will not grow then.

new_nb_bitmaps is not the number of "newly created bitmaps", but just the
new value of the nb_bitmaps field, so all bitmaps, old and new, are counted
in new_nb_bitmaps. Anyway, this misunderstanding shows that the variable
name is bad.

Yes. But when you store a bitmap of the same name as an existing one,
you are replacing it. The number of bitmaps does not grow in that case.


Oh, I'm stupid)) I see now, you are right.
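
The counting rule agreed on above can be sketched as follows. This is only an illustration with hypothetical names, not code from the patch; it assumes bitmaps are identified purely by name and that the names being stored are unique among themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if 'name' is already present among the n stored names. */
static int name_exists(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Number of bitmaps in the image after storing 'to_store': a bitmap
 * whose name already exists replaces the old one and adds nothing. */
static size_t count_after_store(const char *const *existing, size_t n_existing,
                                const char *const *to_store, size_t n_store)
{
    size_t total = n_existing;

    assert(existing != NULL && to_store != NULL);
    for (size_t i = 0; i < n_store; i++) {
        if (!name_exists(existing, n_existing, to_store[i])) {
            total++;
        }
    }
    return total;
}
```

The same reasoning applies to the directory size: only entries for names not yet in the image make it grow.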



Max




--
Best regards,
Vladimir




[Qemu-devel] [Bug 1651167] [NEW] hw/ipmi/isa_ipmi_bt.c:283: suspect use of macro ?

2016-12-19 Thread dcb
Public bug reported:

I just had a go at compiling qemu trunk with
llvm trunk. It said:

hw/ipmi/isa_ipmi_bt.c:283:31: warning: logical not is only applied to
the left hand side of this bitwise operator [-Wlogical-not-parentheses]

Source code is

   IPMI_BT_SET_HBUSY(ib->control_reg,
  !IPMI_BT_GET_HBUSY(ib->control_reg));

That use of ! causes trouble. The SET and GET
macros are defined as:

#define IPMI_BT_GET_HBUSY(d)    (((d) >> IPMI_BT_HBUSY_BIT) & 0x1)
#define IPMI_BT_SET_HBUSY(d, v) (d) = (((d) & ~IPMI_BT_HBUSY_MASK) | \
                                       (((v & 1) << IPMI_BT_HBUSY_BIT)))

I can make the compiler shut up by adding extra () in the last
use of v in the SET macro, like this:

#define IPMI_BT_SET_HBUSY(d, v) (d) = (((d) & ~IPMI_BT_HBUSY_MASK) | \
                                       ((((v) & 1) << IPMI_BT_HBUSY_BIT)))

I think this is standard good practice when using macro parameters
anyway.
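
A minimal sketch of why parenthesizing the macro parameter matters in general. The SKETCH_ names and the bit position are illustrative stand-ins chosen for this example, not the definitions from the QEMU source.

```c
/* Illustrative values only; not taken from the QEMU headers. */
#define SKETCH_HBUSY_BIT   6
#define SKETCH_HBUSY_MASK  (1 << SKETCH_HBUSY_BIT)

#define SKETCH_GET_HBUSY(d)  (((d) >> SKETCH_HBUSY_BIT) & 0x1)

/* Parameter v used bare, as in the reported macro. */
#define SKETCH_SET_HBUSY_RAW(d, v) \
    (d) = (((d) & ~SKETCH_HBUSY_MASK) | (((v & 1) << SKETCH_HBUSY_BIT)))

/* Parameter v parenthesized, as in the suggested fix. */
#define SKETCH_SET_HBUSY(d, v) \
    (d) = (((d) & ~SKETCH_HBUSY_MASK) | ((((v) & 1) << SKETCH_HBUSY_BIT)))

static unsigned set_raw(unsigned reg, unsigned a, unsigned b)
{
    SKETCH_SET_HBUSY_RAW(reg, a | b);   /* expands to a | (b & 1) */
    return reg;
}

static unsigned set_paren(unsigned reg, unsigned a, unsigned b)
{
    SKETCH_SET_HBUSY(reg, a | b);       /* expands to (a | b) & 1 */
    return reg;
}

static unsigned toggle(unsigned reg)
{
    SKETCH_SET_HBUSY(reg, !SKETCH_GET_HBUSY(reg));
    return reg;
}
```

With an argument such as "a | b", the bare form masks only b and can set bits outside the intended field, while the parenthesized form masks the whole expression. For the "!" argument in the report both forms compute the same value, because "!" binds tighter than "&"; the extra parentheses mainly make that intent explicit to the compiler and the reader.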

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1651167

Title:
  hw/ipmi/isa_ipmi_bt.c:283: suspect use of macro ?

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1651167/+subscriptions



Re: [Qemu-devel] [PATCH v1 0/2] Add Atmel I2C TPM AT97SC3204T emulated device

2016-12-19 Thread Corey Minyard

On 12/19/2016 09:31 AM, Peter Maydell wrote:

On 19 December 2016 at 13:55, Corey Minyard  wrote:

On 12/18/2016 07:47 PM, Alastair D'Silva wrote:

On Fri, 2016-12-16 at 17:35 +, Peter Maydell wrote:

Our current API seems to envisage that the slave can return a
negative value from I2CSlaveClass::recv instead of a data byte,
but I'm not sure what this means in the i2c protocol.

Negative values are propagated upwards, where they are treated as
errors, eg, in hw/i2c/aspeed_i2c.c:aspeed_i2c_bus_handle_cmd():

int ret = i2c_recv(bus->bus);
if (ret < 0) {
  qemu_log_mask(LOG_GUEST_ERROR, "%s: read failed\n", __func__);
  ret = 0xff;
}

The call to i2c_recv is too late to issue the NAK; I believe NAKs occur
during the start_transfer() call.

OK, so if returning negative values from i2c_recv() isn't
the device saying "I am NAKing this", what *do* they mean?


It actually makes no sense.  In real I2C hardware, the receiver of the byte
always does the ACK/NAK.  The NAK is sent by the receiver of the data to
signal that it has finished the transfer.

So when i2c_recv() is called, it's actually the I2C device doing a transmit
and the I2C master receiving the data.  So the device cannot send a NAK in
that scenario.

The start conditions and address are always sent by the master to the
device, so it makes sense for the start events to be able to return a NAK.

And for i2c_send(), the device should respond with a NAK to terminate the
transfer.  So it makes sense for i2c_send() to be able to return a NAK.
This doesn't appear to be properly implemented in a number of places.


If I understand your patch correctly, this is adding support
for the slave refusing to ACK when the master sends out the
slave address and r/w bit. I think that makes sense, but rather
than having a state flag in the I2CSlave struct, we should
change the prototype of the I2CSlaveClass event method so that
it can return a value indicating ack or nak.
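
A rough sketch of what an event callback with a return value could look like. Every name here is a hypothetical stand-in invented for illustration, not the actual I2CSlaveClass interface; the convention assumed is 0 for ACK and non-zero for NAK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event types for the sketch. */
enum sketch_i2c_event {
    SKETCH_START_RECV,
    SKETCH_START_SEND,
    SKETCH_FINISH
};

struct sketch_slave {
    unsigned char address;
    int busy;   /* device-specific reason to refuse a transfer */
    /* Returns 0 to ACK the event, non-zero to NAK it. */
    int (*event)(struct sketch_slave *s, enum sketch_i2c_event ev);
};

/* Example device callback: NAK any start condition while busy. */
static int sketch_event(struct sketch_slave *s, enum sketch_i2c_event ev)
{
    if (ev != SKETCH_FINISH && s->busy) {
        return -1;
    }
    return 0;
}

/* Master side: the address phase is where the slave can ACK or NAK. */
static int sketch_start_transfer(struct sketch_slave *s,
                                 unsigned char address, int is_recv)
{
    assert(s != NULL && s->event != NULL);
    if (s->address != address) {
        return -1;      /* nobody answers at this address: NAK */
    }
    return s->event(s, is_recv ? SKETCH_START_RECV : SKETCH_START_SEND);
}
```

The point of the design is that the ACK/NAK decision travels back through the return value of the start event, so no extra state flag is needed in the slave structure.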


Hmm, this could end up being quite an invasive change, but ultimately
more elegant. I'm not sure which way the community prefers.


I have a patch that adds a check_event() handler alongside the event()
handler.  If a device wants to send a NAK, it can implement check_event()
instead of event() and return non-zero to NAK.

I toyed with just changing all the event() calls, but there are a bunch.
This seemed like the better approach.  I can send it if you like.

It looks like there are only a dozen or so. I think it would
be better for the long term just to change the event calls.
We should also document in the comments in the I2CSlaveClass
struct definition exactly what the semantics of the various
functions are.


Ok, my memory didn't serve me correctly.  I'll rework my patch for that.

Thanks,

-corey


thanks
-- PMM






Re: [Qemu-devel] Can qemu reopen image files?

2016-12-19 Thread Christopher Pereira

Hi Eric,

Thanks for your great answer.

On 19-Dec-16 12:48, Eric Blake wrote:




> > Then we do the rebase while the VM is suspended to make sure the image
> > files are reopened.
>
> That part is where you are liable to break things.  Qemu does NOT have a
> graceful way to reopen the backing chain, so rebasing snap3 to point to
> snap2' behind qemu's back is asking for problems.  Since qemu may be
> caching things it has already learned about snap2, you have invalidated
> that cached data by making snap3 point to snap2', but have no way to
> force qemu to reread the backing chain to start reading from snap2'.

We are actually doing a save, rebase and restore to reopen the backing
chain.

We only touch files (rebase) while the VM is down.
Can you please confirm this is 100% safe?


> Or, if you don't want to merge into "base'", you can use block-stream to
> merge the other direction, so that "base <- snap1 <- snap2" is converted
> into "snap2'" - but that depends on patches that were only barely added
> in qemu 2.8 (intermediate block-commit has existed a lot longer than
> intermediate block-stream).  But the point remains that you are still
> using qemu to do the work, and therefore with no external qemu-img
> process interfering with the chain, you don't need any guest downtime or
> any risk of breaking qemu operation by invalidating data it may have cached.

Right. Since images are backed up remotely, we don't want to merge into
base nor touch the backing chain at all (only the active snapshot should
be modified). This is to keep things simple and avoid re-syncs of
images (remote backups).


Besides, we don't want to merge the whole backing chain, but an 
intermediate point, so it seems that the clean way is to use the 
"intermediate block-stream" feature.


We didn't try it, because when we researched we got the impression that 
the patches were not stable yet or not included in the qemu versions 
shipped with CentOS, so we went with 'qemu-img convert' because we 
needed something known, simple and stable (we are dealing with critical 
information for gov. orgs.).



> If block-commit and block-stream don't have enough power to do what you
> want, then we should patch them to expose that power, rather than
> worrying about how to use qemu-img to modify the backing chain behind
> qemu's back.

"intermediate block-stream" seems to be the right solution for our use case.
Does it also allow QCOW2 compression?
Compression is interesting, especially when files are sync'ed via network.




[Qemu-devel] [PATCH v5 0/4] Add HAX support

2016-12-19 Thread Vincent Palatin
I took a stab at trying to rebase/upstream the support for Intel HAXM
(Hardware Accelerated Execution Manager).
Intel HAX is a kernel-based hardware acceleration module for Windows and MacOSX.

I have based my work on the last version of the source code I found:
the emu-2.2-release branch in the external/qemu-android repository as used by
the Android emulator.
In patch 2/4, I have forward-ported the core HAX code from there.
It has been modified to build and run along with the current code base.
It has been simplified by removing non-UG hardware support / Darwin support /
Android-specific leftovers.

Intel nicely fixed the 2 remaining issues on the kernel side:
- the spurious request to emulate MMIO access in un-paged mode is no longer
  happening (as seen in iPXE).
- the kernel API now provides a way to remove a memory mapping, so we can
  do a proper MemoryListener implementation.
They will soon publish a new version 6.1.0 of the HAX kernel module including
the fixes once their QA cycle is completed.
Thanks Yu Ning for making this happen.

In patch 3/4, I have put the plumbing into the QEMU code base, I did some clean
up there and it is reasonably intrusive: i.e.
 Makefile.target   |  1 +
 configure | 18 ++
 cpus.c| 87 ++-
 exec.c| 16 +
 hw/intc/apic_common.c |  3 +-
 include/qom/cpu.h |  5 +++
 include/sysemu/hw_accel.h |  9 +
 qemu-options.hx   | 11 ++
 target-i386/Makefile.objs |  4 +++
 vl.c  | 15 ++--
 10 files changed, 164 insertions(+), 5 deletions(-)

Patch 1/4 just extracts the cpu_synchronize_ functions that HAX also uses
from the KVM-specific header.

Patch 4/4 is the Darwin support. This part is only lightly tested for now,
so it can be considered 'experimental'.

I have tested the end result on a Windows 10 Pro machine (with UG support)
with the Intel HAXM module dev version and a large ChromiumOS x86_64 image to
exercise various code paths. It looks stable.
I also did some quick regression testing of the integration by running a Linux
build with KVM enabled.

Changes from v4 to v5:
- update HAX fastmmio API with the new MMIO to MMIO transfer.

Changes from v3 to v4:
- add RAM unmapping in the MemoryListener thanks to new API in HAX module 6.1.0
  and re-wrote the memory mappings management to deal with this.
- marked no longer used MMIO emulation as unsupported.
- clean-up a few left-overs from removed code.
- re-add an experimental version of the Darwin support.

Changes from v2 to v3:
- fix saving/restoring FPU registers as suggested by Paolo.
- fix Windows build on all targets as contributed by Stefan Weil.
- clean-up IO / MMIO emulation.
- more clean-up of emulation leftovers.

Changes from v1 to v2:
- fix all style issues in the original code to get it through checkpatch.pl.
- remove Darwin support, it was barely tested and not fully functional.
- remove the support for CPU without UG mode.
- fix most review comments

Vincent Palatin (4):
  kvm: move cpu synchronization code
  target-i386: Add Intel HAX files
  Plumb the HAXM-based hardware acceleration support
  hax: add Darwin support

 Makefile.target |1 +
 configure   |   18 +
 cpus.c  |   93 +++-
 exec.c  |   16 +
 gdbstub.c   |1 +
 hax-stub.c  |   39 ++
 hw/i386/kvm/apic.c  |1 +
 hw/i386/kvmvapic.c  |1 +
 hw/intc/apic_common.c   |3 +-
 hw/misc/vmport.c|2 +-
 hw/ppc/pnv_xscom.c  |2 +-
 hw/ppc/ppce500_spin.c   |4 +-
 hw/ppc/spapr.c  |2 +-
 hw/ppc/spapr_hcall.c|2 +-
 hw/s390x/s390-pci-inst.c|1 +
 include/qom/cpu.h   |5 +
 include/sysemu/hax.h|   56 +++
 include/sysemu/hw_accel.h   |   48 ++
 include/sysemu/kvm.h|   23 -
 monitor.c   |2 +-
 qemu-options.hx |   11 +
 qom/cpu.c   |2 +-
 target-arm/cpu.c|2 +-
 target-i386/Makefile.objs   |7 +
 target-i386/hax-all.c   | 1155 +++
 target-i386/hax-darwin.c|  316 
 target-i386/hax-darwin.h|   63 +++
 target-i386/hax-i386.h  |   94 
 target-i386/hax-interface.h |  361 ++
 target-i386/hax-mem.c   |  271 ++
 target-i386/hax-windows.c   |  479 ++
 target-i386/hax-windows.h   |   89 
 target-i386/helper.c|1 +
 target-i386/kvm.c   |1 +
 target-ppc/mmu-hash64.c |2 +-
 target-ppc/translate_init.c |2 +-
 target-s390x/gdbstub.c  |1 +
 vl.c|   15 +-
 38 files changed, 3153 insertions(+), 39 deletions(-)
 create mode 100644 hax-stub.c
 create mode 100644 include/sysemu/hax.h
 create mode 100644 include/sysemu/hw_accel.h
 create mode 1

[Qemu-devel] [PATCH v5 3/4] Plumb the HAXM-based hardware acceleration support

2016-12-19 Thread Vincent Palatin
Use the Intel HAX kernel-based hardware acceleration module for
Windows (similar to KVM on Linux).

Based on the "target-i386: Add Intel HAX to android emulator" patch
from David Chou 

Signed-off-by: Vincent Palatin 
---
 Makefile.target   |  1 +
 configure | 18 ++
 cpus.c| 87 ++-
 exec.c| 16 +
 hw/intc/apic_common.c |  3 +-
 include/qom/cpu.h |  5 +++
 include/sysemu/hw_accel.h |  9 +
 qemu-options.hx   | 11 ++
 target-i386/Makefile.objs |  4 +++
 vl.c  | 15 ++--
 10 files changed, 164 insertions(+), 5 deletions(-)

diff --git a/Makefile.target b/Makefile.target
index 7a5080e..dab81e7 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -96,6 +96,7 @@ obj-y += target-$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-y += tcg-runtime.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
+obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 
 obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o
diff --git a/configure b/configure
index 3770d7c..ba32bea 100755
--- a/configure
+++ b/configure
@@ -230,6 +230,7 @@ vhost_net="no"
 vhost_scsi="no"
 vhost_vsock="no"
 kvm="no"
+hax="no"
 colo="yes"
 rdma=""
 gprof="no"
@@ -563,6 +564,7 @@ CYGWIN*)
 ;;
 MINGW32*)
   mingw32="yes"
+  hax="yes"
   audio_possible_drivers="dsound sdl"
   if check_include dsound.h; then
 audio_drv_list="dsound"
@@ -612,6 +614,7 @@ OpenBSD)
 Darwin)
   bsd="yes"
   darwin="yes"
+  hax="yes"
   LDFLAGS_SHARED="-bundle -undefined dynamic_lookup"
   if [ "$cpu" = "x86_64" ] ; then
 QEMU_CFLAGS="-arch x86_64 $QEMU_CFLAGS"
@@ -921,6 +924,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-hax) hax="no"
+  ;;
+  --enable-hax) hax="yes"
+  ;;
   --disable-colo) colo="no"
   ;;
   --enable-colo) colo="yes"
@@ -1373,6 +1380,7 @@ disabled with --disable-FEATURE, default is enabled if available:
   fdt fdt device tree
   bluez   bluez stack connectivity
   kvm KVM acceleration support
+  hax HAX acceleration support
   coloCOarse-grain LOck-stepping VM for Non-stop Service
   rdmaRDMA-based migration support
   vde support for vde network
@@ -5051,6 +5059,7 @@ echo "ATTR/XATTR support $attr"
 echo "Install blobs $blobs"
 echo "KVM support   $kvm"
 echo "COLO support  $colo"
+echo "HAX support   $hax"
 echo "RDMA support  $rdma"
 echo "TCG interpreter   $tcg_interpreter"
 echo "fdt support   $fdt"
@@ -6035,6 +6044,15 @@ case "$target_name" in
   fi
 fi
 esac
+if test "$hax" = "yes" ; then
+  if test "$target_softmmu" = "yes" ; then
+case "$target_name" in
+i386|x86_64)
+  echo "CONFIG_HAX=y" >> $config_target_mak
+;;
+esac
+  fi
+fi
 if test "$target_bigendian" = "yes" ; then
   echo "TARGET_WORDS_BIGENDIAN=y" >> $config_target_mak
 fi
diff --git a/cpus.c b/cpus.c
index fc78502..0e01791 100644
--- a/cpus.c
+++ b/cpus.c
@@ -35,6 +35,7 @@
 #include "sysemu/dma.h"
 #include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
+#include "sysemu/hax.h"
 #include "qmp-commands.h"
 #include "exec/exec-all.h"
 
@@ -1221,6 +1222,39 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 return NULL;
 }
 
+static void *qemu_hax_cpu_thread_fn(void *arg)
+{
+CPUState *cpu = arg;
+int r;
+qemu_thread_get_self(cpu->thread);
+qemu_mutex_lock(&qemu_global_mutex);
+
+cpu->thread_id = qemu_get_thread_id();
+cpu->created = true;
+cpu->halted = 0;
+current_cpu = cpu;
+
+hax_init_vcpu(cpu);
+qemu_cond_signal(&qemu_cpu_cond);
+
+while (1) {
+if (cpu_can_run(cpu)) {
+r = hax_smp_cpu_exec(cpu);
+if (r == EXCP_DEBUG) {
+cpu_handle_guest_debug(cpu);
+}
+}
+
+while (cpu_thread_is_idle(cpu)) {
+qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+}
+
+qemu_wait_io_event_common(cpu);
+}
+return NULL;
+}
+
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1236,7 +1270,33 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 exit(1);
 }
 #else /* _WIN32 */
-abort();
+if (!qemu_cpu_is_self(cpu)) {
+CONTEXT context;
+
+if (SuspendThread(cpu->hThread) == (DWORD)(-1)) {
+fprintf(stderr, "qemu:%s: GetLastError:%lu\n", __func__,
+GetLastError());
+exit(1);
+}
+
+/* On multi-core systems, we are not sure that the thread is actually
+ * suspended until we can get the context.
+ */
+context.ContextFlags = CONTEXT_CONTROL;
+while (GetThreadContext(cpu->hThread, &context) != 0) {
+continue;
+}
+
+if (hax_enabled()) {
+cpu->exit_request = 1;
+}
+
+if (ResumeThread(cpu->hTh

[Qemu-devel] [PATCH v5 1/4] kvm: move cpu synchronization code

2016-12-19 Thread Vincent Palatin
Move the generic cpu_synchronize_ functions to the common hw_accel.h header,
in order to prepare for the addition of a second hardware accelerator.
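
As a rough illustration of what such a common header enables, here is a sketch of a synchronization wrapper that dispatches to whichever accelerator is active. All names are hypothetical stand-ins for this example; the actual contents of hw_accel.h are not reproduced here.

```c
/* All names below are hypothetical stand-ins, not QEMU's API. */
struct sketch_cpu {
    int kvm_syncs;   /* how often the KVM-style hook ran */
    int hax_syncs;   /* how often the HAX-style hook ran */
};

/* Stand-ins for "is this accelerator enabled?" queries. */
static int sketch_kvm_on;
static int sketch_hax_on;

static void sketch_kvm_sync(struct sketch_cpu *cpu) { cpu->kvm_syncs++; }
static void sketch_hax_sync(struct sketch_cpu *cpu) { cpu->hax_syncs++; }

/* Generic wrapper: callers include one common header and do not need
 * to know which accelerator, if any, is in use. */
static void sketch_cpu_synchronize_state(struct sketch_cpu *cpu)
{
    if (sketch_kvm_on) {
        sketch_kvm_sync(cpu);
    }
    if (sketch_hax_on) {
        sketch_hax_sync(cpu);
    }
}
```

Callers that only need the synchronization helpers can then include the common header instead of the KVM-specific one, which appears to be what the include changes in the diff below do.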

Signed-off-by: Stefan Weil 
Signed-off-by: Vincent Palatin 
---
 cpus.c  |  1 +
 gdbstub.c   |  1 +
 hw/i386/kvm/apic.c  |  1 +
 hw/i386/kvmvapic.c  |  1 +
 hw/misc/vmport.c|  2 +-
 hw/ppc/pnv_xscom.c  |  2 +-
 hw/ppc/ppce500_spin.c   |  4 ++--
 hw/ppc/spapr.c  |  2 +-
 hw/ppc/spapr_hcall.c|  2 +-
 hw/s390x/s390-pci-inst.c|  1 +
 include/sysemu/hw_accel.h   | 39 +++
 include/sysemu/kvm.h| 23 ---
 monitor.c   |  2 +-
 qom/cpu.c   |  2 +-
 target-arm/cpu.c|  2 +-
 target-i386/helper.c|  1 +
 target-i386/kvm.c   |  1 +
 target-ppc/mmu-hash64.c |  2 +-
 target-ppc/translate_init.c |  2 +-
 target-s390x/gdbstub.c  |  1 +
 20 files changed, 58 insertions(+), 34 deletions(-)
 create mode 100644 include/sysemu/hw_accel.h

diff --git a/cpus.c b/cpus.c
index 5213351..fc78502 100644
--- a/cpus.c
+++ b/cpus.c
@@ -33,6 +33,7 @@
 #include "sysemu/block-backend.h"
 #include "exec/gdbstub.h"
 #include "sysemu/dma.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "qmp-commands.h"
 #include "exec/exec-all.h"
diff --git a/gdbstub.c b/gdbstub.c
index de62d26..de9b62b 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -32,6 +32,7 @@
 #define MAX_PACKET_LENGTH 4096
 
 #include "qemu/sockets.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "exec/semihost.h"
 #include "exec/exec-all.h"
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 01cbaa8..328f80c 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -14,6 +14,7 @@
 #include "cpu.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/pci/msi.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "target-i386/kvm_i386.h"
 
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index b30d1b9..2f767b6 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -14,6 +14,7 @@
 #include "exec/exec-all.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/cpus.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
diff --git a/hw/misc/vmport.c b/hw/misc/vmport.c
index c763811..be40930 100644
--- a/hw/misc/vmport.c
+++ b/hw/misc/vmport.c
@@ -25,7 +25,7 @@
 #include "hw/hw.h"
 #include "hw/isa/isa.h"
 #include "hw/i386/pc.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "hw/qdev.h"
 
 //#define VMPORT_DEBUG
diff --git a/hw/ppc/pnv_xscom.c b/hw/ppc/pnv_xscom.c
index 8da2718..cd5c2b8 100644
--- a/hw/ppc/pnv_xscom.c
+++ b/hw/ppc/pnv_xscom.c
@@ -20,7 +20,7 @@
 #include "qapi/error.h"
 #include "hw/hw.h"
 #include "qemu/log.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "target-ppc/cpu.h"
 #include "hw/sysbus.h"
 
diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
index cf958a9..eb219ab 100644
--- a/hw/ppc/ppce500_spin.c
+++ b/hw/ppc/ppce500_spin.c
@@ -29,9 +29,9 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
-#include "sysemu/sysemu.h"
 #include "hw/sysbus.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
+#include "sysemu/sysemu.h"
 #include "e500.h"
 
 #define MAX_CPUS 32
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 208ef7b..a642e66 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -36,7 +36,7 @@
 #include "sysemu/device_tree.h"
 #include "sysemu/block-backend.h"
 #include "sysemu/cpus.h"
-#include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "kvm_ppc.h"
 #include "migration/migration.h"
 #include "mmu-hash64.h"
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 9a9bedf..b2a8e48 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1,5 +1,6 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/sysemu.h"
 #include "qemu/log.h"
 #include "cpu.h"
@@ -9,7 +10,6 @@
 #include "mmu-hash64.h"
 #include "cpu-models.h"
 #include "trace.h"
-#include "sysemu/kvm.h"
 #include "kvm_ppc.h"
 #include "hw/ppc/spapr_ovec.h"
 
diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 0864d9b..4d0775c 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -18,6 +18,7 @@
 #include "s390-pci-bus.h"
 #include "exec/memory-internal.h"
 #include "qemu/error-report.h"
+#include "sysemu/hw_accel.h"
 
 /* #define DEBUG_S390PCI_INST */
 #ifdef DEBUG_S390PCI_INST
diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
new file mode 100644
index 000..03812cf
--- /dev/null
+++ b/include/sysemu/hw_accel.h
@@ -0,0 +1,39 @@
+/*
+ * QEMU Hardware accelerators support
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifnd

[Qemu-devel] [PATCH v5 2/4] target-i386: Add Intel HAX files

2016-12-19 Thread Vincent Palatin
That's a forward port of the core HAX interface code from the
emu-2.2-release branch in the external/qemu-android repository as used by
the Android emulator.

The original commit was "target-i386: Add Intel HAX to android emulator"
saying:
"""
  Backport of 2b3098ff27bab079caab9b46b58546b5036f5c0c
  from studio-1.4-dev into emu-master-dev

Intel HAX (hardware acceleration) will enhance android emulator performance
in Windows and Mac OS X in the systems powered by Intel processors with
"Intel Hardware Accelerated Execution Manager" package installed when
user runs android emulator with Intel target.

Signed-off-by: David Chou 
"""

It has been modified to build and run along with the current code base.
The formatting has been fixed to go through scripts/checkpatch.pl,
and the DPRINTF macros have been updated to get the instantiations checked by
the compiler.

The FPU registers saving/restoring has been updated to match the current
QEMU registers layout.

The implementation has been simplified by doing the following modifications:
- removing the code for supporting the hardware without Unrestricted Guest (UG)
  mode (including all the code to fallback on TCG emulation).
- not including the Darwin support (which is not yet debugged/tested).
- simplifying the initialization by removing the leftovers from the Android
  specific code, then trimming down the remaining logic.
- removing the unused MemoryListener callbacks.

Signed-off-by: Vincent Palatin 
---
 hax-stub.c  |   39 ++
 include/sysemu/hax.h|   56 +++
 target-i386/hax-all.c   | 1155 +++
 target-i386/hax-i386.h  |   86 
 target-i386/hax-interface.h |  361 ++
 target-i386/hax-mem.c   |  271 ++
 target-i386/hax-windows.c   |  479 ++
 target-i386/hax-windows.h   |   89 
 8 files changed, 2536 insertions(+)
 create mode 100644 hax-stub.c
 create mode 100644 include/sysemu/hax.h
 create mode 100644 target-i386/hax-all.c
 create mode 100644 target-i386/hax-i386.h
 create mode 100644 target-i386/hax-interface.h
 create mode 100644 target-i386/hax-mem.c
 create mode 100644 target-i386/hax-windows.c
 create mode 100644 target-i386/hax-windows.h

diff --git a/hax-stub.c b/hax-stub.c
new file mode 100644
index 000..a532dba
--- /dev/null
+++ b/hax-stub.c
@@ -0,0 +1,39 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright (c) 2015, Intel Corporation
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "cpu.h"
+#include "sysemu/hax.h"
+
+int hax_sync_vcpus(void)
+{
+return 0;
+}
+
+int hax_populate_ram(uint64_t va, uint32_t size)
+{
+return -ENOSYS;
+}
+
+int hax_init_vcpu(CPUState *cpu)
+{
+return -ENOSYS;
+}
+
+int hax_smp_cpu_exec(CPUState *cpu)
+{
+return -ENOSYS;
+}
diff --git a/include/sysemu/hax.h b/include/sysemu/hax.h
new file mode 100644
index 000..51c8fd5
--- /dev/null
+++ b/include/sysemu/hax.h
@@ -0,0 +1,56 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *  Xin Xiaohui
+ *  Zhang Xiantao
+ *
+ * Copyright 2016 Google, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_HAX_H
+#define QEMU_HAX_H
+
+#include "config-host.h"
+#include "qemu-common.h"
+
+int hax_sync_vcpus(void);
+int hax_init_vcpu(CPUState *cpu);
+int hax_smp_cpu_exec(CPUState *cpu);
+int hax_populate_ram(uint64_t va, uint32_t size);
+
+void hax_cpu_synchronize_state(CPUState *cpu);
+void hax_cpu_synchronize_post_reset(CPUState *cpu);
+void hax_cpu_synchronize_post_init(CPUState *cpu);
+
+#ifdef CONFIG_HAX
+
+int hax_enabled(void);
+
+#include "hw/hw.h"
+#include "qemu/bitops.h"
+#include "exec/memory.h"
+int hax_vcpu_destroy(CPUState *cpu);
+void hax_raise_event(CPUState *cpu);
+void hax_reset_vcpu_state(void *opaque);
+#include "target-i386/hax-interface.h"
+#include "target-i386/hax-i386.h"
+
+#else /* CONFIG_HAX */
+
+#define hax_enabled() (0)
+
+#endif /* CONFIG_HAX */
+
+#endif /* QEMU_HAX_H */
diff --git a/target-i386/hax-all.c b/target-i386/hax-all.c
new file mode 100644
index 000..8892323
--- /dev/null
+++ b/target-i386/hax-all.c
@@ -0,0 +1,1155 @@
+/*
+ * QEMU HAX support
+ *
+ * Copyright IBM, Corp. 2008
+ *   Red Hat, Inc. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *  Glauber Costa 
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *  Xin Xiaohui
+ *  Zhang Xiantao
+ *
+ * This work is licensed under the te

[Qemu-devel] [PATCH v5 4/4] hax: add Darwin support

2016-12-19 Thread Vincent Palatin
Re-add the MacOSX/Darwin support:
use the Intel HAX kernel-based hardware acceleration module
(similar to KVM on Linux).

Based on the original "target-i386: Add Intel HAX to android emulator" patch
from David Chou, taken from the emu-2.2-release branch in
the external/qemu-android repository.

Signed-off-by: Vincent Palatin 
---
 cpus.c|   5 +
 target-i386/Makefile.objs |   3 +
 target-i386/hax-darwin.c  | 316 ++
 target-i386/hax-darwin.h  |  63 +
 target-i386/hax-i386.h|   8 ++
 5 files changed, 395 insertions(+)
 create mode 100644 target-i386/hax-darwin.c
 create mode 100644 target-i386/hax-darwin.h

diff --git a/cpus.c b/cpus.c
index 0e01791..b8db313 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1264,6 +1264,11 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 return;
 }
 cpu->thread_kicked = true;
+#ifdef CONFIG_DARWIN
+if (hax_enabled()) {
+cpu->exit_request = 1;
+}
+#endif
 err = pthread_kill(cpu->thread->thread, SIG_IPI);
 if (err) {
 fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index acbe7b0..4fcb7f3 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -9,3 +9,6 @@ obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 ifdef CONFIG_WIN32
 obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-windows.o
 endif
+ifdef CONFIG_DARWIN
+obj-$(CONFIG_HAX) += hax-all.o hax-mem.o hax-darwin.o
+endif
diff --git a/target-i386/hax-darwin.c b/target-i386/hax-darwin.c
new file mode 100644
index 000..240d8d3
--- /dev/null
+++ b/target-i386/hax-darwin.c
@@ -0,0 +1,316 @@
+/*
+ * QEMU HAXM support
+ *
+ * Copyright (c) 2011 Intel Corporation
+ *  Written by:
+ *  Jiang Yunhong
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/* HAX module interface - darwin version */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qemu/osdep.h"
+#include "target-i386/hax-i386.h"
+
+hax_fd hax_mod_open(void)
+{
+int fd = open("/dev/HAX", O_RDWR);
+if (fd == -1) {
+fprintf(stderr, "Failed to open the hax module\n");
+}
+
+fcntl(fd, F_SETFD, FD_CLOEXEC);
+
+return fd;
+}
+
+int hax_populate_ram(uint64_t va, uint32_t size)
+{
+int ret;
+struct hax_alloc_ram_info info;
+
+if (!hax_global.vm || !hax_global.vm->fd) {
+fprintf(stderr, "Allocate memory before vm create?\n");
+return -EINVAL;
+}
+
+info.size = size;
+info.va = va;
+ret = ioctl(hax_global.vm->fd, HAX_VM_IOCTL_ALLOC_RAM, &info);
+if (ret < 0) {
+fprintf(stderr, "Failed to allocate %x memory\n", size);
+return ret;
+}
+return 0;
+}
+
+int hax_set_ram(uint64_t start_pa, uint32_t size, uint64_t host_va, int flags)
+{
+struct hax_set_ram_info info;
+int ret;
+
+info.pa_start = start_pa;
+info.size = size;
+info.va = host_va;
+info.flags = (uint8_t) flags;
+
+ret = ioctl(hax_global.vm->fd, HAX_VM_IOCTL_SET_RAM, &info);
+if (ret < 0) {
+return -errno;
+}
+return 0;
+}
+
+int hax_capability(struct hax_state *hax, struct hax_capabilityinfo *cap)
+{
+int ret;
+
+ret = ioctl(hax->fd, HAX_IOCTL_CAPABILITY, cap);
+if (ret == -1) {
+fprintf(stderr, "Failed to get HAX capability\n");
+return -errno;
+}
+
+return 0;
+}
+
+int hax_mod_version(struct hax_state *hax, struct hax_module_version *version)
+{
+int ret;
+
+ret = ioctl(hax->fd, HAX_IOCTL_VERSION, version);
+if (ret == -1) {
+fprintf(stderr, "Failed to get HAX version\n");
+return -errno;
+}
+
+return 0;
+}
+
+static char *hax_vm_devfs_string(int vm_id)
+{
+char *name;
+
+if (vm_id > MAX_VM_ID) {
+fprintf(stderr, "Too big VM id\n");
+return NULL;
+}
+
+#define HAX_VM_DEVFS "/dev/hax_vm/vmxx"
+name = g_strdup(HAX_VM_DEVFS);
+if (!name) {
+return NULL;
+}
+
+snprintf(name, sizeof HAX_VM_DEVFS, "/dev/hax_vm/vm%02d", vm_id);
+return name;
+}
+
+static char *hax_vcpu_devfs_string(int vm_id, int vcpu_id)
+{
+char *name;
+
+if (vm_id > MAX_VM_ID || vcpu_id > MAX_VCPU_ID) {
+fprintf(stderr, "Too big vm id %x or vcpu id %x\n", vm_id, vcpu_id);
+return NULL;
+}
+
+#define HAX_VCPU_DEVFS "/dev/hax_vmxx/vcpuxx"
+name = g_strdup(HAX_VCPU_DEVFS);
+if (!name) {
+return NULL;
+}
+
+snprintf(name, sizeof HAX_VCPU_DEVFS, "/dev/hax_vm%02d/vcpu%02d",
+ vm_id, vcpu_id);
+return name;
+}
+
+int hax_host_create_vm(struct hax_state *hax, int *vmid)
+{
+int ret;
+int vm_id = 0;
+
+if (hax_invalid_fd(hax->fd)) {
+return -EINVAL;
+}
+
+if (hax->vm) {
+return 0;
+}
+
+ret = ioctl(hax->fd, HAX_IOCTL_CREATE_VM, &vm_id);
+*vmid = vm_id;
+return ret;
+}
+
+hax_fd hax

Re: [Qemu-devel] Is block_save_iterate() dead code?

2016-12-19 Thread Thomas Huth
On 16.12.2016 18:03, Dr. David Alan Gilbert wrote:
> * Thomas Huth (th...@redhat.com) wrote:
>> On 18.11.2016 09:13, Thomas Huth wrote:
>>> On 17.11.2016 04:45, David Gibson wrote:
>>>> On Mon, Nov 14, 2016 at 07:34:59PM +0100, Juan Quintela wrote:
>>>>> Thomas Huth wrote:
>>>>>> qemu_savevm_state_iterate() expects the iterators to return 1
>>>>>> when they are done, and 0 if there is still something left to do.
>>>>>> However, ram_save_iterate() does not obey this rule and returns
>>>>>> the number of saved pages instead. This causes a fatal hang with
>>>>>> ppc64 guests when you run QEMU like this (also works with TCG):
>>>>>>
>>>>>>  qemu-img create -f qcow2  /tmp/test.qcow2 1M
>>>>>>  qemu-system-ppc64 -nographic -nodefaults -m 256 \
>>>>>>    -hda /tmp/test.qcow2 -serial mon:stdio
>>>>>>
>>>>>> ... then switch to the monitor by pressing CTRL-a c and try to
>>>>>> save a snapshot with "savevm test1" for example.
>>>>>>
>>>>>> After the first iteration, ram_save_iterate() always returns 0 here,
>>>>>> so that qemu_savevm_state_iterate() hangs in an endless loop and you
>>>>>> can only "kill -9" the QEMU process.
>>>>>> Fix it by using proper return values in ram_save_iterate().
>>>>>>
>>>>>> Signed-off-by: Thomas Huth
>>>>>
>>>>> Reviewed-by: Juan Quintela
>>>>>
>>>>> Applied.
>>>>>
>>>>> I don't know how we broked this so much.
>>>>
>>>> Note that block save iterate has the same bug...
>>>
>>> I think you're right. Care to send a patch?
>>
>> Looking at this issue again ... could it be that block_save_iterate() is
>> currently just dead code?
>> As far as I can see, the ->save_live_iterate() handlers are only called
>> from qemu_savevm_state_iterate(), right? And qemu_savevm_state_iterate()
>> only calls the handlers if se->ops->is_active(se->opaque) returns true.
>> But block_is_active() seems to only return 0 during savevm, most likely
>> because qemu_savevm_state() explicitly sets the "blk" and "shared"
>> MigrationParams to zero.
>> So to me, it looks like we could also just remove block_save_iterate()
>> completely ... or did I miss something here?
> 
> Doesn't it get called by migrate -b ?

Ah, right, yes, I somehow missed that ... I probably shouldn't do such
experiments at the end of Friday afternoon ;-)

OK, so it seems that
- block_save_iterate() is not called during savevm at all
  (and thus the bad return code does not matter here)
- migrate -b runs block_save_iterate() but the return code is ignored in
  migration_thread()

So we do not have a real problem here, but I think we should still clean
up the return code of block_save_iterate() to be on the safe side for
the future...

 Thomas
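
The return-value convention discussed in this thread (an iterator returns 1 when it is done and 0 while work remains) can be illustrated with a minimal sketch. Everything here is hypothetical: SketchState, sketch_save_iterate() and sketch_run() are made-up names for illustration and are not QEMU's migration code.

```c
#include <assert.h>

/* Hypothetical state: how many pages still have to be saved. */
typedef struct {
    int pages_left;
} SketchState;

/*
 * Hypothetical iterate handler. It saves at most `budget` pages per call
 * and follows the convention from the thread: 1 = done, 0 = more to do.
 * Returning the number of saved pages instead would make the caller spin
 * forever once a pass saves zero pages, because 0 means "not finished".
 */
static int sketch_save_iterate(SketchState *s, int budget)
{
    assert(budget > 0);
    int saved = s->pages_left < budget ? s->pages_left : budget;
    s->pages_left -= saved;
    return s->pages_left == 0 ? 1 : 0;
}

/*
 * Hypothetical driver loop: calls the handler until it reports completion.
 * Returns the number of rounds needed, or -1 if max_rounds was exceeded.
 */
static int sketch_run(SketchState *s, int budget, int max_rounds)
{
    for (int round = 1; round <= max_rounds; round++) {
        if (sketch_save_iterate(s, budget) == 1) {
            return round;
        }
    }
    return -1;
}
```

With 10 pages and a budget of 4 pages per call, the loop finishes on the third round; the max_rounds guard only exists so that a misbehaving handler cannot hang the sketch.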




Re: [Qemu-devel] [PATCH RFC v2 0/4] block/qapi: refactor and optimize the qmp_query_blockstats()

2016-12-19 Thread Fam Zheng
On Mon, 12/19 15:02, Stefan Hajnoczi wrote:
> On Mon, Dec 19, 2016 at 04:51:22PM +0800, Dou Liyang wrote:
> > These patches aim to refactor the qmp_query_blockstats() and
> > improve the performance by reducing the running time of it.
> > 
> > qmp_query_blockstats() is used to monitor the blockstats; it
> > queries all the graph_bdrv_states or monitor_block_backends.
> > 
> > There are the two jobs:
> > 
> > 1 For the performance:
> > 
> > 1.1 the time it takes(ns) in each time:
> > the disk numbers     | 10    | 500
> > ---------------------+-------+-------
> > before these patches | 19429 | 667722
> > after these patches  | 17516 | 557044
> > 
> > 1.2 the I/O performance is degraded(%) during the monitor:
> > 
> > the disk numbers     | 10    | 500
> > ---------------------+-------+-------
> > before these patches | 1.3   | 14.2
> > after these patches  | 0.8   | 9.1
> 
> Do you know what is consuming the remaining 9.1%?
> 
> I'm surprised to see such a high performance impact caused by a QMP
> command.

If it's "performance is 9.1% worse only during the 557044 ns when the QMP
command is being processed", it's probably because the main loop is stalled a
bit, and it's not a big problem. I'd be very surprised if the degradation
lasted any longer than that.

Fam

> 
> Please post your QEMU command-line.





Re: [Qemu-devel] [RESEND Patch v1 00/37] Implementation of vhost-pci for inter-vm commucation

2016-12-19 Thread Marc-André Lureau
Hi Wei,

On Mon, Dec 19, 2016 at 7:00 AM Wei Wang  wrote:

> This patch series implements vhost-pci, which is a point-to-point based
> inter-vm communication solution. The QEMU side implementation includes the
> vhost-user extension, vhost-pci device emulation and management. The current
> device part implementation is based on virtio 1.0, but it can be easily
> upgraded to support the upcoming virtio 1.1.
>
> The current QEMU implementation supports the polling mode driver on both
> sides to receive packets. More features, such as interrupt support, live
> migration support, protected memory accesses will be added later.
>


I highly appreciate the effort you put in splitting the patch series and
commenting each, although some are probably superfluous. Before going into
details, I suppose you have kernel side bits too. I'd suggest before
sending individual patches for review, that you send a RFC with links to
the various git trees and instructions to test the proposed device. This
would really help things and potentially bring more people for testing and
comments (think about libvirt side etc). Even better would be to have some
tests (with qtest).

High level question: why do you need to create the device dynamically? I would
rather have the following qemu setup:

-chardev socket,id=chr,path=.. -device vhost-pci-net,chardev=chr

This would also avoid some global state (vp_slave etc)

Regarding the protocol changes to support slave requests: I tried to
explain that before, but apparently I didn't manage to. It is not enough to
simply add chardev frontend handlers to support bidirectional
communication. Until now, qemu's code expects an immediate reply after a
request. With your protocol change, it must now also consider that the
slave may send a request before the master request reaches the slave
handler. So all req/reply write()/read() must now handle in-between
requests from the slave to be race-free (the master can read back a request
when it expects a reply). That's not really a trivial change, which is why I
proposed in the past to have a secondary channel for slave->master
communications. Not only would this be easier to implement, but the protocol
documentation would also be simpler; the cost is simply 1 additional unix
socket (that I proposed to set up and pass with ancillary data on the main
channel).
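
To make the race concrete, here is a minimal sketch of what a master-side "wait for reply" loop has to do once the slave may send its own requests on the same channel. It is only an illustration under stated assumptions: SketchMsg, SketchChannel and sketch_wait_reply() are hypothetical names, the channel is modelled as an in-memory array, and none of this is the real vhost-user message format or QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds seen by the master on a shared channel. */
typedef enum { SK_REPLY, SK_SLAVE_REQUEST } SketchKind;

typedef struct {
    SketchKind kind;
    int id;                     /* request id this message belongs to */
} SketchMsg;

/* Hypothetical channel: a fixed array stands in for the socket stream. */
typedef struct {
    const SketchMsg *msgs;
    size_t len;
    size_t pos;
    int slave_requests_handled;
} SketchChannel;

/*
 * Wait for the reply to `request_id`. Because the slave may have queued a
 * request of its own before it saw ours, the master cannot assume the next
 * message is the reply: it has to service in-between slave requests first.
 * Returns 0 on success, -1 on an unexpected reply or end of stream.
 */
static int sketch_wait_reply(SketchChannel *ch, int request_id)
{
    assert(ch != NULL);
    while (ch->pos < ch->len) {
        const SketchMsg *m = &ch->msgs[ch->pos++];
        if (m->kind == SK_SLAVE_REQUEST) {
            ch->slave_requests_handled++;   /* stand-in for real handling */
            continue;
        }
        return m->id == request_id ? 0 : -1;
    }
    return -1;
}
```

With a dedicated slave->master socket, this loop would collapse back to a single blocking read of the reply, which is the simplification argued for above.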

Another question, what are vpnet->rqs used for?


> RESEND change: Fixed some coding style issue
>

There are some spelling errors to be fixed, and perhaps some variables/fields
to rename (asyn -> async, crq -> ctrlq?). That can be addressed in a detailed
review.

>
> Wei Wang (37):
>   vhost-pci-net: the fundamental vhost-pci-net device emulation
>   vhost-pci-net: the fundamental implementation of vhost-pci-net-pci
>   vhost-user: share the vhost-user protocol related structures
>   vl: add the vhost-pci-slave command line option
>   vhost-pci-slave: start the implementation of vhost-pci-slave
>   vhost-pci-slave: set up the fundamental handlers for the server socket
>   vhost-pci-slave/msg: VHOST_USER_GET_FEATURES
>   vhost-pci-slave/msg: VHOST_USER_SET_FEATURES
>   vhost-pci-slave/msg: VHOST_USER_GET_PROTOCOL_FEATURES
>   vhost-pci-slave/msg: VHOST_USER_SET_PROTOCOL_FEATURES
>   vhost-user/msg: VHOST_USER_PROTOCOL_F_SET_DEVICE_ID
>   vhost-pci-slave/msg: VHOST_USER_SET_DEVICE_ID
>   vhost-pci-slave/msg: VHOST_USER_GET_QUEUE_NUM
>   vhost-pci-slave/msg: VHOST_USER_SET_OWNER
>   vhost-pci-slave/msg: VHOST_USER_SET_MEM_TABLE
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_NUM
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_BASE
>   vhost-user: send guest physical address of virtqueues to the slave
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_ADDR
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_KICK
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_CALL
>   vhost-pci-slave/msg: VHOST_USER_SET_VRING_ENABLE
>   vhost-pci-slave/msg: VHOST_USER_SET_LOG_BASE
>   vhost-pci-slave/msg: VHOST_USER_SET_LOG_FD
>   vhost-pci-slave/msg: VHOST_USER_SEND_RARP
>   vhost-pci-slave/msg: VHOST_USER_GET_VRING_BASE
>   vhost-pci-net: pass the info collected by vp_slave to the device
>   vhost-pci-net: pass the mem and vring info to the driver
>   vhost-pci-slave/msg: VHOST_USER_SET_VHOST_PCI (start)
>   vhost-pci-slave/msg: VHOST_USER_SET_VHOST_PCI (stop)
>   vhost-user/msg: send VHOST_USER_SET_VHOST_PCI (start/stop)
>   vhost-user: add asynchronous read for the vhost-user master
>   vhost-pci-net: send the negotiated feature bits to the master
>   vhost-pci-slave: add "peer_reset"
>   vhost-pci-net: start the vhost-pci-net device
>   vhost-user/msg: handling VHOST_USER_SET_FEATURES
>   vl: enable vhost-pci-slave
>
>  hw/net/Makefile.objs   |   2 +-
>  hw/net/vhost-pci-net.c | 268 
>  hw/net/vhost_net.c |  39 ++
>  hw/virtio/Makefile.objs|   1 +
>  hw/virtio/vhost-pci-slave.c| 570
> +
>  

[Qemu-devel] [PATCH v4 0/6] POWER9 TCG enablements - BCD functions - final part

2016-12-19 Thread Jose Ricardo Ziviani
v4:
 - improves functions to behave exactly like the target

v3:
 - moves shift functions to host-utils.c and added config_int128 guard
 - changes Makefile to always compile host-utils.c
 - redesigns bcd[u]trunc to use bitwise operations
 - removes "target-ppc: Implement bcd_is_valid function" (merged)

v2:
 - bcd[s,sr,us] uses 1 byte for shifting instead of 4 bytes
 - left/right functions in host-utils are out of CONFIG_INT128
 - fixes overflowing issue in left shift and added a testcase

This series contains 5 new instructions for POWER9 ISA3.0, left/right shifts
for unsigned quadwords and a small improvement to check whether a bcd value is
valid or not.

bcds.: Decimal signed shift
bcdus.: Decimal unsigned shift
bcdsr.: Decimal shift and round
bcdtrunc.: Decimal signed truncate
bcdutrunc.: Decimal unsigned truncate

Jose Ricardo Ziviani (6):
  target-ppc: Implement unsigned quadword left/right shift and unit
tests
  target-ppc: Implement bcds. instruction
  target-ppc: Implement bcdus. instruction
  target-ppc: Implement bcdsr. instruction
  target-ppc: Implement bcdtrunc. instruction
  target-ppc: Implement bcdutrunc. instruction

 include/qemu/host-utils.h   |   3 +
 target-ppc/helper.h |   5 +
 target-ppc/int_helper.c | 217 
 target-ppc/translate/vmx-impl.inc.c |  16 +++
 target-ppc/translate/vmx-ops.inc.c  |  13 ++-
 tests/Makefile.include  |   5 +-
 tests/test-shift128.c   |  98 
 util/Makefile.objs  |   2 +-
 util/host-utils.c   |  44 
 9 files changed, 396 insertions(+), 7 deletions(-)
 create mode 100644 tests/test-shift128.c

-- 
2.7.4
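
As an illustration of the "left/right shifts for unsigned quadwords" mentioned above, the following is a minimal sketch of shifting a 128-bit value stored as two 64-bit halves. The function names are hypothetical and the sketch assumes 0 <= shift < 128; it is not the urshift()/ulshift() code from patch 1/6, whose handling of larger shift counts may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: logical right shift of the 128-bit value high:low. */
static void sketch_urshift128(uint64_t *plow, uint64_t *phigh, uint32_t shift)
{
    assert(shift < 128);        /* assumption of this sketch */
    if (shift == 0) {
        return;
    }
    if (shift >= 64) {
        *plow = *phigh >> (shift - 64);
        *phigh = 0;
    } else {
        /* Bits leaving the high half enter the top of the low half. */
        *plow = (*plow >> shift) | (*phigh << (64 - shift));
        *phigh >>= shift;
    }
}

/*
 * Hypothetical sketch: left shift of high:low. *overflow is set (never
 * cleared) when a non-zero bit is shifted out of the top.
 */
static void sketch_ulshift128(uint64_t *plow, uint64_t *phigh, uint32_t shift,
                              bool *overflow)
{
    assert(shift < 128);        /* assumption of this sketch */
    if (shift == 0) {
        return;
    }
    /* The top `shift` bits are the ones that will be lost. */
    uint64_t lost_low = *plow;
    uint64_t lost_high = *phigh;
    sketch_urshift128(&lost_low, &lost_high, 128 - shift);
    if (lost_low | lost_high) {
        *overflow = true;
    }
    if (shift >= 64) {
        *phigh = *plow << (shift - 64);
        *plow = 0;
    } else {
        *phigh = (*phigh << shift) | (*plow >> (64 - shift));
        *plow <<= shift;
    }
}
```

Each branch keeps every C shift count in the range 0..63, since shifting a 64-bit value by 64 or more is undefined behaviour in C.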




[Qemu-devel] [PATCH v4 4/6] target-ppc: Implement bcdsr. instruction

2016-12-19 Thread Jose Ricardo Ziviani
bcdsr.: Decimal shift and round. This instruction works like bcds.;
however, when performing a right shift, 1 is added to the
result if the last digit shifted out was >= 5.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 48 +
 target-ppc/translate/vmx-impl.inc.c |  1 +
 target-ppc/translate/vmx-ops.inc.c  |  2 ++
 4 files changed, 52 insertions(+)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 99f9a49..6f3991d 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -400,6 +400,7 @@ DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
 DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
 DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdus, i32, avr, avr, avr, i32)
+DEF_HELPER_4(bcdsr, i32, avr, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 15d3fc7..aa3e157 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -3124,6 +3124,54 @@ uint32_t helper_bcdus(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 return cr;
 }
 
+uint32_t helper_bcdsr(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
+{
+int cr;
+int unused = 0;
+int invalid = 0;
+bool ox_flag = false;
+int sgnb = bcd_get_sgn(b);
+ppc_avr_t ret = *b;
+ret.u64[LO_IDX] &= ~0xf;
+
+#if defined(HOST_WORDS_BIGENDIAN)
+int i = a->s8[7];
+ppc_avr_t bcd_one = { .u64 = { 0, 0x10 } };
+#else
+int i = a->s8[8];
+ppc_avr_t bcd_one = { .u64 = { 0x10, 0 } };
+#endif
+
+if (bcd_is_valid(b) == false) {
+return CRF_SO;
+}
+
+if (unlikely(i > 31)) {
+i = 31;
+} else if (unlikely(i < -31)) {
+i = -31;
+}
+
+if (i > 0) {
+ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+} else {
+urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+
+if (bcd_get_digit(&ret, 0, &invalid) >= 5) {
+bcd_add_mag(&ret, &ret, &bcd_one, &invalid, &unused);
+}
+}
+bcd_put_digit(&ret, bcd_preferred_sgn(sgnb, ps), 0);
+
+cr = bcd_cmp_zero(&ret);
+if (unlikely(ox_flag)) {
+cr |= CRF_SO;
+}
+*r = ret;
+
+return cr;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c 
b/target-ppc/translate/vmx-impl.inc.c
index fc54881..451abb5 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -1018,6 +1018,7 @@ GEN_BCD2(bcdsetsgn)
 GEN_BCD(bcdcpsgn);
 GEN_BCD(bcds);
 GEN_BCD(bcdus);
+GEN_BCD(bcdsr);
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.inc.c 
b/target-ppc/translate/vmx-ops.inc.c
index cdd3abe..fa9c996 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -132,6 +132,8 @@ GEN_HANDLER_E_2(vprtybd, 0x4, 0x1, 0x18, 9, 0, PPC_NONE, 
PPC2_ISA300),
 GEN_HANDLER_E_2(vprtybq, 0x4, 0x1, 0x18, 10, 0, PPC_NONE, PPC2_ISA300),
 
 GEN_VXFORM_DUAL(vsubcuw, xpnd04_1, 0, 22, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_300(bcdsr, 0, 23),
+GEN_VXFORM_300(bcdsr, 0, 31),
 GEN_VXFORM_DUAL(vaddubs, vmul10uq, 0, 8, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_DUAL(vadduhs, vmul10euq, 0, 9, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vadduws, 0, 10),
-- 
2.7.4
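
The shift-and-round rule can be shown on a small scale. This is a sketch under simplifying assumptions, not the helper from the patch: it works on an unsigned 16-digit packed BCD value held in a uint64_t (no sign nibble, no 128-bit register), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add 1 to a packed BCD magnitude, with decimal carry. */
static uint64_t sketch_bcd_increment(uint64_t bcd)
{
    for (unsigned pos = 0; pos < 64; pos += 4) {
        unsigned digit = (bcd >> pos) & 0xf;
        if (digit < 9) {
            return bcd + ((uint64_t)1 << pos);
        }
        bcd &= ~((uint64_t)0xf << pos);    /* 9 becomes 0, carry continues */
    }
    return bcd;                            /* all nines wrapped to zero */
}

/*
 * Hypothetical sketch: shift right by n decimal digits (4 bits each) and
 * round up when the last digit shifted out is >= 5.
 */
static uint64_t sketch_bcd_shift_right_round(uint64_t bcd, unsigned n)
{
    assert(n >= 1 && n <= 15);             /* assumption of this sketch */
    unsigned last_out = (bcd >> ((n - 1) * 4)) & 0xf;
    uint64_t result = bcd >> (n * 4);
    if (last_out >= 5) {
        result = sketch_bcd_increment(result);
    }
    return result;
}
```

For example, shifting the BCD value 12355 right by two digits drops "55"; the last digit shifted out is 5, so 123 is rounded up to 124, whereas 12345 yields 123.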




[Qemu-devel] [PATCH v4 1/6] target-ppc: Implement unsigned quadword left/right shift and unit tests

2016-12-19 Thread Jose Ricardo Ziviani
This commit implements functions for unsigned right and left shifts
and the unit test for them. These functions are needed by the BCD
instructions added later in this series.

Today, there is already a right shift implementation in int128.h,
but it's designed for signed numbers.

Signed-off-by: Jose Ricardo Ziviani 
---
 include/qemu/host-utils.h |  3 ++
 tests/Makefile.include|  5 ++-
 tests/test-shift128.c | 98 +++
 util/Makefile.objs|  2 +-
 util/host-utils.c | 44 +
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 tests/test-shift128.c

diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h
index 46187bb..e87de19 100644
--- a/include/qemu/host-utils.h
+++ b/include/qemu/host-utils.h
@@ -516,4 +516,7 @@ static inline uint64_t pow2ceil(uint64_t value)
 return 1ULL << (64 - nlz);
 }
 
+void urshift(uint64_t *plow, uint64_t *phigh, uint32_t shift);
+void ulshift(uint64_t *plow, uint64_t *phigh, uint32_t shift, bool *overflow);
+
 #endif
diff --git a/tests/Makefile.include b/tests/Makefile.include
index b574964..8ccaa3e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -65,6 +65,8 @@ check-unit-$(CONFIG_POSIX) += tests/test-vmstate$(EXESUF)
 endif
 check-unit-y += tests/test-cutils$(EXESUF)
 gcov-files-test-cutils-y += util/cutils.c
+check-unit-y += tests/test-shift128$(EXESUF)
+gcov-files-test-shift128-y = util/host-utils.c
 check-unit-y += tests/test-mul64$(EXESUF)
 gcov-files-test-mul64-y = util/host-utils.c
 check-unit-y += tests/test-int128$(EXESUF)
@@ -464,7 +466,7 @@ test-obj-y = tests/check-qint.o tests/check-qstring.o 
tests/check-qdict.o \
tests/test-x86-cpuid.o tests/test-mul64.o tests/test-int128.o \
tests/test-opts-visitor.o tests/test-qmp-event.o \
tests/rcutorture.o tests/test-rcu-list.o \
-   tests/test-qdist.o \
+   tests/test-qdist.o tests/test-shift128.o \
tests/test-qht.o tests/qht-bench.o tests/test-qht-par.o \
tests/atomic_add-bench.o
 
@@ -572,6 +574,7 @@ tests/test-qmp-commands$(EXESUF): tests/test-qmp-commands.o 
tests/test-qmp-marsh
 tests/test-visitor-serialization$(EXESUF): tests/test-visitor-serialization.o 
$(test-qapi-obj-y)
 tests/test-opts-visitor$(EXESUF): tests/test-opts-visitor.o $(test-qapi-obj-y)
 
+tests/test-shift128$(EXESUF): tests/test-shift128.o $(test-util-obj-y)
 tests/test-mul64$(EXESUF): tests/test-mul64.o $(test-util-obj-y)
 tests/test-bitops$(EXESUF): tests/test-bitops.o $(test-util-obj-y)
 tests/test-crypto-hash$(EXESUF): tests/test-crypto-hash.o $(test-crypto-obj-y)
diff --git a/tests/test-shift128.c b/tests/test-shift128.c
new file mode 100644
index 000..52be6a2
--- /dev/null
+++ b/tests/test-shift128.c
@@ -0,0 +1,98 @@
+/*
+ * Test unsigned left and right shift
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+
+typedef struct {
+uint64_t low;
+uint64_t high;
+uint64_t rlow;
+uint64_t rhigh;
+int32_t shift;
+bool overflow;
+} test_data;
+
+static const test_data test_ltable[] = {
+{ 1223ULL, 0, 1223ULL,   0, 0, false },
+{ 1ULL,0, 2ULL,   0, 1, false },
+{ 1ULL,0, 4ULL,   0, 2, false },
+{ 1ULL,0, 16ULL,  0, 4, false },
+{ 1ULL,0, 256ULL, 0, 8, false },
+{ 1ULL,0, 65536ULL, 0, 16, false },
+{ 1ULL,0, 2147483648ULL, 0, 31, false },
+{ 1ULL,0, 35184372088832ULL, 0, 45, false },
+{ 1ULL,0, 1152921504606846976ULL, 0, 60, false },
+{ 1ULL,0, 0, 1ULL, 64, false },
+{ 1ULL,0, 0, 65536ULL, 80, false },
+{ 1ULL,0, 0, 9223372036854775808ULL, 127, false },
+{ 0ULL,1, 0, 0, 64, true },
+{ 0xULL, 0xULL,
+0x8000ULL, 0x9888ULL, 60, true },
+{ 0xULL, 0xULL,
+0, 0xULL, 64, true },
+{ 0x8ULL, 0, 0, 0x8ULL, 64, false },
+{ 0x8ULL, 0, 0, 0x8000ULL, 124, false },
+{ 0x1ULL, 0, 0, 0x4000ULL, 126, false },
+{ 0x1ULL, 0, 0, 0x8000ULL, 127, false },
+{ 0x1ULL, 0, 0x1ULL, 0, 128, true },
+{ 0, 0, 0ULL, 0, 200, false },
+};
+
+static const test_data test_rtable[] = {
+{ 1223ULL, 0, 1223ULL,   0, 0, false },
+{ 9223372036854775808ULL, 9223372036854775808ULL,
+2147483648L, 2147483648ULL, 32, false },
+{ 9223372036854775808ULL, 9223372036854775808ULL,
+9223372036854775808ULL, 0, 64, false },
+{ 9223372036854775808ULL, 9223372036854775808ULL,
+36028797018963968ULL, 0, 72, false },
+{ 9223372036854775808ULL, 9223372036854775808ULL,
+1ULL, 0, 127, false },
+{ 9223372036854775808ULL, 0, 4611686018427387904ULL, 0, 1, false },
+{ 9223372036854775808ULL, 0, 2305843009213693952ULL, 0, 2, false },
+{ 922

[Qemu-devel] [PATCH v4 2/6] target-ppc: Implement bcds. instruction

2016-12-19 Thread Jose Ricardo Ziviani
bcds.: Decimal shift. Given two registers vra and vrb, this instruction
shifts the vrb value by the number of digits given in vra and places
the result in the result register.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 40 +
 target-ppc/translate/vmx-impl.inc.c |  3 +++
 target-ppc/translate/vmx-ops.inc.c  |  3 ++-
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 4707db4..1a49b40 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -398,6 +398,7 @@ DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
 DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
 DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
 DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
+DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 7989b1f..35e14dc 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -3043,6 +3043,46 @@ uint32_t helper_bcdsetsgn(ppc_avr_t *r, ppc_avr_t *b, 
uint32_t ps)
 return bcd_cmp_zero(r);
 }
 
+uint32_t helper_bcds(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
+{
+int cr;
+#if defined(HOST_WORDS_BIGENDIAN)
+int i = a->s8[7];
+#else
+int i = a->s8[8];
+#endif
+bool ox_flag = false;
+int sgnb = bcd_get_sgn(b);
+ppc_avr_t ret = *b;
+ret.u64[LO_IDX] &= ~0xf;
+
+if (bcd_is_valid(b) == false) {
+return CRF_SO;
+}
+
+if (unlikely(i > 31)) {
+i = 31;
+} else if (unlikely(i < -31)) {
+i = -31;
+}
+
+if (i > 0) {
+ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+} else {
+urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+}
+bcd_put_digit(&ret, bcd_preferred_sgn(sgnb, ps), 0);
+
+*r = ret;
+
+cr = bcd_cmp_zero(r);
+if (unlikely(ox_flag)) {
+cr |= CRF_SO;
+}
+
+return cr;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c 
b/target-ppc/translate/vmx-impl.inc.c
index e8e527f..84ebb7e 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -1016,6 +1016,7 @@ GEN_BCD2(bcdcfsq)
 GEN_BCD2(bcdctsq)
 GEN_BCD2(bcdsetsgn)
 GEN_BCD(bcdcpsgn);
+GEN_BCD(bcds);
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
@@ -1090,6 +1091,8 @@ GEN_VXFORM_DUAL(vsubuhs, PPC_ALTIVEC, PPC_NONE, \
 bcdsub, PPC_NONE, PPC2_ALTIVEC_207)
 GEN_VXFORM_DUAL(vaddshs, PPC_ALTIVEC, PPC_NONE, \
 bcdcpsgn, PPC_NONE, PPC2_ISA300)
+GEN_VXFORM_DUAL(vsubudm, PPC2_ALTIVEC_207, PPC_NONE, \
+bcds, PPC_NONE, PPC2_ISA300)
 
 static void gen_vsbox(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.inc.c 
b/target-ppc/translate/vmx-ops.inc.c
index 57dce6e..7b4b009 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -62,7 +62,8 @@ GEN_VXFORM_207(vaddudm, 0, 3),
 GEN_VXFORM_DUAL(vsububm, bcdadd, 0, 16, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_DUAL(vsubuhm, bcdsub, 0, 17, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vsubuwm, 0, 18),
-GEN_VXFORM_207(vsubudm, 0, 19),
+GEN_VXFORM_DUAL(vsubudm, bcds, 0, 19, PPC2_ALTIVEC_207, PPC2_ISA300),
+GEN_VXFORM_300(bcds, 0, 27),
 GEN_VXFORM(vmaxub, 1, 0),
 GEN_VXFORM(vmaxuh, 1, 1),
 GEN_VXFORM(vmaxuw, 1, 2),
-- 
2.7.4
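
A scaled-down illustration of a signed decimal shift: the sign nibble is set aside, the magnitude is shifted by whole digits, and the sign is put back. This sketch assumes a 64-bit packed BCD value with the sign code in the lowest nibble and |digits| < 16; the function name is hypothetical, and the real helper operates on a 128-bit register and also picks a preferred sign code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: shift a signed packed BCD value by `digits` decimal
 * digits (positive = left, negative = right). *overflow is set when a
 * non-zero digit is shifted out on the left.
 */
static uint64_t sketch_bcd_signed_shift(uint64_t bcd, int digits,
                                        bool *overflow)
{
    assert(digits > -16 && digits < 16);   /* assumption of this sketch */
    uint64_t sign = bcd & 0xf;             /* sign code lives in nibble 0 */
    uint64_t mag = bcd & ~(uint64_t)0xf;   /* magnitude without the sign */

    if (digits > 0) {
        if (mag >> (64 - digits * 4)) {
            *overflow = true;              /* non-zero digits fall off */
        }
        mag <<= digits * 4;
    } else if (digits < 0) {
        mag >>= -digits * 4;
    }
    /* Clear whatever landed in the sign position and restore the sign. */
    return (mag & ~(uint64_t)0xf) | sign;
}
```

With sign code 0xC, the value 0x123C shifted left by one digit becomes 0x1230C, and shifted right by one digit becomes 0x12C.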




[Qemu-devel] [PATCH v4 3/6] target-ppc: Implement bcdus. instruction

2016-12-19 Thread Jose Ricardo Ziviani
bcdus.: Decimal unsigned shift. This instruction works like bcds. but
considers only unsigned BCDs (no sign in the least significant 4 bits).

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 41 +
 target-ppc/translate/vmx-impl.inc.c |  3 +++
 target-ppc/translate/vmx-ops.inc.c  |  2 +-
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1a49b40..99f9a49 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -399,6 +399,7 @@ DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
 DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
 DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
 DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
+DEF_HELPER_4(bcdus, i32, avr, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 35e14dc..15d3fc7 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -3083,6 +3083,47 @@ uint32_t helper_bcds(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 return cr;
 }
 
+uint32_t helper_bcdus(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
+{
+int cr;
+int i;
+int invalid = 0;
+bool ox_flag = false;
+ppc_avr_t ret = *b;
+
+for (i = 0; i < 32; i++) {
+bcd_get_digit(b, i, &invalid);
+
+if (unlikely(invalid)) {
+return CRF_SO;
+}
+}
+
+#if defined(HOST_WORDS_BIGENDIAN)
+i = a->s8[7];
+#else
+i = a->s8[8];
+#endif
+if (i >= 32) {
+ox_flag = true;
+ret.u64[LO_IDX] = ret.u64[HI_IDX] = 0;
+} else if (i <= -32) {
+ret.u64[LO_IDX] = ret.u64[HI_IDX] = 0;
+} else if (i > 0) {
+ulshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], i * 4, &ox_flag);
+} else {
+urshift(&ret.u64[LO_IDX], &ret.u64[HI_IDX], -i * 4);
+}
+*r = ret;
+
+cr = bcd_cmp_zero(r);
+if (unlikely(ox_flag)) {
+cr |= CRF_SO;
+}
+
+return cr;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c 
b/target-ppc/translate/vmx-impl.inc.c
index 84ebb7e..fc54881 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -1017,6 +1017,7 @@ GEN_BCD2(bcdctsq)
 GEN_BCD2(bcdsetsgn)
 GEN_BCD(bcdcpsgn);
 GEN_BCD(bcds);
+GEN_BCD(bcdus);
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
@@ -1093,6 +1094,8 @@ GEN_VXFORM_DUAL(vaddshs, PPC_ALTIVEC, PPC_NONE, \
 bcdcpsgn, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_DUAL(vsubudm, PPC2_ALTIVEC_207, PPC_NONE, \
 bcds, PPC_NONE, PPC2_ISA300)
+GEN_VXFORM_DUAL(vsubuwm, PPC_ALTIVEC, PPC_NONE, \
+bcdus, PPC_NONE, PPC2_ISA300)
 
 static void gen_vsbox(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.inc.c 
b/target-ppc/translate/vmx-ops.inc.c
index 7b4b009..cdd3abe 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -61,7 +61,7 @@ GEN_VXFORM(vadduwm, 0, 2),
 GEN_VXFORM_207(vaddudm, 0, 3),
 GEN_VXFORM_DUAL(vsububm, bcdadd, 0, 16, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_DUAL(vsubuhm, bcdsub, 0, 17, PPC_ALTIVEC, PPC_NONE),
-GEN_VXFORM(vsubuwm, 0, 18),
+GEN_VXFORM_DUAL(vsubuwm, bcdus, 0, 18, PPC_ALTIVEC, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubudm, bcds, 0, 19, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_300(bcds, 0, 27),
 GEN_VXFORM(vmaxub, 1, 0),
-- 
2.7.4




[Qemu-devel] [PATCH v4 5/6] target-ppc: Implement bcdtrunc. instruction

2016-12-19 Thread Jose Ricardo Ziviani
bcdtrunc.: Decimal integer truncate. Given a BCD number in vrb and the
number of bytes to truncate in vra, the return register will have vrb
with such bits truncated.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 37 +
 target-ppc/translate/vmx-impl.inc.c |  5 +
 target-ppc/translate/vmx-ops.inc.c  |  4 ++--
 4 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 6f3991d..7f053d8 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -401,6 +401,7 @@ DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
 DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdus, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdsr, i32, avr, avr, avr, i32)
+DEF_HELPER_4(bcdtrunc, i32, avr, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index aa3e157..edcaa12 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -3172,6 +3172,43 @@ uint32_t helper_bcdsr(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b, uint32_t ps)
 return cr;
 }
 
+uint32_t helper_bcdtrunc(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
+{
+uint64_t mask;
+uint32_t ox_flag = 0;
+#if defined(HOST_WORDS_BIGENDIAN)
+int i = a->s16[3] + 1;
+#else
+int i = a->s16[4] + 1;
+#endif
+ppc_avr_t ret = *b;
+
+if (bcd_is_valid(b) == false) {
+return CRF_SO;
+}
+
+if (i > 16 && i < 32) {
+if (ret.u64[HI_IDX] >> (i * 4 - 64)) {
+ox_flag = CRF_SO;
+}
+
+mask = (uint64_t)-1 >> (128 - i * 4);
+ret.u64[HI_IDX] &= mask;
+} else if (i >= 0 && i <= 16) {
+if (ret.u64[HI_IDX] || (i < 16 && ret.u64[LO_IDX] >> (i * 4))) {
+ox_flag = CRF_SO;
+}
+
+mask = (uint64_t)-1 >> (64 - i * 4);
+ret.u64[LO_IDX] &= mask;
+ret.u64[HI_IDX] = 0;
+}
+bcd_put_digit(&ret, bcd_preferred_sgn(bcd_get_sgn(b), ps), 0);
+*r = ret;
+
+return bcd_cmp_zero(&ret) | ox_flag;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index 451abb5..1683f42 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -1019,6 +1019,7 @@ GEN_BCD(bcdcpsgn);
 GEN_BCD(bcds);
 GEN_BCD(bcdus);
 GEN_BCD(bcdsr);
+GEN_BCD(bcdtrunc);
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
@@ -1097,6 +1098,10 @@ GEN_VXFORM_DUAL(vsubudm, PPC2_ALTIVEC_207, PPC_NONE, \
 bcds, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_DUAL(vsubuwm, PPC_ALTIVEC, PPC_NONE, \
 bcdus, PPC_NONE, PPC2_ISA300)
+GEN_VXFORM_DUAL(vsubsbs, PPC_ALTIVEC, PPC_NONE, \
+bcdtrunc, PPC_NONE, PPC2_ISA300)
+GEN_VXFORM_DUAL(vsubuqm, PPC2_ALTIVEC_207, PPC_NONE, \
+bcdtrunc, PPC_NONE, PPC2_ISA300)
 
 static void gen_vsbox(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index fa9c996..e6167a4 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -143,14 +143,14 @@ GEN_VXFORM(vaddsws, 0, 14),
 GEN_VXFORM_DUAL(vsububs, bcdadd, 0, 24, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_DUAL(vsubuhs, bcdsub, 0, 25, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM(vsubuws, 0, 26),
-GEN_VXFORM(vsubsbs, 0, 28),
+GEN_VXFORM_DUAL(vsubsbs, bcdtrunc, 0, 28, PPC_NONE, PPC2_ISA300),
 GEN_VXFORM(vsubshs, 0, 29),
 GEN_VXFORM_DUAL(vsubsws, xpnd04_2, 0, 30, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_207(vadduqm, 0, 4),
 GEN_VXFORM_207(vaddcuq, 0, 5),
 GEN_VXFORM_DUAL(vaddeuqm, vaddecuq, 30, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM_207(vsubuqm, 0, 20),
 GEN_VXFORM_207(vsubcuq, 0, 21),
+GEN_VXFORM_DUAL(vsubuqm, bcdtrunc, 0, 20, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubeuqm, vsubecuq, 31, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM(vrlb, 2, 0),
 GEN_VXFORM(vrlh, 2, 1),
-- 
2.7.4
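
For readers following the bcdtrunc. patch above, here is a minimal standalone sketch of the masking idea it relies on: keep the lowest N 4-bit fields of a 128-bit packed-BCD value and report whether any non-zero nibble was discarded. The bcd128 type and bcd128_keep_low_nibbles() are hypothetical names used only for illustration and are not part of QEMU; the sketch leaves out sign handling and BCD validity checks. Since shifting a 64-bit value by 64 is undefined in C, the boundary counts (0, 16, and 32 or more) are handled as separate cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit packed-BCD container: 32 nibbles, lowest nibble in lo. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} bcd128;

/*
 * Keep only the lowest `nibbles` 4-bit fields of *v and clear the rest.
 * Returns true when at least one discarded nibble was non-zero
 * (the condition the patch reports as overflow).
 */
static bool bcd128_keep_low_nibbles(bcd128 *v, unsigned nibbles)
{
    bool lost;

    if (nibbles >= 32) {
        /* Nothing is cut off. */
        return false;
    }

    if (nibbles > 16) {
        /* Cut inside the high half: shift is in the range 4..60. */
        unsigned shift = (nibbles - 16) * 4;

        lost = (v->hi >> shift) != 0;
        v->hi &= UINT64_MAX >> (64 - shift);
    } else if (nibbles == 16) {
        /* Cut exactly between the halves; no shift by 64 needed. */
        lost = v->hi != 0;
        v->hi = 0;
    } else if (nibbles == 0) {
        /* Everything is discarded. */
        lost = (v->hi | v->lo) != 0;
        v->hi = 0;
        v->lo = 0;
    } else {
        /* Cut inside the low half: shift is in the range 4..60. */
        unsigned shift = nibbles * 4;

        lost = v->hi != 0 || (v->lo >> shift) != 0;
        v->lo &= UINT64_MAX >> (64 - shift);
        v->hi = 0;
    }

    return lost;
}
```

As a usage example, keeping 4 nibbles of a value with hi = 0x12 and lo = 0x345678901234567C leaves hi = 0 and lo = 0x567C, and the function returns true because non-zero digits were dropped.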




[Qemu-devel] [PATCH v4 6/6] target-ppc: Implement bcdutrunc. instruction

2016-12-19 Thread Jose Ricardo Ziviani
bcdutrunc.: Decimal unsigned truncate. Works like bcdtrunc. but operates
on unsigned BCD numbers.

Signed-off-by: Jose Ricardo Ziviani 
---
 target-ppc/helper.h |  1 +
 target-ppc/int_helper.c | 51 +
 target-ppc/translate/vmx-impl.inc.c |  4 +++
 target-ppc/translate/vmx-ops.inc.c  |  2 +-
 4 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 7f053d8..38e5246 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -402,6 +402,7 @@ DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdus, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdsr, i32, avr, avr, avr, i32)
 DEF_HELPER_4(bcdtrunc, i32, avr, avr, avr, i32)
+DEF_HELPER_4(bcdutrunc, i32, avr, avr, avr, i32)
 
 DEF_HELPER_2(xsadddp, void, env, i32)
 DEF_HELPER_2(xssubdp, void, env, i32)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index edcaa12..77f8c13 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -3209,6 +3209,57 @@ uint32_t helper_bcdtrunc(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
 return bcd_cmp_zero(&ret) | ox_flag;
 }
 
+uint32_t helper_bcdutrunc(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t ps)
+{
+int i;
+uint64_t mask;
+uint32_t ox_flag = 0;
+int invalid = 0;
+ppc_avr_t ret = *b;
+
+for (i = 0; i < 32; i++) {
+bcd_get_digit(b, i, &invalid);
+
+if (unlikely(invalid)) {
+return CRF_SO;
+}
+}
+
+#if defined(HOST_WORDS_BIGENDIAN)
+i = a->s16[3];
+#else
+i = a->s16[4];
+#endif
+if (i > 16 && i < 33) {
+if (ret.u64[HI_IDX] >> (i * 4 - 64)) {
+ox_flag = CRF_SO;
+}
+
+mask = (uint64_t)-1 >> (128 - i * 4);
+ret.u64[HI_IDX] &= mask;
+} else if (i > 0 && i <= 16) {
+if (ret.u64[HI_IDX] || (i < 16 && ret.u64[LO_IDX] >> (i * 4))) {
+ox_flag = CRF_SO;
+}
+
+mask = (uint64_t)-1 >> (64 - i * 4);
+ret.u64[LO_IDX] &= mask;
+ret.u64[HI_IDX] = 0;
+} else if (i == 0) {
+if (ret.u64[HI_IDX] || ret.u64[LO_IDX]) {
+ox_flag = CRF_SO;
+}
+ret.u64[HI_IDX] = ret.u64[LO_IDX] = 0;
+}
+
+*r = ret;
+if (r->u64[HI_IDX] == 0 && r->u64[LO_IDX] == 0) {
+return ox_flag | CRF_EQ;
+}
+
+return ox_flag | CRF_GT;
+}
+
 void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 {
 int i;
diff --git a/target-ppc/translate/vmx-impl.inc.c b/target-ppc/translate/vmx-impl.inc.c
index 1683f42..3cb6fc2 100644
--- a/target-ppc/translate/vmx-impl.inc.c
+++ b/target-ppc/translate/vmx-impl.inc.c
@@ -1020,6 +1020,7 @@ GEN_BCD(bcds);
 GEN_BCD(bcdus);
 GEN_BCD(bcdsr);
 GEN_BCD(bcdtrunc);
+GEN_BCD(bcdutrunc);
 
 static void gen_xpnd04_1(DisasContext *ctx)
 {
@@ -1102,6 +1103,9 @@ GEN_VXFORM_DUAL(vsubsbs, PPC_ALTIVEC, PPC_NONE, \
 bcdtrunc, PPC_NONE, PPC2_ISA300)
 GEN_VXFORM_DUAL(vsubuqm, PPC2_ALTIVEC_207, PPC_NONE, \
 bcdtrunc, PPC_NONE, PPC2_ISA300)
+GEN_VXFORM_DUAL(vsubcuq, PPC2_ALTIVEC_207, PPC_NONE, \
+bcdutrunc, PPC_NONE, PPC2_ISA300)
+
 
 static void gen_vsbox(DisasContext *ctx)
 {
diff --git a/target-ppc/translate/vmx-ops.inc.c b/target-ppc/translate/vmx-ops.inc.c
index e6167a4..139f80c 100644
--- a/target-ppc/translate/vmx-ops.inc.c
+++ b/target-ppc/translate/vmx-ops.inc.c
@@ -149,8 +149,8 @@ GEN_VXFORM_DUAL(vsubsws, xpnd04_2, 0, 30, PPC_ALTIVEC, PPC_NONE),
 GEN_VXFORM_207(vadduqm, 0, 4),
 GEN_VXFORM_207(vaddcuq, 0, 5),
 GEN_VXFORM_DUAL(vaddeuqm, vaddecuq, 30, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
-GEN_VXFORM_207(vsubcuq, 0, 21),
 GEN_VXFORM_DUAL(vsubuqm, bcdtrunc, 0, 20, PPC2_ALTIVEC_207, PPC2_ISA300),
+GEN_VXFORM_DUAL(vsubcuq, bcdutrunc, 0, 21, PPC2_ALTIVEC_207, PPC2_ISA300),
 GEN_VXFORM_DUAL(vsubeuqm, vsubecuq, 31, 0xFF, PPC_NONE, PPC2_ALTIVEC_207),
 GEN_VXFORM(vrlb, 2, 0),
 GEN_VXFORM(vrlh, 2, 1),
-- 
2.7.4




Re: [Qemu-devel] [PATCH] intel_iommu: allow dynamic switch of IOMMU region

2016-12-19 Thread Alex Williamson
On Mon, 19 Dec 2016 22:41:26 +0800
Peter Xu  wrote:

> This is preparation work to finally enable dynamic switching of VT-d
> protection on and off. The old VT-d code uses a static IOMMU region,
> which does not satisfy vfio-pci device listeners.
> 
> Let me explain.
> 
> vfio-pci devices depend on the memory region listener and IOMMU replay
> mechanism to make sure the device mapping is coherent with the guest
> even if there are domain switches. And there are two kinds of domain
> switches:
> 
>   (1) switch from domain A -> B
>   (2) switch from domain A -> no domain (e.g., turn DMAR off)
> 
> Case (1) is handled by the context entry invalidation handling by the
> VT-d replay logic. What the replay function should do here is to replay
> the existing page mappings in domain B.
> 
> However for case (2), we don't want to replay any domain mappings - we
> just need the default GPA->HPA mappings (the address_space_memory
> mapping). And this patch helps on case (2) to build up the mapping
> automatically by leveraging the vfio-pci memory listeners.
> 
> Another important thing that this patch does is to separate
> IR (Interrupt Remapping) from DMAR (DMA Remapping). The IR region should
> not depend on the DMAR region (as it did before this patch). It should be
> a standalone region, and it should be possible to activate it without
> DMAR (which is common behavior for the Linux kernel: by default it enables
> IR while leaving DMAR disabled).


This seems like an improvement, but I will note that there are existing
locked memory accounting issues inherent with VT-d and vfio.  With
VT-d, each device has a unique AddressSpace.  This requires that each
is managed via a separate vfio container.  Each container is accounted
for separately for locked pages.  libvirt currently only knows that if
any vfio devices are attached, the locked memory limit for the
process needs to be set high enough to cover the VM memory.  When VT-d is
involved, we either need to figure out how to associate otherwise
independent vfio containers to share locked page accounting or teach
libvirt that the locked memory requirement needs to be multiplied by
the number of attached vfio devices.  The latter seems far less
complicated but reduces the containment of QEMU a bit since the
process has the ability to lock potentially many multiples of the VM
address size.  Thanks,

Alex
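
The second option described above (multiplying the locked memory requirement by the number of attached vfio devices) comes down to simple arithmetic. The following is a rough sketch only: vfio_locked_limit_bytes() is a hypothetical helper, not a libvirt or QEMU function, and it assumes one vfio container per device when a guest-visible VT-d is present and a single shared container otherwise. Returning 0 when no vfio device is attached is also an assumption, meaning "no vfio-driven requirement".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: worst-case locked-memory limit, in bytes, that a
 * management layer would have to grant the QEMU process.
 *
 *   vm_mem_bytes  guest RAM size
 *   vfio_devices  number of attached vfio devices
 *   viommu        true when each device has its own AddressSpace and
 *                 therefore its own, separately accounted container
 */
static uint64_t vfio_locked_limit_bytes(uint64_t vm_mem_bytes,
                                        unsigned vfio_devices,
                                        bool viommu)
{
    uint64_t containers;

    if (vfio_devices == 0) {
        /* Assumption: without vfio devices nothing needs to be locked. */
        return 0;
    }

    /* One container per device with a vIOMMU, otherwise one shared. */
    containers = viommu ? vfio_devices : 1;

    /* Saturate instead of overflowing on very large inputs. */
    if (vm_mem_bytes > UINT64_MAX / containers) {
        return UINT64_MAX;
    }

    return vm_mem_bytes * containers;
}
```

With a 4 GiB guest and three vfio devices, this yields 12 GiB when VT-d is involved and 4 GiB when it is not, which illustrates the reduced containment mentioned above.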

> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c | 75 
> ---
>  hw/i386/trace-events  |  3 ++
>  include/hw/i386/intel_iommu.h |  2 ++
>  3 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 5f3e351..75a3f4e 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1179,9 +1179,42 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
>  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
>  }
>  
> +static void vtd_switch_address_space(IntelIOMMUState *s, bool enabled)
> +{
> +GHashTableIter iter;
> +VTDBus *vtd_bus;
> +VTDAddressSpace *as;
> +int i;
> +
> +g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
> +while (g_hash_table_iter_next (&iter, NULL, (void**)&vtd_bus)) {
> +for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) {
> +as = vtd_bus->dev_as[i];
> +if (as == NULL) {
> +continue;
> +}
> +trace_vtd_switch_address_space(pci_bus_num(vtd_bus->bus),
> +   VTD_PCI_SLOT(i), VTD_PCI_FUNC(i),
> +   enabled);
> +if (enabled) {
> +memory_region_add_subregion_overlap(&as->root, 0,
> +&as->iommu, 2);
> +} else {
> +memory_region_del_subregion(&as->root, &as->iommu);
> +}
> +}
> +}
> +}
> +
>  /* Handle Translation Enable/Disable */
>  static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>  {
> +bool old = s->dmar_enabled;
> +
> +if (old == en) {
> +return;
> +}
> +
>  VTD_DPRINTF(CSR, "Translation Enable %s", (en ? "on" : "off"));
>  
>  if (en) {
> @@ -1196,6 +1229,8 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool 
> en)
>  /* Ok - report back to driver */
>  vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0);
>  }
> +
> +vtd_switch_address_space(s, en);
>  }
>  
>  /* Handle Interrupt Remap Enable/Disable */
> @@ -2343,15 +2378,47 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, 
> PCIBus *bus, int devfn)
>  vtd_dev_as->devfn = (uint8_t)devfn;
>  vtd_dev_as->iommu_state = s;
>  vtd_dev_as->context_cache_entry.context_cache_gen = 0;
> +
> +/*
> + * When DMAR is disabled, memory region relationships looks
> + * like:
> + *
> + * - (prio 0, RW): vtd_root
>

Re: [Qemu-devel] [PATCH v6 2/4] hw/intc/arm_gicv3_kvm: Implement get/put functions

2016-12-19 Thread Auger Eric
Hi Vijaya,

On 23/11/2016 13:39, vijay.kil...@gmail.com wrote:
> From: Vijaya Kumar K 
> 
> This actually implements pre_save and post_load methods for in-kernel
> vGICv3.
> 
> Signed-off-by: Pavel Fedin 
> Signed-off-by: Peter Maydell 
> [PMM:
>  * use decimal, not 0bnnn
>  * fixed typo in names of ICC_APR0R_EL1 and ICC_AP1R_EL1
>  * completely rearranged the get and put functions to read and write
>the state in a natural order, rather than mixing distributor and
>redistributor state together]
> Signed-off-by: Vijaya Kumar K 
> [Vijay:
>  * Update macro KVM_VGIC_ATTR
>  * Use 32 bit access for gicd and gicr
>  * GICD_IROUTER, GICD_TYPER, GICR_PROPBASER and GICR_PENDBASER reg
>access  are changed from 64-bit to 32-bit access
>  * Add ICC_SRE_EL1 save and restore
>  * Dropped translate_fn mechanism and coded functions to handle
>save and restore of edge_trigger and priority
>  * Number of APnR register saved/restored based on number of
>priority bits supported]
> ---
> ---
>  hw/intc/arm_gicv3_kvm.c  | 559 
> ++-
>  hw/intc/gicv3_internal.h |   1 +
>  2 files changed, 549 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
> index 199a439..a317fbf 100644
> --- a/hw/intc/arm_gicv3_kvm.c
> +++ b/hw/intc/arm_gicv3_kvm.c
> @@ -23,8 +23,10 @@
>  #include "qapi/error.h"
>  #include "hw/intc/arm_gicv3_common.h"
>  #include "hw/sysbus.h"
> +#include "qemu/error-report.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> +#include "gicv3_internal.h"
>  #include "vgic_common.h"
>  #include "migration/migration.h"
> 
> @@ -44,6 +46,30 @@
>  #define KVM_ARM_GICV3_GET_CLASS(obj) \
>   OBJECT_GET_CLASS(KVMARMGICv3Class, (obj), TYPE_KVM_ARM_GICV3)
>  
> +#define   KVM_DEV_ARM_VGIC_SYSREG(op0, op1, crn, crm, op2) \
> + (ARM64_SYS_REG_SHIFT_MASK(op0, OP0) | \
> +  ARM64_SYS_REG_SHIFT_MASK(op1, OP1) | \
> +  ARM64_SYS_REG_SHIFT_MASK(crn, CRN) | \
> +  ARM64_SYS_REG_SHIFT_MASK(crm, CRM) | \
> +  ARM64_SYS_REG_SHIFT_MASK(op2, OP2))
> +
> +#define ICC_PMR_EL1 \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 4, 6, 0)
> +#define ICC_BPR0_EL1\
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 8, 3)
> +#define ICC_AP0R_EL1(n) \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 8, 4 | n)
> +#define ICC_AP1R_EL1(n) \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 9, n)
> +#define ICC_BPR1_EL1\
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 3)
> +#define ICC_CTLR_EL1\
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 4)
> +#define ICC_IGRPEN0_EL1 \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 6)
> +#define ICC_IGRPEN1_EL1 \
> +KVM_DEV_ARM_VGIC_SYSREG(3, 0, 12, 12, 7)
> +
>  typedef struct KVMARMGICv3Class {
>  ARMGICv3CommonClass parent_class;
>  DeviceRealize parent_realize;
> @@ -57,16 +83,521 @@ static void kvm_arm_gicv3_set_irq(void *opaque, int irq, 
> int level)
>  kvm_arm_gic_set_irq(s->num_irq, irq, level);
>  }
>  
> +#define KVM_VGIC_ATTR(reg, typer) \
> +((typer & KVM_DEV_ARM_VGIC_V3_MPIDR_MASK) | (reg))
> +
> +static inline void kvm_gicd_access(GICv3State *s, int offset,
> +   uint32_t *val, bool write)
> +{
> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_DIST_REGS,
> +  KVM_VGIC_ATTR(offset, 0),
> +  val, write);
> +}
> +
> +static inline void kvm_gicr_access(GICv3State *s, int offset, int cpu,
> +   uint32_t *val, bool write)
> +{
> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_REDIST_REGS,
> +  KVM_VGIC_ATTR(offset, s->cpu[cpu].gicr_typer),
> +  val, write);
> +}
> +
> +static inline void kvm_gicc_access(GICv3State *s, uint64_t reg, int cpu,
> +   uint64_t *val, bool write)
> +{
> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS,
> +  KVM_VGIC_ATTR(reg, s->cpu[cpu].gicr_typer),
> +  val, write);
There is a mismatch here between the kernel define
(KVM_DEV_ARM_VGIC_CPU_SYSREGS in v10) and the QEMU define. I think we
should rather fix that at kernel API level.
./scripts/update-linux-headers.sh normally leaves the kernel header
untouched.

Thanks

Eric
> +}
> +
> +static inline void kvm_gic_line_level_access(GICv3State *s, int irq, int cpu,
> + uint32_t *val, bool write)
> +{
> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO,
> +  KVM_VGIC_ATTR(irq, s->cpu[cpu].gicr_typer) |
> +  (VGIC_LEVEL_INFO_LINE_LEVEL <<
> +   KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT),
> +  val, write);
> +}
> +
> +/* Loop through each distributor IRQ related register; since bits
> + * corresponding to SPIs and

Re: [Qemu-devel] Strange/wrong behavior with iSCSI Tape devices in QEMU 2.8.0-rc4

2016-12-19 Thread John Snow


On 12/19/2016 04:05 AM, Holger Schranz wrote:
> # Strange/wrong behavior in QEMU 2.8.0-rc4
> 
> After updating from QEMU 2.6.2 to 2.8.0-rc4, the tape devices
> and the corresponding medium changer are no longer available
> in the VM guest system.
> 
> The tape devices and the medium changer are declared in the
> XML file for libvirt. Both the tape drives and the medium changer
> are available via iSCSI.
> 
> A first rough investigation shows that the iSCSI login works,
> but QEMU does not seem to report the devices to the VM guest.
> 
> --
> 
> Best regards
> 
> Holger
> 

Hi, thanks for the report; do you have a QEMU command line that we can
use to help reproduce the problem? It's easiest if we can cut libvirt
out of the loop.

What is the architecture and version of the guest?

Lastly, it looks like this worked in our 2.7 release if I'm reading you
correctly, so this is a change for the 2.8 release as far as you can
tell, right?

--js

> =
> 
> XML declaration:
> 
> .
> .
> .
> 
> 
> 
>   
> 
> 
> 
> 
>name='iqn.2008-09.net.fsc:server.LT260_61003/4'>
>   
>   
>   
> 
> 
> 
>name='iqn.2008-09.net.fsc:server.LT260_61003/5'>
>   
>   
>   
> 
> 
> 
> 
>name='iqn.2008-09.net.fsc:server.LT260_61003/6'>
>   
>   
>   
> 
> 
> 
> 
>name='iqn.2008-09.net.fsc:server.LT60_61005/3'>
>   
>   
>   
> 
> 
> 
>name='iqn.2008-09.net.fsc:server.LT60_61005/4'>
>   
>   
>   
> 
> .
> .
> .
> 
> ===
> 
> 
> Correct behavior/result inside the VM guest with QEMU/KVM 2.6.2 / 2.7.0
> -- 
> (VCSTCS82:A)IUP1:~ # lsscsi -g
> [0:2:0:0]    disk    FTS      PRAID EP420i     4.25  /dev/sda  /dev/sg0
> [3:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0  /dev/sg1
> [8:0:2:1]    tape    HP       Ultrium 5-SCSI   0001  /dev/st3  /dev/sg6
> [8:0:2:2]    tape    HP       Ultrium 5-SCSI   0001  /dev/st2  /dev/sg5
> [8:0:2:3]    mediumx FUJITSU  ETERNUS LT260    6.20  -         /dev/sg4
> [8:0:2:4]    tape    HP       Ultrium 6-SCSI   23AB  /dev/st1  /dev/sg3
> [8:0:2:5]    tape    HP       Ultrium 6-SCSI   23AB  /dev/st0  /dev/sg2
> [11:0:0:0]   disk    FUJITSU  ETERNUS_DXL            /dev/sdb  /dev/sg7
> [11:0:0:2]   disk    FUJITSU  ETERNUS_DXL            /dev/sdh  /dev/sg13
> [11:0:0:3]   disk    FUJITSU  ETERNUS_DXL            /dev/sdg  /dev/sg12
> [11:0:0:4]   disk    FUJITSU  ETERNUS_DXL            /dev/sdf  /dev/sg11
> [11:0:0:5]   disk    FUJITSU  ETERNUS_DXL            /dev/sde  /dev/sg10
> [11:0:0:6]   disk    FUJITSU  ETERNUS_DXL            /dev/sdd  /dev/sg9
> [11:0:0:7]   disk    FUJITSU  ETERNUS_DXL            /dev/sdc  /dev/sg8
> [11:0:0:15]  disk    FUJITSU  ETERNUS_DXL            /dev/sdi  /dev/sg14
> (VCSTCS82:A)IUP1:~ #
> 
> 
> 
> Wrong behavior/result inside the VM guest with QEMU/KVM 2.8.0-rc4
> -- 
> (VCSTCS82:A)IUP1:~ # lsscsi -g
> [0:2:0:0]    disk    FTS      PRAID EP420i     4.25  /dev/sda  /dev/sg0
> [3:0:0:0]    cd/dvd  QEMU     QEMU DVD-ROM     2.5+  /dev/sr0  /dev/sg1
> [11:0:1:0]   disk    FUJITSU  ETERNUS_DXL            /dev/sdj  /dev/sg10
> [11:0:1:2]   disk    FUJITSU  ETERNUS_DXL            /dev/sdp  /dev/sg16
> [11:0:1:3]   disk    FUJITSU  ETERNUS_DXL            /dev/sdo  /dev/sg15
> [11:0:1:4]   disk    FUJITSU  ETERNUS_DXL            /dev/sdn  /dev/sg14
> [11:0:1:5]   disk    FUJITSU  ETERNUS_DXL            /dev/sdm  /dev/sg13
> [11:0:1:6]   disk    FUJITSU  ETERNUS_DXL            /dev/sdl  /dev/sg12
> [11:0:1:7]   disk    FUJITSU  ETERNUS_DXL            /dev/sdk  /dev/sg11
> [11:0:1:15]  disk    FUJITSU  ETERNUS_DXL            /dev/sdq  /dev/sg17
> (VCSTCS82:A)IUP1:~ #
> 



Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default

2016-12-19 Thread Andrea Bolognani
On Wed, 2016-12-14 at 20:26 +0200, Marcel Apfelbaum wrote:
> > > > > > Maybe I just don't quite get the relationship between Root
> > > > > > Complexes and Root Buses, but I guess my question is: what
> > > > > > is preventing us from simply doing whatever a
> > > > > > spapr-pci-host-bridge is doing in order to expose a legacy
> > > > > > PCI Root Bus (pci.*) to the guest, and create a new
> > > > > > spapr-pcie-host-bridge that exposes a PCIe Root Bus (pcie.*)
> > > > > > instead?
> > > > > 
> > > > > Hrm, the suggestion of providing both a vanilla-PCI and PCI-E host
> > > > > bridge came up before.  I think one of us spotted a problem with that,
> > > > > but I don't recall what it was now.  I guess one is how libvirt would
> > > > > > map its stupid-fake-domain-numbers to which root bus to use.
> > > 
> > > This would be a weird configuration, I never heard of something like that
> > > on a bare metal machine, but I never worked on pseries, who knows...
> > 
> > Which aspect?  Having multiple independent host bridges is perfectly
> > reasonable - x86 just doesn't do it well for rather stupid historical
> > reasons.
> 
> I agree about the multiple host bridges; it is actually what pxb/pxb-pcie
> devices (kind of) do.
> 
> I was talking about having one PCI PHB and another PHB which is PCI Express.
> I was referring to one system having both PCI and PCIe PHBs.

Sorry this confused you: what I was talking about was just
having both a legacy PCI PHB and a PCI Express PHB available
in QEMU, not using both for the same guest. The user would
pick one or the other and stick with it.

> > Again, one possible option here is to continue to treat pseries as
> > having a vanilla-PCI bus, but with a special flag saying that it's
> > magically able to connect PCI-E devices.
> 
> A PCIe bus supporting PCI devices is strange (QEMU allows it ...),
> but a PCI bus supporting PCIe devices is hard to "swallow".

I agree.

> I would say maybe make it a special case of a PCIe bus with different rules.
> It can derive from the PCIe bus class and override the usual behavior
> with PAPR-specific rules, which happen to be similar to the PCI bus rules.

It's pretty clear by now that libvirt will need to have some
special handling of pseries guests. Having to choose between
"weird legacy PCI" and "weird PCI Express", I think I'd rather
go with the latter :)

-- 
Andrea Bolognani / Red Hat / Virtualization


