date:20171129

[Qemu-devel] [PATCH v18 02/10] radix tree test suite: remove ARRAY_SIZE to avoid redefinition

2017-11-29 Thread Wei Wang

ARRAY_SIZE() has been defined in include/linux/kernel.h, and "make"
complains a warning of redefinition of ARRAY_SIZE() in
testing/radix/linux/kernel.h. So, remove ARRAY_SIZE() from there.

Signed-off-by: Wei Wang 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
---
 tools/testing/radix-tree/linux/kernel.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/radix-tree/linux/kernel.h 
b/tools/testing/radix-tree/linux/kernel.h
index c3bc3f3..426f32f 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -17,6 +17,4 @@
 #define pr_debug printk
 #define pr_cont printk
 
-#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
-
 #endif /* _KERNEL_H */
-- 
2.7.4

[Qemu-devel] [PATCH v18 08/10] mm: support reporting free page blocks

2017-11-29 Thread Wei Wang

This patch adds support to walk through the free page blocks in the
system and report them via a callback function. Some page blocks may
leave the free list after zone->lock is released, so it is the caller's
responsibility to either detect or prevent the use of such pages.

One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are reported as free pages but are written after the report function
returns will be captured by the hypervisor, and they will be added to the
next round of memory transfer.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Acked-by: Michal Hocko 
---
 include/linux/mm.h |  6 
 mm/page_alloc.c| 91 ++
 2 files changed, 97 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ee07314..c1339be 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1924,6 +1924,12 @@ extern void free_area_init_node(int nid, unsigned long * 
zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 
+extern void walk_free_mem_block(void *opaque,
+   int min_order,
+   bool (*report_pfn_range)(void *opaque,
+unsigned long pfn,
+unsigned long num));
+
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
  * into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d4096f4..0f4a197 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4892,6 +4892,97 @@ void show_free_areas(unsigned int filter, nodemask_t 
*nodemask)
show_swap_cache_info();
 }
 
+/*
+ * Walk through a free page list and report the found pfn range via the
+ * callback.
+ *
+ * Return false if the callback requests to stop reporting. Otherwise,
+ * return true.
+ */
+static bool walk_free_page_list(void *opaque,
+   struct zone *zone,
+   int order,
+   enum migratetype mt,
+   bool (*report_pfn_range)(void *,
+unsigned long,
+unsigned long))
+{
+   struct page *page;
+   struct list_head *list;
+   unsigned long pfn, flags;
+   bool ret;
+
+   spin_lock_irqsave(&zone->lock, flags);
+   list = &zone->free_area[order].free_list[mt];
+   list_for_each_entry(page, list, lru) {
+   pfn = page_to_pfn(page);
+   ret = report_pfn_range(opaque, pfn, 1 << order);
+   if (!ret)
+   break;
+   }
+   spin_unlock_irqrestore(&zone->lock, flags);
+
+   return ret;
+}
+
+/**
+ * walk_free_mem_block - Walk through the free page blocks in the system
+ * @opaque: the context passed from the caller
+ * @min_order: the minimum order of free lists to check
+ * @report_pfn_range: the callback to report the pfn range of the free pages
+ *
+ * If the callback returns false, stop iterating the list of free page blocks.
+ * Otherwise, continue to report.
+ *
+ * Please note that there are no locking guarantees for the callback and
+ * that the reported pfn range might be freed or disappear after the
+ * callback returns so the caller has to be very careful how it is used.
+ *
+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.
+ *
+ * There is no guarantee that each free range will be reported only once
+ * during one walk_free_mem_block invocation.
+ *
+ * pfn_to_page on the given range is strongly discouraged and if there is
+ * an absolute need for that make sure to contact MM people to discuss
+ * potential problems.
+ *
+ * The function itself might sleep so it cannot be called from atomic
+ * contexts.
+ *
+ * In general low orders tend to be very volatile and so it makes more
+ * sense to query larger ones first for various optimizations which like
+ * ballooning etc... This will reduce the overhead as well.
+ */
+void walk_free_mem_block(void *opaque,
+int min_order,
+bool (*report_pfn_range)(void *opaque,
+ unsigned long pfn,
+ unsigned long num))
+{
+   s

[Qemu-devel] [PATCH v18 01/10] idr: add #include

2017-11-29 Thread Wei Wang

The  was removed from radix-tree.h by the following commit:
f5bba9d11a256ad2a1c2f8e7fc6aabe6416b7890.

Since that commit, tools/testing/radix-tree/ couldn't pass compilation
due to: tools/testing/radix-tree/idr.c:17: undefined reference to
WARN_ON_ONCE. This patch adds the bug.h header to idr.h to solve the
issue.

Signed-off-by: Wei Wang 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
Cc: Jan Kara 
Cc: Eric Biggers 
Cc: Tejun Heo 
Cc: Masahiro Yamada 
---
 include/linux/idr.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 7c3a365..fa14f83 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct idr {
struct radix_tree_root  idr_rt;
-- 
2.7.4

[Qemu-devel] [PATCH v18 06/10] virtio_ring: add a new API, virtqueue_add_one_desc

2017-11-29 Thread Wei Wang

Current virtqueue_add API implementation is based on the scatterlist
struct, which uses kaddr. This is inadequate to all the use case of
vring. For example:
- Some usages don't use IOMMU, in this case the user can directly pass
  in a physical address in hand, instead of going through the sg
  implementation (e.g. the VIRTIO_BALLOON_F_SG feature)
- Sometimes, a guest physical page may not have a kaddr (e.g. high
  memory) but need to use vring (e.g. the VIRTIO_BALLOON_F_FREE_PAGE_VQ
  feature)

The new API virtqueue_add_one_desc enables the caller to assign a vring
desc with a physical address and len. Also, factor out the common code
with virtqueue_add in vring_set_avail.

Signed-off-by: Wei Wang 
Cc: Michael S. Tsirkin 
---
 drivers/virtio/virtio_ring.c | 94 +++-
 include/linux/virtio.h   |  6 +++
 2 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e..0b87123 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -257,6 +257,79 @@ static struct vring_desc *alloc_indirect(struct virtqueue 
*_vq,
return desc;
 }
 
+static void vring_set_avail(struct virtqueue *_vq,
+   unsigned int i)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   unsigned int avail;
+
+   avail = vq->avail_idx_shadow & (vq->vring.num - 1);
+   vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, i);
+
+   /*
+* Descriptors and available array need to be set before we expose the
+* new available array entries.
+*/
+   virtio_wmb(vq->weak_barriers);
+   vq->avail_idx_shadow++;
+   vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+  vq->avail_idx_shadow);
+   vq->num_added++;
+
+   pr_debug("Added buffer head %i to %p\n", i, vq);
+
+   /*
+* This is very unlikely, but theoretically possible.  Kick
+* just in case.
+*/
+   if (unlikely(vq->num_added == (1 << 16) - 1))
+   virtqueue_kick(_vq);
+}
+
+int virtqueue_add_one_desc(struct virtqueue *_vq,
+  uint64_t addr,
+  uint32_t len,
+  bool in_desc,
+  void *data)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_desc *desc;
+   unsigned int i;
+
+   START_USE(vq);
+   BUG_ON(data == NULL);
+
+   if (unlikely(vq->broken)) {
+   END_USE(vq);
+   return -EIO;
+   }
+
+   if (_vq->num_free < 1) {
+   END_USE(vq);
+   return -ENOSPC;
+   }
+
+   i = vq->free_head;
+   desc = &vq->vring.desc[i];
+   desc->addr = cpu_to_virtio64(_vq->vdev, addr);
+   desc->len = cpu_to_virtio32(_vq->vdev, len);
+   if (in_desc)
+   desc->flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_WRITE);
+   else
+   desc->flags = 0;
+   vq->desc_state[i].data = data;
+   vq->desc_state[i].indir_desc = NULL;
+   vq->free_head = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
+   _vq->num_free--;
+
+   vring_set_avail(_vq, i);
+
+   END_USE(vq);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_one_desc);
+
 static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
@@ -269,7 +342,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
struct vring_desc *desc;
-   unsigned int i, n, avail, descs_used, uninitialized_var(prev), err_idx;
+   unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
int head;
bool indirect;
 
@@ -395,26 +468,9 @@ static inline int virtqueue_add(struct virtqueue *_vq,
else
vq->desc_state[head].indir_desc = ctx;
 
-   /* Put entry in available array (but don't update avail->idx until they
-* do sync). */
-   avail = vq->avail_idx_shadow & (vq->vring.num - 1);
-   vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
-
-   /* Descriptors and available array need to be set before we expose the
-* new available array entries. */
-   virtio_wmb(vq->weak_barriers);
-   vq->avail_idx_shadow++;
-   vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
-   vq->num_added++;
-
-   pr_debug("Added buffer head %i to %p\n", head, vq);
+   vring_set_avail(_vq, head);
END_USE(vq);
 
-   /* This is very unlikely, but theoretically possible.  Kick
-* just in case. */
-   if (unlikely(vq->num_added == (1 << 16) - 1))
-   virtqueue_kick(_vq);
-
return 0;
 
 unmap_release:
diff --git a/include/linux/virtio.h b/include/linu

[Qemu-devel] [PATCH v18 03/10] xbitmap: Introduce xbitmap

2017-11-29 Thread Wei Wang

From: Matthew Wilcox 

The eXtensible Bitmap is a sparse bitmap representation which is
efficient for set bits which tend to cluster.  It supports up to
'unsigned long' worth of bits, and this commit adds the bare bones --
xb_set_bit(), xb_clear_bit() and xb_test_bit().

Signed-off-by: Wei Wang 
---
 include/linux/radix-tree.h   |   2 +
 include/linux/xbitmap.h  |  52 +
 lib/Makefile |   2 +-
 lib/radix-tree.c |  26 -
 lib/xbitmap.c| 179 +++
 tools/testing/radix-tree/Makefile|  12 ++-
 tools/testing/radix-tree/linux/xbitmap.h |   1 +
 tools/testing/radix-tree/main.c  |   4 +
 tools/testing/radix-tree/test.h  |   1 +
 9 files changed, 275 insertions(+), 4 deletions(-)
 create mode 100644 include/linux/xbitmap.h
 create mode 100644 lib/xbitmap.c
 create mode 100644 tools/testing/radix-tree/linux/xbitmap.h

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 23a9c89..fe44f4b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -315,6 +315,8 @@ void radix_tree_iter_delete(struct radix_tree_root *,
struct radix_tree_iter *iter, void __rcu **slot);
 void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
 void *radix_tree_delete(struct radix_tree_root *, unsigned long);
+bool __radix_tree_delete(struct radix_tree_root *r, struct radix_tree_node *n,
+   void **slot);
 void radix_tree_clear_tags(struct radix_tree_root *, struct radix_tree_node *,
   void __rcu **slot);
 unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h
new file mode 100644
index 000..ed75d87
--- /dev/null
+++ b/include/linux/xbitmap.h
@@ -0,0 +1,52 @@
+/*
+ * eXtensible Bitmaps
+ * Copyright (c) 2017 Microsoft Corporation 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * eXtensible Bitmaps provide an unlimited-size sparse bitmap facility.
+ * All bits are initially zero.
+ */
+
+#include 
+
+struct xb {
+   struct radix_tree_root xbrt;
+};
+
+#define XB_INIT {  \
+   .xbrt = RADIX_TREE_INIT(IDR_RT_MARKER | GFP_NOWAIT),\
+}
+#define DEFINE_XB(name)struct xb name = XB_INIT
+
+static inline void xb_init(struct xb *xb)
+{
+   INIT_RADIX_TREE(&xb->xbrt, IDR_RT_MARKER | GFP_NOWAIT);
+}
+
+int xb_set_bit(struct xb *xb, unsigned long bit);
+bool xb_test_bit(const struct xb *xb, unsigned long bit);
+int xb_clear_bit(struct xb *xb, unsigned long bit);
+
+int xb_zero(struct xb *xb, unsigned long start, unsigned long nbits);
+int xb_fill(struct xb *xb, unsigned long start, unsigned long nbits);
+
+static inline bool xb_empty(const struct xb *xb)
+{
+   return radix_tree_empty(&xb->xbrt);
+}
+
+void xb_preload(gfp_t);
+
+static inline void xb_preload_end(void)
+{
+   preempt_enable();
+}
diff --git a/lib/Makefile b/lib/Makefile
index d11c48e..08a8183 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -19,7 +19,7 @@ KCOV_INSTRUMENT_dynamic_debug.o := n
 
 lib-y := ctype.o string.o vsprintf.o cmdline.o \
 rbtree.o radix-tree.o dump_stack.o timerqueue.o\
-idr.o int_sqrt.o extable.o \
+idr.o xbitmap.o int_sqrt.o extable.o \
 sha1.o chacha20.o irq_regs.o argv_split.o \
 flex_proportions.o ratelimit.o show_mem.o \
 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index c8d5556..7000ad6 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -78,6 +78,14 @@ static struct kmem_cache *radix_tree_node_cachep;
 #define IDA_PRELOAD_SIZE   (IDA_MAX_PATH * 2 - 1)
 
 /*
+ * The XB can go up to unsigned long, but also uses a bitmap.
+ */
+#define XB_INDEX_BITS  (BITS_PER_LONG - ilog2(IDA_BITMAP_BITS))
+#define XB_MAX_PATH(DIV_ROUND_UP(XB_INDEX_BITS, \
+   RADIX_TREE_MAP_SHIFT))
+#define XB_PRELOAD_SIZE(XB_MAX_PATH * 2 - 1)
+
+/*
  * Per-cpu pool of preloaded nodes
  */
 struct radix_tree_preload {
@@ -839,6 +847,8 @@ int __radix_tree_create(struct radix_tree_root *root, 
unsigned long index,
offset, 0, 0);
if (!child)
retur

[Qemu-devel] [PATCH v18 07/10] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-11-29 Thread Wei Wang

Add a new feature, VIRTIO_BALLOON_F_SG, which enables the transfer of
balloon (i.e. inflated/deflated) pages using scatter-gather lists to the
host. A scatter-gather list is described by a vring desc.

The implementation of the previous virtio-balloon is not very efficient,
because the balloon pages are transferred to the host by one array each
time. Here is the breakdown of the time in percentage spent on each step
of the balloon inflating process (inflating 7GB of an 8GB idle guest).

1) allocating pages (6.5%)
2) sending PFNs to host (68.3%)
3) address translation (6.1%)
4) madvise (19%)

It takes about 4126ms for the inflating process to complete. The above
profiling shows that the bottlenecks are stage 2) and stage 4).

This patch optimizes step 2) by transferring pages to host in sgs. An sg
describes a chunk of guest physically continuous pages. With this
mechanism, step 4) can also be optimized by doing address translation and
madvise() in chunks rather than page by page.

With this new feature, the above ballooning process takes ~440ms resulting
in an improvement of ~89%.

TODO: optimize stage 1) by allocating/freeing a chunk of pages instead of
a single page each time.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Suggested-by: Michael S. Tsirkin 
Cc: Tetsuo Handa 
---
 drivers/virtio/virtio_balloon.c | 230 +---
 include/uapi/linux/virtio_balloon.h |   1 +
 2 files changed, 212 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7960746..2c21c5a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -32,6 +32,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -79,6 +81,9 @@ struct virtio_balloon {
/* Synchronize access/update to this struct virtio_balloon elements */
struct mutex balloon_lock;
 
+   /* The xbitmap used to record balloon pages */
+   struct xb page_xb;
+
/* The array of pfns we tell the Host about. */
unsigned int num_pfns;
__virtio32 pfns[VIRTIO_BALLOON_ARRAY_PFNS_MAX];
@@ -141,15 +146,121 @@ static void set_page_pfns(struct virtio_balloon *vb,
  page_to_balloon_pfn(page) + i);
 }
 
+static void kick_and_wait(struct virtqueue *vq, wait_queue_head_t wq_head)
+{
+   unsigned int len;
+
+   virtqueue_kick(vq);
+   wait_event(wq_head, virtqueue_get_buf(vq, &len));
+}
+
+static void send_one_desc(struct virtio_balloon *vb,
+ struct virtqueue *vq,
+ uint64_t addr,
+ uint32_t len,
+ bool inbuf,
+ bool batch)
+{
+   int err;
+   unsigned int size;
+
+   /* Detach all the used buffers from the vq */
+   while (virtqueue_get_buf(vq, &size))
+   ;
+
+   err = virtqueue_add_one_desc(vq, addr, len, inbuf, vq);
+   /*
+* This is expected to never fail: there is always at least 1 entry
+* available on the vq, because when the vq is full the worker thread
+* that adds the desc will be put into sleep until at least 1 entry is
+* available to use.
+*/
+   BUG_ON(err);
+
+   /* If batching is requested, we batch till the vq is full */
+   if (!batch || !vq->num_free)
+   kick_and_wait(vq, vb->acked);
+}
+
+/*
+ * Send balloon pages in sgs to host. The balloon pages are recorded in the
+ * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
+ * The page xbitmap is searched for continuous "1" bits, which correspond
+ * to continuous pages, to chunk into sgs.
+ *
+ * @page_xb_start and @page_xb_end form the range of bits in the xbitmap that
+ * need to be searched.
+ */
+static void tell_host_sgs(struct virtio_balloon *vb,
+ struct virtqueue *vq,
+ unsigned long page_xb_start,
+ unsigned long page_xb_end)
+{
+   unsigned long pfn_start, pfn_end;
+   uint64_t addr;
+   uint32_t len, max_len = round_down(UINT_MAX, PAGE_SIZE);
+
+   pfn_start = page_xb_start;
+   while (pfn_start < page_xb_end) {
+   pfn_start = xb_find_next_set_bit(&vb->page_xb, pfn_start,
+page_xb_end);
+   if (pfn_start == page_xb_end + 1)
+   break;
+   pfn_end = xb_find_next_zero_bit(&vb->page_xb,
+   pfn_start + 1,
+   page_xb_end);
+   addr = pfn_start << PAGE_SHIFT;
+   len = (pfn_end - pfn_start) << PAGE_SHIFT;
+   while (len > max_len) {
+   send_one_desc(vb, vq, addr, max_len, true, true);
+   addr += max_len;
+

[Qemu-devel] [PATCH v18 04/10] xbitmap: potential improvement

2017-11-29 Thread Wei Wang

This patch made some changes to the original xbitmap implementation from
the linux-dax tree:

- remove xb_fill() and xb_zero() from xbitmap.h since they are not
  implemented;

- xb_test_bit: changed "ebit > BITS_PER_LONG" to "ebit >= BITS_PER_LONG",
  because bit 64 beyonds the "unsigned long" exceptional entry (0 to 63);

- xb_set_bit: delete the new inserted radix_tree_node when failing to
  get the per cpu ida bitmap, this avoids the kind of memory leak of the
  unused radix tree node left in the tree.

- xb_clear_bit: change it to be a void function, since the original
  implementation reurns nothing than a 0.

- remove the comment above "#define XB_INDEX_BITS", because it causes
  confusion based on the feedbacks from the previous discussion;

- xb_preload: with the original implementation, the CPU that successfully
  do __radix_tree_preload() may get into sleep by kmalloc(), which has a
  risk of getting the caller of xb_preload() scheduled to another CPU
  after waken up, and the new CPU may not have radix_tree_node
  pre-allocated there, this will be a problem when inserting a node to
  the tree later. This patch moves __radix_tree_preload() after kmalloc()
  and returns a boolean to indicate the success or failure. Also, add the
  __must_check annotation to xb_preload for prudence purpose.

Signed-off-by: Wei Wang 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Michael S. Tsirkin 
Cc: Tetsuo Handa 
---
 include/linux/xbitmap.h |  5 +
 lib/radix-tree.c| 27 +--
 lib/xbitmap.c   | 24 +---
 3 files changed, 35 insertions(+), 21 deletions(-)

diff --git a/include/linux/xbitmap.h b/include/linux/xbitmap.h
index ed75d87..b4d8375 100644
--- a/include/linux/xbitmap.h
+++ b/include/linux/xbitmap.h
@@ -36,15 +36,12 @@ int xb_set_bit(struct xb *xb, unsigned long bit);
 bool xb_test_bit(const struct xb *xb, unsigned long bit);
 int xb_clear_bit(struct xb *xb, unsigned long bit);
 
-int xb_zero(struct xb *xb, unsigned long start, unsigned long nbits);
-int xb_fill(struct xb *xb, unsigned long start, unsigned long nbits);
-
 static inline bool xb_empty(const struct xb *xb)
 {
return radix_tree_empty(&xb->xbrt);
 }
 
-void xb_preload(gfp_t);
+bool xb_preload(gfp_t);
 
 static inline void xb_preload_end(void)
 {
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 7000ad6..a039588 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -77,9 +77,6 @@ static struct kmem_cache *radix_tree_node_cachep;
RADIX_TREE_MAP_SHIFT))
 #define IDA_PRELOAD_SIZE   (IDA_MAX_PATH * 2 - 1)
 
-/*
- * The XB can go up to unsigned long, but also uses a bitmap.
- */
 #define XB_INDEX_BITS  (BITS_PER_LONG - ilog2(IDA_BITMAP_BITS))
 #define XB_MAX_PATH(DIV_ROUND_UP(XB_INDEX_BITS, \
RADIX_TREE_MAP_SHIFT))
@@ -2145,17 +2142,35 @@ int ida_pre_get(struct ida *ida, gfp_t gfp)
 }
 EXPORT_SYMBOL(ida_pre_get);
 
-void xb_preload(gfp_t gfp)
+/**
+ *  xb_preload - preload for xb_set_bit()
+ *  @gfp_mask: allocation mask to use for preloading
+ *
+ * Preallocate memory to use for the next call to xb_set_bit(). On success,
+ * return true, with preemption disabled. On error, return false with
+ * preemption not disabled.
+ */
+__must_check bool xb_preload(gfp_t gfp)
 {
-   __radix_tree_preload(gfp, XB_PRELOAD_SIZE);
if (!this_cpu_read(ida_bitmap)) {
struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);
 
if (!bitmap)
-   return;
+   return false;
+   /*
+* The per-CPU variable is updated with preemption enabled.
+* If the calling task is unlucky to be scheduled to another
+* CPU which has no ida_bitmap allocation, it will be detected
+* when setting a bit (i.e. __xb_set_bit()).
+*/
bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
kfree(bitmap);
}
+
+   if (__radix_tree_preload(gfp, XB_PRELOAD_SIZE) < 0)
+   return false;
+
+   return true;
 }
 EXPORT_SYMBOL(xb_preload);
 
diff --git a/lib/xbitmap.c b/lib/xbitmap.c
index 2b547a73..182aa29 100644
--- a/lib/xbitmap.c
+++ b/lib/xbitmap.c
@@ -39,8 +39,10 @@ int xb_set_bit(struct xb *xb, unsigned long bit)
return 0;
}
bitmap = this_cpu_xchg(ida_bitmap, NULL);
-   if (!bitmap)
+   if (!bitmap) {
+   __radix_tree_delete(root, node, slot);
return -EAGAIN;
+   }
memset(bitmap, 0, sizeof(*bitmap));
bitmap->bitmap[0] = tmp >> RADIX_TREE_EXCEPTIONAL_SHIFT;
rcu_assign_pointer(*slot, bitmap);
@@ -54,8 +56,10 @@ int xb_set_bit(struct xb *xb, unsigned long bit)
retur

[Qemu-devel] [PATCH v18 09/10] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2017-11-29 Thread Wei Wang

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free pages by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host may also requests
the guest to stop the reporting in advance by sending the stop cmd id to
the guest via the configuration register.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c | 202 +---
 include/uapi/linux/virtio_balloon.h |   4 +
 2 files changed, 167 insertions(+), 39 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 2c21c5a..035bd3a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -55,7 +55,12 @@ static struct vfsmount *balloon_mnt;
 
 struct virtio_balloon {
struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
 
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -65,6 +70,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
 
+   /* Start to report free pages */
+   bool report_free_page;
+   /* Stores the cmd id given by host to start the free page reporting */
+   uint32_t start_cmd_id;
+   /* Stores STOP_ID as a sign to tell host that the reporting is done */
+   uint32_t stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
 
@@ -159,7 +171,8 @@ static void send_one_desc(struct virtio_balloon *vb,
  uint64_t addr,
  uint32_t len,
  bool inbuf,
- bool batch)
+ bool batch,
+ bool wait)
 {
int err;
unsigned int size;
@@ -178,8 +191,12 @@ static void send_one_desc(struct virtio_balloon *vb,
BUG_ON(err);
 
/* If batching is requested, we batch till the vq is full */
-   if (!batch || !vq->num_free)
-   kick_and_wait(vq, vb->acked);
+   if (!batch || !vq->num_free) {
+   if (wait)
+   kick_and_wait(vq, vb->acked);
+   else
+   virtqueue_kick(vq);
+   }
 }
 
 /*
@@ -212,11 +229,11 @@ static void tell_host_sgs(struct virtio_balloon *vb,
addr = pfn_start << PAGE_SHIFT;
len = (pfn_end - pfn_start) << PAGE_SHIFT;
while (len > max_len) {
-   send_one_desc(vb, vq, addr, max_len, true, true);
+   send_one_desc(vb, vq, addr, max_len, true, true, true);
addr += max_len;
len -= max_len;
}
-   send_one_desc(vb, vq, addr, len, true, true);
+   send_one_desc(vb, vq, addr, len, true, true, true);
pfn_start = pfn_end + 1;
}
 
@@ -401,7 +418,7 @@ static unsigned int leak_balloon_sg_oom(struct 
virtio_balloon *vb)
list_add(&page->lru, &pages);
vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
send_one_desc(vb, vq, virt_to_phys(page_address(page)),
- PAGE_SIZE, true, true);
+ PAGE_SIZE, true, true, true);
release_pages_balloon(vb, &pages);
}
 
@@ -491,17 +508,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
 }
 
-static void virtballoon_changed(struct virtio_device *vdev)
-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(&vb->stop_update_lock, flags);
-   if (!vb->stop_update)
-   queue_work(system_freezable_wq, &vb->update_balloon_size_work);
-   spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
 static inline s64 towards_target(struct virtio_balloon *vb)
 {
s64 target;
@@ -518,6 +524,36 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
 }
 
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+   struct virtio_balloon *vb = vdev->priv;
+

[Qemu-devel] [PATCH v18 10/10] virtio-balloon: don't report free pages when page poisoning is enabled

2017-11-29 Thread Wei Wang

The guest free pages should not be discarded by the live migration thread
when page poisoning is enabled with PAGE_POISONING_NO_SANITY=n, because
skipping the transfer of such poisoned free pages will trigger false
positive when new pages are allocated and checked on the destination.
This patch skips the reporting of free pages in the above case.

Reported-by: Michael S. Tsirkin 
Signed-off-by: Wei Wang 
Cc: Michal Hocko 
---
 drivers/virtio/virtio_balloon.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 035bd3a..6ac4cff 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -652,7 +652,9 @@ static void report_free_page(struct work_struct *work)
/* Start by sending the obtained cmd id to the host with an outbuf */
send_one_desc(vb, vb->free_page_vq, virt_to_phys(&vb->start_cmd_id),
  sizeof(uint32_t), false, true, false);
-   walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
+   if (!(page_poisoning_enabled() &&
+   !IS_ENABLED(CONFIG_PAGE_POISONING_NO_SANITY)))
+   walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
/*
 * End by sending the stop id to the host with an outbuf. Use the
 * non-batching mode here to trigger a kick after adding the stop id.
-- 
2.7.4

Re: [Qemu-devel] [PATCH v17 2/6] radix tree test suite: add tests for xbitmap

2017-11-29 Thread Wei Wang


On 11/07/2017 01:00 AM, Matthew Wilcox wrote:

On Fri, Nov 03, 2017 at 04:13:02PM +0800, Wei Wang wrote:

From: Matthew Wilcox 

Add the following tests for xbitmap:
1) single bit test: single bit set/clear/find;
2) bit range test: set/clear a range of bits and find a 0 or 1 bit in
the range.

Signed-off-by: Wei Wang 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
Cc: Michael S. Tsirkin 
---
  tools/include/linux/bitmap.h|  34 
  tools/include/linux/kernel.h|   2 +
  tools/testing/radix-tree/Makefile   |   7 +-
  tools/testing/radix-tree/linux/kernel.h |   2 -
  tools/testing/radix-tree/main.c |   5 +
  tools/testing/radix-tree/test.h |   1 +
  tools/testing/radix-tree/xbitmap.c  | 278 

Umm.  No.  You've duplicated xbitmap.c into the test framework, so now it can
slowly get out of sync with the one in lib/.  That's not OK.

Put it back the way it was, with the patch I gave you as patch 1/n
(relocating xbitmap.c from tools/testing/radix-tree to lib/).
Then add your enhancements as patch 2/n.  All you should need to
change in your 1/n from
http://git.infradead.org/users/willy/linux-dax.git/commit/727e401bee5ad7d37e0077291d90cc17475c6392
is a bit of Makefile tooling.  Leave the test suite embedded in the file;
that way people might remember to update the test suite when adding
new functionality.



Thanks for you suggestions. Please have a check the v18 patches:
The original implementation is put in patch 4, and the proposed changes 
are separated into patch 5 (we can merge them to patch 4 later if they 
look good to you), and the new APIs are in patch 6.


Best,
Wei

Re: [Qemu-devel] [PATCH v2 0/5] target/m68k: implement 680x0 FPU (part 3)

2017-11-29 Thread Laurent Vivier

Le 29/11/2017 à 15:08, Thomas Huth a écrit :
> On 29.11.2017 14:59, Peter Maydell wrote:
>> On 29 November 2017 at 13:42, Laurent Vivier  wrote:
>>> these functions come from:
>>>
>>> http://previous.alternative-system.com/
>>>
>>> http://svn.code.sf.net/p/previous/code/trunk/src/softfloat/softfloat.c
>>>
>>> This is also a modified softfloat, release 2b
>>> which is BSD license if I'm correct.
>>
>> We can't use softfloat2b code (the part of the license that goes
>> "USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
>> AND ORGANIZATIONS [...] WHO FURTHERMORE EFFECTIVELY INDEMNIFY JOHN
>> HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE INSTITUTE" isn't
>> GPL compatible).
>>
>> We can use softfloat2a code, which doesn't have that indemnity clause.
> 
> Sigh. That's why WinUAE and Hatari immediately switched back to
> softfloat2a (derived from QEMU) after we've identified the problem
> there. Looks like the Previous folks forgot to do that step, too :-(
> 
>>> This code has also been copied to WinUAE (GPL), where softfloat has been
>>> copied from QEMU:
>>> https://github.com/tonioni/WinUAE/blob/master/softfloat/softfloat.cpp
>>
>> Yes, lots of projects used the softfloat2b code without realising
>> it wasn't GPL compatible (including QEMU -- we had a painful job
>> to fix things up and convert to the 2a codebase a while back).
>>
>>> But I think the bad news comes later:
>>>
>>> all the other functions (sin, cos, tan, log, exp, ...) found in previous
>>> (softfloat_fpsp.c) are "derived" from NeXT library FPSP:
>>>
>>> http://svn.code.sf.net/p/previous/code/trunk/src/softfloat/softfloat_fpsp.c
>>>
>>> /*
>>>
>>>  This C source file is an extension to the SoftFloat IEC/IEEE Floating-point
>>>  Arithmetic Package, Release 2a.
>>>
>>>  Written by Andreas Grabher for Previous, NeXT Computer Emulator.
>>>
>>> =*/
>>> ...
>>> /*
>>> | Algorithms for transcendental functions supported by MC68881 and MC68882
>>> | mathematical coprocessors. The functions are derived from FPSP library.
>>> **/
>>>
>>> FPSP library can be found:
>>>
>>> https://ftp.nice.ch/pub/next/developer/hardware/m68k/
>>>
>>> And the assembly source code is not free at all:
>>>
>>> https://ftp.nice.ch/pub/next/developer/hardware/m68k/_libFPSP.1.p2.N.s/l_fpsp.h
>>>
>>>
>>> |   Copyright (C) Motorola, Inc. 1991
>>> |   All Rights Reserved
>>> |
>>> |   THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF MOTOROLA
>>> |   The copyright notice above does not evidence any
>>> |   actual or intended publication of such source code.
>>>
>>>
>>> So I'm wondering what license apply to the C version found in "Previous".
>>
>> Good question. It's clearly not copied code (since the FPSP library is
>> all native m68k assembly), but presumably it's the same algorithms
>> transliterated into C...
> 
> There also seem to be other versions of that library available, e.g.:
> 
> https://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/arch/m68k/fpsp/l_fpsp.h
> 
> Maybe Andreas (now on CC: ) could clarify which version he used / how
> the C sources were developed?

Thank you Thomas,

This seems to be the same code re-licensed to:

MOTOROLA MICROPROCESSOR & MEMORY TECHNOLOGY GROUP
M68000 Hi-Performance Microprocessor Division
M68040 Software Package

M68040 Software Package Copyright (c) 1993, 1994 Motorola Inc.
All rights reserved.

THE SOFTWARE is provided on an "AS IS" basis and without warranty.
To the maximum extent permitted by applicable law,
MOTOROLA DISCLAIMS ALL WARRANTIES WHETHER EXPRESS OR IMPLIED,
INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE and any warranty against infringement with
regard to the SOFTWARE (INCLUDING ANY MODIFIED VERSIONS THEREOF)
and any accompanying written materials.

To the maximum extent permitted by applicable law,
IN NO EVENT SHALL MOTOROLA BE LIABLE FOR ANY DAMAGES WHATSOEVER
(INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR
OTHER PECUNIARY LOSS) ARISING OF THE USE OR INABILITY TO USE THE
SOFTWARE.  Motorola assumes no responsibility for the maintenance
and support of the SOFTWARE.

You are hereby granted a copyright license to use, modify, and
distribute the SOFTWARE so long as this entire notice is retained
without alteration in any modified and/or redistributed versions,
and that such modified versions are clearly identified as such.
No licenses are granted by implication, estoppel or otherwise
under any patents or trademarks of Motorola, Inc.

Laurent

Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation

2017-11-29 Thread Kevin Wolf

Am 29.11.2017 um 14:56 hat Jeff Cody geschrieben:
> On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> > This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> > coroutine double entry or entry-after-completion", 2017-11-21)
> > 
> > This fixed the symptom of a bug rather than the root cause. Canceling the
> > wait on a sleeping blockjob coroutine is generally fine, we just need to
> > make it work correctly across AioContexts.  To do so, use a QEMUTimer
> > that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> > synchronizes correctly with block_job_sleep_ns.
> > 
> > Signed-off-by: Paolo Bonzini 

> My only concerns were regarding comments, and Kevin can fix them when
> applying if he wants.

I'm squashing in the following changes.

Kevin


diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 956f0d6819..00403d9482 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -135,7 +135,10 @@ typedef struct BlockJob {
  */
 int ret;
 
-/** Timer that is used by @block_job_sleep_ns.  */
+/**
+ * Timer that is used by @block_job_sleep_ns. Accessed under
+ * block_job_mutex (in blockjob.c).
+ */
 QEMUTimer sleep_timer;
 
 /** Non-NULL if this job is part of a transaction */
diff --git a/blockjob.c b/blockjob.c
index 83f5c891be..dd8c685d8a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -37,9 +37,9 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
-/* Right now, this mutex is only needed to synchronize accesses to job->busy,
- * such as concurrent calls to block_job_do_yield and block_job_enter.
- */
+/* Right now, this mutex is only needed to synchronize accesses to job->busy
+ * and and job->sleep_timer, such as concurrent calls to block_job_do_yield and
+ * block_job_enter. */
 static QemuMutex block_job_mutex;
 
 static void block_job_lock(void)
@@ -760,6 +760,12 @@ static bool block_job_should_pause(BlockJob *job)
 return job->pause_count > 0;
 }
 
+/* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
+ * Reentering the job coroutine with block_job_enter() before the timer has
+ * expired is allowed and cancels the timer.
+ *
+ * If @ns is (uint64_t) -1, no timer is scheduled and block_job_enter() must be
+ * called explicitly. */
 static void block_job_do_yield(BlockJob *job, uint64_t ns)
 {
 block_job_lock();

Re: [Qemu-devel] [PATCH 4/4] blockjob: reimplement block_job_sleep_ns to allow cancellation

2017-11-29 Thread Jeff Cody

On Wed, Nov 29, 2017 at 03:21:37PM +0100, Kevin Wolf wrote:
> Am 29.11.2017 um 14:56 hat Jeff Cody geschrieben:
> > On Wed, Nov 29, 2017 at 11:25:13AM +0100, Paolo Bonzini wrote:
> > > This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
> > > coroutine double entry or entry-after-completion", 2017-11-21)
> > > 
> > > This fixed the symptom of a bug rather than the root cause. Canceling the
> > > wait on a sleeping blockjob coroutine is generally fine, we just need to
> > > make it work correctly across AioContexts.  To do so, use a QEMUTimer
> > > that calls block_job_enter.  Use a mutex to ensure that block_job_enter
> > > synchronizes correctly with block_job_sleep_ns.
> > > 
> > > Signed-off-by: Paolo Bonzini 
> 
> > My only concerns were regarding comments, and Kevin can fix them when
> > applying if he wants.
> 
> I'm squashing in the following changes.
> 
> Kevin
> 
> 
> diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> index 956f0d6819..00403d9482 100644
> --- a/include/block/blockjob.h
> +++ b/include/block/blockjob.h
> @@ -135,7 +135,10 @@ typedef struct BlockJob {
>   */
>  int ret;
>  
> -/** Timer that is used by @block_job_sleep_ns.  */
> +/**
> + * Timer that is used by @block_job_sleep_ns. Accessed under
> + * block_job_mutex (in blockjob.c).
> + */
>  QEMUTimer sleep_timer;
>  
>  /** Non-NULL if this job is part of a transaction */
> diff --git a/blockjob.c b/blockjob.c
> index 83f5c891be..dd8c685d8a 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -37,9 +37,9 @@
>  #include "qemu/timer.h"
>  #include "qapi-event.h"
>  
> -/* Right now, this mutex is only needed to synchronize accesses to job->busy,
> - * such as concurrent calls to block_job_do_yield and block_job_enter.
> - */
> +/* Right now, this mutex is only needed to synchronize accesses to job->busy
> + * and and job->sleep_timer, such as concurrent calls to block_job_do_yield 
> and

s/and and/and

> + * block_job_enter. */
>  static QemuMutex block_job_mutex;
>  
>  static void block_job_lock(void)
> @@ -760,6 +760,12 @@ static bool block_job_should_pause(BlockJob *job)
>  return job->pause_count > 0;
>  }
>  
> +/* Yield, and schedule a timer to reenter the coroutine after @ns 
> nanoseconds.
> + * Reentering the job coroutine with block_job_enter() before the timer has
> + * expired is allowed and cancels the timer.
> + *
> + * If @ns is (uint64_t) -1, no timer is scheduled and block_job_enter() must 
> be
> + * called explicitly. */
>  static void block_job_do_yield(BlockJob *job, uint64_t ns)
>  {
>  block_job_lock();

Re: [Qemu-devel] [PATCH v6 0/4] vITS Reset

2017-11-29 Thread Peter Maydell

On 28 November 2017 at 15:41, Eric Auger  wrote:
> At the moment the ITS is not properly reset. On System reset or
> reboot, previous ITS register values and caches are left
> unchanged. Some of the registers might point to some guest RAM
> tables which are not valid anymore. This leads to state
> inconsistencies that are detected by the kernel save/restore
> code. And eventually this may cause qemu abort.
>
> This series adds vITS reset modality:
> - the 2 first patches bring a minimalist reset through individual
>   register writes. However, with kernel versions < 4.15, this reset
>   is not complete (vITS caches are not voided).
> - With kernel versions >= 4.15 we can rely on a new ITS KVM device
>   reset IOTCL. The last 2 patches introduce the full reset.
>
> Tested with 4.11, 4.14 and 4.15 host kernels.

Applied v6 to target-arm.next for 2.12; thanks.

-- PMM

Re: [Qemu-devel] [for-2.12 4/7] pci: Simplify pci_bus_is_root()

2017-11-29 Thread Marcel Apfelbaum


On 29/11/2017 15:12, David Gibson wrote:

On Wed, Nov 29, 2017 at 12:45:28PM +0200, Marcel Apfelbaum wrote:

On 29/11/2017 10:46, David Gibson wrote:

pci_bus_is_root() currently relies on a method in the PCIBusClass.
But it's always known if a PCI bus is a root bus when we create it, so
using a dynamic method is overkill.

This replaces it with an IS_ROOT bit in a new flags field, which is set on
root buses and otherwise clear.  As a bonus this removes the special
is_root logic from pci_expander_bridge, since it already creates its bus
as a root bus.



I agree it makes the code simpler, I am not sure about
the enum. Why not make it a boolean field and when will need
more flags just convert it to an enum (or not)?


Well, I have some patches I'm working on (rather slowly) which add a
second flag.



OK

Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel



Thanks,
Marcel


Signed-off-by: David Gibson 
---
   hw/pci-bridge/pci_expander_bridge.c |  6 --
   hw/pci/pci.c| 14 ++
   include/hw/pci/pci.h| 12 +++-
   3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/hw/pci-bridge/pci_expander_bridge.c 
b/hw/pci-bridge/pci_expander_bridge.c
index 5652cf06e9..11dfa9258e 100644
--- a/hw/pci-bridge/pci_expander_bridge.c
+++ b/hw/pci-bridge/pci_expander_bridge.c
@@ -65,11 +65,6 @@ static int pxb_bus_num(PCIBus *bus)
   return pxb->bus_nr;
   }
-static bool pxb_is_root(PCIBus *bus)
-{
-return true; /* by definition */
-}
-
   static uint16_t pxb_bus_numa_node(PCIBus *bus)
   {
   PXBDev *pxb = convert_to_pxb(bus->parent_dev);
@@ -82,7 +77,6 @@ static void pxb_bus_class_init(ObjectClass *class, void *data)
   PCIBusClass *pbc = PCI_BUS_CLASS(class);
   pbc->bus_num = pxb_bus_num;
-pbc->is_root = pxb_is_root;
   pbc->numa_node = pxb_bus_numa_node;
   }
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 6e11dc2fec..5fab7f23b3 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -126,14 +126,9 @@ static void pci_bus_unrealize(BusState *qbus, Error **errp)
   vmstate_unregister(NULL, &vmstate_pcibus, bus);
   }
-static bool pcibus_is_root(PCIBus *bus)
-{
-return !bus->parent_dev;
-}
-
   static int pcibus_num(PCIBus *bus)
   {
-if (pcibus_is_root(bus)) {
+if (pci_bus_is_root(bus)) {
   return 0; /* pci host bridge */
   }
   return bus->parent_dev->config[PCI_SECONDARY_BUS];
@@ -156,7 +151,6 @@ static void pci_bus_class_init(ObjectClass *klass, void 
*data)
   k->unrealize = pci_bus_unrealize;
   k->reset = pcibus_reset;
-pbc->is_root = pcibus_is_root;
   pbc->bus_num = pcibus_num;
   pbc->numa_node = pcibus_numa_node;
   }
@@ -385,6 +379,7 @@ static void pci_root_bus_init(PCIBus *bus, DeviceState 
*parent,
   bus->slot_reserved_mask = 0x0;
   bus->address_space_mem = address_space_mem;
   bus->address_space_io = address_space_io;
+bus->flags |= PCI_BUS_IS_ROOT;
   /* host bridge */
   QLIST_INIT(&bus->child);
@@ -397,11 +392,6 @@ bool pci_bus_is_express(PCIBus *bus)
   return object_dynamic_cast(OBJECT(bus), TYPE_PCIE_BUS);
   }
-bool pci_bus_is_root(PCIBus *bus)
-{
-return PCI_BUS_GET_CLASS(bus)->is_root(bus);
-}
-
   void pci_root_bus_new_inplace(PCIBus *bus, size_t bus_size, DeviceState 
*parent,
 const char *name,
 MemoryRegion *address_space_mem,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 77d92a3dc4..cbb3386207 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -404,6 +404,11 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, 
int);
   #define PCI_BUS_GET_CLASS(obj) OBJECT_GET_CLASS(PCIBusClass, (obj), 
TYPE_PCI_BUS)
   #define TYPE_PCIE_BUS "PCIE"
+enum PCIBusFlags {
+/* This bus is the root of a PCI domain */
+PCI_BUS_IS_ROOT = 0x0001,
+};
+
   typedef struct PCIBusClass {
   /*< private >*/
   BusClass parent_class;
@@ -416,6 +421,7 @@ typedef struct PCIBusClass {
   struct PCIBus {
   BusState qbus;
+enum PCIBusFlags flags;
   PCIIOMMUFunc iommu_fn;
   void *iommu_opaque;
   uint8_t devfn_min;
@@ -440,8 +446,12 @@ struct PCIBus {
   Notifier machine_done;
   };
+static inline bool pci_bus_is_root(PCIBus *bus)
+{
+return !!(bus->flags & PCI_BUS_IS_ROOT);
+}
+
   bool pci_bus_is_express(PCIBus *bus);
-bool pci_bus_is_root(PCIBus *bus);
   void pci_root_bus_new_inplace(PCIBus *bus, size_t bus_size, DeviceState 
*parent,
 const char *name,
 MemoryRegion *address_space_mem,

Re: [Qemu-devel] [for-2.12 6/7] pci: Eliminate redundant PCIDevice::bus pointer

2017-11-29 Thread Marcel Apfelbaum


On 29/11/2017 13:41, Eduardo Habkost wrote:

On Wed, Nov 29, 2017 at 12:54:04PM +0200, Marcel Apfelbaum wrote:

On 29/11/2017 10:46, David Gibson wrote:

The bus pointer in PCIDevice is basically redundant with QOM information.
It's always initialized to the qdev_get_parent_bus(), the only difference
is the type.

Therefore this patch eliminates the field, instead creating a pci_get_bus()
helper to do the type mangling to derive it conveniently from the QOM
Device object underneath.




Hi David,

I personally don't see why the caching of the bus bothers you.
It makes the code a little easier to read, but indeed is a duplication
of data; let's see what Michael has to say, is a matter of
taste I suppose.


I'm all for removing duplication of data because it makes the
code less fragile.  We don't even take care of clearing
pci_dev->bus when the device is removed from the bus, so there's
a window between unplugging a device and actually freeing the
object where pci_dev->bus can become a dangling pointer.



This is a good point.

Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel


Reviewed-by: Eduardo Habkost

Re: [Qemu-devel] [qemu-s390x] [PATCH v1 for-2.12 00/15] s390x/tcg: cleanup and fix program interrupts

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 15:06:33 +0100
David Hildenbrand  wrote:

> On 29.11.2017 14:51, Thomas Huth wrote:
> > On 28.11.2017 21:33, David Hildenbrand wrote:  
> >> I have quite some patches on my queue for 2.12. (booting Fedora 26/27
> >> guests, floating interrupts, machine checks, missing instructions ...)
> >>
> >> So let's start slowly :) This series gets rid of program_interrupt() and
> >> potential_page_fault(). We now always properly restore the cpu state when
> >> injecting/delivering a program interrupt. So there is no need to update
> >> the state via potential_page_fault() anymore.  
> > 
> > I think this series is basically a very good idea! But...
> > OK, this is kind of bike-shed-painting now, but since we're currently in
> > hard freeze anyway and got plenty of time for discussion:
> > Something that bothers me a little bit is the name of the new function
> > "program_interrupt_ra()" ... that would IMHO be OK if the old function
> > "program_interrupt" would still stay, but since that is removed and the
> > _ra function is the only generic way that is left to inject a program
> > interrupt, could we maybe name the new function somewhat nicer right
> > from the start? Something like "s390_program_interrupt" maybe (which is
> > similar to tcg_s390_program_interrupt and kvm_s390_program_interrupt
> > that we have already)?  
> 
> Sure I can do that, other opinions?

Fine with me as well.

[Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end

2017-11-29 Thread Fam Zheng

While we look at the fixes for 2.11, I briefly prototyped this series to see if
it makes sense as a simplification of the drain API for 2.12.

The idea is to let AioContext manage quiesce callbacks, then the block layer
only needs to do the in-flight request waiting. This lets us get rid of the
callback recursion (both up and down).

Many commit logs and code comments are definitely missing, but it would be good
to get high level review and maybe some testing already.

Fam

Fam Zheng (9):
  block: Remove unused bdrv_requests_pending
  aio: Add drain begin/end API to AioContext
  blockjob: Implement AioContext drain ops
  throttle: Implement AioContext drain ops
  qed: Implement AioContext drain ops
  block: Use aio_context_drained_begin in bdrv_set_aio_context
  block: Switch to use AIO drained begin/end API
  block: Drop old drained_{begin,end} callbacks
  blockjob: Drop unused functions

 block.c|  30 +
 block/block-backend.c  |  22 ---
 block/io.c | 134 +++--
 block/qed.c|  34 ---
 block/throttle.c   |  34 ---
 blockjob.c |  67 -
 include/block/aio.h|  27 -
 include/block/block.h  |  16 -
 include/block/block_int.h  |  12 
 include/block/blockjob_int.h   |  14 -
 include/sysemu/block-backend.h |   8 ---
 util/async.c   |  73 ++
 12 files changed, 186 insertions(+), 285 deletions(-)

-- 
2.14.3

[Qemu-devel] [PATCH RFC 2/9] aio: Add drain begin/end API to AioContext

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 include/block/aio.h | 27 +---
 util/async.c| 73 +
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index e9aeeaec94..40c2f64544 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -47,6 +47,15 @@ typedef void QEMUBHFunc(void *opaque);
 typedef bool AioPollFn(void *opaque);
 typedef void IOHandler(void *opaque);
 
+typedef void AioDrainFn(void *opaque);
+typedef struct AioDrainOps {
+AioDrainFn *drained_begin;
+AioDrainFn *drained_end;
+void *opaque;
+bool is_new;
+QTAILQ_ENTRY(AioDrainOps) next;
+} AioDrainOps;
+
 struct Coroutine;
 struct ThreadPool;
 struct LinuxAioState;
@@ -147,6 +156,9 @@ struct AioContext {
 int epollfd;
 bool epoll_enabled;
 bool epoll_available;
+
+QTAILQ_HEAD(, AioDrainOps) drain_ops;
+bool drain_ops_updated;
 };
 
 /**
@@ -441,9 +453,9 @@ int64_t aio_compute_timeout(AioContext *ctx);
  *
  * Disable the further processing of external clients.
  */
-static inline void aio_disable_external(AioContext *ctx)
+static inline bool aio_disable_external(AioContext *ctx)
 {
-atomic_inc(&ctx->external_disable_cnt);
+return atomic_fetch_inc(&ctx->external_disable_cnt) == 0;
 }
 
 /**
@@ -452,7 +464,7 @@ static inline void aio_disable_external(AioContext *ctx)
  *
  * Enable the processing of external clients.
  */
-static inline void aio_enable_external(AioContext *ctx)
+static inline bool aio_enable_external(AioContext *ctx)
 {
 int old;
 
@@ -462,6 +474,7 @@ static inline void aio_enable_external(AioContext *ctx)
 /* Kick event loop so it re-arms file descriptors */
 aio_notify(ctx);
 }
+return old == 1;
 }
 
 /**
@@ -564,4 +577,12 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t 
max_ns,
  int64_t grow, int64_t shrink,
  Error **errp);
 
+void aio_context_drained_begin(AioContext *ctx);
+void aio_context_drained_end(AioContext *ctx);
+
+void aio_context_add_drain_ops(AioContext *ctx,
+   AioDrainFn *begin, AioDrainFn *end, void 
*opaque);
+void aio_context_del_drain_ops(AioContext *ctx,
+   AioDrainFn *begin, AioDrainFn *end, void 
*opaque);
+
 #endif
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e..cca0efd263 100644
--- a/util/async.c
+++ b/util/async.c
@@ -402,6 +402,7 @@ AioContext *aio_context_new(Error **errp)
 AioContext *ctx;
 
 ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
+QTAILQ_INIT(&ctx->drain_ops);
 aio_context_setup(ctx);
 
 ret = event_notifier_init(&ctx->notifier, false);
@@ -506,3 +507,75 @@ void aio_context_release(AioContext *ctx)
 {
 qemu_rec_mutex_unlock(&ctx->lock);
 }
+
+/* Called with ctx->lock */
+void aio_context_drained_begin(AioContext *ctx)
+{
+AioDrainOps *ops;
+
+/* TODO: When all external fds are handled in the following drain_ops
+ * callbacks, aio_disable_external can be dropped. */
+aio_disable_external(ctx);
+restart:
+ctx->drain_ops_updated = false;
+QTAILQ_FOREACH(ops, &ctx->drain_ops, next) {
+ops->drained_begin(ops->opaque);
+if (ctx->drain_ops_updated) {
+goto restart;
+}
+}
+}
+
+/* Called with ctx->lock */
+void aio_context_drained_end(AioContext *ctx)
+{
+AioDrainOps *ops;
+
+restart:
+ctx->drain_ops_updated = false;
+QTAILQ_FOREACH(ops, &ctx->drain_ops, next) {
+if (ops->is_new) {
+continue;
+}
+ops->drained_end(ops->opaque);
+if (ctx->drain_ops_updated) {
+goto restart;
+}
+}
+if (aio_enable_external(ctx)) {
+QTAILQ_FOREACH(ops, &ctx->drain_ops, next) {
+ops->is_new = false;
+}
+}
+}
+
+/* Called with ctx->lock */
+void aio_context_add_drain_ops(AioContext *ctx,
+   AioDrainFn *begin, AioDrainFn *end, void 
*opaque)
+{
+AioDrainOps *ops = g_new0(AioDrainOps, 1);
+ops->drained_begin = begin;
+ops->drained_end = end;
+ops->opaque = opaque;
+ops->is_new = true;
+QTAILQ_INSERT_TAIL(&ctx->drain_ops, ops, next);
+ctx->drain_ops_updated = true;
+}
+
+/* Called with ctx->lock */
+void aio_context_del_drain_ops(AioContext *ctx,
+   AioDrainFn *begin, AioDrainFn *end, void 
*opaque)
+{
+AioDrainOps *ops;
+
+QTAILQ_FOREACH(ops, &ctx->drain_ops, next) {
+if (ops->drained_begin == begin &&
+ops->drained_end == end &&
+ops->opaque == opaque) {
+QTAILQ_REMOVE(&ctx->drain_ops, ops, next);
+ctx->drain_ops_updated = true;
+g_free(ops);
+return;
+}
+}
+}
-- 
2.14.3

[Qemu-devel] [PATCH RFC 1/9] block: Remove unused bdrv_requests_pending

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block/io.c| 18 --
 include/block/block_int.h |  1 -
 2 files changed, 19 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4fdf93a014..7f07972489 100644
--- a/block/io.c
+++ b/block/io.c
@@ -134,24 +134,6 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
 assert(old >= 1);
 }
 
-/* Check if any requests are in-flight (including throttled requests) */
-bool bdrv_requests_pending(BlockDriverState *bs)
-{
-BdrvChild *child;
-
-if (atomic_read(&bs->in_flight)) {
-return true;
-}
-
-QLIST_FOREACH(child, &bs->children, next) {
-if (bdrv_requests_pending(child->bs)) {
-return true;
-}
-}
-
-return false;
-}
-
 typedef struct {
 Coroutine *co;
 BlockDriverState *bs;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a5482775ec..e107163594 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1045,7 +1045,6 @@ bool blk_dev_is_tray_open(BlockBackend *blk);
 bool blk_dev_is_medium_locked(BlockBackend *blk);
 
 void bdrv_set_dirty(BlockDriverState *bs, int64_t offset, int64_t bytes);
-bool bdrv_requests_pending(BlockDriverState *bs);
 
 void bdrv_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap **out);
 void bdrv_undo_clear_dirty_bitmap(BdrvDirtyBitmap *bitmap, HBitmap *in);
-- 
2.14.3

[Qemu-devel] [PATCH RFC 4/9] throttle: Implement AioContext drain ops

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block/throttle.c | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/block/throttle.c b/block/throttle.c
index 833175ac77..35b740e3de 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -70,6 +70,25 @@ fin:
 return ret;
 }
 
+static void throttle_drained_begin(void *opaque)
+{
+BlockDriverState *bs = opaque;
+ThrottleGroupMember *tgm = bs->opaque;
+
+if (atomic_fetch_inc(&tgm->io_limits_disabled) == 0) {
+throttle_group_restart_tgm(tgm);
+}
+}
+
+static void throttle_drained_end(void *opaque)
+{
+BlockDriverState *bs = opaque;
+ThrottleGroupMember *tgm = bs->opaque;
+
+assert(tgm->io_limits_disabled);
+atomic_dec(&tgm->io_limits_disabled);
+}
+
 static int throttle_open(BlockDriverState *bs, QDict *options,
  int flags, Error **errp)
 {
@@ -146,6 +165,9 @@ static int throttle_co_flush(BlockDriverState *bs)
 static void throttle_detach_aio_context(BlockDriverState *bs)
 {
 ThrottleGroupMember *tgm = bs->opaque;
+aio_context_del_drain_ops(bdrv_get_aio_context(bs),
+  throttle_drained_begin, throttle_drained_end,
+  bs);
 throttle_group_detach_aio_context(tgm);
 }
 
@@ -153,6 +175,9 @@ static void throttle_attach_aio_context(BlockDriverState 
*bs,
 AioContext *new_context)
 {
 ThrottleGroupMember *tgm = bs->opaque;
+aio_context_add_drain_ops(new_context,
+  throttle_drained_begin, throttle_drained_end,
+  bs);
 throttle_group_attach_aio_context(tgm, new_context);
 }
 
@@ -199,17 +224,12 @@ static bool 
throttle_recurse_is_first_non_filter(BlockDriverState *bs,
 
 static void coroutine_fn throttle_co_drain_begin(BlockDriverState *bs)
 {
-ThrottleGroupMember *tgm = bs->opaque;
-if (atomic_fetch_inc(&tgm->io_limits_disabled) == 0) {
-throttle_group_restart_tgm(tgm);
-}
+throttle_drained_begin(bs);
 }
 
 static void coroutine_fn throttle_co_drain_end(BlockDriverState *bs)
 {
-ThrottleGroupMember *tgm = bs->opaque;
-assert(tgm->io_limits_disabled);
-atomic_dec(&tgm->io_limits_disabled);
+throttle_drained_end(bs);
 }
 
 static BlockDriver bdrv_throttle = {
-- 
2.14.3

[Qemu-devel] [PATCH RFC 3/9] blockjob: Implement AioContext drain ops

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 blockjob.c | 47 ++-
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index ff9a614531..86d060c89c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -148,6 +148,23 @@ static void block_job_attached_aio_context(AioContext 
*new_context,
void *opaque);
 static void block_job_detach_aio_context(void *opaque);
 
+static void block_job_drained_begin(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_pause(job);
+}
+
+static void block_job_drained_end(void *opaque)
+{
+BlockJob *job = opaque;
+block_job_resume(job);
+}
+
+static const BlockDevOps block_job_dev_ops = {
+.drained_begin = block_job_drained_begin,
+.drained_end = block_job_drained_end,
+};
+
 void block_job_unref(BlockJob *job)
 {
 if (--job->refcnt == 0) {
@@ -157,6 +174,10 @@ void block_job_unref(BlockJob *job)
 blk_remove_aio_context_notifier(job->blk,
 block_job_attached_aio_context,
 block_job_detach_aio_context, job);
+aio_context_del_drain_ops(blk_get_aio_context(job->blk),
+  block_job_drained_begin,
+  block_job_drained_end,
+  job);
 blk_unref(job->blk);
 error_free(job->blocker);
 g_free(job->id);
@@ -170,6 +191,9 @@ static void block_job_attached_aio_context(AioContext 
*new_context,
 {
 BlockJob *job = opaque;
 
+aio_context_add_drain_ops(blk_get_aio_context(job->blk),
+  block_job_drained_begin, block_job_drained_end,
+  job);
 if (job->driver->attached_aio_context) {
 job->driver->attached_aio_context(job, new_context);
 }
@@ -192,6 +216,9 @@ static void block_job_detach_aio_context(void *opaque)
 {
 BlockJob *job = opaque;
 
+aio_context_del_drain_ops(blk_get_aio_context(job->blk),
+  block_job_drained_begin, block_job_drained_end,
+  job);
 /* In case the job terminates during aio_poll()... */
 block_job_ref(job);
 
@@ -217,23 +244,6 @@ static const BdrvChildRole child_job = {
 .stay_at_node   = true,
 };
 
-static void block_job_drained_begin(void *opaque)
-{
-BlockJob *job = opaque;
-block_job_pause(job);
-}
-
-static void block_job_drained_end(void *opaque)
-{
-BlockJob *job = opaque;
-block_job_resume(job);
-}
-
-static const BlockDevOps block_job_dev_ops = {
-.drained_begin = block_job_drained_begin,
-.drained_end = block_job_drained_end,
-};
-
 void block_job_remove_all_bdrv(BlockJob *job)
 {
 GSList *l;
@@ -671,6 +681,9 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 bs->job = job;
 
 blk_set_dev_ops(blk, &block_job_dev_ops, job);
+aio_context_add_drain_ops(blk_get_aio_context(blk),
+  block_job_drained_begin, block_job_drained_end,
+  job);
 bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
 QLIST_INSERT_HEAD(&block_jobs, job, job_list);
-- 
2.14.3

[Qemu-devel] [PATCH RFC 9/9] blockjob: Drop unused functions

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 blockjob.c   | 24 
 include/block/blockjob_int.h | 14 --
 2 files changed, 38 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 809111bf24..bfeb7c4ace 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -699,18 +699,6 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 return job;
 }
 
-void block_job_pause_all(void)
-{
-BlockJob *job = NULL;
-while ((job = block_job_next(job))) {
-AioContext *aio_context = blk_get_aio_context(job->blk);
-
-aio_context_acquire(aio_context);
-block_job_pause(job);
-aio_context_release(aio_context);
-}
-}
-
 void block_job_early_fail(BlockJob *job)
 {
 block_job_unref(job);
@@ -764,18 +752,6 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 }
 }
 
-void block_job_resume_all(void)
-{
-BlockJob *job = NULL;
-while ((job = block_job_next(job))) {
-AioContext *aio_context = blk_get_aio_context(job->blk);
-
-aio_context_acquire(aio_context);
-block_job_resume(job);
-aio_context_release(aio_context);
-}
-}
-
 void block_job_enter(BlockJob *job)
 {
 if (!block_job_started(job)) {
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 43f3be2965..4b43367608 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -156,20 +156,6 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, 
int64_t ns);
  */
 void block_job_yield(BlockJob *job);
 
-/**
- * block_job_pause_all:
- *
- * Asynchronously pause all jobs.
- */
-void block_job_pause_all(void);
-
-/**
- * block_job_resume_all:
- *
- * Resume all block jobs.  Must be paired with a preceding block_job_pause_all.
- */
-void block_job_resume_all(void);
-
 /**
  * block_job_early_fail:
  * @bs: The block device.
-- 
2.14.3

[Qemu-devel] [PATCH RFC 6/9] block: Use aio_context_drained_begin in bdrv_set_aio_context

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index 9a1a0d1e73..949f0dec11 100644
--- a/block.c
+++ b/block.c
@@ -4745,8 +4745,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, 
AioContext *new_context)
 {
 AioContext *ctx = bdrv_get_aio_context(bs);
 
-aio_disable_external(ctx);
-bdrv_parent_drained_begin(bs);
+aio_context_drained_begin(ctx);
 bdrv_drain(bs); /* ensure there are no in-flight requests */
 
 while (aio_poll(ctx, false)) {
@@ -4760,8 +4759,7 @@ void bdrv_set_aio_context(BlockDriverState *bs, 
AioContext *new_context)
  */
 aio_context_acquire(new_context);
 bdrv_attach_aio_context(bs, new_context);
-bdrv_parent_drained_end(bs);
-aio_enable_external(ctx);
+aio_context_drained_end(ctx);
 aio_context_release(new_context);
 }
 
-- 
2.14.3

[Qemu-devel] [PATCH RFC 7/9] block: Switch to use AIO drained begin/end API

2017-11-29 Thread Fam Zheng

Instead of the recursion of the "disable/enable external requests"
operations on the graph, we switch to AioContext's API to disable/enable
on the whole AioContext altogether. Strictly it is be a bit more than
necessary, but as all drained sections are short, it is not a big
problem.

Drained end can just get away with that.  The other half of drained
begin is to wait for requests, which we can do with BDRV_POLL_WHILE() in
a loop.

Signed-off-by: Fam Zheng 
---
 block/io.c | 116 ++---
 1 file changed, 10 insertions(+), 106 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7f07972489..914037b21a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -40,28 +40,6 @@
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 int64_t offset, int bytes, BdrvRequestFlags flags);
 
-void bdrv_parent_drained_begin(BlockDriverState *bs)
-{
-BdrvChild *c;
-
-QLIST_FOREACH(c, &bs->parents, next_parent) {
-if (c->role->drained_begin) {
-c->role->drained_begin(c);
-}
-}
-}
-
-void bdrv_parent_drained_end(BlockDriverState *bs)
-{
-BdrvChild *c;
-
-QLIST_FOREACH(c, &bs->parents, next_parent) {
-if (c->role->drained_end) {
-c->role->drained_end(c);
-}
-}
-}
-
 static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
 {
 dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
@@ -141,71 +119,6 @@ typedef struct {
 bool begin;
 } BdrvCoDrainData;
 
-static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
-{
-BdrvCoDrainData *data = opaque;
-BlockDriverState *bs = data->bs;
-
-if (data->begin) {
-bs->drv->bdrv_co_drain_begin(bs);
-} else {
-bs->drv->bdrv_co_drain_end(bs);
-}
-
-/* Set data->done before reading bs->wakeup.  */
-atomic_mb_set(&data->done, true);
-bdrv_wakeup(bs);
-}
-
-static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
-{
-BdrvCoDrainData data = { .bs = bs, .done = false, .begin = begin};
-
-if (!bs->drv || (begin && !bs->drv->bdrv_co_drain_begin) ||
-(!begin && !bs->drv->bdrv_co_drain_end)) {
-return;
-}
-
-data.co = qemu_coroutine_create(bdrv_drain_invoke_entry, &data);
-bdrv_coroutine_enter(bs, data.co);
-BDRV_POLL_WHILE(bs, !data.done);
-}
-
-static bool bdrv_drain_recurse(BlockDriverState *bs, bool begin)
-{
-BdrvChild *child, *tmp;
-bool waited;
-
-/* Ensure any pending metadata writes are submitted to bs->file.  */
-bdrv_drain_invoke(bs, begin);
-
-/* Wait for drained requests to finish */
-waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
-
-QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
-BlockDriverState *bs = child->bs;
-bool in_main_loop =
-qemu_get_current_aio_context() == qemu_get_aio_context();
-assert(bs->refcnt > 0);
-if (in_main_loop) {
-/* In case the recursive bdrv_drain_recurse processes a
- * block_job_defer_to_main_loop BH and modifies the graph,
- * let's hold a reference to bs until we are done.
- *
- * IOThread doesn't have such a BH, and it is not safe to call
- * bdrv_unref without BQL, so skip doing it there.
- */
-bdrv_ref(bs);
-}
-waited |= bdrv_drain_recurse(bs, begin);
-if (in_main_loop) {
-bdrv_unref(bs);
-}
-}
-
-return waited;
-}
-
 static void bdrv_co_drain_bh_cb(void *opaque)
 {
 BdrvCoDrainData *data = opaque;
@@ -256,12 +169,13 @@ void bdrv_drained_begin(BlockDriverState *bs)
 return;
 }
 
-if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
-aio_disable_external(bdrv_get_aio_context(bs));
-bdrv_parent_drained_begin(bs);
+if (atomic_fetch_inc(&bs->quiesce_counter) > 0) {
+return;
+}
+aio_context_drained_begin(bdrv_get_aio_context(bs));
+while (BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0)) {
+/* Loop until no progress is made. */
 }
-
-bdrv_drain_recurse(bs, true);
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
@@ -275,9 +189,7 @@ void bdrv_drained_end(BlockDriverState *bs)
 return;
 }
 
-bdrv_parent_drained_end(bs);
-bdrv_drain_recurse(bs, false);
-aio_enable_external(bdrv_get_aio_context(bs));
+aio_context_drained_end(bdrv_get_aio_context(bs));
 }
 
 /*
@@ -324,14 +236,11 @@ void bdrv_drain_all_begin(void)
 BdrvNextIterator it;
 GSList *aio_ctxs = NULL, *ctx;
 
-block_job_pause_all();
-
 for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
-bdrv_parent_drained_begin(bs);
-aio_disable_external(aio_context);
+aio_context_drained_begin(aio_context);
 aio_context_release(aio_context);

[Qemu-devel] [PATCH RFC 5/9] qed: Implement AioContext drain ops

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block/qed.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 821dcaa055..8ddaa31e7c 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -337,12 +337,33 @@ static void qed_cancel_need_check_timer(BDRVQEDState *s)
 timer_del(s->need_check_timer);
 }
 
+static void bdrv_qed_drained_begin(void *opaque)
+{
+BlockDriverState *bs = opaque;
+BDRVQEDState *s = bs->opaque;
+
+/* Fire the timer immediately in order to start doing I/O as soon as the
+ * header is flushed.
+ */
+if (s->need_check_timer && timer_pending(s->need_check_timer)) {
+qed_cancel_need_check_timer(s);
+qed_need_check_timer_entry(s);
+}
+}
+
+static void bdrv_qed_drained_end(void *opaque)
+{
+}
+
 static void bdrv_qed_detach_aio_context(BlockDriverState *bs)
 {
 BDRVQEDState *s = bs->opaque;
 
 qed_cancel_need_check_timer(s);
 timer_free(s->need_check_timer);
+aio_context_del_drain_ops(bdrv_get_aio_context(bs),
+  bdrv_qed_drained_begin, bdrv_qed_drained_end,
+  bs);
 }
 
 static void bdrv_qed_attach_aio_context(BlockDriverState *bs,
@@ -356,19 +377,14 @@ static void bdrv_qed_attach_aio_context(BlockDriverState 
*bs,
 if (s->header.features & QED_F_NEED_CHECK) {
 qed_start_need_check_timer(s);
 }
+aio_context_add_drain_ops(new_context,
+  bdrv_qed_drained_begin, bdrv_qed_drained_end,
+  bs);
 }
 
 static void coroutine_fn bdrv_qed_co_drain_begin(BlockDriverState *bs)
 {
-BDRVQEDState *s = bs->opaque;
-
-/* Fire the timer immediately in order to start doing I/O as soon as the
- * header is flushed.
- */
-if (s->need_check_timer && timer_pending(s->need_check_timer)) {
-qed_cancel_need_check_timer(s);
-qed_need_check_timer_entry(s);
-}
+bdrv_qed_drained_begin(bs);
 }
 
 static void bdrv_qed_init_state(BlockDriverState *bs)
-- 
2.14.3

Re: [Qemu-devel] [PATCH v1 for-2.12 01/15] s390x/tcg: introduce and use program_interrupt_ra()

2017-11-29 Thread Cornelia Huck

On Tue, 28 Nov 2017 21:33:11 +0100
David Hildenbrand  wrote:

> Allows to easily convert more callers of program_interrupt() and to
> easily introduce new exceptions without forgetting about the cpu state
> reset.
> 
> Use program_interrupt_ra() in places where we already had the same
> pattern.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cpu.h   |  2 ++
>  target/s390x/crypto_helper.c |  7 ++-
>  target/s390x/excp_helper.c   |  5 +
>  target/s390x/interrupt.c | 13 +
>  target/s390x/mem_helper.c| 35 +++
>  target/s390x/misc_helper.c   |  3 +--
>  6 files changed, 30 insertions(+), 35 deletions(-)

> diff --git a/target/s390x/interrupt.c b/target/s390x/interrupt.c
> index ce6177c141..6ce06bb549 100644
> --- a/target/s390x/interrupt.c
> +++ b/target/s390x/interrupt.c
> @@ -53,6 +53,19 @@ void program_interrupt(CPUS390XState *env, uint32_t code, 
> int ilen)
>  }
>  }
>  
> +void program_interrupt_ra(CPUS390XState *env, uint32_t code, int ilen,
> +  uintptr_t ra)
> +{
> +S390CPU *cpu = s390_env_get_cpu(env);

Move this under the if?

> +
> +#ifdef CONFIG_TCG
> +if (tcg_enabled() && ra) {
> +cpu_restore_state(CPU(cpu), ra);
> +}
> +#endif
> +program_interrupt(env, code, ilen);
> +}
> +
>  #if !defined(CONFIG_USER_ONLY)
>  static void cpu_inject_service(S390CPU *cpu, uint32_t param)
>  {

[Qemu-devel] [PATCH RFC 8/9] block: Drop old drained_{begin, end} callbacks

2017-11-29 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 block.c| 24 
 block/block-backend.c  | 22 --
 blockjob.c |  6 --
 include/block/block.h  | 16 
 include/block/block_int.h  | 11 ---
 include/sysemu/block-backend.h |  8 
 6 files changed, 87 deletions(-)

diff --git a/block.c b/block.c
index 949f0dec11..4434441df5 100644
--- a/block.c
+++ b/block.c
@@ -810,18 +810,6 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 return g_strdup(bdrv_get_device_or_node_name(parent));
 }
 
-static void bdrv_child_cb_drained_begin(BdrvChild *child)
-{
-BlockDriverState *bs = child->opaque;
-bdrv_drained_begin(bs);
-}
-
-static void bdrv_child_cb_drained_end(BdrvChild *child)
-{
-BlockDriverState *bs = child->opaque;
-bdrv_drained_end(bs);
-}
-
 static int bdrv_child_cb_inactivate(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
@@ -887,8 +875,6 @@ static void bdrv_inherited_options(int *child_flags, QDict 
*child_options,
 const BdrvChildRole child_file = {
 .get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_end = bdrv_child_cb_drained_end,
 .inactivate  = bdrv_child_cb_inactivate,
 };
 
@@ -909,8 +895,6 @@ static void bdrv_inherited_fmt_options(int *child_flags, 
QDict *child_options,
 const BdrvChildRole child_format = {
 .get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_fmt_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_end = bdrv_child_cb_drained_end,
 .inactivate  = bdrv_child_cb_inactivate,
 };
 
@@ -1022,8 +1006,6 @@ const BdrvChildRole child_backing = {
 .attach  = bdrv_backing_attach,
 .detach  = bdrv_backing_detach,
 .inherit_options = bdrv_backing_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_end = bdrv_child_cb_drained_end,
 .inactivate  = bdrv_child_cb_inactivate,
 .update_filename = bdrv_backing_update_filename,
 };
@@ -1973,9 +1955,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
 }
 if (old_bs) {
-if (old_bs->quiesce_counter && child->role->drained_end) {
-child->role->drained_end(child);
-}
 if (child->role->detach) {
 child->role->detach(child);
 }
@@ -1986,9 +1965,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
 
 if (new_bs) {
 QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-if (new_bs->quiesce_counter && child->role->drained_begin) {
-child->role->drained_begin(child);
-}
 
 if (child->role->attach) {
 child->role->attach(child);
diff --git a/block/block-backend.c b/block/block-backend.c
index baef8e7abc..05855ab767 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,7 +68,6 @@ struct BlockBackend {
 
 NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
-int quiesce_counter;
 VMChangeStateEntry *vmsh;
 bool force_allow_inactivate;
 };
@@ -245,9 +244,6 @@ static const BdrvChildRole child_root = {
 .get_name   = blk_root_get_name,
 .get_parent_desc= blk_root_get_parent_desc,
 
-.drained_begin  = blk_root_drained_begin,
-.drained_end= blk_root_drained_end,
-
 .activate   = blk_root_activate,
 .inactivate = blk_root_inactivate,
 };
@@ -887,11 +883,6 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps 
*ops,
 
 blk->dev_ops = ops;
 blk->dev_opaque = opaque;
-
-/* Are we currently quiesced? Should we enforce this right now? */
-if (blk->quiesce_counter && ops->drained_begin) {
-ops->drained_begin(opaque);
-}
 }
 
 /*
@@ -2068,12 +2059,6 @@ static void blk_root_drained_begin(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
 
-if (++blk->quiesce_counter == 1) {
-if (blk->dev_ops && blk->dev_ops->drained_begin) {
-blk->dev_ops->drained_begin(blk->dev_opaque);
-}
-}
-
 /* Note that blk->root may not be accessible here yet if we are just
  * attaching to a BlockDriverState that is drained. Use child instead. */
 
@@ -2085,14 +2070,7 @@ static void blk_root_drained_begin(BdrvChild *child)
 static void blk_root_drained_end(BdrvChild *child)
 {
 BlockBackend *blk = child->opaque;
-assert(blk->quiesce_counter);
 
 assert(blk->public.throttle_group_member.io_limits_disabled);
 atomic_dec(&blk->public.throttle_group_member.io_limits_disabled);
-
-if (--blk->quiesce_counter == 0) {
-if (blk->dev_ops && blk->dev_ops->drained_end) {
-blk->dev_ops->drained_end(blk->dev_opaque);
-}
-}
 }
diff --git a/blockj

[Qemu-devel] RFC: Let NBD client request read-only mode

2017-11-29 Thread Eric Blake

Right now, only the server can choose whether an export is read-only.  A 
client can always treat an export as read-only by not sending any 
writes, but a server has no guarantee that a client will behave that 
way, and must assume that an export where the server did not advertise 
NBD_FLAG_READ_ONLY will modify the export.  Therefore, if the server 
does not want to permit simultaneous modifications to the underlying 
data, it has the choice of either permitting only one client at a time, 
or supporting multiple connections but enforcing all subsequent 
connections to see the NBD_FLAG_READ_ONLY bit on the export that is 
already in use by the first connection (note that this is racy - whoever 
connects first is the only one that can get write permissions, even if 
the first connected client doesn't want to write).


However, at least qemu has a case where it would be nice to permit a 
parallel known-read-only client from the same server that is (or will 
be) handling a read-write client; and what's more, to make it so that 
the read-only client can win the race of being the first connection 
without penalizing the actual read-write connection (see 
https://bugzilla.redhat.com/show_bug.cgi?id=1518543).  I don't see any 
way to accomplish this with oldstyle negotiation (but that doesn't 
matter these days); but with newstyle negotiation, there are at least 
two possible implementations:


Idea 1: the server advertises a new global bit NBD_FLAG_NO_WRITE (ideas 
for a better name?) in its 16-bit handshake flags; if the client replies 
with the same bit set (documentation-wise, we'd name the client reply 
NBD_FLAG_C_NO_WRITE), then the server knows that the client promises to 
be a read-only connection.


Idea 2: we add a new option, NBD_OPT_READ_ONLY.  If the client sends 
this option, and the server replies with NBD_REP_ACK, then the server 
knows that the client promises to be a read-only connection.


With either idea, once the server knows the client's intent to be a 
read-only client, the server SHOULD set NBD_FLAG_READ_ONLY on all 
(further) information sent for any export (whether from 
NBD_OPT_EXPORT_NAME, NBD_OPT_INFO, or NBD_OPT_GO) and treat any export 
as read-only for the current client, even if that export is in parallel 
use by another read-write client, and the client MUST NOT send 
NBD_CMD_WRITE, NBD_CMD_TRIM, NBD_CMD_WRITE_ZEROES, or any other command 
that requires a writable connection (the NBD_CMD_RESIZE extension comes 
to mind).


A client that wants to be read-only, but which does not see server 
support (in idea 1, the server did not advertise the bit; in idea 2, the 
server replies with NBD_REP_ERR_UNSUP), does not have to do anything 
special (it is always possible to do just reads to a read-write 
connection, and the server may still set NBD_FLAG_READ_ONLY even without 
supporting the extension of permitting a client-side request).  But such 
a client may, if it wants to be nice to potential parallel writers on 
the same export, decide to disconnect quickly (with NBD_OPT_ABORT or 
NBD_CMD_DISC as appropriate) rather than tie up a read-write connection.


I don't know which idea is more palatable.  We have a finite set of only 
2^4 global handshake flags because it is a bitmask, where only 14 bits 
remain; whereas we have almost 2^32 potential NBD_OPT_ values.  On the 
other hand, using a global handshake flag means the server never shows 
any export as writable; while with the NBD_OPT_ solution, a guest can 
get different results for the sequence NBD_OPT_INFO, NBD_OPT_READ_ONLY, 
NBD_OPT_INFO.  There's also the question with option 2 of whether 
permitting NBD_OPT_READ_ONLY prior to NBD_OPT_STARTTLS would make sense 
(is there any case where the set of TLS authentication to be performed 
can involve looser requirements for a known-read-only client?), where 
using a global bit makes the sequence of required NBD_OPT_* a bit less 
stateful.


Does the idea sound reasonable enough to propose wording to add it to 
the NBD spec and an implementation in qemu?  Which of the two ideas is 
preferred for letting the client inform the server of its intent?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] firmware: Use PTR_ERR_OR_ZERO()

2017-11-29 Thread Gabriel L. Somlo

Acked-by: Gabriel Somlo 

On Tue, Nov 28, 2017 at 10:40:27PM +0100, Vasyl Gomonovych wrote:
> Fix ptr_ret.cocci warnings:
> drivers/firmware/efi/efi.c:610:8-14: WARNING: PTR_ERR_OR_ZERO can be used
> 
> Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
> 
> Generated by: scripts/coccinelle/api/ptr_ret.cocci
> 
> Signed-off-by: Vasyl Gomonovych 
> ---
>  drivers/firmware/qemu_fw_cfg.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
> index 5cfe39f7a45f..18489b6a696d 100644
> --- a/drivers/firmware/qemu_fw_cfg.c
> +++ b/drivers/firmware/qemu_fw_cfg.c
> @@ -693,10 +693,8 @@ static int fw_cfg_cmdline_set(const char *arg, const 
> struct kernel_param *kp)
>*/
>   fw_cfg_cmdline_dev = platform_device_register_simple("fw_cfg",
>   PLATFORM_DEVID_NONE, res, processed);
> - if (IS_ERR(fw_cfg_cmdline_dev))
> - return PTR_ERR(fw_cfg_cmdline_dev);
>  
> - return 0;
> + return PTR_ERR_OR_ZERO(fw_cfg_cmdline_dev);
>  }
>  
>  static int fw_cfg_cmdline_get(char *buf, const struct kernel_param *kp)
> -- 
> 1.9.1
>

Re: [Qemu-devel] [PATCH v1 for-2.12 01/15] s390x/tcg: introduce and use program_interrupt_ra()

2017-11-29 Thread David Hildenbrand

On 29.11.2017 15:55, Cornelia Huck wrote:
> On Tue, 28 Nov 2017 21:33:11 +0100
> David Hildenbrand  wrote:
> 
>> Allows to easily convert more callers of program_interrupt() and to
>> easily introduce new exceptions without forgetting about the cpu state
>> reset.
>>
>> Use program_interrupt_ra() in places where we already had the same
>> pattern.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  target/s390x/cpu.h   |  2 ++
>>  target/s390x/crypto_helper.c |  7 ++-
>>  target/s390x/excp_helper.c   |  5 +
>>  target/s390x/interrupt.c | 13 +
>>  target/s390x/mem_helper.c| 35 +++
>>  target/s390x/misc_helper.c   |  3 +--
>>  6 files changed, 30 insertions(+), 35 deletions(-)
> 
>> diff --git a/target/s390x/interrupt.c b/target/s390x/interrupt.c
>> index ce6177c141..6ce06bb549 100644
>> --- a/target/s390x/interrupt.c
>> +++ b/target/s390x/interrupt.c
>> @@ -53,6 +53,19 @@ void program_interrupt(CPUS390XState *env, uint32_t code, 
>> int ilen)
>>  }
>>  }
>>  
>> +void program_interrupt_ra(CPUS390XState *env, uint32_t code, int ilen,
>> +  uintptr_t ra)
>> +{
>> +S390CPU *cpu = s390_env_get_cpu(env);
> 
> Move this under the if?
> 

Indeed, thanks.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread Cornelia Huck

On Tue, 28 Nov 2017 21:33:14 +0100
David Hildenbrand  wrote:

> TCG needs the retaddr when injecting an interrupt. Let's just pass it
> along and use 0 for KVM. The value will be completely ignored for KVM.

Can we get a #define for that? Just to make it clear at a glance that
we're passing an ignored value and not some magic thing.

> 
> Convert program_interrupt() to program_interrupt_ra() directly, making
> use of the passed address.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/internal.h| 29 +++-
>  target/s390x/ioinst.c  | 67 
> +++---
>  target/s390x/kvm.c | 26 +-
>  target/s390x/misc_helper.c | 20 +++---
>  4 files changed, 73 insertions(+), 69 deletions(-)

Seems sane.

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:13, Cornelia Huck wrote:
> On Tue, 28 Nov 2017 21:33:14 +0100
> David Hildenbrand  wrote:
> 
>> TCG needs the retaddr when injecting an interrupt. Let's just pass it
>> along and use 0 for KVM. The value will be completely ignored for KVM.
> 
> Can we get a #define for that? Just to make it clear at a glance that
> we're passing an ignored value and not some magic thing.

If you can come up with a good name, I can't :)


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH qemu-web v2] Suggest qemu-web prefix in the introductory post

2017-11-29 Thread Eric Blake


On 11/29/2017 03:16 AM, Paolo Bonzini wrote:

On 29/11/2017 05:00, Fam Zheng wrote:

Chances are qemu-devel@nongnu.org patches only adding new files can be
picked up by patchew and treated like a qemu.git patch. Patchew already
ignores [qemu-web] patches, so suggest it in this article as well. (Are
there better places to document this?)


Maybe the contribute page should have a link to the blog post?


Additionally, we should have a mention of qemu-web.git somewhere within 
qemu.git.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v2 0/5] target/m68k: implement 680x0 FPU (part 3)

2017-11-29 Thread Thomas Huth

On 29.11.2017 15:17, Laurent Vivier wrote:
> Le 29/11/2017 à 15:08, Thomas Huth a écrit :
>> On 29.11.2017 14:59, Peter Maydell wrote:
>>> On 29 November 2017 at 13:42, Laurent Vivier  wrote:
 these functions come from:

 http://previous.alternative-system.com/

 http://svn.code.sf.net/p/previous/code/trunk/src/softfloat/softfloat.c

 This is also a modified softfloat, release 2b
 which is BSD license if I'm correct.
>>>
>>> We can't use softfloat2b code (the part of the license that goes
>>> "USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
>>> AND ORGANIZATIONS [...] WHO FURTHERMORE EFFECTIVELY INDEMNIFY JOHN
>>> HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE INSTITUTE" isn't
>>> GPL compatible).
>>>
>>> We can use softfloat2a code, which doesn't have that indemnity clause.
>>
>> Sigh. That's why WinUAE and Hatari immediately switched back to
>> softfloat2a (derived from QEMU) after we've identified the problem
>> there. Looks like the Previous folks forgot to do that step, too :-(
>>
 This code has also been copied to WinUAE (GPL), where softfloat has been
 copied from QEMU:
 https://github.com/tonioni/WinUAE/blob/master/softfloat/softfloat.cpp
>>>
>>> Yes, lots of projects used the softfloat2b code without realising
>>> it wasn't GPL compatible (including QEMU -- we had a painful job
>>> to fix things up and convert to the 2a codebase a while back).
>>>
 But I think the bad news comes later:

 all the other functions (sin, cos, tan, log, exp, ...) found in previous
 (softfloat_fpsp.c) are "derived" from NeXT library FPSP:

 http://svn.code.sf.net/p/previous/code/trunk/src/softfloat/softfloat_fpsp.c

 /*

  This C source file is an extension to the SoftFloat IEC/IEEE 
 Floating-point
  Arithmetic Package, Release 2a.

  Written by Andreas Grabher for Previous, NeXT Computer Emulator.

 =*/
 ...
 /*
 | Algorithms for transcendental functions supported by MC68881 and MC68882
 | mathematical coprocessors. The functions are derived from FPSP library.
 **/

 FPSP library can be found:

 https://ftp.nice.ch/pub/next/developer/hardware/m68k/

 And the assembly source code is not free at all:

 https://ftp.nice.ch/pub/next/developer/hardware/m68k/_libFPSP.1.p2.N.s/l_fpsp.h


 |   Copyright (C) Motorola, Inc. 1991
 |   All Rights Reserved
 |
 |   THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF MOTOROLA
 |   The copyright notice above does not evidence any
 |   actual or intended publication of such source code.


 So I'm wondering what license apply to the C version found in "Previous".
>>>
>>> Good question. It's clearly not copied code (since the FPSP library is
>>> all native m68k assembly), but presumably it's the same algorithms
>>> transliterated into C...
>>
>> There also seem to be other versions of that library available, e.g.:
>>
>> https://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/arch/m68k/fpsp/l_fpsp.h
>>
>> Maybe Andreas (now on CC: ) could clarify which version he used / how
>> the C sources were developed?
> 
> Thank you Thomas,
> 
> This seems to be the same code re-licensed to:
> 
> MOTOROLA MICROPROCESSOR & MEMORY TECHNOLOGY GROUP
> M68000 Hi-Performance Microprocessor Division
> M68040 Software Package
> 
> M68040 Software Package Copyright (c) 1993, 1994 Motorola Inc.
> All rights reserved.
> 
> THE SOFTWARE is provided on an "AS IS" basis and without warranty.
> To the maximum extent permitted by applicable law,
> MOTOROLA DISCLAIMS ALL WARRANTIES WHETHER EXPRESS OR IMPLIED,
> INCLUDING IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
> PARTICULAR PURPOSE and any warranty against infringement with
> regard to the SOFTWARE (INCLUDING ANY MODIFIED VERSIONS THEREOF)
> and any accompanying written materials.
> 
> To the maximum extent permitted by applicable law,
> IN NO EVENT SHALL MOTOROLA BE LIABLE FOR ANY DAMAGES WHATSOEVER
> (INCLUDING WITHOUT LIMITATION, DAMAGES FOR LOSS OF BUSINESS
> PROFITS, BUSINESS INTERRUPTION, LOSS OF BUSINESS INFORMATION, OR
> OTHER PECUNIARY LOSS) ARISING OF THE USE OR INABILITY TO USE THE
> SOFTWARE.  Motorola assumes no responsibility for the maintenance
> and support of the SOFTWARE.
> 
> You are hereby granted a copyright license to use, modify, and
> distribute the SOFTWARE so long as this entire notice is retained
> without alteration in any modified and/or redistributed versions,
> and that such modified versions are clearly identified as such.
> No licenses are granted by implication, estoppel or otherwise

Re: [Qemu-devel] [Bug 645662] x87 fpu emulation not accurate enough

2017-11-29 Thread KONRAD Frederic




On 11/29/2017 01:51 PM, Peter Maydell wrote:

On 29 November 2017 at 12:47, KONRAD Frederic
 wrote:

Maybe a little hack might work for x86 on x86 though.
Something like hardcoding the helper with an inline assembly
code?


The set of people who want to emulate x86 on x86 is surely
even smaller than the already tiny set of people who want to
emulate x86 at all. I don't think it makes sense to add
inline assembly hacks for that: QEMU should (as far as
is reasonably possible) behave the same on all hosts.

thanks
-- PMM



Sure it was a first step suggestion.

Fred

Re: [Qemu-devel] [PATCH RESEND 1/2] x86_iommu: Move machine check to x86_iommu_realize()

2017-11-29 Thread Thomas Huth

On 29.11.2017 13:33, Mohammed Gamal wrote:
> Instead of having the same error checks in vtd_realize()
> and amdvi_realize(), move that over to the generic
> x86_iommu_realize().
> 
> Reviewed-by: Peter Xu 
> Reviewed-by: Eduardo Habkost 
> Signed-off-by: Mohammed Gamal 
> ---
>  hw/i386/amd_iommu.c   | 13 ++---
>  hw/i386/intel_iommu.c | 13 ++---
>  hw/i386/x86-iommu.c   | 13 +
>  3 files changed, 17 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
> index ad8155c..1618282 100644
> --- a/hw/i386/amd_iommu.c
> +++ b/hw/i386/amd_iommu.c
> @@ -1141,18 +1141,9 @@ static void amdvi_realize(DeviceState *dev, Error 
> **err)
>  AMDVIState *s = AMD_IOMMU_DEVICE(dev);
>  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>  MachineState *ms = MACHINE(qdev_get_machine());
> -MachineClass *mc = MACHINE_GET_CLASS(ms);
> -PCMachineState *pcms =
> -PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
> -PCIBus *bus;
> -
> -if (!pcms) {
> -error_setg(err, "Machine-type '%s' not supported by amd-iommu",
> -   mc->name);
> -return;
> -}
> +PCMachineState *pcms = PC_MACHINE(ms);
> +PCIBus *bus = pcms->bus;
>  
> -bus = pcms->bus;
>  s->iotlb = g_hash_table_new_full(amdvi_uint64_hash,
>   amdvi_uint64_equal, g_free, g_free);
>  
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 3a5bb0b..0138b3b 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3027,20 +3027,11 @@ static bool vtd_decide_config(IntelIOMMUState *s, 
> Error **errp)
>  static void vtd_realize(DeviceState *dev, Error **errp)
>  {
>  MachineState *ms = MACHINE(qdev_get_machine());
> -MachineClass *mc = MACHINE_GET_CLASS(ms);
> -PCMachineState *pcms =
> -PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
> -PCIBus *bus;
> +PCMachineState *pcms = PC_MACHINE(ms);
> +PCIBus *bus = pcms->bus;
>  IntelIOMMUState *s = INTEL_IOMMU_DEVICE(dev);
>  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>  
> -if (!pcms) {
> -error_setg(errp, "Machine-type '%s' not supported by intel-iommu",
> -   mc->name);
> -return;
> -}
> -
> -bus = pcms->bus;
>  x86_iommu->type = TYPE_INTEL;
>  
>  if (!vtd_decide_config(s, errp)) {
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 293caf8..51de519 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -21,6 +21,8 @@
>  #include "hw/sysbus.h"
>  #include "hw/boards.h"
>  #include "hw/i386/x86-iommu.h"
> +#include "hw/i386/pc.h"
> +#include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "trace.h"
>  
> @@ -80,7 +82,18 @@ static void x86_iommu_realize(DeviceState *dev, Error 
> **errp)
>  {
>  X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(dev);
>  X86IOMMUClass *x86_class = X86_IOMMU_GET_CLASS(dev);
> +MachineState *ms = MACHINE(qdev_get_machine());
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> +PCMachineState *pcms =
> +PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
>  QLIST_INIT(&x86_iommu->iec_notifiers);
> +
> +if (!pcms) {
> +error_setg(errp, "Machine-type '%s' not supported by IOMMU",
> +   mc->name);
> +return;
> +}
> +
>  if (x86_class->realize) {
>  x86_class->realize(dev, errp);
>  }
> 

Reviewed-by: Thomas Huth

[Qemu-devel] [PATCH] maint: Mention web site maintenance in README

2017-11-29 Thread Eric Blake

Now that we have a website that accepts patches on the list, the
main project should make it easier to find information about that
process.

Signed-off-by: Eric Blake 
---

Doc only, so it could go in -rc3 if we have a reason to slip it
in this late; but I'm also fine if it waits for 2.12.

 README | 4 
 1 file changed, 4 insertions(+)

diff --git a/README b/README
index b92a07a61a..2c8e1c8cc4 100644
--- a/README
+++ b/README
@@ -68,6 +68,10 @@ the QEMU website
   https://qemu.org/Contribute/SubmitAPatch
   https://qemu.org/Contribute/TrivialPatches

+The QEMU website is also maintained under source control.
+
+  git clone git://git.qemu.org/qemu-web.git
+  https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/

 Bug reporting
 =
-- 
2.14.3

[Qemu-devel] [PULL 01/10] iotests: fix 075 and 078

2017-11-29 Thread Kevin Wolf

From: John Snow 

Both of these tests are for formats which now stipulate that they are
read-only. Adjust the tests to match.

Signed-off-by: John Snow 
Reviewed-by: Eric Blake 
Reviewed-by: Lukáš Doktor 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/075 | 18 +-
 tests/qemu-iotests/078 | 14 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tests/qemu-iotests/075 b/tests/qemu-iotests/075
index 770d51c6cb..caa30d4743 100755
--- a/tests/qemu-iotests/075
+++ b/tests/qemu-iotests/075
@@ -48,56 +48,56 @@ offsets_offset=136
 echo
 echo "== check that the first sector can be read =="
 _use_sample_img simple-pattern.cloop.bz2
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== check that the last sector can be read =="
 _use_sample_img simple-pattern.cloop.bz2
-$QEMU_IO -c "read $((1024 * 1024 - 512)) 512" $TEST_IMG 2>&1 | _filter_qemu_io 
| _filter_testdir
+$QEMU_IO -r -c "read $((1024 * 1024 - 512)) 512" $TEST_IMG 2>&1 | 
_filter_qemu_io | _filter_testdir
 
 echo
 echo "== block_size must be a multiple of 512 =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\x00\x00\x02\x01"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== block_size cannot be zero =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\x00\x00\x00\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== huge block_size ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$block_size_offset" "\xff\xff\xfe\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== offsets_size overflow ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$n_blocks_offset" "\xff\xff\xff\xff"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images that require too many offsets ==="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$n_blocks_offset" "\x04\x00\x00\x01"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images with non-monotonically increasing offsets =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$offsets_offset" "\x00\x00\x00\x00\xff\xff\xff\xff"
 poke_file "$TEST_IMG" $((offsets_offset + 8)) 
"\x00\x00\x00\x00\xff\xfe\x00\x00"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 echo
 echo "== refuse images with invalid compressed block size =="
 _use_sample_img simple-pattern.cloop.bz2
 poke_file "$TEST_IMG" "$offsets_offset" "\x00\x00\x00\x00\x00\x00\x00\x00"
 poke_file "$TEST_IMG" $((offsets_offset + 8)) 
"\xff\xff\xff\xff\xff\xff\xff\xff"
-$QEMU_IO -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
+$QEMU_IO -r -c "read 0 512" $TEST_IMG 2>&1 | _filter_qemu_io | _filter_testdir
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/078 b/tests/qemu-iotests/078
index f333e9ac84..a106c26f6b 100755
--- a/tests/qemu-iotests/078
+++ b/tests/qemu-iotests/078
@@ -48,41 +48,41 @@ disk_size_offset=$((0x58))
 echo
 echo "== Read from a valid image =="
 _use_sample_img empty.bochs.bz2
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Negative catalog size =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$catalog_size_offset" "\xff\xff\xff\xff"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Overflow for catalog size * sizeof(uint32_t) =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$catalog_size_offset" "\x00\x00\x00\x40"
-{ $QEMU_IO -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 0 512" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
 
 echo
 echo "== Too small catalog bitmap for image size =="
 _use_sample_img empty.bochs.bz2
 poke_file "$TEST_IMG" "$disk_size_offset" "\x00\xc0\x0f\x00\x00\x00\x00\x7f"
-{ $QEMU_IO -c "read 2T 4k" $TEST_IMG; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+{ $QEMU_IO -r -c "read 2T 4k" $TEST_IMG; }

Re: [Qemu-devel] [PATCH qemu] vfio/spapr: Allow fallback to SPAPR TCE IOMMU v1

2017-11-29 Thread Alex Williamson

On Wed, 22 Nov 2017 16:15:52 +1100
Alexey Kardashevskiy  wrote:

> The vfio_iommu_spapr_tce driver always advertises v1 and v2 IOMMU support,
> however PR KVM (a special version of KVM designed to work in
> a paravirtualized system; these days used for nested virtualizaion) only
> supports the "pseries" platform which does not support v2. Since there is
> no way to choose the IOMMU version in QEMU, it fails to start.

Seems like the bug is then in advertising v2 support when it doesn't
exist.  Isn't that a kernel bug?  Otherwise I think this is a long
standing bug, since QEMU-2.7 and probably not 2.11 material this late
into the freeze.  A "Fixes" tag would probably also be appropriate,
identifying the commit where this was introduced (318f67ce1371) if it
is a QEMU bug.  Thanks,

Alex

> This adds a fallback to the v1 IOMMU if v2 cannot be used.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  hw/vfio/common.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c..cd81cc9 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1040,6 +1040,11 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>  v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
>  ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
>  if (ret) {
> +container->iommu_type = VFIO_SPAPR_TCE_IOMMU;
> +v2 = false;
> +ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
> +}
> +if (ret) {
>  error_setg_errno(errp, errno, "failed to set iommu for 
> container");
>  ret = -errno;
>  goto free_container_exit;

[Qemu-devel] [PULL 03/10] qemu-options: Mention locking option of file driver

2017-11-29 Thread Kevin Wolf

From: Fam Zheng 

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 qemu-options.hx | 4 
 1 file changed, 4 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 3728e9b4dd..f11c4ac960 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -693,6 +693,10 @@ This is the protocol-level block driver for accessing 
regular files.
 The path to the image file in the local filesystem
 @item aio
 Specifies the AIO backend (threads/native, default: threads)
+@item locking
+Specifies whether the image file is protected with Linux OFD / POSIX locks. The
+default is to use the Linux Open File Descriptor API if available, otherwise no
+lock is applied.  (auto/on/off, default: auto)
 @end table
 Example:
 @example
-- 
2.13.6

[Qemu-devel] [PULL 02/10] docs: Add image locking subsection

2017-11-29 Thread Kevin Wolf

From: Fam Zheng 

This documents the image locking feature and explains when and how
related options can be used.

Signed-off-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 docs/qemu-block-drivers.texi | 36 
 qemu-doc.texi|  1 +
 2 files changed, 37 insertions(+)

diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
index 1cb1e55686..503c1847aa 100644
--- a/docs/qemu-block-drivers.texi
+++ b/docs/qemu-block-drivers.texi
@@ -785,6 +785,42 @@ warning: ssh server @code{ssh.example.com:22} does not 
support fsync
 With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
 supported.
 
+@node disk_image_locking
+@subsection Disk image file locking
+
+By default, QEMU tries to protect image files from unexpected concurrent
+access, as long as it's supported by the block protocol driver and host
+operating system. If multiple QEMU processes (including QEMU emulators and
+utilities) try to open the same image with conflicting accessing modes, all but
+the first one will get an error.
+
+This feature is currently supported by the file protocol on Linux with the Open
+File Descriptor (OFD) locking API, and can be configured to fall back to POSIX
+locking if the POSIX host doesn't support Linux OFD locking.
+
+To explicitly enable image locking, specify "locking=on" in the file protocol
+driver options. If OFD locking is not possible, a warning will be printed and
+the POSIX locking API will be used. In this case there is a risk that the lock
+will get silently lost when doing hot plugging and block jobs, due to the
+shortcomings of the POSIX locking API.
+
+QEMU transparently handles lock handover during shared storage migration.  For
+shared virtual disk images between multiple VMs, the "share-rw" device option
+should be used.
+
+Alternatively, locking can be fully disabled by "locking=off" block device
+option. In the command line, the option is usually in the form of
+"file.locking=off" as the protocol driver is normally placed as a "file" child
+under a format driver. For example:
+
+@code{-blockdev 
driver=qcow2,file.filename=/path/to/image,file.locking=off,file.driver=file}
+
+To check if image locking is active, check the output of the "lslocks" command
+on host and see if there are locks held by the QEMU process on the image file.
+More than one byte could be locked by the QEMU instance, each byte of which
+reflects a particular permission that is acquired or protected by the running
+block driver.
+
 @c man end
 
 @ignore
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 617254917d..db2351c746 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -405,6 +405,7 @@ encrypted disk images.
 * disk_images_iscsi:: iSCSI LUNs
 * disk_images_gluster::   GlusterFS disk images
 * disk_images_ssh::   Secure Shell (ssh) disk images
+* disk_image_locking::Disk image file locking
 @end menu
 
 @node disk_images_quickstart
-- 
2.13.6

[Qemu-devel] [PULL 00/10] Block layer fixes for 2.11.0-rc3

2017-11-29 Thread Kevin Wolf

The following changes since commit e7b47c22e2df14d55e3e4426688c929bf8e3f7fb:

  osdep.h: Make TIME_MAX handle different time_t types (2017-11-24 13:23:36 
+)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 5591c001a1056910afd903df3fc3c03f30746623:

  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-29' into 
queue-block (2017-11-29 15:37:31 +0100)


Block layer patches for 2.11.0-rc3


Alberto Garcia (1):
  blockjob: Remove the job from the list earlier in block_job_unref()

Fam Zheng (2):
  docs: Add image locking subsection
  qemu-options: Mention locking option of file driver

John Snow (1):
  iotests: fix 075 and 078

Kashyap Chamarthy (1):
  QAPI & interop: Clarify events emitted by 'block-job-cancel'

Kevin Wolf (2):
  block: Expect graph changes in bdrv_parent_drained_begin/end
  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-29' into 
queue-block

Paolo Bonzini (3):
  blockjob: remove clock argument from block_job_sleep_ns
  blockjob: introduce block_job_do_yield
  blockjob: reimplement block_job_sleep_ns to allow cancellation

Peter Lieven (1):
  block/nfs: fix nfs_client_open for filesize greater than 1TB

 qapi/block-core.json   |  6 +++
 include/block/blockjob.h   |  8 +++-
 include/block/blockjob_int.h   |  7 ++-
 block/backup.c |  4 +-
 block/commit.c |  2 +-
 block/io.c |  8 ++--
 block/mirror.c |  6 +--
 block/nfs.c|  7 ++-
 block/stream.c |  2 +-
 blockjob.c | 84 +++---
 tests/test-blockjob-txn.c  |  2 +-
 docs/interop/live-block-operations.rst | 50 
 docs/qemu-block-drivers.texi   | 36 +++
 qemu-doc.texi  |  1 +
 qemu-options.hx|  4 ++
 tests/qemu-iotests/075 | 18 
 tests/qemu-iotests/078 | 14 +++---
 17 files changed, 187 insertions(+), 72 deletions(-)

[Qemu-devel] [PULL 04/10] QAPI & interop: Clarify events emitted by 'block-job-cancel'

2017-11-29 Thread Kevin Wolf

From: Kashyap Chamarthy 

When you cancel an in-progress 'mirror' job (or "active `block-commit`")
with QMP `block-job-cancel`, it emits the event: BLOCK_JOB_CANCELLED.
However, when `block-job-cancel` is issued *after* `drive-mirror` has
indicated (via the event BLOCK_JOB_READY) that the source and
destination have reached synchronization:

[...] # Snip `drive-mirror` invocation & outputs
{
  "execute":"block-job-cancel",
  "arguments":{
"device":"virtio0"
  }
}

{"return": {}}

It (`block-job-cancel`) will counterintuitively emit the event
'BLOCK_JOB_COMPLETED':

{
  "timestamp":{
"seconds":1510678024,
"microseconds":526240
  },
  "event":"BLOCK_JOB_COMPLETED",
  "data":{
"device":"virtio0",
"len":41126400,
"offset":41126400,
"speed":0,
"type":"mirror"
  }
}

But this is expected behaviour, where the _COMPLETED event indicates
that synchronization has successfully ended (and the destination now has
a point-in-time copy, which is at the time of cancel).

So add a small note to this effect in 'block-core.json'.  While at it,
also update the "Live disk synchronization -- drive-mirror and
blockdev-mirror" section in 'live-block-operations.rst'.

(Thanks: Max Reitz for reminding me of this caveat on IRC.)

Signed-off-by: Kashyap Chamarthy 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json   |  6 
 docs/interop/live-block-operations.rst | 50 ++
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 76bf50f813..dd763dcf87 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2065,6 +2065,12 @@
 # BLOCK_JOB_CANCELLED event.  Before that happens the job is still visible when
 # enumerated using query-block-jobs.
 #
+# Note that if you issue 'block-job-cancel' after 'drive-mirror' has indicated
+# (via the event BLOCK_JOB_READY) that the source and destination are
+# synchronized, then the event triggered by this command changes to
+# BLOCK_JOB_COMPLETED, to indicate that the mirroring has ended and the
+# destination now has a point-in-time copy tied to the time of the 
cancellation.
+#
 # For streaming, the image file retains its backing file unless the streaming
 # operation happens to complete just as it is being cancelled.  A new streaming
 # operation can be started at a later time to finish copying all data from the
diff --git a/docs/interop/live-block-operations.rst 
b/docs/interop/live-block-operations.rst
index 5f0179749f..734252bc80 100644
--- a/docs/interop/live-block-operations.rst
+++ b/docs/interop/live-block-operations.rst
@@ -506,26 +506,40 @@ Again, given our familiar disk image chain::
 
 [A] <-- [B] <-- [C] <-- [D]
 
-The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) allows
-you to copy data from the entire chain into a single target image (which
-can be located on a different host).
-
-Once a 'mirror' job has started, there are two possible actions while a
-``drive-mirror`` job is active:
-
-(1) Issuing the command ``block-job-cancel`` after it emits the event
-``BLOCK_JOB_CANCELLED``: will (after completing synchronization of
-the content from the disk image chain to the target image, [E])
-create a point-in-time (which is at the time of *triggering* the
-cancel command) copy, contained in image [E], of the the entire disk
+The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``)
+allows you to copy data from the entire chain into a single target image
+(which can be located on a different host), [E].
+
+.. note::
+
+When you cancel an in-progress 'mirror' job *before* the source and
+target are synchronized, ``block-job-cancel`` will emit the event
+``BLOCK_JOB_CANCELLED``.  However, note that if you cancel a
+'mirror' job *after* it has indicated (via the event
+``BLOCK_JOB_READY``) that the source and target have reached
+synchronization, then the event emitted by ``block-job-cancel``
+changes to ``BLOCK_JOB_COMPLETED``.
+
+Besides the 'mirror' job, the "active ``block-commit``" is the only
+other block device job that emits the event ``BLOCK_JOB_READY``.
+The rest of the block device jobs ('stream', "non-active
+``block-commit``", and 'backup') end automatically.
+
+So there are two possible actions to take, after a 'mirror' job has
+emitted the event ``BLOCK_JOB_READY``, indicating that the source and
+target have reached synchronization:
+
+(1) Issuing the command ``block-job-cancel`` (after it emits the event
+``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is at
+the time of *triggering* the cancel command) copy of the entire disk
 image chain (or only the top-most image, depending on the ``sync``
-mode).
+mode), contained in the target image [E]. One use case for this is
+live VM migration

[Qemu-devel] [PULL 10/10] block/nfs: fix nfs_client_open for filesize greater than 1TB

2017-11-29 Thread Kevin Wolf

From: Peter Lieven 

DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE) was overflowing ret (int) if
st.st_size is greater than 1TB.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Peter Lieven 
Message-id: 1511798407-31129-1-git-send-email...@kamp.de
Signed-off-by: Max Reitz 
---
 block/nfs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/nfs.c b/block/nfs.c
index 337fcd9c84..effc8719b5 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -1,7 +1,7 @@
 /*
  * QEMU Block driver for native access to files on NFS shares
  *
- * Copyright (c) 2014-2016 Peter Lieven 
+ * Copyright (c) 2014-2017 Peter Lieven 
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to 
deal
@@ -496,7 +496,7 @@ out:
 static int64_t nfs_client_open(NFSClient *client, QDict *options,
int flags, int open_flags, Error **errp)
 {
-int ret = -EINVAL;
+int64_t ret = -EINVAL;
 QemuOpts *opts = NULL;
 Error *local_err = NULL;
 struct stat st;
@@ -686,8 +686,7 @@ static QemuOptsList nfs_create_opts = {
 
 static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
 {
-int ret = 0;
-int64_t total_size = 0;
+int64_t ret, total_size;
 NFSClient *client = g_new0(NFSClient, 1);
 QDict *options = NULL;
 
-- 
2.13.6

[Qemu-devel] [qemu-web PATCH] Mention website maintainence under Contribute

2017-11-29 Thread Eric Blake

Web (and other doc) updates are also valid contributions.

Suggested-by: Paolo Bonzini 
Signed-off-by: Eric Blake 
---
 contribute.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contribute.md b/contribute.md
index bf4a55d..88baee6 100644
--- a/contribute.md
+++ b/contribute.md
@@ -12,4 +12,6 @@ permalink: /contribute/
 * Chat with the developers on IRC: irc.oftc.net, channel #qemu

 * Read developer documentation: “[Getting started for 
developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers)”,
-  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”, 
“[How to submit a 
patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”
+  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”,
+  “[How to submit a 
patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”,
+  “[Improve the 
website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)”
-- 
2.14.3

[Qemu-devel] [PULL 08/10] blockjob: introduce block_job_do_yield

2017-11-29 Thread Kevin Wolf

From: Paolo Bonzini 

Hide the clearing of job->busy in a single function, and set it
in block_job_enter.  This lets block_job_do_yield verify that
qemu_coroutine_enter is not used while job->busy = false.

Signed-off-by: Paolo Bonzini 
Tested-By: Jeff Cody 
Reviewed-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 blockjob.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index db9e4fc89a..4d22b7d2fb 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -729,6 +729,15 @@ static bool block_job_should_pause(BlockJob *job)
 return job->pause_count > 0;
 }
 
+static void block_job_do_yield(BlockJob *job)
+{
+job->busy = false;
+qemu_coroutine_yield();
+
+/* Set by block_job_enter before re-entering the coroutine.  */
+assert(job->busy);
+}
+
 void coroutine_fn block_job_pause_point(BlockJob *job)
 {
 assert(job && block_job_started(job));
@@ -746,9 +755,7 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 
 if (block_job_should_pause(job) && !block_job_is_cancelled(job)) {
 job->paused = true;
-job->busy = false;
-qemu_coroutine_yield(); /* wait for block_job_resume() */
-job->busy = true;
+block_job_do_yield(job);
 job->paused = false;
 }
 
@@ -778,9 +785,12 @@ void block_job_enter(BlockJob *job)
 return;
 }
 
-if (!job->busy) {
-bdrv_coroutine_enter(blk_bs(job->blk), job->co);
+if (job->busy) {
+return;
 }
+
+job->busy = true;
+aio_co_wake(job->co);
 }
 
 bool block_job_is_cancelled(BlockJob *job)
@@ -819,11 +829,9 @@ void block_job_yield(BlockJob *job)
 return;
 }
 
-job->busy = false;
 if (!block_job_should_pause(job)) {
-qemu_coroutine_yield();
+block_job_do_yield(job);
 }
-job->busy = true;
 
 block_job_pause_point(job);
 }
-- 
2.13.6

Re: [Qemu-devel] [PATCH RESEND 2/2] x86_iommu: check if machine has PCI bus

2017-11-29 Thread Thomas Huth

On 29.11.2017 13:33, Mohammed Gamal wrote:
> Starting qemu with
> qemu-system-x86_64 -S -M isapc -device {amd|intel}-iommu
> leads to a segfault. The code assume PCI bus is present and
> tries to access the bus structure without checking.
> 
> Since Intel VT-d and AMDVI should only work with PCI, add a
> check for PCI bus and return error if not present.
> 
> Reviewed-by: Peter Xu 
> Reviewed-by: Eduardo Habkost 
> Signed-off-by: Mohammed Gamal 
> ---
>  hw/i386/x86-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
> index 51de519..8a01a2d 100644
> --- a/hw/i386/x86-iommu.c
> +++ b/hw/i386/x86-iommu.c
> @@ -88,7 +88,7 @@ static void x86_iommu_realize(DeviceState *dev, Error 
> **errp)
>  PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
>  QLIST_INIT(&x86_iommu->iec_notifiers);
>  
> -if (!pcms) {
> +if (!pcms || !pcms->bus) {
>  error_setg(errp, "Machine-type '%s' not supported by IOMMU",
> mc->name);
>  return;
> 

Reviewed-by: Thomas Huth

[Qemu-devel] [PULL 05/10] blockjob: Remove the job from the list earlier in block_job_unref()

2017-11-29 Thread Kevin Wolf

From: Alberto Garcia 

When destroying a block job in block_job_unref() we should remove it
from the job list before calling block_job_remove_all_bdrv().

This is because removing the BDSs can trigger an aio_poll() and wake
up other jobs that might attempt to use the block job list. If that
happens the job we're currently destroying should not be in that list
anymore.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 blockjob.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index ff9a614531..2f0cc1528b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -152,6 +152,7 @@ void block_job_unref(BlockJob *job)
 {
 if (--job->refcnt == 0) {
 BlockDriverState *bs = blk_bs(job->blk);
+QLIST_REMOVE(job, job_list);
 bs->job = NULL;
 block_job_remove_all_bdrv(job);
 blk_remove_aio_context_notifier(job->blk,
@@ -160,7 +161,6 @@ void block_job_unref(BlockJob *job)
 blk_unref(job->blk);
 error_free(job->blocker);
 g_free(job->id);
-QLIST_REMOVE(job, job_list);
 g_free(job);
 }
 }
-- 
2.13.6

Re: [Qemu-devel] [PATCH qemu] vfio: Allow configuration without INTx

2017-11-29 Thread Alex Williamson

On Wed, 22 Nov 2017 16:16:49 +1100
Alexey Kardashevskiy  wrote:

> On some platforms INTx may not be enabled on a KVM host (one such
> example is IBM pHyp hypervisor and this is intentional). However
> the PCI_INTERRUPT_PIN is not 0 so QEMU tries initializing INTx, fails as
> (!vdev->pdev->irq) in the VFIO's vfio_intx_enable() and this is
> a fatal error.
> 
> This adds a debug switch - "x-no-intx" - in order to allow broken INTx
> configuration.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> In practice, test teams run PR KVM under HV KVM and there INTx is enabled
> on all levels so having this as a debug switch is enough.
> ---
>  hw/vfio/pci.h | 1 +
>  hw/vfio/pci.c | 8 +++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 502a575..c5e168e 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -141,6 +141,7 @@ typedef struct VFIOPCIDevice {
>  bool has_flr;
>  bool has_pm_reset;
>  bool rom_read_failed;
> +bool no_intx;
>  bool no_kvm_intx;
>  bool no_kvm_msi;
>  bool no_kvm_msix;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c977ee3..c9caf6a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2869,7 +2869,12 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  pci_device_set_intx_routing_notifier(&vdev->pdev, vfio_intx_update);
>  ret = vfio_intx_enable(vdev, errp);
>  if (ret) {
> -goto out_teardown;
> +if (vdev->no_intx) {
> +error_report_err(*errp);
> +*errp = NULL;
> +} else {
> +goto out_teardown;
> +}

Why do we try to setup INTx and let it fail, potentially with errors to
the log, if the user has specified x-no-intx?  Why does the kernel
advertise INTx is available via the pin register if it's not
supported?  I'm not opposed to a debug flag to disable INTx, but by
definition, "x-" prefix flags are not supported and it seems like you
have a configuration that needs a supported mechanism of turning this
off.  Thanks,

Alex

>  }
>  }
>  
> @@ -2986,6 +2991,7 @@ static Property vfio_pci_dev_properties[] = {
>  DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
>  VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
>  DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
> +DEFINE_PROP_BOOL("x-no-intx", VFIOPCIDevice, no_intx, false),
>  DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
>  DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
>  DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),

[Qemu-devel] [PULL 09/10] blockjob: reimplement block_job_sleep_ns to allow cancellation

2017-11-29 Thread Kevin Wolf

From: Paolo Bonzini 

This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
coroutine double entry or entry-after-completion", 2017-11-21)

This fixed the symptom of a bug rather than the root cause. Canceling the
wait on a sleeping blockjob coroutine is generally fine, we just need to
make it work correctly across AioContexts.  To do so, use a QEMUTimer
that calls block_job_enter.  Use a mutex to ensure that block_job_enter
synchronizes correctly with block_job_sleep_ns.

Signed-off-by: Paolo Bonzini 
Tested-By: Jeff Cody 
Reviewed-by: Fam Zheng 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Signed-off-by: Kevin Wolf 
---
 include/block/blockjob.h |  8 +-
 include/block/blockjob_int.h |  4 +--
 blockjob.c   | 63 
 3 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968fa5..00403d9482 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -77,7 +77,7 @@ typedef struct BlockJob {
 /**
  * Set to false by the job while the coroutine has yielded and may be
  * re-entered by block_job_enter().  There may still be I/O or event loop
- * activity pending.
+ * activity pending.  Accessed under block_job_mutex (in blockjob.c).
  */
 bool busy;
 
@@ -135,6 +135,12 @@ typedef struct BlockJob {
  */
 int ret;
 
+/**
+ * Timer that is used by @block_job_sleep_ns. Accessed under
+ * block_job_mutex (in blockjob.c).
+ */
+QEMUTimer sleep_timer;
+
 /** Non-NULL if this job is part of a transaction */
 BlockJobTxn *txn;
 QLIST_ENTRY(BlockJob) txn_list;
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f7ab183a39..c9b23b0cc9 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -142,8 +142,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
- * the wait, so the cancel will not process until the coroutine wakes up.
+ * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will immediately
+ * interrupt the wait.
  */
 void block_job_sleep_ns(BlockJob *job, int64_t ns);
 
diff --git a/blockjob.c b/blockjob.c
index 4d22b7d2fb..0ed50b953b 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -37,6 +37,26 @@
 #include "qemu/timer.h"
 #include "qapi-event.h"
 
+/* Right now, this mutex is only needed to synchronize accesses to job->busy
+ * and job->sleep_timer, such as concurrent calls to block_job_do_yield and
+ * block_job_enter. */
+static QemuMutex block_job_mutex;
+
+static void block_job_lock(void)
+{
+qemu_mutex_lock(&block_job_mutex);
+}
+
+static void block_job_unlock(void)
+{
+qemu_mutex_unlock(&block_job_mutex);
+}
+
+static void __attribute__((__constructor__)) block_job_init(void)
+{
+qemu_mutex_init(&block_job_mutex);
+}
+
 static void block_job_event_cancelled(BlockJob *job);
 static void block_job_event_completed(BlockJob *job, const char *msg);
 
@@ -161,6 +181,7 @@ void block_job_unref(BlockJob *job)
 blk_unref(job->blk);
 error_free(job->blocker);
 g_free(job->id);
+assert(!timer_pending(&job->sleep_timer));
 g_free(job);
 }
 }
@@ -287,6 +308,13 @@ static void coroutine_fn block_job_co_entry(void *opaque)
 job->driver->start(job);
 }
 
+static void block_job_sleep_timer_cb(void *opaque)
+{
+BlockJob *job = opaque;
+
+block_job_enter(job);
+}
+
 void block_job_start(BlockJob *job)
 {
 assert(job && !block_job_started(job) && job->paused &&
@@ -556,7 +584,7 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 info->type  = g_strdup(BlockJobType_str(job->driver->job_type));
 info->device= g_strdup(job->id);
 info->len   = job->len;
-info->busy  = job->busy;
+info->busy  = atomic_read(&job->busy);
 info->paused= job->pause_count > 0;
 info->offset= job->offset;
 info->speed = job->speed;
@@ -664,6 +692,9 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 job->paused= true;
 job->pause_count   = 1;
 job->refcnt= 1;
+aio_timer_init(qemu_get_aio_context(), &job->sleep_timer,
+   QEMU_CLOCK_REALTIME, SCALE_NS,
+   block_job_sleep_timer_cb, job);
 
 error_setg(&job->blocker, "block device is in use by block job: %s",
BlockJobType_str(driver->job_type));
@@ -729,9 +760,20 @@ static bool block_job_should_pause(BlockJob *job)
 return job->pause_count > 0;
 }
 
-static void block_job_do_yield(BlockJob *job)
+/* Yield, and schedule a timer to reenter the coroutine after @ns nanoseconds.
+ * Reentering the job coroutine with block_job_enter() before the timer has

[Qemu-devel] [PULL 06/10] block: Expect graph changes in bdrv_parent_drained_begin/end

2017-11-29 Thread Kevin Wolf

The .drained_begin/end callbacks can (directly or indirectly via
aio_poll()) cause block nodes to be removed or the current BdrvChild to
point to a different child node.

Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
BlockDriverStates or accidentally continue iterating the parents of the
new child node instead of the node we actually came from.

Signed-off-by: Kevin Wolf 
Tested-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Jeff Cody 
Reviewed-by: Alberto Garcia 
Reviewed-by: Fam Zheng 
Signed-off-by: Kevin Wolf 
---
 block/io.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4fdf93a014..6773926fc1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,9 +42,9 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
 void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
-BdrvChild *c;
+BdrvChild *c, *next;
 
-QLIST_FOREACH(c, &bs->parents, next_parent) {
+QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
 if (c->role->drained_begin) {
 c->role->drained_begin(c);
 }
@@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
 
 void bdrv_parent_drained_end(BlockDriverState *bs)
 {
-BdrvChild *c;
+BdrvChild *c, *next;
 
-QLIST_FOREACH(c, &bs->parents, next_parent) {
+QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
 if (c->role->drained_end) {
 c->role->drained_end(c);
 }
-- 
2.13.6

Re: [Qemu-devel] [Bug 645662] x87 fpu emulation not accurate enough

2017-11-29 Thread Thomas Huth

On 29.11.2017 16:20, KONRAD Frederic wrote:
> 
> 
> On 11/29/2017 01:51 PM, Peter Maydell wrote:
>> On 29 November 2017 at 12:47, KONRAD Frederic
>>  wrote:
>>> Maybe a little hack might work for x86 on x86 though.
>>> Something like hardcoding the helper with an inline assembly
>>> code?
>>
>> The set of people who want to emulate x86 on x86 is surely
>> even smaller than the already tiny set of people who want to
>> emulate x86 at all. I don't think it makes sense to add
>> inline assembly hacks for that: QEMU should (as far as
>> is reasonably possible) behave the same on all hosts.
>>
> 
> Sure it was a first step suggestion.

FYI, there is some code in WinUAE / Previous which might be usable, see
e.g. WinUAE sources here:

https://github.com/tonioni/WinUAE/blob/master/softfloat/softfloat_fpsp.cpp

But the code is targetted for m68k only so far, so it likely needs some
work, and we're currently discussing whether it's feasible to include it
into QEMU at all, see this thread here:

https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg05342.html

 Thomas

[Qemu-devel] [PULL 07/10] blockjob: remove clock argument from block_job_sleep_ns

2017-11-29 Thread Kevin Wolf

From: Paolo Bonzini 

All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.

Signed-off-by: Paolo Bonzini 
Reviewed-by: Alberto Garcia 
Tested-By: Jeff Cody 
Reviewed-by: Fam Zheng 
Reviewed-by: Jeff Cody 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 include/block/blockjob_int.h | 7 +++
 block/backup.c   | 4 ++--
 block/commit.c   | 2 +-
 block/mirror.c   | 6 +++---
 block/stream.c   | 2 +-
 blockjob.c   | 5 +++--
 tests/test-blockjob-txn.c| 2 +-
 7 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 43f3be2965..f7ab183a39 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -139,14 +139,13 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 /**
  * block_job_sleep_ns:
  * @job: The job that calls the function.
- * @clock: The clock to sleep on.
  * @ns: How many nanoseconds to stop for.
  *
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
- * nanoseconds.  Canceling the job will not interrupt the wait, so the
- * cancel will not process until the coroutine wakes up.
+ * %QEMU_CLOCK_REALTIME nanoseconds.  Canceling the job will not interrupt
+ * the wait, so the cancel will not process until the coroutine wakes up.
  */
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
+void block_job_sleep_ns(BlockJob *job, int64_t ns);
 
 /**
  * block_job_yield:
diff --git a/block/backup.c b/block/backup.c
index 06ddbfd03d..99e6bcc748 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -346,9 +346,9 @@ static bool coroutine_fn yield_and_check(BackupBlockJob 
*job)
 uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
   job->bytes_read);
 job->bytes_read = 0;
-block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
+block_job_sleep_ns(&job->common, delay_ns);
 } else {
-block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
+block_job_sleep_ns(&job->common, 0);
 }
 
 if (block_job_is_cancelled(&job->common)) {
diff --git a/block/commit.c b/block/commit.c
index 5036eec434..c5327551ce 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -174,7 +174,7 @@ static void coroutine_fn commit_run(void *opaque)
 /* Note that even when no rate limit is applied we need to yield
  * with no pending I/O here so that bdrv_drain_all() returns.
  */
-block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+block_job_sleep_ns(&s->common, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
diff --git a/block/mirror.c b/block/mirror.c
index 307b6391a8..c9badc1203 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -598,7 +598,7 @@ static void mirror_throttle(MirrorBlockJob *s)
 
 if (now - s->last_pause_ns > SLICE_TIME) {
 s->last_pause_ns = now;
-block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);
+block_job_sleep_ns(&s->common, 0);
 } else {
 block_job_pause_point(&s->common);
 }
@@ -870,13 +870,13 @@ static void coroutine_fn mirror_run(void *opaque)
 ret = 0;
 trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
 if (!s->synced) {
-block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+block_job_sleep_ns(&s->common, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
 } else if (!should_complete) {
 delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
-block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+block_job_sleep_ns(&s->common, delay_ns);
 }
 s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 }
diff --git a/block/stream.c b/block/stream.c
index e6f72346e5..499cdacdb0 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -141,7 +141,7 @@ static void coroutine_fn stream_run(void *opaque)
 /* Note that even when no rate limit is applied we need to yield
  * with no pending I/O here so that bdrv_drain_all() returns.
  */
-block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns);
+block_job_sleep_ns(&s->common, delay_ns);
 if (block_job_is_cancelled(&s->common)) {
 break;
 }
diff --git a/blockjob.c b/blockjob.c
index 2f0cc1528b..db9e4fc89a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -788,7 +788,7 @@ bool block_job_is_cancelled(BlockJob *job)
 return job->cancelled;
 }
 
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
+void block_job_sleep_ns(BlockJob *job, int64_t ns)
 {
 assert(job->busy);

Re: [Qemu-devel] [PATCH RESEND 0/2] x86_iommu: Fix segfault when starting on non-PCI machines

2017-11-29 Thread Michael S. Tsirkin

On Wed, Nov 29, 2017 at 08:42:17PM +0800, Peter Xu wrote:
> On Wed, Nov 29, 2017 at 01:33:11PM +0100, Mohammed Gamal wrote:
> > [Resending for the second time]
> > 
> > Starting qemu with
> > qemu-system-x86_64 -S -M isapc -device {amd|intel}-iommu
> > leads to a segfault. The code assume PCI bus is present and
> > tries to access the bus structure without checking.
> > 
> > The patch series moves the error checks from vtd_realize()
> > and amdvi_realize() to the generic x86_iommu_realize() and
> > adds a check for PCI bus presence.
> 
> Yes, IMHO this is ok even for 2.11.  Michael, what do you think?


I think so too.
> > 
> > 
> > Mohammed Gamal (2):
> >   x86_iommu: Move machine check to x86_iommu_realize()
> >   x86_iommu: check if machine has PCI bus
> > 
> >  hw/i386/amd_iommu.c   | 13 ++---
> >  hw/i386/intel_iommu.c | 13 ++---
> >  hw/i386/x86-iommu.c   | 13 +
> >  3 files changed, 17 insertions(+), 22 deletions(-)
> > 
> > -- 
> > 1.8.3.1
> > 
> 
> -- 
> Peter Xu

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 16:38:03 +0100
David Hildenbrand  wrote:

> On 29.11.2017 16:35, Cornelia Huck wrote:
> > On Wed, 29 Nov 2017 16:14:40 +0100
> > David Hildenbrand  wrote:
> >   
> >> On 29.11.2017 16:13, Cornelia Huck wrote:  
> >>> On Tue, 28 Nov 2017 21:33:14 +0100
> >>> David Hildenbrand  wrote:
> >>> 
>  TCG needs the retaddr when injecting an interrupt. Let's just pass it
>  along and use 0 for KVM. The value will be completely ignored for KVM.   
>   
> >>>
> >>> Can we get a #define for that? Just to make it clear at a glance that
> >>> we're passing an ignored value and not some magic thing.
> >>
> >> If you can come up with a good name, I can't :)  
> > 
> > KVM_FAKE_RA
> > KVM_RA_UNUSED
> > KVM_RA_IGNORED
> > 
> > ?
> >   
> 
> KVM_RA ? (to keep lines short?)
> 

That sounds too much like a magic value IMO (and not something that is
ignored).

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 16:14:40 +0100
David Hildenbrand  wrote:

> On 29.11.2017 16:13, Cornelia Huck wrote:
> > On Tue, 28 Nov 2017 21:33:14 +0100
> > David Hildenbrand  wrote:
> >   
> >> TCG needs the retaddr when injecting an interrupt. Let's just pass it
> >> along and use 0 for KVM. The value will be completely ignored for KVM.  
> > 
> > Can we get a #define for that? Just to make it clear at a glance that
> > we're passing an ignored value and not some magic thing.  
> 
> If you can come up with a good name, I can't :)

KVM_FAKE_RA
KVM_RA_UNUSED
KVM_RA_IGNORED

?

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:35, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:14:40 +0100
> David Hildenbrand  wrote:
> 
>> On 29.11.2017 16:13, Cornelia Huck wrote:
>>> On Tue, 28 Nov 2017 21:33:14 +0100
>>> David Hildenbrand  wrote:
>>>   
 TCG needs the retaddr when injecting an interrupt. Let's just pass it
 along and use 0 for KVM. The value will be completely ignored for KVM.  
>>>
>>> Can we get a #define for that? Just to make it clear at a glance that
>>> we're passing an ignored value and not some magic thing.  
>>
>> If you can come up with a good name, I can't :)
> 
> KVM_FAKE_RA
> KVM_RA_UNUSED
> KVM_RA_IGNORED
> 
> ?
> 

KVM_RA ? (to keep lines short?)

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH qemu] vfio-pci: Remove unused fields from VFIOMSIXInfo

2017-11-29 Thread Alex Williamson

On Wed, 22 Nov 2017 18:39:47 +1100
Alexey Kardashevskiy  wrote:

Missing reference to commit that made these fields unused.  A Fixes tag
to that commit also seems appropriate.  An empty commit log is pretty
much never justified.  This also looks like 2.12 material.  Thanks,

Alex

> Signed-off-by: Alexey Kardashevskiy 
> ---
>  hw/vfio/pci.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index c5e168e..06dcf99 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -93,8 +93,6 @@ typedef struct VFIOMSIXInfo {
>  uint16_t entries;
>  uint32_t table_offset;
>  uint32_t pba_offset;
> -MemoryRegion mmap_mem;
> -void *mmap;
>  unsigned long *pending;
>  } VFIOMSIXInfo;
>

Re: [Qemu-devel] [PATCH] vfio/common: init giommu_list and hostwin_list of vfio container

2017-11-29 Thread Alex Williamson

On Wed, 22 Nov 2017 15:58:02 +0800
"Liu, Yi L"  wrote:

> The init of giommu_list and hostwin_list is missed during container
> initialization.
> 
> Signed-off-by: Liu, Yi L 
> ---
>  hw/vfio/common.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 7b2924c..14c5940 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -990,6 +990,8 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>  container = g_malloc0(sizeof(*container));
>  container->space = space;
>  container->fd = fd;
> +QLIST_INIT(&container->giommu_list);
> +QLIST_INIT(&container->hostwin_list);

container is g_malloc0'd above and QLIST_INIT does:

#define QLIST_INIT(head) do {   \
(head)->lh_first = NULL;\
} while (/*CONSTCOND*/0)

So the only net change is the explicit initialization, which is a fair
fix, but given the current QLIST implementation is not actually a
bug.  Let's save it for after QEMU-2.11.  Thanks,

Alex

>  if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
>  ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
>  bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:39, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:38:03 +0100
> David Hildenbrand  wrote:
> 
>> On 29.11.2017 16:35, Cornelia Huck wrote:
>>> On Wed, 29 Nov 2017 16:14:40 +0100
>>> David Hildenbrand  wrote:
>>>   
 On 29.11.2017 16:13, Cornelia Huck wrote:  
> On Tue, 28 Nov 2017 21:33:14 +0100
> David Hildenbrand  wrote:
> 
>> TCG needs the retaddr when injecting an interrupt. Let's just pass it
>> along and use 0 for KVM. The value will be completely ignored for KVM.   
>>  
>
> Can we get a #define for that? Just to make it clear at a glance that
> we're passing an ignored value and not some magic thing.

 If you can come up with a good name, I can't :)  
>>>
>>> KVM_FAKE_RA
>>> KVM_RA_UNUSED
>>> KVM_RA_IGNORED
>>>
>>> ?
>>>   
>>
>> KVM_RA ? (to keep lines short?)
>>
> 
> That sounds too much like a magic value IMO (and not something that is
> ignored).
> 

Passing 0 will actually also ignore it for TCG. So it is actually a
magic value :) (just that more gets ignore din case of KVM ...)

RA_NONE ?


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 for-2.12 07/15] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-29 Thread Cornelia Huck

On Tue, 28 Nov 2017 21:33:17 +0100
David Hildenbrand  wrote:

> s390_cpu_virt_mem_rw() must always return, so callers can react on
> an exception (e.g. see ioinst_handle_stcrw()). For TCG, there was one
> case where a cpu loop exit was triggered. Fix that up.
> 
> However, for TCG we always have to exit the cpu loop (and restore the
> cpu state before that) if we injected a program interrupt. So let's
> introduce and use s390_cpu_virt_mem_handle_exc() in code that is not
> purely KVM.
> 
> Directly pass the retaddr we already have available in these functions.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/s390-pci-inst.c  |  7 +++
>  target/s390x/cpu.h|  1 +
>  target/s390x/ioinst.c | 21 ++---
>  target/s390x/mmu_helper.c | 19 ++-
>  4 files changed, 44 insertions(+), 4 deletions(-)
> 

> diff --git a/target/s390x/ioinst.c b/target/s390x/ioinst.c
> index 4b6d38f946..0851bf6fef 100644
> --- a/target/s390x/ioinst.c
> +++ b/target/s390x/ioinst.c

> @@ -243,6 +248,8 @@ void ioinst_handle_stsch(S390CPU *cpu, uint64_t reg1, 
> uint32_t ipb,
>   */
>  if (!s390_cpu_virt_mem_check_write(cpu, addr, ar, sizeof(schib))) {
>  program_interrupt_ra(env, PGM_OPERAND, 4, ra);
> +} else {
> +s390_cpu_virt_mem_handle_exc(cpu, ra);

This looks a bit odd if you don't realize that kvm already handled the
exception. But I don't really have a better idea.

>  }
>  return;
>  }

> @@ -645,9 +657,12 @@ void ioinst_handle_chsc(S390CPU *cpu, uint32_t ipb, 
> uintptr_t ra)
>  if (!s390_cpu_virt_mem_write(cpu, addr + len, reg, res,
>   be16_to_cpu(res->len))) {
>  setcc(cpu, 0);/* Command execution complete */
> +} else {
> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>  }
>  }
>  
> +

unrelated whitespace

>  #define SCHM_REG1_RES(_reg) (_reg & 0x0ffc)
>  #define SCHM_REG1_MBK(_reg) ((_reg & 0xf000) >> 28)
>  #define SCHM_REG1_UPD(_reg) ((_reg & 0x0002) >> 1)
> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> index 31e3f3f415..39da9aeef4 100644
> --- a/target/s390x/mmu_helper.c
> +++ b/target/s390x/mmu_helper.c
> @@ -22,6 +22,7 @@
>  #include "internal.h"
>  #include "kvm_s390x.h"
>  #include "sysemu/kvm.h"
> +#include "exec/exec-all.h"
>  #include "trace.h"
>  #include "hw/s390x/storage-keys.h"
>  
> @@ -458,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
> nr_pages,
>  }
>  if (!address_space_access_valid(&address_space_memory, pages[i],
>  TARGET_PAGE_SIZE, is_write)) {
> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);

Why did you change this?

>  return -EFAULT;
>  }
>  addr += TARGET_PAGE_SIZE;
> @@ -478,6 +479,9 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
> nr_pages,
>   *
>   * Copy from/to guest memory using logical addresses. Note that we inject a
>   * program interrupt in case there is an error while accessing the memory.
> + *
> + * This function will always return (also for TCG), make sure to call
> + * s390_cpu_virt_mem_handle_exc() to properly exit the CPU loop.
>   */
>  int s390_cpu_virt_mem_rw(S390CPU *cpu, vaddr laddr, uint8_t ar, void 
> *hostbuf,
>   int len, bool is_write)
> @@ -514,6 +518,19 @@ int s390_cpu_virt_mem_rw(S390CPU *cpu, vaddr laddr, 
> uint8_t ar, void *hostbuf,
>  return ret;
>  }
>  
> +void s390_cpu_virt_mem_handle_exc(S390CPU *cpu, uintptr_t ra)
> +{
> +/* KVM will handle the interrupt automatically, TCG has to exit the TB */
> +#ifdef CONFIG_TCG

Please move the #ifdef/#endif to the beginning of the line.

> +if (tcg_enabled()) {
> +if (ra) {
> +cpu_restore_state(CPU(cpu), ra);
> +}
> +cpu_loop_exit(CPU(cpu));
> +}
> +#endif
> +}
> +
>  /**
>   * Translate a real address into a physical (absolute) address.
>   * @param raddr  the real address

Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller

2017-11-29 Thread Greg Kurz

On Wed, 29 Nov 2017 14:46:56 +0100
Cédric Le Goater  wrote:

> On 11/29/2017 12:49 PM, Greg Kurz wrote:
> 
> > Cédric Le Goater  wrote:
> >   
> >> The XIVE interrupt controller uses a set of tables to redirect exception
> >> from event sources to CPU threads. The Interrupt Virtualization Entry (IVE)
> >> table, also known as Event Assignment Structure (EAS), is one them.
> >>
> >> The XIVE model is designed to make use of the full range of the IRQ
> >> number space and does not use an offset like the XICS mode does.
> >> Hence, the IVE table is directly indexed by the IRQ number.
> >>
> >> The IVE stores Event Queue data associated with a source. The lookups
> >> are performed when the source is configured or when an event is
> >> triggered.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  default-configs/ppc64-softmmu.mak |   1 +
> >>  hw/intc/Makefile.objs |   1 +
> >>  hw/intc/spapr_xive.c  | 165 
> >> ++
> >>  hw/intc/xive-internal.h   |  50 
> >>  include/hw/ppc/spapr_xive.h   |  44 ++
> >>  5 files changed, 261 insertions(+)
> >>  create mode 100644 hw/intc/spapr_xive.c
> >>  create mode 100644 hw/intc/xive-internal.h
> >>  create mode 100644 include/hw/ppc/spapr_xive.h
> >>
> >> diff --git a/default-configs/ppc64-softmmu.mak 
> >> b/default-configs/ppc64-softmmu.mak
> >> index d1b3a6dd50f8..4a7f6a0696de 100644
> >> --- a/default-configs/ppc64-softmmu.mak
> >> +++ b/default-configs/ppc64-softmmu.mak
> >> @@ -56,6 +56,7 @@ CONFIG_SM501=y
> >>  CONFIG_XICS=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
> >>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
> >> +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> >>  # For PReP
> >>  CONFIG_SERIAL_ISA=y
> >>  CONFIG_MC146818RTC=y
> >> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> >> index ae358569a155..49e13e7a 100644
> >> --- a/hw/intc/Makefile.objs
> >> +++ b/hw/intc/Makefile.objs
> >> @@ -35,6 +35,7 @@ obj-$(CONFIG_SH4) += sh_intc.o
> >>  obj-$(CONFIG_XICS) += xics.o
> >>  obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
> >>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> >> +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> >>  obj-$(CONFIG_POWERNV) += xics_pnv.o
> >>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
> >>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> new file mode 100644
> >> index ..b2fc3007c85f
> >> --- /dev/null
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -0,0 +1,165 @@
> >> +/*
> >> + * QEMU PowerPC sPAPR XIVE model
> >> + *
> >> + * Copyright (c) 2017, IBM Corporation.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License, version 2, as  
> > 
> > version 2 or (at your option) any later version.  
> 
> yep. I will shorten the headers at the same time.
> 

Yeah, I found a really short one from Peter in include/hw/arm/armv7m.h:

/*
 * ARMv7M CPU object
 *
 * Copyright (c) 2017 Linaro Ltd
 * Written by Peter Maydell 
 *
 * This code is licensed under the GPL version 2 or later.
 */

> >   
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program; if not, see .
> >> + */
> >> +#include "qemu/osdep.h"
> >> +#include "qemu/log.h"
> >> +#include "qapi/error.h"
> >> +#include "target/ppc/cpu.h"
> >> +#include "sysemu/cpus.h"
> >> +#include "sysemu/dma.h"
> >> +#include "monitor/monitor.h"
> >> +#include "hw/ppc/spapr_xive.h"
> >> +
> >> +#include "xive-internal.h"
> >> +
> >> +/*
> >> + * Main XIVE object
> >> + */
> >> +
> >> +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
> >> +{
> >> +int i;
> >> +
> >> +for (i = 0; i < xive->nr_irqs; i++) {
> >> +XiveIVE *ive = &xive->ivt[i];
> >> +
> >> +if (!(ive->w & IVE_VALID)) {
> >> +continue;
> >> +}
> >> +
> >> +monitor_printf(mon, "  %4x %s %08x %08x\n", i,
> >> +   ive->w & IVE_MASKED ? "M" : " ",
> >> +   (int) GETFIELD(IVE_EQ_INDEX, ive->w),
> >> +   (int) GETFIELD(IVE_EQ_DATA, ive->w));
> >> +}
> >> +}
> >> +
> >> +void spapr_xive_reset(void *dev)
> >> +{
> >> +sPAPRXive *xive = SPAPR_XIVE(dev);
> >> +int i;
> >> +
> >> +/* Mask all valid IVEs in the IRQ number space. */
> >> +for (i = 0; i < xive->nr_irqs; i++) {
> >> +XiveIVE *ive = &xive->ivt[i];
> >> +if (ive->w & IVE_VALID) {
> >> +ive->w |= IVE_MASKED;
> >> +}
>

Re: [Qemu-devel] [PATCH v1 for-2.12 13/15] s390x/tcg: use program_interrupt_ra() in STSI

2017-11-29 Thread Cornelia Huck

On Tue, 28 Nov 2017 21:33:23 +0100
David Hildenbrand  wrote:

> STSI needs some more love, but let's do one step at a time.

Out of curiousity: What else do you want to do?

> We can now drop potential_page_fault().
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/misc_helper.c | 2 +-
>  target/s390x/translate.c   | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)

Re: [Qemu-devel] [PATCH v1 for-2.12 14/15] s390x/tcg: drop program_interrupt()

2017-11-29 Thread Cornelia Huck

On Tue, 28 Nov 2017 21:33:24 +0100
David Hildenbrand  wrote:

> All users are gone, we can finally drop it and make sure that all new
> new program interrupt injections are reminded of the retaddr - as they

s/new new/new/

> have to use program_interrupt_ra() now.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/cpu.h   |  1 -
>  target/s390x/interrupt.c | 22 --
>  2 files changed, 8 insertions(+), 15 deletions(-)
>

Re: [Qemu-devel] [PATCH v1 for-2.12 07/15] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-29 Thread David Hildenbrand

> This looks a bit odd if you don't realize that kvm already handled the
> exception. But I don't really have a better idea.
> 
>>  }
>>  return;
>>  }
> 
>> @@ -645,9 +657,12 @@ void ioinst_handle_chsc(S390CPU *cpu, uint32_t ipb, 
>> uintptr_t ra)
>>  if (!s390_cpu_virt_mem_write(cpu, addr + len, reg, res,
>>   be16_to_cpu(res->len))) {
>>  setcc(cpu, 0);/* Command execution complete */
>> +} else {
>> +s390_cpu_virt_mem_handle_exc(cpu, ra);
>>  }
>>  }
>>  
>> +
> 
> unrelated whitespace

ack, will drop.

> 
>>  #define SCHM_REG1_RES(_reg) (_reg & 0x0ffc)
>>  #define SCHM_REG1_MBK(_reg) ((_reg & 0xf000) >> 28)
>>  #define SCHM_REG1_UPD(_reg) ((_reg & 0x0002) >> 1)
>> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
>> index 31e3f3f415..39da9aeef4 100644
>> --- a/target/s390x/mmu_helper.c
>> +++ b/target/s390x/mmu_helper.c
>> @@ -22,6 +22,7 @@
>>  #include "internal.h"
>>  #include "kvm_s390x.h"
>>  #include "sysemu/kvm.h"
>> +#include "exec/exec-all.h"
>>  #include "trace.h"
>>  #include "hw/s390x/storage-keys.h"
>>  
>> @@ -458,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
>> nr_pages,
>>  }
>>  if (!address_space_access_valid(&address_space_memory, pages[i],
>>  TARGET_PAGE_SIZE, is_write)) {
>> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
>> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
> 
> Why did you change this?

"For TCG, there was one case where a cpu loop exit was triggered. Fix
that up."

Wasn't worth a separate patch :)

> 
>>  return -EFAULT;
>>  }
>>  addr += TARGET_PAGE_SIZE;
>> @@ -478,6 +479,9 @@ static int translate_pages(S390CPU *cpu, vaddr addr, int 
>> nr_pages,
>>   *
>>   * Copy from/to guest memory using logical addresses. Note that we inject a
>>   * program interrupt in case there is an error while accessing the memory.
>> + *
>> + * This function will always return (also for TCG), make sure to call
>> + * s390_cpu_virt_mem_handle_exc() to properly exit the CPU loop.
>>   */
>>  int s390_cpu_virt_mem_rw(S390CPU *cpu, vaddr laddr, uint8_t ar, void 
>> *hostbuf,
>>   int len, bool is_write)
>> @@ -514,6 +518,19 @@ int s390_cpu_virt_mem_rw(S390CPU *cpu, vaddr laddr, 
>> uint8_t ar, void *hostbuf,
>>  return ret;
>>  }
>>  
>> +void s390_cpu_virt_mem_handle_exc(S390CPU *cpu, uintptr_t ra)
>> +{
>> +/* KVM will handle the interrupt automatically, TCG has to exit the TB 
>> */
>> +#ifdef CONFIG_TCG
> 
> Please move the #ifdef/#endif to the beginning of the line.

Oh, yes.

> 
>> +if (tcg_enabled()) {
>> +if (ra) {
>> +cpu_restore_state(CPU(cpu), ra);
>> +}
>> +cpu_loop_exit(CPU(cpu));
>> +}
>> +#endif
>> +}
>> +
>>  /**
>>   * Translate a real address into a physical (absolute) address.
>>   * @param raddr  the real address
> 


-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 for-2.12 13/15] s390x/tcg: use program_interrupt_ra() in STSI

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:53, Cornelia Huck wrote:
> On Tue, 28 Nov 2017 21:33:23 +0100
> David Hildenbrand  wrote:
> 
>> STSI needs some more love, but let's do one step at a time.
> 
> Out of curiousity: What else do you want to do?
> 

Will post the patch as soon as this series has been picked up.

Here is a preview:

commit a1a86f4db56e307d22918117515184fde540c14b
Author: David Hildenbrand 
Date:   Wed Nov 22 18:18:52 2017 +0100

s390x/tcg: fix size + content of STSI blocks

All blocks are 4k in size, which is only true for one of them right now.
Also some reserved fields were wrong, fix it and convert all reserved
fields to speficy u8.

This also fixes the LPAR part output in /proc/sysinfo under LPAR.

Signed-off-by: David Hildenbrand 

commit 41192e74109532d4c106c7b1a6bcf7f989b2f4fd
Author: David Hildenbrand 
Date:   Wed Nov 22 18:21:14 2017 +0100

s390x/tcg: STSI overhaul

Current STSI implementation is a mess, so let's rewrite it.

Problems fixed by this patch:
1) The order of exceptions/when recognized is wrong.
2) We have to store to virtual address space, not absolute.
3) Alignment check of the block is missing.
3) The SMP information is not indicated.

While at it:
a) Make the code look nicer
- get rid of nesting levels
- use struct initialization instead of initializing to zero
- rename a misspelled field and rename function code defines
- use a union and have only one write statement
- use cpu_to_beX()
b) Indicate the VM name/extended name + UUID just like KVM does
c) Indicate that all LPAR CPUs we fake are dedicated
d) Add a comment why we fake being a KVM guest
e) Give our guest as default the name "TCGguest"
f) Fake the same CPU information we have in our Guest for all layers

While at it, get rid of "potential_page_fault()" by forwarding the
retaddr properly.

The result is best verified by looking at "/proc/sysinfo" in the guest
when specifying on the qemu command line
-uuid "74738ff5-5367-5958-9aee-98fffdcd1876" \
-name "extra long guest name"

Signed-off-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 16:45:21 +0100
David Hildenbrand  wrote:

> On 29.11.2017 16:39, Cornelia Huck wrote:
> > On Wed, 29 Nov 2017 16:38:03 +0100
> > David Hildenbrand  wrote:
> >   
> >> On 29.11.2017 16:35, Cornelia Huck wrote:  
> >>> On Wed, 29 Nov 2017 16:14:40 +0100
> >>> David Hildenbrand  wrote:
> >>> 
>  On 29.11.2017 16:13, Cornelia Huck wrote:
> > On Tue, 28 Nov 2017 21:33:14 +0100
> > David Hildenbrand  wrote:
> >   
> >> TCG needs the retaddr when injecting an interrupt. Let's just pass it
> >> along and use 0 for KVM. The value will be completely ignored for KVM. 
> >>  
> >
> > Can we get a #define for that? Just to make it clear at a glance that
> > we're passing an ignored value and not some magic thing.  
> 
>  If you can come up with a good name, I can't :)
> >>>
> >>> KVM_FAKE_RA
> >>> KVM_RA_UNUSED
> >>> KVM_RA_IGNORED
> >>>
> >>> ?
> >>> 
> >>
> >> KVM_RA ? (to keep lines short?)
> >>  
> > 
> > That sounds too much like a magic value IMO (and not something that is
> > ignored).
> >   
> 
> Passing 0 will actually also ignore it for TCG. So it is actually a
> magic value :) (just that more gets ignore din case of KVM ...)
> 
> RA_NONE ?
> 
> 

RA_IGNORED?

Re: [Qemu-devel] [PATCH v1 for-2.12 07/15] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 16:54:30 +0100
David Hildenbrand  wrote:


> >> diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
> >> index 31e3f3f415..39da9aeef4 100644
> >> --- a/target/s390x/mmu_helper.c
> >> +++ b/target/s390x/mmu_helper.c
> >> @@ -22,6 +22,7 @@
> >>  #include "internal.h"
> >>  #include "kvm_s390x.h"
> >>  #include "sysemu/kvm.h"
> >> +#include "exec/exec-all.h"
> >>  #include "trace.h"
> >>  #include "hw/s390x/storage-keys.h"
> >>  
> >> @@ -458,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, 
> >> int nr_pages,
> >>  }
> >>  if (!address_space_access_valid(&address_space_memory, pages[i],
> >>  TARGET_PAGE_SIZE, is_write)) {
> >> -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
> >> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);  
> > 
> > Why did you change this?  
> 
> "For TCG, there was one case where a cpu loop exit was triggered. Fix
> that up."
> 
> Wasn't worth a separate patch :)

Well, maybe it is, to avoid dumb questions :)

> 
> >   
> >>  return -EFAULT;
> >>  }
> >>  addr += TARGET_PAGE_SIZE;

[Qemu-devel] [PATCH for-2.12] iotests: Make 200 run on tmpfs

2017-11-29 Thread Max Reitz

200 currently fails on tmpfs because it sets cache=none.  However,
without that (and aio=native), the test still works now and it fails
before Jeff's series (on fc7dbc119e0852a70dc9fa68bb41a318e49e4cd6).  So
we can probably remove it safely.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/200 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
index d8787ddb46..0e814c9b6c 100755
--- a/tests/qemu-iotests/200
+++ b/tests/qemu-iotests/200
@@ -60,7 +60,7 @@ qemu_comm_method="qmp"
 _launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
  -object iothread,id=iothread0 \
  -device 
virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
- -drive 
file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT
 \
+ -drive 
file="${TEST_IMG}",media=disk,if=none,id=drive_sysdisk,format=$IMGFMT \
  -device 
scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
 h1=$QEMU_HANDLE
 
-- 
2.13.6

Re: [Qemu-devel] [PATCH for-2.12] iotests: Make 200 run on tmpfs

2017-11-29 Thread Jeff Cody

On Wed, Nov 29, 2017 at 04:59:42PM +0100, Max Reitz wrote:
> 200 currently fails on tmpfs because it sets cache=none.  However,
> without that (and aio=native), the test still works now and it fails
> before Jeff's series (on fc7dbc119e0852a70dc9fa68bb41a318e49e4cd6).  So
> we can probably remove it safely.
> 
> Signed-off-by: Max Reitz 

Reviewed-by: Jeff Cody 

> ---
>  tests/qemu-iotests/200 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/200 b/tests/qemu-iotests/200
> index d8787ddb46..0e814c9b6c 100755
> --- a/tests/qemu-iotests/200
> +++ b/tests/qemu-iotests/200
> @@ -60,7 +60,7 @@ qemu_comm_method="qmp"
>  _launch_qemu -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
>   -object iothread,id=iothread0 \
>   -device 
> virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0,iothread=iothread0 \
> - -drive 
> file="${TEST_IMG}",media=disk,if=none,cache=none,id=drive_sysdisk,aio=native,format=$IMGFMT
>  \
> + -drive 
> file="${TEST_IMG}",media=disk,if=none,id=drive_sysdisk,format=$IMGFMT \
>   -device 
> scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0
>  h1=$QEMU_HANDLE
>  
> -- 
> 2.13.6
>

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:57, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:45:21 +0100
> David Hildenbrand  wrote:
> 
>> On 29.11.2017 16:39, Cornelia Huck wrote:
>>> On Wed, 29 Nov 2017 16:38:03 +0100
>>> David Hildenbrand  wrote:
>>>   
 On 29.11.2017 16:35, Cornelia Huck wrote:  
> On Wed, 29 Nov 2017 16:14:40 +0100
> David Hildenbrand  wrote:
> 
>> On 29.11.2017 16:13, Cornelia Huck wrote:
>>> On Tue, 28 Nov 2017 21:33:14 +0100
>>> David Hildenbrand  wrote:
>>>   
 TCG needs the retaddr when injecting an interrupt. Let's just pass it
 along and use 0 for KVM. The value will be completely ignored for KVM. 
  
>>>
>>> Can we get a #define for that? Just to make it clear at a glance that
>>> we're passing an ignored value and not some magic thing.  
>>
>> If you can come up with a good name, I can't :)
>
> KVM_FAKE_RA
> KVM_RA_UNUSED
> KVM_RA_IGNORED
>
> ?
> 

 KVM_RA ? (to keep lines short?)
  
>>>
>>> That sounds too much like a magic value IMO (and not something that is
>>> ignored).
>>>   
>>
>> Passing 0 will actually also ignore it for TCG. So it is actually a
>> magic value :) (just that more gets ignore din case of KVM ...)
>>
>> RA_NONE ?
>>
>>
> 
> RA_IGNORED?
> 

jap, will use that.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [qemu-web PATCH] Mention website maintainence under Contribute

2017-11-29 Thread Paolo Bonzini

On 29/11/2017 16:31, Eric Blake wrote:
> Web (and other doc) updates are also valid contributions.
> 
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Eric Blake 
> ---
>  contribute.md | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/contribute.md b/contribute.md
> index bf4a55d..88baee6 100644
> --- a/contribute.md
> +++ b/contribute.md
> @@ -12,4 +12,6 @@ permalink: /contribute/
>  * Chat with the developers on IRC: irc.oftc.net, channel #qemu
> 
>  * Read developer documentation: “[Getting started for 
> developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers)”,
> -  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”, 
> “[How to submit a 
> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”
> +  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”,
> +  “[How to submit a 
> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”,
> +  “[Improve the 
> website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)”
> 

Applied, thanks.

Paolo

Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Eric Blake


On 11/27/2017 10:46 PM, linzhecheng wrote:

If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may get a segfault 
in a low probability.





The root cause of this problem is a bug of glibc(version 2.17,the latest 
version has the same bug),
let's see what happened in glibc's code.


Have you reported this bug to the glibc folks, and if so, can we include 
a URL to the glibc bugzilla?


Working around the glibc bug is nice, but glibc should really be fixed 
so that other projects do not have to continue working around it.





QEMU get a segfault at line 50, becasue pd is an invalid address.
pd is still valid at line 38 when set pd->joinid = pd, at this moment,
created thread is just exiting(only keeps runing for a short time),


s/runing/running/

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 01/17] block/vmdk: Fix , instead of ; at end of line

2017-11-29 Thread Eric Blake


On 11/22/2017 08:08 PM, Max Reitz wrote:

Signed-off-by: Max Reitz 
---
  block/vmdk.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


No semantic difference as far as I can tell, but avoiding needless use 
of the comma operator is always a win in my book.


Reviewed-by: Eric Blake 



diff --git a/block/vmdk.c b/block/vmdk.c
index c665bcc977..1ae47b1c2e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1398,7 +1398,7 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t 
cluster_offset,
  qemu_iovec_concat(&local_qiov, qiov, qiov_offset, n_bytes);
  }
  
-write_offset = cluster_offset + offset_in_cluster,

+write_offset = cluster_offset + offset_in_cluster;
  ret = bdrv_co_pwritev(extent->file, write_offset, n_bytes,
&local_qiov, 0);
  



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [qemu-web PATCH] Mention website maintainence under Contribute

2017-11-29 Thread Peter Maydell

On 29 November 2017 at 15:31, Eric Blake  wrote:
> Web (and other doc) updates are also valid contributions.
>
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Eric Blake 
> ---
>  contribute.md | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/contribute.md b/contribute.md
> index bf4a55d..88baee6 100644
> --- a/contribute.md
> +++ b/contribute.md
> @@ -12,4 +12,6 @@ permalink: /contribute/
>  * Chat with the developers on IRC: irc.oftc.net, channel #qemu
>
>  * Read developer documentation: “[Getting started for 
> developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers)”,
> -  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”, 
> “[How to submit a 
> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”
> +  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”,
> +  “[How to submit a 
> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”,
> +  “[Improve the 
> website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)”

Ah, I was vaguely thinking about this the other day; that we should
improve https://wiki.qemu.org/Contribute so that it puts contributions
to documentation, the website, etc on a clearer footing that's more
on par with "contribute code" (possibly by hiving off the "writing
code" bits to a separate section).

We now seem to have two landing pages for "start contributing",
though -- one on the main website and one on the wiki.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 10/25] spapr: add MMIO handlers for the XIVE interrupt sources

2017-11-29 Thread Cédric Le Goater

On 11/29/2017 02:56 PM, Cédric Le Goater wrote:
> +switch (offset) {
> +case 0:
> +spapr_xive_source_eoi(xive, lisn);

 Hrm.  I don't love that you're dealing with clearing that LSI bit
 here, but setting it at a different level.

 The state machines are doing my head in a bit, is there any way
 you could derive the STATUS_SENT bit from the PQ bits?
>>>
>>> Yes. I should. 
>>>
>>> I am also lacking a guest driver to exercise these LSIs so I didn't
>>> pay a lot of attention to level interrupts. Any idea ?
>>
>> How about an old-school emulated PCI device?  Maybe rtl8139?
> 
> Perfect. The current model is working but I will see how I can 
> improve it to use the PQ bits instead.

Using the PQ bits is simplifying the model but we still have to 
maintain an array to store the IRQ type. 

There are 3 unused bits in the IVE descriptor, bits[1-3]:  

  #define IVE_VALID   PPC_BIT(0)
  #define IVE_EQ_BLOCKPPC_BITMASK(4, 7)/* Destination EQ block# */
  #define IVE_EQ_INDEXPPC_BITMASK(8, 31)   /* Destination EQ index */
  #define IVE_MASKED  PPC_BIT(32)  /* Masked */
  #define IVE_EQ_DATA PPC_BITMASK(33, 63)  /* Data written to the EQ */

We could hijack one of them to store the LSI type and get rid of 
the type array. Would you object to that ? 

C.

Re: [Qemu-devel] [PATCH 02/17] qcow2: No persistent dirty bitmaps for compat=0.10

2017-11-29 Thread Eric Blake


On 11/22/2017 08:08 PM, Max Reitz wrote:

Persistent dirty bitmaps require a properly functioning
autoclear_features field, or we cannot track when an unsupporting
program might overwrite them.  Therefore, we cannot support them for
compat=0.10 images.

Signed-off-by: Max Reitz 
---
  block/qcow2-bitmap.c | 10 ++
  block/qcow2.c| 14 +++---
  2 files changed, 21 insertions(+), 3 deletions(-)


Missed Kevin's latest -rc3 pull request; oh well.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [RFC PATCH 0/1] ppc: loadvm corrupts excp_prefix

2017-11-29 Thread Kurban Mallachiev

On processors which don't support MSR_EP bit, loadvm command set exception 
prefix to an incorrect value and so guest OS freezes.

In cpu_post_load() there is:
/* Invalidate all msr bits except MSR_TGPR/MSR_HVB before restoring */
msr = env->msr;
env->msr ^= ~((1ULL << MSR_TGPR) | MSR_HVB);
ppc_store_msr(env, msr);

While hreg_store_msr() (called by ppc_store_msr) contains:
value &= env->msr_mask;
...
if (unlikely((value >> MSR_EP) & 1) != msr_ep) {
/* Change the exception prefix on PowerPC 601 */
...

where msr_ep is ((env->msr >> MSR_EP) & 1).

If MSR_EP bit in msr_mask is zero, then MSR_EP in 'value' bit is zero, and 
MSR_EP bit in env->msr is 1. Condition '(value >> MSR_EP) & 1) != msr_ep' is 
true and so qemu changes exception prefix.

AFAIU we should multiply env->msr by msr_mask, but I am not sure where we 
should do it: inside hreg_store_msr or outside. This is why this patch is RFC.

Current version of the patch adds msr_mask multiplication before the 
hreg_store_msr call.

Kurban

Kurban Mallachiev (1):
  target-ppc: Don't invalidate non-supported msr bits

 target/ppc/machine.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.15.0

[Qemu-devel] [RFC PATCH 1/1] target-ppc: Don't invalidate non-supported msr bits

2017-11-29 Thread Kurban Mallachiev

The msr invalidation code (commits 993eb and 2360b) inverts all
bits except MSR_TGPR and MSR_HVB. On non PowerPC 601 processors
this leads to incorrect change of excp_prefix in hreg_store_msr()
function. The problem is that new msr value get multiplied by msr_mask
and inverted msr does not, thus values of MSR_EP bit in new msr value
and inverted msr are distinct, so that excp_prefix changes but should
not.

Signed-off-by: Kurban Mallachiev 
---
 target/ppc/machine.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index 384caee800..96113ee881 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -298,9 +298,9 @@ static int cpu_post_load(void *opaque, int version_id)
 ppc_store_sdr1(env, env->spr[SPR_SDR1]);
 }
 
-/* Invalidate all msr bits except MSR_TGPR/MSR_HVB before restoring */
+/* Invalidate all supported msr bits except MSR_TGPR/MSR_HVB before 
restoring */
 msr = env->msr;
-env->msr ^= ~((1ULL << MSR_TGPR) | MSR_HVB);
+env->msr ^= env->msr_mask & ~((1ULL << MSR_TGPR) | MSR_HVB);
 ppc_store_msr(env, msr);
 
 hreg_compute_mem_idx(env);
-- 
2.15.0

Re: [Qemu-devel] [PATCH v1 for-2.12 07/15] s390x: handle exceptions during s390_cpu_virt_mem_rw() correctly (TCG)

2017-11-29 Thread David Hildenbrand

On 29.11.2017 16:59, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:54:30 +0100
> David Hildenbrand  wrote:
> 
> 
 diff --git a/target/s390x/mmu_helper.c b/target/s390x/mmu_helper.c
 index 31e3f3f415..39da9aeef4 100644
 --- a/target/s390x/mmu_helper.c
 +++ b/target/s390x/mmu_helper.c
 @@ -22,6 +22,7 @@
  #include "internal.h"
  #include "kvm_s390x.h"
  #include "sysemu/kvm.h"
 +#include "exec/exec-all.h"
  #include "trace.h"
  #include "hw/s390x/storage-keys.h"
  
 @@ -458,7 +459,7 @@ static int translate_pages(S390CPU *cpu, vaddr addr, 
 int nr_pages,
  }
  if (!address_space_access_valid(&address_space_memory, pages[i],
  TARGET_PAGE_SIZE, is_write)) {
 -program_interrupt(env, PGM_ADDRESSING, ILEN_AUTO);
 +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);  
>>>
>>> Why did you change this?  
>>
>> "For TCG, there was one case where a cpu loop exit was triggered. Fix
>> that up."
>>
>> Wasn't worth a separate patch :)
> 
> Well, maybe it is, to avoid dumb questions :)

... splitting it up :) Thanks!

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids

2017-11-29 Thread Halil Pasic



On 11/29/2017 01:37 PM, Cornelia Huck wrote:
> On Tue, 28 Nov 2017 14:07:58 +0100
> Halil Pasic  wrote:
> 
>> The default css 0xfe is currently restricted to virtual subchannel
>> devices. The hope when the decision was made was, that non-virtual
>> subchannel devices will come around when guest can exploit multiple
> 
> s/guest/guests/

OK.

> 
>> channel subsystems. Since the guests generally don't do, the pain
> 
> s/the guests generally don't do/current guests don't do that/
> 

OK.

>> of the partitioned (cssid) namespace outweighs the gain.
>>
>> Let us remove the corresponding restrictions (virtual devices
>> can be put only in 0xfe and non-virtual devices in any css except
>> the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).
>>
>> With this change, however, our schema for generating a css bus ids, if
>> none is specified does not make much sense. Currently we start at cssid
>> 0x0 for non-virtual devices and use the default css (without
>> s390-squash-mcss exclusively) for virtual devices.  That means for
>> non-virtual device the device would auto-magically end up non-visible for
>> guests (which can't use the other channel subsystems).
>>
>> Thus let us also change the css bus id auto assignment algorithm,
>> so that we first fill the default css, and then proceed to the
>> next one (modulo MAX_CSSID).
> 
> Let's reword the previous two paragraphs:
> 
> "At the same time, change our schema for generating css bus ids to put
> both virtual and non-virtual devices into the default css (spilling
> over into other css images, if needed) so that devices without a
> specified id don't end up hidden to guests not supporting multiple
> channel subsystems."
> 

What I don't like about your explanation is, that "so that devices without
a specified id don't end up hidden to guests not supporting multiple channel 
subsystems" is not necessarily true: if we spill the devices are going
to end up hidden.

>>
>> The adverse effect of getting rid of the restriction on migration should
>> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
>> virtual devices using the extra freedom would only make sense with the
>> aforementioned guest support in place.
>>
>> The auto-generated bus ids are affected by both changes. We hope to not
>> encounter any auto-generated bus ids in production as Libvirt is always
>> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
>> mismatch on load", 2017-05-18) the worst that can happen because the same
>> device ended up having a different bus id is a cleanly failed migration.
>> I find it hard to reason about the impact of changed auto-generated bus
>> ids on migration for command line users as I don't know which rules is
>> such an user supposed to follow.
>>
>> Another pain-point is down- or upgrade of QEMU for command line users.
>> The old way and the new way of doing vfio-ccw are mutually incompatible.
>> Libvirt is only going to support the new way, so for libvirt users, the
>> possible problems at QEMU downgrade are the following. If a domain
>> contains virtual devices placed into a css different than 0xfe the domain
>> wil refuse to start with a QEMU not having this patch. Putting devices
>> into a css different that 0xfe however won't make much sense in the
>> near future (guest support). Libvirt will refuse to do vfio-ccw with
>> a QEMU not having this patch. This is business as usual.
> 
> All of this is valuable information, but I don't think a changelog is
> the right place for it. We should really have a place where we can also
> save things like Dong Jia's writeup downthread. In the documentation
> folder or on the QEMU wiki (or both). We can be much more verbose there
> (including examples, which make this stuff way easier to understand).
> I'd recommend adding a documentation file with this patch (or patch
> series, as I'd also like to see a squash-mcss deprecation patch).
> 

I tired to be quite elaborate, because at some point of the v1
discussion, you said if we are planting landmines you want them explained
in the commit message. I'm not sure how this document file is supposed
to be called, and look like. I think this stuff is relevant for
the decision why is this patch a good one, and what are the trade-offs
we make. Referencing to a document would be also OK with me.

Regarding the deprecation patch. It's already on the list as RFC. You
have even commented on it. I intend to make a v2 once we know what are
we going to do here.

>>
>> Signed-off-by: Halil Pasic 
>>
>> ---
>> Hi all!
>>
>> Moving the property to the machine turned out very ugly (IMHO).  Libvirt
>> detects machine properties via query-command-line-options.  This is
>> however decoupled from the existence of the actual property: one needs to
>> touch util/qemu-config.c (see patch) so the property shows up.
>> Furthermore this stuff is global. I've also noticed that the infamous
>> s390-squash-mcss does not show up.
> 
> s390x-softmmu

Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Gonglei (Arei)



> -Original Message-
> From: Eric Blake [mailto:ebl...@redhat.com]
> Sent: Thursday, November 30, 2017 12:19 AM
> To: linzhecheng; qemu-devel@nongnu.org
> Cc: aligu...@us.ibm.com; f...@redhat.com; wangxin (U); Gonglei (Arei);
> pbonz...@redhat.com
> Subject: Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from
> creating thread to created thread
> 
> On 11/27/2017 10:46 PM, linzhecheng wrote:
> > If we create a thread with QEMU_THREAD_DETACHED mode, QEMU may
> get a segfault in a low probability.
> >
> 
> >
> > The root cause of this problem is a bug of glibc(version 2.17,the latest 
> > version
> has the same bug),
> > let's see what happened in glibc's code.
> 
> Have you reported this bug to the glibc folks, and if so, can we include
> a URL to the glibc bugzilla?
> 
No, we didn't do that yet. :(


> Working around the glibc bug is nice, but glibc should really be fixed
> so that other projects do not have to continue working around it.
> 
> 
Yes, agree.


Regards,
-Gonglei

> >
> > QEMU get a segfault at line 50, becasue pd is an invalid address.
> > pd is still valid at line 38 when set pd->joinid = pd, at this moment,
> > created thread is just exiting(only keeps runing for a short time),
> 
> s/runing/running/
> 
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids

2017-11-29 Thread Halil Pasic



On 11/29/2017 12:47 PM, Cornelia Huck wrote:
> On Wed, 29 Nov 2017 16:17:35 +0800
> Dong Jia Shi  wrote:
> 
>> * Halil Pasic  [2017-11-28 14:07:58 +0100]:
>>
>> [...]
>>> The auto-generated bus ids are affected by both changes. We hope to not
>>> encounter any auto-generated bus ids in production as Libvirt is always
>>> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
>>> mismatch on load", 2017-05-18) the worst that can happen because the same
>>> device ended up having a different bus id is a cleanly failed migration.
>>> I find it hard to reason about the impact of changed auto-generated bus
>>> ids on migration for command line users as I don't know which rules is
>>> such an user supposed to follow.  
>> For this paragraph, Halil pointed to me a case that he is thinking of.
>> 1. VM configuration with 3 devices:
>>   -device virtio (e.g. virtio-blk-ccw,id=disk0)
>>   -device vfio-ccw (e.g. id=vfio0)
>>   -device virtio (e.g. virtio-rng-ccw,id=rng0)
>> 2. Start the vm.
>> 3. device_del vfio0
>> 4. migrate "exec:gzip -c > /tmp/tmp_vmstate.gz"
>> 5. modify cmd line from step 1 by removing the vfio0 device, and adding:
>>-incoming "exec:gzip -c -d /tmp/tmp_vmstate.gz"
>>
>> Let me list my test results here for everybody's reference.
>>
>> W/o this patch
>> ==
>>
>> +---+-
>> | squashing off | squashing on
>> +---+-
>> auto id |F  | F
>> +---+-
>> explicit id |F  | S
>> +---+-
>>
>> T1. squashing off + auto id
>>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>>   qemu-system-s390x: Failed to load s390_css:css
>>   qemu-system-s390x: error while loading state for instance 0x0 of device 
>> 's390_css'
>>   qemu-system-s390x: load of migration failed: Invalid argument
>> [Fail due to css mismatch - there is no css 0 in the new vm.]
>>
>> T2. squashing off + explicit given id
>>   qemu-system-s390x: vmstate: get_nullptr expected VMS_NULLPTR_MARKER
>>   qemu-system-s390x: Failed to load s390_css:css
>>   qemu-system-s390x: error while loading state for instance 0x0 of device 
>> 's390_css'
>>   qemu-system-s390x: load of migration failed: Invalid argument
>> [Fail due to css mismatch - there is no css 0 in the new vm.]
> Hmm... so should we even try to migrate an empty css 0? It only exists
> because we have created a device that we had to detach anyway because
> it was non-migrateable...
> 
> [Probably no easy way to deal with this, though.]
> 

We could make the thing go away when the last device is gone.
I see a general problem with implicitly generated shared stuff.

Obviously we can't fix the past.

@Dong Jia:

Thanks for doing the experiments and publishing your findings.

Halil

Re: [Qemu-devel] [qemu-web PATCH] Mention website maintainence under Contribute

2017-11-29 Thread Paolo Bonzini

On 29/11/2017 17:22, Peter Maydell wrote:
> On 29 November 2017 at 15:31, Eric Blake  wrote:
>> Web (and other doc) updates are also valid contributions.
>>
>> Suggested-by: Paolo Bonzini 
>> Signed-off-by: Eric Blake 
>> ---
>>  contribute.md | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/contribute.md b/contribute.md
>> index bf4a55d..88baee6 100644
>> --- a/contribute.md
>> +++ b/contribute.md
>> @@ -12,4 +12,6 @@ permalink: /contribute/
>>  * Chat with the developers on IRC: irc.oftc.net, channel #qemu
>>
>>  * Read developer documentation: “[Getting started for 
>> developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers)”,
>> -  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”, 
>> “[How to submit a 
>> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”
>> +  “[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ)”,
>> +  “[How to submit a 
>> patch](https://wiki.qemu.org/Contribute/SubmitAPatch)”,
>> +  “[Improve the 
>> website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)”
> 
> Ah, I was vaguely thinking about this the other day; that we should
> improve https://wiki.qemu.org/Contribute so that it puts contributions
> to documentation, the website, etc on a clearer footing that's more
> on par with "contribute code" (possibly by hiving off the "writing
> code" bits to a separate section).
> 
> We now seem to have two landing pages for "start contributing",
> though -- one on the main website and one on the wiki.

Apart from the infrastructure part, everything else in the wiki page
probably should be moved to the website.

Then, the wiki "Contribute" can be moved to "Contribute/Infrastructure"
(and the link in the side bar changed from "Start Here" to "Project
Infrastructure").

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Paolo Bonzini

On 29/11/2017 17:28, Gonglei (Arei) wrote:
>>> The root cause of this problem is a bug of glibc(version 2.17,the latest 
>>> version
>> has the same bug),
>>> let's see what happened in glibc's code.
>> Have you reported this bug to the glibc folks, and if so, can we include
>> a URL to the glibc bugzilla?
>>
> No, we didn't do that yet. :(

It's here:

https://sourceware.org/bugzilla/show_bug.cgi?id=19951.

I've added a note to the commit message.

Paolo

Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from creating thread to created thread

2017-11-29 Thread Gonglei (Arei)


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Thursday, November 30, 2017 12:39 AM
> To: Gonglei (Arei); Eric Blake; linzhecheng; qemu-devel@nongnu.org
> Cc: f...@redhat.com; wangxin (U)
> Subject: Re: [Qemu-devel] [PATCH v4] thread: move detach_thread from
> creating thread to created thread
> 
> On 29/11/2017 17:28, Gonglei (Arei) wrote:
> >>> The root cause of this problem is a bug of glibc(version 2.17,the latest
> version
> >> has the same bug),
> >>> let's see what happened in glibc's code.
> >> Have you reported this bug to the glibc folks, and if so, can we include
> >> a URL to the glibc bugzilla?
> >>
> > No, we didn't do that yet. :(
> 
> It's here:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=19951.
> 
> I've added a note to the commit message.
> 
Nice~ :)

Thanks,
-Gonglei

Re: [Qemu-devel] [Qemu-ppc] [PATCH 08/25] spapr: introduce a skeleton for the XIVE interrupt controller

2017-11-29 Thread Cédric Le Goater

 +static const VMStateDescription vmstate_spapr_xive = {
 +.name = TYPE_SPAPR_XIVE,
 +.version_id = 1,
 +.minimum_version_id = 1,
 +.needed = vmstate_spapr_xive_needed,
 +.fields = (VMStateField[]) {
 +VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
 +VMSTATE_STRUCT_VARRAY_UINT32_ALLOC(ivt, sPAPRXive, nr_irqs, 1,
 +   vmstate_spapr_xive_ive, 
 XiveIVE),  
>>>
>>> Hmm... this array is allocated at realize and this will cause
>>> the migration code to re-allocate it again with the same size,
>>> and leak memory IIUC.  
>>
>> I thought so but something was going wrong on the receive side (memory 
>> corruption detected by valgrind). I did not find why yet.
>>
> 
> Have you tried VMSTATE_STRUCT_VARRAY_POINTER_UINT32() ?

yes. tcg/intel only though.

C.

Re: [Qemu-devel] [PATCH 5/6] migration: Now set the migration uri

2017-11-29 Thread Juan Quintela

"Daniel P. Berrange"  wrote:
> On Wed, Nov 22, 2017 at 12:54:58PM +, Daniel P. Berrange wrote:
>> On Wed, Nov 22, 2017 at 01:29:57PM +0100, Juan Quintela wrote:
>> > "Daniel P. Berrange"  wrote:
>> > > On Mon, Oct 30, 2017 at 12:21:11PM +0100, Juan Quintela wrote:

> > > This is bad as it is throwing away data that the original URI
>> > > had. In particular
>> > > you loose the 'ipv4=on|off' and 'ipv6=on|off' flags. If you need to keep 
>> > > the
>> > > original URI for later, then why not just keep the 'host_port' parameter 
>> > > that
>> > > was passed into this function instead of trying to reverse
>> > > engineeer the URI ?
>> > 
>> > I don't need the original uri anymore, this is the incoming side of
>> > migration, and we can only set that once, if migration fails, we need to
>> > restart qemu anymore.
>> > 
>> > I changed it to this:
>> > 
>> > new_uri = g_strdup_printf("tcp:%s:%s%s%s", 
>> > address->u.inet.host,
>> >   address->u.inet.port,
>> >   iaddr->has_ipv4 ? ",ipv4" : "",
>> >   iaddr->has_ipv6 ? ",ipv6" : "");
>> > 
>> > 
>> > Clearly, we don't care about the to= parameter.
>> > 
>> > The whole point of this exercise is that in destination, we use
>> > 
>> > -incoming tcp:0:0
>> > 
>> > and with the output of info migrate_parameters (uri) we are able to
>> > migrate to that place.  For rest of migration posibilities, there is no
>> > changes at all.
>> 
>> That doesn't work typically. When the incoming QEMU listens for connections,
>> by default it will listen on 0.0.0.0, or ::, so that's what the address
>> you're creating will contain.  You can't use 0.0.0.0 or :: in a connect()
>> call on the other side as that's a wildcard address that refers to "any 
>> valid IP address on the current host".
>> 
>> When you connect to the listening QEMU you need the actual IP address
>> of one (of the possibly many) NICs on the target host.  IOW, the only
>> time the listen address is useful is if you have told QEMU to listen on
>> a specific NIC's IP address, instead of the wildcard address.
>> 
>> This is the core reason why libvirt calls 'gethostname()' on the target
>> host to identify the primary IP address of the host, and uses that on the
>> source host to establish the connection.
>
> I should have said the port number info will still be useful (if you're
> using the dynamic port allocation), it is just the IP address that's not
> usable in general.

I gave up O:-)

I introduced tcp_port parameter, that is really what I wanted to know.
The test use case (the one that I am interested right now) is that I
do:

qemu-system-x86_64 . --incomping tcp:127.0.0.1:0

And I want to know the port that the destination gets to connect to it.
For migration, it don't really matters if we _also_ set the
address/ip/whatever it gets translated to, because it is not possible
right now to restart the migration incoming side (you need to restart
qemu for long and boring historic reasons).

So, I ended just adding:

+#
+# @tcp-port: Only used for tcp, to know what is the real port
+# (Since 2.12)
 #

And all the boilerplate code to access/setup this one.

BTW, I am not sure how  well the current code will work with a range of
ports, because we don't have a way to tell the source which one the
kernel/qemu decided to use O:-)

Later, Juan.

[Qemu-devel] [Bug 645662] Re: QEMU x87 emulation of trig and other complex ops is only at 64-bit precision, not 80-bit

2017-11-29 Thread Arno Wagner

That explains it. For most operations that approach works well as
basically nobody uses the 80 bit formats directly anyways. Unfortunately
asinh() is very badly conditioned in the region tested and it is not
enough.

A possible approach to fix this would be to use long double (128 bit)
were available instead of just double and to document this limitation
for double.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/645662

Title:
  QEMU x87 emulation of trig and other complex ops is only at 64-bit
  precision, not 80-bit

Status in QEMU:
  Confirmed

Bug description:
  When doing the regression tests for Python 3.1.2 with Qemu 0.12.5, (Linux 
version 2.6.26-2-686 (Debian 2.6.26-25lenny1)),
  gcc (Debian 4.3.2-1.1) 4.3.2, Python compiled from sources within qemu,
  3 math tests fail, apparently because the floating point unit is buggy. Qmeu 
was compiled from original sources
  on Debian Lenny with kernel  2.6.34.6 from kernel.org, gcc  (Debian 
4.3.2-1.1) 4.3. 

  Regression testing errors:

  test_cmath
  test test_cmath failed -- Traceback (most recent call last):
File "/root/tools/python3/Python-3.1.2/Lib/test/test_cmath.py", line 364, in
  self.fail(error_message)
  AssertionError: acos0034: acos(complex(-1.0002, 0.0))
  Expected: complex(3.141592653589793, -2.1073424255447014e-08)
  Received: complex(3.141592653589793, -2.1073424338879928e-08)
  Received value insufficiently close to expected value.

  
  test_float
  test test_float failed -- Traceback (most recent call last):
File "/root/tools/python3/Python-3.1.2/Lib/test/test_float.py", line 479, in
  self.assertEqual(s, repr(float(s)))
  AssertionError: '8.72293771110361e+25' != '8.722937711103609e+25'

  
  test_math
  test test_math failed -- multiple errors occurred; run in verbose mode for 
deta

  =>

  runtests.sh -v test_math

  le01:~/tools/python3/Python-3.1.2# ./runtests.sh -v test_math
  test_math BAD
   1 BAD
   0 GOOD
   0 SKIPPED
   1 total
  le01:~/tools/python3/Python-3.1.2#

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/645662/+subscriptions

Re: [Qemu-devel] [PATCH 5/6] migration: Now set the migration uri

2017-11-29 Thread Daniel P. Berrange

On Wed, Nov 29, 2017 at 05:43:35PM +0100, Juan Quintela wrote:
> "Daniel P. Berrange"  wrote:
> > On Wed, Nov 22, 2017 at 12:54:58PM +, Daniel P. Berrange wrote:
> >> On Wed, Nov 22, 2017 at 01:29:57PM +0100, Juan Quintela wrote:
> >> > "Daniel P. Berrange"  wrote:
> >> > > On Mon, Oct 30, 2017 at 12:21:11PM +0100, Juan Quintela wrote:
> 
> > > > This is bad as it is throwing away data that the original URI
> >> > > had. In particular
> >> > > you loose the 'ipv4=on|off' and 'ipv6=on|off' flags. If you need to 
> >> > > keep the
> >> > > original URI for later, then why not just keep the 'host_port' 
> >> > > parameter that
> >> > > was passed into this function instead of trying to reverse
> >> > > engineeer the URI ?
> >> > 
> >> > I don't need the original uri anymore, this is the incoming side of
> >> > migration, and we can only set that once, if migration fails, we need to
> >> > restart qemu anymore.
> >> > 
> >> > I changed it to this:
> >> > 
> >> > new_uri = g_strdup_printf("tcp:%s:%s%s%s", 
> >> > address->u.inet.host,
> >> >   address->u.inet.port,
> >> >   iaddr->has_ipv4 ? ",ipv4" : "",
> >> >   iaddr->has_ipv6 ? ",ipv6" : "");
> >> > 
> >> > 
> >> > Clearly, we don't care about the to= parameter.
> >> > 
> >> > The whole point of this exercise is that in destination, we use
> >> > 
> >> > -incoming tcp:0:0
> >> > 
> >> > and with the output of info migrate_parameters (uri) we are able to
> >> > migrate to that place.  For rest of migration posibilities, there is no
> >> > changes at all.
> >> 
> >> That doesn't work typically. When the incoming QEMU listens for 
> >> connections,
> >> by default it will listen on 0.0.0.0, or ::, so that's what the address
> >> you're creating will contain.  You can't use 0.0.0.0 or :: in a connect()
> >> call on the other side as that's a wildcard address that refers to "any 
> >> valid IP address on the current host".
> >> 
> >> When you connect to the listening QEMU you need the actual IP address
> >> of one (of the possibly many) NICs on the target host.  IOW, the only
> >> time the listen address is useful is if you have told QEMU to listen on
> >> a specific NIC's IP address, instead of the wildcard address.
> >> 
> >> This is the core reason why libvirt calls 'gethostname()' on the target
> >> host to identify the primary IP address of the host, and uses that on the
> >> source host to establish the connection.
> >
> > I should have said the port number info will still be useful (if you're
> > using the dynamic port allocation), it is just the IP address that's not
> > usable in general.
> 
> I gave up O:-)
> 
> I introduced tcp_port parameter, that is really what I wanted to know.
> The test use case (the one that I am interested right now) is that I
> do:
> 
> qemu-system-x86_64 . --incomping tcp:127.0.0.1:0
> 
> And I want to know the port that the destination gets to connect to it.
> For migration, it don't really matters if we _also_ set the
> address/ip/whatever it gets translated to, because it is not possible
> right now to restart the migration incoming side (you need to restart
> qemu for long and boring historic reasons).
> 
> So, I ended just adding:
> 
> +#
> +# @tcp-port: Only used for tcp, to know what is the real port
> +# (Since 2.12)
>  #
> 
> And all the boilerplate code to access/setup this one.
> 
> BTW, I am not sure how  well the current code will work with a range of
> ports, because we don't have a way to tell the source which one the
> kernel/qemu decided to use O:-)

Ultimately I think we need to be able to report a full list of
SocketAddress structs representing the listening addresses, because
we will eventually be listening on multiple distinct sockets. Currently
if a hostname resolves to multiple IP addrs, we only listen on the first
successful IP, but I have patches that fix this so we listen on multiple
IPs

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

2017-11-29 Thread Eduardo Habkost

On Wed, Nov 29, 2017 at 04:35:25PM +0300, Roman Kagan wrote:
> > On 2017/11/29 18:41, Eduardo Habkost wrote:
[...]
> > > IMO, the long term solution is to make Linux guests not misbehave
> > > when we stop lying about the L3 cache.  Maybe we could provide a
> > > "IPIs are expensive, please avoid them" hint in the KVM CPUID
> > > leaf?
> 
> We already have it, it's the hypervisor bit ;)  Seriously, I'm unaware
> of hypervisors where IPIs aren't expensive.

Sounds good enough to me, if we can convince the Linux kernel
maintainers that it should avoid IPIs under all hypervisors.

-- 
Eduardo

Re: [Qemu-devel] [PATCH] i386: turn off l3-cache property by default

2017-11-29 Thread Paolo Bonzini

On 29/11/2017 14:35, Roman Kagan wrote:
>>
>>> IMO, the long term solution is to make Linux guests not misbehave
>>> when we stop lying about the L3 cache.  Maybe we could provide a
>>> "IPIs are expensive, please avoid them" hint in the KVM CPUID
>>> leaf?
> We already have it, it's the hypervisor bit ;)  Seriously, I'm unaware
> of hypervisors where IPIs aren't expensive.
> 

In theory, AMD's AVIC should optimize IPIs to running vCPUs.  Amazon's
recently posted patches to disable HLT and MWAIT exits might tilt the
balance in favor of IPIs even for Intel APICv (where sending the IPI is
expensive, but receiving it isn't).

Being able to tie this to Amazon's other proposal, the "DEDICATED" CPUID
bit, would be nice.  My plan was to disable all three of MWAIT/HLT/PAUSE
when setting the dedicated bit.

Paolo

Re: [Qemu-devel] [PATCH v1 for-2.12 01/15] s390x/tcg: introduce and use program_interrupt_ra()

2017-11-29 Thread Richard Henderson

On 11/28/2017 08:33 PM, David Hildenbrand wrote:
> +S390CPU *cpu = s390_env_get_cpu(env);
> +
> +#ifdef CONFIG_TCG
> +if (tcg_enabled() && ra) {
> +cpu_restore_state(CPU(cpu), ra);
> +}
> +#endif

FWIW, I have a patch queued for 2.12 that removes the RA != 0 check protecting
calls to cpu_restore_state.  That condition is already handled inside and we
don't need to duplicate it.

Otherwise,
Reviewed-by: Richard Henderson 

r~

Re: [Qemu-devel] [PATCH v1 for-2.12 02/15] s390x/tcg: get rid of runtime_exception()

2017-11-29 Thread Richard Henderson

On 11/28/2017 08:33 PM, David Hildenbrand wrote:
> Let's use program_interrupt_ra() instead.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/fpu_helper.c  |  2 +-
>  target/s390x/int_helper.c  | 14 +++---
>  target/s390x/internal.h|  2 --
>  target/s390x/misc_helper.c | 16 
>  4 files changed, 8 insertions(+), 26 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end

2017-11-29 Thread Kevin Wolf

Am 29.11.2017 um 15:49 hat Fam Zheng geschrieben:
> While we look at the fixes for 2.11, I briefly prototyped this series
> to see if it makes sense as a simplification of the drain API for
> 2.12.
> 
> The idea is to let AioContext manage quiesce callbacks, then the block
> layer only needs to do the in-flight request waiting. This lets us get
> rid of the callback recursion (both up and down).

So essentially you don't drain individual nodes any more, but whole
AioContexts. I have a feeeling that this would be a step in the wrong
direction.

Not only would it completely bypass the path I/O requests take and
potentially drain a lot more than is actually necessary, but it also
requires that all nodes that are connected in a tree are in the same
AioContext.

Paolo can say more on this, but my understanding was that the long-term
plan is to make the block layer fully thread safe so that any thread
could call things on any node. I remember Paolo saying that AioContext
was even fully going away in the future. I don't see how this is
compatible with your approach.

And finally, I don't really think that the recursion is even a problem.
The problem is with graph changes made by callbacks that drain allows to
run. With your changes, it might be a bit easier to avoid that
bdrv_drain() itself gets into trouble due to graph changes, but this
doesn't solve the problem for any (possibly indirect) callers of
bdrv_drain().

Kevin

Re: [Qemu-devel] [PATCH v1 for-2.12 03/15] s390x/tcg: rip out dead tpi code

2017-11-29 Thread Richard Henderson

On 11/28/2017 08:33 PM, David Hildenbrand wrote:
> It is broken and not even wired up. We'll add a new handler soon, but
> that will live somewhere else.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/css.c  |  6 --
>  include/hw/s390x/css.h  |  1 -
>  target/s390x/internal.h |  1 -
>  target/s390x/ioinst.c   | 26 --
>  4 files changed, 34 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC PATCH v2 1/1] s390x/css: unrestrict cssids

2017-11-29 Thread Cornelia Huck

On Wed, 29 Nov 2017 17:25:59 +0100
Halil Pasic  wrote:

> On 11/29/2017 01:37 PM, Cornelia Huck wrote:
> > On Tue, 28 Nov 2017 14:07:58 +0100
> > Halil Pasic  wrote:
> >   
> >> The default css 0xfe is currently restricted to virtual subchannel
> >> devices. The hope when the decision was made was, that non-virtual
> >> subchannel devices will come around when guest can exploit multiple  
> > 
> > s/guest/guests/  
> 
> OK.
> 
> >   
> >> channel subsystems. Since the guests generally don't do, the pain  
> > 
> > s/the guests generally don't do/current guests don't do that/
> >   
> 
> OK.
> 
> >> of the partitioned (cssid) namespace outweighs the gain.
> >>
> >> Let us remove the corresponding restrictions (virtual devices
> >> can be put only in 0xfe and non-virtual devices in any css except
> >> the 0xfe -- while s390-squash-mcss then remaps everything to cssid 0).
> >>
> >> With this change, however, our schema for generating a css bus ids, if
> >> none is specified does not make much sense. Currently we start at cssid
> >> 0x0 for non-virtual devices and use the default css (without
> >> s390-squash-mcss exclusively) for virtual devices.  That means for
> >> non-virtual device the device would auto-magically end up non-visible for
> >> guests (which can't use the other channel subsystems).
> >>
> >> Thus let us also change the css bus id auto assignment algorithm,
> >> so that we first fill the default css, and then proceed to the
> >> next one (modulo MAX_CSSID).  
> > 
> > Let's reword the previous two paragraphs:
> > 
> > "At the same time, change our schema for generating css bus ids to put
> > both virtual and non-virtual devices into the default css (spilling
> > over into other css images, if needed) so that devices without a
> > specified id don't end up hidden to guests not supporting multiple
> > channel subsystems."
> >   
> 
> What I don't like about your explanation is, that "so that devices without
> a specified id don't end up hidden to guests not supporting multiple channel 
> subsystems" is not necessarily true: if we spill the devices are going
> to end up hidden.

Let's make this "don't always end up hidden".

> 
> >>
> >> The adverse effect of getting rid of the restriction on migration should
> >> not be too severe.  Vfio-ccw devices are not live-migratable yet, and for
> >> virtual devices using the extra freedom would only make sense with the
> >> aforementioned guest support in place.
> >>
> >> The auto-generated bus ids are affected by both changes. We hope to not
> >> encounter any auto-generated bus ids in production as Libvirt is always
> >> explicit about the bus id.  Since 8ed179c937 ("s390x/css: catch section
> >> mismatch on load", 2017-05-18) the worst that can happen because the same
> >> device ended up having a different bus id is a cleanly failed migration.
> >> I find it hard to reason about the impact of changed auto-generated bus
> >> ids on migration for command line users as I don't know which rules is
> >> such an user supposed to follow.
> >>
> >> Another pain-point is down- or upgrade of QEMU for command line users.
> >> The old way and the new way of doing vfio-ccw are mutually incompatible.
> >> Libvirt is only going to support the new way, so for libvirt users, the
> >> possible problems at QEMU downgrade are the following. If a domain
> >> contains virtual devices placed into a css different than 0xfe the domain
> >> wil refuse to start with a QEMU not having this patch. Putting devices
> >> into a css different that 0xfe however won't make much sense in the
> >> near future (guest support). Libvirt will refuse to do vfio-ccw with
> >> a QEMU not having this patch. This is business as usual.  
> > 
> > All of this is valuable information, but I don't think a changelog is
> > the right place for it. We should really have a place where we can also
> > save things like Dong Jia's writeup downthread. In the documentation
> > folder or on the QEMU wiki (or both). We can be much more verbose there
> > (including examples, which make this stuff way easier to understand).
> > I'd recommend adding a documentation file with this patch (or patch
> > series, as I'd also like to see a squash-mcss deprecation patch).
> >   
> 
> I tired to be quite elaborate, because at some point of the v1
> discussion, you said if we are planting landmines you want them explained
> in the commit message. I'm not sure how this document file is supposed
> to be called, and look like. I think this stuff is relevant for
> the decision why is this patch a good one, and what are the trade-offs
> we make. Referencing to a document would be also OK with me.

I don't think there will be landmines left, no? Or do I misread?

> 
> Regarding the deprecation patch. It's already on the list as RFC. You
> have even commented on it. I intend to make a v2 once we know what are
> we going to do here.

This needs to be a patch series with a cover letter. Discussing in
multiple places is draining.

>

Re: [Qemu-devel] [PATCH v1 for-2.12 04/15] s390x/ioinst: pass the retaddr to all IO instructions

2017-11-29 Thread Richard Henderson

On 11/28/2017 08:33 PM, David Hildenbrand wrote:
> TCG needs the retaddr when injecting an interrupt. Let's just pass it
> along and use 0 for KVM. The value will be completely ignored for KVM.
> 
> Convert program_interrupt() to program_interrupt_ra() directly, making
> use of the passed address.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  target/s390x/internal.h| 29 +++-
>  target/s390x/ioinst.c  | 67 
> +++---
>  target/s390x/kvm.c | 26 +-
>  target/s390x/misc_helper.c | 20 +++---
>  4 files changed, 73 insertions(+), 69 deletions(-)

Reviewed-by: Richard Henderson 

... with or without RA_IGNORED.  ;-)

r~

Re: [Qemu-devel] [PATCH v1 for-2.12 05/15] s390x/pci: pass the retaddr to all PCI instructions

2017-11-29 Thread Richard Henderson

On 11/28/2017 08:33 PM, David Hildenbrand wrote:
> Once we wire up TCG, we will need the retaddr to correctly inject
> program interrupts. As we want to get rid of the function
> program_interrupt(), convert PCI code too.
> 
> For KVM, we can simply pass a 0.
> 
> Convert program_interrupt() to program_interrupt_ra() directly, making
> use of the passed address.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/s390x/s390-pci-inst.c | 83 
> +---
>  hw/s390x/s390-pci-inst.h | 16 ++
>  target/s390x/kvm.c   | 14 
>  3 files changed, 59 insertions(+), 54 deletions(-)

Reviewed-by: Richard Henderson 

with or without RA_IGNORED.


r~

1 2 3 4 >

1 - 100 of 320 matches

Mail list logo