Re: [Qemu-devel] [RESEND PATCH 2/2] hostmem-file: add an attribute 'align' to set its alignment

2017-05-26 Thread Marc-André Lureau
Hi

On Fri, May 26, 2017 at 10:51 AM Haozhong Zhang 
wrote:

> On 05/26/17 06:39 +, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, May 26, 2017 at 6:34 AM Haozhong Zhang  >
> > wrote:
> >
> > > file_ram_alloc() currently maps the backend file via mmap to a virtual
> > > address aligned to the value returned by qemu_fd_getpagesize(). When a
> > > DAX device (e.g. /dev/dax0.0) is used as the backend file, its kernel
> > > mmap implementation may require an alignment larger than what
> > > qemu_fd_get_pagesize() returns (e.g. 2MB vs. 4KB), and mmap may fail.
> > >
> >
> > How is the accepted value queried? Any chance to configure it
> > automatically?
>
> Take /dev/dax0.0 for example. The value can be read from
> /sys/class/dax/dax0.0/device/dax_region/align.
>

Should this work be left to management layer, or could qemu figure it out
by itself (using udev?)


> [..]
> > > +static void
> > >  file_backend_class_init(ObjectClass *oc, void *data)
> > >  {
> > >  HostMemoryBackendClass *bc = MEMORY_BACKEND_CLASS(oc);
> > > @@ -116,6 +151,10 @@ file_backend_class_init(ObjectClass *oc, void
> *data)
> > >  object_class_property_add_str(oc, "mem-path",
> > >  get_mem_path, set_mem_path,
> > >  &error_abort);
> > > +object_class_property_add(oc, "align", "int",
> > >
> >
> > It would make sense to make it "size" instead of "int"
> >
>
> will change in the next version
>

thanks
-- 
Marc-André Lureau


[Qemu-devel] [PULL 03/22] docker: Add libaio to fedora image

2017-05-26 Thread Fam Zheng
Signed-off-by: Fam Zheng 
Message-Id: <20170505032340.26467-5-f...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/fedora.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/fedora.docker 
b/tests/docker/dockerfiles/fedora.docker
index 39f6b58..4eaa8ed 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -2,7 +2,7 @@ FROM fedora:latest
 ENV PACKAGES \
 ccache git tar PyYAML sparse flex bison python2 bzip2 hostname \
 glib2-devel pixman-devel zlib-devel SDL-devel libfdt-devel \
-gcc gcc-c++ clang make perl which bc findutils \
+gcc gcc-c++ clang make perl which bc findutils libaio-devel \
 mingw32-pixman mingw32-glib2 mingw32-gmp mingw32-SDL mingw32-pkg-config \
 mingw32-gtk2 mingw32-gtk3 mingw32-gnutls mingw32-nettle mingw32-libtasn1 \
 mingw32-libjpeg-turbo mingw32-libpng mingw32-curl mingw32-libssh2 \
-- 
2.9.4




[Qemu-devel] [PULL 02/22] docker: Add bzip2 and hostname to fedora image

2017-05-26 Thread Fam Zheng
It is used by qemu-iotests.

Signed-off-by: Fam Zheng 
Message-Id: <20170505032340.26467-3-f...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/fedora.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/fedora.docker 
b/tests/docker/dockerfiles/fedora.docker
index c4f80ad..39f6b58 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -1,6 +1,6 @@
 FROM fedora:latest
 ENV PACKAGES \
-ccache git tar PyYAML sparse flex bison python2 \
+ccache git tar PyYAML sparse flex bison python2 bzip2 hostname \
 glib2-devel pixman-devel zlib-devel SDL-devel libfdt-devel \
 gcc gcc-c++ clang make perl which bc findutils \
 mingw32-pixman mingw32-glib2 mingw32-gmp mingw32-SDL mingw32-pkg-config \
-- 
2.9.4




[Qemu-devel] [PULL 00/22] Docker and block patches

2017-05-26 Thread Fam Zheng
The following changes since commit 9964e96dccf7f7c936ee854a795415d19b60:

  Merge remote-tracking branch 'jasowang/tags/net-pull-request' into staging 
(2017-05-23 15:01:31 +0100)

are available in the git repository at:

  git://github.com/famz/qemu.git tags/docker-and-block-pull-request

for you to fetch changes up to 77269bba94ef97de99ae61fdc98629a8704ae2ed:

  block: make accounting thread-safe (2017-05-26 09:25:30 +0800)



For Paolo's block layer thread safety part I and my docker testing
enhancements.



Fam Zheng (4):
  docker: Run tests with current user
  docker: Add bzip2 and hostname to fedora image
  docker: Add libaio to fedora image
  docker: Add flex and bison to centos6 image

Paolo Bonzini (18):
  block: access copy_on_read with atomic ops
  block: access quiesce_counter with atomic ops
  block: access io_limits_disabled with atomic ops
  block: access serialising_in_flight with atomic ops
  block: access wakeup with atomic ops
  block: access io_plugged with atomic ops
  throttle-groups: only start one coroutine from drained_begin
  throttle-groups: do not use qemu_co_enter_next
  throttle-groups: protect throttled requests with a CoMutex
  util: add stats64 module
  block: use Stat64 for wr_highest_offset
  block: access write_gen with atomics
  block: protect tracked_requests and flush_queue with reqs_lock
  block: introduce dirty_bitmap_mutex
  migration/block: reset dirty bitmap before reading
  block: protect modification of dirty bitmaps with a mutex
  block: introduce block_account_one_io
  block: make accounting thread-safe

 block.c |   9 +-
 block/accounting.c  |  65 ++-
 block/block-backend.c   |   5 +-
 block/dirty-bitmap.c| 114 +--
 block/io.c  |  51 +
 block/mirror.c  |  14 ++-
 block/nfs.c |   4 +-
 block/qapi.c|   2 +-
 block/sheepdog.c|   3 +-
 block/throttle-groups.c |  91 +++
 blockdev.c  |  46 ++--
 include/block/accounting.h  |   8 +-
 include/block/block.h   |   5 +-
 include/block/block_int.h   |  65 +++
 include/block/dirty-bitmap.h|  25 +++--
 include/qemu/stats64.h  | 193 
 include/sysemu/block-backend.h  |  10 +-
 migration/block.c   |  17 ++-
 tests/docker/Makefile.include   |   2 +-
 tests/docker/dockerfiles/centos6.docker |   2 +-
 tests/docker/dockerfiles/fedora.docker  |   4 +-
 util/Makefile.objs  |   1 +
 util/stats64.c  | 136 ++
 23 files changed, 687 insertions(+), 185 deletions(-)
 create mode 100644 include/qemu/stats64.h
 create mode 100644 util/stats64.c

-- 
2.9.4




[Qemu-devel] [PULL 05/22] block: access copy_on_read with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-2-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block.c   |  6 --
 block/io.c|  8 
 blockdev.c|  2 +-
 include/block/block_int.h | 11 ++-
 4 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 50ba264..f6066ef 100644
--- a/block.c
+++ b/block.c
@@ -1260,7 +1260,9 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 goto fail_opts;
 }
 
-assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
+/* bdrv_new() and bdrv_close() make it so */
+assert(atomic_read(&bs->copy_on_read) == 0);
+
 if (bs->open_flags & BDRV_O_COPY_ON_READ) {
 if (!bs->read_only) {
 bdrv_enable_copy_on_read(bs);
@@ -3023,7 +3025,7 @@ static void bdrv_close(BlockDriverState *bs)
 
 g_free(bs->opaque);
 bs->opaque = NULL;
-bs->copy_on_read = 0;
+atomic_set(&bs->copy_on_read, 0);
 bs->backing_file[0] = '\0';
 bs->backing_format[0] = '\0';
 bs->total_sectors = 0;
diff --git a/block/io.c b/block/io.c
index fdd7485..80a3a8c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -129,13 +129,13 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error 
**errp)
  */
 void bdrv_enable_copy_on_read(BlockDriverState *bs)
 {
-bs->copy_on_read++;
+atomic_inc(&bs->copy_on_read);
 }
 
 void bdrv_disable_copy_on_read(BlockDriverState *bs)
 {
-assert(bs->copy_on_read > 0);
-bs->copy_on_read--;
+int old = atomic_fetch_dec(&bs->copy_on_read);
+assert(old >= 1);
 }
 
 /* Check if any requests are in-flight (including throttled requests) */
@@ -1157,7 +1157,7 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
 bdrv_inc_in_flight(bs);
 
 /* Don't do copy-on-read if we read data before write operation */
-if (bs->copy_on_read && !(flags & BDRV_REQ_NO_SERIALISING)) {
+if (atomic_read(&bs->copy_on_read) && !(flags & BDRV_REQ_NO_SERIALISING)) {
 flags |= BDRV_REQ_COPY_ON_READ;
 }
 
diff --git a/blockdev.c b/blockdev.c
index c63f4e8..a2fa9f7 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1791,7 +1791,7 @@ static void external_snapshot_commit(BlkActionState 
*common)
 /* We don't need (or want) to use the transactional
  * bdrv_reopen_multiple() across all the entries at once, because we
  * don't want to abort all of them if one of them fails the reopen */
-if (!state->old_bs->copy_on_read) {
+if (!atomic_read(&state->old_bs->copy_on_read)) {
 bdrv_reopen(state->old_bs, state->old_bs->open_flags & ~BDRV_O_RDWR,
 NULL);
 }
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8d3724c..c71492a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -595,11 +595,6 @@ struct BlockDriverState {
 
 /* Protected by AioContext lock */
 
-/* If true, copy read backing sectors into image.  Can be >1 if more
- * than one client has requested copy-on-read.
- */
-int copy_on_read;
-
 /* If we are reading a disk image, give its size in sectors.
  * Generally read-only; it is written to by load_vmstate and save_vmstate,
  * but the block layer is quiescent during those.
@@ -633,6 +628,12 @@ struct BlockDriverState {
 
 QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
 
+/* If true, copy read backing sectors into image.  Can be >1 if more
+ * than one client has requested copy-on-read.  Accessed with atomic
+ * ops.
+ */
+int copy_on_read;
+
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
-- 
2.9.4




[Qemu-devel] [PULL 01/22] docker: Run tests with current user

2017-05-26 Thread Fam Zheng
We've used --add-current-user to create a user in the image, use it to
run tests, because root has too much priviledge, and can surprise test
cases.

Signed-off-by: Fam Zheng 
Message-Id: <20170505032340.26467-2-f...@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
---
 tests/docker/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 03eda37..0ed8c3d 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -126,7 +126,7 @@ docker-run: docker-qemu-src
"  COPYING $(EXECUTABLE) to $(IMAGE)"))
$(call quiet-command,   \
$(SRC_PATH)/tests/docker/docker.py run  \
-   -t  \
+   $(if $(NOUSER),,-u $(shell id -u)) -t   \
$(if $V,,--rm)  \
$(if $(DEBUG),-i,--net=none)\
-e TARGET_LIST=$(TARGET_LIST)   \
-- 
2.9.4




[Qemu-devel] [PULL 12/22] throttle-groups: do not use qemu_co_enter_next

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Prepare for removing this function; always restart throttled requests
from coroutine context.  This will matter when restarting throttled
requests will have to acquire a CoMutex.

Reviewed-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-9-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/throttle-groups.c | 42 +-
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 85169ec..8bf1031 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -260,6 +260,20 @@ static bool throttle_group_schedule_timer(BlockBackend 
*blk, bool is_write)
 return must_wait;
 }
 
+/* Start the next pending I/O request for a BlockBackend.  Return whether
+ * any request was actually pending.
+ *
+ * @blk:   the current BlockBackend
+ * @is_write:  the type of operation (read/write)
+ */
+static bool coroutine_fn throttle_group_co_restart_queue(BlockBackend *blk,
+ bool is_write)
+{
+BlockBackendPublic *blkp = blk_get_public(blk);
+
+return qemu_co_queue_next(&blkp->throttled_reqs[is_write]);
+}
+
 /* Look for the next pending I/O request and schedule it.
  *
  * This assumes that tg->lock is held.
@@ -287,7 +301,7 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
 if (!must_wait) {
 /* Give preference to requests from the current blk */
 if (qemu_in_coroutine() &&
-qemu_co_queue_next(&blkp->throttled_reqs[is_write])) {
+throttle_group_co_restart_queue(blk, is_write)) {
 token = blk;
 } else {
 ThrottleTimers *tt = &blk_get_public(token)->throttle_timers;
@@ -340,15 +354,21 @@ void coroutine_fn 
throttle_group_co_io_limits_intercept(BlockBackend *blk,
 qemu_mutex_unlock(&tg->lock);
 }
 
-static void throttle_group_restart_queue(BlockBackend *blk, bool is_write)
+typedef struct {
+BlockBackend *blk;
+bool is_write;
+} RestartData;
+
+static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
 {
+RestartData *data = opaque;
+BlockBackend *blk = data->blk;
+bool is_write = data->is_write;
 BlockBackendPublic *blkp = blk_get_public(blk);
 ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
 bool empty_queue;
 
-aio_context_acquire(blk_get_aio_context(blk));
-empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
-aio_context_release(blk_get_aio_context(blk));
+empty_queue = !throttle_group_co_restart_queue(blk, is_write);
 
 /* If the request queue was empty then we have to take care of
  * scheduling the next one */
@@ -359,6 +379,18 @@ static void throttle_group_restart_queue(BlockBackend 
*blk, bool is_write)
 }
 }
 
+static void throttle_group_restart_queue(BlockBackend *blk, bool is_write)
+{
+Coroutine *co;
+RestartData rd = {
+.blk = blk,
+.is_write = is_write
+};
+
+co = qemu_coroutine_create(throttle_group_restart_queue_entry, &rd);
+aio_co_enter(blk_get_aio_context(blk), co);
+}
+
 void throttle_group_restart_blk(BlockBackend *blk)
 {
 BlockBackendPublic *blkp = blk_get_public(blk);
-- 
2.9.4




[Qemu-devel] [PULL 04/22] docker: Add flex and bison to centos6 image

2017-05-26 Thread Fam Zheng
Currently there are warnings about flex and bison being missing when
building in the centos6 image:

make[1]: flex: Command not found
 BISON dtc-parser.tab.c
make[1]: bison: Command not found

Add them.

Reported-by: Thomas Huth 
Signed-off-by: Fam Zheng 
Message-Id: <20170524005206.31916-1-f...@redhat.com>
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Fam Zheng 
---
 tests/docker/dockerfiles/centos6.docker | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/dockerfiles/centos6.docker 
b/tests/docker/dockerfiles/centos6.docker
index 34e0d3b..17a4d24 100644
--- a/tests/docker/dockerfiles/centos6.docker
+++ b/tests/docker/dockerfiles/centos6.docker
@@ -1,7 +1,7 @@
 FROM centos:6
 RUN yum install -y epel-release
 ENV PACKAGES libfdt-devel ccache \
-tar git make gcc g++ \
+tar git make gcc g++ flex bison \
 zlib-devel glib2-devel SDL-devel pixman-devel \
 epel-release
 RUN yum install -y $PACKAGES
-- 
2.9.4




[Qemu-devel] [PULL 06/22] block: access quiesce_counter with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Alberto Garcia 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-3-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/io.c| 4 ++--
 include/block/block_int.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 80a3a8c..7d31946 100644
--- a/block/io.c
+++ b/block/io.c
@@ -240,7 +240,7 @@ void bdrv_drained_begin(BlockDriverState *bs)
 return;
 }
 
-if (!bs->quiesce_counter++) {
+if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
 aio_disable_external(bdrv_get_aio_context(bs));
 bdrv_parent_drained_begin(bs);
 }
@@ -251,7 +251,7 @@ void bdrv_drained_begin(BlockDriverState *bs)
 void bdrv_drained_end(BlockDriverState *bs)
 {
 assert(bs->quiesce_counter > 0);
-if (--bs->quiesce_counter > 0) {
+if (atomic_fetch_dec(&bs->quiesce_counter) > 1) {
 return;
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c71492a..4af44df 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -637,6 +637,7 @@ struct BlockDriverState {
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
+/* Accessed with atomic ops.  */
 int quiesce_counter;
 };
 
-- 
2.9.4




[Qemu-devel] [PULL 16/22] block: access write_gen with atomics

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-13-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block.c   | 2 +-
 block/io.c| 6 +++---
 include/block/block_int.h | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index f6066ef..45a77f5 100644
--- a/block.c
+++ b/block.c
@@ -3384,7 +3384,7 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, Error 
**errp)
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 bdrv_dirty_bitmap_truncate(bs);
 bdrv_parent_cb_resize(bs);
-++bs->write_gen;
+atomic_inc(&bs->write_gen);
 }
 return ret;
 }
diff --git a/block/io.c b/block/io.c
index afb8294..8021b5b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1415,7 +1415,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 }
 bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
 
-++bs->write_gen;
+atomic_inc(&bs->write_gen);
 bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
 
 stat64_max(&bs->wr_highest_offset, offset + bytes);
@@ -2304,7 +2304,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 goto early_exit;
 }
 
-current_gen = bs->write_gen;
+current_gen = atomic_read(&bs->write_gen);
 
 /* Wait until any previous flushes are completed */
 while (bs->active_flush_req) {
@@ -2529,7 +2529,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, 
int64_t offset,
 }
 ret = 0;
 out:
-++bs->write_gen;
+atomic_inc(&bs->write_gen);
 bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
req.bytes >> BDRV_SECTOR_BITS);
 tracked_request_end(&req);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a23e4bb..74d6e62 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -612,7 +612,6 @@ struct BlockDriverState {
 QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 CoQueue flush_queue;  /* Serializing flush queue */
 bool active_flush_req;/* Flush request in flight? */
-unsigned int write_gen;   /* Current data generation */
 unsigned int flushed_gen; /* Flushed write generation */
 
 QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
@@ -647,6 +646,7 @@ struct BlockDriverState {
 
 /* Accessed with atomic ops.  */
 int quiesce_counter;
+unsigned int write_gen;   /* Current data generation */
 };
 
 struct BlockBackendRootState {
-- 
2.9.4




[Qemu-devel] [PULL 10/22] block: access io_plugged with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-7-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/io.c| 4 ++--
 include/block/block_int.h | 8 +---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3a98b02..f34d030 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2658,7 +2658,7 @@ void bdrv_io_plug(BlockDriverState *bs)
 bdrv_io_plug(child->bs);
 }
 
-if (bs->io_plugged++ == 0) {
+if (atomic_fetch_inc(&bs->io_plugged) == 0) {
 BlockDriver *drv = bs->drv;
 if (drv && drv->bdrv_io_plug) {
 drv->bdrv_io_plug(bs);
@@ -2671,7 +2671,7 @@ void bdrv_io_unplug(BlockDriverState *bs)
 BdrvChild *child;
 
 assert(bs->io_plugged);
-if (--bs->io_plugged == 0) {
+if (atomic_fetch_dec(&bs->io_plugged) == 1) {
 BlockDriver *drv = bs->drv;
 if (drv && drv->bdrv_io_unplug) {
 drv->bdrv_io_unplug(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b690e58..d54ed75 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -611,9 +611,6 @@ struct BlockDriverState {
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
 
-/* counter for nested bdrv_io_plug */
-unsigned io_plugged;
-
 QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 CoQueue flush_queue;  /* Serializing flush queue */
 bool active_flush_req;/* Flush request in flight? */
@@ -639,6 +636,11 @@ struct BlockDriverState {
  */
 bool wakeup;
 
+/* counter for nested bdrv_io_plug.
+ * Accessed with atomic ops.
+*/
+unsigned io_plugged;
+
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
-- 
2.9.4




[Qemu-devel] [PULL 08/22] block: access serialising_in_flight with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-5-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/io.c|  6 +++---
 include/block/block_int.h | 10 ++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index 7d31946..cc5de2c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -388,7 +388,7 @@ void bdrv_drain_all(void)
 static void tracked_request_end(BdrvTrackedRequest *req)
 {
 if (req->serialising) {
-req->bs->serialising_in_flight--;
+atomic_dec(&req->bs->serialising_in_flight);
 }
 
 QLIST_REMOVE(req, list);
@@ -427,7 +427,7 @@ static void mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
- overlap_offset;
 
 if (!req->serialising) {
-req->bs->serialising_in_flight++;
+atomic_inc(&req->bs->serialising_in_flight);
 req->serialising = true;
 }
 
@@ -532,7 +532,7 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 bool retry;
 bool waited = false;
 
-if (!bs->serialising_in_flight) {
+if (!atomic_read(&bs->serialising_in_flight)) {
 return false;
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4af44df..2fafd65 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -604,10 +604,6 @@ struct BlockDriverState {
 /* Callback before write request is processed */
 NotifierWithReturnList before_write_notifiers;
 
-/* number of in-flight requests; overall and serialising */
-unsigned int in_flight;
-unsigned int serialising_in_flight;
-
 bool wakeup;
 
 /* Offset after the highest byte written to */
@@ -634,6 +630,12 @@ struct BlockDriverState {
  */
 int copy_on_read;
 
+/* number of in-flight requests; overall and serialising.
+ * Accessed with atomic ops.
+ */
+unsigned int in_flight;
+unsigned int serialising_in_flight;
+
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
-- 
2.9.4




[Qemu-devel] [PULL 07/22] block: access io_limits_disabled with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Alberto Garcia 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-4-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/block-backend.c  | 4 ++--
 block/throttle-groups.c| 2 +-
 include/sysemu/block-backend.h | 3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f3a6008..e50ec03 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1953,7 +1953,7 @@ static void blk_root_drained_begin(BdrvChild *child)
 /* Note that blk->root may not be accessible here yet if we are just
  * attaching to a BlockDriverState that is drained. Use child instead. */
 
-if (blk->public.io_limits_disabled++ == 0) {
+if (atomic_fetch_inc(&blk->public.io_limits_disabled) == 0) {
 throttle_group_restart_blk(blk);
 }
 }
@@ -1964,7 +1964,7 @@ static void blk_root_drained_end(BdrvChild *child)
 assert(blk->quiesce_counter);
 
 assert(blk->public.io_limits_disabled);
---blk->public.io_limits_disabled;
+atomic_dec(&blk->public.io_limits_disabled);
 
 if (--blk->quiesce_counter == 0) {
 if (blk->dev_ops && blk->dev_ops->drained_end) {
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index b73e7a8..69bfbd4 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -240,7 +240,7 @@ static bool throttle_group_schedule_timer(BlockBackend 
*blk, bool is_write)
 ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
 bool must_wait;
 
-if (blkp->io_limits_disabled) {
+if (atomic_read(&blkp->io_limits_disabled)) {
 return false;
 }
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 840ad61..24b63d6 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -80,7 +80,8 @@ typedef struct BlockBackendPublic {
 CoQueue  throttled_reqs[2];
 
 /* Nonzero if the I/O limits are currently being ignored; generally
- * it is zero.  */
+ * it is zero.  Accessed with atomic operations.
+ */
 unsigned int io_limits_disabled;
 
 /* The following fields are protected by the ThrottleGroup lock.
-- 
2.9.4




[Qemu-devel] [PULL 11/22] throttle-groups: only start one coroutine from drained_begin

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Starting all waiting coroutines from bdrv_drain_all is unnecessary;
throttle_group_co_io_limits_intercept calls schedule_next_request as
soon as the coroutine restarts, which in turn will restart the next
request if possible.

If we only start the first request and let the coroutines dance from
there the code is simpler and there is more reuse between
throttle_group_config, throttle_group_restart_blk and timer_cb.  The
next patch will benefit from this.

We also stop accessing from throttle_group_restart_blk the
blkp->throttled_reqs CoQueues even when there was no
attached throttling group.  This worked but is not pretty.

The only thing that can interrupt the dance is the QEMU_CLOCK_VIRTUAL
timer when switching from one block device to the next, because the
timer is set to "now + 1" but QEMU_CLOCK_VIRTUAL might not be running.
Set that timer to point in the present ("now") rather than the future
and things work.

Reviewed-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-8-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/throttle-groups.c | 45 +
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 69bfbd4..85169ec 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -292,7 +292,7 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
 } else {
 ThrottleTimers *tt = &blk_get_public(token)->throttle_timers;
 int64_t now = qemu_clock_get_ns(tt->clock_type);
-timer_mod(tt->timers[is_write], now + 1);
+timer_mod(tt->timers[is_write], now);
 tg->any_timer_armed[is_write] = true;
 }
 tg->tokens[is_write] = token;
@@ -340,15 +340,32 @@ void coroutine_fn 
throttle_group_co_io_limits_intercept(BlockBackend *blk,
 qemu_mutex_unlock(&tg->lock);
 }
 
+static void throttle_group_restart_queue(BlockBackend *blk, bool is_write)
+{
+BlockBackendPublic *blkp = blk_get_public(blk);
+ThrottleGroup *tg = container_of(blkp->throttle_state, ThrottleGroup, ts);
+bool empty_queue;
+
+aio_context_acquire(blk_get_aio_context(blk));
+empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
+aio_context_release(blk_get_aio_context(blk));
+
+/* If the request queue was empty then we have to take care of
+ * scheduling the next one */
+if (empty_queue) {
+qemu_mutex_lock(&tg->lock);
+schedule_next_request(blk, is_write);
+qemu_mutex_unlock(&tg->lock);
+}
+}
+
 void throttle_group_restart_blk(BlockBackend *blk)
 {
 BlockBackendPublic *blkp = blk_get_public(blk);
-int i;
 
-for (i = 0; i < 2; i++) {
-while (qemu_co_enter_next(&blkp->throttled_reqs[i])) {
-;
-}
+if (blkp->throttle_state) {
+throttle_group_restart_queue(blk, 0);
+throttle_group_restart_queue(blk, 1);
 }
 }
 
@@ -376,8 +393,7 @@ void throttle_group_config(BlockBackend *blk, 
ThrottleConfig *cfg)
 throttle_config(ts, tt, cfg);
 qemu_mutex_unlock(&tg->lock);
 
-qemu_co_enter_next(&blkp->throttled_reqs[0]);
-qemu_co_enter_next(&blkp->throttled_reqs[1]);
+throttle_group_restart_blk(blk);
 }
 
 /* Get the throttle configuration from a particular group. Similar to
@@ -408,7 +424,6 @@ static void timer_cb(BlockBackend *blk, bool is_write)
 BlockBackendPublic *blkp = blk_get_public(blk);
 ThrottleState *ts = blkp->throttle_state;
 ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
-bool empty_queue;
 
 /* The timer has just been fired, so we can update the flag */
 qemu_mutex_lock(&tg->lock);
@@ -416,17 +431,7 @@ static void timer_cb(BlockBackend *blk, bool is_write)
 qemu_mutex_unlock(&tg->lock);
 
 /* Run the request that was waiting for this timer */
-aio_context_acquire(blk_get_aio_context(blk));
-empty_queue = !qemu_co_enter_next(&blkp->throttled_reqs[is_write]);
-aio_context_release(blk_get_aio_context(blk));
-
-/* If the request queue was empty then we have to take care of
- * scheduling the next one */
-if (empty_queue) {
-qemu_mutex_lock(&tg->lock);
-schedule_next_request(blk, is_write);
-qemu_mutex_unlock(&tg->lock);
-}
+throttle_group_restart_queue(blk, is_write);
 }
 
 static void read_timer_cb(void *opaque)
-- 
2.9.4




[Qemu-devel] [PULL 09/22] block: access wakeup with atomic ops

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-6-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/io.c| 3 ++-
 block/nfs.c   | 4 +++-
 block/sheepdog.c  | 3 ++-
 include/block/block.h | 5 +++--
 include/block/block_int.h | 7 +--
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index cc5de2c..3a98b02 100644
--- a/block/io.c
+++ b/block/io.c
@@ -514,7 +514,8 @@ static void dummy_bh_cb(void *opaque)
 
 void bdrv_wakeup(BlockDriverState *bs)
 {
-if (bs->wakeup) {
+/* The barrier (or an atomic op) is in the caller.  */
+if (atomic_read(&bs->wakeup)) {
 aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
 }
 }
diff --git a/block/nfs.c b/block/nfs.c
index 848b2c0..18c87d2 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -730,7 +730,9 @@ nfs_get_allocated_file_size_cb(int ret, struct nfs_context 
*nfs, void *data,
 if (task->ret < 0) {
 error_report("NFS Error: %s", nfs_get_error(nfs));
 }
-task->complete = 1;
+
+/* Set task->complete before reading bs->wakeup.  */
+atomic_mb_set(&task->complete, 1);
 bdrv_wakeup(task->bs);
 }
 
diff --git a/block/sheepdog.c b/block/sheepdog.c
index a18315a..5ebf5d9 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -698,7 +698,8 @@ out:
 
 srco->co = NULL;
 srco->ret = ret;
-srco->finished = true;
+/* Set srco->finished before reading bs->wakeup.  */
+atomic_mb_set(&srco->finished, true);
 if (srco->bs) {
 bdrv_wakeup(srco->bs);
 }
diff --git a/include/block/block.h b/include/block/block.h
index 9b355e9..a4f09df 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -402,7 +402,8 @@ void bdrv_drain_all(void);
  * block_job_defer_to_main_loop for how to do it). \
  */\
 assert(!bs_->wakeup);  \
-bs_->wakeup = true;\
+/* Set bs->wakeup before evaluating cond.  */  \
+atomic_mb_set(&bs_->wakeup, true); \
 while (busy_) {\
 if ((cond)) {  \
 waited_ = busy_ = true;\
@@ -414,7 +415,7 @@ void bdrv_drain_all(void);
 waited_ |= busy_;  \
 }  \
 }  \
-bs_->wakeup = false;   \
+atomic_set(&bs_->wakeup, false);   \
 }  \
 waited_; })
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2fafd65..b690e58 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -604,8 +604,6 @@ struct BlockDriverState {
 /* Callback before write request is processed */
 NotifierWithReturnList before_write_notifiers;
 
-bool wakeup;
-
 /* Offset after the highest byte written to */
 uint64_t wr_highest_offset;
 
@@ -636,6 +634,11 @@ struct BlockDriverState {
 unsigned int in_flight;
 unsigned int serialising_in_flight;
 
+/* Internal to BDRV_POLL_WHILE and bdrv_wakeup.  Accessed with atomic
+ * ops.
+ */
+bool wakeup;
+
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
-- 
2.9.4




[Qemu-devel] [PULL 15/22] block: use Stat64 for wr_highest_offset

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-12-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/io.c| 4 +---
 block/qapi.c  | 2 +-
 include/block/block_int.h | 7 ---
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index f34d030..afb8294 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1418,9 +1418,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild 
*child,
 ++bs->write_gen;
 bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
 
-if (bs->wr_highest_offset < offset + bytes) {
-bs->wr_highest_offset = offset + bytes;
-}
+stat64_max(&bs->wr_highest_offset, offset + bytes);
 
 if (ret >= 0) {
 bs->total_sectors = MAX(bs->total_sectors, end_sector);
diff --git a/block/qapi.c b/block/qapi.c
index a40922e..14b60ae 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -441,7 +441,7 @@ static BlockStats *bdrv_query_bds_stats(const 
BlockDriverState *bs,
 s->node_name = g_strdup(bdrv_get_node_name(bs));
 }
 
-s->stats->wr_highest_offset = bs->wr_highest_offset;
+s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
 
 if (bs->file) {
 s->has_parent = true;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index d54ed75..a23e4bb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -29,6 +29,7 @@
 #include "qemu/option.h"
 #include "qemu/queue.h"
 #include "qemu/coroutine.h"
+#include "qemu/stats64.h"
 #include "qemu/timer.h"
 #include "qapi-types.h"
 #include "qemu/hbitmap.h"
@@ -604,9 +605,6 @@ struct BlockDriverState {
 /* Callback before write request is processed */
 NotifierWithReturnList before_write_notifiers;
 
-/* Offset after the highest byte written to */
-uint64_t wr_highest_offset;
-
 /* threshold limit for writes, in bytes. "High water mark". */
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
@@ -619,6 +617,9 @@ struct BlockDriverState {
 
 QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
 
+/* Offset after the highest byte written to */
+Stat64 wr_highest_offset;
+
 /* If true, copy read backing sectors into image.  Can be >1 if more
  * than one client has requested copy-on-read.  Accessed with atomic
  * ops.
-- 
2.9.4




[Qemu-devel] [PULL 19/22] migration/block: reset dirty bitmap before reading

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Any data that is returned by read may be stale already, the bitmap
has to be cleared before issuing the read.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-16-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 migration/block.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/block.c b/migration/block.c
index 9e9f031..8fe484e 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -536,6 +536,8 @@ static int mig_save_device_dirty(QEMUFile *f, 
BlkMigDevState *bmds,
 } else {
 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
 }
+bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, sector, nr_sectors);
+
 blk = g_new(BlkMigBlock, 1);
 blk->buf = g_malloc(BLOCK_SIZE);
 blk->bmds = bmds;
@@ -568,7 +570,6 @@ static int mig_save_device_dirty(QEMUFile *f, 
BlkMigDevState *bmds,
 g_free(blk);
 }
 
-bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, sector, nr_sectors);
 sector += nr_sectors;
 bmds->cur_dirty = sector;
 
-- 
2.9.4




[Qemu-devel] [PULL 17/22] block: protect tracked_requests and flush_queue with reqs_lock

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-14-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block.c   |  1 +
 block/io.c| 16 ++--
 include/block/block_int.h | 14 +-
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 45a77f5..d297188 100644
--- a/block.c
+++ b/block.c
@@ -280,6 +280,7 @@ BlockDriverState *bdrv_new(void)
 QLIST_INIT(&bs->op_blockers[i]);
 }
 notifier_with_return_list_init(&bs->before_write_notifiers);
+qemu_co_mutex_init(&bs->reqs_lock);
 bs->refcnt = 1;
 bs->aio_context = qemu_get_aio_context();
 
diff --git a/block/io.c b/block/io.c
index 8021b5b..caca63a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -391,8 +391,10 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 atomic_dec(&req->bs->serialising_in_flight);
 }
 
+qemu_co_mutex_lock(&req->bs->reqs_lock);
 QLIST_REMOVE(req, list);
 qemu_co_queue_restart_all(&req->wait_queue);
+qemu_co_mutex_unlock(&req->bs->reqs_lock);
 }
 
 /**
@@ -417,7 +419,9 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 
 qemu_co_queue_init(&req->wait_queue);
 
+qemu_co_mutex_lock(&bs->reqs_lock);
 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
+qemu_co_mutex_unlock(&bs->reqs_lock);
 }
 
 static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align)
@@ -539,6 +543,7 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 
 do {
 retry = false;
+qemu_co_mutex_lock(&bs->reqs_lock);
 QLIST_FOREACH(req, &bs->tracked_requests, list) {
 if (req == self || (!req->serialising && !self->serialising)) {
 continue;
@@ -557,7 +562,7 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
  * (instead of producing a deadlock in the former case). */
 if (!req->waiting_for) {
 self->waiting_for = req;
-qemu_co_queue_wait(&req->wait_queue, NULL);
+qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
 self->waiting_for = NULL;
 retry = true;
 waited = true;
@@ -565,6 +570,7 @@ static bool coroutine_fn 
wait_serialising_requests(BdrvTrackedRequest *self)
 }
 }
 }
+qemu_co_mutex_unlock(&bs->reqs_lock);
 } while (retry);
 
 return waited;
@@ -2304,14 +2310,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 goto early_exit;
 }
 
+qemu_co_mutex_lock(&bs->reqs_lock);
 current_gen = atomic_read(&bs->write_gen);
 
 /* Wait until any previous flushes are completed */
 while (bs->active_flush_req) {
-qemu_co_queue_wait(&bs->flush_queue, NULL);
+qemu_co_queue_wait(&bs->flush_queue, &bs->reqs_lock);
 }
 
+/* Flushes reach this point in nondecreasing current_gen order.  */
 bs->active_flush_req = true;
+qemu_co_mutex_unlock(&bs->reqs_lock);
 
 /* Write back all layers by calling one driver function */
 if (bs->drv->bdrv_co_flush) {
@@ -2383,9 +2392,12 @@ out:
 if (ret == 0) {
 bs->flushed_gen = current_gen;
 }
+
+qemu_co_mutex_lock(&bs->reqs_lock);
 bs->active_flush_req = false;
 /* Return value is ignored - it's ok if wait queue is empty */
 qemu_co_queue_next(&bs->flush_queue);
+qemu_co_mutex_unlock(&bs->reqs_lock);
 
 early_exit:
 bdrv_dec_in_flight(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 74d6e62..4b2c594 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -609,11 +609,6 @@ struct BlockDriverState {
 uint64_t write_threshold_offset;
 NotifierWithReturn write_threshold_notifier;
 
-QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
-CoQueue flush_queue;  /* Serializing flush queue */
-bool active_flush_req;/* Flush request in flight? */
-unsigned int flushed_gen; /* Flushed write generation */
-
 QLIST_HEAD(, BdrvDirtyBitmap) dirty_bitmaps;
 
 /* Offset after the highest byte written to */
@@ -647,6 +642,15 @@ struct BlockDriverState {
 /* Accessed with atomic ops.  */
 int quiesce_counter;
 unsigned int write_gen;   /* Current data generation */
+
+/* Protected by reqs_lock.  */
+CoMutex reqs_lock;
+QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
+CoQueue flush_queue;  /* Serializing flush queue */
+bool active_flush_req;/* Flush request in flight? */
+
+/* Only read/written by whoever has set active_flush_req to true.  */
+unsigned int flushed_gen; /* Flushed write generation */
 };
 
 struct BlockBackendRootState {
-- 
2.9.4




[Qemu-devel] [PULL 18/22] block: introduce dirty_bitmap_mutex

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

It protects only the list of dirty bitmaps; in the next patch we will
also protect their content.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-15-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c  | 44 +++-
 block/mirror.c|  3 ++-
 blockdev.c| 44 +++-
 include/block/block_int.h |  5 +
 migration/block.c |  6 --
 5 files changed, 57 insertions(+), 45 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 519737c..fa78109 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -52,6 +52,17 @@ struct BdrvDirtyBitmapIter {
 BdrvDirtyBitmap *bitmap;
 };
 
+static inline void bdrv_dirty_bitmaps_lock(BlockDriverState *bs)
+{
+qemu_mutex_lock(&bs->dirty_bitmap_mutex);
+}
+
+static inline void bdrv_dirty_bitmaps_unlock(BlockDriverState *bs)
+{
+qemu_mutex_unlock(&bs->dirty_bitmap_mutex);
+}
+
+/* Called with BQL or dirty_bitmap lock taken.  */
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
 {
 BdrvDirtyBitmap *bm;
@@ -65,6 +76,7 @@ BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, 
const char *name)
 return NULL;
 }
 
+/* Called with BQL taken.  */
 void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
 {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
@@ -72,6 +84,7 @@ void bdrv_dirty_bitmap_make_anon(BdrvDirtyBitmap *bitmap)
 bitmap->name = NULL;
 }
 
+/* Called with BQL taken.  */
 BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs,
   uint32_t granularity,
   const char *name,
@@ -100,7 +113,9 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState 
*bs,
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
 bitmap->disabled = false;
+bdrv_dirty_bitmaps_lock(bs);
 QLIST_INSERT_HEAD(&bs->dirty_bitmaps, bitmap, list);
+bdrv_dirty_bitmaps_unlock(bs);
 return bitmap;
 }
 
@@ -164,16 +179,19 @@ const char *bdrv_dirty_bitmap_name(const BdrvDirtyBitmap 
*bitmap)
 return bitmap->name;
 }
 
+/* Called with BQL taken.  */
 bool bdrv_dirty_bitmap_frozen(BdrvDirtyBitmap *bitmap)
 {
 return bitmap->successor;
 }
 
+/* Called with BQL taken.  */
 bool bdrv_dirty_bitmap_enabled(BdrvDirtyBitmap *bitmap)
 {
 return !(bitmap->disabled || bitmap->successor);
 }
 
+/* Called with BQL taken.  */
 DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap)
 {
 if (bdrv_dirty_bitmap_frozen(bitmap)) {
@@ -188,6 +206,7 @@ DirtyBitmapStatus bdrv_dirty_bitmap_status(BdrvDirtyBitmap 
*bitmap)
 /**
  * Create a successor bitmap destined to replace this bitmap after an 
operation.
  * Requires that the bitmap is not frozen and has no successor.
+ * Called with BQL taken.
  */
 int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
BdrvDirtyBitmap *bitmap, Error **errp)
@@ -220,6 +239,7 @@ int bdrv_dirty_bitmap_create_successor(BlockDriverState *bs,
 /**
  * For a bitmap with a successor, yield our name to the successor,
  * delete the old bitmap, and return a handle to the new bitmap.
+ * Called with BQL taken.
  */
 BdrvDirtyBitmap *bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
 BdrvDirtyBitmap *bitmap,
@@ -247,6 +267,7 @@ BdrvDirtyBitmap 
*bdrv_dirty_bitmap_abdicate(BlockDriverState *bs,
  * In cases of failure where we can no longer safely delete the parent,
  * we may wish to re-join the parent and child/successor.
  * The merged parent will be un-frozen, but not explicitly re-enabled.
+ * Called with BQL taken.
  */
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
BdrvDirtyBitmap *parent,
@@ -271,25 +292,30 @@ BdrvDirtyBitmap 
*bdrv_reclaim_dirty_bitmap(BlockDriverState *bs,
 
 /**
  * Truncates _all_ bitmaps attached to a BDS.
+ * Called with BQL taken.
  */
 void bdrv_dirty_bitmap_truncate(BlockDriverState *bs)
 {
 BdrvDirtyBitmap *bitmap;
 uint64_t size = bdrv_nb_sectors(bs);
 
+bdrv_dirty_bitmaps_lock(bs);
 QLIST_FOREACH(bitmap, &bs->dirty_bitmaps, list) {
 assert(!bdrv_dirty_bitmap_frozen(bitmap));
 assert(!bitmap->active_iterators);
 hbitmap_truncate(bitmap->bitmap, size);
 bitmap->size = size;
 }
+bdrv_dirty_bitmaps_unlock(bs);
 }
 
+/* Called with BQL taken.  */
 static void bdrv_do_release_matching_dirty_bitmap(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap,
   bool only_named)
 {
 BdrvDirtyBitmap *bm, *next;
+bdrv_dirty_bitmaps_lock(bs);
 QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
 if ((!bitmap || bm == bitmap) && (

[Qemu-devel] [PULL 20/22] block: protect modification of dirty bitmaps with a mutex

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-17-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/dirty-bitmap.c | 70 ++--
 block/mirror.c   | 11 +--
 include/block/block_int.h|  4 +--
 include/block/dirty-bitmap.h | 25 +++-
 migration/block.c| 10 ---
 5 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index fa78109..a04c6e4 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -37,6 +37,7 @@
  * or enabled. A frozen bitmap can only abdicate() or reclaim().
  */
 struct BdrvDirtyBitmap {
+QemuMutex *mutex;
 HBitmap *bitmap;/* Dirty sector bitmap implementation */
 HBitmap *meta;  /* Meta dirty bitmap */
 BdrvDirtyBitmap *successor; /* Anonymous child; implies frozen status */
@@ -62,6 +63,16 @@ static inline void 
bdrv_dirty_bitmaps_unlock(BlockDriverState *bs)
 qemu_mutex_unlock(&bs->dirty_bitmap_mutex);
 }
 
+void bdrv_dirty_bitmap_lock(BdrvDirtyBitmap *bitmap)
+{
+qemu_mutex_lock(bitmap->mutex);
+}
+
+void bdrv_dirty_bitmap_unlock(BdrvDirtyBitmap *bitmap)
+{
+qemu_mutex_unlock(bitmap->mutex);
+}
+
 /* Called with BQL or dirty_bitmap lock taken.  */
 BdrvDirtyBitmap *bdrv_find_dirty_bitmap(BlockDriverState *bs, const char *name)
 {
@@ -109,6 +120,7 @@ BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState 
*bs,
 return NULL;
 }
 bitmap = g_new0(BdrvDirtyBitmap, 1);
+bitmap->mutex = &bs->dirty_bitmap_mutex;
 bitmap->bitmap = hbitmap_alloc(bitmap_size, ctz32(sector_granularity));
 bitmap->size = bitmap_size;
 bitmap->name = g_strdup(name);
@@ -134,20 +146,24 @@ void bdrv_create_meta_dirty_bitmap(BdrvDirtyBitmap 
*bitmap,
int chunk_size)
 {
 assert(!bitmap->meta);
+qemu_mutex_lock(bitmap->mutex);
 bitmap->meta = hbitmap_create_meta(bitmap->bitmap,
chunk_size * BITS_PER_BYTE);
+qemu_mutex_unlock(bitmap->mutex);
 }
 
 void bdrv_release_meta_dirty_bitmap(BdrvDirtyBitmap *bitmap)
 {
 assert(bitmap->meta);
+qemu_mutex_lock(bitmap->mutex);
 hbitmap_free_meta(bitmap->bitmap);
 bitmap->meta = NULL;
+qemu_mutex_unlock(bitmap->mutex);
 }
 
-int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
-   BdrvDirtyBitmap *bitmap, int64_t sector,
-   int nb_sectors)
+int bdrv_dirty_bitmap_get_meta_locked(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, int64_t sector,
+  int nb_sectors)
 {
 uint64_t i;
 int sectors_per_bit = 1 << hbitmap_granularity(bitmap->meta);
@@ -162,11 +178,26 @@ int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
 return false;
 }
 
+int bdrv_dirty_bitmap_get_meta(BlockDriverState *bs,
+   BdrvDirtyBitmap *bitmap, int64_t sector,
+   int nb_sectors)
+{
+bool dirty;
+
+qemu_mutex_lock(bitmap->mutex);
+dirty = bdrv_dirty_bitmap_get_meta_locked(bs, bitmap, sector, nb_sectors);
+qemu_mutex_unlock(bitmap->mutex);
+
+return dirty;
+}
+
 void bdrv_dirty_bitmap_reset_meta(BlockDriverState *bs,
   BdrvDirtyBitmap *bitmap, int64_t sector,
   int nb_sectors)
 {
+qemu_mutex_lock(bitmap->mutex);
 hbitmap_reset(bitmap->meta, sector, nb_sectors);
+qemu_mutex_unlock(bitmap->mutex);
 }
 
 int64_t bdrv_dirty_bitmap_size(const BdrvDirtyBitmap *bitmap)
@@ -393,8 +424,9 @@ BlockDirtyInfoList 
*bdrv_query_dirty_bitmaps(BlockDriverState *bs)
 return list;
 }
 
-int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
-   int64_t sector)
+/* Called within bdrv_dirty_bitmap_lock..unlock */
+int bdrv_get_dirty_locked(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+  int64_t sector)
 {
 if (bitmap) {
 return hbitmap_get(bitmap->bitmap, sector);
@@ -467,23 +499,42 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 return hbitmap_iter_next(&iter->hbi);
 }
 
-void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
-   int64_t cur_sector, int64_t nr_sectors)
+/* Called within bdrv_dirty_bitmap_lock..unlock */
+void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
+  int64_t cur_sector, int64_t nr_sectors)
 {
 assert(bdrv_dirty_bitmap_enabled(bitmap));
 hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
 }
 
-void bdrv_reset_dirty_bitmap(BdrvDirtyBitmap *bitmap,
- int64_t cur_sector, int64_t nr_sectors)
+void bdrv_set_dirty_bitmap(BdrvDirtyBitmap *bitmap,
+   int64_t cur_sector, int64_t nr_sectors)
+{
+bdrv_dir

[Qemu-devel] [PULL 13/22] throttle-groups: protect throttled requests with a CoMutex

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

Another possibility is to use tg->lock, which we're holding anyway in
both schedule_next_request and throttle_group_co_io_limits_intercept.
This would require open-coding the CoQueue however, so I've chosen this
alternative.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-10-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/block-backend.c  |  1 +
 block/throttle-groups.c| 12 ++--
 include/sysemu/block-backend.h |  7 ++-
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index e50ec03..be2ddf1 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -216,6 +216,7 @@ BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm)
 blk->shared_perm = shared_perm;
 blk_set_enable_write_cache(blk, true);
 
+qemu_co_mutex_init(&blk->public.throttled_reqs_lock);
 qemu_co_queue_init(&blk->public.throttled_reqs[0]);
 qemu_co_queue_init(&blk->public.throttled_reqs[1]);
 
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 8bf1031..a181cb1 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -270,8 +270,13 @@ static bool coroutine_fn 
throttle_group_co_restart_queue(BlockBackend *blk,
  bool is_write)
 {
 BlockBackendPublic *blkp = blk_get_public(blk);
+bool ret;
 
-return qemu_co_queue_next(&blkp->throttled_reqs[is_write]);
+qemu_co_mutex_lock(&blkp->throttled_reqs_lock);
+ret = qemu_co_queue_next(&blkp->throttled_reqs[is_write]);
+qemu_co_mutex_unlock(&blkp->throttled_reqs_lock);
+
+return ret;
 }
 
 /* Look for the next pending I/O request and schedule it.
@@ -340,7 +345,10 @@ void coroutine_fn 
throttle_group_co_io_limits_intercept(BlockBackend *blk,
 if (must_wait || blkp->pending_reqs[is_write]) {
 blkp->pending_reqs[is_write]++;
 qemu_mutex_unlock(&tg->lock);
-qemu_co_queue_wait(&blkp->throttled_reqs[is_write], NULL);
+qemu_co_mutex_lock(&blkp->throttled_reqs_lock);
+qemu_co_queue_wait(&blkp->throttled_reqs[is_write],
+   &blkp->throttled_reqs_lock);
+qemu_co_mutex_unlock(&blkp->throttled_reqs_lock);
 qemu_mutex_lock(&tg->lock);
 blkp->pending_reqs[is_write]--;
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 24b63d6..999eb23 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -72,11 +72,8 @@ typedef struct BlockDevOps {
  * fields that must be public. This is in particular for QLIST_ENTRY() and
  * friends so that BlockBackends can be kept in lists outside block-backend.c 
*/
 typedef struct BlockBackendPublic {
-/* I/O throttling has its own locking, but also some fields are
- * protected by the AioContext lock.
- */
-
-/* Protected by AioContext lock.  */
+/* throttled_reqs_lock protects the CoQueues for throttled requests.  */
+CoMutex  throttled_reqs_lock;
 CoQueue  throttled_reqs[2];
 
 /* Nonzero if the I/O limits are currently being ignored; generally
-- 
2.9.4




[Qemu-devel] [PULL 14/22] util: add stats64 module

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

This module provides fast paths for 64-bit atomic operations on machines
that only have 32-bit atomic access.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-11-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 include/qemu/stats64.h | 193 +
 util/Makefile.objs |   1 +
 util/stats64.c | 136 ++
 3 files changed, 330 insertions(+)
 create mode 100644 include/qemu/stats64.h
 create mode 100644 util/stats64.c

diff --git a/include/qemu/stats64.h b/include/qemu/stats64.h
new file mode 100644
index 000..4a357b3
--- /dev/null
+++ b/include/qemu/stats64.h
@@ -0,0 +1,193 @@
+/*
+ * Atomic operations on 64-bit quantities.
+ *
+ * Copyright (C) 2017 Red Hat, Inc.
+ *
+ * Author: Paolo Bonzini 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_STATS64_H
+#define QEMU_STATS64_H 1
+
+#include "qemu/atomic.h"
+
+/* This provides atomic operations on 64-bit type, using a reader-writer
+ * spinlock on architectures that do not have 64-bit accesses.  Even on
+ * those architectures, it tries hard not to take the lock.
+ */
+
+typedef struct Stat64 {
+#ifdef CONFIG_ATOMIC64
+uint64_t value;
+#else
+uint32_t low, high;
+uint32_t lock;
+#endif
+} Stat64;
+
+#ifdef CONFIG_ATOMIC64
+static inline void stat64_init(Stat64 *s, uint64_t value)
+{
+/* This is not guaranteed to be atomic! */
+*s = (Stat64) { value };
+}
+
+static inline uint64_t stat64_get(const Stat64 *s)
+{
+return atomic_read__nocheck(&s->value);
+}
+
+static inline void stat64_add(Stat64 *s, uint64_t value)
+{
+atomic_add(&s->value, value);
+}
+
+static inline void stat64_min(Stat64 *s, uint64_t value)
+{
+uint64_t orig = atomic_read__nocheck(&s->value);
+while (orig > value) {
+orig = atomic_cmpxchg__nocheck(&s->value, orig, value);
+}
+}
+
+static inline void stat64_max(Stat64 *s, uint64_t value)
+{
+uint64_t orig = atomic_read__nocheck(&s->value);
+while (orig < value) {
+orig = atomic_cmpxchg__nocheck(&s->value, orig, value);
+}
+}
+#else
+uint64_t stat64_get(const Stat64 *s);
+bool stat64_min_slow(Stat64 *s, uint64_t value);
+bool stat64_max_slow(Stat64 *s, uint64_t value);
+bool stat64_add32_carry(Stat64 *s, uint32_t low, uint32_t high);
+
+static inline void stat64_init(Stat64 *s, uint64_t value)
+{
+/* This is not guaranteed to be atomic! */
+*s = (Stat64) { .low = value, .high = value >> 32, .lock = 0 };
+}
+
+static inline void stat64_add(Stat64 *s, uint64_t value)
+{
+uint32_t low, high;
+high = value >> 32;
+low = (uint32_t) value;
+if (!low) {
+if (high) {
+atomic_add(&s->high, high);
+}
+return;
+}
+
+for (;;) {
+uint32_t orig = s->low;
+uint32_t result = orig + low;
+uint32_t old;
+
+if (result < low || high) {
+/* If the high part is affected, take the lock.  */
+if (stat64_add32_carry(s, low, high)) {
+return;
+}
+continue;
+}
+
+/* No carry, try with a 32-bit cmpxchg.  The result is independent of
+ * the high 32 bits, so it can race just fine with stat64_add32_carry
+ * and even stat64_get!
+ */
+old = atomic_cmpxchg(&s->low, orig, result);
+if (orig == old) {
+return;
+}
+}
+}
+
+static inline void stat64_min(Stat64 *s, uint64_t value)
+{
+uint32_t low, high;
+uint32_t orig_low, orig_high;
+
+high = value >> 32;
+low = (uint32_t) value;
+do {
+orig_high = atomic_read(&s->high);
+if (orig_high < high) {
+return;
+}
+
+if (orig_high == high) {
+/* High 32 bits are equal.  Read low after high, otherwise we
+ * can get a false positive (e.g. 0x1235,0x changes to
+ * 0x1234,0x8000 and we read it as 0x1234,0x). Pairs with
+ * the write barrier in stat64_min_slow.
+ */
+smp_rmb();
+orig_low = atomic_read(&s->low);
+if (orig_low <= low) {
+return;
+}
+
+/* See if we were lucky and a writer raced against us.  The
+ * barrier is theoretically unnecessary, but if we remove it
+ * we may miss being lucky.
+ */
+smp_rmb();
+orig_high = atomic_read(&s->high);
+if (orig_high < high) {
+return;
+}
+}
+
+/* If the value changes in any way, we have to take the lock.  */
+} while (!stat64_min_slow(s, value));
+}
+
+static inline void stat64_max(Stat64 *s, uint64_t value)
+{
+uint32_t low, high;
+uint32_t orig_low, orig_high;
+
+high = value >> 32;
+low

[Qemu-devel] [PULL 21/22] block: introduce block_account_one_io

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

This is the common code to account operations that produced actual I/O.

Reviewed-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-18-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/accounting.c | 51 ++-
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index 3f457c4..a279e0b 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -86,7 +86,8 @@ void block_acct_start(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
 cookie->type = type;
 }
 
-void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
+static void block_account_one_io(BlockAcctStats *stats, BlockAcctCookie 
*cookie,
+ bool failed)
 {
 BlockAcctTimedStats *s;
 int64_t time_ns = qemu_clock_get_ns(clock_type);
@@ -98,31 +99,14 @@ void block_acct_done(BlockAcctStats *stats, BlockAcctCookie 
*cookie)
 
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
-stats->nr_bytes[cookie->type] += cookie->bytes;
-stats->nr_ops[cookie->type]++;
-stats->total_time_ns[cookie->type] += latency_ns;
-stats->last_access_time_ns = time_ns;
-
-QSLIST_FOREACH(s, &stats->intervals, entries) {
-timed_average_account(&s->latency[cookie->type], latency_ns);
+if (failed) {
+stats->failed_ops[cookie->type]++;
+} else {
+stats->nr_bytes[cookie->type] += cookie->bytes;
+stats->nr_ops[cookie->type]++;
 }
-}
-
-void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
-{
-assert(cookie->type < BLOCK_MAX_IOTYPE);
-
-stats->failed_ops[cookie->type]++;
-
-if (stats->account_failed) {
-BlockAcctTimedStats *s;
-int64_t time_ns = qemu_clock_get_ns(clock_type);
-int64_t latency_ns = time_ns - cookie->start_time_ns;
-
-if (qtest_enabled()) {
-latency_ns = qtest_latency_ns;
-}
 
+if (!failed || stats->account_failed) {
 stats->total_time_ns[cookie->type] += latency_ns;
 stats->last_access_time_ns = time_ns;
 
@@ -132,15 +116,24 @@ void block_acct_failed(BlockAcctStats *stats, 
BlockAcctCookie *cookie)
 }
 }
 
+void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
+{
+block_account_one_io(stats, cookie, false);
+}
+
+void block_acct_failed(BlockAcctStats *stats, BlockAcctCookie *cookie)
+{
+block_account_one_io(stats, cookie, true);
+}
+
 void block_acct_invalid(BlockAcctStats *stats, enum BlockAcctType type)
 {
 assert(type < BLOCK_MAX_IOTYPE);
 
-/* block_acct_done() and block_acct_failed() update
- * total_time_ns[], but this one does not. The reason is that
- * invalid requests are accounted during their submission,
- * therefore there's no actual I/O involved. */
-
+/* block_account_one_io() updates total_time_ns[], but this one does
+ * not.  The reason is that invalid requests are accounted during their
+ * submission, therefore there's no actual I/O involved.
+ */
 stats->invalid_ops[type]++;
 
 if (stats->account_invalid) {
-- 
2.9.4




[Qemu-devel] [PULL 22/22] block: make accounting thread-safe

2017-05-26 Thread Fam Zheng
From: Paolo Bonzini 

I'm not trying too hard yet.  Later, with multiqueue support,
this may cause mutex contention or cacheline bouncing.

Reviewed-by: Alberto Garcia 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
Message-Id: <20170525163225.29954-19-pbonz...@redhat.com>
Signed-off-by: Fam Zheng 
---
 block/accounting.c | 16 
 include/block/accounting.h |  8 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/block/accounting.c b/block/accounting.c
index a279e0b..37ed66f 100644
--- a/block/accounting.c
+++ b/block/accounting.c
@@ -35,6 +35,7 @@ static const int qtest_latency_ns = NANOSECONDS_PER_SECOND / 
1000;
 void block_acct_init(BlockAcctStats *stats, bool account_invalid,
  bool account_failed)
 {
+qemu_mutex_init(&stats->lock);
 stats->account_invalid = account_invalid;
 stats->account_failed = account_failed;
 
@@ -49,6 +50,7 @@ void block_acct_cleanup(BlockAcctStats *stats)
 QSLIST_FOREACH_SAFE(s, &stats->intervals, entries, next) {
 g_free(s);
 }
+qemu_mutex_destroy(&stats->lock);
 }
 
 void block_acct_add_interval(BlockAcctStats *stats, unsigned interval_length)
@@ -58,12 +60,15 @@ void block_acct_add_interval(BlockAcctStats *stats, 
unsigned interval_length)
 
 s = g_new0(BlockAcctTimedStats, 1);
 s->interval_length = interval_length;
+s->stats = stats;
+qemu_mutex_lock(&stats->lock);
 QSLIST_INSERT_HEAD(&stats->intervals, s, entries);
 
 for (i = 0; i < BLOCK_MAX_IOTYPE; i++) {
 timed_average_init(&s->latency[i], clock_type,
(uint64_t) interval_length * 
NANOSECONDS_PER_SECOND);
 }
+qemu_mutex_unlock(&stats->lock);
 }
 
 BlockAcctTimedStats *block_acct_interval_next(BlockAcctStats *stats,
@@ -99,6 +104,8 @@ static void block_account_one_io(BlockAcctStats *stats, 
BlockAcctCookie *cookie,
 
 assert(cookie->type < BLOCK_MAX_IOTYPE);
 
+qemu_mutex_lock(&stats->lock);
+
 if (failed) {
 stats->failed_ops[cookie->type]++;
 } else {
@@ -114,6 +121,8 @@ static void block_account_one_io(BlockAcctStats *stats, 
BlockAcctCookie *cookie,
 timed_average_account(&s->latency[cookie->type], latency_ns);
 }
 }
+
+qemu_mutex_unlock(&stats->lock);
 }
 
 void block_acct_done(BlockAcctStats *stats, BlockAcctCookie *cookie)
@@ -134,18 +143,23 @@ void block_acct_invalid(BlockAcctStats *stats, enum 
BlockAcctType type)
  * not.  The reason is that invalid requests are accounted during their
  * submission, therefore there's no actual I/O involved.
  */
+qemu_mutex_lock(&stats->lock);
 stats->invalid_ops[type]++;
 
 if (stats->account_invalid) {
 stats->last_access_time_ns = qemu_clock_get_ns(clock_type);
 }
+qemu_mutex_unlock(&stats->lock);
 }
 
 void block_acct_merge_done(BlockAcctStats *stats, enum BlockAcctType type,
   int num_requests)
 {
 assert(type < BLOCK_MAX_IOTYPE);
+
+qemu_mutex_lock(&stats->lock);
 stats->merged[type] += num_requests;
+qemu_mutex_unlock(&stats->lock);
 }
 
 int64_t block_acct_idle_time_ns(BlockAcctStats *stats)
@@ -160,7 +174,9 @@ double block_acct_queue_depth(BlockAcctTimedStats *stats,
 
 assert(type < BLOCK_MAX_IOTYPE);
 
+qemu_mutex_lock(&stats->stats->lock);
 sum = timed_average_sum(&stats->latency[type], &elapsed);
+qemu_mutex_unlock(&stats->stats->lock);
 
 return (double) sum / elapsed;
 }
diff --git a/include/block/accounting.h b/include/block/accounting.h
index 2089163..2687be7 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -26,8 +26,10 @@
 #define BLOCK_ACCOUNTING_H
 
 #include "qemu/timed-average.h"
+#include "qemu/thread.h"
 
 typedef struct BlockAcctTimedStats BlockAcctTimedStats;
+typedef struct BlockAcctStats BlockAcctStats;
 
 enum BlockAcctType {
 BLOCK_ACCT_READ,
@@ -37,12 +39,14 @@ enum BlockAcctType {
 };
 
 struct BlockAcctTimedStats {
+BlockAcctStats *stats;
 TimedAverage latency[BLOCK_MAX_IOTYPE];
 unsigned interval_length; /* in seconds */
 QSLIST_ENTRY(BlockAcctTimedStats) entries;
 };
 
-typedef struct BlockAcctStats {
+struct BlockAcctStats {
+QemuMutex lock;
 uint64_t nr_bytes[BLOCK_MAX_IOTYPE];
 uint64_t nr_ops[BLOCK_MAX_IOTYPE];
 uint64_t invalid_ops[BLOCK_MAX_IOTYPE];
@@ -53,7 +57,7 @@ typedef struct BlockAcctStats {
 QSLIST_HEAD(, BlockAcctTimedStats) intervals;
 bool account_invalid;
 bool account_failed;
-} BlockAcctStats;
+};
 
 typedef struct BlockAcctCookie {
 int64_t bytes;
-- 
2.9.4




Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

2017-05-26 Thread Kevin Wolf
Am 23.05.2017 um 13:22 hat Alberto Garcia geschrieben:
> Instead of calling perform_cow() twice with a different COW region
> each time, call it just once and make perform_cow() handle both
> regions.
> 
> This patch simply moves code around. The next one will do the actual
> reordering of the COW operations.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 38 +++---
>  1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 6757875ec9..59a0ffba1a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState 
> *bs,
>  struct iovec iov;
>  int ret;
>  
> +if (bytes == 0) {
> +return 0;
> +}
> +
>  iov.iov_len = bytes;
>  iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
>  if (iov.iov_base == NULL) {
> @@ -751,31 +755,40 @@ uint64_t 
> qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>  return cluster_offset;
>  }
>  
> -static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion 
> *r)
> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>  {
>  BDRVQcow2State *s = bs->opaque;
> +Qcow2COWRegion *start = &m->cow_start;
> +Qcow2COWRegion *end = &m->cow_end;
>  int ret;
>  
> -if (r->nb_bytes == 0) {
> +if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>  return 0;
>  }

With this change, it can now happen that we call do_perform_cow() with
bytes == 0. Maybe it would be better for do_perform_cow() to check bytes
first so that it doesn't bdrv_co_preadv/pwritev() for 0 bytes?

Kevin



Re: [Qemu-devel] [PATCH v1 05/13] qcow2: set inactive flag

2017-05-26 Thread Kevin Wolf
Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:
> Qcow2State and BlockDriverState flags have to be in sync
> 
> Signed-off-by: Anton Nefedov 
> ---
>  block/qcow2.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 6e7ce96..07c1706 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1939,6 +1939,7 @@ static int qcow2_inactivate(BlockDriverState *bs)
>  
>  if (result == 0) {
>  qcow2_mark_clean(bs);
> +s->flags |= BDRV_O_INACTIVE;
>  }

Good catch.

But can't we simply use bs->open_flags and completely get rid of
s->flags?

Kevin



Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Kevin Wolf
Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:
> From: "Denis V. Lunev" 
> 
> Currently each single write operation can result in 3 write operations
> if guest offsets are not cluster aligned. One write is performed for the
> real payload and two for COW-ed areas. Thus the data possibly lays
> non-contiguously on the host filesystem. This will reduce further
> sequential read performance significantly.
> 
> The patch allocates the space in the file with cluster granularity,
> ensuring
>   1. better host offset locality
>   2. less space allocation operations
>  (which can be expensive on distributed storages)
> 
> Signed-off-by: Denis V. Lunev 
> Signed-off-by: Anton Nefedov 

I don't think this is the right approach. You end up with two
write operations: One write_zeroes and then one data write. If the
backend actually supports efficient zero writes, write_zeroes won't
necessarily allocate space, but writing data will definitely split
the already existing allocation. If anything, we need a new
bdrv_allocate() or something that would call fallocate() instead of
abusing write_zeroes.

It seems much clearer to me that simply unifying the three write
requests into a single one is an improvement. And it's easy to do, I
even had a patch once to do this. The reason that I didn't send it was
that it seemed to conflict with the data cache approach

>  block/qcow2.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index a8d61f0..2e6a0ec 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1575,6 +1575,32 @@ fail:
>  return ret;
>  }
>  
> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +BlockDriverState *file = bs->file->bs;
> +QCowL2Meta *m;
> +int ret;
> +
> +for (m = l2meta; m != NULL; m = m->next) {
> +uint64_t bytes = m->nb_clusters << s->cluster_bits;
> +
> +if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
> +continue;
> +}
> +
> +/* try to alloc host space in one chunk for better locality */
> +ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 
> 0);

No. This is what you bypass:

* All sanity checks that the block layer does

* bdrv_inc/dec_in_flight(), which is required for drain to work
  correctly. Not doing this will cause crashes.

* tracked_request_begin/end(), mark_request_serialising() and
  wait_serialising_requests(), which are required for serialising
  requests to work correctly

* Ensuring correct request alignment for file. This means that e.g.
  qcow2 with cluster size 512 on a host with a 4k native disk will
  break.

* blkdebug events

* before_write_notifiers. Not calling these will cause corrupted backups
  if someone backups file.

* Dirty bitmap updates

* Updating write_gen, wr_highest_offset and total_sectors

* Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
  respected

And these are just the obvious things. I'm sure I missed some.

> +if (ret != 0) {
> +continue;
> +}
> +
> +file->total_sectors = MAX(file->total_sectors,
> +  (m->alloc_offset + bytes) / 
> BDRV_SECTOR_SIZE);

You only compensate for part of a single item in the list above, by
duplicating code with block/io.c. This is not how to do things.

As I said above, I think you don't really want write_zeroes anyway, but
if you wanted a write_zeroes "but only if it's efficient" (which I'm not
sure is a good thing to want), then a better way might be introducing a
new request flag.

> +}
> +}
> +
>  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t 
> offset,
>   uint64_t bytes, QEMUIOVector *qiov,
>   int flags)

Kevin



[Qemu-devel] [PATCH] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Mao Zhongyi
ioh3420_interrupts_init() and its callers, rp_realize() and
pci_qdev_realize() fill error message to local_err, then
propagate it to errp by error_porpagate(), which's not necessary.
So eliminate it and pass errp directly instead of local_err.
Of course, error_propagate() also will be removed.

Signed-off-by: Mao Zhongyi 
---
 hw/pci-bridge/ioh3420.c|  4 +---
 hw/pci-bridge/pcie_root_port.c |  7 ++-
 hw/pci/pci.c   | 11 ---
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index da4e5bd..5f56a2f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -64,15 +64,13 @@ static uint8_t ioh3420_aer_vector(const PCIDevice *d)
 static int ioh3420_interrupts_init(PCIDevice *d, Error **errp)
 {
 int rc;
-Error *local_err = NULL;
 
 rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
-  &local_err);
+  errp);
 if (rc < 0) {
 assert(rc == -ENOTSUP);
-error_propagate(errp, local_err);
 }
 
 return rc;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index cf36318..9022996 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -59,7 +59,6 @@ static void rp_realize(PCIDevice *d, Error **errp)
 PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
 PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
 int rc;
-Error *local_err = NULL;
 
 pci_config_set_interrupt_pin(d->config, 1);
 pci_bridge_initfn(d, TYPE_PCIE_BUS);
@@ -72,9 +71,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
 }
 
 if (rpc->interrupts_init) {
-rc = rpc->interrupts_init(d, &local_err);
+rc = rpc->interrupts_init(d, errp);
 if (rc < 0) {
-error_propagate(errp, local_err);
 goto err_bridge;
 }
 }
@@ -98,9 +96,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
 }
 
 rc = pcie_aer_init(d, PCI_ERR_VER, rpc->aer_offset,
-   PCI_ERR_SIZEOF, &local_err);
+   PCI_ERR_SIZEOF, errp);
 if (rc < 0) {
-error_propagate(errp, local_err);
 goto err;
 }
 pcie_aer_root_init(d);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 98ccc27..0d4a3ff 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1982,7 +1982,6 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 {
 PCIDevice *pci_dev = (PCIDevice *)qdev;
 PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
-Error *local_err = NULL;
 PCIBus *bus;
 bool is_default_rom;
 
@@ -1999,9 +1998,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 return;
 
 if (pc->realize) {
-pc->realize(pci_dev, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+pc->realize(pci_dev, errp);
+if (errp && *errp) {
 do_pci_unregister_device(pci_dev);
 return;
 }
@@ -2014,9 +2012,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 is_default_rom = true;
 }
 
-pci_add_option_rom(pci_dev, is_default_rom, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+pci_add_option_rom(pci_dev, is_default_rom, errp);
+if (errp && *errp) {
 pci_qdev_unrealize(DEVICE(pci_dev), NULL);
 return;
 }
-- 
2.9.3






Re: [Qemu-devel] [PULL 00/22] Docker and block patches

2017-05-26 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20170526075246.20265-1-f...@redhat.com
Subject: [Qemu-devel] [PULL 00/22] Docker and block patches

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20170526082925.2888-1-maozy.f...@cn.fujitsu.com -> 
patchew/20170526082925.2888-1-maozy.f...@cn.fujitsu.com
Switched to a new branch 'test'
16cb74a block: make accounting thread-safe
8239635 block: introduce block_account_one_io
63bcf13 block: protect modification of dirty bitmaps with a mutex
f58097c migration/block: reset dirty bitmap before reading
ed83d5e block: introduce dirty_bitmap_mutex
f046d70 block: protect tracked_requests and flush_queue with reqs_lock
a7745ae block: access write_gen with atomics
1541938 block: use Stat64 for wr_highest_offset
6ebccfc util: add stats64 module
d0f217d throttle-groups: protect throttled requests with a CoMutex
0e06e5a throttle-groups: do not use qemu_co_enter_next
31d6860 throttle-groups: only start one coroutine from drained_begin
68c0e16 block: access io_plugged with atomic ops
621c393 block: access wakeup with atomic ops
c1c29f6 block: access serialising_in_flight with atomic ops
87f6654 block: access io_limits_disabled with atomic ops
16e5db1 block: access quiesce_counter with atomic ops
018cb8b block: access copy_on_read with atomic ops
f57b4b1 docker: Add flex and bison to centos6 image
6ac5045 docker: Add libaio to fedora image
7a2103e docker: Add bzip2 and hostname to fedora image
7e83232 docker: Run tests with current user

=== OUTPUT BEGIN ===
Checking PATCH 1/22: docker: Run tests with current user...
Checking PATCH 2/22: docker: Add bzip2 and hostname to fedora image...
Checking PATCH 3/22: docker: Add libaio to fedora image...
Checking PATCH 4/22: docker: Add flex and bison to centos6 image...
Checking PATCH 5/22: block: access copy_on_read with atomic ops...
Checking PATCH 6/22: block: access quiesce_counter with atomic ops...
Checking PATCH 7/22: block: access io_limits_disabled with atomic ops...
Checking PATCH 8/22: block: access serialising_in_flight with atomic ops...
Checking PATCH 9/22: block: access wakeup with atomic ops...
Checking PATCH 10/22: block: access io_plugged with atomic ops...
Checking PATCH 11/22: throttle-groups: only start one coroutine from 
drained_begin...
Checking PATCH 12/22: throttle-groups: do not use qemu_co_enter_next...
Checking PATCH 13/22: throttle-groups: protect throttled requests with a 
CoMutex...
Checking PATCH 14/22: util: add stats64 module...
ERROR: memory barrier without comment
#330: FILE: util/stats64.c:101:
+smp_wmb();

ERROR: memory barrier without comment
#359: FILE: util/stats64.c:130:
+smp_wmb();

total: 2 errors, 0 warnings, 334 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 15/22: block: use Stat64 for wr_highest_offset...
Checking PATCH 16/22: block: access write_gen with atomics...
Checking PATCH 17/22: block: protect tracked_requests and flush_queue with 
reqs_lock...
Checking PATCH 18/22: block: introduce dirty_bitmap_mutex...
Checking PATCH 19/22: migration/block: reset dirty bitmap before reading...
Checking PATCH 20/22: block: protect modification of dirty bitmaps with a 
mutex...
WARNING: line over 80 characters
#306: FILE: migration/block.c:539:
+bdrv_reset_dirty_bitmap_locked(bmds->dirty_bitmap, sector, 
nr_sectors);

total: 0 errors, 1 warnings, 272 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 21/22: block: introduce block_account_one_io...
Checking PATCH 22/22: block: make accounting thread-safe...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PULL v2 00/20] Misc patches for 2017-05-19

2017-05-26 Thread Vladimir Sementsov-Ogievskiy

Hi!

What about this? Looks incomplete, only two patches here...

24.05.2017 19:11, Paolo Bonzini wrote:

The following changes since commit 56821559f0ba682fe6b367815572e6f974d329ab:

   Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging 
(2017-05-18 13:36:15 +0100)

are available in the git repository at:

   git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 46465ffebc75edd75091f4eb0963fd48ce7c02b6:

   target/i386: use multiple CPU AddressSpaces (2017-05-24 18:09:03 +0200)

v2 fixes coding style errors in vhost-user-scsi.


* virtio-scsi use-after-free fix (Fam)
* vhost-user-scsi support (Felipe)
* SMM fixes and improvements for TCG (myself)
* irqchip and AddressSpaceDispatch cleanups and fixes (Peter)
* Coverity fix (Stefano)
* NBD cleanups (Vladimir)
* RTC accuracy improvements and code cleanups (Guangrong+Yunfang)


Fam Zheng (1):
   virtio-scsi: Unset hotplug handler when unrealize

Felipe Franciosi (2):
   vhost-user-scsi: Introduce vhost-user-scsi host device
   vhost-user-scsi: Introduce a vhost-user-scsi sample application

Paolo Bonzini (2):
   target/i386: enable A20 automatically in system management mode
   target/i386: use multiple CPU AddressSpaces

Peter Xu (4):
   kvm: irqchip: trace changes on msi add/remove
   msix: trace control bit write op
   kvm: irqchip: skip update msi when disabled
   exec: simplify phys_page_find() params

Stefano Stabellini (1):
   Check the return value of fcntl in qemu_set_cloexec

Tai Yunfang (1):
   mc146818rtc: precisely count the clock for periodic timer

Vladimir Sementsov-Ogievskiy (5):
   nbd: strict nbd_wr_syncv
   nbd: read_sync and friends: return 0 on success
   nbd: add errp parameter to nbd_wr_syncv()
   nbd: add errp to read_sync, write_sync and drop_sync
   nbd/client.c: use errp instead of LOG

Xiao Guangrong (4):
   mc146818rtc: update periodic timer only if it is needed
   mc146818rtc: ensure LOST_TICK_POLICY_SLEW is only enabled on TARGET_I386
   mc146818rtc: drop unnecessary '#ifdef TARGET_I386'
   mc146818rtc: embrace all x86 specific code

  .gitignore|   1 +
  Makefile  |   3 +
  Makefile.objs |   4 +
  block/nbd-client.c|  11 +-
  contrib/libvhost-user/libvhost-user.h |  11 +-
  contrib/vhost-user-scsi/Makefile.objs |   1 +
  contrib/vhost-user-scsi/vhost-user-scsi.c | 886 ++
  default-configs/pci.mak   |   1 +
  default-configs/s390x-softmmu.mak |   1 +
  exec.c|  13 +-
  hw/pci/msix.c |  11 +-
  hw/pci/trace-events   |   3 +
  hw/scsi/Makefile.objs |   1 +
  hw/scsi/vhost-user-scsi.c | 211 +++
  hw/scsi/virtio-scsi.c |   3 +
  hw/timer/mc146818rtc.c| 206 ---
  hw/virtio/virtio-pci.c|  54 ++
  hw/virtio/virtio-pci.h|  11 +
  include/block/nbd.h   |   8 +-
  include/hw/virtio/vhost-user-scsi.h   |  35 ++
  include/hw/virtio/virtio-scsi.h   |   2 +
  kvm-all.c |   4 +-
  nbd/client.c  | 125 ++---
  nbd/common.c  |  23 +-
  nbd/nbd-internal.h|  40 +-
  nbd/server.c  |  92 ++--
  qemu-nbd.c|   3 +-
  target/i386/arch_memory_mapping.c |  18 +-
  target/i386/cpu.c |  15 +-
  target/i386/cpu.h |  20 +-
  target/i386/helper.c  |  96 ++--
  target/i386/kvm.c |  12 +-
  target/i386/machine.c |   4 -
  target/i386/smm_helper.c  |  18 -
  trace-events  |   3 +-
  util/oslib-posix.c|   4 +-
  36 files changed, 1643 insertions(+), 311 deletions(-)
  create mode 100644 contrib/vhost-user-scsi/Makefile.objs
  create mode 100644 contrib/vhost-user-scsi/vhost-user-scsi.c
  create mode 100644 hw/scsi/vhost-user-scsi.c
  create mode 100644 include/hw/virtio/vhost-user-scsi.h



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Denis V. Lunev
On 05/26/2017 11:11 AM, Kevin Wolf wrote:
> Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:
>> From: "Denis V. Lunev" 
>>
>> Currently each single write operation can result in 3 write operations
>> if guest offsets are not cluster aligned. One write is performed for the
>> real payload and two for COW-ed areas. Thus the data possibly lays
>> non-contiguously on the host filesystem. This will reduce further
>> sequential read performance significantly.
>>
>> The patch allocates the space in the file with cluster granularity,
>> ensuring
>>   1. better host offset locality
>>   2. less space allocation operations
>>  (which can be expensive on distributed storages)
>>
>> Signed-off-by: Denis V. Lunev 
>> Signed-off-by: Anton Nefedov 
> I don't think this is the right approach. You end up with two
> write operations: One write_zeroes and then one data write. If the
> backend actually supports efficient zero writes, write_zeroes won't
> necessarily allocate space, but writing data will definitely split
> the already existing allocation. If anything, we need a new
> bdrv_allocate() or something that would call fallocate() instead of
> abusing write_zeroes.
great idea. Very nice then.

> It seems much clearer to me that simply unifying the three write
> requests into a single one is an improvement. And it's easy to do, I
> even had a patch once to do this. The reason that I didn't send it was
> that it seemed to conflict with the data cache approach
These changes help a lot on 2 patterns:
- writing 4Kb into the middle of the block. The bigger the block size,
  more we gain
- sequential write, which is not sector aligned and comes in 2 requests:
  we perform allocation, which should be significantly faster than actual
  write and after that both parts of the write can be executed in parallel.
At my opinion both cases are frequent and important.

But OK, the code should be improved, you are absolutely right here.

>>  block/qcow2.c | 32 +++-
>>  1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index a8d61f0..2e6a0ec 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1575,6 +1575,32 @@ fail:
>>  return ret;
>>  }
>>  
>> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>> +{
>> +BDRVQcow2State *s = bs->opaque;
>> +BlockDriverState *file = bs->file->bs;
>> +QCowL2Meta *m;
>> +int ret;
>> +
>> +for (m = l2meta; m != NULL; m = m->next) {
>> +uint64_t bytes = m->nb_clusters << s->cluster_bits;
>> +
>> +if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
>> +continue;
>> +}
>> +
>> +/* try to alloc host space in one chunk for better locality */
>> +ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, 
>> bytes, 0);
> No. This is what you bypass:
>
> * All sanity checks that the block layer does
>
> * bdrv_inc/dec_in_flight(), which is required for drain to work
>   correctly. Not doing this will cause crashes.
>
> * tracked_request_begin/end(), mark_request_serialising() and
>   wait_serialising_requests(), which are required for serialising
>   requests to work correctly
>
> * Ensuring correct request alignment for file. This means that e.g.
>   qcow2 with cluster size 512 on a host with a 4k native disk will
>   break.
>
> * blkdebug events
>
> * before_write_notifiers. Not calling these will cause corrupted backups
>   if someone backups file.
>
> * Dirty bitmap updates
>
> * Updating write_gen, wr_highest_offset and total_sectors
>
> * Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
>   respected
>
> And these are just the obvious things. I'm sure I missed some.
>

You seems right. I have not though about that from this angle.

>> +if (ret != 0) {
>> +continue;
>> +}
>> +
>> +file->total_sectors = MAX(file->total_sectors,
>> +  (m->alloc_offset + bytes) / 
>> BDRV_SECTOR_SIZE);
> You only compensate for part of a single item in the list above, by
> duplicating code with block/io.c. This is not how to do things.
>
> As I said above, I think you don't really want write_zeroes anyway, but
> if you wanted a write_zeroes "but only if it's efficient" (which I'm not
> sure is a good thing to want), then a better way might be introducing a
> new request flag.
>
>> +}
>> +}
>> +
>>  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t 
>> offset,
>>   uint64_t bytes, QEMUIOVector *qiov,
>>   int flags)
> Kevin

Thank you for review and ideas ;)

Den



Re: [Qemu-devel] [PULL v2 00/20] Misc patches for 2017-05-19

2017-05-26 Thread Alex Bennée

Vladimir Sementsov-Ogievskiy  writes:

> Hi!
>
> What about this? Looks incomplete, only two patches here...

A pull request is merged from a git tree - the patches on the list are
informational only. If you need to re-issue a pull request it is
acceptable to post it with just the tweaked patches for information.

--
Alex Bennée



Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

2017-05-26 Thread Alberto Garcia
On Fri 26 May 2017 10:11:29 AM CEST, Kevin Wolf  wrote:
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState 
>> *bs,
>>  struct iovec iov;
>>  int ret;
>>  
>> +if (bytes == 0) {
>> +return 0;
>> +}
>> +

   [...]

>> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
>>  {
>>  BDRVQcow2State *s = bs->opaque;
>> +Qcow2COWRegion *start = &m->cow_start;
>> +Qcow2COWRegion *end = &m->cow_end;
>>  int ret;
>>  
>> -if (r->nb_bytes == 0) {
>> +if (start->nb_bytes == 0 && end->nb_bytes == 0) {
>>  return 0;
>>  }
>
> With this change, it can now happen that we call do_perform_cow() with
> bytes == 0.

Yes, but see the change I made to do_perform_cow() in the same patch
(quoted above).

Berto



[Qemu-devel] [PATCH 0/3] arm/virt: refine virt.c code and implement hot_add_cpu interface

2017-05-26 Thread Li Zhang
From: Li Zhang 

virt machine doesn't support hot_add_cpu interface. This patchset is to 
implement
hot_add_cpu interface. A CPU can be added by QMP command with QEMU monitor.

Here is command to add a CPU with QMP command.

* qemu-system-aarch64 -machine virt -cpu cortex-a15 -smp 1,maxcpus=4 \
-monitor telnet:127.0.0.1:,server,nowait -nographic
* connect monitor:  telnet 127.0.0.1 
* execute qmp command:  cpu-add 1

Currently, when KVM is enabled and add a CPU with QMP command, QEMU 
reports error "kvm_init_vcpu failed: Device or resourc busy". KVM can't
create a new CPU when vgic has been initialized and irqchip_in_kernel in
function kvm_arch_vcpu_create. It needs to change KVM code in the future.

Li Zhang (3):
  arm/virt: Refine fdt_add_cpu_nodes code
  arm/virt: Refine code of machvirt_init
  arm/virt: Implement hot_add_cpu interface

 hw/arm/virt.c | 344 ++
 1 file changed, 202 insertions(+), 142 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH 2/3] arm/virt: Refine code of machvirt_init

2017-05-26 Thread Li Zhang
From: Li Zhang 

This patch is to refine cpu related in machvirt_init and
add virt_new_cpu function which can be called by hot_add_cpu.

Signed-off-by: Li Zhang 
---
 hw/arm/virt.c | 231 --
 1 file changed, 128 insertions(+), 103 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 73c3cf7..31314c1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -110,6 +110,8 @@ static ARMPlatformBusSystemParams platform_bus_params;
 #define RAMLIMIT_GB 255
 #define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
 
+static MemoryRegion *secure_sysmem;
+
 /* Addresses and sizes of our components.
  * 0..128MB is space for a flash device so we can run bootrom code such as 
UEFI.
  * 128MB..256MB is used for miscellaneous device I/O.
@@ -415,6 +417,116 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 }
 }
 
+static void virt_new_cpu(MachineState *ms, int id, Error **errp)
+{
+const char *typename;
+char **cpustr;
+int node_id;
+
+ObjectClass *oc;
+CPUClass *cc;
+CPUState *cs;
+Object *cpuobj;
+const CPUArchIdList *possible_cpus;
+
+Error *err = NULL;
+const char *cpu_model = ms->cpu_model;
+MemoryRegion *sysmem = get_system_memory();
+
+VirtMachineState *vms = VIRT_MACHINE(ms);
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+CPU_FOREACH(cs) {
+if (cs->cpu_index == id) {
+error_report("CPU %d has been created.", id);
+return;
+}
+}
+possible_cpus = mc->possible_cpu_arch_ids(ms);
+/* Separate the actual CPU model name from any appended features */
+cpustr = g_strsplit(cpu_model, ",", 2);
+if (!cpuname_valid(cpustr[0])) {
+error_report("mach-virt: CPU %s not supported", cpustr[0]);
+goto out;
+}
+oc = cpu_class_by_name(TYPE_ARM_CPU, cpustr[0]);
+if (!oc) {
+error_report("Unable to find CPU definition");
+goto out;
+}
+
+typename = object_class_get_name(oc);
+
+cc = CPU_CLASS(oc);
+cc->parse_features(typename, cpustr[1], &err);
+if (err) {
+error_report_err(err);
+goto out;
+}
+
+cpuobj = object_new(typename);
+object_property_set_int(cpuobj, possible_cpus->cpus[id].arch_id,
+"mp-affinity", NULL);
+cs = CPU(cpuobj);
+cs->cpu_index = id;
+
+node_id = possible_cpus->cpus[cs->cpu_index].props.node_id;
+if (!possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
+/* by default CPUState::numa_node was 0 if it's not set via CLI
+ * keep it this way for now but in future we probably should
+ * refuse to start up with incomplete numa mapping */
+node_id = 0;
+}
+if (cs->numa_node == CPU_UNSET_NUMA_NODE_ID) {
+cs->numa_node = node_id;
+} else {
+/* CPU isn't device_add compatible yet, this shouldn't happen */
+error_setg(&error_abort, "user set node-id not implemented");
+}
+
+if (!vms->secure) {
+object_property_set_bool(cpuobj, false, "has_el3", NULL);
+}
+
+if (!vms->virt && object_property_find(cpuobj, "has_el2", NULL)) {
+object_property_set_bool(cpuobj, false, "has_el2", NULL);
+}
+
+if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
+object_property_set_int(cpuobj, vms->psci_conduit,
+"psci-conduit", NULL);
+/* Secondary CPUs start in PSCI powered-down state */
+if (id > 0) {
+object_property_set_bool(cpuobj, true, "start-powered-off", NULL);
+}
+}
+
+if (vmc->no_pmu && object_property_find(cpuobj, "pmu", NULL)) {
+object_property_set_bool(cpuobj, false, "pmu", NULL);
+}
+
+if (object_property_find(cpuobj, "reset-cbar", NULL)) {
+object_property_set_int(cpuobj, vms->memmap[VIRT_CPUPERIPHS].base,
+"reset-cbar", &error_abort);
+}
+
+object_property_set_link(cpuobj, OBJECT(sysmem), "memory", &error_abort);
+
+if (vms->secure) {
+object_property_set_link(cpuobj, OBJECT(secure_sysmem),
+  "secure-memory", &error_abort);
+}
+
+object_property_set_int(cpuobj, id, "id", NULL);
+object_property_set_bool(cpuobj, true, "realized", NULL);
+object_unref(cpuobj);
+
+out:
+g_strfreev(cpustr);
+error_propagate(errp, err);
+}
+
 static void fdt_add_its_gic_node(VirtMachineState *vms)
 {
 vms->msi_phandle = qemu_fdt_alloc_phandle(vms->fdt);
@@ -1237,35 +1349,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void machvirt_init(MachineState *machine)
 {
 VirtMachineState *vms = VIRT_MACHINE(machine);
-VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus;
 qemu_irq pic[NUM_IRQS];
 MemoryRegio

[Qemu-devel] [PATCH 1/3] arm/virt: Refine fdt_add_cpu_nodes code

2017-05-26 Thread Li Zhang
From: Li Zhang 

Refind fdt_add_cpu_nodes code and add a new function fdt_add_cpu_node,
which can be called by hot_add_cpu function.

Signed-off-by: Li Zhang 
---
 hw/arm/virt.c | 106 +-
 1 file changed, 61 insertions(+), 45 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c7c8159..73c3cf7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -334,68 +334,84 @@ static void fdt_add_timer_nodes(const VirtMachineState 
*vms)
GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags);
 }
 
-static void fdt_add_cpu_nodes(const VirtMachineState *vms)
+/*
+ * From Documentation/devicetree/bindings/arm/cpus.txt
+ *  On ARM v8 64-bit systems value should be set to 2,
+ *  that corresponds to the MPIDR_EL1 register size.
+ *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
+ *  in the system, #address-cells can be set to 1, since
+ *  MPIDR_EL1[63:32] bits are not used for CPUs
+ *  identification.
+ *
+ *  Here we actually don't know whether our system is 32- or 64-bit one.
+ *  The simplest way to go is to examine affinity IDs of all our CPUs. If
+ *  at least one of them has Aff3 populated, we set #address-cells to 2.
+ */
+
+static int fdt_get_addr_cells(const VirtMachineState *vms)
 {
 int cpu;
 int addr_cells = 1;
-const MachineState *ms = MACHINE(vms);
 
-/*
- * From Documentation/devicetree/bindings/arm/cpus.txt
- *  On ARM v8 64-bit systems value should be set to 2,
- *  that corresponds to the MPIDR_EL1 register size.
- *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
- *  in the system, #address-cells can be set to 1, since
- *  MPIDR_EL1[63:32] bits are not used for CPUs
- *  identification.
- *
- *  Here we actually don't know whether our system is 32- or 64-bit one.
- *  The simplest way to go is to examine affinity IDs of all our CPUs. If
- *  at least one of them has Aff3 populated, we set #address-cells to 2.
- */
 for (cpu = 0; cpu < vms->smp_cpus; cpu++) {
 ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
-
 if (armcpu->mp_affinity & ARM_AFF3_MASK) {
 addr_cells = 2;
 break;
 }
 }
+return addr_cells;
+}
 
-qemu_fdt_add_subnode(vms->fdt, "/cpus");
-qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells);
-qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0);
 
-for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) {
-char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
-ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
-CPUState *cs = CPU(armcpu);
+static void fdt_add_cpu_node(const VirtMachineState *vms, int cpu)
+{
+char *nodename;
+int addr_cells = fdt_get_addr_cells(vms);
+const MachineState *ms = MACHINE(vms);
+ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
+CPUState *cs = CPU(armcpu);
 
-qemu_fdt_add_subnode(vms->fdt, nodename);
-qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
-qemu_fdt_setprop_string(vms->fdt, nodename, "compatible",
-armcpu->dtb_compatible);
-
-if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED
-&& vms->smp_cpus > 1) {
-qemu_fdt_setprop_string(vms->fdt, nodename,
-"enable-method", "psci");
-}
+if (cpu < 0 || cpu >= max_cpus) {
+error_report("Invalid cpu index.");
+return;
+}
 
-if (addr_cells == 2) {
-qemu_fdt_setprop_u64(vms->fdt, nodename, "reg",
- armcpu->mp_affinity);
-} else {
-qemu_fdt_setprop_cell(vms->fdt, nodename, "reg",
-  armcpu->mp_affinity);
-}
+nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
 
-if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
-qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
-ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
-}
+qemu_fdt_add_subnode(vms->fdt, nodename);
+qemu_fdt_setprop_string(vms->fdt, nodename, "device_type", "cpu");
+qemu_fdt_setprop_string(vms->fdt, nodename, "compatible",
+armcpu->dtb_compatible);
+if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED) {
+qemu_fdt_setprop_string(vms->fdt, nodename, "enable-method", "psci");
+}
 
-g_free(nodename);
+if (addr_cells == 2) {
+qemu_fdt_setprop_u64(vms->fdt, nodename, "reg", armcpu->mp_affinity);
+} else {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "reg", armcpu->mp_affinity);
+}
+
+if (ms->possible_cpus->cpus[cs->cpu_index].props.has_node_id) {
+qemu_fdt_setprop_cell(vms->fdt, nodename, "numa-node-id",
+ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
+}
+
+g_free(nodename);
+}
+
+static void fdt_add_

[Qemu-devel] [PATCH 3/3] arm/virt: Implement hot_add_cpu interface

2017-05-26 Thread Li Zhang
From: Li Zhang 

This patch is to add add_hot_cpu interface.

Signed-off-by: Li Zhang 
---
 hw/arm/virt.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 31314c1..c7e8322 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -527,6 +527,24 @@ out:
 error_propagate(errp, err);
 }
 
+static void virt_hot_add_cpu(const int64_t id, Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+VirtMachineState *vms = VIRT_MACHINE(ms);
+
+if (id < 0 || id >= max_cpus) {
+error_setg(errp, "Invalid CPU id: %" PRIi64, id);
+return;
+}
+
+virt_new_cpu(ms, id, errp);
+if (errp) {
+return;
+}
+
+fdt_add_cpu_node(vms, id);
+}
+
 static void fdt_add_its_gic_node(VirtMachineState *vms)
 {
 vms->msi_phandle = qemu_fdt_alloc_phandle(vms->fdt);
@@ -1654,6 +1672,7 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->minimum_page_bits = 12;
 mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
 mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
+mc->hot_add_cpu = virt_hot_add_cpu;
 }
 
 static const TypeInfo virt_machine_info = {
-- 
2.7.4




Re: [Qemu-devel] [PATCH] scsi/lsi53c895a: Remove unused lsi_mem_*() return value

2017-05-26 Thread Paolo Bonzini


On 26/05/2017 03:46, Mao Zhongyi wrote:
> lsi_mem_read/write() always return 0 about which their
> callers actually don't care. Change the function type
> to void.
> 
> Signed-off-by: Mao Zhongyi 
> ---
>  hw/scsi/lsi53c895a.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
> index 595c260..3e56ab2 100644
> --- a/hw/scsi/lsi53c895a.c
> +++ b/hw/scsi/lsi53c895a.c
> @@ -408,27 +408,25 @@ static void lsi_reg_writeb(LSIState *s, int offset, 
> uint8_t val);
>  static void lsi_execute_script(LSIState *s);
>  static void lsi_reselect(LSIState *s, lsi_request *p);
>  
> -static inline int lsi_mem_read(LSIState *s, dma_addr_t addr,
> +static inline void lsi_mem_read(LSIState *s, dma_addr_t addr,
> void *buf, dma_addr_t len)
>  {
>  if (s->dmode & LSI_DMODE_SIOM) {
>  address_space_read(&s->pci_io_as, addr, MEMTXATTRS_UNSPECIFIED,
> buf, len);
> -return 0;
>  } else {
> -return pci_dma_read(PCI_DEVICE(s), addr, buf, len);
> +pci_dma_read(PCI_DEVICE(s), addr, buf, len);
>  }
>  }
>  
> -static inline int lsi_mem_write(LSIState *s, dma_addr_t addr,
> +static inline void lsi_mem_write(LSIState *s, dma_addr_t addr,
>  const void *buf, dma_addr_t len)
>  {
>  if (s->dmode & LSI_DMODE_DIOM) {
>  address_space_write(&s->pci_io_as, addr, MEMTXATTRS_UNSPECIFIED,
>  buf, len);
> -return 0;
>  } else {
> -return pci_dma_write(PCI_DEVICE(s), addr, buf, len);
> +pci_dma_write(PCI_DEVICE(s), addr, buf, len);
>  }
>  }
>  
> 

Cc: qemu-triv...@nongnu.org



Re: [Qemu-devel] [PATCH v4] sockets: improve error reporting if UNIX socket path is too long

2017-05-26 Thread Paolo Bonzini


On 25/05/2017 17:53, Daniel P. Berrange wrote:
> The 'struct sockaddr_un' only allows 108 bytes for the socket
> path.
> 
> If the user supplies a path, QEMU uses snprintf() to silently
> truncate it when too long. This is undesirable because the user
> will then be unable to connect to the path they asked for.
> 
> If the user doesn't supply a path, QEMU builds one based on
> TMPDIR, but if that leads to an overlong path, it mistakenly
> uses error_setg_errno() with a stale errno value, because
> snprintf() does not set errno on truncation.
> 
> In solving this the code needed some refactoring to ensure we
> don't pass 'un.sun_path' directly to any APIs which expect
> NUL-terminated strings, because the path is not required to
> be terminated.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
> 
> Changed in v4:
> 
>  - Do length check before any mkstemp to avoid leaving long
>tmpfile on disk on error
> 
> Changed in v3:
> 
>  - Also fix unix_connect_saddr error reporting
>  - Avoid calling unlink with path that might not be nul terminated
>  - Ensure the TMPDIR derived path can fill up sun_path space.
>  - Don't update saddr->path until we have succesfully listened
>  - Unify error reporting across both explicit path & TMPDIR path
>branches
> 
>  util/qemu-sockets.c | 68 
> -
>  1 file changed, 46 insertions(+), 22 deletions(-)
> 
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index d8183f7..dfaf4e1 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -845,6 +845,8 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
>  {
>  struct sockaddr_un un;
>  int sock, fd;
> +char *pathbuf = NULL;
> +const char *path;
>  
>  sock = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
>  if (sock < 0) {
> @@ -852,20 +854,22 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
>  return -1;
>  }
>  
> -memset(&un, 0, sizeof(un));
> -un.sun_family = AF_UNIX;
> -if (saddr->path && strlen(saddr->path)) {
> -snprintf(un.sun_path, sizeof(un.sun_path), "%s", saddr->path);
> +if (saddr->path && saddr->path[0]) {
> +path = saddr->path;
>  } else {
>  const char *tmpdir = getenv("TMPDIR");
>  tmpdir = tmpdir ? tmpdir : "/tmp";
> -if (snprintf(un.sun_path, sizeof(un.sun_path), 
> "%s/qemu-socket-XX",
> - tmpdir) >= sizeof(un.sun_path)) {
> -error_setg_errno(errp, errno,
> - "TMPDIR environment variable (%s) too large", 
> tmpdir);
> -goto err;
> -}
> +path = pathbuf = g_strdup_printf("%s/qemu-socket-XX", tmpdir);
> +}
>  
> +if (strlen(path) > sizeof(un.sun_path)) {
> +error_setg(errp, "UNIX socket path '%s' is too long", path);
> +error_append_hint(errp, "Path must be less than %zu bytes\n",
> +  sizeof(un.sun_path));
> +goto err;
> +}
> +
> +if (pathbuf != NULL) {
>  /*
>   * This dummy fd usage silences the mktemp() unsecure warning.
>   * Using mkstemp() doesn't make things more secure here
> @@ -873,24 +877,25 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
>   * to unlink first and thus re-open the race window.  The
>   * worst case possible is bind() failing, i.e. a DoS attack.
>   */
> -fd = mkstemp(un.sun_path);
> +fd = mkstemp(pathbuf);
>  if (fd < 0) {
>  error_setg_errno(errp, errno,
> - "Failed to make a temporary socket name in %s", 
> tmpdir);
> + "Failed to make a temporary socket %s", 
> pathbuf);
>  goto err;
>  }
>  close(fd);
> -if (update_addr) {
> -g_free(saddr->path);
> -saddr->path = g_strdup(un.sun_path);
> -}
>  }
>  
> -if (unlink(un.sun_path) < 0 && errno != ENOENT) {
> +if (unlink(path) < 0 && errno != ENOENT) {
>  error_setg_errno(errp, errno,
> - "Failed to unlink socket %s", un.sun_path);
> + "Failed to unlink socket %s", path);
>  goto err;
>  }
> +
> +memset(&un, 0, sizeof(un));
> +un.sun_family = AF_UNIX;
> +strncpy(un.sun_path, path, sizeof(un.sun_path));
> +
>  if (bind(sock, (struct sockaddr*) &un, sizeof(un)) < 0) {
>  error_setg_errno(errp, errno, "Failed to bind socket to %s", 
> un.sun_path);
>  goto err;
> @@ -900,9 +905,16 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
>  goto err;
>  }
>  
> +if (update_addr && pathbuf) {
> +g_free(saddr->path);
> +saddr->path = pathbuf;
> +} else {
> +g_free(pathbuf);
> +}
>  return sock;
>  
>  err:
> +g_free(pathbuf);
>  closesocket(sock);
>  return -1;
>  }
> @@ -932,9 +944,16 @@ static int unix_connect_saddr(UnixSoc

[Qemu-devel] [PATCH] io: simplify qio_channel_attach_aio_context

2017-05-26 Thread Paolo Bonzini
If properly preceded by qio_channel_detach_aio_context, this function really
has nothing to do except setting ioc->ctx.

Signed-off-by: Paolo Bonzini 
---
 io/channel.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/io/channel.c b/io/channel.c
index cdf74540c1..1cfb8b33a2 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -279,15 +279,9 @@ static void qio_channel_set_aio_fd_handlers(QIOChannel 
*ioc)
 void qio_channel_attach_aio_context(QIOChannel *ioc,
 AioContext *ctx)
 {
-AioContext *old_ctx;
-if (ioc->ctx == ctx) {
-return;
-}
-
-old_ctx = ioc->ctx ? ioc->ctx : iohandler_get_aio_context();
-qio_channel_set_aio_fd_handler(ioc, old_ctx, NULL, NULL, NULL);
+assert(!ioc->read_coroutine);
+assert(!ioc->write_coroutine);
 ioc->ctx = ctx;
-qio_channel_set_aio_fd_handlers(ioc);
 }
 
 void qio_channel_detach_aio_context(QIOChannel *ioc)
-- 
2.13.0




Re: [Qemu-devel] [PATCH v2] target/s390x/cpu_models: Allow some additional feature bits for the "qemu" CPU

2017-05-26 Thread Aurelien Jarno
On 2017-05-25 11:22, Thomas Huth wrote:
> Currently we only present the plain z900 feature bits to the guest,
> but QEMU already emulates some additional features (but not all of
> the next CPU generation, so we can not use the next CPU level as
> default yet). Since newer Linux kernels are checking the feature bits
> and refuse to work if a required feature is missing, it would be nice
> to have a way to present more of the supported features when we are
> running with the "qemu" CPU.
> This patch now adds the supported features to the "full_feat" bitmap,
> so that additional features can be enabled on the command line now,
> for example with:
> 
>  qemu-system-s390x -cpu qemu,stfle=true,ldisp=true,eimm=true,stckf=true
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2: Mark feats array with "static const"
> 
>  target/s390x/cpu_models.c | 34 +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)

Reviewed-by: Aurelien Jarno 

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



Re: [Qemu-devel] [PATCH] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Igor Mammedov
On Fri, 26 May 2017 16:29:25 +0800
Mao Zhongyi  wrote:

> ioh3420_interrupts_init() and its callers, rp_realize() and
> pci_qdev_realize() fill error message to local_err, then
> propagate it to errp by error_porpagate(), which's not necessary.
> So eliminate it and pass errp directly instead of local_err.
> Of course, error_propagate() also will be removed.
> 
> Signed-off-by: Mao Zhongyi 
> ---
[...]

> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 98ccc27..0d4a3ff 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1982,7 +1982,6 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  {
>  PCIDevice *pci_dev = (PCIDevice *)qdev;
>  PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
> -Error *local_err = NULL;
>  PCIBus *bus;
>  bool is_default_rom;
>  
> @@ -1999,9 +1998,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  return;
>  
>  if (pc->realize) {
> -pc->realize(pci_dev, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +pc->realize(pci_dev, errp);
> +if (errp && *errp) {
>  do_pci_unregister_device(pci_dev);
>  return;
>  }
> @@ -2014,9 +2012,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
> **errp)
>  is_default_rom = true;
>  }
>  
> -pci_add_option_rom(pci_dev, is_default_rom, &local_err);
> -if (local_err) {
> -error_propagate(errp, local_err);
> +pci_add_option_rom(pci_dev, is_default_rom, errp);
> +if (errp && *errp) {
>  pci_qdev_unrealize(DEVICE(pci_dev), NULL);
>  return;
>  }

dropping local_error here looks wrong since it's used
to check error status inside these functions and to undo
side effects in case of failure.

consider if caller pass errp = NULL
then error handling path won't be executed.

So keep the rest of the patch but drop above hunks. 



Re: [Qemu-devel] [PATCH] io: simplify qio_channel_attach_aio_context

2017-05-26 Thread Daniel P. Berrange
On Fri, May 26, 2017 at 11:36:41AM +0200, Paolo Bonzini wrote:
> If properly preceded by qio_channel_detach_aio_context, this function really
> has nothing to do except setting ioc->ctx.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  io/channel.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/io/channel.c b/io/channel.c
> index cdf74540c1..1cfb8b33a2 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -279,15 +279,9 @@ static void qio_channel_set_aio_fd_handlers(QIOChannel 
> *ioc)
>  void qio_channel_attach_aio_context(QIOChannel *ioc,
>  AioContext *ctx)
>  {
> -AioContext *old_ctx;
> -if (ioc->ctx == ctx) {
> -return;
> -}
> -
> -old_ctx = ioc->ctx ? ioc->ctx : iohandler_get_aio_context();
> -qio_channel_set_aio_fd_handler(ioc, old_ctx, NULL, NULL, NULL);
> +assert(!ioc->read_coroutine);
> +assert(!ioc->write_coroutine);
>  ioc->ctx = ctx;
> -qio_channel_set_aio_fd_handlers(ioc);
>  }
>  
>  void qio_channel_detach_aio_context(QIOChannel *ioc)

Thanks, queued

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-devel] [RFC] Is it useful to create a seperate address space for cpu-memory on x86 platform?

2017-05-26 Thread Gonglei (Arei)
Hi all,

I found that the memory consumption increase about 5M since below commit on 
Qemu process.

commit 6731d864f80938e404dc3e5eb7f6b76b891e3e43
Author: Peter Crosthwaite 
Date:   Thu Jan 21 14:15:06 2016 +

qom/cpu: Add MemoryRegion property

Add a MemoryRegion property, which if set is used to construct
the CPU's initial (default) AddressSpace.

Signed-off-by: Peter Crosthwaite 
[PMM: code is moved from qom/cpu.c to exec.c to avoid having to
 make qom/cpu.o be a non-common object file; code to use the
 MemoryRegion and to default it to system_memory added.]
Signed-off-by: Peter Maydell 
Acked-by: Edgar E. Iglesias 

The result before this commit:

linux-arei:/mnt/sdb/gonglei/opensource/qemu # ps aux|grep qemu
root 12658 17.2  0.0 375996 21412 pts/1Sl+  12:53   0:00 
./x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=kvm,usb=off 
-nodefaults

after the commit:
 
linux-arei:/mnt/sdb/gonglei/opensource/qemu # ps aux|grep qemu
root 23201 17.2  0.0 384196 25736 pts/1Sl+  12:54   0:00 
./x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=kvm,usb=off 
-nodefaults

I couldn't find any benefits on x86 platform (I'm not familiar with ARM), am I 
missing something?

And It works well if I revert this commit. Can I?

Thanks,
-Gonglei





Re: [Qemu-devel] [PATCH 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

2017-05-26 Thread Kevin Wolf
Am 26.05.2017 um 11:10 hat Alberto Garcia geschrieben:
> On Fri 26 May 2017 10:11:29 AM CEST, Kevin Wolf  wrote:
> >> --- a/block/qcow2-cluster.c
> >> +++ b/block/qcow2-cluster.c
> >> @@ -414,6 +414,10 @@ static int coroutine_fn 
> >> do_perform_cow(BlockDriverState *bs,
> >>  struct iovec iov;
> >>  int ret;
> >>  
> >> +if (bytes == 0) {
> >> +return 0;
> >> +}
> >> +
> 
>[...]
> 
> >> +static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
> >>  {
> >>  BDRVQcow2State *s = bs->opaque;
> >> +Qcow2COWRegion *start = &m->cow_start;
> >> +Qcow2COWRegion *end = &m->cow_end;
> >>  int ret;
> >>  
> >> -if (r->nb_bytes == 0) {
> >> +if (start->nb_bytes == 0 && end->nb_bytes == 0) {
> >>  return 0;
> >>  }
> >
> > With this change, it can now happen that we call do_perform_cow() with
> > bytes == 0.
> 
> Yes, but see the change I made to do_perform_cow() in the same patch
> (quoted above).

Wait... How did you manage to hack my email account and insert this
retroactively? :-)

Sorry for the noise then, I must have been looking at the source code of
the wrong commit.

Kevin



Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Anton Nefedov



On 05/26/2017 11:57 AM, Denis V. Lunev wrote:

On 05/26/2017 11:11 AM, Kevin Wolf wrote:

Am 19.05.2017 um 11:34 hat Anton Nefedov geschrieben:

From: "Denis V. Lunev" 

Currently each single write operation can result in 3 write operations
if guest offsets are not cluster aligned. One write is performed for the
real payload and two for COW-ed areas. Thus the data possibly lays
non-contiguously on the host filesystem. This will reduce further
sequential read performance significantly.

The patch allocates the space in the file with cluster granularity,
ensuring
   1. better host offset locality
   2. less space allocation operations
  (which can be expensive on distributed storages)

Signed-off-by: Denis V. Lunev 
Signed-off-by: Anton Nefedov 

I don't think this is the right approach. You end up with two
write operations: One write_zeroes and then one data write. If the
backend actually supports efficient zero writes, write_zeroes won't
necessarily allocate space, but writing data will definitely split
the already existing allocation. If anything, we need a new
bdrv_allocate() or something that would call fallocate() instead of
abusing write_zeroes.

great idea. Very nice then.



hi Kevin, thanks a lot for your comments.

The driver's callback is commented as:

/*
 * Efficiently zero a region of the disk image.  Typically an image 
format

 * would use a compact metadata representation to implement this.  This
 * function pointer may be NULL or return -ENOSUP and .bdrv_co_writev()
 * will be called instead.
 */
int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
int64_t offset, int count, BdrvRequestFlags flags);


so it looks like one may expect either efficiency or ENOTSUP from that
one? but not necessarily from the safe "facade" bdrv_co_pwrite_zeroes()
which falls back to pwritev()

/*
 * Efficiently zero a region of the disk image.  Note that this is a 
regular

 * I/O request like read or write and should have a reasonable size.  This
 * function is not suitable for zeroing the entire image in a single 
request

 * because it may allocate memory for the entire region.
 */
int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
   int count, BdrvRequestFlags flags);


(maybe we should have even more explicit comment here)


Then let's indeed create a new bdrv_allocate(), which will result in
bdrv_co_pwrite_zeroes(flags|=ZERO_EFFICIENTLY), so we'll reuse most of
the code and employ the same driver callback, but return ENOTSUP if it
fails?



It seems much clearer to me that simply unifying the three write
requests into a single one is an improvement. And it's easy to do, I
even had a patch once to do this. The reason that I didn't send it was
that it seemed to conflict with the data cache approach

These changes help a lot on 2 patterns:
- writing 4Kb into the middle of the block. The bigger the block size,
   more we gain
- sequential write, which is not sector aligned and comes in 2 requests:
   we perform allocation, which should be significantly faster than actual
   write and after that both parts of the write can be executed in parallel.
At my opinion both cases are frequent and important.

But OK, the code should be improved, you are absolutely right here.


  block/qcow2.c | 32 +++-
  1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a8d61f0..2e6a0ec 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,32 @@ fail:
  return ret;
  }
  
+static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)

+{
+BDRVQcow2State *s = bs->opaque;
+BlockDriverState *file = bs->file->bs;
+QCowL2Meta *m;
+int ret;
+
+for (m = l2meta; m != NULL; m = m->next) {
+uint64_t bytes = m->nb_clusters << s->cluster_bits;
+
+if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+continue;
+}
+
+/* try to alloc host space in one chunk for better locality */
+ret = file->drv->bdrv_co_pwrite_zeroes(file, m->alloc_offset, bytes, 
0);

No. This is what you bypass:

* All sanity checks that the block layer does

* bdrv_inc/dec_in_flight(), which is required for drain to work
   correctly. Not doing this will cause crashes.

* tracked_request_begin/end(), mark_request_serialising() and
   wait_serialising_requests(), which are required for serialising
   requests to work correctly

* Ensuring correct request alignment for file. This means that e.g.
   qcow2 with cluster size 512 on a host with a 4k native disk will
   break.

* blkdebug events

* before_write_notifiers. Not calling these will cause corrupted backups
   if someone backups file.

* Dirty bitmap updates

* Updating write_gen, wr_highest_offset and total_sectors

* Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
   respected

And these are just the obvious th

Re: [Qemu-devel] [PATCH 7/7] qcow2: Merge the writing of the COW regions with the guest data

2017-05-26 Thread Kevin Wolf
Am 24.05.2017 um 21:05 hat Alberto Garcia geschrieben:
> On Wed 24 May 2017 06:43:31 PM CEST, Anton Nefedov wrote:
> >> +if (m->data_qiov) {
> >> +qemu_iovec_reset(&qiov);
> >> +qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
> >> +qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
> >> +qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
> >
> > Can it be a problem if (m->data_qiov->niov == IOV_MAX)?
> > We had to add merge-iovecs code for the case (maybe there's better
> > solution?)
> 
> You're right, good catch! I'll add a check for that. To be honest I
> don't think that's likely to happen in practice, so if it does we can
> simply fall back to the old behavior (separate writes).
> 
> > Also, will this work if allocation is split into several l2metas?
> > e.g. one has cow_start.nb_bytes and another has cow_end.nb_bytes
> 
> The guest request will be merged with the first l2meta that has to copy
> at least one of the two regions. It doesn't matter if the other one has
> nb_bytes == 0.

I think we can improve this later so that we merge everything together
(multiple l2metas and guest data) into a single request, but what this
seris implements is still a very good first step, so we shouldn't let
this stop us from taking the good rather than waiting for the perfect.

Kevin



[Qemu-devel] [PATCH] qemu-ga: remove useless allocation

2017-05-26 Thread Marc-André Lureau
There is no need to duplicate a fixed string.

CC: qemu-triv...@nongnu.org
Signed-off-by: Marc-André Lureau 
---
 qga/commands-posix.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 284ecc6d7e..d8e412275e 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2197,12 +2197,10 @@ static void transfer_memory_block(GuestMemoryBlock 
*mem_blk, bool sys2memblk,
 }
 } else {
 if (mem_blk->online != (strncmp(status, "online", 6) == 0)) {
-char *new_state = mem_blk->online ? g_strdup("online") :
-g_strdup("offline");
+const char *new_state = mem_blk->online ? "online" : "offline";
 
 ga_write_sysfs_file(dirfd, "state", new_state, strlen(new_state),
 &local_err);
-g_free(new_state);
 if (local_err) {
 error_free(local_err);
 result->response =
-- 
2.13.0.rc1.16.gd80b50c3f




Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-26 Thread Kevin Wolf
Am 24.05.2017 um 18:09 hat Anton Nefedov geschrieben:
> I agree; as mentioned we have similar patches and they don't conflict much.
> We noticed a performance regression on HDD though, for the
> presumably optimized case (random 4k write over a large backed
> image); so the patches were put on hold.

You're talking about your own patches that should do the same thing,
right? Can you re-do the same test with Berto's patches? Maybe there was
just an implementation glitch in yours.

This approach should very obviously result in a performance improvement,
and the patches are relatively simple, so I'm very much inclined to
merge this as soon as possible.

Kevin



Re: [Qemu-devel] [PATCH 0/4] qapi: Handle some keyval fallout

2017-05-26 Thread Marc-André Lureau
On Mon, May 22, 2017 at 8:45 PM Markus Armbruster  wrote:

> Markus Armbruster (4):
>   qobject-input-visitor: Reject non-finite numbers with keyval
>   qapi: Document visit_type_any() issues with keyval input
>   tests/qapi-schema: Avoid 'str' in alternate test cases
>   qapi: Reject alternates that can't work with keyval_parse()
>
>  include/qapi/visitor.h |  4 ++
>  qapi/qobject-input-visitor.c   |  3 +-
>  scripts/qapi.py| 19 ++-
>  tests/Makefile.include |  2 +
>  tests/qapi-schema/alternate-clash.json |  2 +-
>  tests/qapi-schema/alternate-conflict-dict.json |  2 +-
>  tests/qapi-schema/alternate-conflict-enum-bool.err |  1 +
>  .../qapi-schema/alternate-conflict-enum-bool.exit  |  1 +
>  .../qapi-schema/alternate-conflict-enum-bool.json  |  6 +++
>  tests/qapi-schema/alternate-conflict-enum-bool.out |  0
>  tests/qapi-schema/alternate-conflict-enum-int.err  |  1 +
>  tests/qapi-schema/alternate-conflict-enum-int.exit |  1 +
>  tests/qapi-schema/alternate-conflict-enum-int.json |  6 +++
>  tests/qapi-schema/alternate-conflict-enum-int.out  |  0
>  tests/qapi-schema/alternate-conflict-string.err|  2 +-
>  tests/qapi-schema/alternate-conflict-string.json   |  6 +--
>  tests/qapi-schema/alternate-nested.json|  2 +-
>  tests/qapi-schema/args-alternate.json  |  2 +-
>  tests/qapi-schema/doc-bad-alternate-member.json|  2 +-
>  tests/qapi-schema/qapi-schema-test.json| 13 +++--
>  tests/qapi-schema/qapi-schema-test.out | 34 ++--
>  tests/qapi-schema/returns-alternate.json   |  2 +-
>  tests/test-clone-visitor.c | 23 
>  tests/test-keyval.c| 18 ---
>  tests/test-qobject-input-visitor.c | 62
> --
>  tests/test-qobject-output-visitor.c|  4 +-
>  util/keyval.c  | 10 ++--
>  27 files changed, 140 insertions(+), 88 deletions(-)
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-bool.err
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-bool.exit
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-bool.json
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-bool.out
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-int.err
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-int.exit
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-int.json
>  create mode 100644 tests/qapi-schema/alternate-conflict-enum-int.out
>
>
Reviewed-by: Marc-André Lureau 


-- 
Marc-André Lureau


Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Denis V. Lunev
On 05/26/2017 11:11 AM, Kevin Wolf wrote:
>
> No. This is what you bypass:
some analysis for the record.

> * All sanity checks that the block layer does
>
> * bdrv_inc/dec_in_flight(), which is required for drain to work
>   correctly. Not doing this will cause crashes.
Should not be  a problem. We are in the request, which
is accounted for the parent. Drain waits parent first and after that
switched to child.

> * tracked_request_begin/end(), mark_request_serialising() and
>   wait_serialising_requests(), which are required for serialising
>   requests to work correctly
should not cause a problem as we are protected by meta
and this area is not exposed to read operation until
fully complete. But yes, not that good.
 
> * Ensuring correct request alignment for file. This means that e.g.
>   qcow2 with cluster size 512 on a host with a 4k native disk will
>   break.
we have already checked that clusters are aligned.

> * blkdebug events
yes
> * before_write_notifiers. Not calling these will cause corrupted backups
>   if someone backups file.
no as there is no user visible data here, we are
writing zeroes where there are zeroes from guest
point of view

> * Dirty bitmap updates
same. The data from the guest point of view is not changed.
This is even good that we do not mark this areas as dirty.

> * Updating write_gen, wr_highest_offset and total_sectors
write_gen is not a problem. It will be set on actual write.
total_sectors is changed manually. Agree that this is not
really elegant.

> * Ensuring that bl.max_pwrite_zeroes and bl.pwrite_zeroes_alignment are
>   respected
yes
> And these are just the obvious things. I'm sure I missed some.
anyway, this is a good list of things to take into account
within bdrv_allocate. But it is important, that from the
guest point of view the content of COW areas is not
changed and thus we can have some important shortcuts.

Den



Re: [Qemu-devel] [PATCH v5 4/5] slirp: fix leak

2017-05-26 Thread Marc-André Lureau
On Sat, May 6, 2017 at 3:56 AM Samuel Thibault 
wrote:

> Applied to my tree, thanks!
>
>
still not in tree, please remove the unnecessary 'if' before merging

thanks!
-- 
Marc-André Lureau


Re: [Qemu-devel] [PATCH] qtest: add rtc periodic timer test

2017-05-26 Thread Paolo Bonzini


On 26/05/2017 05:21, Xiao Guangrong wrote:
> On 05/26/2017 12:03 AM, Paolo Bonzini wrote:
>> On 25/05/2017 05:19, guangrong.x...@gmail.com wrote:
>> I'm not sure I understand.  Why would clock_step(1000) not be a good
>> replacement for nsleep(1000)?
> 
> We can not. As we use the real time to compare with the time that is
> passed in the VM, however, clock_step() is not a real time based clock
> source which immediately injects a time step to the VM regardless how
> much real time elapsed.

The trick lies in not using the real time for comparison, but the number
of time steps that were injected (or directly nanoseconds), like this:

--- a/tests/rtc-periodic-test.c
+++ b/tests/rtc-periodic-test.c
@@ -21,29 +21,29 @@
 
 #include "rtc-test.h"
 
-#define RTC_PERIOD_CODE113
-#define RTC_PERIOD_CODE215
+#define RTC_PERIOD_CODE113 /* 8 Hz */
+#define RTC_PERIOD_CODE215 /* 2 Hz */
 
 #define RTC_PERIOD_TEST_NR  50
 
-static void nsleep(int64_t nsecs)
-{
-const struct timespec val = { .tv_nsec = nsecs };
-nanosleep(&val, NULL);
-}
-
-static void wait_periodic_interrupt(void)
+static uint64_t wait_periodic_interrupt(void)
 {
-while (1) {
-if (get_irq(RTC_ISA_IRQ)) {
-break;
-}
-
-/* 1 us.*/
-nsleep(1000);
+uint64_t real_time = 0;
+while (!get_irq(RTC_ISA_IRQ)) {
+/*
+* about 2 ms.  It's more than enough given the period
+* that we set above.
+*/
+   clock_step(NANOSECONDS_PER_SECOND / 512);
+   real_time += NANOSECONDS_PER_SECOND / 512;
 }
 
 g_assert((cmos_read(RTC_REG_C) & REG_C_PF) != 0);
+return real_time;
 }
 
 static void periodic_timer(void)
@@ -58,17 +58,15 @@
 /* enable periodic interrupt after properly configure the period. */
 cmos_write(RTC_REG_B, cmos_read(RTC_REG_B) | REG_B_PIE);
 
-real_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
+real_time = 0;
 
 for (i = 0; i < RTC_PERIOD_TEST_NR; i++) {
 cmos_write(RTC_REG_A, RTC_PERIOD_CODE1);
-wait_periodic_interrupt();
+real_time += wait_periodic_interrupt();
 cmos_write(RTC_REG_A, RTC_PERIOD_CODE2);
-wait_periodic_interrupt();
+real_time += wait_periodic_interrupt();
 }
 
-real_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - real_time;
-
 period_clocks = periodic_period_to_clock(RTC_PERIOD_CODE1) +
periodic_period_to_clock(RTC_PERIOD_CODE2);
 period_clocks *= RTC_PERIOD_TEST_NR;
@@ -85,7 +83,7 @@
 
 g_test_init(&argc, &argv, NULL);
 
-s = qtest_start("-rtc clock=host");
+s = qtest_start("-rtc clock=vm");
 qtest_irq_intercept_in(s, "ioapic");
 
 qtest_add_func("/rtc/periodic/interrupt", periodic_timer);


Even better, the vm_clock time can be returned directly from
clock_step(), and you can even speed up the test by using
clock_step_next():

--- a/tests/rtc-periodic-test.c
+++ b/tests/rtc-periodic-test.c
@@ -21,29 +21,19 @@
 
 #include "rtc-test.h"
 
-#define RTC_PERIOD_CODE113
-#define RTC_PERIOD_CODE215
+#define RTC_PERIOD_CODE113 /* 8 Hz */
+#define RTC_PERIOD_CODE215 /* 2 Hz */
 
 #define RTC_PERIOD_TEST_NR  50
 
-static void nsleep(int64_t nsecs)
+static uint64_t wait_periodic_interrupt(uint64_t real_time)
 {
-const struct timespec val = { .tv_nsec = nsecs };
-nanosleep(&val, NULL);
-}
-
-static void wait_periodic_interrupt(void)
-{
-while (1) {
-if (get_irq(RTC_ISA_IRQ)) {
-break;
-}
-
-/* 1 us.*/
-nsleep(1000);
+while (!get_irq(RTC_ISA_IRQ)) {
+   real_time = clock_step_next();
 }
 
 g_assert((cmos_read(RTC_REG_C) & REG_C_PF) != 0);
+return real_time;
 }
 
 static void periodic_timer(void)
@@ -51,6 +41,8 @@
 int i;
 int64_t period_clocks, period_time, real_time;
 
+real_time = clock_step(0);
+
 /* disable all interrupts. */
 cmos_write(RTC_REG_B, cmos_read(RTC_REG_B) &
~(REG_B_PIE | REG_B_AIE | REG_B_UIE));
@@ -58,17 +50,13 @@
 /* enable periodic interrupt after properly configure the period. */
 cmos_write(RTC_REG_B, cmos_read(RTC_REG_B) | REG_B_PIE);
 
-real_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
-
 for (i = 0; i < RTC_PERIOD_TEST_NR; i++) {
 cmos_write(RTC_REG_A, RTC_PERIOD_CODE1);
-wait_periodic_interrupt();
+real_time = wait_periodic_interrupt(real_time);
 cmos_write(RTC_REG_A, RTC_PERIOD_CODE2);
-wait_periodic_interrupt();
+real_time = wait_periodic_interrupt(real_time);
 }
 
-real_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - real_time;
-
 period_clocks = periodic_period_to_clock(RTC_PERIOD_CODE1) +
periodic_period_to_clock(RTC_PERIOD_CODE2);
 period_clocks *= RTC_PERIOD_TEST_NR;
@@ -85,7 +73,7 @@
 
 g_test_init(&argc, &argv, NULL);
 
-s = qtest_start("-rtc clock=host");
+s = qtest_start("-rtc clock=vm");
 qtest_i

Re: [Qemu-devel] [PATCH v5 5/5] dump: fix memory_mapping_filter leak

2017-05-26 Thread Marc-André Lureau
Hi

On Thu, May 4, 2017 at 2:42 AM Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> Spotted by ASAN.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  memory_mapping.c | 1 +
>

No maintainer for this file, Wen, Paolo?



>  1 file changed, 1 insertion(+)
>
> diff --git a/memory_mapping.c b/memory_mapping.c
> index 6a39d71da2..a5d38552a6 100644
> --- a/memory_mapping.c
> +++ b/memory_mapping.c
> @@ -337,6 +337,7 @@ void memory_mapping_filter(MemoryMappingList *list,
> int64_t begin,
>  if (cur->phys_addr >= begin + length ||
>  cur->phys_addr + cur->length <= begin) {
>  QTAILQ_REMOVE(&list->head, cur, next);
> +g_free(cur);
>  list->num--;
>  continue;
>  }
> --
> 2.12.0.191.gc5d8de91d
>
>
> --
Marc-André Lureau


[Qemu-devel] [PATCH] numa-test: fix query-cpus leaks

2017-05-26 Thread Marc-André Lureau
Fix test leaks introduced in commit 2941020a476.

(and small extra space removed)

Spotted by ASAN.

Signed-off-by: Marc-André Lureau 
---
 tests/numa-test.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index c3475d6d5e..3f636840b1 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -92,7 +92,7 @@ static QList *get_cpus(QDict **resp)
 *resp = qmp("{ 'execute': 'query-cpus' }");
 g_assert(*resp);
 g_assert(qdict_haskey(*resp, "return"));
-return  qdict_get_qlist(*resp, "return");
+return qdict_get_qlist(*resp, "return");
 }
 
 static void test_query_cpus(const void *data)
@@ -100,7 +100,7 @@ static void test_query_cpus(const void *data)
 char *cli;
 QDict *resp;
 QList *cpus;
-const QObject *e;
+QObject *e;
 
 cli = make_cli(data, "-smp 8 -numa node,cpus=0-3 -numa node,cpus=4-7");
 qtest_start(cli);
@@ -124,6 +124,7 @@ static void test_query_cpus(const void *data)
 } else {
 g_assert_cmpint(node, ==, 1);
 }
+qobject_decref(e);
 }
 
 QDECREF(resp);
@@ -136,7 +137,7 @@ static void pc_numa_cpu(const void *data)
 char *cli;
 QDict *resp;
 QList *cpus;
-const QObject *e;
+QObject *e;
 
 cli = make_cli(data, "-cpu pentium -smp 8,sockets=2,cores=2,threads=2 "
 "-numa node,nodeid=0 -numa node,nodeid=1 "
@@ -176,6 +177,7 @@ static void pc_numa_cpu(const void *data)
 } else {
 g_assert(false);
 }
+qobject_decref(e);
 }
 
 QDECREF(resp);
@@ -188,7 +190,7 @@ static void spapr_numa_cpu(const void *data)
 char *cli;
 QDict *resp;
 QList *cpus;
-const QObject *e;
+QObject *e;
 
 cli = make_cli(data, "-smp 4,cores=4 "
 "-numa node,nodeid=0 -numa node,nodeid=1 "
@@ -220,6 +222,7 @@ static void spapr_numa_cpu(const void *data)
 } else {
 g_assert(false);
 }
+qobject_decref(e);
 }
 
 QDECREF(resp);
@@ -232,7 +235,7 @@ static void aarch64_numa_cpu(const void *data)
 char *cli;
 QDict *resp;
 QList *cpus;
-const QObject *e;
+QObject *e;
 
 cli = make_cli(data, "-smp 2 "
 "-numa node,nodeid=0 -numa node,nodeid=1 "
@@ -262,6 +265,7 @@ static void aarch64_numa_cpu(const void *data)
 } else {
 g_assert(false);
 }
+qobject_decref(e);
 }
 
 QDECREF(resp);
-- 
2.13.0.rc1.16.gd80b50c3f




Re: [Qemu-devel] [RFC] Is it useful to create a seperate address space for cpu-memory on x86 platform?

2017-05-26 Thread Paolo Bonzini


On 26/05/2017 11:59, Gonglei (Arei) wrote:
> The result before this commit:
> 
> linux-arei:/mnt/sdb/gonglei/opensource/qemu # ps aux|grep qemu
> root 12658 17.2  0.0 375996 21412 pts/1Sl+  12:53   0:00 
> ./x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=kvm,usb=off 
> -nodefaults
> 
> after the commit:
>  
> linux-arei:/mnt/sdb/gonglei/opensource/qemu # ps aux|grep qemu
> root 23201 17.2  0.0 384196 25736 pts/1Sl+  12:54   0:00 
> ./x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.5,accel=kvm,usb=off 
> -nodefaults
> 
> I couldn't find any benefits on x86 platform (I'm not familiar with ARM), am 
> I missing something?

Hi,

this was also reported by Anthony Xu.  I am working on reducing the time
and space cost of multiple address spaces by sharing the FlatView and
AddressSpaceDispatch.

Paolo



[Qemu-devel] [PATCH v2] nbd/client.c: use errp instead of LOG

2017-05-26 Thread Vladimir Sementsov-Ogievskiy
Move to modern errp scheme from just LOGging errors.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Fixes:

- local_err initialized to NULL
- 083 iotest ajusted

\I feel like an idiot...

 block/nbd-client.c |  7 ++-
 include/block/nbd.h|  5 +++--
 nbd/client.c   | 30 +-
 qemu-nbd.c |  3 ++-
 tests/qemu-iotests/083.out |  2 ++
 5 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 538d95e031..09d955bc4d 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -28,6 +28,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "nbd-client.h"
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
@@ -70,10 +71,14 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 NBDClientSession *s = opaque;
 uint64_t i;
 int ret;
+Error *local_err = NULL;
 
 for (;;) {
 assert(s->reply.handle == 0);
-ret = nbd_receive_reply(s->ioc, &s->reply);
+ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
+if (ret < 0) {
+error_report_err(local_err);
+}
 if (ret <= 0) {
 break;
 }
diff --git a/include/block/nbd.h b/include/block/nbd.h
index a96fb5fceb..e0c64e4981 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -133,9 +133,10 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char 
*name, uint16_t *flags,
   QCryptoTLSCreds *tlscreds, const char *hostname,
   QIOChannel **outioc,
   off_t *size, Error **errp);
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size);
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size,
+ Error **errp);
 ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest *request);
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply);
+ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 
diff --git a/nbd/client.c b/nbd/client.c
index f102375504..f9e1d75be4 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -627,11 +627,13 @@ fail:
 }
 
 #ifdef __linux__
-int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t flags, off_t size,
+ Error **errp)
 {
 unsigned long sectors = size / BDRV_SECTOR_SIZE;
 if (size / BDRV_SECTOR_SIZE != sectors) {
-LOG("Export size %lld too large for 32-bit kernel", (long long) size);
+error_setg(errp, "Export size %lld too large for 32-bit kernel",
+   (long long) size);
 return -E2BIG;
 }
 
@@ -639,7 +641,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_SOCK, (unsigned long) sioc->fd) < 0) {
 int serrno = errno;
-LOG("Failed to set NBD socket");
+error_setg(errp, "Failed to set NBD socket");
 return -serrno;
 }
 
@@ -647,7 +649,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_BLKSIZE, (unsigned long)BDRV_SECTOR_SIZE) < 0) {
 int serrno = errno;
-LOG("Failed setting NBD block size");
+error_setg(errp, "Failed setting NBD block size");
 return -serrno;
 }
 
@@ -659,7 +661,7 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, NBD_SET_SIZE_BLOCKS, sectors) < 0) {
 int serrno = errno;
-LOG("Failed setting size (in blocks)");
+error_setg(errp, "Failed setting size (in blocks)");
 return -serrno;
 }
 
@@ -670,12 +672,12 @@ int nbd_init(int fd, QIOChannelSocket *sioc, uint16_t 
flags, off_t size)
 
 if (ioctl(fd, BLKROSET, (unsigned long) &read_only) < 0) {
 int serrno = errno;
-LOG("Failed setting read-only attribute");
+error_setg(errp, "Failed setting read-only attribute");
 return -serrno;
 }
 } else {
 int serrno = errno;
-LOG("Failed setting flags");
+error_setg(errp, "Failed setting flags");
 return -serrno;
 }
 }
@@ -723,8 +725,10 @@ int nbd_disconnect(int fd)
 }
 
 #else
-int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size)
+int nbd_init(int fd, QIOChannelSocket *ioc, uint16_t flags, off_t size,
+ Error **errp)
 {
+error_setg(errp, "nbd_init is not supported");
 return -ENOTSUP;
 }
 
@@ -758,19 +762,19 @@ ssize_t nbd_send_request(QIOChannel *ioc, NBDRequest 
*request)
 return write_sync(ioc, buf, sizeof(buf), NULL);
 }
 
-ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply)
+ssize_t nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
 uint8_t buf[NBD_REPLY_SIZE];

Re: [Qemu-devel] [PATCH] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Mao Zhongyi

Hi, Markus

On 05/26/2017 05:43 PM, Igor Mammedov wrote:

On Fri, 26 May 2017 16:29:25 +0800
Mao Zhongyi  wrote:


ioh3420_interrupts_init() and its callers, rp_realize() and
pci_qdev_realize() fill error message to local_err, then
propagate it to errp by error_porpagate(), which's not necessary.
So eliminate it and pass errp directly instead of local_err.
Of course, error_propagate() also will be removed.

Signed-off-by: Mao Zhongyi 
---

[...]


diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 98ccc27..0d4a3ff 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1982,7 +1982,6 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 {
 PCIDevice *pci_dev = (PCIDevice *)qdev;
 PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
-Error *local_err = NULL;
 PCIBus *bus;
 bool is_default_rom;

@@ -1999,9 +1998,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 return;

 if (pc->realize) {
-pc->realize(pci_dev, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+pc->realize(pci_dev, errp);
+if (errp && *errp) {
 do_pci_unregister_device(pci_dev);
 return;
 }
@@ -2014,9 +2012,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
 is_default_rom = true;
 }

-pci_add_option_rom(pci_dev, is_default_rom, &local_err);
-if (local_err) {
-error_propagate(errp, local_err);
+pci_add_option_rom(pci_dev, is_default_rom, errp);
+if (errp && *errp) {
 pci_qdev_unrealize(DEVICE(pci_dev), NULL);
 return;
 }


dropping local_error here looks wrong since it's used
to check error status inside these functions and to undo
side effects in case of failure.

consider if caller pass errp = NULL
then error handling path won't be executed.


Thanks for you detailed explanation, I ignored this question
actually. The specified error message will not be converted to
Error. Because error handling function returned directly when
caller pass errp = NULL, right?



So keep the rest of the patch but drop above hunks.


I will drop this part of wrong process and commit the next
version right away.

Thanks a lot.
Mao











Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Kevin Wolf
Am 26.05.2017 um 10:57 hat Denis V. Lunev geschrieben:
> On 05/26/2017 11:11 AM, Kevin Wolf wrote:
> > It seems much clearer to me that simply unifying the three write
> > requests into a single one is an improvement. And it's easy to do, I
> > even had a patch once to do this. The reason that I didn't send it was
> > that it seemed to conflict with the data cache approach
> These changes help a lot on 2 patterns:
> - writing 4Kb into the middle of the block. The bigger the block size,
>   more we gain
> - sequential write, which is not sector aligned and comes in 2 requests:
>   we perform allocation, which should be significantly faster than actual
>   write and after that both parts of the write can be executed in parallel.
> At my opinion both cases are frequent and important.

Both cases are helped somewhat by reducing the number of I/O requests.
So as a first step, I really think we need to get Berto's patches ready
to be merged.

As for this series, there is one important restriction that you don't
mention: Both cases only benefit from the changes when there is no
backing file (or the backing file is sparse at the given cluster). The
main reason why people use qcow2 is for snapshots, so qcow2 without or
with an empty backing file is actually not my primary concern.

To be honest, I am concerned a bit by the additional complexity brought
in by this series (which also seemed to be more tacked on rather than
fully integrated at the first sight), but I'll have to study it in more
detail.


Maybe another thing to keep in mind is that the sequential write case
can completely avoid COW in theory. I wonder whether we should make
another attempt at this. Previously, it was the complexity of my own
prototypes that stopped me from actually merging them.

Kevin



Re: [Qemu-devel] [PATCH 0/3] arm/virt: refine virt.c code and implement hot_add_cpu interface

2017-05-26 Thread Igor Mammedov
On Fri, 26 May 2017 17:21:05 +0800
Li Zhang  wrote:

> From: Li Zhang 
> 
> virt machine doesn't support hot_add_cpu interface. This patchset is to 
> implement
> hot_add_cpu interface. A CPU can be added by QMP command with QEMU monitor.
> 
> Here is command to add a CPU with QMP command.
> 
> * qemu-system-aarch64 -machine virt -cpu cortex-a15 -smp 1,maxcpus=4 \
> -monitor telnet:127.0.0.1:,server,nowait -nographic
> * connect monitor:  telnet 127.0.0.1 
> * execute qmp command:  cpu-add 1
cpu-add command shouldn't be used for new cpu hotplug,
pls use generic device_add for that.

The last time I looked at it, virt machine needed quite a bit of
re-factoring of the way it creates/wires up CPUs.

Pls see usage of following callbacks for example on how to implement
device_add based cpu hotplug:

  get_hotplug_handler
  pc_cpu_pre_plug
  pc_cpu_plug
  pc_possible_cpu_arch_ids

and pc_cpus_init() for initial cpu creation with above callbacks in use

> 
> Currently, when KVM is enabled and add a CPU with QMP command, QEMU 
> reports error "kvm_init_vcpu failed: Device or resourc busy". KVM can't
> create a new CPU when vgic has been initialized and irqchip_in_kernel in
> function kvm_arch_vcpu_create. It needs to change KVM code in the future.
if KVM isn't capable do it yet, then qemu should refuse cpu hotplug if running
with KVM accelarator and allow it only in TCG mode.

> 
> Li Zhang (3):
>   arm/virt: Refine fdt_add_cpu_nodes code
>   arm/virt: Refine code of machvirt_init
>   arm/virt: Implement hot_add_cpu interface
> 
>  hw/arm/virt.c | 344 
> ++
>  1 file changed, 202 insertions(+), 142 deletions(-)
> 




[Qemu-devel] [PATCH V2 3/3] COLO-compare: Add remote initialization and checkpoint notification

2017-05-26 Thread Zhang Chen
Initialize remote communication socket and make a handshack with remote 
colo-frame.
Then use this way to notify remote colo-frame do checkpoint.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index c968597..34e31e1 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -100,6 +100,12 @@ enum {
 SECONDARY_IN,
 };
 
+typedef enum ProxyRemoteRequest {
+USERSPACE_PROXY_INIT = 0,
+USERSPACE_PROXY_GET_REMOTE_INIT = 1,
+USERSPACE_PROXY_NOTIFY_CHECKPOINT = 2
+} ProxyRemoteRequest;
+
 static int compare_chr_send(CharBackend *out,
 const uint8_t *buf,
 uint32_t size);
@@ -498,6 +504,18 @@ static void colo_compare_connection(void *opaque, void 
*user_data)
 trace_colo_compare_main("packet different");
 g_queue_push_tail(&conn->primary_list, pkt);
 /* TODO: colo_notify_checkpoint();*/
+/*
+ * If we have notify_dev that means COLO didn't running on KVM,
+ * So we use remote checkpoint notify, Xen is the first user.
+ */
+if (s->notify_dev) {
+ProxyRemoteRequest msg = USERSPACE_PROXY_NOTIFY_CHECKPOINT;
+ret = compare_chr_send(&s->chr_notify_dev, (uint8_t *)msg,
+   sizeof(msg));
+if (ret < 0) {
+error_report("Notify remote COLO-frame failed");
+}
+}
 break;
 }
 }
@@ -713,7 +731,21 @@ static void compare_sec_rs_finalize(SocketReadState 
*sec_rs)
 
 static void compare_notify_rs_finalize(SocketReadState *notify_rs)
 {
+CompareState *s = container_of(notify_rs, CompareState, notify_rs);
+
 /* Get remote colo-frame's notify and handle the message */
+char *data = g_memdup(notify_rs->buf, notify_rs->packet_len);
+ProxyRemoteRequest msg = USERSPACE_PROXY_GET_REMOTE_INIT;
+ProxyRemoteRequest msg_recv = *data;
+int ret;
+
+if (msg_recv == USERSPACE_PROXY_INIT) {
+ret = compare_chr_send(&s->chr_notify_dev, (uint8_t *)msg,
+   sizeof(msg));
+if (ret < 0) {
+error_report("Notify remote COLO-frame INIT failed");
+}
+}
 }
 
 /*
@@ -784,7 +816,7 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 !qemu_chr_fe_init(&s->chr_notify_dev, chr, errp)) {
 return;
 }
-net_socket_rs_init(&s->notify_dev, compare_notify_rs_finalize);
+net_socket_rs_init(&s->notify_rs, compare_notify_rs_finalize);
 }
 
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
-- 
2.7.4






[Qemu-devel] [PATCH V2 2/3] COLO-compare: Add remote checkpoint notify chardev socket handler frame

2017-05-26 Thread Zhang Chen
Add chardev handler to get remote colo-frame's notify, Xen colo is the first 
user.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 41 ++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 6f8c439..c968597 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -72,8 +72,10 @@ typedef struct CompareState {
 CharBackend chr_pri_in;
 CharBackend chr_sec_in;
 CharBackend chr_out;
+CharBackend chr_notify_dev;
 SocketReadState pri_rs;
 SocketReadState sec_rs;
+SocketReadState notify_rs;
 
 /* connection list: the connections belonged to this NIC could be found
  * in this list.
@@ -567,6 +569,19 @@ static void compare_sec_chr_in(void *opaque, const uint8_t 
*buf, int size)
 }
 }
 
+static void compare_notify_chr(void *opaque, const uint8_t *buf, int size)
+{
+CompareState *s = COLO_COMPARE(opaque);
+int ret;
+
+ret = net_fill_rstate(&s->notify_rs, buf, size);
+if (ret == -1) {
+qemu_chr_fe_set_handlers(&s->chr_notify_dev, NULL, NULL, NULL,
+ NULL, NULL, true);
+error_report("colo-compare notify_dev error");
+}
+}
+
 /*
  * Check old packet regularly so it can watch for any packets
  * that the secondary hasn't produced equivalents of.
@@ -589,9 +604,11 @@ static void *colo_compare_thread(void *opaque)
 s->worker_context = g_main_context_new();
 
 qemu_chr_fe_set_handlers(&s->chr_pri_in, compare_chr_can_read,
-  compare_pri_chr_in, NULL, s, s->worker_context, 
true);
+ compare_pri_chr_in, NULL, s, s->worker_context, 
true);
 qemu_chr_fe_set_handlers(&s->chr_sec_in, compare_chr_can_read,
-  compare_sec_chr_in, NULL, s, s->worker_context, 
true);
+ compare_sec_chr_in, NULL, s, s->worker_context, 
true);
+qemu_chr_fe_set_handlers(&s->chr_notify_dev, compare_chr_can_read,
+ compare_notify_chr, NULL, s, s->worker_context, 
true);
 
 s->compare_loop = g_main_loop_new(s->worker_context, FALSE);
 
@@ -694,6 +711,10 @@ static void compare_sec_rs_finalize(SocketReadState 
*sec_rs)
 }
 }
 
+static void compare_notify_rs_finalize(SocketReadState *notify_rs)
+{
+/* Get remote colo-frame's notify and handle the message */
+}
 
 /*
  * Return 0 is success.
@@ -757,6 +778,15 @@ static void colo_compare_complete(UserCreatable *uc, Error 
**errp)
 return;
 }
 
+/* This parameter is Optional, Remote checkpoint notify need it */
+if (s->notify_dev) {
+if (find_and_check_chardev(&chr, s->notify_dev, errp) ||
+!qemu_chr_fe_init(&s->chr_notify_dev, chr, errp)) {
+return;
+}
+net_socket_rs_init(&s->notify_dev, compare_notify_rs_finalize);
+}
+
 net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
 net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
 
@@ -825,6 +855,7 @@ static void colo_compare_finalize(Object *obj)
 qemu_chr_fe_set_handlers(&s->chr_sec_in, NULL, NULL, NULL, NULL,
  s->worker_context, true);
 qemu_chr_fe_deinit(&s->chr_out);
+qemu_chr_fe_deinit(&s->chr_notify_dev);
 
 g_main_loop_quit(s->compare_loop);
 qemu_thread_join(&s->thread);
@@ -838,7 +869,11 @@ static void colo_compare_finalize(Object *obj)
 g_free(s->pri_indev);
 g_free(s->sec_indev);
 g_free(s->outdev);
-g_free(s->notify_dev);
+
+/* This parameter is Optional */
+if (s->notify_dev) {
+g_free(s->notify_dev);
+}
 }
 
 static const TypeInfo colo_compare_info = {
-- 
2.7.4






[Qemu-devel] [PATCH V2 0/3] COLO-compare: Make COLO-compare support remote COLO-frame

2017-05-26 Thread Zhang Chen
This series focus on COLO-proxy remote colo-frame support.
Xen COLO-frame is the first user. We add a new chardev socket
in colo-compare as the way of communicate with remote COLO-frame.
And remote COLO-frame notify colo-proxy part depend on this serise:
https://lists.nongnu.org/archive/html/qemu-devel/2017-04/msg03904.html

I will send another part of this series after depend patchset have
been merged.

V2:
 - Rename this series.
 - Change communication way to remote colo-frame.
 - Some bugfix.
 - Split the main function, anther part wait depend patchset.


Zhang Chen (3):
  COLO-compare: Add new parameter for communicate with remote colo-frame
  COLO-compare: Add remote checkpoint notify chardev socket handler
frame
  COLO-compare: Add remote initialization and checkpoint notification

 net/colo-compare.c | 91 --
 qemu-options.hx| 41 
 2 files changed, 124 insertions(+), 8 deletions(-)

-- 
2.7.4






[Qemu-devel] [PATCH V2 1/3] COLO-compare: Add new parameter for communicate with remote colo-frame

2017-05-26 Thread Zhang Chen
We add the "notify_dev=chardevID" parameter. colo-compare can connect with
remote colo-frame through chardev socket, Xen colo-frame is the first user.
We can notify colo-frame do checkpoint events.

Signed-off-by: Zhang Chen 
---
 net/colo-compare.c | 20 
 qemu-options.hx| 41 +++--
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 2639c7f..6f8c439 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -68,6 +68,7 @@ typedef struct CompareState {
 char *pri_indev;
 char *sec_indev;
 char *outdev;
+char *notify_dev;
 CharBackend chr_pri_in;
 CharBackend chr_sec_in;
 CharBackend chr_out;
@@ -653,6 +654,21 @@ static void compare_set_outdev(Object *obj, const char 
*value, Error **errp)
 s->outdev = g_strdup(value);
 }
 
+static char *compare_get_notify_dev(Object *obj, Error **errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+return g_strdup(s->notify_dev);
+}
+
+static void compare_set_notify_dev(Object *obj, const char *value, Error 
**errp)
+{
+CompareState *s = COLO_COMPARE(obj);
+
+g_free(s->notify_dev);
+s->notify_dev = g_strdup(value);
+}
+
 static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 {
 CompareState *s = container_of(pri_rs, CompareState, pri_rs);
@@ -795,6 +811,9 @@ static void colo_compare_init(Object *obj)
 object_property_add_str(obj, "outdev",
 compare_get_outdev, compare_set_outdev,
 NULL);
+object_property_add_str(obj, "notify_dev",
+compare_get_notify_dev, compare_set_notify_dev,
+NULL);
 }
 
 static void colo_compare_finalize(Object *obj)
@@ -819,6 +838,7 @@ static void colo_compare_finalize(Object *obj)
 g_free(s->pri_indev);
 g_free(s->sec_indev);
 g_free(s->outdev);
+g_free(s->notify_dev);
 }
 
 static const TypeInfo colo_compare_info = {
diff --git a/qemu-options.hx b/qemu-options.hx
index f63f7dc..1f3b3d2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4092,23 +4092,26 @@ Dump the network traffic on netdev @var{dev} to the 
file specified by
 The file format is libpcap, so it can be analyzed with tools such as tcpdump
 or Wireshark.
 
-@item -object 
colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},
-outdev=@var{chardevid}
+@item -object 
colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid},notify_dev=@var{chardevid}
 
-Colo-compare gets packet from primary_in@var{chardevid} and 
secondary_in@var{chardevid}, than compare primary packet with
-secondary packet. If the packets are same, we will output primary
-packet to outdev@var{chardevid}, else we will notify colo-frame
+Colo-compare gets packet from primary_in@var{chardevid} and 
secondary_in@var{chardevid},
+than compare primary packet with secondary packet. If the packets are same,
+we will output primary packet to outdev@var{chardevid},else we will notify 
colo-frame
 do checkpoint and send primary packet to outdev@var{chardevid}.
+If we use Xen COLO, will use notify_dev to notify remote colo-frame do 
checkpoint(like Xen),
+at the same time send primary packet to outdev@var{chardevid}.
 
 we must use it with the help of filter-mirror and filter-redirector.
 
 @example
 
+KVM COLO
+
 primary:
 -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
 -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
 -chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
--chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,wait
 -chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
 -chardev socket,id=compare0-0,host=3.3.3.3,port=9001
 -chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
@@ -4126,6 +4129,32 @@ secondary:
 -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
 -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
 
+
+Xen COLO
+
+primary:
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
+-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
+-chardev socket,id=notify_way,host=3.3.3.3,port=9009,server,nowait
+-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
+-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0
+-object 
colo-compare,id=comp0,primary_in=compare0-0,s

Re: [Qemu-devel] [PULL v2 00/20] Misc patches for 2017-05-19

2017-05-26 Thread Paolo Bonzini
On 24/05/2017 18:11, Paolo Bonzini wrote:
> The following changes since commit 56821559f0ba682fe6b367815572e6f974d329ab:
> 
>   Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging 
> (2017-05-18 13:36:15 +0100)
> 
> are available in the git repository at:
> 
>   git://github.com/bonzini/qemu.git tags/for-upstream
> 
> for you to fetch changes up to 46465ffebc75edd75091f4eb0963fd48ce7c02b6:
> 
>   target/i386: use multiple CPU AddressSpaces (2017-05-24 18:09:03 +0200)
> 
> v2 fixes coding style errors in vhost-user-scsi.

NACK, Vladimir sent an updated patch.

Paolo

> 
> * virtio-scsi use-after-free fix (Fam)
> * vhost-user-scsi support (Felipe)
> * SMM fixes and improvements for TCG (myself)
> * irqchip and AddressSpaceDispatch cleanups and fixes (Peter)
> * Coverity fix (Stefano)
> * NBD cleanups (Vladimir)
> * RTC accuracy improvements and code cleanups (Guangrong+Yunfang)
> 
> 
> Fam Zheng (1):
>   virtio-scsi: Unset hotplug handler when unrealize
> 
> Felipe Franciosi (2):
>   vhost-user-scsi: Introduce vhost-user-scsi host device
>   vhost-user-scsi: Introduce a vhost-user-scsi sample application
> 
> Paolo Bonzini (2):
>   target/i386: enable A20 automatically in system management mode
>   target/i386: use multiple CPU AddressSpaces
> 
> Peter Xu (4):
>   kvm: irqchip: trace changes on msi add/remove
>   msix: trace control bit write op
>   kvm: irqchip: skip update msi when disabled
>   exec: simplify phys_page_find() params
> 
> Stefano Stabellini (1):
>   Check the return value of fcntl in qemu_set_cloexec
> 
> Tai Yunfang (1):
>   mc146818rtc: precisely count the clock for periodic timer
> 
> Vladimir Sementsov-Ogievskiy (5):
>   nbd: strict nbd_wr_syncv
>   nbd: read_sync and friends: return 0 on success
>   nbd: add errp parameter to nbd_wr_syncv()
>   nbd: add errp to read_sync, write_sync and drop_sync
>   nbd/client.c: use errp instead of LOG
> 
> Xiao Guangrong (4):
>   mc146818rtc: update periodic timer only if it is needed
>   mc146818rtc: ensure LOST_TICK_POLICY_SLEW is only enabled on TARGET_I386
>   mc146818rtc: drop unnecessary '#ifdef TARGET_I386'
>   mc146818rtc: embrace all x86 specific code
> 
>  .gitignore|   1 +
>  Makefile  |   3 +
>  Makefile.objs |   4 +
>  block/nbd-client.c|  11 +-
>  contrib/libvhost-user/libvhost-user.h |  11 +-
>  contrib/vhost-user-scsi/Makefile.objs |   1 +
>  contrib/vhost-user-scsi/vhost-user-scsi.c | 886 
> ++
>  default-configs/pci.mak   |   1 +
>  default-configs/s390x-softmmu.mak |   1 +
>  exec.c|  13 +-
>  hw/pci/msix.c |  11 +-
>  hw/pci/trace-events   |   3 +
>  hw/scsi/Makefile.objs |   1 +
>  hw/scsi/vhost-user-scsi.c | 211 +++
>  hw/scsi/virtio-scsi.c |   3 +
>  hw/timer/mc146818rtc.c| 206 ---
>  hw/virtio/virtio-pci.c|  54 ++
>  hw/virtio/virtio-pci.h|  11 +
>  include/block/nbd.h   |   8 +-
>  include/hw/virtio/vhost-user-scsi.h   |  35 ++
>  include/hw/virtio/virtio-scsi.h   |   2 +
>  kvm-all.c |   4 +-
>  nbd/client.c  | 125 ++---
>  nbd/common.c  |  23 +-
>  nbd/nbd-internal.h|  40 +-
>  nbd/server.c  |  92 ++--
>  qemu-nbd.c|   3 +-
>  target/i386/arch_memory_mapping.c |  18 +-
>  target/i386/cpu.c |  15 +-
>  target/i386/cpu.h |  20 +-
>  target/i386/helper.c  |  96 ++--
>  target/i386/kvm.c |  12 +-
>  target/i386/machine.c |   4 -
>  target/i386/smm_helper.c  |  18 -
>  trace-events  |   3 +-
>  util/oslib-posix.c|   4 +-
>  36 files changed, 1643 insertions(+), 311 deletions(-)
>  create mode 100644 contrib/vhost-user-scsi/Makefile.objs
>  create mode 100644 contrib/vhost-user-scsi/vhost-user-scsi.c
>  create mode 100644 hw/scsi/vhost-user-scsi.c
>  create mode 100644 include/hw/virtio/vhost-user-scsi.h
> 




Re: [Qemu-devel] [PATCH v1 01/13] qcow2: alloc space for COW in one chunk

2017-05-26 Thread Kevin Wolf
Am 26.05.2017 um 12:57 hat Denis V. Lunev geschrieben:
> On 05/26/2017 11:11 AM, Kevin Wolf wrote:
> >
> > No. This is what you bypass:
> some analysis for the record.
> [...]

> anyway, this is a good list of things to take into account
> within bdrv_allocate. But it is important, that from the
> guest point of view the content of COW areas is not
> changed and thus we can have some important shortcuts.

Yes, not everything in this list is an actual problem for this specific
case. Some things are, though, and if you have to make an analysis like
this, that's already a sign that something is wrong.

The real point I wanted to make is that the bdrv_*() wrappers do a whole
lot of useful things (half of which everyone, including me, would
probably forget when listing them without looking at the code) - and on
top of this, when making changes to the block layer, we have an
expectation that every request consistently goes through the block/io.c
functions. So not wanting one of these useful things in a special case
isn't a good enough reason to completey bypass them.

If we really need to avoid some part of a function, we can always add
another flag or something that makes that part optional.

Kevin



Re: [Qemu-devel] [Qemu-block] Behavior of QMP "query-block"

2017-05-26 Thread Kevin Wolf
Am 25.05.2017 um 20:44 hat Bruno Alvisio geschrieben:
> Hello John,
> 
> Thanks. Yes, typo when I wrote the e-mail.
> 
> It might be possible that XEN does that (I will ask in the XEN forum). 
> However,
> I have noticed that it is not the case for all of the VMs I have launched. In
> some of them I can query the block devices even after a long time the VM has
> been running.
> 
> I was wondering if the device is removed in case the disk is corrupted or
> something. When I mention it runs normally, I can log into ithe VM and perform
> read/write operations such creating files.
> 
> Also, is there an easy way to log all qemu events so that I can see what might
> be going on with the 'ide' device?

There is one completely crazy thing that Xen does with respect to disks.
Instead of having support for their PV disks (i.e. virtio-blk, just
different) in the BIOS, they add _both_ an IDE disk and a PV disk to the
VM, so that the bootloader or non-PV-aware guest OSes can access the IDE
disk, for which they most certainly do have drivers. As soon as a driver
for the PV disk is loaded, however, that driver calls a hypervisor
function that removes all the IDE disks from the VM and leaves only the
PV ones there, so that the PV-aware guest doesn't see two same disks.

I suspect that what you're seeing initially is the IDE disks, and when
the PV driver is loaded, they disappear.

Kevin



Re: [Qemu-devel] [PATCH 07/13] chardev: serial & parallel declaration to own headers

2017-05-26 Thread Marc-André Lureau
Hi

On Tue, May 9, 2017 at 3:49 PM Paolo Bonzini  wrote:

>
>
> On 09/05/2017 13:33, Marc-André Lureau wrote:
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  include/chardev/char-parallel.h | 20 +++-
> >  include/chardev/char-serial.h   | 22 ++
> >  include/chardev/char.h  | 36
> 
> >  hw/arm/strongarm.c  |  2 +-
> >  hw/bt/hci-csr.c |  2 +-
> >  hw/char/cadence_uart.c  |  2 +-
> >  hw/char/escc.c  |  2 +-
> >  hw/char/exynos4210_uart.c   |  2 +-
> >  hw/char/parallel.c  |  2 +-
> >  hw/char/serial.c|  2 +-
> >  hw/usb/dev-serial.c |  2 +-
> >  11 files changed, 49 insertions(+), 45 deletions(-)
> >
> > diff --git a/include/chardev/char-parallel.h
> b/include/chardev/char-parallel.h
> > index 26742f9d5c..3284a1b96b 100644
> > --- a/include/chardev/char-parallel.h
> > +++ b/include/chardev/char-parallel.h
> > @@ -24,9 +24,27 @@
> >  #ifndef CHAR_PARALLEL_H
> >  #define CHAR_PARALLEL_H
> >
> > -#if defined(__linux__) || defined(__FreeBSD__) || \
> > +#include "chardev/char.h"
> > +
> > +#if defined(__linux__) || defined(__FreeBSD__) ||   \
> >  defined(__FreeBSD_kernel__) || defined(__DragonFly__)
> >  #define HAVE_CHARDEV_PARPORT 1
> >  #endif
> >
> > +#define CHR_IOCTL_PP_READ_DATA3
> > +#define CHR_IOCTL_PP_WRITE_DATA   4
> > +#define CHR_IOCTL_PP_READ_CONTROL 5
> > +#define CHR_IOCTL_PP_WRITE_CONTROL6
> > +#define CHR_IOCTL_PP_READ_STATUS  7
> > +#define CHR_IOCTL_PP_EPP_READ_ADDR8
> > +#define CHR_IOCTL_PP_EPP_READ 9
> > +#define CHR_IOCTL_PP_EPP_WRITE_ADDR  10
> > +#define CHR_IOCTL_PP_EPP_WRITE   11
> > +#define CHR_IOCTL_PP_DATA_DIR12
> > +
> > +struct ParallelIOArg {
> > +void *buffer;
> > +int count;
> > +};
> > +
> >  #endif /* CHAR_PARALLEL_H */
> > diff --git a/include/chardev/char-serial.h
> b/include/chardev/char-serial.h
> > index 64a27f63b1..cb2e59e82a 100644
> > --- a/include/chardev/char-serial.h
> > +++ b/include/chardev/char-serial.h
> > @@ -24,6 +24,8 @@
> >  #ifndef CHAR_SERIAL_H
> >  #define CHAR_SERIAL_H
> >
> > +#include "chardev/char.h"
> > +
> >  #ifdef _WIN32
> >  #define HAVE_CHARDEV_SERIAL 1
> >  #elif defined(__linux__) || defined(__sun__) || defined(__FreeBSD__)
> \
> > @@ -32,4 +34,24 @@
> >  #define HAVE_CHARDEV_SERIAL 1
> >  #endif
> >
> > +#define CHR_IOCTL_SERIAL_SET_PARAMS   1
> > +typedef struct {
> > +int speed;
> > +int parity;
> > +int data_bits;
> > +int stop_bits;
> > +} QEMUSerialSetParams;
> > +
> > +#define CHR_IOCTL_SERIAL_SET_BREAK2
> > +
> > +#define CHR_IOCTL_SERIAL_SET_TIOCM   13
> > +#define CHR_IOCTL_SERIAL_GET_TIOCM   14
> > +
> > +#define CHR_TIOCM_CTS   0x020
> > +#define CHR_TIOCM_CAR   0x040
> > +#define CHR_TIOCM_DSR   0x100
> > +#define CHR_TIOCM_RI0x080
> > +#define CHR_TIOCM_DTR   0x002
> > +#define CHR_TIOCM_RTS   0x004
> > +
> >  #endif
> > diff --git a/include/chardev/char.h b/include/chardev/char.h
> > index ea9f2cb7d6..0e1ef1ea4f 100644
> > --- a/include/chardev/char.h
> > +++ b/include/chardev/char.h
> > @@ -27,42 +27,6 @@ typedef enum {
> >
> >  #define CHR_READ_BUF_LEN 4096
> >
> > -#define CHR_IOCTL_SERIAL_SET_PARAMS   1
> > -typedef struct {
> > -int speed;
> > -int parity;
> > -int data_bits;
> > -int stop_bits;
> > -} QEMUSerialSetParams;
> > -
> > -#define CHR_IOCTL_SERIAL_SET_BREAK2
> > -
> > -#define CHR_IOCTL_PP_READ_DATA3
> > -#define CHR_IOCTL_PP_WRITE_DATA   4
> > -#define CHR_IOCTL_PP_READ_CONTROL 5
> > -#define CHR_IOCTL_PP_WRITE_CONTROL6
> > -#define CHR_IOCTL_PP_READ_STATUS  7
> > -#define CHR_IOCTL_PP_EPP_READ_ADDR8
> > -#define CHR_IOCTL_PP_EPP_READ 9
> > -#define CHR_IOCTL_PP_EPP_WRITE_ADDR  10
> > -#define CHR_IOCTL_PP_EPP_WRITE   11
> > -#define CHR_IOCTL_PP_DATA_DIR12
> > -
> > -struct ParallelIOArg {
> > -void *buffer;
> > -int count;
> > -};
> > -
> > -#define CHR_IOCTL_SERIAL_SET_TIOCM   13
> > -#define CHR_IOCTL_SERIAL_GET_TIOCM   14
> > -
> > -#define CHR_TIOCM_CTS0x020
> > -#define CHR_TIOCM_CAR0x040
> > -#define CHR_TIOCM_DSR0x100
> > -#define CHR_TIOCM_RI 0x080
> > -#define CHR_TIOCM_DTR0x002
> > -#define CHR_TIOCM_RTS0x004
> > -
> >  typedef void IOEventHandler(void *opaque, int event);
> >
> >  typedef enum {
>
> Hmm, this makes the previous patch more desirable.
>

Yeah, I think it's a bit simpler if we put all chardev/ headers under
include/chardev, since files outside chardev/ include headers like
char-serial.h or char-parallel.h and it helps cleaning the headers.

We could limit it to headers accessed outside chardev/, that is probably
mainly -serial and -parellel (I haven't checked)


> > diff --git a/hw/arm/strongarm.c b/hw/arm/strongarm.c
> > index 66cad198d4..967caea749 100644
> > --- a/hw/arm/strongarm.c
> > +++ b/

[Qemu-devel] [PATCH v2] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Mao Zhongyi
ioh3420_interrupts_init() and its callers rp_realize() fill error
message to local_err, then propagate it to errp by error_porpagate(),
which's not necessary. So eliminate it and pass errp directly instead
of local_err. Of course, error_propagate() also has been removed.

Signed-off-by: Mao Zhongyi 
---
v2:
* Drop the part of wrong handling that ignored the fact that if 
  pci_qdev_realize()'s caller pass errp = NULL.

 hw/pci-bridge/ioh3420.c| 4 +---
 hw/pci-bridge/pcie_root_port.c | 7 ++-
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index da4e5bd..5f56a2f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -64,15 +64,13 @@ static uint8_t ioh3420_aer_vector(const PCIDevice *d)
 static int ioh3420_interrupts_init(PCIDevice *d, Error **errp)
 {
 int rc;
-Error *local_err = NULL;
 
 rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
-  &local_err);
+  errp);
 if (rc < 0) {
 assert(rc == -ENOTSUP);
-error_propagate(errp, local_err);
 }
 
 return rc;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index cf36318..9022996 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -59,7 +59,6 @@ static void rp_realize(PCIDevice *d, Error **errp)
 PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
 PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
 int rc;
-Error *local_err = NULL;
 
 pci_config_set_interrupt_pin(d->config, 1);
 pci_bridge_initfn(d, TYPE_PCIE_BUS);
@@ -72,9 +71,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
 }
 
 if (rpc->interrupts_init) {
-rc = rpc->interrupts_init(d, &local_err);
+rc = rpc->interrupts_init(d, errp);
 if (rc < 0) {
-error_propagate(errp, local_err);
 goto err_bridge;
 }
 }
@@ -98,9 +96,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
 }
 
 rc = pcie_aer_init(d, PCI_ERR_VER, rpc->aer_offset,
-   PCI_ERR_SIZEOF, &local_err);
+   PCI_ERR_SIZEOF, errp);
 if (rc < 0) {
-error_propagate(errp, local_err);
 goto err;
 }
 pcie_aer_root_init(d);
-- 
2.9.3






Re: [Qemu-devel] [PATCH v7 09/20] qcow: convert QCow to use QCryptoBlock for encryption

2017-05-26 Thread Alberto Garcia
On Thu 25 May 2017 06:38:40 PM CEST, Daniel P. Berrange wrote:
> @@ -105,6 +116,13 @@ static int qcow_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  int ret;
>  QCowHeader header;
>  Error *local_err = NULL;
> +QCryptoBlockOpenOptions *crypto_opts = NULL;
> +unsigned int cflags = 0;
> +QDict *encryptopts = NULL;
> +const char *encryptfmt;
> +
> +qdict_extract_subqdict(options, &encryptopts, "encrypt.");
> +encryptfmt = qdict_get_try_str(encryptopts, "format");
>  
>  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
> false, errp);
>  if (!bs->file) {
>  return -EINVAL;
>  }

You're leaking encryptopts if the function returns here.

> @@ -873,6 +850,20 @@ static int qcow_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  goto exit;
>  }
>  header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
> +
> +crypto_opts = block_crypto_create_opts_init(
> +Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +ret = -EINVAL;
> +goto exit;
> +}

Not very important, and my fault for not having pointed it out in my
previous review, but you can spare the error_propagate() call if you
pass errp directly to block_crypto_create_opts_init() and then check if
crypto_opts is NULL.

Actually none of the error_propagate() calls in qcow_create() is really
necessary, but that could be fixed in a separate patch, if at all (it's
not so important).

The leak however needs to be fixed. With that,

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-devel] [PATCH V2 0/3] COLO-compare: Make COLO-compare support remote COLO-frame

2017-05-26 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1495797727-16695-1-git-send-email-zhangchen.f...@cn.fujitsu.com
Subject: [Qemu-devel] [PATCH V2 0/3] COLO-compare: Make COLO-compare support 
remote COLO-frame

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/19feea44a3f4010645a1b829b504481f1b574b9a.1495797845.git.maozy.f...@cn.fujitsu.com
 -> 
patchew/19feea44a3f4010645a1b829b504481f1b574b9a.1495797845.git.maozy.f...@cn.fujitsu.com
Switched to a new branch 'test'
ac41f91 COLO-compare: Add remote initialization and checkpoint notification
a0a428f COLO-compare: Add remote checkpoint notify chardev socket handler frame
e976476 COLO-compare: Add new parameter for communicate with remote colo-frame

=== OUTPUT BEGIN ===
Checking PATCH 1/3: COLO-compare: Add new parameter for communicate with remote 
colo-frame...
Checking PATCH 2/3: COLO-compare: Add remote checkpoint notify chardev socket 
handler frame...
WARNING: line over 80 characters
#52: FILE: net/colo-compare.c:607:
+ compare_pri_chr_in, NULL, s, s->worker_context, 
true);

WARNING: line over 80 characters
#55: FILE: net/colo-compare.c:609:
+ compare_sec_chr_in, NULL, s, s->worker_context, 
true);

WARNING: line over 80 characters
#57: FILE: net/colo-compare.c:611:
+ compare_notify_chr, NULL, s, s->worker_context, 
true);

ERROR: g_free(NULL) is safe this check is probably not required
#104: FILE: net/colo-compare.c:875:
+if (s->notify_dev) {
+g_free(s->notify_dev);

total: 1 errors, 3 warnings, 86 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 3/3: COLO-compare: Add remote initialization and checkpoint 
notification...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH v7 11/20] qcow2: convert QCow2 to use QCryptoBlock for encryption

2017-05-26 Thread Alberto Garcia
On Thu 25 May 2017 06:38:42 PM CEST, Daniel P. Berrange wrote:
> This converts the qcow2 driver to make use of the QCryptoBlock
> APIs for encrypting image content, using the legacy QCow2 AES
> scheme.
>
> With this change it is now required to use the QCryptoSecret
> object for providing passwords, instead of the current block
> password APIs / interactive prompting.
>
>   $QEMU \
> -object secret,id=sec0,filename=/home/berrange/encrypted.pw \
> -drive file=/home/berrange/encrypted.qcow2,encrypt.key-secret=sec0
>
> The test 087 could be simplified since there is no longer a
> difference in behaviour when using blockdev_add with encrypted
> images for the running vs stopped CPU state.
>
> Reviewed-by: Eric Blake 
> Signed-off-by: Daniel P. Berrange 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-devel] [PATCH v2] target/s390x/cpu_models: Allow some additional feature bits for the "qemu" CPU

2017-05-26 Thread David Hildenbrand
On 25.05.2017 11:22, Thomas Huth wrote:
> Currently we only present the plain z900 feature bits to the guest,
> but QEMU already emulates some additional features (but not all of
> the next CPU generation, so we can not use the next CPU level as
> default yet). Since newer Linux kernels are checking the feature bits
> and refuse to work if a required feature is missing, it would be nice
> to have a way to present more of the supported features when we are
> running with the "qemu" CPU.
> This patch now adds the supported features to the "full_feat" bitmap,
> so that additional features can be enabled on the command line now,
> for example with:
> 
>  qemu-system-s390x -cpu qemu,stfle=true,ldisp=true,eimm=true,stckf=true>
> Signed-off-by: Thomas Huth 
> ---
>  v2: Mark feats array with "static const"
> 
>  target/s390x/cpu_models.c | 34 +++---
>  1 file changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 8d27363..e5e005a 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -658,6 +658,30 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
>"available in the configuration: ");
>  }
>  
> +/**
> + * The base TCG CPU model "qemu" is based on the z900. However, we already
> + * can also emulate some additional features of later CPU generations, so
> + * we add these additional feature bits here.
> + */
> +static void add_qemu_cpu_model_features(S390FeatBitmap fbm)
> +{
> +static const int feats[] = {
> +S390_FEAT_STFLE,
> +S390_FEAT_EXTENDED_IMMEDIATE,
> +S390_FEAT_LONG_DISPLACEMENT,
> +S390_FEAT_LONG_DISPLACEMENT_FAST,
> +S390_FEAT_STORE_CLOCK_FAST,
> +S390_FEAT_GENERAL_INSTRUCTIONS_EXT,
> +S390_FEAT_EXECUTE_EXT,
> +S390_FEAT_STFLE_45,
> +};
> +int i;
> +
> +for (i = 0; i < ARRAY_SIZE(feats); i++) {
> +set_bit(feats[i], fbm);
> +}
> +}
> +
>  static S390CPUModel *get_max_cpu_model(Error **errp)
>  {
>  static S390CPUModel max_model;
> @@ -670,10 +694,11 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
>  if (kvm_enabled()) {
>  kvm_s390_get_host_cpu_model(&max_model, errp);
>  } else {
> -/* TCG emulates a z900 */
> +/* TCG emulates a z900 (with some optional additional features) */
>  max_model.def = &s390_cpu_defs[0];
>  bitmap_copy(max_model.features, max_model.def->default_feat,
>  S390_FEAT_MAX);
> +add_qemu_cpu_model_features(max_model.features);
>  }
>  if (!*errp) {
>  cached = true;
> @@ -925,11 +950,14 @@ static void s390_host_cpu_model_initfn(Object *obj)
>  
>  static void s390_qemu_cpu_model_initfn(Object *obj)
>  {
> +static S390CPUDef s390_qemu_cpu_defs;
>  S390CPU *cpu = S390_CPU(obj);
>  
>  cpu->model = g_malloc0(sizeof(*cpu->model));
> -/* TCG emulates a z900 */
> -cpu->model->def = &s390_cpu_defs[0];
> +/* TCG emulates a z900 (with some optional additional features) */
> +memcpy(&s390_qemu_cpu_defs, &s390_cpu_defs[0], 
> sizeof(s390_qemu_cpu_defs));
> +add_qemu_cpu_model_features(s390_qemu_cpu_defs.full_feat);
> +cpu->model->def = &s390_qemu_cpu_defs;
>  bitmap_copy(cpu->model->features, cpu->model->def->default_feat,
>  S390_FEAT_MAX);

That should work for the general case, at least for now.

arch_query_cpu_model_baseline() will still drop the additional featues.

Another option would be to directly bump up the CPU model to a z9, with
missing base features (for now). Which looks cleaner in my eyes:


>From 59c944c6fef87808e4456ba9565ace7772d236d7 Mon Sep 17 00:00:00 2001
From: David Hildenbrand 
Date: Fri, 26 May 2017 13:54:27 +0200
Subject: [PATCH] s390x/cpu_models: Allow some additional feature bits
for the
 "qemu" CPU

Signed-off-by: David Hildenbrand 
---
 hw/s390x/s390-virtio-ccw.c | 13 +++
 target/s390x/cpu.h |  2 ++
 target/s390x/cpu_models.c  | 55
+++---
 3 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index fdd4384..3963a84 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -454,6 +454,19 @@ DEFINE_CCW_MACHINE(2_10, "2.10", true);

 static void ccw_machine_2_9_instance_options(MachineState *machine)
 {
+static const int feats[] = {
+S390_FEAT_GROUP_PLO,
+S390_FEAT_ESAN3,
+S390_FEAT_ZARCH,
+};
+S390FeatBitmap qemu_features = {};
+int i;
+
+for (i = 0; i < ARRAY_SIZE(feats); i++) {
+set_bit(feats[i], qemu_features);
+}
+s390_set_qemu_cpu_model_compat(0x2064, 7, 1, qemu_features);
+
 ccw_machine_2_10_instance_options(machine);
 }

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 240b8a5..8a11104 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ 

[Qemu-devel] [PATCH] pci-bridge/i82801b11: Convert to realize

2017-05-26 Thread Mao Zhongyi
The pci-birdge device i82801b11 still implements the old
PCIDeviceClass .init() through i82801b11_bridge_init()
instead of the new .realize(). All devices need to be
converted to .realize(). So convert it and rename it to
i82801b11_bridge_realize().

Signed-off-by: Mao Zhongyi 
---
 hw/pci-bridge/i82801b11.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
index 2404e7e..dca3162 100644
--- a/hw/pci-bridge/i82801b11.c
+++ b/hw/pci-bridge/i82801b11.c
@@ -44,6 +44,7 @@
 #include "qemu/osdep.h"
 #include "hw/pci/pci.h"
 #include "hw/i386/ich9.h"
+#include "qapi/error.h"
 
 
 /*/
@@ -58,7 +59,7 @@ typedef struct I82801b11Bridge {
 /*< public >*/
 } I82801b11Bridge;
 
-static int i82801b11_bridge_initfn(PCIDevice *d)
+static void i82801b11_bridge_realize(PCIDevice *d, Error **errp)
 {
 int rc;
 
@@ -70,12 +71,10 @@ static int i82801b11_bridge_initfn(PCIDevice *d)
 goto err_bridge;
 }
 pci_config_set_prog_interface(d->config, PCI_CLASS_BRIDGE_PCI_INF_SUB);
-return 0;
+return;
 
 err_bridge:
 pci_bridge_exitfn(d);
-
-return rc;
 }
 
 static const VMStateDescription i82801b11_bridge_dev_vmstate = {
@@ -95,7 +94,7 @@ static void i82801b11_bridge_class_init(ObjectClass *klass, 
void *data)
 k->vendor_id = PCI_VENDOR_ID_INTEL;
 k->device_id = PCI_DEVICE_ID_INTEL_82801BA_11;
 k->revision = ICH9_D2P_A2_REVISION;
-k->init = i82801b11_bridge_initfn;
+k->realize = i82801b11_bridge_realize;
 k->config_write = pci_bridge_write_config;
 dc->vmsd = &i82801b11_bridge_dev_vmstate;
 set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-- 
2.9.3






Re: [Qemu-devel] [PATCH 08/13] be-hci: use backend functions

2017-05-26 Thread Marc-André Lureau
On Tue, May 9, 2017 at 3:42 PM Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> Avoid accessing CharBackend directly, use qemu_chr_be_* methods instead.
>
> be->chr_read should exists if qemu_chr_be_can_write() is true.
>
> (use qemu_chr_be_write(), _impl() bypasses replay)
>
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/bt/hci-csr.c | 9 +++--
>

No maintainer for this file. Andrzej, as author of the file, can you
review?


>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/hw/bt/hci-csr.c b/hw/bt/hci-csr.c
> index 0f2021086d..d13192b9b5 100644
> --- a/hw/bt/hci-csr.c
> +++ b/hw/bt/hci-csr.c
> @@ -82,17 +82,14 @@ enum {
>
>  static inline void csrhci_fifo_wake(struct csrhci_s *s)
>  {
> -Chardev *chr = (Chardev *)s;
> -CharBackend *be = chr->be;
> +Chardev *chr = CHARDEV(s);
>
>  if (!s->enable || !s->out_len)
>  return;
>
>  /* XXX: Should wait for s->modem_state & CHR_TIOCM_RTS? */
> -if (be && be->chr_can_read && be->chr_can_read(be->opaque) &&
> -be->chr_read) {
> -be->chr_read(be->opaque,
> - s->outfifo + s->out_start++, 1);
> +if (qemu_chr_be_can_write(chr)) {
> +qemu_chr_be_write(chr, s->outfifo + s->out_start++, 1);
>  s->out_len--;
>  if (s->out_start >= s->out_size) {
>  s->out_start = 0;
> --
> 2.13.0.rc1.16.gd80b50c3f
>
>
> --
Marc-André Lureau


[Qemu-devel] [PATCH v2] kvmclock: update system_time_msr address forcibly

2017-05-26 Thread Denis Plotnikov
Do an update of system_time_msr address every time before reading
the value of tsc_timestamp from guest's kvmclock page.

There is no other code paths which ensure that qemu has an up-to-date
value of system_time_msr. So, force this update on guest's tsc_timestamp
reading.

This bug causes effect on those nested setups which turn off TPR access
interception for L2 guests and that access being intercepted by L0 doesn't
show up in L1.
Linux bootstrap initiate kvmclock before APIC initializing causing TPR access.
That's why on L1 guests, having TPR interception turned on for L2, the effect
of the bug is not revealed.

This patch fixes this problem by making sure it knows the correct
system_time_msr address every time it is needed.

Signed-off-by: Denis Plotnikov 
---
 hw/i386/kvm/clock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 0f75dd3..875d85f 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -61,6 +61,8 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
 uint64_t nsec_hi;
 uint64_t nsec;
 
+cpu_synchronize_state(cpu);
+
 if (!(env->system_time_msr & 1ULL)) {
 /* KVM clock not active */
 return 0;
-- 
2.7.4




Re: [Qemu-devel] [PATCH v2] kvmclock: update system_time_msr address forcibly

2017-05-26 Thread Roman Kagan
On Fri, May 26, 2017 at 03:25:22PM +0300, Denis Plotnikov wrote:
> Do an update of system_time_msr address every time before reading
> the value of tsc_timestamp from guest's kvmclock page.
> 
> There is no other code paths which ensure that qemu has an up-to-date
> value of system_time_msr. So, force this update on guest's tsc_timestamp
> reading.
> 
> This bug causes effect on those nested setups which turn off TPR access
> interception for L2 guests and that access being intercepted by L0 doesn't
> show up in L1.
> Linux bootstrap initiate kvmclock before APIC initializing causing TPR access.
> That's why on L1 guests, having TPR interception turned on for L2, the effect
> of the bug is not revealed.
> 
> This patch fixes this problem by making sure it knows the correct
> system_time_msr address every time it is needed.
> 
> Signed-off-by: Denis Plotnikov 
> ---
>  hw/i386/kvm/clock.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Roman Kagan 



Re: [Qemu-devel] [PATCH 09/13] char: generalize qemu_chr_write_all()

2017-05-26 Thread Marc-André Lureau
Hi

On Tue, May 9, 2017 at 4:59 PM Philippe Mathieu-Daudé 
wrote:

> Hi Marc-André,
>
> On 05/09/2017 08:33 AM, Marc-André Lureau wrote:
> > qemu_chr_fe_write() is similar to qemu_chr_write_all(): the later write
> > all with a chardev backend.
> >
> > Make qemu_chr_write() and qemu_chr_fe_write_buffer() take an 'all'
> > argument. If false, handle 'partial' write the way qemu_chr_fe_write()
> > use to, and call qemu_chr_write() from qemu_chr_fe_write().
>
> What about invert logic and name the argument 'is_partial[_write]'? Else
> 'write_all' to have more readable code?
>
>
I have a slight preference for the 'all' over 'partial' logic argument, but
I don't mind much.

It's fine to rename it write_all too.


> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  chardev/char.c | 69
> +++---
> >  1 file changed, 27 insertions(+), 42 deletions(-)
> >
> > diff --git a/chardev/char.c b/chardev/char.c
> > index af7d871ed5..392dba6a86 100644
> > --- a/chardev/char.c
> > +++ b/chardev/char.c
> > @@ -96,7 +96,8 @@ static void qemu_chr_fe_write_log(Chardev *s,
> >  }
> >
> >  static int qemu_chr_fe_write_buffer(Chardev *s,
> > -const uint8_t *buf, int len, int
> *offset)
> > +const uint8_t *buf, int len,
> > +int *offset, bool all)
> >  {
> >  ChardevClass *cc = CHARDEV_GET_CLASS(s);
> >  int res = 0;
> > @@ -106,7 +107,7 @@ static int qemu_chr_fe_write_buffer(Chardev *s,
> >  while (*offset < len) {
> >  retry:
> >  res = cc->chr_write(s, buf + *offset, len - *offset);
> > -if (res < 0 && errno == EAGAIN) {
> > +if (res < 0 && errno == EAGAIN && all) {
> >  g_usleep(100);
> >  goto retry;
> >  }
> > @@ -116,6 +117,9 @@ static int qemu_chr_fe_write_buffer(Chardev *s,
> >  }
> >
> >  *offset += res;
> > +if (!all) {
> > +break;
> > +}
> >  }
> >  if (*offset > 0) {
> >  qemu_chr_fe_write_log(s, buf, *offset);
> > @@ -130,54 +134,19 @@ static bool qemu_chr_replay(Chardev *chr)
> >  return qemu_chr_has_feature(chr, QEMU_CHAR_FEATURE_REPLAY);
> >  }
> >
> > -int qemu_chr_fe_write(CharBackend *be, const uint8_t *buf, int len)
> > +static int qemu_chr_write(Chardev *s, const uint8_t *buf, int len, bool
> all)
> >  {
> > -Chardev *s = be->chr;
> > -ChardevClass *cc;
> > -int ret;
> > -
> > -if (!s) {
> > -return 0;
> > -}
> > -
> > -if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_PLAY) {
> > -int offset;
> > -replay_char_write_event_load(&ret, &offset);
> > -assert(offset <= len);
> > -qemu_chr_fe_write_buffer(s, buf, offset, &offset);
> > -return ret;
> > -}
> > -
> > -cc = CHARDEV_GET_CLASS(s);
> > -qemu_mutex_lock(&s->chr_write_lock);
> > -ret = cc->chr_write(s, buf, len);
> > -
> > -if (ret > 0) {
> > -qemu_chr_fe_write_log(s, buf, ret);
> > -}
> > -
> > -qemu_mutex_unlock(&s->chr_write_lock);
> > -
> > -if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_RECORD) {
> > -replay_char_write_event_save(ret, ret < 0 ? 0 : ret);
> > -}
> > -
> > -return ret;
> > -}
> > -
> > -int qemu_chr_write_all(Chardev *s, const uint8_t *buf, int len)
> > -{
> > -int offset;
> > +int offset = 0;
> >  int res;
> >
> >  if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_PLAY) {
> >  replay_char_write_event_load(&res, &offset);
> >  assert(offset <= len);
> > -qemu_chr_fe_write_buffer(s, buf, offset, &offset);
> > +qemu_chr_fe_write_buffer(s, buf, offset, &offset, true);
> >  return res;
> >  }
> >
> > -res = qemu_chr_fe_write_buffer(s, buf, len, &offset);
> > +res = qemu_chr_fe_write_buffer(s, buf, len, &offset, all);
> >
> >  if (qemu_chr_replay(s) && replay_mode == REPLAY_MODE_RECORD) {
> >  replay_char_write_event_save(res, offset);
> > @@ -189,6 +158,22 @@ int qemu_chr_write_all(Chardev *s, const uint8_t
> *buf, int len)
> >  return offset;
> >  }
> >
> > +int qemu_chr_write_all(Chardev *s, const uint8_t *buf, int len)
> > +{
> > +return qemu_chr_write(s, buf, len, true);
> > +}
> > +
> > +int qemu_chr_fe_write(CharBackend *be, const uint8_t *buf, int len)
> > +{
> > +Chardev *s = be->chr;
> > +
> > +if (!s) {
> > +return 0;
> > +}
> > +
> > +return qemu_chr_write(s, buf, len, false);
> > +}
> > +
> >  int qemu_chr_fe_write_all(CharBackend *be, const uint8_t *buf, int len)
> >  {
> >  Chardev *s = be->chr;
> > @@ -197,7 +182,7 @@ int qemu_chr_fe_write_all(CharBackend *be, const
> uint8_t *buf, int len)
> >  return 0;
> >  }
> >
> > -return qemu_chr_write_all(s, buf, len);
> > +return qemu_chr_write(s, buf, len, true);
> >  }
> >
> >  int qemu_chr_fe_read_all(CharBackend *be, ui

Re: [Qemu-devel] [PATCH v2] kvmclock: update system_time_msr address forcibly

2017-05-26 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1495801522-27628-1-git-send-email-dplotni...@virtuozzo.com
Subject: [Qemu-devel] [PATCH v2] kvmclock: update system_time_msr address 
forcibly
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-mingw@fedora
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1495801522-27628-1-git-send-email-dplotni...@virtuozzo.com -> 
patchew/1495801522-27628-1-git-send-email-dplotni...@virtuozzo.com
 - [tag update]  patchew/20170525163851.8047-1-berra...@redhat.com -> 
patchew/20170525163851.8047-1-berra...@redhat.com
 * [new tag] patchew/20170526121516.6607-1-maozy.f...@cn.fujitsu.com -> 
patchew/20170526121516.6607-1-maozy.f...@cn.fujitsu.com
Switched to a new branch 'test'
cd6099f kvmclock: update system_time_msr address forcibly

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-h_mt4soa/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-h_mt4soa/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=0c6e1be2b3f3
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/var/tmp/qemu-build/install
BIOS directory/var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
B

Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-26 Thread Anton Nefedov


On 05/26/2017 01:17 PM, Kevin Wolf wrote:

Am 24.05.2017 um 18:09 hat Anton Nefedov geschrieben:

I agree; as mentioned we have similar patches and they don't conflict much.
We noticed a performance regression on HDD though, for the
presumably optimized case (random 4k write over a large backed
image); so the patches were put on hold.


You're talking about your own patches that should do the same thing,
right? Can you re-do the same test with Berto's patches? Maybe there was
just an implementation glitch in yours.

This approach should very obviously result in a performance improvement,
and the patches are relatively simple, so I'm very much inclined to
merge this as soon as possible.

Kevin




Tried the another machine; about 10% improvement here

[root@dpclient centos-7.3-x86_64]# qemu-img info ./harddisk2.hdd
image: ./harddisk2.hdd
file format: qcow2
virtual size: 2.0G (2147483648 bytes)
disk size: 260K
cluster_size: 65536
backing file: harddisk2.hdd.base (actual path: ./harddisk2.hdd.base)
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
[root@dpclient centos-7.3-x86_64]# qemu-img info ./harddisk2.hdd.base
image: ./harddisk2.hdd.base
file format: qcow2
virtual size: 2.0G (2147483648 bytes)
disk size: 2.0G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: true
refcount bits: 16
corrupt: false
[root@dpclient centos-7.3-x86_64]# filefrag ./harddisk2.hdd.base
./harddisk2.hdd.base: 3 extents found


[root@localhost ~]# fio --name=randwrite --blocksize=4k
--filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
--size=2g --io_size=32m

/* master */
write: io=32768KB, bw=372785B/s, iops=91, runt= 90010msec
/* Berto's patches */
write: io=32768KB, bw=404304B/s, iops=98, runt= 82993msec


/Anton



Re: [Qemu-devel] [PATCH 03/12] migration: Switch to using announcement timer

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 12:16 AM, Jason Wang wrote:
> 
> 
> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>> Switch qemu_announce_self and virtio annoucements to use
>> the announcement timer framework.  This makes sure that both
>> timers use the same timeouts and number of annoucement attempts
>>
>> Based on work by Dr. David Alan Gilbert 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>   hw/net/virtio-net.c| 28 
>>   include/hw/virtio/virtio-net.h |  3 +--
>>   include/migration/vmstate.h| 17 +++--
>>   include/sysemu/sysemu.h|  2 +-
>>   migration/migration.c  |  2 +-
>>   migration/savevm.c | 28 ++--
>>   6 files changed, 44 insertions(+), 36 deletions(-)
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 7d091c9..1c65825 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -25,6 +25,7 @@
>>   #include "qapi/qmp/qjson.h"
>>   #include "qapi-event.h"
>>   #include "hw/virtio/virtio-access.h"
>> +#include "migration/migration.h"
>> #define VIRTIO_NET_VM_VERSION11
>>   @@ -115,7 +116,7 @@ static void virtio_net_announce_timer(void *opaque)
>>   VirtIONet *n = opaque;
>>   VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>   -n->announce_counter--;
>> +n->announce_timer->round--;
>>   n->status |= VIRTIO_NET_S_ANNOUNCE;
>>   virtio_notify_config(vdev);
>>   }
>> @@ -427,8 +428,8 @@ static void virtio_net_reset(VirtIODevice *vdev)
>>   n->nobcast = 0;
>>   /* multiqueue is disabled by default */
>>   n->curr_queues = 1;
>> -timer_del(n->announce_timer);
>> -n->announce_counter = 0;
>> +timer_del(n->announce_timer->tm);
>> +n->announce_timer->round = 0;
>>   n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> /* Flush any MAC and VLAN filter table state */
>> @@ -872,10 +873,10 @@ static int virtio_net_handle_announce(VirtIONet *n, 
>> uint8_t cmd,
>>   if (cmd == VIRTIO_NET_CTRL_ANNOUNCE_ACK &&
>>   n->status & VIRTIO_NET_S_ANNOUNCE) {
>>   n->status &= ~VIRTIO_NET_S_ANNOUNCE;
>> -if (n->announce_counter) {
>> -timer_mod(n->announce_timer,
>> +if (n->announce_timer->round) {
>> +timer_mod(n->announce_timer->tm,
>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -  self_announce_delay(n->announce_counter));
>> +  self_announce_delay(n->announce_timer));
>>   }
>>   return VIRTIO_NET_OK;
>>   } else {
>> @@ -1615,8 +1616,8 @@ static int virtio_net_post_load_device(void *opaque, 
>> int version_id)
>> if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_GUEST_ANNOUNCE) &&
>>   virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>> -n->announce_counter = SELF_ANNOUNCE_ROUNDS;
>> -timer_mod(n->announce_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>> +n->announce_timer->round = n->announce_timer->params.rounds;
>> +timer_mod(n->announce_timer->tm, 
>> qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL));
>>   }
>> return 0;
>> @@ -1938,8 +1939,10 @@ static void virtio_net_device_realize(DeviceState 
>> *dev, Error
>> **errp)
>>   qemu_macaddr_default_if_unset(&n->nic_conf.macaddr);
>>   memcpy(&n->mac[0], &n->nic_conf.macaddr, sizeof(n->mac));
>>   n->status = VIRTIO_NET_S_LINK_UP;
>> -n->announce_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> - virtio_net_announce_timer, n);
>> +n->announce_timer = qemu_announce_timer_new(qemu_get_announce_params(),
>> +QEMU_CLOCK_VIRTUAL);
>> +n->announce_timer->tm = timer_new_ms(QEMU_CLOCK_VIRTUAL,
>> +  virtio_net_announce_timer, n);
>> if (n->netclient_type) {
>>   /*
>> @@ -2001,8 +2004,9 @@ static void virtio_net_device_unrealize(DeviceState 
>> *dev, Error
>> **errp)
>>   virtio_net_del_queue(n, i);
>>   }
>>   -timer_del(n->announce_timer);
>> -timer_free(n->announce_timer);
>> +timer_del(n->announce_timer->tm);
>> +timer_free(n->announce_timer->tm);
>> +g_free(n->announce_timer);
>>   g_free(n->vqs);
>>   qemu_del_nic(n->nic);
>>   virtio_cleanup(vdev);
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 1eec9a2..51dd4c3 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -94,8 +94,7 @@ typedef struct VirtIONet {
>>   char *netclient_name;
>>   char *netclient_type;
>>   uint64_t curr_guest_offloads;
>> -QEMUTimer *announce_timer;
>> -int announce_counter;
>> +AnnounceTimer *announce_timer;
>>   bool needs_vnet_hdr_swap;
>>   } VirtIONet;
>>   diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index a6bf84d..f8aed9b 100644
>> --- a/include/migration/vmstate.h
>> +++ b/inclu

Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 12:03 AM, Jason Wang wrote:
> 
> On 2017年05月25日 02:05, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" 
>>
>> Signed-off-by: Vladislav Yasevich 
> 
> I think it's better to explain e.g under which condition do we need to tweak 
> such parameters.
> 
> Thanks
> 

OK.  I'll rip off what dgilbert wrote in his original series for the 
description.

Dave, if you have any text to add as to why migration might need to tweak this, 
I'd
appreciate it.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-26 Thread Alberto Garcia
On Fri 26 May 2017 02:47:55 PM CEST, Anton Nefedov wrote:
> Tried the another machine; about 10% improvement here
  [...]
> [root@localhost ~]# fio --name=randwrite --blocksize=4k
> --filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
> --size=2g --io_size=32m

In my tests I sometimes detected slight performance decreases in that
HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
instead of --io_size).

Can you try and see how that works for you?

Berto



Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Eric Blake
On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
> Add parameters that control RARP/GARP announcement timeouts.
> The parameters structure is added to the QAPI and a qmp command
> is added to set/get the parameter data.
> 
> Based on work by "Dr. David Alan Gilbert" 
> 
> Signed-off-by: Vladislav Yasevich 
> ---

Just an interface review for now:

> +++ b/qapi-schema.json
> @@ -569,6 +569,90 @@
>  ##
>  { 'command': 'query-events', 'returns': ['EventInfo'] }
>  
> +
> +##
> +# @AnnounceParameter:
> +#
> +# @AnnounceParameter enumeration
> +#
> +# @initial: Initial delay (in ms) before sending the first GARP/RARP
> +#   announcement
> +#
> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets

s/announcemnt/announcement/

> +#
> +# @rounds: Number of self-announcement attempts
> +#
> +# @step: Delay increate (in ms) after each self-announcment attempt

s/increate/increase/
s/announcment/announcement/

> +#
> +# Since: 2.10
> +##
> +{ 'enum' : 'AnnounceParameter',
> +  'data' : [ 'initial', 'max', 'rounds', 'step' ] }

Why are we creating an enum here?  Without reading further, it looks
like you plan on using the enum to delineate members of a union? But
that feels like it will be overly complicated.  A struct should be
sufficient (each parameter being an optional member of the struct, where
you can supply as many or as few on input, but all are reported on output).

> +
> +##
> +# @AnnounceParameters:
> +#
> +# Optional members may be omited on input, but all values will be present

s/omited/omitted/

> +# on output.
> +#   
> +# @initial: Initial delay (in ms) before sending the first GARP/RARP
> +#   announcement
> +#
> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets

s/announcemnt/announcement/

> +#
> +# @rounds: Number of self-announcement attempts
> +#
> +# @step: Delay increate (in ms) after each self-announcment attempt

s/increate/increase/
s/announcment/announcement/

> +#
> +# Since: 2.10
> +##
> +
> +{ 'struct': 'AnnounceParameters',
> +  'data': { '*initial': 'int',
> +'*max': 'int',
> +'*rounds': 'int',
> +'*step': 'int' } }
> +
> +##
> +# @announce-set-parameters:
> +#
> +# Set qemu announce parameters.
> +#
> +# Since: 2.10
> +#
> +# Example:
> +#
> +# -> { "execute": "announce-set-parameters",
> +#  "arguments": { "announce-rounds": 10 } }
> +#
> +##
> +{ 'command': 'announce-set-parameters', 'boxed': true,
> +  'data': 'AnnounceParameters' }
> +
> +##
> +# @query-announce-parameters:
> +#
> +# Returns information about the current announce parameters
> +#
> +# Returns: @AnnounceParameters
> +#
> +# Since: 2.10
> +#
> +# Example:
> +#
> +# -> { "execute": "query-announce-parameters" }
> +# <- { "return": {
> +#  "initial": 50,
> +#  "max": 500,
> +#  "rounds": 5,
> +#  "step": 100
> +#   }
> +#}
> +#
> +##
> +{ 'command': 'query-announce-parameters',
> +  'returns': 'AnnounceParameters' }

Yep, I'm right. The enum is bogus.  The struct is sufficient, so you
don't need the enum.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-26 Thread Eric Blake
On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
> Add a qmp command that can trigger guest announcements.
> 
> Based on work of Germano Veit Michel 
> 
> Signed-off-by: Vladislav Yasevich 
> ---
>  migration/savevm.c | 14 ++
>  qapi-schema.json   | 19 +++
>  2 files changed, 33 insertions(+)
> 

> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
> +   Error **errp)
> +{
> +AnnounceParameters announce_params;
> +
> +memcpy(&announce_params, qemu_get_announce_params(),
> +   sizeof(announce_params));

Shallow copies of a QAPI type happen to work when the type is all scalar
(as AnnounceParameters currently is), but it's more robust to use
QAPI_CLONE() or QAPI_CLONE_MEMBERS() so that any future non-scalar
additions to the parameters type will be correctly copied without
introducing memory leaks or double frees.

Even this might be better:
AnnounceParameters announce_params = { 0 };
qemu_set_announce_parameters(&announce_params, qemu_get_announce_params());

> +
> +if (has_params)
> +qemu_set_announce_parameters(&announce_params, params);
> +
> +qemu_announce_self(&announce_params);
> +}
The QMP changes look okay.  Do you have a testsuite covering the new QMP
command somewhere in the series?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 01/12] migration: Introduce announce parameters

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 09:08 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add parameters that control RARP/GARP announcement timeouts.
>> The parameters structure is added to the QAPI and a qmp command
>> is added to set/get the parameter data.
>>
>> Based on work by "Dr. David Alan Gilbert" 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
> 
> Just an interface review for now:
> 
>> +++ b/qapi-schema.json
>> @@ -569,6 +569,90 @@
>>  ##
>>  { 'command': 'query-events', 'returns': ['EventInfo'] }
>>  
>> +
>> +##
>> +# @AnnounceParameter:
>> +#
>> +# @AnnounceParameter enumeration
>> +#
>> +# @initial: Initial delay (in ms) before sending the first GARP/RARP
>> +#   announcement
>> +#
>> +# @max: Maximum delay (in ms) to between GARP/RARP announcemnt packets
> 
> s/announcemnt/announcement/
> 
>> +#
>> +# @rounds: Number of self-announcement attempts
>> +#
>> +# @step: Delay increate (in ms) after each self-announcment attempt
> 
> s/increate/increase/
> s/announcment/announcement/
> 
>> +#
>> +# Since: 2.10
>> +##
>> +{ 'enum' : 'AnnounceParameter',
>> +  'data' : [ 'initial', 'max', 'rounds', 'step' ] }
> 
> Why are we creating an enum here?  Without reading further, it looks
> like you plan on using the enum to delineate members of a union? But
> that feels like it will be overly complicated.  A struct should be
> sufficient (each parameter being an optional member of the struct, where
> you can supply as many or as few on input, but all are reported on output).
> 

I end up using them for the HMP interface.  If it's better, I can move this
definition to the HMP patch.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 06/12] qmp: Expose qemu_announce_self as a qmp command

2017-05-26 Thread Vlad Yasevich
On 05/26/2017 09:16 AM, Eric Blake wrote:
> On 05/24/2017 01:05 PM, Vladislav Yasevich wrote:
>> Add a qmp command that can trigger guest announcements.
>>
>> Based on work of Germano Veit Michel 
>>
>> Signed-off-by: Vladislav Yasevich 
>> ---
>>  migration/savevm.c | 14 ++
>>  qapi-schema.json   | 19 +++
>>  2 files changed, 33 insertions(+)
>>
> 
>> +void qmp_announce_self(bool has_params, AnnounceParameters *params,
>> +   Error **errp)
>> +{
>> +AnnounceParameters announce_params;
>> +
>> +memcpy(&announce_params, qemu_get_announce_params(),
>> +   sizeof(announce_params));
> 
> Shallow copies of a QAPI type happen to work when the type is all scalar
> (as AnnounceParameters currently is), but it's more robust to use
> QAPI_CLONE() or QAPI_CLONE_MEMBERS() so that any future non-scalar
> additions to the parameters type will be correctly copied without
> introducing memory leaks or double frees.
> 
> Even this might be better:
> AnnounceParameters announce_params = { 0 };
> qemu_set_announce_parameters(&announce_params, qemu_get_announce_params());
> 

Ok.

>> +
>> +if (has_params)
>> +qemu_set_announce_parameters(&announce_params, params);
>> +
>> +qemu_announce_self(&announce_params);
>> +}
> The QMP changes look okay.  Do you have a testsuite covering the new QMP
> command somewhere in the series?
> 

There is basic test in patch 12.

Thanks
-vlad



Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-26 Thread Anton Nefedov

On 05/26/2017 04:08 PM, Alberto Garcia wrote:

On Fri 26 May 2017 02:47:55 PM CEST, Anton Nefedov wrote:

Tried the another machine; about 10% improvement here

   [...]

[root@localhost ~]# fio --name=randwrite --blocksize=4k
--filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
--size=2g --io_size=32m


In my tests I sometimes detected slight performance decreases in that
HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
instead of --io_size).

Can you try and see how that works for you?

Berto



For me it keeps giving pretty much the same result before and after

 write: io=512736KB, bw=8545.4KB/s, iops=2136, runt= 60004msec

/Anton



Re: [Qemu-devel] [PATCH v2] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Eric Blake
On 05/26/2017 06:58 AM, Mao Zhongyi wrote:

In the subject: s/porpagate/propagate/

> ioh3420_interrupts_init() and its callers rp_realize() fill error
> message to local_err, then propagate it to errp by error_porpagate(),

and again

> which's not necessary. So eliminate it and pass errp directly instead

s/which's/which is/ (English does not have the abbreviation which's)

> of local_err. Of course, error_propagate() also has been removed.
> 
> Signed-off-by: Mao Zhongyi 
> ---
> v2:
> * Drop the part of wrong handling that ignored the fact that if 
>   pci_qdev_realize()'s caller pass errp = NULL.
> 

The code portion is fine, so with the commit message fixed (which a
maintainer might do without needing you to send a v3),
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] numa-test: fix query-cpus leaks

2017-05-26 Thread Eric Blake
On 05/26/2017 06:04 AM, Marc-André Lureau wrote:
> Fix test leaks introduced in commit 2941020a476.
> 
> (and small extra space removed)
> 
> Spotted by ASAN.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  tests/numa-test.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 11/13] Remove/replace chardev/char.h inclusion

2017-05-26 Thread Marc-André Lureau
On Tue, May 9, 2017 at 4:46 PM Philippe Mathieu-Daudé 
wrote:

> Hi Marc-André Lureau,
>
> Isn't clearer if the "remove" part of this commit goes before your patch
> 6 "move headers to include/chardev"?
>

make sense, reordered

thanks


>
> On 05/09/2017 08:33 AM, Marc-André Lureau wrote:
> > Those are apparently unnecessary includes.
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  hw/arm/bcm2835_peripherals.c | 1 -
> >  hw/char/imx_serial.c | 1 -
> >  hw/display/xenfb.c   | 1 -
> >  hw/i386/xen/xen-hvm.c| 1 -
> >  hw/mips/mips_fulong2e.c  | 1 -
> >  hw/mips/mips_malta.c | 1 -
> >  hw/net/xgmac.c   | 1 -
> >  hw/ppc/spapr_events.c| 1 -
> >  hw/ppc/spapr_rtas.c  | 1 -
> >  hw/sparc/leon3.c | 1 -
> >  hw/usb/ccid-card-emulated.c  | 2 +-
> >  hw/xen/xen_backend.c | 1 -
> >  util/event_notifier-posix.c  | 1 -
> >  13 files changed, 1 insertion(+), 13 deletions(-)
> >
> > diff --git a/hw/arm/bcm2835_peripherals.c b/hw/arm/bcm2835_peripherals.c
> > index 091e14dc60..502f04c02a 100644
> > --- a/hw/arm/bcm2835_peripherals.c
> > +++ b/hw/arm/bcm2835_peripherals.c
> > @@ -13,7 +13,6 @@
> >  #include "hw/arm/bcm2835_peripherals.h"
> >  #include "hw/misc/bcm2835_mbox_defs.h"
> >  #include "hw/arm/raspi_platform.h"
> > -#include "chardev/char.h"
> >  #include "sysemu/sysemu.h"
> >
> >  /* Peripheral base address on the VC (GPU) system bus */
> > diff --git a/hw/char/imx_serial.c b/hw/char/imx_serial.c
> > index 1d4f378a59..af250305be 100644
> > --- a/hw/char/imx_serial.c
> > +++ b/hw/char/imx_serial.c
> > @@ -21,7 +21,6 @@
> >  #include "qemu/osdep.h"
> >  #include "hw/char/imx_serial.h"
> >  #include "sysemu/sysemu.h"
> > -#include "chardev/char.h"
> >  #include "qemu/log.h"
> >
> >  #ifndef DEBUG_IMX_UART
> > diff --git a/hw/display/xenfb.c b/hw/display/xenfb.c
> > index 4a1a2a4d6e..e76c0d805c 100644
> > --- a/hw/display/xenfb.c
> > +++ b/hw/display/xenfb.c
> > @@ -28,7 +28,6 @@
> >
> >  #include "hw/hw.h"
> >  #include "ui/console.h"
> > -#include "chardev/char.h"
> >  #include "hw/xen/xen_backend.h"
> >
> >  #include 
> > diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> > index 8dc57a31cf..321cf36b67 100644
> > --- a/hw/i386/xen/xen-hvm.c
> > +++ b/hw/i386/xen/xen-hvm.c
> > @@ -18,7 +18,6 @@
> >  #include "hw/xen/xen_backend.h"
> >  #include "qmp-commands.h"
> >
> > -#include "chardev/char.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/range.h"
> >  #include "sysemu/xen-mapcache.h"
> > diff --git a/hw/mips/mips_fulong2e.c b/hw/mips/mips_fulong2e.c
> > index 55eeb9227d..dbe2805acb 100644
> > --- a/hw/mips/mips_fulong2e.c
> > +++ b/hw/mips/mips_fulong2e.c
> > @@ -32,7 +32,6 @@
> >  #include "hw/mips/mips.h"
> >  #include "hw/mips/cpudevs.h"
> >  #include "hw/pci/pci.h"
> > -#include "chardev/char.h"
> >  #include "sysemu/sysemu.h"
> >  #include "audio/audio.h"
> >  #include "qemu/log.h"
> > diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
> > index 610e1e9085..6fcdd9c6c1 100644
> > --- a/hw/mips/mips_malta.c
> > +++ b/hw/mips/mips_malta.c
> > @@ -37,7 +37,6 @@
> >  #include "hw/mips/mips.h"
> >  #include "hw/mips/cpudevs.h"
> >  #include "hw/pci/pci.h"
> > -#include "chardev/char.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/arch_init.h"
> >  #include "qemu/log.h"
> > diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
> > index 3d00e2868c..0843bf185c 100644
> > --- a/hw/net/xgmac.c
> > +++ b/hw/net/xgmac.c
> > @@ -26,7 +26,6 @@
> >
> >  #include "qemu/osdep.h"
> >  #include "hw/sysbus.h"
> > -#include "chardev/char.h"
> >  #include "qemu/log.h"
> >  #include "net/net.h"
> >  #include "net/checksum.h"
> > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> > index b309f2f4b0..8a5f1d321a 100644
> > --- a/hw/ppc/spapr_events.c
> > +++ b/hw/ppc/spapr_events.c
> > @@ -28,7 +28,6 @@
> >  #include "qapi/error.h"
> >  #include "cpu.h"
> >  #include "sysemu/sysemu.h"
> > -#include "chardev/char.h"
> >  #include "hw/qdev.h"
> >  #include "sysemu/device_tree.h"
> >
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index 9995b1a9f5..dd1633a104 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -29,7 +29,6 @@
> >  #include "qemu/log.h"
> >  #include "qemu/error-report.h"
> >  #include "sysemu/sysemu.h"
> > -#include "chardev/char.h"
> >  #include "hw/qdev.h"
> >  #include "sysemu/device_tree.h"
> >  #include "sysemu/cpus.h"
> > diff --git a/hw/sparc/leon3.c b/hw/sparc/leon3.c
> > index f3e62f5bd7..f415997649 100644
> > --- a/hw/sparc/leon3.c
> > +++ b/hw/sparc/leon3.c
> > @@ -28,7 +28,6 @@
> >  #include "hw/hw.h"
> >  #include "qemu/timer.h"
> >  #include "hw/ptimer.h"
> > -#include "chardev/char.h"
> >  #include "sysemu/sysemu.h"
> >  #include "sysemu/qtest.h"
> >  #include "hw/boards.h"
> > diff --git a/hw/usb/ccid-card-emulated.c b/hw/usb/ccid-card-emulated.c
> > index 690b6c3ab4..e646eb243b 100644
> > --- a/hw/usb/ccid-card-emulated.c
> > +++ b/hw/usb/cci

Re: [Qemu-devel] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-05-26 Thread Alberto Garcia
On Fri 26 May 2017 03:32:49 PM CEST, Anton Nefedov wrote:
>>> [root@localhost ~]# fio --name=randwrite --blocksize=4k
>>> --filename=/dev/sdb --rw=randwrite --direct=1 --ioengine=libaio
>>> --size=2g --io_size=32m
>> 
>> In my tests I sometimes detected slight performance decreases in that
>> HDD scenario but using 'write' instead of 'randwrite' (and --runtime=60
>> instead of --io_size).
>> 
>> Can you try and see how that works for you?
>
> For me it keeps giving pretty much the same result before and after
>
>   write: io=512736KB, bw=8545.4KB/s, iops=2136, runt= 60004msec

Ok, that's the expected behavior. Good!

Berto



Re: [Qemu-devel] [PATCH] qemu-ga: remove useless allocation

2017-05-26 Thread Eric Blake
On 05/26/2017 05:13 AM, Marc-André Lureau wrote:
> There is no need to duplicate a fixed string.
> 
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Marc-André Lureau 
> ---
>  qga/commands-posix.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 03/26] target/s390x: implement TEST AND SET

2017-05-26 Thread Richard Henderson

On 05/25/2017 02:04 PM, Aurelien Jarno wrote:

+TCGv_i32 t1 = tcg_const_i32(0xff);
+tcg_gen_atomic_fetch_or_i32(t1, o->in2, t1, get_mem_index(s), MO_UB);


Better as tcg_gen_atomic_xchg_i32.


r~



Re: [Qemu-devel] [PATCH v2] nbd/client.c: use errp instead of LOG

2017-05-26 Thread Eric Blake
On 05/26/2017 06:09 AM, Vladimir Sementsov-Ogievskiy wrote:
> Move to modern errp scheme from just LOGging errors.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Fixes:
> 
> - local_err initialized to NULL
> - 083 iotest ajusted
> 
> \I feel like an idiot...
> 

We all have days like that ;)


> +++ b/tests/qemu-iotests/083.out
> @@ -69,10 +69,12 @@ read failed: Input/output error
>  
>  === Check disconnect 4 reply ===
>  
> +read failed
>  read failed: Input/output error

Having a double error report is awkward; can we figure out where the
duplication is coming from and trim it down to a single report?

But that can be done as a followup.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 04/26] target/s390x: implement TEST ADDRESSING MODE

2017-05-26 Thread Richard Henderson

On 05/25/2017 02:04 PM, Aurelien Jarno wrote:

+tcg_gen_movi_i32(cc_op, cc);
+set_cc_static(s);


gen_op_movi_cc


r~



Re: [Qemu-devel] [PATCH 01/26] target/s390x: remove dead code in translate.c

2017-05-26 Thread Richard Henderson

On 05/25/2017 02:04 PM, Aurelien Jarno wrote:

Signed-off-by: Aurelien Jarno
---
  target/s390x/translate.c | 5 +
  1 file changed, 1 insertion(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH 02/26] target/s390x: make IPTE SMP aware

2017-05-26 Thread Richard Henderson

On 05/25/2017 02:04 PM, Aurelien Jarno wrote:

  /* XXX we exploit the fact that Linux passes the exact virtual
 address here - it's not obliged to! */
-tlb_flush_page(cs, page);
+tlb_flush_page_all_cpus_synced(cs, page);
  
  /* XXX 31-bit hack */

  if (page & 0x8000) {
-tlb_flush_page(cs, page & ~0x8000);
+tlb_flush_page_all_cpus_synced(cs, page & ~0x8000);
  } else {
-tlb_flush_page(cs, page | 0x8000);
+tlb_flush_page_all_cpus_synced(cs, page | 0x8000);
  }


Ideally we would, at the same time, implement the local-pte facility, which 
examines a bit in the M4 field to *not* do this.  That said,


Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] [PATCH v7 09/20] qcow: convert QCow to use QCryptoBlock for encryption

2017-05-26 Thread Daniel P. Berrange
On Fri, May 26, 2017 at 02:03:40PM +0200, Alberto Garcia wrote:
> On Thu 25 May 2017 06:38:40 PM CEST, Daniel P. Berrange wrote:
> > @@ -105,6 +116,13 @@ static int qcow_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  int ret;
> >  QCowHeader header;
> >  Error *local_err = NULL;
> > +QCryptoBlockOpenOptions *crypto_opts = NULL;
> > +unsigned int cflags = 0;
> > +QDict *encryptopts = NULL;
> > +const char *encryptfmt;
> > +
> > +qdict_extract_subqdict(options, &encryptopts, "encrypt.");
> > +encryptfmt = qdict_get_try_str(encryptopts, "format");
> >  
> >  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file,
> > false, errp);
> >  if (!bs->file) {
> >  return -EINVAL;
> >  }
> 
> You're leaking encryptopts if the function returns here.
> 
> > @@ -873,6 +850,20 @@ static int qcow_create(const char *filename, QemuOpts 
> > *opts, Error **errp)
> >  goto exit;
> >  }
> >  header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES);
> > +
> > +crypto_opts = block_crypto_create_opts_init(
> > +Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, &local_err);
> > +if (local_err) {
> > +error_propagate(errp, local_err);
> > +ret = -EINVAL;
> > +goto exit;
> > +}
> 
> Not very important, and my fault for not having pointed it out in my
> previous review, but you can spare the error_propagate() call if you
> pass errp directly to block_crypto_create_opts_init() and then check if
> crypto_opts is NULL.
> 
> Actually none of the error_propagate() calls in qcow_create() is really
> necessary, but that could be fixed in a separate patch, if at all (it's
> not so important).

Since I have to re-post to fix the leak anyway, I'll eliminate the
intermedia local_err usage - I like to avoid that when not needed
anyway.

> 
> The leak however needs to be fixed. With that,
> 
> Reviewed-by: Alberto Garcia 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [PATCH v2] pci: Set err to errp directly rather than through error_porpagate()

2017-05-26 Thread Marcel Apfelbaum



On 26/05/2017 14:58, Mao Zhongyi wrote:

ioh3420_interrupts_init() and its callers rp_realize() fill error
message to local_err, then propagate it to errp by error_porpagate(),
which's not necessary. So eliminate it and pass errp directly instead
of local_err. Of course, error_propagate() also has been removed.

Signed-off-by: Mao Zhongyi 
---
v2:
* Drop the part of wrong handling that ignored the fact that if
   pci_qdev_realize()'s caller pass errp = NULL.

  hw/pci-bridge/ioh3420.c| 4 +---
  hw/pci-bridge/pcie_root_port.c | 7 ++-
  2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index da4e5bd..5f56a2f 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -64,15 +64,13 @@ static uint8_t ioh3420_aer_vector(const PCIDevice *d)
  static int ioh3420_interrupts_init(PCIDevice *d, Error **errp)
  {
  int rc;
-Error *local_err = NULL;
  
  rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,

IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
-  &local_err);
+  errp);
  if (rc < 0) {
  assert(rc == -ENOTSUP);
-error_propagate(errp, local_err);
  }
  
  return rc;

diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index cf36318..9022996 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -59,7 +59,6 @@ static void rp_realize(PCIDevice *d, Error **errp)
  PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
  PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
  int rc;
-Error *local_err = NULL;
  
  pci_config_set_interrupt_pin(d->config, 1);

  pci_bridge_initfn(d, TYPE_PCIE_BUS);
@@ -72,9 +71,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
  }
  
  if (rpc->interrupts_init) {

-rc = rpc->interrupts_init(d, &local_err);
+rc = rpc->interrupts_init(d, errp);
  if (rc < 0) {
-error_propagate(errp, local_err);
  goto err_bridge;
  }
  }
@@ -98,9 +96,8 @@ static void rp_realize(PCIDevice *d, Error **errp)
  }
  
  rc = pcie_aer_init(d, PCI_ERR_VER, rpc->aer_offset,

-   PCI_ERR_SIZEOF, &local_err);
+   PCI_ERR_SIZEOF, errp);
  if (rc < 0) {
-error_propagate(errp, local_err);
  goto err;
  }
  pcie_aer_root_init(d);


Reviewed-by: Marcel Apfelbaum

Thanks,
Marcel



  1   2   3   >