Re: [Qemu-devel] [PATCH 1/4 for-2.11?] block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap

2017-11-16 Thread Vladimir Sementsov-Ogievskiy

13.11.2017 20:50, Eric Blake wrote:

On 11/13/2017 10:20 AM, Vladimir Sementsov-Ogievskiy wrote:

Like other setters here these functions should take a lock.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/dirty-bitmap.c | 4 
  1 file changed, 4 insertions(+)

Should this patch be in 2.11?



these functions are unused now, so, no, it's not necessary




diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index bd04e991b1..2a0bcd9e51 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -397,15 +397,19 @@ void bdrv_remove_persistent_dirty_bitmap(BlockDriverState 
*bs,
  /* Called with BQL taken.  */
  void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
  {
+bdrv_dirty_bitmap_lock(bitmap);
  assert(!bdrv_dirty_bitmap_frozen(bitmap));
  bitmap->disabled = true;
+bdrv_dirty_bitmap_unlock(bitmap);

Why do we need this lock in addition to BQL?


  }
  
  /* Called with BQL taken.  */

  void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap)
  {
+bdrv_dirty_bitmap_lock(bitmap);
  assert(!bdrv_dirty_bitmap_frozen(bitmap));
  bitmap->disabled = false;
+bdrv_dirty_bitmap_unlock(bitmap);

Again, why do we need this in addition to BQL?

The commit message needs more details about a scenario where our
existing BQL lock is insufficient to prevent misuse of bitmap->disabled.




--
Best regards,
Vladimir




Re: [Qemu-devel] [RFC v3 27/27] tests: qmp-test: add oob test

2017-11-16 Thread Peter Xu
On Wed, Nov 15, 2017 at 10:21:16AM +, Stefan Hajnoczi wrote:
> On Mon, Nov 06, 2017 at 05:46:43PM +0800, Peter Xu wrote:
> > +/*
> > + * Try a time-consuming command, following by a OOB command, make
> > + * sure we get OOB command before the time-consuming one (which is
> > + * run in the parser).
> > + *
> > + * When writting up this test script, the only command that
> > + * support OOB is migrate-incoming.  It's not the best command to
> > + * test OOB but we don't really have a choice here.  We will check
> > + * arriving order but not command errors, which does not really
> > + * matter to us.
> > + */
> > +qmp_async("{ 'execute': 'dump-guest-memory',"
> > +  "  'arguments': { 'paging': true, "
> > +  " 'protocol': 'file:/dev/null' }, "
> > +  "  'id': 'time-consuming-cmd'}");
> > +qmp_async("{ 'execute': 'migrate-incoming', "
> > +  "  'control': { 'run-oob': true }, "
> > +  "  'id': 'oob-cmd' }");
> > +
> > +/* Ignore all events.  Wait for 2 acks */
> > +while (acks < 2) {
> > +resp = qmp_receive();
> > +if (qdict_haskey(resp, "event")) {
> > +/* Skip possible events */
> > +continue;
> > +}
> > +cmd_id = qdict_get_str(resp, "id");
> > +if (acks == 0) {
> > +/* Need to receive OOB response first */
> > +g_assert_cmpstr(cmd_id, ==, "oob-cmd");
> > +} else if (acks == 1) {
> > +g_assert_cmpstr(cmd_id, ==, "time-consuming-cmd");
> > +}
> > +acks++;
> > +}
> 
> This test is non-deterministic.  The dump-guest-memory command could
> complete first on a fast machine.
> 
> On a slow machine this test might take a long time...
> 
> Please introduce a test command that is deterministic.  For example,
> when 'x-oob-test' is invoked without 'run-oob': true it waits until
> invoked again, this time with 'run-oob': true.

Yes this sounds good.

> 
> We have similar interfaces in the block layer for controlling the order
> in which parallel I/O requests are processed.  This allows test cases to
> deterministically take specific code paths.

It's great to know that I can create a command to test it.  That
should be much easier.  Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Yan Vugenfirer
Hi Jason,

Windows driver will initialise only the amount of queue based on the amount of 
available vCPUs. So if there will be more queues in the device than we have 
vCPUs on the guest, the driver will not initialise “excessive” queues. This is 
tied to the way RSS on Windows should be implemented.
Exactly as in described scenario (7 queues, but only 4 vCPUs).

Best regards,
Yan.

> On 16 Nov 2017, at 07:53, Longpeng (Mike)  wrote:
> 
> Hi Jason & Michael,
> 
> Do you have any idea about this problem ?
> 
> -- 
> Regards,
> Longpeng(Mike)
> 
> On 2017/11/15 23:54, Longpeng(Mike) wrote:
> 
>> 2017-11-15 23:05 GMT+08:00 Jason Wang :
>>> 
>>> 
>>> On 2017年11月15日 22:55, Longpeng(Mike) wrote:
 
 Hi guys,
 
 We got a BUG report from our testers yesterday, the testing scenario was
 migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
 queues*).
 
 We found the cause reason, and we'll report the BUG or send a fix patch
 to upstream if necessary( we haven't test the upstream yet, sorry... ).
>>> 
>>> 
>>> Could you explain this a little bit more?
>>> 
 
 We want to know why the vhost_net_start() must start *total queues* ( in
 our
 VM there're 7 queues ) but not *the queues that current used* ( in our VM,
 guest
 only uses the first 4 queues because it's limited by the number of vcpus)
 ?
 
 Looking forward to your help, thx :)
>>> 
>>> 
>>> Since the codes have been there for years and works well for kernel
>>> datapath. You should really explain what's wrong.
>>> 
>> 
>> OK. :)
>> 
>> In our scenario,  the Windows's virtio-net driver only use the first 4
>> queues and it
>> *only set desc/avail/used table for the first 4 queues*, so in QEMU
>> the desc/avail/
>> used of the last 3 queues are ZERO,  but unfortunately...
>> '''
>> vhost_net_start
>>  for (i = 0; i < total_queues; i++)
>>vhost_net_start_one
>>  vhost_dev_start
>>vhost_virtqueue_start
>> '''
>> In vhost_virtqueue_start(), it will calculate the HVA of
>> desc/avail/used table, so for last
>> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then
>> send the results
>> to the user-mode backend ( we use *vhost-user* ) by 
>> vhost_virtqueue_set_addr().
>> 
>> When the EVS get these address, it will update a *idx* which will be
>> treated as  vq's
>> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).
>> 
>> So we get the following result after virtio-net stop:
>>  the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's
>>  last_avail_idx is NOT ZERO.
>> 
>> At last, virtio_load() reports an error:
>> '''
>>  if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
>> will be TRUE
>>  error_report("VQ %d address 0x0 "
>> "inconsistent with Host index 0x%x",
>> i, vdev->vq[i].last_avail_idx);
>>return -1;
>>   }
>> '''
>> 
>> BTW, the problem won't appear if use Linux guest, because the Linux 
>> virtio-net
>> driver will set all 7 queues's desc/avail/used tables. And the problem
>> won't appear
>> if the VM use vhost-net, because vhost-net won't update *idx* in SET_ADDR 
>> ioctl.
>> 
>> Sorry for my pool English, Maybe I could describe the problem in Chinese for 
>> you
>> in private if necessary.
>> 
>> 
>>> Thanks
>> 
>> 
> 
> 
> -- 
> Regards,
> Longpeng(Mike)



[Qemu-devel] sheepdog block driver and read write error policy

2017-11-16 Thread Vasiliy Tolstov
Hi. I'm try to write own sheepdog compatible daemon and test it with qemu.
Sometimes ago in qemu added read write error policy to allow to stop
domain or continue or something else. As i see in case of sheepdog
this policy is ignored and qemu try to reconnect to sheepdog daemon.
If nobody wants i can try to fix this, and if policy is not specified
work like now.
Where i need to start to easy understand how this works in case of file raw ?

Also i'm fedora user with virt-preview repo and see, that if sheepdog
daemon is unavailable 2 or more minutes and started after that qemu
crashed. Does i need to retest with latest qemu version?
b2587932582333197c88bf663785b19f441989d7
f1af3251f885f3b8adf73ba078500f2eeefbedae
5eceb01adfbe513c0309528293b0b86e32a6e27d


qemu-system-x86_64 -machine q35 -accel kvm -m 512M -vnc 0.0.0.0:1
-device virtio-scsi-pci,id=scsi0 -drive
aio=native,rerror=stop,werror=stop,if=none,format=raw,id=drive-scsi-disk0,cache=none,file=sheepdog:test,discard=unmap,detect-zeroes=off
 -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=device-scsi-disk0
qemu-system-x86_64: failed to get the header, Invalid argument
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: failed to send a req, Bad file descriptor
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused
qemu-system-x86_64: Failed to connect socket: Connection refused



qemu-system-x86_64: /builddir/build/BUILD/qemu-2.9.0/util/iov.c:167:
iov_send_recv: Assertion `niov < iov_cnt' failed.
Makefile:5: recipe for target 'test' failed
make: *** [test] Aborted (core dumped)

And if someone from fedora/redhat team is present - when new qemu
version available in virt preview repo for fedora 25 or it EOL?

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru



Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-11-16 Thread Vladimir Sementsov-Ogievskiy

16.11.2017 00:20, John Snow wrote:


On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi all.

There are three qmp commands, needed to implement external backup API.

Using these three commands, client may do all needed bitmap management by
hand:

on backup start we need to do a transaction:
  {disable old bitmap, create new bitmap}

on backup success:
  drop old bitmap

on backup fail:
  enable old bitmap
  merge new bitmap to old bitmap
  drop new bitmap


Can you give me an example of how you expect these commands to be used,
and why they're required?

I'm a little weary about how error-prone these commands might be and the
potential for incorrect usage seems... high. Why do we require them?


It is needed for incremental backup. It looks like bad idea to export 
abdicate/reclaim functionality, it is simpler

and clearer to allow user to merge/enable/disable bitmaps by hand.

usage is like this:

1. we have dirty bitmap bitmap0 for incremental backup.

2. prepare image fleecing (create temporary image with backing=our_disk)
3. in qmp transaction:
    - disable bitmap0
    - create bitmap1
    - start image fleecing (backup sync=none our_disk -> temp_disk)
4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD

=== external tool can get bitmap0 and then copy some clusters through 
NBD ===


on successful backup external tool can drop bitmap0. or can save it and 
reuse somehow


on failed backup external tool can do the following:
    - enable bitmap0
    - merge bitmap1 to bitmap0
    - drop bitmap1

--
Best regards,
Vladimir




Re: [Qemu-devel] sheepdog block driver and read write error policy

2017-11-16 Thread Fam Zheng
On Thu, 11/16 11:11, Vasiliy Tolstov wrote:
> Hi. I'm try to write own sheepdog compatible daemon and test it with qemu.
> Sometimes ago in qemu added read write error policy to allow to stop
> domain or continue or something else. As i see in case of sheepdog
> this policy is ignored and qemu try to reconnect to sheepdog daemon.
> If nobody wants i can try to fix this, and if policy is not specified
> work like now.
> Where i need to start to easy understand how this works in case of file raw ?

The driver callbacks (sd_co_readv/sd_co_writev) should simply return error
instead of retrying.

> 
> Also i'm fedora user with virt-preview repo and see, that if sheepdog
> daemon is unavailable 2 or more minutes and started after that qemu
> crashed. Does i need to retest with latest qemu version?
> b2587932582333197c88bf663785b19f441989d7
> f1af3251f885f3b8adf73ba078500f2eeefbedae
> 5eceb01adfbe513c0309528293b0b86e32a6e27d
> 
> 
> qemu-system-x86_64 -machine q35 -accel kvm -m 512M -vnc 0.0.0.0:1
> -device virtio-scsi-pci,id=scsi0 -drive
> aio=native,rerror=stop,werror=stop,if=none,format=raw,id=drive-scsi-disk0,cache=none,file=sheepdog:test,discard=unmap,detect-zeroes=off
>  -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=device-scsi-disk0
> qemu-system-x86_64: failed to get the header, Invalid argument
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: failed to send a req, Bad file descriptor
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> qemu-system-x86_64: Failed to connect socket: Connection refused
> 
> 
> 
> qemu-system-x86_64: /builddir/build/BUILD/qemu-2.9.0/util/iov.c:167:
> iov_send_recv: Assertion `niov < iov_cnt' failed.
> Makefile:5: recipe for target 'test' failed
> make: *** [test] Aborted (core dumped)
> 
> And if someone from fedora/redhat team is present - when new qemu
> version available in virt preview repo for fedora 25 or it EOL?

It's recommended to use qemu.git if you want to develop new features for QEMU:

http://wiki.qemu.org/Contribute

Fam



Re: [Qemu-devel] [PATCH v2 for-2.11] nbd/server: Fix error reporting for bad requests

2017-11-16 Thread Vladimir Sementsov-Ogievskiy

16.11.2017 00:35, Eric Blake wrote:

The NBD spec says an attempt to NBD_CMD_TRIM on a read-only
export should fail with EPERM, as a trim has the potential
to change disk contents, but we were relying on the block
layer to catch that for us, which might not always give the
right error (and even if it does, it does not let us pass
back a sane message for structured replies).

The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of
bounds should fail with ENOSPC, not EINVAL.

Our check for u64 offset + u32 length wraparound up front is
pointless; nothing uses offset until after the second round
of sanity checks, and we can just as easily ensure there is
no wraparound by checking whether offset is in bounds (since
a disk size cannot exceed off_t which is 63 bits, adding a
32-bit number for a valid offset can't overflow).


looks like here is another problem which you've fixed:
with old code connection would be lost if this check fails in case of
CMD_WRITE, as req->complete would be false.



Solve all of these issues by some code motion and improved
request validation.

Signed-off-by: Eric Blake 
---
v2: actually commit the compiler-error fixes before submitting...

  nbd/server.c | 36 
  1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index df771fd42f..7d6801b427 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1366,15 +1366,6 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
  return -EIO;
  }

-/* Check for sanity in the parameters, part 1.  Defer as many
- * checks as possible until after reading any NBD_CMD_WRITE
- * payload, so we can try and keep the connection alive.  */


I don't understand the comment. Opposite: to keep connection alive we 
must read the payload, even in case of sanity check fail.



-if ((request->from + request->len) < request->from) {
-error_setg(errp,
-   "integer overflow detected, you're probably being 
attacked");
-return -EINVAL;
-}
-
  if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) {
  if (request->len > NBD_MAX_BUFFER_SIZE) {
  error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)",


related idea here: if request->len > NBD_MAX_BUFFER_SIZE or if we failed 
to allocate buffer in following if,
we can call nbd_drop to read CMD_WRITE payload and set req->complete = 
true;, to keep connection in this

cases.

However, it may be done later.


@@ -1399,12 +1390,21 @@ static int nbd_co_receive_request(NBDRequestData *req, 
NBDRequest *request,
request->len);
  }

-/* Sanity checks, part 2. */
-if (request->from + request->len > client->exp->size) {
+/* Sanity checks. */
+if (client->exp->nbdflags & NBD_FLAG_READ_ONLY &&
+(request->type == NBD_CMD_WRITE ||
+ request->type == NBD_CMD_WRITE_ZEROES ||
+ request->type == NBD_CMD_TRIM)) {
+error_setg(errp, "Export is read-only");
+return -EROFS;
+}
+if (request->from > client->exp->size ||
+request->from + request->len > client->exp->size) {
  error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" 
PRIu32
 ", Size: %" PRIu64, request->from, request->len,
 (uint64_t)client->exp->size);
-return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
+return (request->type == NBD_CMD_WRITE ||
+request->type == NBD_CMD_WRITE_ZEROES) ? -ENOSPC : -EINVAL;
  }
  valid_flags = NBD_CMD_FLAG_FUA;
  if (request->type == NBD_CMD_READ && client->structured_reply) {
@@ -1482,12 +1482,6 @@ static coroutine_fn void nbd_trip(void *opaque)

  break;
  case NBD_CMD_WRITE:
-if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
-error_setg(&local_err, "Export is read-only");
-ret = -EROFS;
-break;
-}
-
  flags = 0;
  if (request.flags & NBD_CMD_FLAG_FUA) {
  flags |= BDRV_REQ_FUA;
@@ -1500,12 +1494,6 @@ static coroutine_fn void nbd_trip(void *opaque)

  break;
  case NBD_CMD_WRITE_ZEROES:
-if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
-error_setg(&local_err, "Export is read-only");
-ret = -EROFS;
-break;
-}
-
  flags = 0;
  if (request.flags & NBD_CMD_FLAG_FUA) {
  flags |= BDRV_REQ_FUA;



Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir




Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Jason Wang



On 2017年11月16日 13:53, Longpeng (Mike) wrote:

On 2017/11/15 23:54, Longpeng(Mike) wrote:

2017-11-15 23:05 GMT+08:00 Jason Wang:

On 2017年11月15日 22:55, Longpeng(Mike) wrote:

Hi guys,

We got a BUG report from our testers yesterday, the testing scenario was
migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
queues*).

We found the cause reason, and we'll report the BUG or send a fix patch
to upstream if necessary( we haven't test the upstream yet, sorry... ).

Could you explain this a little bit more?


We want to know why the vhost_net_start() must start*total queues*  ( in
our
VM there're 7 queues ) but not*the queues that current used*  ( in our VM,
guest
only uses the first 4 queues because it's limited by the number of vcpus)
?

Looking forward to your help, thx:)

Since the codes have been there for years and works well for kernel
datapath. You should really explain what's wrong.


OK.:)

In our scenario,  the Windows's virtio-net driver only use the first 4
queues and it
*only set desc/avail/used table for the first 4 queues*, so in QEMU
the desc/avail/
used of the last 3 queues are ZERO,  but unfortunately...
'''
vhost_net_start
   for (i = 0; i < total_queues; i++)
 vhost_net_start_one
   vhost_dev_start
 vhost_virtqueue_start
'''
In vhost_virtqueue_start(), it will calculate the HVA of
desc/avail/used table, so for last
3 queues, it will use ZERO as the GPA to calculate the HVA, and then
send the results
to the user-mode backend ( we use*vhost-user*  ) by vhost_virtqueue_set_addr().

When the EVS get these address, it will update a*idx*  which will be
treated as  vq's
last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).

So we get the following result after virtio-net stop:
   the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's
   last_avail_idx is NOT ZERO.

At last, virtio_load() reports an error:
'''
   if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
will be TRUE
   error_report("VQ %d address 0x0 "
  "inconsistent with Host index 0x%x",
  i, vdev->vq[i].last_avail_idx);
 return -1;
}
'''

BTW, the problem won't appear if use Linux guest, because the Linux virtio-net
driver will set all 7 queues's desc/avail/used tables. And the problem
won't appear
if the VM use vhost-net, because vhost-net won't update*idx*  in SET_ADDR ioctl.


Just to make sure I understand here, I thought Windows guest + vhost_net 
hit this issue?


Thanks



Sorry for my pool English, Maybe I could describe the problem in Chinese for you
in private if necessary.



Thanks

-- Regards, Longpeng(Mike)





[Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl

2017-11-16 Thread Thomas Huth
The bios-tables-test was writing out files that we pass to iasl in
with the wrong endianness in the header when running on a big endian
host. So instead of storing mixed endian information in our structures,
let's keep everything in little endian and byte-swap it only when we
need a value in the code.

Reported-by: Daniel P. Berrange 
Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Thomas Huth 
---
 tests/acpi-utils.h   | 27 +--
 tests/bios-tables-test.c | 42 ++
 2 files changed, 27 insertions(+), 42 deletions(-)

diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h
index f8d8723..d5ca5b6 100644
--- a/tests/acpi-utils.h
+++ b/tests/acpi-utils.h
@@ -28,24 +28,9 @@ typedef struct {
 bool tmp_files_retain;   /* do not delete the temp asl/aml */
 } AcpiSdtTable;
 
-#define ACPI_READ_FIELD(field, addr)   \
-do {   \
-switch (sizeof(field)) {   \
-case 1:\
-field = readb(addr);   \
-break; \
-case 2:\
-field = readw(addr);   \
-break; \
-case 4:\
-field = readl(addr);   \
-break; \
-case 8:\
-field = readq(addr);   \
-break; \
-default:   \
-g_assert(false);   \
-}  \
+#define ACPI_READ_FIELD(field, addr)\
+do {\
+memread(addr, &field, sizeof(field));   \
 addr += sizeof(field);  \
 } while (0);
 
@@ -74,16 +59,14 @@ typedef struct {
 } while (0);
 
 #define ACPI_ASSERT_CMP(actual, expected) do { \
-uint32_t ACPI_ASSERT_CMP_le = cpu_to_le32(actual); \
 char ACPI_ASSERT_CMP_str[5] = {}; \
-memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 4); \
+memcpy(ACPI_ASSERT_CMP_str, &actual, 4); \
 g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
 } while (0)
 
 #define ACPI_ASSERT_CMP64(actual, expected) do { \
-uint64_t ACPI_ASSERT_CMP_le = cpu_to_le64(actual); \
 char ACPI_ASSERT_CMP_str[9] = {}; \
-memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 8); \
+memcpy(ACPI_ASSERT_CMP_str, &actual, 8); \
 g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
 } while (0)
 
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 564da45..e28e0c9 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -96,17 +96,20 @@ static void test_acpi_rsdp_table(test_data *data)
 static void test_acpi_rsdt_table(test_data *data)
 {
 AcpiRsdtDescriptorRev1 *rsdt_table = &data->rsdt_table;
-uint32_t addr = data->rsdp_table.rsdt_physical_address;
+uint32_t addr = le32_to_cpu(data->rsdp_table.rsdt_physical_address);
 uint32_t *tables;
 int tables_nr;
 uint8_t checksum;
+uint32_t rsdt_table_length;
 
 /* read the header */
 ACPI_READ_TABLE_HEADER(rsdt_table, addr);
 ACPI_ASSERT_CMP(rsdt_table->signature, "RSDT");
 
+rsdt_table_length = le32_to_cpu(rsdt_table->length);
+
 /* compute the table entries in rsdt */
-tables_nr = (rsdt_table->length - sizeof(AcpiRsdtDescriptorRev1)) /
+tables_nr = (rsdt_table_length - sizeof(AcpiRsdtDescriptorRev1)) /
 sizeof(uint32_t);
 g_assert(tables_nr > 0);
 
@@ -114,7 +117,7 @@ static void test_acpi_rsdt_table(test_data *data)
 tables = g_new0(uint32_t, tables_nr);
 ACPI_READ_ARRAY_PTR(tables, tables_nr, addr);
 
-checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table->length) +
+checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table_length) +
acpi_calc_checksum((uint8_t *)tables,
   tables_nr * sizeof(uint32_t));
 g_assert(!checksum);
@@ -130,7 +133,7 @@ static void test_acpi_fadt_table(test_data *data)
 uint32_t addr;
 
 /* FADT table comes first */
-addr = data->rsdt_tables_addr[0];
+addr = le32_to_cpu(data->rsdt_tables_addr[0]);
 ACPI_READ_TABLE_HEADER(fadt_table, addr);
 
 ACPI_READ_FIELD(fadt_table->firmware_ctrl, addr);
@@ -187,13 +190,14 @@ static void test_acpi_fadt_table(test_data *data)
 ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe1_block, addr);
 
 ACPI_ASSERT_CMP(fadt_table->signature, "FACP");
-g_assert(!acpi_calc_checksum((uint8_t *)fadt_table, fadt_table->length));
+g_assert(!acpi_calc_checksum((uint8_t *)fadt_table,
+ le32_to_cpu(fadt_table->length)));
 }
 
 static void test_acpi_facs_table(test_d

Re: [Qemu-devel] [PATCH v8 02/14] block/dirty-bitmap: add locked version of bdrv_release_dirty_bitmap

2017-11-16 Thread Vladimir Sementsov-Ogievskiy

11.11.2017 01:52, John Snow wrote:


On 10/30/2017 12:32 PM, Vladimir Sementsov-Ogievskiy wrote:

It is needed to realize bdrv_dirty_bitmap_release_successor in
the following patch.


OK, but...


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/dirty-bitmap.c | 25 -
  1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 81adbeb6d4..981f99d362 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -326,13 +326,13 @@ static bool bdrv_dirty_bitmap_has_name(BdrvDirtyBitmap 
*bitmap)
  return !!bdrv_dirty_bitmap_name(bitmap);
  }
  
-/* Called with BQL taken.  */

-static void bdrv_do_release_matching_dirty_bitmap(
+/* Called within bdrv_dirty_bitmap_lock..unlock */

...Add this so it will compile:

__attribute__((__unused__))


ok


+static void bdrv_do_release_matching_dirty_bitmap_locked(
  BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
  bool (*cond)(BdrvDirtyBitmap *bitmap))
  {
  BdrvDirtyBitmap *bm, *next;
-bdrv_dirty_bitmaps_lock(bs);
+
  QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) {
  if ((!bitmap || bm == bitmap) && (!cond || cond(bm))) {
  assert(!bm->active_iterators);
@@ -344,18 +344,33 @@ static void bdrv_do_release_matching_dirty_bitmap(
  g_free(bm);
  
  if (bitmap) {

-goto out;
+return;
  }
  }
  }
+
  if (bitmap) {
  abort();
  }

Do we have any style guide rules on using abort() instead of assert()?
The rest of this function uses assert, and it'd be less lines to simply
write:

assert(!bitmap);

which I think might also carry better semantic information for coverity
beyond an actual runtime conditional branch.

(I think. Please correct me if I am wrong, I'm a little hazy on this.)


agree, but it is a preexisting code, so I'll fix it with an additional 
patch.





+}
  
-out:

+/* Called with BQL taken.  */
+static void bdrv_do_release_matching_dirty_bitmap(
+BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
+bool (*cond)(BdrvDirtyBitmap *bitmap))
+{
+bdrv_dirty_bitmaps_lock(bs);
+bdrv_do_release_matching_dirty_bitmap_locked(bs, bitmap, cond);
  bdrv_dirty_bitmaps_unlock(bs);
  }
  
+/* Called within bdrv_dirty_bitmap_lock..unlock */

+static void bdrv_release_dirty_bitmap_locked(BlockDriverState *bs,
+ BdrvDirtyBitmap *bitmap)
+{
+bdrv_do_release_matching_dirty_bitmap_locked(bs, bitmap, NULL);
+}
+
  /* Called with BQL taken.  */
  void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap)
  {


If you agree with those two changes, you may add:


ok



Reviewed-by: John Snow 



--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH v2 2/2] Add new PCI ID for i82559a

2017-11-16 Thread Jason Wang



On 2017年11月16日 14:40, Stefan Weil wrote:

Am 15.11.2017 um 14:09 schrieb Michael Nawrocki:

Hi Stefan,

I've got a VxWorks driver binary that explicitly looks for device ID
0x1030 (which is admittedly not ideal). It seems like the "82559
InBusiness 10/100" hardware uses this, though I've had trouble finding
an official source. The following documents reference that ID:

https://pci-ids.ucw.cz/read/PC/8086/1030
http://ks.pams.ncsu.edu/pub/ncsuscyld/i386/misc/src/trees/hdstg2/modules/pcitable

https://cateee.net/lkddb/web-lkddb/E100.html

And I found a similar post on a different mailing list that might shed
some light:
http://www.beowulf.org/pipermail/eepro100/2000-January/000760.html

It looks like the 8255x series of devices have a number of potential
IDs; maybe a property to set a specific PCI device ID would work?

Thanks,
Mike

Yes, that might be a very general solution which could be applied to all
PCI devices.
It could even be extended to include the vendor ID or any PCI
configuration value
as well.

Nevertheless the technically correct solution would be a full emulation
of the
EEPROM. Then we could provide an EEPROM for the 82559 InBusiness 10/100",
and the data from that EEPROM would set the right PCI device ID.

Jason, until we get a better solution, the last commit should be
reverted before
the new QEMU version is made. That commit also added a wrong help text.

Regards
Stefan


Let me post a patch soon.

Thanks



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Gonglei (Arei)


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, November 16, 2017 4:55 PM
> To: longpeng; m...@redhat.com
> Cc: Longpeng(Mike); qemu-devel@nongnu.org; Gonglei (Arei); Wangjing (King,
> Euler); Huangweidong (C); stefa...@redhat.com
> Subject: Re: [Question] why need to start all queues in vhost_net_start
> 
> 
> 
> On 2017年11月16日 13:53, Longpeng (Mike) wrote:
> > On 2017/11/15 23:54, Longpeng(Mike) wrote:
> >> 2017-11-15 23:05 GMT+08:00 Jason Wang:
> >>> On 2017年11月15日 22:55, Longpeng(Mike) wrote:
>  Hi guys,
> 
>  We got a BUG report from our testers yesterday, the testing scenario was
>  migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
>  queues*).
> 
>  We found the cause reason, and we'll report the BUG or send a fix patch
>  to upstream if necessary( we haven't test the upstream yet, sorry... ).
> >>> Could you explain this a little bit more?
> >>>
>  We want to know why the vhost_net_start() must start*total queues*
> ( in
>  our
>  VM there're 7 queues ) but not*the queues that current used*  ( in our
> VM,
>  guest
>  only uses the first 4 queues because it's limited by the number of vcpus)
>  ?
> 
>  Looking forward to your help, thx:)
> >>> Since the codes have been there for years and works well for kernel
> >>> datapath. You should really explain what's wrong.
> >>>
> >> OK.:)
> >>
> >> In our scenario,  the Windows's virtio-net driver only use the first 4
> >> queues and it
> >> *only set desc/avail/used table for the first 4 queues*, so in QEMU
> >> the desc/avail/
> >> used of the last 3 queues are ZERO,  but unfortunately...
> >> '''
> >> vhost_net_start
> >>for (i = 0; i < total_queues; i++)
> >>  vhost_net_start_one
> >>vhost_dev_start
> >>  vhost_virtqueue_start
> >> '''
> >> In vhost_virtqueue_start(), it will calculate the HVA of
> >> desc/avail/used table, so for last
> >> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then
> >> send the results
> >> to the user-mode backend ( we use*vhost-user*  ) by
> vhost_virtqueue_set_addr().
> >>
> >> When the EVS get these address, it will update a*idx*  which will be
> >> treated as  vq's
> >> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).
> >>
> >> So we get the following result after virtio-net stop:
> >>the desc/avail/used of the last 3 queues's vqs are all ZERO, but these
> vqs's
> >>last_avail_idx is NOT ZERO.
> >>
> >> At last, virtio_load() reports an error:
> >> '''
> >>if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
> >> will be TRUE
> >>error_report("VQ %d address 0x0 "
> >>   "inconsistent with Host index 0x%x",
> >>   i, vdev->vq[i].last_avail_idx);
> >>  return -1;
> >> }
> >> '''
> >>
> >> BTW, the problem won't appear if use Linux guest, because the Linux
> virtio-net
> >> driver will set all 7 queues's desc/avail/used tables. And the problem
> >> won't appear
> >> if the VM use vhost-net, because vhost-net won't update*idx*  in
> SET_ADDR ioctl.
> 
> Just to make sure I understand here, I thought Windows guest + vhost_net
> hit this issue?
> 
No, Windows guest + vhost-user/DPDK.

BTW pls see virtio spec in :

"If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be 
used SHOULD be populated
with receive buffers."

It is not mandatory that all queues must be initialized.

Thanks,
-Gonglei



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Jason Wang



On 2017年11月16日 17:01, Gonglei (Arei) wrote:



-Original Message-
From: Jason Wang [mailto:jasow...@redhat.com]
Sent: Thursday, November 16, 2017 4:55 PM
To: longpeng; m...@redhat.com
Cc: Longpeng(Mike); qemu-devel@nongnu.org; Gonglei (Arei); Wangjing (King,
Euler); Huangweidong (C); stefa...@redhat.com
Subject: Re: [Question] why need to start all queues in vhost_net_start



On 2017年11月16日 13:53, Longpeng (Mike) wrote:

On 2017/11/15 23:54, Longpeng(Mike) wrote:

2017-11-15 23:05 GMT+08:00 Jason Wang:

On 2017年11月15日 22:55, Longpeng(Mike) wrote:

Hi guys,

We got a BUG report from our testers yesterday, the testing scenario was
migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
queues*).

We found the cause reason, and we'll report the BUG or send a fix patch
to upstream if necessary( we haven't test the upstream yet, sorry... ).

Could you explain this a little bit more?


We want to know why the vhost_net_start() must start*total queues*

( in

our
VM there're 7 queues ) but not*the queues that current used*  ( in our

VM,

guest
only uses the first 4 queues because it's limited by the number of vcpus)
?

Looking forward to your help, thx:)

Since the codes have been there for years and works well for kernel
datapath. You should really explain what's wrong.


OK.:)

In our scenario,  the Windows's virtio-net driver only use the first 4
queues and it
*only set desc/avail/used table for the first 4 queues*, so in QEMU
the desc/avail/
used of the last 3 queues are ZERO,  but unfortunately...
'''
vhost_net_start
for (i = 0; i < total_queues; i++)
  vhost_net_start_one
vhost_dev_start
  vhost_virtqueue_start
'''
In vhost_virtqueue_start(), it will calculate the HVA of
desc/avail/used table, so for last
3 queues, it will use ZERO as the GPA to calculate the HVA, and then
send the results
to the user-mode backend ( we use*vhost-user*  ) by

vhost_virtqueue_set_addr().

When the EVS get these address, it will update a*idx*  which will be
treated as  vq's
last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).

So we get the following result after virtio-net stop:
the desc/avail/used of the last 3 queues's vqs are all ZERO, but these

vqs's

last_avail_idx is NOT ZERO.

At last, virtio_load() reports an error:
'''
if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
will be TRUE
error_report("VQ %d address 0x0 "
   "inconsistent with Host index 0x%x",
   i, vdev->vq[i].last_avail_idx);
  return -1;
 }
'''

BTW, the problem won't appear if use Linux guest, because the Linux

virtio-net

driver will set all 7 queues's desc/avail/used tables. And the problem
won't appear
if the VM use vhost-net, because vhost-net won't update*idx*  in

SET_ADDR ioctl.

Just to make sure I understand here, I thought Windows guest + vhost_net
hit this issue?


No, Windows guest + vhost-user/DPDK.

BTW pls see virtio spec in :

"If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be 
used SHOULD be populated
with receive buffers."

It is not mandatory that all queues must be initialized.

Thanks,
-Gonglei



Interesting, vhost_net will set last_avail_idx to vq.num during 
SET_VRING_BASE. So I thought it should hit this.


Btw, maybe we should relax the check to:

if (!vdev->vq[i].vring.desc && (vdev->vq[i].last_avail_idx != 
vdev->vq[i].vring.num)) {


Thanks



Re: [Qemu-devel] [PATCH] s390/kvm_virtio/linux-headers: remove traces of old virtio transport

2017-11-16 Thread Cornelia Huck
On Thu, 16 Nov 2017 08:45:09 +0100
Christian Borntraeger  wrote:

> On 11/15/2017 06:10 PM, Cornelia Huck wrote:
> > On Wed, 15 Nov 2017 16:42:23 +0100
> > Christian Borntraeger  wrote:
> >   
> >> We no longer support the old s390 transport, neither does the newest
> >> Linux kernel. Remove it from the linux header script as well as the
> >> s390x virtio code.  We still should handle the VIRTIO_NOTIFY hypercall,
> >> to tolerate early printk on older guest kernels without an sclp console.  
> > 
> > Are there any such guests still around? Wouldn't they be unable to run
> > because of the missing old transport anyway?  
> 
> As far as I can see even an 4.13 will do
> 
> static int __init s390_virtio_console_init(void)
> {
> if (sclp.has_vt220 || sclp.has_linemode)
> return -ENODEV;
> return virtio_cons_early_init(early_put_chars);
> }
> console_initcall(s390_virtio_console_init);
> 
> No matter if there is the old transport or not available.
> 
> So as soon as somebody chooses virtio-console you should see the diag500 from 
> the early
> printk.

Grmpf, and the first condition does not trigger when you don't define
any sclp console.

Oh well, it seems we still have to drag this along :(
 
> >   
> >> We continue to ignore these events.
> >>
> >> Signed-off-by: Christian Borntraeger 
> >> ---
> >>  hw/s390x/s390-virtio-hcall.h   |  6 ++-
> >>  include/standard-headers/asm-s390/kvm_virtio.h | 64
> >> --
> >> scripts/update-linux-headers.sh|  1 - 3 files
> >> changed, 4 insertions(+), 67 deletions(-) delete mode 100644
> >> include/standard-headers/asm-s390/kvm_virtio.h  
> >   

I think this becomes relevant only when someone does a headers update
against 4.15+, which is unlikely to happen in freeze. So I'll queue
this to s390-next only.



Re: [Qemu-devel] [PATCH v1] ps2: check PS2Queue pointers in post_load routine

2017-11-16 Thread Paolo Bonzini
On 16/11/2017 08:51, P J P wrote:
> From: Prasad J Pandit 
> 
> During Qemu guest migration, a destination process invokes ps2
> post_load function. In that, if 'rptr' and 'count' values were
> invalid, it could lead to OOB access or infinite loop issue.
> Add check to avoid it.
> 
> Reported-by: Cyrille Chatras 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/input/ps2.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)
> 
> Update: avoid changing field type, add range check on values
>   -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02815.html
> 
> diff --git a/hw/input/ps2.c b/hw/input/ps2.c
> index f388a23c8e..de171a28dd 100644
> --- a/hw/input/ps2.c
> +++ b/hw/input/ps2.c
> @@ -1225,24 +1225,21 @@ static void ps2_common_reset(PS2State *s)
>  static void ps2_common_post_load(PS2State *s)
>  {
>  PS2Queue *q = &s->queue;
> -int size;
> -int i;
> -int tmp_data[PS2_QUEUE_SIZE];
> +uint8_t i, size;
> +uint8_t tmp_data[PS2_QUEUE_SIZE];

Hi Prasad,

you don't need to change the invalid values to sane ones.  Instead, make
ps2_common_post_load return an int (just like the .post_load member of
VMStateDescription).  You can then detect out of range count/rptr/wptr
and return -1 for bad indices.

In the callers,

ps2_common_post_load(ps2);

return 0;

can then become simply

return ps2_common_post_load(ps2);

Thanks,

Paolo

>  /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */
> -size = q->count > PS2_QUEUE_SIZE ? 0 : q->count;
> +size = (q->count < 0 || q->count > PS2_QUEUE_SIZE) ? 0 : q->count;
>  
>  /* move the queue elements to the start of data array */
> -if (size > 0) {
> -for (i = 0; i < size; i++) {
> -/* move the queue elements to the temporary buffer */
> -tmp_data[i] = q->data[q->rptr];
> -if (++q->rptr == 256) {
> -q->rptr = 0;
> -}
> +for (i = 0; i < size; i++) {
> +if (q->rptr < 0 || q->rptr >= sizeof(q->data)) {
> +q->rptr = 0;
>  }
> -memcpy(q->data, tmp_data, size);
> +tmp_data[i] = q->data[q->rptr++];
>  }
> +memcpy(q->data, tmp_data, size);
> +
>  /* reset rptr/wptr/count */
>  q->rptr = 0;
>  q->wptr = size;
> 




Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Longpeng (Mike)


On 2017/11/16 16:54, Jason Wang wrote:

> 
> 
> On 2017年11月16日 13:53, Longpeng (Mike) wrote:
>> On 2017/11/15 23:54, Longpeng(Mike) wrote:
>>> 2017-11-15 23:05 GMT+08:00 Jason Wang:
 On 2017年11月15日 22:55, Longpeng(Mike) wrote:
> Hi guys,
>
> We got a BUG report from our testers yesterday, the testing scenario was
> migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
> queues*).
>
> We found the cause reason, and we'll report the BUG or send a fix patch
> to upstream if necessary( we haven't test the upstream yet, sorry... ).
 Could you explain this a little bit more?

> We want to know why the vhost_net_start() must start*total queues*  ( in
> our
> VM there're 7 queues ) but not*the queues that current used*  ( in our VM,
> guest
> only uses the first 4 queues because it's limited by the number of vcpus)
> ?
>
> Looking forward to your help, thx:)
 Since the codes have been there for years and works well for kernel
 datapath. You should really explain what's wrong.

>>> OK.:)
>>>
>>> In our scenario,  the Windows's virtio-net driver only use the first 4
>>> queues and it
>>> *only set desc/avail/used table for the first 4 queues*, so in QEMU
>>> the desc/avail/
>>> used of the last 3 queues are ZERO,  but unfortunately...
>>> '''
>>> vhost_net_start
>>>for (i = 0; i < total_queues; i++)
>>>  vhost_net_start_one
>>>vhost_dev_start
>>>  vhost_virtqueue_start
>>> '''
>>> In vhost_virtqueue_start(), it will calculate the HVA of
>>> desc/avail/used table, so for last
>>> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then
>>> send the results
>>> to the user-mode backend ( we use*vhost-user*  ) by 
>>> vhost_virtqueue_set_addr().
>>>
>>> When the EVS get these address, it will update a*idx*  which will be
>>> treated as  vq's
>>> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).
>>>
>>> So we get the following result after virtio-net stop:
>>>the desc/avail/used of the last 3 queues's vqs are all ZERO, but these 
>>> vqs's
>>>last_avail_idx is NOT ZERO.
>>>
>>> At last, virtio_load() reports an error:
>>> '''
>>>if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
>>> will be TRUE
>>>error_report("VQ %d address 0x0 "
>>>   "inconsistent with Host index 0x%x",
>>>   i, vdev->vq[i].last_avail_idx);
>>>  return -1;
>>> }
>>> '''
>>>
>>> BTW, the problem won't appear if use Linux guest, because the Linux 
>>> virtio-net
>>> driver will set all 7 queues's desc/avail/used tables. And the problem
>>> won't appear
>>> if the VM use vhost-net, because vhost-net won't update*idx*  in SET_ADDR 
>>> ioctl.
> 
> Just to make sure I understand here, I thought Windows guest + vhost_net hit
> this issue?
> 


Windows guest + vhost-user hit.
Windows guest + vhost-net is fine.

'''
In vhost_virtqueue_start(), it will calculate the HVA of
desc/avail/used tables, so for last
3 queues, it will use ZERO as the GPA to calculate the HVA, and then
send the results
to the user-mode backend ( we use *vhost-user* ) by vhost_virtqueue_set_addr().
'''
I think this is the root cause, it is strange, right ?

> Thanks
> 
>>>
>>> Sorry for my pool English, Maybe I could describe the problem in Chinese 
>>> for you
>>> in private if necessary.
>>>
>>>
 Thanks
>> -- Regards, Longpeng(Mike)
> 
> 
> .
> 


-- 
Regards,
Longpeng(Mike)




Re: [Qemu-devel] sheepdog block driver and read write error policy

2017-11-16 Thread Vasiliy Tolstov
2017-11-16 11:27 GMT+03:00 Fam Zheng :
> On Thu, 11/16 11:11, Vasiliy Tolstov wrote:
>> Hi. I'm try to write own sheepdog compatible daemon and test it with qemu.
>> Sometimes ago in qemu added read write error policy to allow to stop
>> domain or continue or something else. As i see in case of sheepdog
>> this policy is ignored and qemu try to reconnect to sheepdog daemon.
>> If nobody wants i can try to fix this, and if policy is not specified
>> work like now.
>> Where i need to start to easy understand how this works in case of file raw ?
>
> The driver callbacks (sd_co_readv/sd_co_writev) should simply return error
> instead of retrying.
>


Thanks,  how can i pass options to block driver? (as i understand
read/write policy affects concrete hardware - scsi/ide and need to be
passed down to lower level)

Also about sheepdog driver, i'm re-read discussion
https://patchwork.ozlabs.org/patch/501533/ and have a question - if
all logic about overlapping requests and oids calculation go from qemu
to sheepdog daemon,
does this be slowdown or not in case of iops? (I think that if qemu
driver only read/write/discard some block data it can be more simplify
and have less errors, and sheepdog daemon can handle internally all
overlapping and inode updates)
I'm understand that this may be completely new sheepdog qemu driver,
but i think that if sheepdog driver is more like nbd/rbd it can be
more simple...


-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru



Re: [Qemu-devel] [Qemu-block] [PATCH v2] qapi: block-core: Clarify events emitted by 'block-job-cancel'

2017-11-16 Thread Kashyap Chamarthy
On Wed, Nov 15, 2017 at 04:56:13PM -0500, John Snow wrote:
> 
> 
> On 11/15/2017 04:54 PM, Kashyap Chamarthy wrote:
> > On Wed, Nov 15, 2017 at 02:15:57PM -0500, John Snow wrote:

[...]

> >> is it covered sufficiently in live-block-operations.rst ?
> > 
> > I looked in there[2] too.  Short answer: no.  Long: In the "Live disk
> > synchronization — drive-mirror and blockdev-mirror" section, I simply
> > seemed to declare: 
> > 
> > "Issuing the command ``block-job-cancel`` after it emits the event
> > ``BLOCK_JOB_CANCELLED``"
> > 
> > As if that's the *only* event it emits, which is clearly not the case.
> > So while at it, wonder if should I also update it
> > ('live-block-operations.rst') too.
> > 
> 
> It's an interesting gotcha that I wasn't really acutely aware of myself,
> so having it in the doc format for API programmers who aren't
> necessarily digging through our source sounds like a pleasant courtesy.

Indeed, will do.  (Just for my own clarity, did you imply: don't update
it in block-core.json?  FWIW, my first instinct is to check the QAPI
documentation for such things, that's why I wrote there first :-))

Thanks for looking.

[...]

-- 
/kashyap



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Jason Wang



On 2017年11月16日 17:01, Gonglei (Arei) wrote:

No, Windows guest + vhost-user/DPDK.

BTW pls see virtio spec in :

"If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be 
used SHOULD be populated
with receive buffers."

It is not mandatory that all queues must be initialized.


I think not, since it said we should fill receive buffers for each queue 
which means we should initialize all queues. May Michael can clarify on 
this.


Thanks



Thanks,
-Gonglei








Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Longpeng (Mike)
Hi Jason,

On 2017/11/16 17:13, Jason Wang wrote:

> 
> 
> On 2017年11月16日 17:01, Gonglei (Arei) wrote:
>> No, Windows guest + vhost-user/DPDK.
>>
>> BTW pls see virtio spec in :
>>
>> "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will
>> be used SHOULD be populated
>> with receive buffers."
>>
>> It is not mandatory that all queues must be initialized.
> 
> I think not, since it said we should fill receive buffers for each queue which
> means we should initialize all queues. May Michael can clarify on this.
> 


I think this doesn't matter, but QEMU should consider this scenario...

For example, if one queues isn't initialized (Windows guest), the vring.avail=0,
so vq->desc_phys=0, then vq->desc='a avail HVA'(which is the start addr of 
pc.ram).

vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
vq->desc_phys = a = virtio_queue_get_desc_addr(vdev, idx);
vq->desc = vhost_memory_map(dev, a, &l, 0);
if (!vq->desc || l != s) {
r = -ENOMEM;
goto fail_alloc_desc;
}
.
r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
if (r < 0) {
r = -errno;
goto fail_alloc;
}

Then the HVA is send to the vhost-user.

I think this is wrong, because the '0' here means guest driver doesn't init this
queues, it should not be used to calculate the HVA for this vq.

> Thanks
> 
>>
>> Thanks,
>> -Gonglei
>>
> 
> 
> 
> 
> .
> 


-- 
Regards,
Longpeng(Mike)




Re: [Qemu-devel] [PATCH v3] xen-disk: use an IOThread per instance

2017-11-16 Thread Paul Durrant
> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 16 November 2017 01:11
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Stefano
> Stabellini ; Anthony Perard
> ; Kevin Wolf ; Max Reitz
> 
> Subject: RE: [PATCH v3] xen-disk: use an IOThread per instance
> 
> On Wed, 15 Nov 2017, Paul Durrant wrote:
> > Anthony, Stefano,
> >
> >   Ping?
> 
> Acked-by: Stefano Stabellini 
> 
> Unless Anthony or somebody else object, I'll queue it up in my "next"
> branch (which I'll send upstream after 2.11 is out).

Great. Thanks,

  Paul

> 
> Cheers,
> 
> Stefano
> 
> 
> > > -Original Message-
> > > From: Paul Durrant [mailto:paul.durr...@citrix.com]
> > > Sent: 07 November 2017 10:47
> > > To: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org
> > > Cc: Paul Durrant ; Stefano Stabellini
> > > ; Anthony Perard ;
> > > Kevin Wolf ; Max Reitz 
> > > Subject: [PATCH v3] xen-disk: use an IOThread per instance
> > >
> > > This patch allocates an IOThread object for each xen_disk instance and
> > > sets the AIO context appropriately on connect. This allows processing
> > > of I/O to proceed in parallel.
> > >
> > > The patch also adds tracepoints into xen_disk to make it possible to
> > > follow the state transtions of an instance in the log.
> > >
> > > Signed-off-by: Paul Durrant 
> > > ---
> > > Cc: Stefano Stabellini 
> > > Cc: Anthony Perard 
> > > Cc: Kevin Wolf 
> > > Cc: Max Reitz 
> > >
> > > v3:
> > >  - Use new iothread_create/destroy() functions
> > >
> > > v2:
> > >  - explicitly acquire and release AIO context in qemu_aio_complete() and
> > >blk_bh()
> > > ---
> > >  hw/block/trace-events |  7 +++
> > >  hw/block/xen_disk.c   | 53
> > > ---
> > >  2 files changed, 53 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > > index cb6767b3ee..962a3bfa24 100644
> > > --- a/hw/block/trace-events
> > > +++ b/hw/block/trace-events
> > > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *vdev, void *mrb,
> int
> > > start, int num_reqs, uint6
> > >  # hw/block/hd-geometry.c
> > >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> > > LCHS %d %d %d"
> > >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> secs,
> > > int trans) "blk %p CHS %u %u %u trans %d"
> > > +
> > > +# hw/block/xen_disk.c
> > > +xen_disk_alloc(char *name) "%s"
> > > +xen_disk_init(char *name) "%s"
> > > +xen_disk_connect(char *name) "%s"
> > > +xen_disk_disconnect(char *name) "%s"
> > > +xen_disk_free(char *name) "%s"
> > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > > index e431bd89e8..f74fcd42d1 100644
> > > --- a/hw/block/xen_disk.c
> > > +++ b/hw/block/xen_disk.c
> > > @@ -27,10 +27,12 @@
> > >  #include "hw/xen/xen_backend.h"
> > >  #include "xen_blkif.h"
> > >  #include "sysemu/blockdev.h"
> > > +#include "sysemu/iothread.h"
> > >  #include "sysemu/block-backend.h"
> > >  #include "qapi/error.h"
> > >  #include "qapi/qmp/qdict.h"
> > >  #include "qapi/qmp/qstring.h"
> > > +#include "trace.h"
> > >
> > >  /* - */
> > >
> > > @@ -125,6 +127,9 @@ struct XenBlkDev {
> > >  DriveInfo   *dinfo;
> > >  BlockBackend*blk;
> > >  QEMUBH  *bh;
> > > +
> > > +IOThread*iothread;
> > > +AioContext  *ctx;
> > >  };
> > >
> > >  /* - */
> > > @@ -596,9 +601,12 @@ static int ioreq_runio_qemu_aio(struct ioreq
> > > *ioreq);
> > >  static void qemu_aio_complete(void *opaque, int ret)
> > >  {
> > >  struct ioreq *ioreq = opaque;
> > > +struct XenBlkDev *blkdev = ioreq->blkdev;
> > > +
> > > +aio_context_acquire(blkdev->ctx);
> > >
> > >  if (ret != 0) {
> > > -xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n",
> > > +xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
> > >ioreq->req.operation == BLKIF_OP_READ ? "read" : 
> > > "write");
> > >  ioreq->aio_errors++;
> > >  }
> > > @@ -607,10 +615,10 @@ static void qemu_aio_complete(void *opaque,
> int
> > > ret)
> > >  if (ioreq->presync) {
> > >  ioreq->presync = 0;
> > >  ioreq_runio_qemu_aio(ioreq);
> > > -return;
> > > +goto done;
> > >  }
> > >  if (ioreq->aio_inflight > 0) {
> > > -return;
> > > +goto done;
> > >  }
> > >
> > >  if (xen_feature_grant_copy) {
> > > @@ -647,16 +655,19 @@ static void qemu_aio_complete(void *opaque,
> int
> > > ret)
> > >  }
> > >  case BLKIF_OP_READ:
> > >  if (ioreq->status == BLKIF_RSP_OKAY) {
> > > -block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq-
> >acct);
> > > +block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct);
> > >  } els

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-16 Thread Wouter Verhelst
On Tue, Nov 14, 2017 at 01:06:17PM -0600, Eric Blake wrote:
> On 11/14/2017 11:37 AM, Wouter Verhelst wrote:
> > On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote:
> >> Another thought - with structured replies, we finally have a way to let
> >> the client ask for the server to send resize information whenever the
> >> server wants, rather than having to be polled by a new client request
> >> all the time.  This is possible by having the server reply with a chunk
> >> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants,
> >> (that is, the server never officially ends the response to the single
> >> client request for on-going status, until the client sends an
> >> NBD_CMD_DISC).
> > 
> > Hrm, yeah, that could work.
> > 
> > Minor downside of this would be that a client would now be expected to
> > continue listening "forever" (probably needs to do a blocking read() or
> > a select() on the socket), whereas with the current situation a client
> > could get away with only reading for as long as it expects data.
> > 
> > I don't think that should be a blocker, but it might be something we
> > might want to document.
> > 
> >> I don't think the server should go into this mode without a flag bit
> >> from the client requesting it (as it potentially ties up a thread that
> >> could otherwise be used for parallel processing of other requests),
> > 
> > Yeah. I think we should probably initiate this with a BLOCK_STATUS
> > message that has a flag with which we mean "don't stop sending data on
> > the given region for contexts that support it".
> 
> Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE;

Eh, right -- I had forgotten about RESIZE, actually :-)

> I was thinking of the open-ended command for being informed of all
> server-side-initiated size changes in response to RESIZE; but your mention of
> an open-ended BLOCK_STATUS has an interesting connotation of being able to
> get live updates as sections of a file are dirtied.

For instance, or whatever other metadata we end up sending through
BLOCK_STATUS.

> I also remember from talking with Vladimir during KVM Forum last month
> that one of the shortfalls of the NBD protocol is that you can only ever
> send a length of up to 32 bits on the command side (unless we introduce
> structured commands in addition to our current work to add structured
> replies);

Yes, and I'm thinking we should do so. This will obviously require more
negotiation.

Can be done fairly easily though:
- Client negotiates structured replies (don't think it makes sense to do
  structured requests without structured replies)
- Server sets an extra transmission flag to say "I am capable of
  receiving extended requests"
- Extended requests have a different magic number, and should have a
  "request length" field as well. I'm thinking we make it:

magic  (32b)
request length (16b)
type   (16b)
flags  (64b)
handle (64b)
from   (64b)
data length(64b)
(extra data)

Request length in this proposal should always be at least 320.

I made flags 64 bits rather than 16 as per the current format, because
that way everything is aligned on a 4-byte boundary, which makes things
a bit easier on some architectures (e.g., I know that sparc doesn't like
unaligned 64-bit access). 64 bits for flags looks like a bit of a waste,
but then if we're going to waste some bits somewhere, I guess it's best
to assign them to flags.

The idea is that "request length" is the length of the data that the
client is sending, and "data length" is the length of the range that
we're trying to deal with.

A write request would thus have to have request length be (data length +
320); a read request would have request length be 320, and expect data
to be returned of data length bytes.

A metadata request could then tack on extra data, where request length
of 320 implies "all negotiated metadata contexts", but anything more
than that would imply there are some metadata IDs passed along.

etc.

Thoughts?

[...]
> > However, I could imagine that there might be some cases wherein a server
> > might be able to go into such a mode for two or more metadata contexts,
> > and where a client might want to go into that mode for one of them but
> > not all of them, while still wanting some information from them.
> > 
> > This could be covered with metadata context syntax, but it's annoying
> > and shouldn't be necessary.
> > 
> > I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS
> > can't take a metadata context ID. Okay, there's no space for it, but
> > that shouldn't have been a blocker.
> > 
> > Thoughts?
> 
> Nothing says the server has to reply the same length of information when
> replying for multiple selected metadata contexts; but if we allow
> different reply sizes all in one query, we may also need some way to
> easily tell that the server has stopped sending metadata for one context
> even though it is still providing additional replie

Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Richard W.M. Jones
Here's my solution, as a nbdkit plugin written in Perl.

As with Max's solution I don't bother to parse the virtual size out of
the XML file, so you need to specify that on the command line
otherwise the disk will be truncated to the largest extent stored in
the file.  Also the ‘.xva’ file must not be compressed.

  $ nbdkit perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296

  $ guestfish --ro --format=raw -a nbd://localhost -i
  
  Welcome to guestfish, the guest filesystem shell for
  editing virtual machine filesystems and disk images.
  
  Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell
  
  Operating system: 8.2
  /dev/sda1 mounted on /
  
  > ll /
  total 100
  drwxr-xr-x 22 root root  4096 Jan  8  2016 .
  drwxr-xr-x 19 root root  4096 Nov 16 09:50 ..
  drwxrwxr-x  2 root root  4096 Jan  8  2016 bin
  drwxr-xr-x  3 root root  4096 Jan  8  2016 boot
  drwxr-xr-x  4 root root  4096 Jan  8  2016 dev
  drwxr-xr-x 87 root root  4096 Jan  8  2016 etc
  drwxr-xr-x  3 root root  4096 Jan  8  2016 home
  lrwxrwxrwx  1 root root31 Jan  8  2016 initrd.img -> 
/boot/initrd.img-3.16.0-4-amd64
  drwxr-xr-x 14 root root  4096 Jan  8  2016 lib
  drwxr-xr-x  2 root root  4096 Jan  8  2016 lib64
  drwx--  2 root root 16384 Jan  8  2016 lost+found
  drwxr-xr-x  3 root root  4096 Jan  8  2016 media
  drwxr-xr-x  2 root root  4096 Jan  8  2016 mnt
  drwxr-xr-x  2 root root  4096 Jan  8  2016 opt
  drwxr-xr-x  2 root root  4096 May  4  2015 proc
  drwx--  2 root root  4096 Jan  8  2016 root
  drwxr-xr-x  2 root root  4096 Jan  8  2016 run
  drwxr-xr-x  2 root root  4096 Jan  8  2016 sbin
  drwxr-xr-x  2 root root  4096 Jan  8  2016 srv
  drwxr-xr-x  2 root root  4096 Apr  6  2015 sys
  drwxrwxrwt  7 root root  4096 Jan  8  2016 tmp
  drwxr-xr-x 10 root root  4096 Jan  8  2016 usr
  drwxr-xr-x 11 root root  4096 Jan  8  2016 var
  lrwxrwxrwx  1 root root27 Jan  8  2016 vmlinuz -> 
boot/vmlinuz-3.16.0-4-amd64

I even managed to boot the Debian 8 guest from the sample .xva file:

  $ qemu-system-x86_64 -cpu host -machine accel=kvm:tcg -m 2048 -drive 
file=nbd:localhost:10809,format=raw,if=virtio,snapshot=on

although it was pretty slow ...

As mentioned before you can use this to do a qemu-img convert using
captive nbdkit:

  $ nbdkit -U - \
  perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \
  --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p'

HTH,

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


xva-reader.pl
Description: Perl program


Re: [Qemu-devel] [PATCH 0/5] compressed block-stream

2017-11-16 Thread Anton Nefedov

On 16/11/2017 6:26 AM, Fam Zheng wrote:

On Tue, 11/14 13:16, Anton Nefedov wrote:

It might be useful to compress images during block-stream;
this way the user can merge compressed images of a backing chain and
the result will remain compressed.


I haven't looked at the patches yet so maybe the answer is obvious, but still:
would the "compress" option be still necessary if what we have is
blockdev-stream instead?

Fam



Sorry, I can't see blockdev-stream available now. How that would be
different? Would it imply the blocks are pulled up to the top (active)
image?

Or did you mean block-commit? The option might be useful there as well,
and needs to be implemented separately

/Anton



Re: [Qemu-devel] [PATCH 3/5] block: support compressed write for copy-on-read

2017-11-16 Thread Anton Nefedov


On 15/11/2017 9:49 PM, Max Reitz wrote:

On 2017-11-14 11:16, Anton Nefedov wrote:

Signed-off-by: Anton Nefedov 
---
  block/io.c | 30 --
  block/trace-events |  2 +-
  2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/block/io.c b/block/io.c
index 3d5ef2c..93c6b24 100644
--- a/block/io.c
+++ b/block/io.c


[...]


@@ -1209,6 +1220,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child,
  return ret;
  }
  
+/* write compressed only makes sense with copy on read */

+if ((flags & BDRV_REQ_WRITE_COMPRESSED) &&
+!(flags & BDRV_REQ_COPY_ON_READ))
+{
+return -EINVAL;
+}
+


I think the assertion in bdrv_aligned_preadv() should be enough, but
either way:

Reviewed-by: Max Reitz 



Ok, and it will fail more loudly. Will remove.


  bdrv_inc_in_flight(bs);
  
  /* Don't do copy-on-read if we read data before write operation */






Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Gandalf Corvotempesta
2017-11-16 11:01 GMT+01:00 Richard W.M. Jones :
> As mentioned before you can use this to do a qemu-img convert using
> captive nbdkit:
>
>   $ nbdkit -U - \
>   perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \
>   --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p'


What if XVA is hosting 2 or more partitions ? Which partition are you
converting with this command ?



Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Gandalf Corvotempesta
2017-11-15 23:55 GMT+01:00 Max Reitz :
> https://xanclic.moe/convert-xva.rb -- does this work?
> (It seems to works on the two example images I found...)
>
> An example is in the code, you use it like this:
>
> $ ./convert-xva.rb ~/Downloads/stats-appliance-2.36.020502.xva Ref:73


It doesn't work on huge images:

# time convert-xva.rb 20171115_193814_efff_.xva Ref:10 -O qcow2 disk1.qcow2
/usr/bin/convert-xva.rb:119:in `exec': Argument list too long -
qemu-img (Errno::E2BIG)
from /usr/bin/convert-xva.rb:119:in `'

real 9m41.414s
user 0m19.280s
sys 0m3.260s



Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Richard W.M. Jones
On Thu, Nov 16, 2017 at 11:07:10AM +0100, Gandalf Corvotempesta wrote:
> 2017-11-16 11:01 GMT+01:00 Richard W.M. Jones :
> > As mentioned before you can use this to do a qemu-img convert using
> > captive nbdkit:
> >
> >   $ nbdkit -U - \
> >   perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \
> >   --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p'
> 
> 
> What if XVA is hosting 2 or more partitions ? Which partition are you
> converting with this command ?

See "XXX" comments in the code, left as an exercise to the reader :-)

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/



Re: [Qemu-devel] [PATCH 4/5] block-stream: add compress option

2017-11-16 Thread Anton Nefedov



On 15/11/2017 10:16 PM, Max Reitz wrote:

On 2017-11-14 11:16, Anton Nefedov wrote:

Signed-off-by: Anton Nefedov 
---
  qapi/block-core.json  |  4 
  include/block/block_int.h |  4 +++-
  block/stream.c| 16 
  blockdev.c| 13 -
  hmp.c |  2 ++
  hmp-commands.hx   |  4 ++--
  6 files changed, 35 insertions(+), 8 deletions(-)


Looks good to me overall, two notes below.

Max


diff --git a/qapi/block-core.json b/qapi/block-core.json
index ab96e34..b0d022f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2007,6 +2007,9 @@
  #
  # @speed:  the maximum speed, in bytes per second
  #
+# @compress: true to compress data, if the target format supports it.
+#(default: false). Since 2.12.
+#


Too many full stops (one before, one after the parentheses).  Also, this
makes it sound a bit like it would still be possible to specify this
parameter even if the target doesn't support it (and it would just be
ignored then), but that's not true.

Also, the format most parameter documentation follows for block-stream
is more "@parameter: Description. Default. (Since version)".

So I'd suggest:

"true to compress data; may only be set if the target format supports
it.  Defaults to false if omitted.  (Since 2.12)"

But I won't argue too much over the format, so the following is OK, too:

"true to compress data; may only be set if the target format supports it
(default: false). (Since 2.12)"



Thanks, will fix.

[..]


@@ -3034,11 +3039,17 @@ void qmp_block_stream(bool has_job_id, const char 
*job_id, const char *device,
  goto out;
  }
  
+if (compress && bs->drv->bdrv_co_pwritev_compressed == NULL) {

+error_setg(errp, "Compression is not supported for this drive %s",
+   bdrv_get_device_name(bs));


I think just device instead of bdrv_get_device_name(bs) is better (or
bdrv_get_device_or_node_name(bs) at least).



device should be enough indeed, done.

/Anton



[Qemu-devel] [PATCH v2] ps2: check PS2Queue indices in post_load routine

2017-11-16 Thread P J P
From: Prasad J Pandit 

During Qemu guest migration, a destination process invokes ps2
post_load function. In that, if 'rptr' and 'count' values were
invalid, it could lead to OOB access issue. Add check to avoid
it.

Reported-by: Cyrille Chatras 
Signed-off-by: Prasad J Pandit 
---
 hw/input/ps2.c | 39 ++-
 1 file changed, 18 insertions(+), 21 deletions(-)

Update: return -1(bad indices) error, instead of correcting values
  -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02964.html

diff --git a/hw/input/ps2.c b/hw/input/ps2.c
index f388a23c8e..35392d55fc 100644
--- a/hw/input/ps2.c
+++ b/hw/input/ps2.c
@@ -1222,32 +1222,37 @@ static void ps2_common_reset(PS2State *s)
 s->update_irq(s->update_arg, 0);
 }
 
-static void ps2_common_post_load(PS2State *s)
+static int ps2_common_post_load(PS2State *s)
 {
 PS2Queue *q = &s->queue;
 int size;
 int i;
 int tmp_data[PS2_QUEUE_SIZE];
 
+if (q->count < 0 || q->rptr < 0 || q->rptr >= sizeof(q->data)) {
+return -1;
+}
+
 /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */
 size = q->count > PS2_QUEUE_SIZE ? 0 : q->count;
 
 /* move the queue elements to the start of data array */
-if (size > 0) {
-for (i = 0; i < size; i++) {
-/* move the queue elements to the temporary buffer */
-tmp_data[i] = q->data[q->rptr];
-if (++q->rptr == 256) {
-q->rptr = 0;
-}
+for (i = 0; i < size; i++) {
+/* move the queue elements to the temporary buffer */
+tmp_data[i] = q->data[q->rptr];
+if (++q->rptr == 256) {
+q->rptr = 0;
 }
-memcpy(q->data, tmp_data, size);
 }
+memcpy(q->data, tmp_data, size);
+
 /* reset rptr/wptr/count */
 q->rptr = 0;
 q->wptr = size;
 q->count = size;
 s->update_irq(s->update_arg, q->count != 0);
+
+return 0;
 }
 
 static void ps2_kbd_reset(void *opaque)
@@ -1346,9 +1351,7 @@ static int ps2_kbd_post_load(void* opaque, int version_id)
 if (version_id == 2)
 s->scancode_set=2;
 
-ps2_common_post_load(ps2);
-
-return 0;
+return ps2_common_post_load(ps2);
 }
 
 static int ps2_kbd_pre_save(void *opaque)
@@ -1356,9 +1359,7 @@ static int ps2_kbd_pre_save(void *opaque)
 PS2KbdState *s = (PS2KbdState *)opaque;
 PS2State *ps2 = &s->common;
 
-ps2_common_post_load(ps2);
-
-return 0;
+return ps2_common_post_load(ps2);
 }
 
 static const VMStateDescription vmstate_ps2_keyboard = {
@@ -1386,9 +1387,7 @@ static int ps2_mouse_post_load(void *opaque, int 
version_id)
 PS2MouseState *s = (PS2MouseState *)opaque;
 PS2State *ps2 = &s->common;
 
-ps2_common_post_load(ps2);
-
-return 0;
+return ps2_common_post_load(ps2);
 }
 
 static int ps2_mouse_pre_save(void *opaque)
@@ -1396,9 +1395,7 @@ static int ps2_mouse_pre_save(void *opaque)
 PS2MouseState *s = (PS2MouseState *)opaque;
 PS2State *ps2 = &s->common;
 
-ps2_common_post_load(ps2);
-
-return 0;
+return ps2_common_post_load(ps2);
 }
 
 static const VMStateDescription vmstate_ps2_mouse = {
-- 
2.13.6




Re: [Qemu-devel] [PATCH] colo-compare: fix the dangerous assignment

2017-11-16 Thread Darren Kenny

On Thu, Nov 16, 2017 at 10:28:32AM +0800, Mao Zhongyi wrote:

Cc: Peter Maydell 
Cc: Jason Wang 
Cc: Zhang Chen 
Cc: Li Zhijian 
Cc: Paolo Bonzini 
Fixes: 8ec14402029d783720f4312ed8a925548e1dad61
Reported-by: Peter Maydell 
Reported-by: Paolo Bonzini 
Signed-off-by: Mao Zhongyi 


Code-wise, this looks like a valid fix to the existing code.

Reviewed-by: Darren Kenny 

But testing wise, have you confirmed that things are behaving as you
expected with the previous patch, since previously when calling
colo_compare_connection(), the value of conn would have always been
its initialized value of NULL.

Just want to be sure that fixing this doesn't end up breaking your
expected behaviour given that all your testing before would have had
a NULL value in conn.

Thanks,

Darren.


---
net/colo-compare.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index ccdcba2..1ce195f 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -179,7 +179,7 @@ static int packet_enqueue(CompareState *s, int mode, 
Connection **con)
 "drop packet");
}
}
-con = &conn;
+*con = conn;

return 0;
}
--
2.9.4








Re: [Qemu-devel] [PATCH v1] ps2: check PS2Queue pointers in post_load routine

2017-11-16 Thread P J P
+-- On Thu, 16 Nov 2017, Paolo Bonzini wrote --+
| you don't need to change the invalid values to sane ones.  Instead, make
| ps2_common_post_load return an int (just like the .post_load member of
| VMStateDescription).  You can then detect out of range count/rptr/wptr
| and return -1 for bad indices.
| 
| In the callers,
| 
|   ps2_common_post_load(ps2);
| 
|   return 0;
| 
| can then become simply
| 
|   return ps2_common_post_load(ps2);

  Okay; Sent a revised patch v2.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F



Re: [Qemu-devel] [PULL 08/10] NUMA: Enable adding NUMA node implicitly

2017-11-16 Thread Thadeu Lima de Souza Cascardo
On Wed, Nov 15, 2017 at 08:18:50PM +0200, Michael S. Tsirkin wrote:
> From: Dou Liyang 
> 
> Linux and Windows need ACPI SRAT table to make memory hotplug work properly,
> however currently QEMU doesn't create SRAT table if numa options aren't 
> present
> on CLI.
> 
> Which breaks both linux and windows guests in certain conditions:
>  * Windows: won't enable memory hotplug without SRAT table at all
>  * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT 
> table
>present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers
>when memory is hotplugged and guest tries to use it with that drivers.
> 
> Fix above issues by automatically creating a numa node when QEMU is started 
> with
> memory hotplug enabled but without '-numa' options on CLI.
> (PS: auto-create numa node only for new machine types so not to break 
> migration).
> 
> Which would provide SRAT table to guests without explicit -numa options on CLI
> and would allow:
>  * Windows: to enable memory hotplug
>  * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit 
> allocated
>buffers that legacy drivers/hw can handle.
> 
> [Rewritten by Igor]

Thanks for copying me on this.

Acked-by: Thadeu Lima de Souza Cascardo 

> 
> Reported-by: Thadeu Lima de Souza Cascardo 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Dou Liyang 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: Igor Mammedov 
> Cc: David Hildenbrand 
> Cc: Thomas Huth 
> Cc: Alistair Francis 
> Cc: Takao Indoh 
> Cc: Izumi Taku 
> Reviewed-by: Igor Mammedov 
> Reviewed-by: Michael S. Tsirkin 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  include/hw/boards.h |  1 +
>  hw/i386/pc.c|  1 +
>  hw/i386/pc_piix.c   |  1 +
>  hw/i386/pc_q35.c|  1 +
>  numa.c  | 21 -
>  vl.c|  3 +--
>  6 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 62f160e..156b16f 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -197,6 +197,7 @@ struct MachineClass {
>  bool ignore_memory_transaction_failures;
>  int numa_mem_align_shift;
>  const char **valid_cpu_types;
> +bool auto_enable_numa_with_memhp;
>  void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes,
>   int nb_nodes, ram_addr_t size);
>  
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index fafe5ba..c3afe5b 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2347,6 +2347,7 @@ static void pc_machine_class_init(ObjectClass *oc, void 
> *data)
>  mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>  mc->get_default_cpu_node_id = pc_get_default_cpu_node_id;
>  mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
> +mc->auto_enable_numa_with_memhp = true;
>  mc->has_hotpluggable_cpus = true;
>  mc->default_boot_order = "cad";
>  mc->hot_add_cpu = pc_hot_add_cpu;
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index f79d5cb..5e47528 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass 
> *m)
>  m->is_default = 0;
>  m->alias = NULL;
>  SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
> +m->auto_enable_numa_with_memhp = false;
>  }
>  
>  DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL,
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index da3ea60..d606004 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m)
>  m->alias = NULL;
>  SET_MACHINE_COMPAT(m, PC_COMPAT_2_10);
>  m->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> +m->auto_enable_numa_with_memhp = false;
>  }
>  
>  DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL,
> diff --git a/numa.c b/numa.c
> index 8d78d95..7151b24 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, 
> NumaNodeOptions *node,
>  }
>  numa_info[nodenr].present = true;
>  max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> +nb_numa_nodes++;
>  }
>  
>  static void parse_numa_distance(NumaDistOptions *dist, Error **errp)
> @@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error 
> **errp)
>  if (err) {
>  goto end;
>  }
> -nb_numa_nodes++;
>  break;
>  case NUMA_OPTIONS_TYPE_DIST:
>  parse_numa_distance(&object->u.dist, &err);
> @@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms)
>  exit(1);
>  }
>  
> +/*
> + * If memory hotplug is enabled (slots > 0) but without '-numa'
> + * options explicitly on CLI, guestes will break.
> + *
> + *   Windows: won't enable memory hotplug without SRAT table at all
> + *
> + *   Linux: if QEMU is started with initial memory all bel

Re: [Qemu-devel] [PATCH v2] ps2: check PS2Queue indices in post_load routine

2017-11-16 Thread Paolo Bonzini
On 16/11/2017 11:16, P J P wrote:
> From: Prasad J Pandit 
> 
> During Qemu guest migration, a destination process invokes ps2
> post_load function. In that, if 'rptr' and 'count' values were
> invalid, it could lead to OOB access issue. Add check to avoid
> it.
> 
> Reported-by: Cyrille Chatras 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/input/ps2.c | 39 ++-
>  1 file changed, 18 insertions(+), 21 deletions(-)
> 
> Update: return -1(bad indices) error, instead of correcting values
>   -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02964.html
> 
> diff --git a/hw/input/ps2.c b/hw/input/ps2.c
> index f388a23c8e..35392d55fc 100644
> --- a/hw/input/ps2.c
> +++ b/hw/input/ps2.c
> @@ -1222,32 +1222,37 @@ static void ps2_common_reset(PS2State *s)
>  s->update_irq(s->update_arg, 0);
>  }
>  
> -static void ps2_common_post_load(PS2State *s)
> +static int ps2_common_post_load(PS2State *s)
>  {
>  PS2Queue *q = &s->queue;
>  int size;
>  int i;
>  int tmp_data[PS2_QUEUE_SIZE];
>  
> +if (q->count < 0 || q->rptr < 0 || q->rptr >= sizeof(q->data)) {
> +return -1;
> +}
> +
>  /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */
>  size = q->count > PS2_QUEUE_SIZE ? 0 : q->count;
>  
>  /* move the queue elements to the start of data array */
> -if (size > 0) {
> -for (i = 0; i < size; i++) {
> -/* move the queue elements to the temporary buffer */
> -tmp_data[i] = q->data[q->rptr];
> -if (++q->rptr == 256) {
> -q->rptr = 0;
> -}
> +for (i = 0; i < size; i++) {
> +/* move the queue elements to the temporary buffer */
> +tmp_data[i] = q->data[q->rptr];
> +if (++q->rptr == 256) {
> +q->rptr = 0;
>  }
> -memcpy(q->data, tmp_data, size);
>  }
> +memcpy(q->data, tmp_data, size);
> +
>  /* reset rptr/wptr/count */
>  q->rptr = 0;
>  q->wptr = size;
>  q->count = size;
>  s->update_irq(s->update_arg, q->count != 0);
> +
> +return 0;
>  }
>  
>  static void ps2_kbd_reset(void *opaque)
> @@ -1346,9 +1351,7 @@ static int ps2_kbd_post_load(void* opaque, int 
> version_id)
>  if (version_id == 2)
>  s->scancode_set=2;
>  
> -ps2_common_post_load(ps2);
> -
> -return 0;
> +return ps2_common_post_load(ps2);
>  }
>  
>  static int ps2_kbd_pre_save(void *opaque)
> @@ -1356,9 +1359,7 @@ static int ps2_kbd_pre_save(void *opaque)
>  PS2KbdState *s = (PS2KbdState *)opaque;
>  PS2State *ps2 = &s->common;
>  
> -ps2_common_post_load(ps2);
> -
> -return 0;
> +return ps2_common_post_load(ps2);
>  }
>  
>  static const VMStateDescription vmstate_ps2_keyboard = {
> @@ -1386,9 +1387,7 @@ static int ps2_mouse_post_load(void *opaque, int 
> version_id)
>  PS2MouseState *s = (PS2MouseState *)opaque;
>  PS2State *ps2 = &s->common;
>  
> -ps2_common_post_load(ps2);
> -
> -return 0;
> +return ps2_common_post_load(ps2);
>  }
>  
>  static int ps2_mouse_pre_save(void *opaque)
> @@ -1396,9 +1395,7 @@ static int ps2_mouse_pre_save(void *opaque)
>  PS2MouseState *s = (PS2MouseState *)opaque;
>  PS2State *ps2 = &s->common;
>  
> -ps2_common_post_load(ps2);
> -
> -return 0;
> +return ps2_common_post_load(ps2);
>  }
>  
>  static const VMStateDescription vmstate_ps2_mouse = {
> 

Reviewed-by: Paolo Bonzini 



Re: [Qemu-devel] [PATCH v8 10/14] migration: add postcopy migration of dirty bitmaps

2017-11-16 Thread Vladimir Sementsov-Ogievskiy

15.11.2017 04:58, John Snow wrote:


On 10/30/2017 12:33 PM, Vladimir Sementsov-Ogievskiy wrote:

Postcopy migration of dirty bitmaps. Only named dirty bitmaps,
associated with root nodes and non-root named nodes are migrated.

If destination qemu is already containing a dirty bitmap with the same name
as a migrated bitmap (for the same node), then, if their granularities are
the same the migration will be done, otherwise the error will be generated.

If destination qemu doesn't contain such bitmap it will be created.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/migration/misc.h   |   3 +
  migration/migration.h  |   3 +
  migration/block-dirty-bitmap.c | 734 +

Ouch :\


  migration/migration.c  |   3 +
  migration/savevm.c |   2 +
  vl.c   |   1 +
  migration/Makefile.objs|   1 +
  migration/trace-events |  14 +
  8 files changed, 761 insertions(+)
  create mode 100644 migration/block-dirty-bitmap.c


Organizationally, you introduce three new 'public' prototypes:

dirty_bitmap_mig_init
dirty_bitmap_mig_before_vm_start
init_dirty_bitmap_incoming_migration

mig_init is advertised in migration/misc.h, the other two are in
migration/migration.h.
The definitions for all three are in migration/block-dirty-bitmap.c

In pure naivety, I find it weird to have something that you use in
migration.c and advertised in migration.h actually exist separately in
block-dirty-bitmap.c; but maybe this is the sanest thing to do.


diff --git a/include/migration/misc.h b/include/migration/misc.h
index c079b7771b..9cc539e232 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -55,4 +55,7 @@ bool migration_has_failed(MigrationState *);
  bool migration_in_postcopy_after_devices(MigrationState *);
  void migration_global_dump(Monitor *mon);
  
+/* migration/block-dirty-bitmap.c */

+void dirty_bitmap_mig_init(void);
+
  #endif
diff --git a/migration/migration.h b/migration/migration.h
index 50d1f01346..4e3ad04664 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -211,4 +211,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis,
  void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* 
rbname,
ram_addr_t start, size_t len);
  
+void dirty_bitmap_mig_before_vm_start(void);

+void init_dirty_bitmap_incoming_migration(void);
+
  #endif
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
new file mode 100644
index 00..53cb20045d
--- /dev/null
+++ b/migration/block-dirty-bitmap.c
@@ -0,0 +1,734 @@
+/*
+ * Block dirty bitmap postcopy migration
+ *
+ * Copyright IBM, Corp. 2009
+ * Copyright (c) 2016-2017 Parallels International GmbH
+ *
+ * Authors:
+ *  Liran Schour   
+ *  Vladimir Sementsov-Ogievskiy 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ * This file is derived from migration/block.c, so it's author and IBM 
copyright
+ * are here, although content is quite different.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ *
+ ****
+ *
+ * Here postcopy migration of dirty bitmaps is realized. Only named dirty
+ * bitmaps, associated with root nodes and non-root named nodes are migrated.

Put another way, only QMP-addressable bitmaps. Nodes without a name that
are not the root have no way to be addressed.


+ *
+ * If destination qemu is already containing a dirty bitmap with the same name

"If the destination QEMU already contains a dirty bitmap with the same name"


+ * as a migrated bitmap (for the same node), then, if their granularities are
+ * the same the migration will be done, otherwise the error will be generated.

"an error"


+ *
+ * If destination qemu doesn't contain such bitmap it will be created.

"If the destination QEMU doesn't contain such a bitmap, it will be created."


+ *
+ * format of migration:
+ *
+ * # Header (shared for different chunk types)
+ * 1, 2 or 4 bytes: flags (see qemu_{put,put}_flags)
+ * [ 1 byte: node name size ] \  flags & DEVICE_NAME
+ * [ n bytes: node name ] /
+ * [ 1 byte: bitmap name size ] \  flags & BITMAP_NAME
+ * [ n bytes: bitmap name ] /
+ *
+ * # Start of bitmap migration (flags & START)
+ * header
+ * be64: granularity
+ * 1 byte: bitmap flags (corresponds to BdrvDirtyBitmap)
+ *   bit 0-  bitmap is enabled
+ *   bit 1-  bitmap is persistent
+ *   bit 2-  bitmap is autoloading
+ *   bits 3-7 - reserved, must be zero
+ *
+ * # Complete of bitmap migration (flags & COMPLETE)
+ * header
+ *
+ * # Data chunk of bitmap migration
+ * header
+ * be64: start sector
+ * be32: number of sectors
+ * [ be64: buffer size  ] \ ! (flags & ZEROES)
+ * [ n bytes: buffer] /
+ *
+ * The last chunk in stream should contain flags

Re: [Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl

2017-11-16 Thread Daniel P. Berrange
On Thu, Nov 16, 2017 at 09:55:46AM +0100, Thomas Huth wrote:
> The bios-tables-test was writing out files that we pass to iasl in
> with the wrong endianness in the header when running on a big endian
> host. So instead of storing mixed endian information in our structures,
> let's keep everything in little endian and byte-swap it only when we
> need a value in the code.
> 
> Reported-by: Daniel P. Berrange 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Thomas Huth 
> ---
>  tests/acpi-utils.h   | 27 +--
>  tests/bios-tables-test.c | 42 ++
>  2 files changed, 27 insertions(+), 42 deletions(-)

This fixes bios-tables-test, but has broken vmgenid-tgst

TEST: tests/vmgenid-test... (pid=8197)
  /i386/vmgenid/vmgenid/set-guid:  **
ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia: 
assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT")
FAIL
GTester: last random seed: R02S1ab59ff8ca3a4e0f7ff4b8bbddb007f1
(pid=8204)
  /i386/vmgenid/vmgenid/set-guid-auto: **
ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia: 
assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT")
FAIL
GTester: last random seed: R02S1c3bcc9f459161566b19571adc5cb5d1
(pid=8216)

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [PATCH v5] nvme: Add tracing

2017-11-16 Thread Doug Gale
I submitted it with git Nov 3 - the long lines issue with git-am
should be resolved. Please let me know if there's still a problem.

Thanks.


On Fri, Nov 3, 2017 at 11:58 AM, Philippe Mathieu-Daudé  wrote:
> Cc'ing Trivial ;)
>
> On 11/03/2017 10:37 AM, Doug Gale wrote:
>> Add trace output for commands, errors, and undefined behavior.
>> Add guest error log output for undefined behavior.
>> Report invalid undefined accesses to MMIO.
>> Annotate unlikely error checks with unlikely.
>>
>> Signed-off-by: Doug Gale 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Reviewed-by: Stefan Hajnoczi 
>> ---
>> changes v4 -> v5: add R-b tags
>>  hw/block/nvme.c   | 349 
>> ++
>>  hw/block/trace-events |  93 ++
>>  2 files changed, 390 insertions(+), 52 deletions(-)
>>
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index 441e21ed1f..4d98ed9fba 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -34,8 +34,17 @@
>>  #include "qapi/visitor.h"
>>  #include "sysemu/block-backend.h"
>>
>> +#include "qemu/log.h"
>> +#include "trace.h"
>>  #include "nvme.h"
>>
>> +#define NVME_GUEST_ERR(trace, fmt, ...) \
>> +do { \
>> +(trace_##trace)(__VA_ARGS__); \
>> +qemu_log_mask(LOG_GUEST_ERROR, #trace \
>> +" in %s: " fmt "\n", __func__, ## __VA_ARGS__); \
>> +} while (0)
>> +
>>  static void nvme_process_sq(void *opaque);
>>
>>  static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size)
>> @@ -86,10 +95,14 @@ static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq)
>>  {
>>  if (cq->irq_enabled) {
>>  if (msix_enabled(&(n->parent_obj))) {
>> +trace_nvme_irq_msix(cq->vector);
>>  msix_notify(&(n->parent_obj), cq->vector);
>>  } else {
>> +trace_nvme_irq_pin();
>>  pci_irq_pulse(&n->parent_obj);
>>  }
>> +} else {
>> +trace_nvme_irq_masked();
>>  }
>>  }
>>
>> @@ -100,7 +113,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>  trans_len = MIN(len, trans_len);
>>  int num_prps = (len >> n->page_bits) + 1;
>>
>> -if (!prp1) {
>> +if (unlikely(!prp1)) {
>> +trace_nvme_err_invalid_prp();
>>  return NVME_INVALID_FIELD | NVME_DNR;
>>  } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
>> prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
>> @@ -113,7 +127,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>  }
>>  len -= trans_len;
>>  if (len) {
>> -if (!prp2) {
>> +if (unlikely(!prp2)) {
>> +trace_nvme_err_invalid_prp2_missing();
>>  goto unmap;
>>  }
>>  if (len > n->page_size) {
>> @@ -128,7 +143,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
>>
>>  if (i == n->max_prp_ents - 1 && len > n->page_size) {
>> -if (!prp_ent || prp_ent & (n->page_size - 1)) {
>> +if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) 
>> {
>> +trace_nvme_err_invalid_prplist_ent(prp_ent);
>>  goto unmap;
>>  }
>>
>> @@ -140,7 +156,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>  prp_ent = le64_to_cpu(prp_list[i]);
>>  }
>>
>> -if (!prp_ent || prp_ent & (n->page_size - 1)) {
>> +if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) {
>> +trace_nvme_err_invalid_prplist_ent(prp_ent);
>>  goto unmap;
>>  }
>>
>> @@ -154,7 +171,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, 
>> QEMUIOVector *iov, uint64_t prp1,
>>  i++;
>>  }
>>  } else {
>> -if (prp2 & (n->page_size - 1)) {
>> +if (unlikely(prp2 & (n->page_size - 1))) {
>> +trace_nvme_err_invalid_prp2_align(prp2);
>>  goto unmap;
>>  }
>>  if (qsg->nsg) {
>> @@ -178,16 +196,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t 
>> *ptr, uint32_t len,
>>  QEMUIOVector iov;
>>  uint16_t status = NVME_SUCCESS;
>>
>> +trace_nvme_dma_read(prp1, prp2);
>> +
>>  if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
>>  return NVME_INVALID_FIELD | NVME_DNR;
>>  }
>>  if (qsg.nsg > 0) {
>> -if (dma_buf_read(ptr, len, &qsg)) {
>> +if (unlikely(dma_buf_read(ptr, len, &qsg))) {
>> +trace_nvme_err_invalid_dma();
>>  status = NVME_INVALID_FIELD | NVME_DNR;
>>  }
>>  qemu_sglist_destroy(&qsg);
>>  } else {
>> -if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) {
>> +if (unlikely(qemu_iovec_t

[Qemu-devel] [Bug 1732671] [NEW] vnc websocket compatibility issue

2017-11-16 Thread Chen Zhang
Public bug reported:

WebSocket support in VNC should allow access from VNC client through
upgraded WebSocket connection. This feature is not working in IE 11/Edge
with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc.

The reason that IE 11/Edge fails to accept the connection upgrade is
that the value equality of the `Upgrade` header field is checked in a
strict case-sensitive manner in QEMU side, however, the IE/Edge does not
send the exactly same string value `websocket` but a capital letter `W`
instead.

Defined in section 4.2.1 of rfc6455, the upgrade header field shall be
treated case-insensitively.

A patch shall be made in `io/channel-websock.c`, converting the value of
upgrade string to lowercase before comparison is made with
QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive
comparison in the process.

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: vnc

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1732671

Title:
  vnc websocket compatibility issue

Status in QEMU:
  New

Bug description:
  WebSocket support in VNC should allow access from VNC client through
  upgraded WebSocket connection. This feature is not working in IE
  11/Edge with noVNC HTML5 client, in contrast to that in
  Firefox/Safari, etc.

  The reason that IE 11/Edge fails to accept the connection upgrade is
  that the value equality of the `Upgrade` header field is checked in a
  strict case-sensitive manner in QEMU side, however, the IE/Edge does
  not send the exactly same string value `websocket` but a capital
  letter `W` instead.

  Defined in section 4.2.1 of rfc6455, the upgrade header field shall be
  treated case-insensitively.

  A patch shall be made in `io/channel-websock.c`, converting the value
  of upgrade string to lowercase before comparison is made with
  QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive
  comparison in the process.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions



[Qemu-devel] [PATCH for-2.11] throttle-groups: forget timer and schedule next TGM on detach

2017-11-16 Thread Stefan Hajnoczi
tg->any_timer_armed[] must be cleared when detaching pending timers from
the AioContext.  Failure to do so leads to hung I/O because it looks
like there are still timers pending when in fact they have been removed.

Other ThrottleGroupMembers might have requests pending too so it's
necessary to schedule the next TGM so it can set a timer.

This patch fixes hung I/O when QEMU is launched with drives that are in
the same throttling group:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 &
  (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 &
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

Signed-off-by: Stefan Hajnoczi 
---
 block/throttle-groups.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 2587f19ca3..f26bcb5eee 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -593,13 +593,25 @@ void 
throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
 
 void throttle_group_detach_aio_context(ThrottleGroupMember *tgm)
 {
+ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
 ThrottleTimers *tt = &tgm->throttle_timers;
+int i;
 
 /* Requests must have been drained */
 assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
 assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
 assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
 
+/* Kick off next ThrottleGroupMember, if necessary */
+qemu_mutex_lock(&tg->lock);
+for (i = 0; i < 2; i++) {
+if (timer_pending(tt->timers[i])) {
+tg->any_timer_armed[i] = false;
+schedule_next_request(tgm, i);
+}
+}
+qemu_mutex_unlock(&tg->lock);
+
 throttle_timers_detach_aio_context(tt);
 tgm->aio_context = NULL;
 }
-- 
2.13.6




Re: [Qemu-devel] [Qemu-block] [PATCH v4] throttle-groups: drain before detaching ThrottleState

2017-11-16 Thread Stefan Hajnoczi
On Mon, Nov 13, 2017 at 2:29 PM, Alberto Garcia  wrote:
> On Fri 10 Nov 2017 04:19:34 PM CET, Stefan Hajnoczi wrote:
>> I/O requests hang after stop/cont commands at least since QEMU 2.10.0
>> with -drive iops=100:
>>
>>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
>>   (qemu) stop
>>   (qemu) cont
>>   ...I/O is stuck...
>>
>> This happens because blk_set_aio_context() detaches the ThrottleState
>> while requests may still be in flight:
>>
>>   if (tgm->throttle_state) {
>>   throttle_group_detach_aio_context(tgm);
>>   throttle_group_attach_aio_context(tgm, new_context);
>>   }
>>
>> This patch encloses the detach/attach calls in a drained region so no
>> I/O request is left hanging.  Also add assertions so we don't make the
>> same mistake again in the future.
>
> I'm wondering about the implications of this change... is it possible
> now to bypass the I/O limits simply by stopping and quickly resuming the
> VM? And is that a problem?

bdrv_set_aio_context() already drains so this patch doesn't change
existing behavior with respect to bypassing throttling.

It's not ideal that certain VM lifecycle operations temporarily
disable throttling, but it's a trade-off since synchronous drain is
usually performance sensitive and should not take a long time.
Perhaps there are ways to improve the situation, I haven't studied it
in detail.

Stefan



[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue

2017-11-16 Thread Daniel Berrange
Which version of QEMU did you test this against ?  It should be fixed in
current GIT master AFAIK

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1732671

Title:
  vnc websocket compatibility issue

Status in QEMU:
  Invalid

Bug description:
  WebSocket support in VNC should allow access from VNC client through
  upgraded WebSocket connection. This feature is not working in IE
  11/Edge with noVNC HTML5 client, in contrast to that in
  Firefox/Safari, etc.

  The reason that IE 11/Edge fails to accept the connection upgrade is
  that the value equality of the `Upgrade` header field is checked in a
  strict case-sensitive manner in QEMU side, however, the IE/Edge does
  not send the exactly same string value `websocket` but a capital
  letter `W` instead.

  Defined in section 4.2.1 of rfc6455, the upgrade header field shall be
  treated case-insensitively.

  A patch shall be made in `io/channel-websock.c`, converting the value
  of upgrade string to lowercase before comparison is made with
  QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive
  comparison in the process.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions



Re: [Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl

2017-11-16 Thread Thomas Huth
On 16.11.2017 11:54, Daniel P. Berrange wrote:
> On Thu, Nov 16, 2017 at 09:55:46AM +0100, Thomas Huth wrote:
>> The bios-tables-test was writing out files that we pass to iasl in
>> with the wrong endianness in the header when running on a big endian
>> host. So instead of storing mixed endian information in our structures,
>> let's keep everything in little endian and byte-swap it only when we
>> need a value in the code.
>>
>> Reported-by: Daniel P. Berrange 
>> Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
>> Suggested-by: Michael S. Tsirkin 
>> Signed-off-by: Thomas Huth 
>> ---
>>  tests/acpi-utils.h   | 27 +--
>>  tests/bios-tables-test.c | 42 ++
>>  2 files changed, 27 insertions(+), 42 deletions(-)
> 
> This fixes bios-tables-test, but has broken vmgenid-tgst
> 
> TEST: tests/vmgenid-test... (pid=8197)
>   /i386/vmgenid/vmgenid/set-guid:  **
> ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia:
>  assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT")
> FAIL

Bummer. Sorry, I've should have run the whole test suite afterwards, I
only checked the bios-tables-test :-/

I'll send a v2 with a fix...

 Thomas



[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue

2017-11-16 Thread Chen Zhang
I think it should have been fixed in 33badfd.

Sorry for the noise.

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1732671

Title:
  vnc websocket compatibility issue

Status in QEMU:
  Invalid

Bug description:
  WebSocket support in VNC should allow access from VNC client through
  upgraded WebSocket connection. This feature is not working in IE
  11/Edge with noVNC HTML5 client, in contrast to that in
  Firefox/Safari, etc.

  The reason that IE 11/Edge fails to accept the connection upgrade is
  that the value equality of the `Upgrade` header field is checked in a
  strict case-sensitive manner in QEMU side, however, the IE/Edge does
  not send the exactly same string value `websocket` but a capital
  letter `W` instead.

  Defined in section 4.2.1 of rfc6455, the upgrade header field shall be
  treated case-insensitively.

  A patch shall be made in `io/channel-websock.c`, converting the value
  of upgrade string to lowercase before comparison is made with
  QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive
  comparison in the process.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions



Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Jason Wang



On 2017年11月16日 16:11, Yan Vugenfirer wrote:

Hi Jason,

Windows driver will initialise only the amount of queue based on the amount of 
available vCPUs. So if there will be more queues in the device than we have 
vCPUs on the guest, the driver will not initialise “excessive” queues. This is 
tied to the way RSS on Windows should be implemented.
Exactly as in described scenario (7 queues, but only 4 vCPUs).


I see but is there any chance that driver can use the left 3 queues (e 
.g cpu hotplug)?


Thanks



Best regards,
Yan.


On 16 Nov 2017, at 07:53, Longpeng (Mike)  wrote:

Hi Jason & Michael,

Do you have any idea about this problem ?

--
Regards,
Longpeng(Mike)

On 2017/11/15 23:54, Longpeng(Mike) wrote:


2017-11-15 23:05 GMT+08:00 Jason Wang :


On 2017年11月15日 22:55, Longpeng(Mike) wrote:

Hi guys,

We got a BUG report from our testers yesterday, the testing scenario was
migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7
queues*).

We found the cause reason, and we'll report the BUG or send a fix patch
to upstream if necessary( we haven't test the upstream yet, sorry... ).


Could you explain this a little bit more?


We want to know why the vhost_net_start() must start *total queues* ( in
our
VM there're 7 queues ) but not *the queues that current used* ( in our VM,
guest
only uses the first 4 queues because it's limited by the number of vcpus)
?

Looking forward to your help, thx :)


Since the codes have been there for years and works well for kernel
datapath. You should really explain what's wrong.


OK. :)

In our scenario,  the Windows's virtio-net driver only use the first 4
queues and it
*only set desc/avail/used table for the first 4 queues*, so in QEMU
the desc/avail/
used of the last 3 queues are ZERO,  but unfortunately...
'''
vhost_net_start
  for (i = 0; i < total_queues; i++)
vhost_net_start_one
  vhost_dev_start
vhost_virtqueue_start
'''
In vhost_virtqueue_start(), it will calculate the HVA of
desc/avail/used table, so for last
3 queues, it will use ZERO as the GPA to calculate the HVA, and then
send the results
to the user-mode backend ( we use *vhost-user* ) by vhost_virtqueue_set_addr().

When the EVS get these address, it will update a *idx* which will be
treated as  vq's
last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ).

So we get the following result after virtio-net stop:
  the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's
  last_avail_idx is NOT ZERO.

At last, virtio_load() reports an error:
'''
  if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <--
will be TRUE
  error_report("VQ %d address 0x0 "
 "inconsistent with Host index 0x%x",
 i, vdev->vq[i].last_avail_idx);
return -1;
   }
'''

BTW, the problem won't appear if use Linux guest, because the Linux virtio-net
driver will set all 7 queues's desc/avail/used tables. And the problem
won't appear
if the VM use vhost-net, because vhost-net won't update *idx* in SET_ADDR ioctl.

Sorry for my pool English, Maybe I could describe the problem in Chinese for you
in private if necessary.



Thanks




--
Regards,
Longpeng(Mike)





[Qemu-devel] [PULL 04/11] thread-posix: fix qemu_rec_mutex_trylock macro

2017-11-16 Thread Paolo Bonzini
From: "Emilio G. Cota" 

We never noticed because it has no users.

Signed-off-by: Emilio G. Cota 
Message-Id: <1510273811-13419-1-git-send-email-c...@braap.org>
Signed-off-by: Paolo Bonzini 
---
 include/qemu/thread-posix.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index f4296d31c4..f3f47e426f 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -7,7 +7,7 @@
 typedef QemuMutex QemuRecMutex;
 #define qemu_rec_mutex_destroy qemu_mutex_destroy
 #define qemu_rec_mutex_lock qemu_mutex_lock
-#define qemu_rec_mutex_try_lock qemu_mutex_try_lock
+#define qemu_rec_mutex_trylock qemu_mutex_trylock
 #define qemu_rec_mutex_unlock qemu_mutex_unlock
 
 struct QemuMutex {
-- 
2.14.3





[Qemu-devel] [PULL 03/11] Makefile: simpler/faster "make help"

2017-11-16 Thread Paolo Bonzini
From: Philippe Mathieu-Daudé 

Using obscure black magic introduced in eaa2ddbb767 :)

In an out-of-tree directory, running "../configure && make help" will generate
some required files (.mak), then clone some submodules, compile at least
the capstone submodule, generate QMP and Trace files, and finally display
the help.

On an outdated computer (Sun Blade workstation), running "make help" took
more than 5h :) With this patch it took roughly 37min.

Suggested-by: Daniel P. Berrange 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20171108032052.20029-1-f4...@amsat.org>
Reviewed-by: Laurent Vivier 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Daniel P. Berrange 
Signed-off-by: Paolo Bonzini 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index ec73acfa9a..143ac81736 100644
--- a/Makefile
+++ b/Makefile
@@ -6,7 +6,7 @@ BUILD_DIR=$(CURDIR)
 # Before including a proper config-host.mak, assume we are in the source tree
 SRC_PATH=.
 
-UNCHECKED_GOALS := %clean TAGS cscope ctags docker docker-%
+UNCHECKED_GOALS := %clean TAGS cscope ctags docker docker-% help
 
 # All following code might depend on configuration variables
 ifneq ($(wildcard config-host.mak),)
-- 
2.14.3





[Qemu-devel] [PULL 00/11] Miscellaneous patches for QEMU 2.11-rc2

2017-11-16 Thread Paolo Bonzini
The following changes since commit d24aaf2a2915424962fb101142f28fa4307f4740:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into 
staging (2017-11-06 11:24:14 +)

are available in the Git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 4950b1a766a16bd3feef4c9471b9794e6fe0e1b2:

  fix scripts/update-linux-headers.sh here document (2017-11-15 16:27:01 +0100)


Miscellaneous bugfixes


Dariusz Stojaczyk (1):
  vhost-user-scsi: add missing virtqueue_size param

Dr. David Alan Gilbert (1):
  ioapic/tracing: Remove last DPRINTFs

Emilio G. Cota (1):
  thread-posix: fix qemu_rec_mutex_trylock macro

Gerd Hoffmann (1):
  fix scripts/update-linux-headers.sh here document

Max Reitz (1):
  util/stats64: Fix min/max comparisons

Mike Nawrocki (1):
  Enable 8-byte wide MMIO for 16550 serial devices

Paolo Bonzini (1):
  exec: Do not resolve subpage in mru_section

Pavel Dovgalyuk (2):
  cpu-exec: don't overwrite exception_index
  cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay

Philippe Mathieu-Daudé (1):
  Makefile: simpler/faster "make help"

Wanpeng Li (1):
  target-i386: adds PV_TLB_FLUSH CPUID feature bit

 Makefile|  2 +-
 accel/tcg/cpu-exec.c| 99 -
 exec.c  | 12 ++---
 hw/char/serial.c|  8 +++-
 hw/intc/ioapic.c| 17 ++-
 hw/intc/trace-events|  5 ++-
 hw/scsi/vhost-user-scsi.c   |  2 +
 include/qemu/thread-posix.h |  2 +-
 scripts/update-linux-headers.sh |  2 +-
 target/i386/cpu.c   |  2 +-
 util/stats64.c  |  4 +-
 11 files changed, 81 insertions(+), 74 deletions(-)
-- 
2.14.3




[Qemu-devel] [PULL 02/11] ioapic/tracing: Remove last DPRINTFs

2017-11-16 Thread Paolo Bonzini
From: "Dr. David Alan Gilbert" 

Remove the last few DPRINTFs from hw/intc/ioapic.c and turn
them into tracing.  In one case it's a new trace, in the others
it's just adding a parameter to the existing traces.

Signed-off-by: Dr. David Alan Gilbert 
Message-Id: <20171102180310.24760-1-dgilb...@redhat.com>
Reviewed-by: Peter Xu 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Paolo Bonzini 
---
 hw/intc/ioapic.c | 17 +++--
 hw/intc/trace-events |  5 +++--
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 37c4386ae3..36139a4db6 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -35,15 +35,6 @@
 #include "hw/i386/x86-iommu.h"
 #include "trace.h"
 
-//#define DEBUG_IOAPIC
-
-#ifdef DEBUG_IOAPIC
-#define DPRINTF(fmt, ...)   \
-do { printf("ioapic: " fmt , ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...)
-#endif
-
 #define APIC_DELIVERY_MODE_SHIFT 8
 #define APIC_POLARITY_SHIFT 14
 #define APIC_TRIG_MODE_SHIFT 15
@@ -157,7 +148,7 @@ static void ioapic_set_irq(void *opaque, int vector, int 
level)
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-DPRINTF("%s: %s vec %x\n", __func__, level ? "raise" : "lower", vector);
+trace_ioapic_set_irq(vector, level);
 if (vector == 0) {
 vector = 2;
 }
@@ -290,11 +281,10 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int 
size)
 }
 }
 }
-DPRINTF("read: %08x = %08x\n", s->ioregsel, val);
 break;
 }
 
-trace_ioapic_mem_read(addr, size, val);
+trace_ioapic_mem_read(addr, s->ioregsel, size, val);
 
 return val;
 }
@@ -335,7 +325,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 int index;
 
 addr &= 0xff;
-trace_ioapic_mem_write(addr, size, val);
+trace_ioapic_mem_write(addr, s->ioregsel, size, val);
 
 switch (addr) {
 case IOAPIC_IOREGSEL:
@@ -345,7 +335,6 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 if (size != 4) {
 break;
 }
-DPRINTF("write: %08x = %08" PRIx64 "\n", s->ioregsel, val);
 switch (s->ioregsel) {
 case IOAPIC_REG_ID:
 s->id = (val >> IOAPIC_ID_SHIFT) & IOAPIC_ID_MASK;
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index b86f242b0f..b298fac7c6 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -18,8 +18,9 @@ apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 
0x%08x"
 ioapic_set_remote_irr(int n) "set remote irr for pin %d"
 ioapic_clear_remote_irr(int n, int vector) "clear remote irr for pin %d vector 
%d"
 ioapic_eoi_broadcast(int vector) "EOI broadcast for vector %d"
-ioapic_mem_read(uint8_t addr, uint8_t size, uint32_t val) "ioapic mem read 
addr 0x%"PRIx8" size 0x%"PRIx8" retval 0x%"PRIx32
-ioapic_mem_write(uint8_t addr, uint8_t size, uint32_t val) "ioapic mem write 
addr 0x%"PRIx8" size 0x%"PRIx8" val 0x%"PRIx32
+ioapic_mem_read(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) 
"ioapic mem read addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" retval 
0x%"PRIx32
+ioapic_mem_write(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) 
"ioapic mem write addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" val 
0x%"PRIx32
+ioapic_set_irq(int vector, int level) "vector: %d level: %d"
 
 # hw/intc/slavio_intctl.c
 slavio_intctl_mem_readl(uint32_t cpu, uint64_t addr, uint32_t ret) "read cpu 
%d reg 0x%"PRIx64" = 0x%x"
-- 
2.14.3





[Qemu-devel] [PULL 09/11] util/stats64: Fix min/max comparisons

2017-11-16 Thread Paolo Bonzini
From: Max Reitz 

stat64_min_slow() and stat64_max_slow() compare the wrong way.  This
makes iotest 136 fail with clang and -m32.

Signed-off-by: Max Reitz 
Message-Id: <20171114232223.25207-1-mre...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 util/stats64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/stats64.c b/util/stats64.c
index 9968fcceac..389c365a9e 100644
--- a/util/stats64.c
+++ b/util/stats64.c
@@ -91,7 +91,7 @@ bool stat64_min_slow(Stat64 *s, uint64_t value)
 low = atomic_read(&s->low);
 
 orig = ((uint64_t)high << 32) | low;
-if (orig < value) {
+if (value < orig) {
 /* We have to set low before high, just like stat64_min reads
  * high before low.  The value may become higher temporarily, but
  * stat64_get does not notice (it takes the lock) and the only ill
@@ -120,7 +120,7 @@ bool stat64_max_slow(Stat64 *s, uint64_t value)
 low = atomic_read(&s->low);
 
 orig = ((uint64_t)high << 32) | low;
-if (orig > value) {
+if (value > orig) {
 /* We have to set low before high, just like stat64_max reads
  * high before low.  The value may become lower temporarily, but
  * stat64_get does not notice (it takes the lock) and the only ill
-- 
2.14.3





[Qemu-devel] [PULL 06/11] vhost-user-scsi: add missing virtqueue_size param

2017-11-16 Thread Paolo Bonzini
From: Dariusz Stojaczyk 

Commit 5c0919d0 [1] introduced virtqueue_size parameter
for common virtio-scsi path, without updaing the vhost-user-scsi
code. vhost-user-scsi devices right now report size 0 for each vq.

This patch introduces virtqueue_size param to vhost-user-scsi,
that can now be set by the user. However, the most importantly, it
now has a default value of 128 (same as QEMU's virtio-scsi).

[1] 5c0919d0 ("virtio-scsi: Add virtqueue_size parameter
allowing virtqueue size to be set.")

Change-Id: I70e87eab702ebf1196c028dbf17d54fdc0c89a14
Signed-off-by: Dariusz Stojaczyk 
Message-Id: <1510676916-76409-1-git-send-email-dariuszx.stojac...@intel.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/vhost-user-scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 500fa6a067..f7561e23fa 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -135,6 +135,8 @@ static Property vhost_user_scsi_properties[] = {
 DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev),
 DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0),
 DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1),
+DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size,
+   128),
 DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors,
0x),
 DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSICommon, conf.cmd_per_lun, 128),
-- 
2.14.3





[Qemu-devel] [PULL 01/11] Enable 8-byte wide MMIO for 16550 serial devices

2017-11-16 Thread Paolo Bonzini
From: Mike Nawrocki 

Some drivers for the PPMC7400 PowerPC evaluation board accesses the
serial registers through the floating point unit (stfd/ldfd), which is
an 8-byte wide access. This patch enables that behavior.

Signed-off-by: Mike Nawrocki 
Message-Id: <20171106161039.32596-1-michael.nawro...@gtri.gatech.edu>
Signed-off-by: Paolo Bonzini 
---
 hw/char/serial.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/char/serial.c b/hw/char/serial.c
index 376bd2f240..eb72191ee7 100644
--- a/hw/char/serial.c
+++ b/hw/char/serial.c
@@ -1005,7 +1005,7 @@ static void serial_mm_write(void *opaque, hwaddr addr,
 uint64_t value, unsigned size)
 {
 SerialState *s = opaque;
-value &= ~0u >> (32 - (size * 8));
+value &= 255;
 serial_ioport_write(s, addr >> s->it_shift, value, 1);
 }
 
@@ -1014,16 +1014,22 @@ static const MemoryRegionOps serial_mm_ops[3] = {
 .read = serial_mm_read,
 .write = serial_mm_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
+.valid.max_access_size = 8,
+.impl.max_access_size = 8,
 },
 [DEVICE_LITTLE_ENDIAN] = {
 .read = serial_mm_read,
 .write = serial_mm_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.valid.max_access_size = 8,
+.impl.max_access_size = 8,
 },
 [DEVICE_BIG_ENDIAN] = {
 .read = serial_mm_read,
 .write = serial_mm_write,
 .endianness = DEVICE_BIG_ENDIAN,
+.valid.max_access_size = 8,
+.impl.max_access_size = 8,
 },
 };
 
-- 
2.14.3





[Qemu-devel] [PULL 05/11] target-i386: adds PV_TLB_FLUSH CPUID feature bit

2017-11-16 Thread Paolo Bonzini
From: Wanpeng Li 

Adds PV_TLB_FLUSH CPUID feature bit.

Cc: Paolo Bonzini 
Cc: Radim KrÄmář 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Signed-off-by: Wanpeng Li 
Message-Id: <1510299947-11287-1-git-send-email-wanpeng...@hotmail.com>
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6f21a5e518..ecebc5a70a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -347,7 +347,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .feat_names = {
 "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
 "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
-NULL, NULL, NULL, NULL,
+NULL, "kvm-pv-tlb-flush", NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-- 
2.14.3





Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start

2017-11-16 Thread Jason Wang



On 2017年11月16日 17:32, Longpeng (Mike) wrote:

Hi Jason,

On 2017/11/16 17:13, Jason Wang wrote:



On 2017年11月16日 17:01, Gonglei (Arei) wrote:

No, Windows guest + vhost-user/DPDK.

BTW pls see virtio spec in :

"If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will
be used SHOULD be populated
with receive buffers."

It is not mandatory that all queues must be initialized.

I think not, since it said we should fill receive buffers for each queue which
means we should initialize all queues. May Michael can clarify on this.



I think this doesn't matter, but QEMU should consider this scenario...

For example, if one queues isn't initialized (Windows guest), the vring.avail=0,
so vq->desc_phys=0, then vq->desc='a avail HVA'(which is the start addr of 
pc.ram).

 vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx);
 vq->desc_phys = a = virtio_queue_get_desc_addr(vdev, idx);
 vq->desc = vhost_memory_map(dev, a, &l, 0);
 if (!vq->desc || l != s) {
 r = -ENOMEM;
 goto fail_alloc_desc;
 }
 .
 r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled);
 if (r < 0) {
 r = -errno;
 goto fail_alloc;
 }

Then the HVA is send to the vhost-user.

I think this is wrong, because the '0' here means guest driver doesn't init this
queues, it should not be used to calculate the HVA for this vq.


Yes, workaround is not hard if windows driver won't use the left 3 
queues any more. But we should have a complete solution. The main 
problem is when vhost need to be started. For legacy device, there's no 
easy way to detect whether or not a specific virtqueue is ready to be 
used. For modern device, we can probably do this through queue_enable 
(but this is not implemented in current code).


Thanks




Thanks


Thanks,
-Gonglei





.








[Qemu-devel] [PULL 07/11] cpu-exec: don't overwrite exception_index

2017-11-16 Thread Paolo Bonzini
From: Pavel Dovgalyuk 

This patch adds a condition before overwriting exception_index fiels.
It is needed when exception_index is already set to some meaningful value.

Signed-off-by: Pavel Dovgalyuk 

Message-Id: <20171114081812.27640.26372.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini 
---
 accel/tcg/cpu-exec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 61297f8f4a..0473055a08 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -594,7 +594,9 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 if (unlikely(atomic_read(&cpu->exit_request)
 || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0))) 
{
 atomic_set(&cpu->exit_request, 0);
-cpu->exception_index = EXCP_INTERRUPT;
+if (cpu->exception_index == -1) {
+cpu->exception_index = EXCP_INTERRUPT;
+}
 return true;
 }
 
-- 
2.14.3





[Qemu-devel] [PULL 08/11] cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay

2017-11-16 Thread Paolo Bonzini
From: Pavel Dovgalyuk 

This patch ensures that icount_decr.u32.high is clear before calling
cpu_exec_nocache when exception is pending.  Because the exception is
caused by the first instruction in the block and it cannot be executed
without resetting the flag.

There are two parts in the fix.  First, clear icount_decr.u32.high in
cpu_handle_interrupt (just before processing the "dependent" request,
stored in cpu->interrupt_request or cpu->exit_request) rather than
cpu_loop_exec_tb; this ensures that cpu_handle_exception is always
reached with zero icount_decr.u32.high unless another interrupt has
happened in the meanwhile.

Second, try to cause the exception at the beginning of
cpu_handle_exception, and exit immediately if the TB cannot
execute.  With this change, interrupts are processed and
cpu_exec_nocache can make process.

Signed-off-by: Maria Klimushenkova 
Signed-off-by: Pavel Dovgalyuk 
Message-Id: <20171114081818.27640.33165.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini 
---
 accel/tcg/cpu-exec.c | 95 +---
 1 file changed, 54 insertions(+), 41 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 0473055a08..f3de96f346 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -470,48 +470,51 @@ static inline void cpu_handle_debug_exception(CPUState 
*cpu)
 
 static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 {
-if (cpu->exception_index >= 0) {
-if (cpu->exception_index >= EXCP_INTERRUPT) {
-/* exit request from the cpu execution loop */
-*ret = cpu->exception_index;
-if (*ret == EXCP_DEBUG) {
-cpu_handle_debug_exception(cpu);
-}
-cpu->exception_index = -1;
-return true;
-} else {
+if (cpu->exception_index < 0) {
+#ifndef CONFIG_USER_ONLY
+if (replay_has_exception()
+   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* try to cause an exception pending in the log */
+cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), 
true);
+}
+#endif
+if (cpu->exception_index < 0) {
+return false;
+}
+}
+
+if (cpu->exception_index >= EXCP_INTERRUPT) {
+/* exit request from the cpu execution loop */
+*ret = cpu->exception_index;
+if (*ret == EXCP_DEBUG) {
+cpu_handle_debug_exception(cpu);
+}
+cpu->exception_index = -1;
+return true;
+} else {
 #if defined(CONFIG_USER_ONLY)
-/* if user mode only, we simulate a fake exception
-   which will be handled outside the cpu execution
-   loop */
+/* if user mode only, we simulate a fake exception
+   which will be handled outside the cpu execution
+   loop */
 #if defined(TARGET_I386)
+CPUClass *cc = CPU_GET_CLASS(cpu);
+cc->do_interrupt(cpu);
+#endif
+*ret = cpu->exception_index;
+cpu->exception_index = -1;
+return true;
+#else
+if (replay_exception()) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+qemu_mutex_lock_iothread();
 cc->do_interrupt(cpu);
-#endif
-*ret = cpu->exception_index;
+qemu_mutex_unlock_iothread();
 cpu->exception_index = -1;
+} else if (!replay_has_interrupt()) {
+/* give a chance to iothread in replay mode */
+*ret = EXCP_INTERRUPT;
 return true;
-#else
-if (replay_exception()) {
-CPUClass *cc = CPU_GET_CLASS(cpu);
-qemu_mutex_lock_iothread();
-cc->do_interrupt(cpu);
-qemu_mutex_unlock_iothread();
-cpu->exception_index = -1;
-} else if (!replay_has_interrupt()) {
-/* give a chance to iothread in replay mode */
-*ret = EXCP_INTERRUPT;
-return true;
-}
-#endif
 }
-#ifndef CONFIG_USER_ONLY
-} else if (replay_has_exception()
-   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
-/* try to cause an exception pending in the log */
-cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
-*ret = -1;
-return true;
 #endif
 }
 
@@ -522,6 +525,19 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 TranslationBlock **last_tb)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+int32_t insns_left;
+
+/* Clear the interrupt flag now since we're processing
+ * cpu->interrupt_request and cpu->exit_request.
+ */
+insns_left = atomic_read(&cpu->icount_decr.u32);
+atomic_set(&cpu->icount_decr.u16.high, 0);
+if (unlikely(insns_left < 0)) {
+/* Ensure the zeroing of icount_decr comes before the next read
+ * of cpu->exit_request or cpu->interrupt_request.
+   

Re: [Qemu-devel] [PATCH] block: Fix error path in bdrv_backing_update_filename()

2017-11-16 Thread Kevin Wolf
Am 15.11.2017 um 20:41 hat Peter Maydell geschrieben:
> On 6 November 2017 at 16:55, Kevin Wolf  wrote:
> > error_setg_errno() takes a positive errno code.
> >
> > Signed-off-by: Kevin Wolf 
> > ---
> >  block.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block.c b/block.c
> > index 684cb018da..f6415547fe 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -998,7 +998,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
> > BlockDriverState *base,
> >  ret = bdrv_change_backing_file(parent, filename,
> > base->drv ? base->drv->format_name : 
> > "");
> >  if (ret < 0) {
> > -error_setg_errno(errp, ret, "Could not update backing file link");
> > +error_setg_errno(errp, -ret, "Could not update backing file link");
> >  }
> >
> >  if (!(orig_flags & BDRV_O_RDWR)) {
> > --
> > 2.13.6
> 
> Just noticed going through coverity warnings that this didn't
> make it into rc1 -- could we get it in for rc2, please?

Yes. I was sick at the start of the week and Max didn't include my queue
in his pull request, so all the patches I had queued missed -rc1 and
will be sent for -rc2 instead.

Kevin



[Qemu-devel] [PULL 10/11] exec: Do not resolve subpage in mru_section

2017-11-16 Thread Paolo Bonzini
This fixes a crash caused by picking the wrong memory region in
address_space_lookup_region seen with client code accessing a device
model that uses alias memory regions.  The expensive part of
address_space_lookup_region anyway is phys_page_find; performance-wise
it is okay to repeat the subsequent subpage lookup.

Signed-off-by: BALATON Zoltan 
Message-Id: <20171114225941.07270745...@zero.eik.bme.hu>
Signed-off-by: Paolo Bonzini 
---
 exec.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/exec.c b/exec.c
index 97a24a875e..3bb9fcf257 100644
--- a/exec.c
+++ b/exec.c
@@ -410,22 +410,16 @@ static MemoryRegionSection 
*address_space_lookup_region(AddressSpaceDispatch *d,
 {
 MemoryRegionSection *section = atomic_read(&d->mru_section);
 subpage_t *subpage;
-bool update;
 
-if (section && section != &d->map.sections[PHYS_SECTION_UNASSIGNED] &&
-section_covers_addr(section, addr)) {
-update = false;
-} else {
+if (!section || section == &d->map.sections[PHYS_SECTION_UNASSIGNED] ||
+!section_covers_addr(section, addr)) {
 section = phys_page_find(d, addr);
-update = true;
+atomic_set(&d->mru_section, section);
 }
 if (resolve_subpage && section->mr->subpage) {
 subpage = container_of(section->mr, subpage_t, iomem);
 section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]];
 }
-if (update) {
-atomic_set(&d->mru_section, section);
-}
 return section;
 }
 
-- 
2.14.3





[Qemu-devel] [PULL 11/11] fix scripts/update-linux-headers.sh here document

2017-11-16 Thread Paolo Bonzini
From: Gerd Hoffmann 

The minus sign after << causes the shell to strip only
preceding tabs, not spaces.

Signed-off-by: Gerd Hoffmann 
Message-Id: <20171110090354.29608-1-kra...@redhat.com>
Fixes: 40bf8e9aede0f9105a9e1e4aaf17b20aaa55f9a0
Reviewed-by: Roman Kagan 
Reviewed-by: Stefan Hajnoczi 
Tested-by: Christian Borntraeger 
Signed-off-by: Paolo Bonzini 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index ad80fe3fca..76fd894a77 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -106,7 +106,7 @@ for arch in $ARCHLIST; do
 if [ $arch = x86 ]; then
 cat <<-EOF >"$output/include/standard-headers/asm-x86/hyperv.h"
 /* this is a temporary placeholder until kvm_para.h stops including it 
*/
-EOF
+EOF
 cp "$tmpdir/include/asm/unistd_32.h" "$output/linux-headers/asm-x86/"
 cp "$tmpdir/include/asm/unistd_x32.h" "$output/linux-headers/asm-x86/"
 cp "$tmpdir/include/asm/unistd_64.h" "$output/linux-headers/asm-x86/"
-- 
2.14.3




[Qemu-devel] [PATCH v2] tests/bios-tables-test: Fix endianess problems when passing data to iasl

2017-11-16 Thread Thomas Huth
The bios-tables-test was writing out files that we pass to iasl in
with the wrong endianness in the header when running on a big endian
host. So instead of storing mixed endian information in our structures,
let's keep everything in little endian and byte-swap it only when we
need a value in the code.

Reported-by: Daniel P. Berrange 
Buglink: https://bugs.launchpad.net/qemu/+bug/1724570
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Thomas Huth 
---
 v2: Fixed vmgenid-test which was accidentially broken in v1

 tests/acpi-utils.h   | 27 +--
 tests/bios-tables-test.c | 42 ++
 tests/vmgenid-test.c | 22 --
 3 files changed, 39 insertions(+), 52 deletions(-)

diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h
index f8d8723..d5ca5b6 100644
--- a/tests/acpi-utils.h
+++ b/tests/acpi-utils.h
@@ -28,24 +28,9 @@ typedef struct {
 bool tmp_files_retain;   /* do not delete the temp asl/aml */
 } AcpiSdtTable;
 
-#define ACPI_READ_FIELD(field, addr)   \
-do {   \
-switch (sizeof(field)) {   \
-case 1:\
-field = readb(addr);   \
-break; \
-case 2:\
-field = readw(addr);   \
-break; \
-case 4:\
-field = readl(addr);   \
-break; \
-case 8:\
-field = readq(addr);   \
-break; \
-default:   \
-g_assert(false);   \
-}  \
+#define ACPI_READ_FIELD(field, addr)\
+do {\
+memread(addr, &field, sizeof(field));   \
 addr += sizeof(field);  \
 } while (0);
 
@@ -74,16 +59,14 @@ typedef struct {
 } while (0);
 
 #define ACPI_ASSERT_CMP(actual, expected) do { \
-uint32_t ACPI_ASSERT_CMP_le = cpu_to_le32(actual); \
 char ACPI_ASSERT_CMP_str[5] = {}; \
-memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 4); \
+memcpy(ACPI_ASSERT_CMP_str, &actual, 4); \
 g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
 } while (0)
 
 #define ACPI_ASSERT_CMP64(actual, expected) do { \
-uint64_t ACPI_ASSERT_CMP_le = cpu_to_le64(actual); \
 char ACPI_ASSERT_CMP_str[9] = {}; \
-memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 8); \
+memcpy(ACPI_ASSERT_CMP_str, &actual, 8); \
 g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
 } while (0)
 
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 564da45..e28e0c9 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -96,17 +96,20 @@ static void test_acpi_rsdp_table(test_data *data)
 static void test_acpi_rsdt_table(test_data *data)
 {
 AcpiRsdtDescriptorRev1 *rsdt_table = &data->rsdt_table;
-uint32_t addr = data->rsdp_table.rsdt_physical_address;
+uint32_t addr = le32_to_cpu(data->rsdp_table.rsdt_physical_address);
 uint32_t *tables;
 int tables_nr;
 uint8_t checksum;
+uint32_t rsdt_table_length;
 
 /* read the header */
 ACPI_READ_TABLE_HEADER(rsdt_table, addr);
 ACPI_ASSERT_CMP(rsdt_table->signature, "RSDT");
 
+rsdt_table_length = le32_to_cpu(rsdt_table->length);
+
 /* compute the table entries in rsdt */
-tables_nr = (rsdt_table->length - sizeof(AcpiRsdtDescriptorRev1)) /
+tables_nr = (rsdt_table_length - sizeof(AcpiRsdtDescriptorRev1)) /
 sizeof(uint32_t);
 g_assert(tables_nr > 0);
 
@@ -114,7 +117,7 @@ static void test_acpi_rsdt_table(test_data *data)
 tables = g_new0(uint32_t, tables_nr);
 ACPI_READ_ARRAY_PTR(tables, tables_nr, addr);
 
-checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table->length) +
+checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table_length) +
acpi_calc_checksum((uint8_t *)tables,
   tables_nr * sizeof(uint32_t));
 g_assert(!checksum);
@@ -130,7 +133,7 @@ static void test_acpi_fadt_table(test_data *data)
 uint32_t addr;
 
 /* FADT table comes first */
-addr = data->rsdt_tables_addr[0];
+addr = le32_to_cpu(data->rsdt_tables_addr[0]);
 ACPI_READ_TABLE_HEADER(fadt_table, addr);
 
 ACPI_READ_FIELD(fadt_table->firmware_ctrl, addr);
@@ -187,13 +190,14 @@ static void test_acpi_fadt_table(test_data *data)
 ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe1_block, addr);
 
 ACPI_ASSERT_CMP(fadt_table->signature, "FACP");
-g_assert(!acpi_calc_checksum((uint8_t *)fadt_table, fadt_table->length));
+g_assert(!acpi_calc_checksum((uint8_t *)fadt_table

[Qemu-devel] [PATCH] Revert "Add new PCI ID for i82559a"

2017-11-16 Thread Jason Wang
This reverts commit 5e89dc01133f8f5e621f6b66b356c6f37d31dafb since:

- we should use ID in the spec instead the one used by OEM
- in the future, we should allow changing id through either property
  or EEPROM file.

Cc: Stefan Weil 
Cc: Michael Nawrocki 
Cc: Peter Maydell 
Cc: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hw/net/eepro100.c| 13 -
 include/hw/compat.h  |  4 
 include/hw/pci/pci.h |  1 -
 qemu-options.hx  |  2 +-
 4 files changed, 1 insertion(+), 19 deletions(-)

diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index a63ed2c..91dd058 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -132,7 +132,6 @@ typedef struct {
 const char *name;
 const char *desc;
 uint16_t device_id;
-uint16_t alt_device_id;
 uint8_t revision;
 uint16_t subsystem_vendor_id;
 uint16_t subsystem_id;
@@ -277,7 +276,6 @@ typedef struct {
 /* Quasi static device properties (no need to save them). */
 uint16_t stats_size;
 bool has_extended_tcb_support;
-bool use_alt_device_id;
 } EEPRO100State;
 
 /* Word indices in EEPROM. */
@@ -1857,14 +1855,6 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error 
**errp)
 
 TRACE(OTHER, logout("\n"));
 
-/* By default, the i82559a adapter uses the legacy PCI ID (for the
- * i82557). This allows the PCI ID to be changed to the alternate
- * i82559 ID if needed.
- */
-if (s->use_alt_device_id && strcmp(info->name, "i82559a") == 0) {
-pci_config_set_device_id(s->dev.config, info->alt_device_id);
-}
-
 s->device = info->device;
 
 e100_pci_reset(s, &local_err);
@@ -1984,7 +1974,6 @@ static E100PCIDeviceInfo e100_devices[] = {
 .desc = "Intel i82559A Ethernet",
 .device = i82559A,
 .device_id = PCI_DEVICE_ID_INTEL_82557,
-.alt_device_id = PCI_DEVICE_ID_INTEL_82559,
 .revision = 0x06,
 .stats_size = 80,
 .has_extended_tcb_support = true,
@@ -2078,8 +2067,6 @@ static E100PCIDeviceInfo 
*eepro100_get_class(EEPRO100State *s)
 
 static Property e100_properties[] = {
 DEFINE_NIC_PROPERTIES(EEPRO100State, conf),
-DEFINE_PROP_BOOL("x-use-alt-device-id", EEPRO100State, use_alt_device_id,
- true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/compat.h b/include/hw/compat.h
index f96212c..cf389b4 100644
--- a/include/hw/compat.h
+++ b/include/hw/compat.h
@@ -10,10 +10,6 @@
 .driver   = "virtio-tablet-device",\
 .property = "wheel-axis",\
 .value= "false",\
-},{\
-.driver   = "i82559a",\
-.property = "x-use-alt-device-id",\
-.value= "false",\
 },
 
 #define HW_COMPAT_2_9 \
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f30e2cf..8d02a0a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -70,7 +70,6 @@ extern bool pci_available;
 /* Intel (0x8086) */
 #define PCI_DEVICE_ID_INTEL_82551IT  0x1209
 #define PCI_DEVICE_ID_INTEL_825570x1229
-#define PCI_DEVICE_ID_INTEL_825590x1030
 #define PCI_DEVICE_ID_INTEL_82801IR  0x2922
 
 /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */
diff --git a/qemu-options.hx b/qemu-options.hx
index a39c7e4..3728e9b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2047,7 +2047,7 @@ that the card should have; this option currently only 
affects virtio cards; set
 @var{v} = 0 to disable MSI-X. If no @option{-net} option is specified, a single
 NIC is created.  QEMU can emulate several different models of network card.
 Valid values for @var{type} are
-@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559a}, @code{i82559er},
+@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559er},
 @code{ne2k_pci}, @code{ne2k_isa}, @code{pcnet}, @code{rtl8139},
 @code{e1000}, @code{smc91c111}, @code{lance} and @code{mcf_fec}.
 Not all devices are supported on all targets.  Use @code{-net nic,model=help}
-- 
2.7.4




[Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on

2017-11-16 Thread Greg Kurz
Hi,

I'm resurrecting a thread about a QEMU crash we're still hitting on ppc64. It
was reported to the list by Bharata 2 months ago:

https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03685.html

"Hi,

QEMU hits the below assert

qemu-system-ppc64: used ring relocated for ring 2
qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' 
failed.

in the following scenario:

1. Boot guest with vhost=on
  -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on 
-device virtio-net-pci,netdev=mynet0
2. Hot add a DIMM device 
3. Reboot
   When the guest reboots, we can see
   vhost_virtqueue_start:vq->used_phys getting assigned an address that
   falls in the hotplugged memory range.
4. Remove the DIMM device
   Guest refuses the removal as the hotplugged memory is under use.
5. Reboot
   QEMU forces the removal of the DIMM device during reset and that's
   when we hit the above assert.

Any pointers on why we are hitting this assert ? Shouldn't vhost be
done with using the hotplugged memory when we hit reset ?

Regards,
Bharata."

#0  0x7760eff0 in raise () from /lib64/libc.so.6
#1  0x7761136c in abort () from /lib64/libc.so.6
#2  0x77604c44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x77604d34 in __assert_fail () from /lib64/libc.so.6
#4  0x10161138 in vhost_commit (listener=0x11469e88) at 
/home/greg/Work/qemu/qemu-spapr/hw/virtio/vhost.c:650
#5  0x100917fc in memory_region_transaction_commit () at 
/home/greg/Work/qemu/qemu-spapr/memory.c:1094
#6  0x10096748 in memory_region_del_subregion (mr=0x1143eed0, 
subregion=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/memory.c:2337
#7  0x104a9aec in pc_dimm_memory_unplug (dev=0x11445c50, 
hpms=0x1143eec0, mr=0x116f1920) at 
/home/greg/Work/qemu/qemu-spapr/hw/mem/pc-dimm.c:126
#8  0x10180454 in spapr_lmb_release (dev=0x11445c50) at 
/home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:3151
#9  0x101a397c in spapr_drc_release (drc=0x116b9cc0) at 
/home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:401
#10 0x101a3ba0 in spapr_drc_reset (drc=0x116b9cc0) at 
/home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:439
#11 0x101a3c88 in drc_reset (opaque=0x116b9cc0) at 
/home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:460
#12 0x10447380 in qemu_devices_reset () at 
/home/greg/Work/qemu/qemu-spapr/hw/core/reset.c:69
#13 0x1017ae80 in ppc_spapr_reset () at 
/home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:1445
#14 0x10377c60 in qemu_system_reset (reason=SHUTDOWN_CAUSE_HOST_QMP) at 
/home/greg/Work/qemu/qemu-spapr/vl.c:1788
#15 0x103785ac in main_loop_should_exit () at 
/home/greg/Work/qemu/qemu-spapr/vl.c:1962
#16 0x10378708 in main_loop () at 
/home/greg/Work/qemu/qemu-spapr/vl.c:1999
#17 0x10382c54 in main (argc=21, argv=0x7098, 
envp=0x7148) at /home/greg/Work/qemu/qemu-spapr/vl.c:4897


This basically happens because on pseries, like x86, we usually wait
for the guest to eject the DIMM before actually removing it, BUT,
unlike x86, we force the removal on reset. This is handled by a DRC
object which registers a handler with qemu_register_reset().

At reset time, the machine calls qemu_devices_reset() but unfortunately,
the DRC reset handler gets called BEFORE the VirtIONet device one. The
vhost device is still active and it doesn't like the ring addresses to
change while in this state.

Michael,

The assert() has been around since the beginning, at a time I believe there was
no such thing as memory hot-unplug. Now that memory can go away at reset time,
is it really legitimate to crash QEMU if vhost detects a ring address change ?

David,

Depending on Michael's answer (or anyone else that knows vhost well enough),
I'm not sure we can fix that properly for 2.11. A possible fix/workaround
could be to reset DRCs (at least the LMB ones) after qemu_devices_reset(),
so that we're sure vhost is stopped.

Thoughts ?

--
Greg



Re: [Qemu-devel] [PATCH] Revert "Add new PCI ID for i82559a"

2017-11-16 Thread Stefan Weil
Am 16.11.2017 um 13:20 schrieb Jason Wang:
> This reverts commit 5e89dc01133f8f5e621f6b66b356c6f37d31dafb since:
>
> - we should use ID in the spec instead the one used by OEM
> - in the future, we should allow changing id through either property
>   or EEPROM file.
>
> Cc: Stefan Weil 
> Cc: Michael Nawrocki 
> Cc: Peter Maydell 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
> ---
>  hw/net/eepro100.c| 13 -
>  include/hw/compat.h  |  4 
>  include/hw/pci/pci.h |  1 -
>  qemu-options.hx  |  2 +-
>  4 files changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
> index a63ed2c..91dd058 100644
> --- a/hw/net/eepro100.c
> +++ b/hw/net/eepro100.c
> @@ -132,7 +132,6 @@ typedef struct {
>  const char *name;
>  const char *desc;
>  uint16_t device_id;
> -uint16_t alt_device_id;
>  uint8_t revision;
>  uint16_t subsystem_vendor_id;
>  uint16_t subsystem_id;
> @@ -277,7 +276,6 @@ typedef struct {
>  /* Quasi static device properties (no need to save them). */
>  uint16_t stats_size;
>  bool has_extended_tcb_support;
> -bool use_alt_device_id;
>  } EEPRO100State;
>  
>  /* Word indices in EEPROM. */
> @@ -1857,14 +1855,6 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error 
> **errp)
>  
>  TRACE(OTHER, logout("\n"));
>  
> -/* By default, the i82559a adapter uses the legacy PCI ID (for the
> - * i82557). This allows the PCI ID to be changed to the alternate
> - * i82559 ID if needed.
> - */
> -if (s->use_alt_device_id && strcmp(info->name, "i82559a") == 0) {
> -pci_config_set_device_id(s->dev.config, info->alt_device_id);
> -}
> -
>  s->device = info->device;
>  
>  e100_pci_reset(s, &local_err);
> @@ -1984,7 +1974,6 @@ static E100PCIDeviceInfo e100_devices[] = {
>  .desc = "Intel i82559A Ethernet",
>  .device = i82559A,
>  .device_id = PCI_DEVICE_ID_INTEL_82557,
> -.alt_device_id = PCI_DEVICE_ID_INTEL_82559,
>  .revision = 0x06,
>  .stats_size = 80,
>  .has_extended_tcb_support = true,
> @@ -2078,8 +2067,6 @@ static E100PCIDeviceInfo 
> *eepro100_get_class(EEPRO100State *s)
>  
>  static Property e100_properties[] = {
>  DEFINE_NIC_PROPERTIES(EEPRO100State, conf),
> -DEFINE_PROP_BOOL("x-use-alt-device-id", EEPRO100State, use_alt_device_id,
> - true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/compat.h b/include/hw/compat.h
> index f96212c..cf389b4 100644
> --- a/include/hw/compat.h
> +++ b/include/hw/compat.h
> @@ -10,10 +10,6 @@
>  .driver   = "virtio-tablet-device",\
>  .property = "wheel-axis",\
>  .value= "false",\
> -},{\
> -.driver   = "i82559a",\
> -.property = "x-use-alt-device-id",\
> -.value= "false",\
>  },
>  
>  #define HW_COMPAT_2_9 \
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index f30e2cf..8d02a0a 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -70,7 +70,6 @@ extern bool pci_available;
>  /* Intel (0x8086) */
>  #define PCI_DEVICE_ID_INTEL_82551IT  0x1209
>  #define PCI_DEVICE_ID_INTEL_825570x1229
> -#define PCI_DEVICE_ID_INTEL_825590x1030
>  #define PCI_DEVICE_ID_INTEL_82801IR  0x2922
>  
>  /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */
> diff --git a/qemu-options.hx b/qemu-options.hx
> index a39c7e4..3728e9b 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2047,7 +2047,7 @@ that the card should have; this option currently only 
> affects virtio cards; set
>  @var{v} = 0 to disable MSI-X. If no @option{-net} option is specified, a 
> single
>  NIC is created.  QEMU can emulate several different models of network card.
>  Valid values for @var{type} are
> -@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559a}, 
> @code{i82559er},
> +@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559er},
>  @code{ne2k_pci}, @code{ne2k_isa}, @code{pcnet}, @code{rtl8139},
>  @code{e1000}, @code{smc91c111}, @code{lance} and @code{mcf_fec}.
>  Not all devices are supported on all targets.  Use @code{-net nic,model=help}


Reviewed-by: Stefan Weil 

Thanks, Stefan




[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue

2017-11-16 Thread Daniel Berrange
No problem, it is a valid bug report, since we've not actually released
the fix yet, so changing status.

** Changed in: qemu
   Status: Invalid => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1732671

Title:
  vnc websocket compatibility issue

Status in QEMU:
  Fix Committed

Bug description:
  WebSocket support in VNC should allow access from VNC client through
  upgraded WebSocket connection. This feature is not working in IE
  11/Edge with noVNC HTML5 client, in contrast to that in
  Firefox/Safari, etc.

  The reason that IE 11/Edge fails to accept the connection upgrade is
  that the value equality of the `Upgrade` header field is checked in a
  strict case-sensitive manner in QEMU side, however, the IE/Edge does
  not send the exactly same string value `websocket` but a capital
  letter `W` instead.

  Defined in section 4.2.1 of rfc6455, the upgrade header field shall be
  treated case-insensitively.

  A patch shall be made in `io/channel-websock.c`, converting the value
  of upgrade string to lowercase before comparison is made with
  QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive
  comparison in the process.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions



Re: [Qemu-devel] [PATCH] colo-compare: fix the dangerous assignment

2017-11-16 Thread Mao Zhongyi



On 11/16/2017 06:13 PM, Darren Kenny wrote:

On Thu, Nov 16, 2017 at 10:28:32AM +0800, Mao Zhongyi wrote:

Cc: Peter Maydell 
Cc: Jason Wang 
Cc: Zhang Chen 
Cc: Li Zhijian 
Cc: Paolo Bonzini 
Fixes: 8ec14402029d783720f4312ed8a925548e1dad61
Reported-by: Peter Maydell 
Reported-by: Paolo Bonzini 
Signed-off-by: Mao Zhongyi 


Code-wise, this looks like a valid fix to the existing code.

Reviewed-by: Darren Kenny 


Hi, Darren


But testing wise, have you confirmed that things are behaving as you
expected with the previous patch, since previously when calling
colo_compare_connection(), the value of conn would have always been
its initialized value of NULL.


Well, in my test machine the code like *con = conn, but when I made
the patch on another machine I wrote the code con = &conn carelessly.


Just want to be sure that fixing this doesn't end up breaking your
expected behaviour given that all your testing before would have had
a NULL value in conn.


Thanks for the kind reminder.

Thanks,
Mao


Thanks,

Darren.


---
net/colo-compare.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index ccdcba2..1ce195f 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -179,7 +179,7 @@ static int packet_enqueue(CompareState *s, int mode, 
Connection **con)
 "drop packet");
}
}
-con = &conn;
+*con = conn;

return 0;
}
--
2.9.4














[Qemu-devel] weird behavior of libvirt/QEMU with virtserial ports

2017-11-16 Thread Denis V. Lunev
Hello, All!

We observe the following behavior for QEMU configured by libvirt
to use guest agent as usual for the guest without virtio-serial
driver (Windows or the guest remaining in BIOS stage).

In QEMU on first connect to listen character device socket
the listen socket is removed from poll just after the accept().
virtio_serial_guest_ready() returns 0 and the descriptor
of the connected Unix socket is removed from poll and it will
not be present in poll() until the guest will initialize the driver
and change the state of the serial to "guest connected".

In libvirt connect() to guest agent is performed on restart and
is run under VM state lock. Connec() is blocking and can
wait forever as
- accept queue in QEMU is not polled
- it will exhaust sooner or later.
In this case libvirt can not perform ANY operation on that VM. Weird!

The problem should be addressed from both sides IMHO,
as it is bad to keep stale connection in QEMU.

IMHO we should  tweak io_watch_poll_prepare() to register for
G_IO_ERR | G_IO_HUP | G_IO_NVAL even if the read is disable and
handle connection closing gracefully.

Though there are some questions remaining. What should we
do with the data in the socket queue? Are we free to drop it
as the requester has been left? In the other case we should
fix each qemu client including libvirtd.

Alternatively we can still poll listen socket and close any connections
if there is active one.

Any opinion?

Den

P.S. Quick reproducer

VM config, as usual
    
  
    
    
  
  
    

Quick reproducer:

strace socat -
UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0
^C (interrupt)
strace socat -
UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0
^C (interrupt)
strace socat -
UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0
^C (interrupt)
strace socat -
UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0
^C (interrupt)

Normal behavior of the 'socat' is passed connect() routine and waiting on
select(). Wrong behavior - hang in 'connect'.




[Qemu-devel] [Bug 1732679] [NEW] Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian 1:2.10.0+dfsg-2)

2017-11-16 Thread jean-christophe manciot
Public bug reported:

Ubuntu 17.04
qemu 2.10.1(Debian 1:2.10.0+dfsg-2)
gns3 2.0.3
NX-OSv 9k 7.0.3.I6.1

- No such issue with previous qemu 2.8.x
- the issue does not seem to come from the debian packaging
- the issue does not seem to come from GNS3 either, as confirmed by Jeremy 
Grossmann at 
https://github.com/GNS3/gns3-server/issues/1193#issuecomment-344240460

Either some parameters usage have changed (for instance -bios) (which would 
make qemu not backwards compatible) or there is an issue with qemu itself.
The configuration parameters are:
```
"compute_id": "local",
"console": 2010,
"console_type": "telnet",
"first_port_name": "mgmt0",
"height": 48,
"label": {
"rotation": 0,
"style": "font-family: TypeWriter;font-size: 
10.0;font-weight: bold;fill: #00;fill-opacity: 1.0;",
"text": "NX_OSv_9k_Spine_31",
"x": -54,
"y": -25
},
"name": "NX_OSv_9k_Spine_31",
"node_id": "8d01119a-0adc-41bc-950b-c5639db7708c",
"node_type": "qemu",
"port_name_format": "Ethernet1/{port1}",
"port_segment_size": 0,
"properties": {
"acpi_shutdown": false,
"adapter_type": "e1000",
"adapters": 10,
"bios_image": "",
"bios_image_md5sum": null,
"boot_priority": "c",
"cdrom_image": "",
"cdrom_image_md5sum": null,
"cpu_throttling": 0,
"cpus": 2,
"hda_disk_image": "NX-OSv-9k-7.0.3.I6.1.free.qcow2",
"hda_disk_image_md5sum": "18bb991b814a508d1190575f99deed99",
"hda_disk_interface": "ide",
"hdb_disk_image": "",
"hdb_disk_image_md5sum": null,
"hdb_disk_interface": "ide",
"hdc_disk_image": "",
"hdc_disk_image_md5sum": null,
"hdc_disk_interface": "ide",
"hdd_disk_image": "",
"hdd_disk_image_md5sum": null,
"hdd_disk_interface": "ide",
"initrd": "",
"initrd_md5sum": null,
"kernel_command_line": "",
"kernel_image": "",
"kernel_image_md5sum": null,
"legacy_networking": false,
"mac_address": "00:07:00:03:16:01",
"options": "-nographic -enable-kvm -cpu host -machine q35 
-smp cpus=2 -bios /usr/share/ovmf/OVMF.fd",
"platform": "x86_64",
"process_priority": "normal",
"qemu_path": "/usr/bin/qemu-system-x86_64",
"ram": 6144,
"usage": ""
```

The logs are:
- [execution 
log](https://github.com/GNS3/gns3-server/files/1381651/qemu.log.txt)
- [terminal 
log](https://github.com/GNS3/gns3-server/files/1381660/terminal.log.txt)

With the latest qemu, I can boot:
- Cisco IOSv 15.6(2)T
- Cisco IOSv-L2 15.2(20170321:233949)
- Cisco CSR 1000v 16.5.1b
- Cisco ASAv 9.6(2)

The major difference with NX-OSv 9k is the bios parameter: ```-bios 
/usr/share/ovmf/OVMF.fd```: 
```
ll /usr/share/ovmf/OVMF.fd
-rw-r--r-- 1 root root 2097152 Dec  9  2016 /usr/share/ovmf/OVMF.fd
```
A normal boot log with qemu 2.8.1 is available 
[here](https://github.com/GNS3/gns3-server/files/1381729/terminal.log.2.8.txt)

Highlighting the differences: qemu 2.8.1 on the left, qemu 2.10.1 on the right 
hand side with the same boot parameters
![qemu 2 8 vs qemu 2 
10](https://user-images.githubusercontent.com/13176858/31534998-8429462e-aff9-11e7-9cf3-bf2b00c21e8a.jpg)

** Affects: qemu
 Importance: Undecided
 Status: New

** Summary changed:

- Cisco NX-OSv 9k crashes during boot with latest qemu
+ Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian 1:2.10.0+dfsg-2)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1732679

Title:
  Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian
  1:2.10.0+dfsg-2)

Status in QEMU:
  New

Bug description:
  Ubuntu 17.04
  qemu 2.10.1(Debian 1:2.10.0+dfsg-2)
  gns3 2.0.3
  NX-OSv 9k 7.0.3.I6.1

  - No such issue with previous qemu 2.8.x
  - the issue does not seem to come from the debian packaging
  - the issue does not seem to come from GNS3 either, as confirmed by Jeremy 
Grossmann at 
https://github.com/GNS3/gns3-server/issues/1193#issuecomment-344240460

  Either some parameters usage have changed (for instance -bios) (which would 
make qemu not backwards compatible) or there is an issue with qemu itself.
  The configuration parameters are:

Re: [Qemu-devel] [PATCH v6 3/5] fw_cfg: do DMA read operation

2017-11-16 Thread Marc-André Lureau
Hi

- Original Message -
> On Mon, Nov 13, 2017 at 08:27:48PM +0100, Marc-André Lureau wrote:
> > Modify fw_cfg_read_blob() to use DMA if the device supports it.
> > Return errors, because the operation may fail.
> > 
> > To avoid polling with unbound amount of time, the DMA operation is
> > expected to complete within 200ms, or will return ETIME error.
> > 
> > We may want to switch all the *buf addresses to use only kmalloc'ed
> > buffers (instead of using stack/image addresses with dma=false).
> > 
> > Signed-off-by: Marc-André Lureau 
> > Reviewed-by: Gabriel Somlo 
> 
> This is causing issues for some people. I have reverted it for now.

I didn't know you merged it, where did it go?

If you are talking about kernel test robot, it seems to be the timeout. I 
picked FW_CFG_DMA_TIMEOUT 200ms by guess, we may want to extend that for busy 
hosts. How much? several seconds?

> I wonder why bother with DMA reads. Can we limit DMA to writes?

Well, we could, but DMA reads are supposedly quite faster. And if DMA read is 
broken, there are good chances that DMA write will too.

> > ---
> >  drivers/firmware/qemu_fw_cfg.c | 154
> >  -
> >  1 file changed, 137 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/firmware/qemu_fw_cfg.c
> > b/drivers/firmware/qemu_fw_cfg.c
> > index 1f3e8545dab7..2ac4cd869fe6 100644
> > --- a/drivers/firmware/qemu_fw_cfg.c
> > +++ b/drivers/firmware/qemu_fw_cfg.c
> > @@ -33,6 +33,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  MODULE_AUTHOR("Gabriel L. Somlo ");
> >  MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> > @@ -43,12 +45,26 @@ MODULE_LICENSE("GPL");
> >  #define FW_CFG_ID 0x01
> >  #define FW_CFG_FILE_DIR   0x19
> >  
> > +#define FW_CFG_VERSION_DMA 0x02
> > +#define FW_CFG_DMA_CTL_ERROR   0x01
> > +#define FW_CFG_DMA_CTL_READ0x02
> > +#define FW_CFG_DMA_CTL_SKIP0x04
> > +#define FW_CFG_DMA_CTL_SELECT  0x08
> > +#define FW_CFG_DMA_CTL_WRITE   0x10
> > +#define FW_CFG_DMA_TIMEOUT 200 /* ms */
> > +
> >  /* size in bytes of fw_cfg signature */
> >  #define FW_CFG_SIG_SIZE 4
> >  
> >  /* fw_cfg "file name" is up to 56 characters (including terminating nul)
> >  */
> >  #define FW_CFG_MAX_FILE_PATH 56
> >  
> > +/* platform device for dma mapping */
> > +static struct device *dev;
> > +
> > +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir.
> > */
> > +static u32 fw_cfg_rev;
> > +
> >  /* fw_cfg file directory entry type */
> >  struct fw_cfg_file {
> > u32 size;
> > @@ -57,6 +73,12 @@ struct fw_cfg_file {
> > char name[FW_CFG_MAX_FILE_PATH];
> >  };
> >  
> > +struct fw_cfg_dma {
> > +   u32 control;
> > +   u32 length;
> > +   u64 address;
> > +} __packed;
> > +
> >  /* fw_cfg device i/o register addresses */
> >  static bool fw_cfg_is_mmio;
> >  static phys_addr_t fw_cfg_p_base;
> > @@ -75,12 +97,93 @@ static inline u16 fw_cfg_sel_endianness(u16 key)
> > return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> >  }
> >  
> > +static inline bool fw_cfg_dma_enabled(void)
> > +{
> > +   return fw_cfg_rev & FW_CFG_VERSION_DMA && fw_cfg_reg_dma;
> > +}
> > +
> > +static bool fw_cfg_wait_for_control(struct fw_cfg_dma *d, unsigned long
> > timeout)
> > +{
> > +   ktime_t start;
> > +   ktime_t stop;
> > +
> > +   start = ktime_get();
> > +   stop = ktime_add(start, ms_to_ktime(timeout));
> > +
> > +   do {
> > +   if ((be32_to_cpu(d->control) & ~FW_CFG_DMA_CTL_ERROR) == 0)
> > +   return true;
> > +
> > +   usleep_range(50, 100);
> > +   } while (ktime_before(ktime_get(), stop));
> > +
> > +   return false;
> > +}
> > +
> > +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> > +{
> > +   dma_addr_t dma_addr = 0;
> > +   struct fw_cfg_dma *d;
> > +   dma_addr_t dma;
> > +   ssize_t ret = length;
> > +   enum dma_data_direction dir =
> > +   (control & FW_CFG_DMA_CTL_READ ? DMA_FROM_DEVICE : 0);
> > +
> > +   if (address && length) {
> > +   dma_addr = dma_map_single(dev, address, length, dir);
> > +   if (dma_mapping_error(NULL, dma_addr)) {
> > +   WARN(1, "%s: failed to map address\n", __func__);
> > +   return -EFAULT;
> > +   }
> > +   }
> > +
> > +   d = kmalloc(sizeof(*d), GFP_KERNEL | GFP_DMA);
> > +   if (!d) {
> > +   ret = -ENOMEM;
> > +   goto end;
> > +   }
> > +
> > +   dma = dma_map_single(dev, d, sizeof(*d), DMA_BIDIRECTIONAL);
> > +   if (dma_mapping_error(NULL, dma)) {
> > +   WARN(1, "%s: failed to map fw_cfg_dma\n", __func__);
> > +   ret = -EFAULT;
> > +   goto end;
> > +   }
> > +
> > +   *d = (struct fw_cfg_dma) {
> > +   .address = cpu_to_be64(dma_addr),
> > +   .length = cpu_to_be32(length),
> > +   .control = cpu_to_be32(control)
> > +   };
> > +
> > +   iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
> > +   io

Re: [Qemu-devel] [PULL 0/4] Merge tpm 2017/11/15 v1

2017-11-16 Thread Peter Maydell
On 15 November 2017 at 12:31, Stefan Berger  wrote:
> This pull request is for 2.11 and extends documentation as well as fixes
> bugs related to concurrency and failure mode.
>
> The following changes since commit 4ffa88c99c54d2a30f79e3dbecec50b023eff1c8:
>
>   Merge remote-tracking branch 
> 'remotes/berrange/tags/pull-qcrypto-2017-11-08-1' into staging (2017-11-10 
> 16:01:35 +)
>
> are available in the git repository at:
>
>   git://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2017-11-15-1
>
> for you to fetch changes up to 6cd65969da57eaa9bdff07b80ecca2becce0597a:
>
>   tpm_tis: Return 0 for every register in case of failure mode (2017-11-15 
> 06:47:35 -0500)
>
> 
> Merge tpm 2017/11/15 v1
>
> 
> Marc-André Lureau (1):
>   tpm-emulator: protect concurrent ctrl_chr access
>
> Stefan Berger (3):
>   specs: Extend TPM spec with TPM emulator description
>   tpm_tis: Return TPM_VERSION_UNSPEC in case of BE failure
>   tpm_tis: Return 0 for every register in case of failure mode
>
>  docs/specs/tpm.txt| 79 
> +++
>  hw/tpm/tpm_emulator.c | 44 +---
>  hw/tpm/tpm_tis.c  |  6 +++-
>  3 files changed, 112 insertions(+), 17 deletions(-)

Applied, thanks.

-- PMM



Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Tomáš Golembiovský
On Wed, 15 Nov 2017 21:41:20 +0100
Gandalf Corvotempesta  wrote:

> 2017-11-15 21:29 GMT+01:00 Richard W.M. Jones :
> > Gandalf, is there an XVA file publically available (pref. not
> > too big) that we can look at?
> 
> I can try to provide one, but it's simple:
> 
> # tar tvf 20160630_124823_aa72_.xva.gz | head -n 50
> -- 0/0   42353 1970-01-01 01:00 ova.xml
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/
> -- 0/0  40 1970-01-01 01:00 Ref:175/.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0001
> -- 0/0  40 1970-01-01 01:00 Ref:175/0001.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0003
> -- 0/0  40 1970-01-01 01:00 Ref:175/0003.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0004
> -- 0/0  40 1970-01-01 01:00 Ref:175/0004.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0005
> -- 0/0  40 1970-01-01 01:00 Ref:175/0005.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0006
> -- 0/0  40 1970-01-01 01:00 Ref:175/0006.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0007
> -- 0/0  40 1970-01-01 01:00 Ref:175/0007.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0009
> -- 0/0  40 1970-01-01 01:00 Ref:175/0009.checksum
> -- 0/0 1048576 1970-01-01 01:00 Ref:175/0010
> -- 0/0  40 1970-01-01 01:00 Ref:175/0010.checksum
> 
> 
> You can ignore the ova.xml and just use the "Ref:175" directory.
> Inside the XVA you'll fine one "Ref" directory for each virtual disk
> (ref number is different for each disk)
> Inside each Ref directory, you'll get tons of 1MB file, corrisponding
> to the RAW image.
> You have to merge these files in a single raw file with just an
> exception: numbers are not sequential.
> as you can see above, we have: , 0001, 0003
> 
> The 0002 is missing, because it's totally blank. XenServer doesn't
> export any empty block, thus it will skip the corrisponding 1MB file.
> When building the raw image, you have to replace empty blocks with 1MB
> full of zeros.
> 
> This is (in addition to the tar extract) the most time-consuming part.
> Currently I'm rebuilding a 250GB image, with tons of files to be
> merge.
> If qemu-img can be patched to automatically convert this kind of
> format, I can save about 3 hours (30 minutes for extracting the
> tarball, and about 2 hours to merge 250-300GB image)
> 

Hi,

I like what Richard and Max came up with. Pretty nifty solutions.
While there's nothing wrong with them I decided to take my own shot at
it. Since the blocks in tar file are pieces of raw image there is no
conversion happening. All in all it's just moving bytes from one place
to another. That means there shouldn't be a need for any heavy
machinery, right? :)

Attached is a shell script that uses dd to do the byte-shuffling.
I'm pretty sure this could even be safely parallelized by running
multiple instances of dd at the same time (e.g. with xargs). But I did
not try that.


Tomas

-- 
Tomáš Golembiovský 


xva2raw.sh
Description: application/shellscript


[Qemu-devel] [RFC v4 02/27] qobject: introduce qobject_get_try_str()

2017-11-16 Thread Peter Xu
A quick way to fetch string from qobject when it's a QString.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 include/qapi/qmp/qstring.h |  1 +
 qobject/qstring.c  | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
index 34278bd639..12ae4e1c29 100644
--- a/include/qapi/qmp/qstring.h
+++ b/include/qapi/qmp/qstring.h
@@ -28,6 +28,7 @@ QString *qstring_from_substr(const char *str, int start, int 
end);
 size_t qstring_get_length(const QString *qstring);
 const char *qstring_get_str(const QString *qstring);
 const char *qstring_get_try_str(const QString *qstring);
+const char *qobject_get_try_str(const QObject *qstring);
 void qstring_append_int(QString *qstring, int64_t value);
 void qstring_append(QString *qstring, const char *str);
 void qstring_append_chr(QString *qstring, int c);
diff --git a/qobject/qstring.c b/qobject/qstring.c
index 9df04e3c7b..bbe689a9e3 100644
--- a/qobject/qstring.c
+++ b/qobject/qstring.c
@@ -139,6 +139,17 @@ const char *qstring_get_try_str(const QString *qstring)
 }
 
 /**
+ * qobject_get_try_str(): Return a pointer of the backstore string
+ *
+ * NOTE: the string will only be returned if the object is valid, and
+ * its type is QString, otherwise NULL is returned.
+ */
+const char *qobject_get_try_str(const QObject *qstring)
+{
+return qstring_get_try_str(qobject_to_qstring(qstring));
+}
+
+/**
  * qstring_destroy_obj(): Free all memory allocated by a QString
  * object
  */
-- 
2.13.6




[Qemu-devel] [RFC v4 00/27] QMP: out-of-band (OOB) execution support

2017-11-16 Thread Peter Xu
v4:
- drop first patch to fix IOWatchPool [Stefan, Dan]
- add s-o-b where missing, and newly got r-bs
- fix English error in commit msg [Fam]
- some tunes on request-dropped event:  [Stefan]
  - firstly let 'id' be any type, meanwhile make sure "id" is there as
long as OOB is enabled for the monitor.
  - some comments fix
- add new command "x-oob-test" for testing oob commands [Stefan]
- simplify the test codes to use new x-oob-test
- flush response queue before cleanup monitors

This series was born from this one:

  https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04310.html

The idea comes from Markus Armbruster and the discussion we had in the
thread.  It's not a super ideal solution (I believe Markus had been
thinking hard to keep everything in order meanwhile trying to satisfy
the migration requirement), but AFAIU it's currently the best.

What is OOB?


It's the shortcut of Out-Of-Band execution, its name is given by
Markus.  It's a way to quickly execute a QMP request.  Say, originally
QMP is going throw these steps:

  JSON Parser --> QMP Dispatcher --> Respond
  /|\(2)(3) |
   (1) |   \|/ (4)
   +-  main thread  +

The requests are executed by the so-called QMP-dispatcher after the
JSON is parsed.  If OOB is on, we run the command directly in the
parser and quickly returns.

This series changed the QMP handling logic by moving the command
parsing and responding phases into IOThreads, so to be more accurate,
after the series the above graph would change into this:

   queue/kick  queue/kick
 JSON Parser ==> QMP Dispatcher => Responder
 /|\ | (3)   /|\  |  (4)  | /|\
  (1) |  | (2)|   |   |  |
  |  ||  \|/   (6)|  |(5)
  |  |main thread |  |
  |  ||  |
  |  +> monitor IO thread <---+  |
  +---/   \--+

New Interfaces
==

QMP Introspection Changes
-

When do query-qmp-schema, we were getting something like:

  {"name": "migrate-incoming", "ret-type": "17",
   "meta-type": "command", "arg-type": "89"}

Now we will get a new key named "allow-oob":

  {"name": "migrate-incoming", "ret-type": "17", "allow-oob": true,
   "meta-type": "command", "arg-type": "89"}

Which shows whether a command supports OOB.

QMP Negociation Changes
---

We were running "qmp_capabilities" always without parameters like:

  {"execute": "qmp_capabilities"}

Now we can enable capabilities (we don't have any capability before
this series) like OOB using:

  {"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}

Only after we explicitly enable OOB capability can we send requests in
OOB manner.  Otherwise we'll have exactly the same QMP session as
before, just like when OOB is not there.

When OOB is enabled, it's possible that OOB reply reaches faster than
previous command, so clients should be prepared.

Trigger OOB execution
-

Let's take migrate-incoming as example.  The old command looks like:

  {"execute": "migrate-incoming", "arguments" : {"uri": "xx"}}

To execute a command with OOB execution, we need to specify it in the
QMP request in the extra "control" key:

  {"execute": "migrate-incoming", "arguments" : {"uri": "xx"},
   "control" { "run-oob": true } }

Then the command will be run in parser, and it can preempt other
commands.

Others
=

The last patch of OOB test may need some attention.  I used
dump-guest-memory as a time-consuming command to test whether OOB is
working, and the only command I can test now is "migrate-incoming".  I
hope that is a "okay" solution for unit tests.  Any other suggestions
on that would be welcomed.

Please review.  Thanks.

Peter Xu (27):
  qobject: introduce qstring_get_try_str()
  qobject: introduce qobject_get_try_str()
  qobject: let object_property_get_str() use new API
  monitor: move skip_flush into monitor_data_init
  qjson: add "opaque" field to JSONMessageParser
  monitor: move the cur_mon hack deeper for QMP
  monitor: unify global init
  monitor: let mon_list be tail queue
  monitor: create monitor dedicate iothread
  monitor: allow to use IO thread for parsing
  qmp: introduce QMPCapability
  qmp: negociate QMP capabilities
  qmp: introduce some capability helpers
  monitor: introduce monitor_qmp_respond()
  monitor: let monitor_{suspend|resume} thread safe
  monitor: separate QMP parser and dispatcher
  qmp: add new event "request-dropped"
  monitor: send event when request queue full
  qapi: introduce new cmd option "allow-oob"
  qmp: support out-of-band (oob) execution
  qmp: let migrate-incoming allow out-of-band
  qmp: isolate responses into io thread
  monitor: enable IO thread for (qmp & !

[Qemu-devel] [RFC v4 03/27] qobject: let object_property_get_str() use new API

2017-11-16 Thread Peter Xu
We can simplify object_property_get_str() using the new
qobject_get_try_str().

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 qom/object.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index c58c52d518..9cbeb51f0b 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1116,18 +1116,15 @@ char *object_property_get_str(Object *obj, const char 
*name,
   Error **errp)
 {
 QObject *ret = object_property_get_qobject(obj, name, errp);
-QString *qstring;
 char *retval;
 
 if (!ret) {
 return NULL;
 }
-qstring = qobject_to_qstring(ret);
-if (!qstring) {
+
+retval = g_strdup(qobject_get_try_str(ret));
+if (!retval) {
 error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name, "string");
-retval = NULL;
-} else {
-retval = g_strdup(qstring_get_str(qstring));
 }
 
 qobject_decref(ret);
-- 
2.13.6




[Qemu-devel] [RFC v4 07/27] monitor: unify global init

2017-11-16 Thread Peter Xu
There are many places for monitor init its globals, at least:

- monitor_init_qmp_commands() at the very beginning
- single function to init monitor_lock
- in the first entry of monitor_init() using "is_first_init"

Unify them a bit.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 include/monitor/monitor.h |  2 +-
 monitor.c | 25 ++---
 vl.c  |  3 ++-
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 83ea4a1aaf..3a5128ab1b 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -16,7 +16,7 @@ extern Monitor *cur_mon;
 
 bool monitor_cur_is_qmp(void);
 
-void monitor_init_qmp_commands(void);
+void monitor_init_globals(void);
 void monitor_init(Chardev *chr, int flags);
 void monitor_cleanup(void);
 
diff --git a/monitor.c b/monitor.c
index 322dfb5f31..ac5313023b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1001,7 +1001,7 @@ static void qmp_unregister_commands_hack(void)
 #endif
 }
 
-void monitor_init_qmp_commands(void)
+static void monitor_init_qmp_commands(void)
 {
 /*
  * Two command lists:
@@ -4034,6 +4034,14 @@ static void sortcmdlist(void)
 qsort((void *)info_cmds, array_num, elem_size, compare_mon_cmd);
 }
 
+void monitor_init_globals(void)
+{
+monitor_init_qmp_commands();
+monitor_qapi_event_init();
+sortcmdlist();
+qemu_mutex_init(&monitor_lock);
+}
+
 /* These functions just adapt the readline interface in a typesafe way.  We
  * could cast function pointers but that discards compiler checks.
  */
@@ -4074,23 +4082,10 @@ void error_vprintf_unless_qmp(const char *fmt, va_list 
ap)
 }
 }
 
-static void __attribute__((constructor)) monitor_lock_init(void)
-{
-qemu_mutex_init(&monitor_lock);
-}
-
 void monitor_init(Chardev *chr, int flags)
 {
-static int is_first_init = 1;
-Monitor *mon;
-
-if (is_first_init) {
-monitor_qapi_event_init();
-sortcmdlist();
-is_first_init = 0;
-}
+Monitor *mon = g_malloc(sizeof(*mon));
 
-mon = g_malloc(sizeof(*mon));
 monitor_data_init(mon, false);
 
 qemu_chr_fe_init(&mon->chr, chr, &error_abort);
diff --git a/vl.c b/vl.c
index 7372424fa7..f1c6ef6d95 100644
--- a/vl.c
+++ b/vl.c
@@ -3144,7 +3144,6 @@ int main(int argc, char **argv, char **envp)
 qemu_init_exec_dir(argv[0]);
 
 module_call_init(MODULE_INIT_QOM);
-monitor_init_qmp_commands();
 
 qemu_add_opts(&qemu_drive_opts);
 qemu_add_drive_opts(&qemu_legacy_drive_opts);
@@ -4692,6 +4691,8 @@ int main(int argc, char **argv, char **envp)
 
 parse_numa_opts(current_machine);
 
+monitor_init_globals();
+
 if (qemu_opts_foreach(qemu_find_opts("mon"),
   mon_init_func, NULL, NULL)) {
 exit(1);
-- 
2.13.6




[Qemu-devel] [RFC v4 01/27] qobject: introduce qstring_get_try_str()

2017-11-16 Thread Peter Xu
The only difference from qstring_get_str() is that it allows the qstring
to be NULL.  If so, NULL is returned.

CC: Eric Blake 
CC: Markus Armbruster 
Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 include/qapi/qmp/qstring.h |  1 +
 qobject/qstring.c  | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
index 10076b7c8c..34278bd639 100644
--- a/include/qapi/qmp/qstring.h
+++ b/include/qapi/qmp/qstring.h
@@ -27,6 +27,7 @@ QString *qstring_from_str(const char *str);
 QString *qstring_from_substr(const char *str, int start, int end);
 size_t qstring_get_length(const QString *qstring);
 const char *qstring_get_str(const QString *qstring);
+const char *qstring_get_try_str(const QString *qstring);
 void qstring_append_int(QString *qstring, int64_t value);
 void qstring_append(QString *qstring, const char *str);
 void qstring_append_chr(QString *qstring, int c);
diff --git a/qobject/qstring.c b/qobject/qstring.c
index 5da7b5f37c..9df04e3c7b 100644
--- a/qobject/qstring.c
+++ b/qobject/qstring.c
@@ -129,6 +129,16 @@ const char *qstring_get_str(const QString *qstring)
 }
 
 /**
+ * qstring_get_try_str(): Return a pointer to the stored string
+ *
+ * NOTE: will return NULL if qstring is not provided.
+ */
+const char *qstring_get_try_str(const QString *qstring)
+{
+return qstring ? qstring_get_str(qstring) : NULL;
+}
+
+/**
  * qstring_destroy_obj(): Free all memory allocated by a QString
  * object
  */
-- 
2.13.6




[Qemu-devel] [RFC v4 09/27] monitor: create monitor dedicate iothread

2017-11-16 Thread Peter Xu
Create one IOThread for the monitors, prepared to handle all the
input/output IOs using existing iothread framework.

Signed-off-by: Peter Xu 
---
 monitor.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/monitor.c b/monitor.c
index a70ab5606b..4ce9828fab 100644
--- a/monitor.c
+++ b/monitor.c
@@ -76,6 +76,7 @@
 #include "qmp-introspect.h"
 #include "sysemu/qtest.h"
 #include "sysemu/cpus.h"
+#include "sysemu/iothread.h"
 #include "qemu/cutils.h"
 #include "qapi/qmp/dispatch.h"
 
@@ -208,6 +209,12 @@ struct Monitor {
 QTAILQ_ENTRY(Monitor) entry;
 };
 
+struct MonitorGlobal {
+IOThread *mon_iothread;
+};
+
+static struct MonitorGlobal mon_global;
+
 /* QMP checker flags */
 #define QMP_ACCEPT_UNKNOWNS 1
 
@@ -4034,12 +4041,24 @@ static void sortcmdlist(void)
 qsort((void *)info_cmds, array_num, elem_size, compare_mon_cmd);
 }
 
+static GMainContext *monitor_io_context_get(void)
+{
+return iothread_get_g_main_context(mon_global.mon_iothread);
+}
+
+static void monitor_iothread_init(void)
+{
+mon_global.mon_iothread = iothread_create("monitor_iothread",
+  &error_abort);
+}
+
 void monitor_init_globals(void)
 {
 monitor_init_qmp_commands();
 monitor_qapi_event_init();
 sortcmdlist();
 qemu_mutex_init(&monitor_lock);
+monitor_iothread_init();
 }
 
 /* These functions just adapt the readline interface in a typesafe way.  We
@@ -4117,6 +4136,13 @@ void monitor_cleanup(void)
 {
 Monitor *mon, *next;
 
+/*
+ * We need to explicitly stop the iothread (but not destroy it),
+ * cleanup the monitor resources, then destroy the iothread.  See
+ * again on the glib bug mentioned in 2b316774f6 for a reason.
+ */
+iothread_stop(mon_global.mon_iothread);
+
 qemu_mutex_lock(&monitor_lock);
 QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
 QTAILQ_REMOVE(&mon_list, mon, entry);
@@ -4124,6 +4150,9 @@ void monitor_cleanup(void)
 g_free(mon);
 }
 qemu_mutex_unlock(&monitor_lock);
+
+iothread_destroy(mon_global.mon_iothread);
+mon_global.mon_iothread = NULL;
 }
 
 QemuOptsList qemu_mon_opts = {
-- 
2.13.6




[Qemu-devel] [RFC v4 05/27] qjson: add "opaque" field to JSONMessageParser

2017-11-16 Thread Peter Xu
It'll be passed to emit() as well when it happens.  Since at it, add a
typedef for the emitter function.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 include/qapi/qmp/json-streamer.h | 10 --
 monitor.c|  7 ---
 qga/main.c   |  5 +++--
 qobject/json-streamer.c  |  6 --
 qobject/qjson.c  |  5 +++--
 tests/libqtest.c |  5 +++--
 6 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h
index 00d8a23af8..9198b67342 100644
--- a/include/qapi/qmp/json-streamer.h
+++ b/include/qapi/qmp/json-streamer.h
@@ -23,9 +23,14 @@ typedef struct JSONToken {
 char str[];
 } JSONToken;
 
+struct JSONMessageParser;
+typedef void (*JSONMessageEmitFunc)(struct JSONMessageParser *parser,
+GQueue *tokens, void *opaque);
+
 typedef struct JSONMessageParser
 {
-void (*emit)(struct JSONMessageParser *parser, GQueue *tokens);
+JSONMessageEmitFunc emit;
+void *opaque;
 JSONLexer lexer;
 int brace_count;
 int bracket_count;
@@ -34,7 +39,8 @@ typedef struct JSONMessageParser
 } JSONMessageParser;
 
 void json_message_parser_init(JSONMessageParser *parser,
-  void (*func)(JSONMessageParser *, GQueue *));
+  JSONMessageEmitFunc func,
+  void *opaque);
 
 int json_message_parser_feed(JSONMessageParser *parser,
  const char *buffer, size_t size);
diff --git a/monitor.c b/monitor.c
index 3940737c1c..ab80d32c70 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3808,7 +3808,8 @@ static int monitor_can_read(void *opaque)
 return (mon->suspend_cnt == 0) ? 1 : 0;
 }
 
-static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens)
+static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens,
+   void *opaque)
 {
 QObject *req, *rsp = NULL, *id = NULL;
 QDict *qdict = NULL;
@@ -3955,7 +3956,7 @@ static void monitor_qmp_event(void *opaque, int event)
 break;
 case CHR_EVENT_CLOSED:
 json_message_parser_destroy(&mon->qmp.parser);
-json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
+json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL);
 mon_refcount--;
 monitor_fdsets_cleanup();
 break;
@@ -4105,7 +4106,7 @@ void monitor_init(Chardev *chr, int flags)
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
  monitor_qmp_event, NULL, mon, NULL, true);
 qemu_chr_fe_set_echo(&mon->chr, true);
-json_message_parser_init(&mon->qmp.parser, handle_qmp_command);
+json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL);
 } else {
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read,
  monitor_event, NULL, mon, NULL, true);
diff --git a/qga/main.c b/qga/main.c
index 62a62755bd..3b5ebbc1ee 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -593,7 +593,8 @@ static void process_command(GAState *s, QDict *req)
 }
 
 /* handle requests/control events coming in over the channel */
-static void process_event(JSONMessageParser *parser, GQueue *tokens)
+static void process_event(JSONMessageParser *parser, GQueue *tokens,
+  void *opaque)
 {
 GAState *s = container_of(parser, GAState, parser);
 QDict *qdict;
@@ -1320,7 +1321,7 @@ static int run_agent(GAState *s, GAConfig *config, int 
socket_activation)
 s->command_state = ga_command_state_new();
 ga_command_state_init(s, s->command_state);
 ga_command_state_init_all(s->command_state);
-json_message_parser_init(&s->parser, process_event);
+json_message_parser_init(&s->parser, process_event, NULL);
 
 #ifndef _WIN32
 if (!register_signal_handlers()) {
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index c51c2021f9..8fc1b15321 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -102,18 +102,20 @@ out_emit:
  */
 tokens = parser->tokens;
 parser->tokens = g_queue_new();
-parser->emit(parser, tokens);
+parser->emit(parser, tokens, parser->opaque);
 parser->token_size = 0;
 }
 
 void json_message_parser_init(JSONMessageParser *parser,
-  void (*func)(JSONMessageParser *, GQueue *))
+  JSONMessageEmitFunc func,
+  void *opaque)
 {
 parser->emit = func;
 parser->brace_count = 0;
 parser->bracket_count = 0;
 parser->tokens = g_queue_new();
 parser->token_size = 0;
+parser->opaque = opaque;
 
 json_lexer_init(&parser->lexer, json_message_process_token);
 }
diff --git a/qobject/qjson.c b/qobject/qjson.c
index 2e0930884e..f9766febe3 100644
--- a/qobject/qjson.c
+++ b/qobj

[Qemu-devel] [RFC v4 10/27] monitor: allow to use IO thread for parsing

2017-11-16 Thread Peter Xu
For each Monitor, add one field "use_io_thr" to show whether it will be
using the dedicated monitor IO thread to handle input/output.  When set,
monitor IO parsing work will be offloaded to dedicated monitor IO
thread, rather than the original main loop thread.

This only works for QMP.  HMP will always be run on main loop thread.

Currently we're still keeping use_io_thr to off always.  Will turn it on
later at some point.

One thing to mention is that we cannot set use_io_thr for every QMP
monitors.  The problem is that MUXed typed chardevs may not work well
with it now. When MUX is used, frontend of chardev can be the monitor
plus something else.  The only thing we know would be safe to be run
outside main thread so far is the monitor frontend. All the rest of the
frontends should still be run in main thread only.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 38 ++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/monitor.c b/monitor.c
index 4ce9828fab..685938b9f2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -191,6 +191,7 @@ struct Monitor {
 int flags;
 int suspend_cnt;
 bool skip_flush;
+bool use_io_thr;
 
 QemuMutex out_lock;
 QString *outbuf;
@@ -575,7 +576,8 @@ static void monitor_qapi_event_init(void)
 
 static void handle_hmp_command(Monitor *mon, const char *cmdline);
 
-static void monitor_data_init(Monitor *mon, bool skip_flush)
+static void monitor_data_init(Monitor *mon, bool skip_flush,
+  bool use_io_thr)
 {
 memset(mon, 0, sizeof(Monitor));
 qemu_mutex_init(&mon->out_lock);
@@ -583,6 +585,7 @@ static void monitor_data_init(Monitor *mon, bool skip_flush)
 /* Use *mon_cmds by default. */
 mon->cmd_table = mon_cmds;
 mon->skip_flush = skip_flush;
+mon->use_io_thr = use_io_thr;
 }
 
 static void monitor_data_destroy(Monitor *mon)
@@ -603,7 +606,7 @@ char *qmp_human_monitor_command(const char *command_line, 
bool has_cpu_index,
 char *output = NULL;
 Monitor *old_mon, hmp;
 
-monitor_data_init(&hmp, true);
+monitor_data_init(&hmp, true, false);
 
 old_mon = cur_mon;
 cur_mon = &hmp;
@@ -4104,8 +4107,9 @@ void error_vprintf_unless_qmp(const char *fmt, va_list ap)
 void monitor_init(Chardev *chr, int flags)
 {
 Monitor *mon = g_malloc(sizeof(*mon));
+GMainContext *context;
 
-monitor_data_init(mon, false);
+monitor_data_init(mon, false, false);
 
 qemu_chr_fe_init(&mon->chr, chr, &error_abort);
 mon->flags = flags;
@@ -4117,19 +4121,37 @@ void monitor_init(Chardev *chr, int flags)
 monitor_read_command(mon, 0);
 }
 
+if (mon->use_io_thr) {
+/*
+ * When use_io_thr is set, we use the global shared dedicated
+ * IO thread for this monitor to handle input/output.
+ */
+context = monitor_io_context_get();
+/* We should have inited globals before reaching here. */
+assert(context);
+} else {
+/* The default main loop, which is the main thread */
+context = NULL;
+}
+
+/*
+ * Add the monitor before running it (which is triggered by
+ * qemu_chr_fe_set_handlers), otherwise one monitor may find
+ * itself not on the mon_list when running.
+ */
+qemu_mutex_lock(&monitor_lock);
+QTAILQ_INSERT_HEAD(&mon_list, mon, entry);
+qemu_mutex_unlock(&monitor_lock);
+
 if (monitor_is_qmp(mon)) {
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
- monitor_qmp_event, NULL, mon, NULL, true);
+ monitor_qmp_event, NULL, mon, context, true);
 qemu_chr_fe_set_echo(&mon->chr, true);
 json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon);
 } else {
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read,
  monitor_event, NULL, mon, NULL, true);
 }
-
-qemu_mutex_lock(&monitor_lock);
-QLIST_INSERT_HEAD(&mon_list, mon, entry);
-qemu_mutex_unlock(&monitor_lock);
 }
 
 void monitor_cleanup(void)
-- 
2.13.6




[Qemu-devel] [RFC v4 04/27] monitor: move skip_flush into monitor_data_init

2017-11-16 Thread Peter Xu
It's part of the data init.  Collect it.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index e36fb5308d..3940737c1c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -568,13 +568,14 @@ static void monitor_qapi_event_init(void)
 
 static void handle_hmp_command(Monitor *mon, const char *cmdline);
 
-static void monitor_data_init(Monitor *mon)
+static void monitor_data_init(Monitor *mon, bool skip_flush)
 {
 memset(mon, 0, sizeof(Monitor));
 qemu_mutex_init(&mon->out_lock);
 mon->outbuf = qstring_new();
 /* Use *mon_cmds by default. */
 mon->cmd_table = mon_cmds;
+mon->skip_flush = skip_flush;
 }
 
 static void monitor_data_destroy(Monitor *mon)
@@ -595,8 +596,7 @@ char *qmp_human_monitor_command(const char *command_line, 
bool has_cpu_index,
 char *output = NULL;
 Monitor *old_mon, hmp;
 
-monitor_data_init(&hmp);
-hmp.skip_flush = true;
+monitor_data_init(&hmp, true);
 
 old_mon = cur_mon;
 cur_mon = &hmp;
@@ -4089,7 +4089,7 @@ void monitor_init(Chardev *chr, int flags)
 }
 
 mon = g_malloc(sizeof(*mon));
-monitor_data_init(mon);
+monitor_data_init(mon, false);
 
 qemu_chr_fe_init(&mon->chr, chr, &error_abort);
 mon->flags = flags;
-- 
2.13.6




[Qemu-devel] [RFC v4 06/27] monitor: move the cur_mon hack deeper for QMP

2017-11-16 Thread Peter Xu
In monitor_qmp_read(), we have the hack to temporarily replace the
cur_mon pointer.  Now we move this hack deeper inside the QMP dispatcher
routine since the Monitor pointer can be passed in to that using the new
JSON Parser opaque field now.

This does not make much sense as a single patch.  However, this will be
a big step for the next patch, when the QMP dispatcher routine will be
split from the QMP parser.

Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/monitor.c b/monitor.c
index ab80d32c70..322dfb5f31 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3813,7 +3813,7 @@ static void handle_qmp_command(JSONMessageParser *parser, 
GQueue *tokens,
 {
 QObject *req, *rsp = NULL, *id = NULL;
 QDict *qdict = NULL;
-Monitor *mon = cur_mon;
+Monitor *mon = opaque, *old_mon;
 Error *err = NULL;
 
 req = json_parser_parse_err(tokens, NULL, &err);
@@ -3838,8 +3838,13 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 QDECREF(req_json);
 }
 
+old_mon = cur_mon;
+cur_mon = mon;
+
 rsp = qmp_dispatch(cur_mon->qmp.commands, req);
 
+cur_mon = old_mon;
+
 if (mon->qmp.commands == &qmp_cap_negotiation_commands) {
 qdict = qdict_get_qdict(qobject_to_qdict(rsp), "error");
 if (qdict
@@ -3876,13 +3881,9 @@ err_out:
 
 static void monitor_qmp_read(void *opaque, const uint8_t *buf, int size)
 {
-Monitor *old_mon = cur_mon;
-
-cur_mon = opaque;
-
-json_message_parser_feed(&cur_mon->qmp.parser, (const char *) buf, size);
+Monitor *mon = opaque;
 
-cur_mon = old_mon;
+json_message_parser_feed(&mon->qmp.parser, (const char *) buf, size);
 }
 
 static void monitor_read(void *opaque, const uint8_t *buf, int size)
@@ -3956,7 +3957,7 @@ static void monitor_qmp_event(void *opaque, int event)
 break;
 case CHR_EVENT_CLOSED:
 json_message_parser_destroy(&mon->qmp.parser);
-json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL);
+json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon);
 mon_refcount--;
 monitor_fdsets_cleanup();
 break;
@@ -4106,7 +4107,7 @@ void monitor_init(Chardev *chr, int flags)
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read,
  monitor_qmp_event, NULL, mon, NULL, true);
 qemu_chr_fe_set_echo(&mon->chr, true);
-json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL);
+json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon);
 } else {
 qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read,
  monitor_event, NULL, mon, NULL, true);
-- 
2.13.6




[Qemu-devel] [RFC v4 13/27] qmp: introduce some capability helpers

2017-11-16 Thread Peter Xu
Introduce qmp_cap_enabled() and qmp_oob_enabled() helpers.

Signed-off-by: Peter Xu 
---
 monitor.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/monitor.c b/monitor.c
index da03d1370f..655c25041c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1038,6 +1038,16 @@ static void monitor_init_qmp_commands(void)
  qmp_marshal_qmp_capabilities, QCO_NO_OPTIONS);
 }
 
+static bool qmp_cap_enabled(Monitor *mon, QMPCapability cap)
+{
+return mon->qmp.qmp_caps[cap];
+}
+
+static bool qmp_oob_enabled(Monitor *mon)
+{
+return qmp_cap_enabled(mon, QMP_CAPABILITY_OOB);
+}
+
 static void qmp_caps_check(Monitor *mon, QMPCapabilityList *list,
Error **errp)
 {
-- 
2.13.6




[Qemu-devel] [RFC v4 18/27] monitor: send event when request queue full

2017-11-16 Thread Peter Xu
Set maximum QMP request queue length to 8.  If queue full, instead of
queue the command, we directly return a "request-dropped" event, telling
client that specific command is dropped.

Note that this flow control mechanism is only valid if OOB is enabled.
If it's not, the effective queue length will always be 1, which strictly
follows original behavior of QMP command handling (which never drop
messages).

Signed-off-by: Peter Xu 
---
 monitor.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/monitor.c b/monitor.c
index 0e2ff114cc..92e846e195 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4029,6 +4029,8 @@ static void monitor_qmp_bh_dispatcher(void *data)
 }
 }
 
+#define  QMP_REQ_QUEUE_LEN_MAX  (8)
+
 static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens,
void *opaque)
 {
@@ -4061,6 +4063,19 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 req_obj->id = id;
 req_obj->req = req;
 
+if (qmp_oob_enabled(mon)) {
+/* Drop the request if queue is full. */
+if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
+qapi_event_send_request_dropped(id,
+REQUEST_DROP_REASON_QUEUE_FULL,
+NULL);
+qobject_decref(id);
+qobject_decref(req);
+g_free(req_obj);
+return;
+}
+}
+
 /*
  * Put the request to the end of queue so that requests will be
  * handled in time order.  Ownership for req_obj, req, id,
-- 
2.13.6




[Qemu-devel] [RFC v4 12/27] qmp: negociate QMP capabilities

2017-11-16 Thread Peter Xu
After this patch, we will allow QMP clients to enable QMP capabilities
when sending the first "qmp_capabilities" command.  Originally we are
starting QMP session with no arguments like:

  { "execute": "qmp_capabilities" }

Now we can enable some QMP capabilities using (take OOB as example,
which is the only one capability that we support):

  { "execute": "qmp_capabilities",
"argument": { "enable": [ "oob" ] } }

When the "argument" key is not provided, no capability is enabled.

For capability "oob", the monitor needs to be run on dedicated IO
thread, otherwise the command will fail.  For example, trying to enable
OOB on a MUXed typed QMP monitor will fail.

One thing to mention is that, QMP capabilities are per-monitor, and also
when the connection is closed due to some reason, the capabilities will
be reset.

Signed-off-by: Peter Xu 
---
 monitor.c| 57 +++-
 qapi-schema.json | 15 ---
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/monitor.c b/monitor.c
index 468303b472..da03d1370f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -167,6 +167,7 @@ typedef struct {
  * mode.
  */
 QmpCommandList *commands;
+bool qmp_caps[QMP_CAPABILITY__MAX];
 } MonitorQMP;
 
 /*
@@ -1037,8 +1038,42 @@ static void monitor_init_qmp_commands(void)
  qmp_marshal_qmp_capabilities, QCO_NO_OPTIONS);
 }
 
-void qmp_qmp_capabilities(Error **errp)
+static void qmp_caps_check(Monitor *mon, QMPCapabilityList *list,
+   Error **errp)
+{
+for (; list; list = list->next) {
+assert(list->value < QMP_CAPABILITY__MAX);
+switch (list->value) {
+case QMP_CAPABILITY_OOB:
+if (!mon->use_io_thr) {
+/*
+ * Out-Of-Band only works with monitors that are
+ * running on dedicated IOThread.
+ */
+error_setg(errp, "This monitor does not support "
+   "Out-Of-Band (OOB)");
+return;
+}
+break;
+default:
+break;
+}
+}
+}
+
+/* This function should only be called after capabilities are checked. */
+static void qmp_caps_apply(Monitor *mon, QMPCapabilityList *list)
+{
+for (; list; list = list->next) {
+mon->qmp.qmp_caps[list->value] = true;
+}
+}
+
+void qmp_qmp_capabilities(bool has_enable, QMPCapabilityList *enable,
+  Error **errp)
 {
+Error *local_err = NULL;
+
 if (cur_mon->qmp.commands == &qmp_commands) {
 error_set(errp, ERROR_CLASS_COMMAND_NOT_FOUND,
   "Capabilities negotiation is already complete, command "
@@ -1046,6 +1081,20 @@ void qmp_qmp_capabilities(Error **errp)
 return;
 }
 
+/* Enable QMP capabilities provided by the guest if applicable. */
+if (has_enable) {
+qmp_caps_check(cur_mon, enable, &local_err);
+if (local_err) {
+/*
+ * Failed check on either of the capabilities will fail
+ * the apply of all.
+ */
+error_propagate(errp, local_err);
+return;
+}
+qmp_caps_apply(cur_mon, enable);
+}
+
 cur_mon->qmp.commands = &qmp_commands;
 }
 
@@ -3963,6 +4012,11 @@ static QObject *get_qmp_greeting(void)
 return QOBJECT(result);
 }
 
+static void monitor_qmp_caps_reset(Monitor *mon)
+{
+memset(mon->qmp.qmp_caps, 0, sizeof(mon->qmp.qmp_caps));
+}
+
 static void monitor_qmp_event(void *opaque, int event)
 {
 QObject *data;
@@ -3971,6 +4025,7 @@ static void monitor_qmp_event(void *opaque, int event)
 switch (event) {
 case CHR_EVENT_OPENED:
 mon->qmp.commands = &qmp_cap_negotiation_commands;
+monitor_qmp_caps_reset(mon);
 data = get_qmp_greeting();
 monitor_json_emitter(mon, data);
 qobject_decref(data);
diff --git a/qapi-schema.json b/qapi-schema.json
index 03201578b4..531fd4c0db 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -102,21 +102,30 @@
 #
 # Enable QMP capabilities.
 #
-# Arguments: None.
+# Arguments:
+#
+# @enable:List of QMPCapabilities to enable, which is optional.
+# If not provided, it means no QMP capabilities will be
+# enabled.  (since 2.12)
 #
 # Example:
 #
-# -> { "execute": "qmp_capabilities" }
+# -> { "execute": "qmp_capabilities",
+#  "arguments": { "enable": [ "oob" ] } }
 # <- { "return": {} }
 #
 # Notes: This command is valid exactly when first connecting: it must be
 # issued before any other command will be accepted, and will fail once the
 # monitor is accepting other commands. (see qemu docs/interop/qmp-spec.txt)
 #
+# QMP client needs to explicitly enable QMP capabilities, otherwise
+# all the QMP capabilities will be turned off by default.
+#
 # Since: 0.13
 #
 ##
-{ 'command': 'qmp_capabilities' }
+{ 'command': 'qmp_capabilities',
+  'data': { '*enable'

[Qemu-devel] [RFC v4 08/27] monitor: let mon_list be tail queue

2017-11-16 Thread Peter Xu
It was QLIST.  I want to use this list to do monitor priority job later,
which need tail insertion ability.  So switching to a tail queue.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/monitor.c b/monitor.c
index ac5313023b..a70ab5606b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -205,7 +205,7 @@ struct Monitor {
 void *password_opaque;
 mon_cmd_t *cmd_table;
 QLIST_HEAD(,mon_fd_t) fds;
-QLIST_ENTRY(Monitor) entry;
+QTAILQ_ENTRY(Monitor) entry;
 };
 
 /* QMP checker flags */
@@ -214,7 +214,7 @@ struct Monitor {
 /* Protects mon_list, monitor_event_state.  */
 static QemuMutex monitor_lock;
 
-static QLIST_HEAD(mon_list, Monitor) mon_list;
+static QTAILQ_HEAD(mon_list, Monitor) mon_list;
 static QLIST_HEAD(mon_fdsets, MonFdset) mon_fdsets;
 static int mon_refcount;
 
@@ -415,7 +415,7 @@ static void monitor_qapi_event_emit(QAPIEvent event, QDict 
*qdict)
 Monitor *mon;
 
 trace_monitor_protocol_event_emit(event, qdict);
-QLIST_FOREACH(mon, &mon_list, entry) {
+QTAILQ_FOREACH(mon, &mon_list, entry) {
 if (monitor_is_qmp(mon)
 && mon->qmp.commands != &qmp_cap_negotiation_commands) {
 monitor_json_emitter(mon, QOBJECT(qdict));
@@ -4118,8 +4118,8 @@ void monitor_cleanup(void)
 Monitor *mon, *next;
 
 qemu_mutex_lock(&monitor_lock);
-QLIST_FOREACH_SAFE(mon, &mon_list, entry, next) {
-QLIST_REMOVE(mon, entry);
+QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) {
+QTAILQ_REMOVE(&mon_list, mon, entry);
 monitor_data_destroy(mon);
 g_free(mon);
 }
-- 
2.13.6




[Qemu-devel] [RFC v4 14/27] monitor: introduce monitor_qmp_respond()

2017-11-16 Thread Peter Xu
A tiny refactoring, preparing to split the QMP dispatcher away.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 48 +++-
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/monitor.c b/monitor.c
index 655c25041c..ec03f1b232 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3877,6 +3877,36 @@ static int monitor_can_read(void *opaque)
 return (mon->suspend_cnt == 0) ? 1 : 0;
 }
 
+/*
+ * When rsp/err/id is passed in, the function will be responsible for
+ * the cleanup.
+ */
+static void monitor_qmp_respond(Monitor *mon, QObject *rsp,
+Error *err, QObject *id)
+{
+QDict *qdict = NULL;
+
+if (err) {
+qdict = qdict_new();
+qdict_put_obj(qdict, "error", qmp_build_error_object(err));
+error_free(err);
+rsp = QOBJECT(qdict);
+}
+
+if (rsp) {
+if (id) {
+/* This is for the qdict below. */
+qobject_incref(id);
+qdict_put_obj(qobject_to_qdict(rsp), "id", id);
+}
+
+monitor_json_emitter(mon, rsp);
+}
+
+qobject_decref(id);
+qobject_decref(rsp);
+}
+
 static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens,
void *opaque)
 {
@@ -3927,24 +3957,8 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 }
 
 err_out:
-if (err) {
-qdict = qdict_new();
-qdict_put_obj(qdict, "error", qmp_build_error_object(err));
-error_free(err);
-rsp = QOBJECT(qdict);
-}
+monitor_qmp_respond(mon, rsp, err, id);
 
-if (rsp) {
-if (id) {
-qdict_put_obj(qobject_to_qdict(rsp), "id", id);
-id = NULL;
-}
-
-monitor_json_emitter(mon, rsp);
-}
-
-qobject_decref(id);
-qobject_decref(rsp);
 qobject_decref(req);
 }
 
-- 
2.13.6




[Qemu-devel] [RFC v4 21/27] qmp: let migrate-incoming allow out-of-band

2017-11-16 Thread Peter Xu
So it can get rid of being run on main thread.

Signed-off-by: Peter Xu 
---
 qapi/migration.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index bbc4671ded..95098072dd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1063,7 +1063,8 @@
 # <- { "return": {} }
 #
 ##
-{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
+{ 'command': 'migrate-incoming', 'data': {'uri': 'str' },
+  'allow-oob': true }
 
 ##
 # @xen-save-devices-state:
-- 
2.13.6




[Qemu-devel] [RFC v4 15/27] monitor: let monitor_{suspend|resume} thread safe

2017-11-16 Thread Peter Xu
Monitor code now can be run in more than one thread.  Let the suspend
and resume code be thread safe.

Reviewed-by: Fam Zheng 
Signed-off-by: Peter Xu 
---
 monitor.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/monitor.c b/monitor.c
index ec03f1b232..30f9cd80de 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4003,7 +4003,7 @@ int monitor_suspend(Monitor *mon)
 {
 if (!mon->rs)
 return -ENOTTY;
-mon->suspend_cnt++;
+atomic_inc(&mon->suspend_cnt);
 return 0;
 }
 
@@ -4011,8 +4011,9 @@ void monitor_resume(Monitor *mon)
 {
 if (!mon->rs)
 return;
-if (--mon->suspend_cnt == 0)
+if (atomic_dec_fetch(&mon->suspend_cnt) == 0) {
 readline_show_prompt(mon->rs);
+}
 }
 
 static QObject *get_qmp_greeting(void)
-- 
2.13.6




[Qemu-devel] [RFC v4 19/27] qapi: introduce new cmd option "allow-oob"

2017-11-16 Thread Peter Xu
Here "oob" stands for "Out-Of-Band".  When "allow-oob" is set, it means
the command allows out-of-band execution.

The "oob" idea is proposed by Markus Armbruster in following thread:

  https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg02057.html

This new "allow-oob" boolean will be exposed by "query-qmp-schema" as
well for command entries, so that QMP clients can know which command can
be used as out-of-band calls. For example the command "migrate"
originally looks like:

  {"name": "migrate", "ret-type": "17", "meta-type": "command",
   "arg-type": "86"}

And it'll be changed into:

  {"name": "migrate", "ret-type": "17", "allow-oob": false,
   "meta-type": "command", "arg-type": "86"}

This patch only provides the QMP interface level changes.  It does not
contains the real out-of-band execution implementation yet.

Suggested-by: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 include/qapi/qmp/dispatch.h|  1 +
 qapi/introspect.json   |  6 +-
 scripts/qapi-commands.py   | 19 ++-
 scripts/qapi-introspect.py | 10 --
 scripts/qapi.py| 15 ++-
 scripts/qapi2texi.py   |  2 +-
 tests/qapi-schema/test-qapi.py |  2 +-
 7 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 20578dcd48..b76798800c 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -23,6 +23,7 @@ typedef enum QmpCommandOptions
 {
 QCO_NO_OPTIONS = 0x0,
 QCO_NO_SUCCESS_RESP = 0x1,
+QCO_ALLOW_OOB = 0x2,
 } QmpCommandOptions;
 
 typedef struct QmpCommand
diff --git a/qapi/introspect.json b/qapi/introspect.json
index 5b3e6e9d78..57cc137627 100644
--- a/qapi/introspect.json
+++ b/qapi/introspect.json
@@ -259,12 +259,16 @@
 #
 # @ret-type: the name of the command's result type.
 #
+# @allow-oob: whether the command allows out-of-band execution.
+# (Since: 2.11)
+#
 # TODO: @success-response (currently irrelevant, because it's QGA, not QMP)
 #
 # Since: 2.5
 ##
 { 'struct': 'SchemaInfoCommand',
-  'data': { 'arg-type': 'str', 'ret-type': 'str' } }
+  'data': { 'arg-type': 'str', 'ret-type': 'str',
+'allow-oob': 'bool' } }
 
 ##
 # @SchemaInfoEvent:
diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
index 974d0a4a80..b2b0bc0510 100644
--- a/scripts/qapi-commands.py
+++ b/scripts/qapi-commands.py
@@ -192,10 +192,18 @@ out:
 return ret
 
 
-def gen_register_command(name, success_response):
-options = 'QCO_NO_OPTIONS'
+def gen_register_command(name, success_response, allow_oob):
+options = []
+
 if not success_response:
-options = 'QCO_NO_SUCCESS_RESP'
+options += ['QCO_NO_SUCCESS_RESP']
+if allow_oob:
+options += ['QCO_ALLOW_OOB']
+
+if not options:
+options = ['QCO_NO_OPTIONS']
+
+options = " | ".join(options)
 
 ret = mcgen('''
 qmp_register_command(cmds, "%(name)s",
@@ -241,7 +249,7 @@ class QAPISchemaGenCommandVisitor(QAPISchemaVisitor):
 self._visited_ret_types = None
 
 def visit_command(self, name, info, arg_type, ret_type,
-  gen, success_response, boxed):
+  gen, success_response, boxed, allow_oob):
 if not gen:
 return
 self.decl += gen_command_decl(name, arg_type, boxed, ret_type)
@@ -250,7 +258,8 @@ class QAPISchemaGenCommandVisitor(QAPISchemaVisitor):
 self.defn += gen_marshal_output(ret_type)
 self.decl += gen_marshal_decl(name)
 self.defn += gen_marshal(name, arg_type, boxed, ret_type)
-self._regy += gen_register_command(name, success_response)
+self._regy += gen_register_command(name, success_response,
+   allow_oob)
 
 
 (input_file, output_dir, do_c, do_h, prefix, opts) = parse_command_line()
diff --git a/scripts/qapi-introspect.py b/scripts/qapi-introspect.py
index 032bcea491..9fbf88b644 100644
--- a/scripts/qapi-introspect.py
+++ b/scripts/qapi-introspect.py
@@ -28,6 +28,11 @@ def to_json(obj, level=0):
   to_json(obj[key], level + 1))
 for key in sorted(obj.keys())]
 ret = '{' + ', '.join(elts) + '}'
+elif isinstance(obj, bool):
+if obj:
+ret = 'true'
+else:
+ret = 'false'
 else:
 assert False# not implemented
 if level == 1:
@@ -154,12 +159,13 @@ const char %(c_name)s[] = %(c_string)s;
 for m in variants.variants]})
 
 def visit_command(self, name, info, arg_type, ret_type,
-  gen, success_response, boxed):
+  gen, success_response, boxed, allow_oob):
 arg_type = arg_type or self._schema.the_empty_object_type
 ret_type = ret_type or self._schema.the_empty_object_type
 self._gen_json(name, 'command',
{'arg-type': self._u

[Qemu-devel] [RFC v4 11/27] qmp: introduce QMPCapability

2017-11-16 Thread Peter Xu
There was no QMP capabilities defined.  Define the first "oob" as
capability to allow out-of-band messages.

Also, touch up qmp-test.c to test the new bits.

Signed-off-by: Peter Xu 
---
 monitor.c| 15 +--
 qapi-schema.json | 13 +
 tests/qmp-test.c | 10 +-
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index 685938b9f2..468303b472 100644
--- a/monitor.c
+++ b/monitor.c
@@ -3944,12 +3944,23 @@ void monitor_resume(Monitor *mon)
 
 static QObject *get_qmp_greeting(void)
 {
+QDict *result = qdict_new(), *qmp = qdict_new();
+QList *cap_list = qlist_new();
 QObject *ver = NULL;
+QMPCapability cap;
+
+qdict_put(result, "QMP", qmp);
 
 qmp_marshal_query_version(NULL, &ver, NULL);
+qdict_put_obj(qmp, "version", ver);
+
+for (cap = 0; cap < QMP_CAPABILITY__MAX; cap++) {
+qlist_append(cap_list, qstring_from_str(
+ QMPCapability_str(cap)));
+}
+qdict_put(qmp, "capabilities", cap_list);
 
-return qobject_from_jsonf("{'QMP': {'version': %p, 'capabilities': []}}",
-  ver);
+return QOBJECT(result);
 }
 
 static void monitor_qmp_event(void *opaque, int event)
diff --git a/qapi-schema.json b/qapi-schema.json
index 18457954a8..03201578b4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -119,6 +119,19 @@
 { 'command': 'qmp_capabilities' }
 
 ##
+# @QMPCapability:
+#
+# QMP supported capabilities to be broadcasted to the clients.
+#
+# @oob:   QMP ability to support Out-Of-Band requests.
+#
+# Since: 2.12
+#
+##
+{ 'enum': 'QMPCapability',
+  'data': [ 'oob' ] }
+
+##
 # @VersionTriple:
 #
 # A three-part version number.
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index c5a5c10b41..292c5f135a 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -17,6 +17,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/util.h"
 #include "qapi/visitor.h"
+#include "qapi/qmp/qstring.h"
 
 const char common_args[] = "-nodefaults -machine none";
 
@@ -75,6 +76,8 @@ static void test_qmp_protocol(void)
 {
 QDict *resp, *q, *ret;
 QList *capabilities;
+const QListEntry *entry;
+QString *qstr;
 
 global_qtest = qtest_init_without_qmp_handshake(common_args);
 
@@ -84,7 +87,12 @@ static void test_qmp_protocol(void)
 g_assert(q);
 test_version(qdict_get(q, "version"));
 capabilities = qdict_get_qlist(q, "capabilities");
-g_assert(capabilities && qlist_empty(capabilities));
+g_assert(capabilities);
+entry = qlist_first(capabilities);
+g_assert(entry);
+qstr = qobject_to_qstring(entry->value);
+g_assert(qstr);
+g_assert_cmpstr(qstring_get_str(qstr), ==, "oob");
 QDECREF(resp);
 
 /* Test valid command before handshake */
-- 
2.13.6




[Qemu-devel] [RFC v4 17/27] qmp: add new event "request-dropped"

2017-11-16 Thread Peter Xu
This event will be emitted if one QMP request is dropped.  Along,
declare an enum for the reasons.

Signed-off-by: Peter Xu 
---
 qapi-schema.json | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 531fd4c0db..9d2625b6b3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3222,3 +3222,37 @@
 # Since: 2.11
 ##
 { 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} }
+
+##
+# @RequestDropReason:
+#
+# Reasons that caused one command to be dropped.
+#
+# @queue-full: the queue of request is full.
+#
+# Since: 2.12
+##
+{ 'enum': 'RequestDropReason',
+  'data': [ 'queue-full' ] }
+
+##
+# @REQUEST_DROPPED:
+#
+# Emitted when one QMP request is dropped due to some reason.
+# REQUEST_DROPPED is only emitted when the oob capability is enabled.
+#
+# @id: The dropped command's "id" field.
+#
+# @reason: The reason why the request is dropped.
+#
+# Since: 2.12
+#
+# Example:
+#
+# { "event": "REQUEST_DROPPED",
+#   "data": {"result": {"id": "libvirt-102",
+#   "reason": "queue-full" } } }
+#
+##
+{ 'event': 'REQUEST_DROPPED' ,
+  'data': { 'id': 'any', 'reason': 'RequestDropReason' } }
-- 
2.13.6




[Qemu-devel] [RFC v4 23/27] monitor: enable IO thread for (qmp & !mux) typed

2017-11-16 Thread Peter Xu
Start to use dedicate IO thread for QMP monitors that are not using
MUXed chardev.

Signed-off-by: Peter Xu 
---
 monitor.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index b18d9d696c..5e0b8ed9c5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -36,6 +36,7 @@
 #include "net/net.h"
 #include "net/slirp.h"
 #include "chardev/char-fe.h"
+#include "chardev/char-mux.h"
 #include "ui/qemu-spice.h"
 #include "sysemu/numa.h"
 #include "monitor/monitor.h"
@@ -4456,7 +4457,7 @@ void monitor_init(Chardev *chr, int flags)
 Monitor *mon = g_malloc(sizeof(*mon));
 GMainContext *context;
 
-monitor_data_init(mon, false, false);
+monitor_data_init(mon, false, !CHARDEV_IS_MUX(chr));
 
 qemu_chr_fe_init(&mon->chr, chr, &error_abort);
 mon->flags = flags;
-- 
2.13.6




Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA

2017-11-16 Thread Tomáš Golembiovský
On Thu, 16 Nov 2017 13:56:16 +0100
Tomáš Golembiovský  wrote:

> On Wed, 15 Nov 2017 21:41:20 +0100
> Gandalf Corvotempesta  wrote:
> 
> > 2017-11-15 21:29 GMT+01:00 Richard W.M. Jones :  
> > > Gandalf, is there an XVA file publically available (pref. not
> > > too big) that we can look at?  
> > 
> > I can try to provide one, but it's simple:
> > 
> > # tar tvf 20160630_124823_aa72_.xva.gz | head -n 50
> > -- 0/0   42353 1970-01-01 01:00 ova.xml
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/
> > -- 0/0  40 1970-01-01 01:00 Ref:175/.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0001
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0001.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0003
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0003.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0004
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0004.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0005
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0005.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0006
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0006.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0007
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0007.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0009
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0009.checksum
> > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0010
> > -- 0/0  40 1970-01-01 01:00 Ref:175/0010.checksum
> > 
> > 
> > You can ignore the ova.xml and just use the "Ref:175" directory.
> > Inside the XVA you'll fine one "Ref" directory for each virtual disk
> > (ref number is different for each disk)
> > Inside each Ref directory, you'll get tons of 1MB file, corrisponding
> > to the RAW image.
> > You have to merge these files in a single raw file with just an
> > exception: numbers are not sequential.
> > as you can see above, we have: , 0001, 0003
> > 
> > The 0002 is missing, because it's totally blank. XenServer doesn't
> > export any empty block, thus it will skip the corrisponding 1MB file.
> > When building the raw image, you have to replace empty blocks with 1MB
> > full of zeros.
> > 
> > This is (in addition to the tar extract) the most time-consuming part.
> > Currently I'm rebuilding a 250GB image, with tons of files to be
> > merge.
> > If qemu-img can be patched to automatically convert this kind of
> > format, I can save about 3 hours (30 minutes for extracting the
> > tarball, and about 2 hours to merge 250-300GB image)
> >   
> 
> Hi,
> 
> I like what Richard and Max came up with. Pretty nifty solutions.
> While there's nothing wrong with them I decided to take my own shot at
> it. Since the blocks in tar file are pieces of raw image there is no
> conversion happening. All in all it's just moving bytes from one place
> to another. That means there shouldn't be a need for any heavy
> machinery, right? :)
> 
> Attached is a shell script that uses dd to do the byte-shuffling.
> I'm pretty sure this could even be safely parallelized by running
> multiple instances of dd at the same time (e.g. with xargs). But I did
> not try that.

Oh yes, and one more thing: as with the other solutions I didn't bother
reading the XML for the target size. So resize may be necessary
afterwards.

Tomas

-- 
Tomáš Golembiovský 



[Qemu-devel] [RFC v4 16/27] monitor: separate QMP parser and dispatcher

2017-11-16 Thread Peter Xu
Originally QMP goes throw these steps:

  JSON Parser --> QMP Dispatcher --> Respond
  /|\(2)(3) |
   (1) |   \|/ (4)
   +-  main thread  +

This patch does this:

  JSON Parser QMP Dispatcher --> Respond
  /|\ |   /|\   (4) |
   |  | (2)| (3)|  (5)
   (1) |  +->  |   \|/
   +-  main thread  <---+

So the parsing job and the dispatching job is isolated now.  It gives us
a chance in following up patches to totally move the parser outside.

The isloation is done using one QEMUBH. Only one dispatcher QEMUBH is
used for all the monitors.

Signed-off-by: Peter Xu 
---
 monitor.c | 179 ++
 1 file changed, 157 insertions(+), 22 deletions(-)

diff --git a/monitor.c b/monitor.c
index 30f9cd80de..0e2ff114cc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -168,6 +168,10 @@ typedef struct {
  */
 QmpCommandList *commands;
 bool qmp_caps[QMP_CAPABILITY__MAX];
+/* Protects qmp request/response queue. */
+QemuMutex qmp_queue_lock;
+/* Input queue that holds all the parsed QMP requests */
+GQueue *qmp_requests;
 } MonitorQMP;
 
 /*
@@ -213,6 +217,8 @@ struct Monitor {
 
 struct MonitorGlobal {
 IOThread *mon_iothread;
+/* Bottom half to dispatch the requests received from IO thread */
+QEMUBH *qmp_dispatcher_bh;
 };
 
 static struct MonitorGlobal mon_global;
@@ -582,11 +588,13 @@ static void monitor_data_init(Monitor *mon, bool 
skip_flush,
 {
 memset(mon, 0, sizeof(Monitor));
 qemu_mutex_init(&mon->out_lock);
+qemu_mutex_init(&mon->qmp.qmp_queue_lock);
 mon->outbuf = qstring_new();
 /* Use *mon_cmds by default. */
 mon->cmd_table = mon_cmds;
 mon->skip_flush = skip_flush;
 mon->use_io_thr = use_io_thr;
+mon->qmp.qmp_requests = g_queue_new();
 }
 
 static void monitor_data_destroy(Monitor *mon)
@@ -599,6 +607,8 @@ static void monitor_data_destroy(Monitor *mon)
 g_free(mon->rs);
 QDECREF(mon->outbuf);
 qemu_mutex_destroy(&mon->out_lock);
+qemu_mutex_destroy(&mon->qmp.qmp_queue_lock);
+g_queue_free(mon->qmp.qmp_requests);
 }
 
 char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
@@ -3907,29 +3917,31 @@ static void monitor_qmp_respond(Monitor *mon, QObject 
*rsp,
 qobject_decref(rsp);
 }
 
-static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens,
-   void *opaque)
+struct QMPRequest {
+/* Owner of the request */
+Monitor *mon;
+/* "id" field of the request */
+QObject *id;
+/* Request object to be handled */
+QObject *req;
+};
+typedef struct QMPRequest QMPRequest;
+
+/*
+ * Dispatch one single QMP request. The function will free the req_obj
+ * and objects inside it before return.
+ */
+static void monitor_qmp_dispatch_one(QMPRequest *req_obj)
 {
-QObject *req, *rsp = NULL, *id = NULL;
+Monitor *mon, *old_mon;
+QObject *req, *rsp = NULL, *id;
 QDict *qdict = NULL;
-Monitor *mon = opaque, *old_mon;
-Error *err = NULL;
 
-req = json_parser_parse_err(tokens, NULL, &err);
-if (!req && !err) {
-/* json_parser_parse_err() sucks: can fail without setting @err */
-error_setg(&err, QERR_JSON_PARSING);
-}
-if (err) {
-goto err_out;
-}
+req = req_obj->req;
+mon = req_obj->mon;
+id = req_obj->id;
 
-qdict = qobject_to_qdict(req);
-if (qdict) {
-id = qdict_get(qdict, "id");
-qobject_incref(id);
-qdict_del(qdict, "id");
-} /* else will fail qmp_dispatch() */
+g_free(req_obj);
 
 if (trace_event_get_state_backends(TRACE_HANDLE_QMP_COMMAND)) {
 QString *req_json = qobject_to_json(req);
@@ -3940,7 +3952,7 @@ static void handle_qmp_command(JSONMessageParser *parser, 
GQueue *tokens,
 old_mon = cur_mon;
 cur_mon = mon;
 
-rsp = qmp_dispatch(cur_mon->qmp.commands, req);
+rsp = qmp_dispatch(mon->qmp.commands, req);
 
 cur_mon = old_mon;
 
@@ -3956,12 +3968,122 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 }
 }
 
-err_out:
-monitor_qmp_respond(mon, rsp, err, id);
+/* Respond if necessary */
+monitor_qmp_respond(mon, rsp, NULL, id);
+
+/* This pairs with the monitor_suspend() in handle_qmp_command(). */
+if (!qmp_oob_enabled(mon)) {
+monitor_resume(mon);
+}
 
 qobject_decref(req);
 }
 
+/*
+ * Pop one QMP request from monitor queues, return NULL if not found.
+ * We are using round-robin fasion to pop the request, to avoid
+ * processing command only on a very busy monitor.  To achieve that,
+ * when we processed one request on specific monitor, we put that
+ * monitor to the end of mon_list queue.
+ */
+static QMPRequest *monitor_qmp_requests_pop_one(void)
+{
+QMPRequest *req_obj = NULL;
+Monitor *mon;
+
+  

[Qemu-devel] [RFC v4 25/27] docs: update QMP documents for OOB commands

2017-11-16 Thread Peter Xu
Update both the developer and spec for the new QMP OOB (Out-Of-Band)
command.

Signed-off-by: Peter Xu 
---
 docs/devel/qapi-code-gen.txt | 51 +++-
 docs/interop/qmp-spec.txt| 35 +-
 2 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt
index f04c63fe82..8597fdb087 100644
--- a/docs/devel/qapi-code-gen.txt
+++ b/docs/devel/qapi-code-gen.txt
@@ -556,7 +556,8 @@ following example objects:
 
 Usage: { 'command': STRING, '*data': COMPLEX-TYPE-NAME-OR-DICT,
  '*returns': TYPE-NAME, '*boxed': true,
- '*gen': false, '*success-response': false }
+ '*gen': false, '*success-response': false,
+ '*allow-oob': false }
 
 Commands are defined by using a dictionary containing several members,
 where three members are most common.  The 'command' member is a
@@ -636,6 +637,44 @@ possible, the command expression should include the 
optional key
 'success-response' with boolean value false.  So far, only QGA makes
 use of this member.
 
+Most of the QMP commands are handled sequentially in such a order:
+Firstly, the JSON Parser parses the command request into some internal
+message, delivers the message to QMP dispatchers. Secondly, the QMP
+dispatchers will handle the commands one by one in time order, respond
+when necessary.  For some commands that always complete "quickly" can
+instead be executed directly during parsing, at the QMP client's
+request.  This kind of commands that allow direct execution is called
+"out-of-band" ("oob" as shortcut) commands. The response can overtake
+prior in-band commands' responses.  By default, commands are always
+in-band.  We need to explicitly specify "allow-oob" to "True" to show
+that one command can be run out-of-band.
+
+One thing to mention for developers is that, although out-of-band
+execution of commands benefit from quick and asynchronous execution,
+it need to satisfy at least the following:
+
+(1) It is extremely quick and never blocks, so that its execution will
+not block parsing routine of any other monitors.
+
+(2) It does not need BQL, since the parser can be run without BQL,
+while the dispatcher is always with BQL held.
+
+If not, the command is not suitable to be allowed to run out-of-band,
+and it should set its "allow-oob" to "False".  Whether a command is
+allowed to run out-of-band can also be introspected using
+query-qmp-schema command.  Please see the section "Client JSON
+Protocol introspection" for more information.
+
+To execute a command in out-of-band way, we need to specify the
+"control" field in the request, with "run-oob" set to true. Example:
+
+ => { "execute": "command-support-oob",
+  "arguments": { ... },
+  "control": { "run-oob": true } }
+ <= { "return": { } }
+
+Without it, even the commands that supports out-of-band execution will
+still be run in-band.
 
 === Events ===
 
@@ -739,10 +778,12 @@ references by name.
 QAPI schema definitions not reachable that way are omitted.
 
 The SchemaInfo for a command has meta-type "command", and variant
-members "arg-type" and "ret-type".  On the wire, the "arguments"
-member of a client's "execute" command must conform to the object type
-named by "arg-type".  The "return" member that the server passes in a
-success response conforms to the type named by "ret-type".
+members "arg-type", "ret-type" and "allow-oob".  On the wire, the
+"arguments" member of a client's "execute" command must conform to the
+object type named by "arg-type".  The "return" member that the server
+passes in a success response conforms to the type named by
+"ret-type".  When "allow-oob" is set, it means the command supports
+out-of-band execution.
 
 If the command takes no arguments, "arg-type" names an object type
 without members.  Likewise, if the command returns nothing, "ret-type"
diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt
index f8b5356015..c5c02589a9 100644
--- a/docs/interop/qmp-spec.txt
+++ b/docs/interop/qmp-spec.txt
@@ -78,21 +78,32 @@ The greeting message format is:
 - The "capabilities" member specify the availability of features beyond the
   baseline specification; the order of elements in this array has no
   particular significance, so a client must search the entire array
-  when looking for a particular capability
+  when looking for a particular capability.
 
 2.2.1 Capabilities
 --
 
-As of the date this document was last revised, no server or client
-capability strings have been defined.
+Currently supported capabilities are:
 
+- "oob": it means the QMP server supports "Out-Of-Band" command
+  execution.  For more detail, please see "run-oob" parameter in
+  "Issuing Commands" section below.  Not all commands allow this "oob"
+  execution.  One can know whether one command supports "oob" by
+  "query-qmp-schema" command.
+
+QMP clients can get a list of supported QMP capabili

Re: [Qemu-devel] [PATCH for-2.11] throttle-groups: forget timer and schedule next TGM on detach

2017-11-16 Thread Alberto Garcia
On Thu 16 Nov 2017 12:21:50 PM CET, Stefan Hajnoczi wrote:
> tg->any_timer_armed[] must be cleared when detaching pending timers from
> the AioContext.  Failure to do so leads to hung I/O because it looks
> like there are still timers pending when in fact they have been removed.
>
> Other ThrottleGroupMembers might have requests pending too so it's
> necessary to schedule the next TGM so it can set a timer.
>
> This patch fixes hung I/O when QEMU is launched with drives that are in
> the same throttling group:
>
>   (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 &
>   (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 &
>   (qemu) stop
>   (qemu) cont
>   ...I/O is stuck...
>
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-devel] [virtio-dev] Re: [PATCH v17 6/6] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2017-11-16 Thread Wei Wang

On 11/16/2017 04:32 AM, Michael S. Tsirkin wrote:

On Fri, Nov 03, 2017 at 04:13:06PM +0800, Wei Wang wrote:

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
support of reporting hints of guest free pages to the host via
virtio-balloon. The host requests the guest to report the free pages by
sending commands via the virtio-balloon configuration registers.

When the guest starts to report, the first element added to the free page
vq is a sequence id of the start reporting command. The id is given by
the host, and it indicates whether the following free pages correspond
to the command. For example, the host may stop the report and start again
with a new command id. The obsolete pages for the previous start command
can be detected by the id dismatching on the host. The id is added to the
vq using an output buffer, and the free pages are added to the vq using
input buffer.

Here are some explainations about the added configuration registers:
- host2guest_cmd: a register used by the host to send commands to the
guest.
- guest2host_cmd: written by the guest to ACK to the host about the
commands that have been received. The host will clear the corresponding
bits on the host2guest_cmd register. The guest also uses this register
to send commands to the host (e.g. when finish free page reporting).
- free_page_cmd_id: the sequence id of the free page report command
given by the host.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---

+
+static void report_free_page(struct work_struct *work)
+{
+   struct virtio_balloon *vb;
+
+   vb = container_of(work, struct virtio_balloon, report_free_page_work);
+   report_free_page_cmd_id(vb);
+   walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
+   /*
+* The last few free page blocks that were added may not reach the
+* batch size, but need a kick to notify the device to handle them.
+*/
+   virtqueue_kick(vb->free_page_vq);
+   report_free_page_end(vb);
+}
+

I think there's an issue here: if pages are poisoned and hypervisor
subsequently drops them, testing them after allocation will
trigger a false positive.

The specific configuration:

PAGE_POISONING on
PAGE_POISONING_NO_SANITY off
PAGE_POISONING_ZERO off


Solutions:
1. disable the feature in that configuration
suggested as an initial step


Thanks for the finding.
Similar to this option: I'm thinking could we make walk_free_mem_block() 
simply return if that option is on?

That is, at the beginning of the function:
if (!page_poisoning_enabled())
return;

I think in most usages, people would not choose to use the poisoning 
option due to the added overhead.



Probably we could make it a separate fix patch of this report following 
patch 5 to explain the above reasons in the commit.



2. pass poison value to host so it can validate page content
before it drops it
3. pass poison value to host so it can init allocated pages with that value

In fact one nice side effect would be that unmap
becomes safe even though free list is not locked anymore.


I haven't got this point yet,  how would it bring performance benefit?


It would be interesting to see whether this last has
any value performance-wise.



Best,
Wei



[Qemu-devel] [RFC v4 26/27] tests: qmp-test: verify command batching

2017-11-16 Thread Peter Xu
OOB introduced DROP event for flow control.  This should not affect old
QMP clients.  Add a command batching check to make sure of it.

Signed-off-by: Peter Xu 
---
 tests/qmp-test.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 292c5f135a..729ec59b0a 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -78,6 +78,7 @@ static void test_qmp_protocol(void)
 QList *capabilities;
 const QListEntry *entry;
 QString *qstr;
+int i;
 
 global_qtest = qtest_init_without_qmp_handshake(common_args);
 
@@ -135,6 +136,24 @@ static void test_qmp_protocol(void)
 g_assert_cmpint(qdict_get_int(resp, "id"), ==, 2);
 QDECREF(resp);
 
+/*
+ * Test command batching.  In current test OOB is not enabled, we
+ * should be able to run as many commands in batch as we like.
+ * Using 16 (>8, which is OOB queue length) to make sure OOB
+ * won't break existing clients.
+ */
+for (i = 0; i < 16; i++) {
+qmp_async("{ 'execute': 'query-version' }");
+}
+/* Verify the replies to make sure no command is dropped. */
+for (i = 0; i < 16; i++) {
+resp = qmp_receive();
+/* It should never be dropped.  Each of them should be a reply. */
+g_assert(qdict_haskey(resp, "return"));
+g_assert(!qdict_haskey(resp, "event"));
+QDECREF(resp);
+}
+
 qtest_end();
 }
 
-- 
2.13.6




[Qemu-devel] [RFC v4 20/27] qmp: support out-of-band (oob) execution

2017-11-16 Thread Peter Xu
Having "allow-oob" to true for a command does not mean that this command
will always be run in out-of-band mode.  The out-of-band quick path will
only be executed if we specify the extra "run-oob" flag when sending the
QMP request:

{ "execute":   "command-that-allows-oob",
  "arguments": { ... },
  "control":   { "run-oob": true } }

The "control" key is introduced to store this extra flag.  "control"
field is used to store arguments that are shared by all the commands,
rather than command specific arguments.  Let "run-oob" be the first.

Signed-off-by: Peter Xu 
---
 include/qapi/qmp/dispatch.h |  1 +
 monitor.c   | 24 
 qapi/qmp-dispatch.c | 39 +++
 trace-events|  2 ++
 4 files changed, 66 insertions(+)

diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index b76798800c..ee2b8ce576 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -49,6 +49,7 @@ bool qmp_command_is_enabled(const QmpCommand *cmd);
 const char *qmp_command_name(const QmpCommand *cmd);
 bool qmp_has_success_response(const QmpCommand *cmd);
 QObject *qmp_build_error_object(Error *err);
+bool qmp_is_oob(const QObject *request);
 
 typedef void (*qmp_cmd_callback_fn)(QmpCommand *cmd, void *opaque);
 
diff --git a/monitor.c b/monitor.c
index 92e846e195..ccd350dca2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4025,6 +4025,7 @@ static void monitor_qmp_bh_dispatcher(void *data)
 if (!req_obj) {
 break;
 }
+trace_monitor_qmp_cmd_in_band(qobject_get_try_str(req_obj->id));
 monitor_qmp_dispatch_one(req_obj);
 }
 }
@@ -4051,9 +4052,25 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 return;
 }
 
+if (!qmp_oob_enabled(mon) && qmp_is_oob(req)) {
+error_setg(&err, "Out-Of-Band is only allowed "
+   "when OOB is enabled.");
+monitor_qmp_respond(mon, NULL, err, NULL);
+qobject_decref(req);
+return;
+}
+
 qdict = qobject_to_qdict(req);
 if (qdict) {
 id = qdict_get(qdict, "id");
+/* When OOB is enabled, the "id" field is mandatory. */
+if (qmp_oob_enabled(mon) && !id) {
+error_setg(&err, "Out-Of-Band capability requires that "
+   "every command contains an 'id' field.");
+monitor_qmp_respond(mon, NULL, err, NULL);
+qobject_decref(req);
+return;
+}
 qobject_incref(id);
 qdict_del(qdict, "id");
 } /* else will fail qmp_dispatch() */
@@ -4064,6 +4081,13 @@ static void handle_qmp_command(JSONMessageParser 
*parser, GQueue *tokens,
 req_obj->req = req;
 
 if (qmp_oob_enabled(mon)) {
+if (qmp_is_oob(req)) {
+/* Out-Of-Band (OOB) requests are executed directly in parser. */
+
trace_monitor_qmp_cmd_out_of_band(qobject_get_try_str(req_obj->id));
+monitor_qmp_dispatch_one(req_obj);
+return;
+}
+
 /* Drop the request if queue is full. */
 if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
 qapi_event_send_request_dropped(id,
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index b41fa174fe..ed7e5d5a75 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -52,6 +52,12 @@ static QDict *qmp_dispatch_check_obj(const QObject *request, 
Error **errp)
"QMP input member 'arguments' must be an object");
 return NULL;
 }
+} else if (!strcmp(arg_name, "control")) {
+if (qobject_type(arg_obj) != QTYPE_QDICT) {
+error_setg(errp,
+   "QMP input member 'control' must be a dict");
+return NULL;
+}
 } else {
 error_setg(errp, "QMP input member '%s' is unexpected",
arg_name);
@@ -94,6 +100,11 @@ static QObject *do_qmp_dispatch(QmpCommandList *cmds, 
QObject *request,
 return NULL;
 }
 
+if (qmp_is_oob(request) && !(cmd->options & QCO_ALLOW_OOB)) {
+error_setg(errp, "The command %s does not support OOB", command);
+return NULL;
+}
+
 if (!qdict_haskey(dict, "arguments")) {
 args = qdict_new();
 } else {
@@ -122,6 +133,34 @@ QObject *qmp_build_error_object(Error *err)
   error_get_pretty(err));
 }
 
+/*
+ * Detect whether a request should be run out-of-band, by quickly
+ * peeking at whether we have: { "control": { "run-oob": True } }. By
+ * default commands are run in-band.
+ */
+bool qmp_is_oob(const QObject *request)
+{
+QDict *dict;
+QBool *bool_obj;
+
+dict = qobject_to_qdict(request);
+if (!dict) {
+return false;
+}
+
+dict = qdict_get_qdict(dict, "control");
+if (!dict) {
+return false;
+}
+
+bool_obj = qobject_to_qb

Re: [Qemu-devel] [PATCH 5/5] iotest 030: add compressed block-stream test

2017-11-16 Thread Anton Nefedov



On 15/11/2017 10:51 PM, Max Reitz wrote:

On 2017-11-14 11:16, Anton Nefedov wrote:

Signed-off-by: Anton Nefedov 
---
  tests/qemu-iotests/030 | 69 +-
  tests/qemu-iotests/030.out |  4 +--
  2 files changed, 70 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 1883894..52cbe5d 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -21,7 +21,7 @@
  import time
  import os
  import iotests
-from iotests import qemu_img, qemu_io
+from iotests import qemu_img, qemu_img_pipe, qemu_io
  
  backing_img = os.path.join(iotests.test_dir, 'backing.img')

  mid_img = os.path.join(iotests.test_dir, 'mid.img')
@@ -800,5 +800,72 @@ class TestSetSpeed(iotests.QMPTestCase):
  
  self.cancel_and_wait()
  
+class TestCompressed(iotests.QMPTestCase):

+supported_fmts = ['qcow2']
+cluster_size = 64 * 1024;
+image_len = 1 * 1024 * 1024;
+
+def setUp(self):
+qemu_img('create', backing_img, str(TestCompressed.image_len))


First, this is missing the '-f raw', if you want to keep it in line with
the next command.


+qemu_io('-f', 'raw', '-c', 'write -P 1 0 ' + 
str(TestCompressed.image_len), backing_img)


But why would you want this to be raw?



Copied this from another test in this file :)
The source image format does not really matter. Will fix though.


+qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
backing_img, test_img)
+
+# write '4' in every 4th cluster
+step=4


I'd prefer spaces around operators.  (Also in _pattern() and in the call
to pattern ([1,4,2] should be [1, 4, 2]).)


+for i in range(TestCompressed.image_len / TestCompressed.cluster_size 
/ step + 1):


As far as I know, range() has an actual step parameter, maybe that would
be simpler -- and I don't know what the +1 is supposed to mean.

How about "for ofs in range(0, TestCompressed.image_len,
 TestCompressed.cluster_size * step)"?
Would that work?



It does, thank you.


+qemu_io('-c', 'write -P %d %d %d' %
+(step, step * i * TestCompressed.cluster_size,> +  
   TestCompressed.cluster_size),
+test_img)
+
+self.vm = iotests.VM().add_drive(test_img)
+self.vm.launch()
+
+def tearDown(self):
+os.remove(test_img)
+os.remove(backing_img)
+
+def _pattern(self, x, divs):
+return divs[-1] if not x%divs[-1] else self._pattern(x, divs[:-1])


I have no idea what this function does.

...OK, having looked into how it's used, I understand better.  A comment
would certainly be helpful, though.

Maybe also call it differently.  It doesn't really return a pattern.
What this function does is it returns the first integer from @divs
(starting from the end, which is weird) which is a divider of @x.  So
maybe call it _first_divider(self, x, dividers), that would help already.



Yeah, I really had to comment.

Exactly, it starts from the end as I meant it to accept numbers in the 
order they were written. So 'first_divider' wouldn't be right.

Not that important though, will rename and reorder the input.



+
+def test_compressed(self):
+self.assert_no_active_block_jobs()
+
+result = self.vm.qmp('block-stream', device='drive0', compress=True)
+if iotests.imgfmt not in TestCompressed.supported_fmts:
+self.assert_qmp(
+result, 'error/desc',
+'Compression is not supported for this drive drive0')
+return
+self.assert_qmp(result, 'return', {})
+
+# write '2' in every 2nd cluster
+step = 2
+for i in range(TestCompressed.image_len / TestCompressed.cluster_size 
/ step + 1):
+result = self.vm.qmp('human-monitor-command',
+ command_line=
+ 'qemu-io drive0 \"write -P %d %d %d\"' %


Just " instead of \" would be enough, I think.


Done.


+ (step, step * i * TestCompressed.cluster_size,
+  TestCompressed.cluster_size))
+self.assert_qmp(result, 'return', "")
+
+self.wait_until_completed()
+self.assert_no_active_block_jobs()
+
+self.vm.shutdown()
+
+for i in range(TestCompressed.image_len / TestCompressed.cluster_size):
+outp = qemu_io('-c', 'read -P %d %d %d' %
+   (self._pattern(i, [1,4,2]),


The 4 is useless, because everything that's divisible by 4 is divisible
by 2 already.

Also, I don't quite see what this should achieve exactly.  You first
write the backing image full of 1s, OK.  Then you write 4s to every
fourth cluster in the top image.  Then you start streaming, and then you
write 2s to ever second cluster in the top image, which overwrites all
of the 4s you wrote.

Do you maybe not want to write 2 into ev

[Qemu-devel] [RFC v4 22/27] qmp: isolate responses into io thread

2017-11-16 Thread Peter Xu
For those monitors who has enabled IO thread, we'll offload the
responding procedure into IO thread.  The main reason is that chardev is
not thread safe, and we need to do all the read/write IOs in the same
thread.  For use_io_thr=true monitors, that thread is the IO thread.

We do this isolation in similar pattern as what we have done to the
request queue: we first create one response queue for each monitor, then
instead of reply directly in main thread, we queue the responses and
kick the IO thread to do the rest of the job for us.

A funny thing after doing this is that, when the QMP clients send "quit"
to QEMU, it's possible that we close the IOThread even earlier than
replying to that "quit".  So another thing we need to do before cleaning
up the monitors is that we need to flush the response queue (we don't
need to do that for command queue; after all we are quitting) to make
sure replies for handled commands are always flushed back to clients.

Signed-off-by: Peter Xu 
---
 monitor.c | 97 ++-
 1 file changed, 96 insertions(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index ccd350dca2..b18d9d696c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -172,6 +172,8 @@ typedef struct {
 QemuMutex qmp_queue_lock;
 /* Input queue that holds all the parsed QMP requests */
 GQueue *qmp_requests;
+/* Output queue contains all the QMP responses in order */
+GQueue *qmp_responses;
 } MonitorQMP;
 
 /*
@@ -219,6 +221,8 @@ struct MonitorGlobal {
 IOThread *mon_iothread;
 /* Bottom half to dispatch the requests received from IO thread */
 QEMUBH *qmp_dispatcher_bh;
+/* Bottom half to deliver the responses back to clients */
+QEMUBH *qmp_respond_bh;
 };
 
 static struct MonitorGlobal mon_global;
@@ -395,7 +399,8 @@ int monitor_fprintf(FILE *stream, const char *fmt, ...)
 return 0;
 }
 
-static void monitor_json_emitter(Monitor *mon, const QObject *data)
+static void monitor_json_emitter_raw(Monitor *mon,
+ QObject *data)
 {
 QString *json;
 
@@ -409,6 +414,71 @@ static void monitor_json_emitter(Monitor *mon, const 
QObject *data)
 QDECREF(json);
 }
 
+static void monitor_json_emitter(Monitor *mon, QObject *data)
+{
+if (mon->use_io_thr) {
+/*
+ * If using IO thread, we need to queue the item so that IO
+ * thread will do the rest for us.  Take refcount so that
+ * caller won't free the data (which will be finally freed in
+ * responder thread).
+ */
+qobject_incref(data);
+qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
+g_queue_push_tail(mon->qmp.qmp_responses, (void *)data);
+qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
+qemu_bh_schedule(mon_global.qmp_respond_bh);
+} else {
+/*
+ * If not using monitor IO thread, then we are in main thread.
+ * Do the emission right away.
+ */
+monitor_json_emitter_raw(mon, data);
+}
+}
+
+struct QMPResponse {
+Monitor *mon;
+QObject *data;
+};
+typedef struct QMPResponse QMPResponse;
+
+/*
+ * Return one QMPResponse.  The response is only valid if
+ * response.data is not NULL.
+ */
+static QMPResponse monitor_qmp_response_pop_one(void)
+{
+Monitor *mon;
+QObject *data = NULL;
+
+qemu_mutex_lock(&monitor_lock);
+QTAILQ_FOREACH(mon, &mon_list, entry) {
+qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
+data = g_queue_pop_head(mon->qmp.qmp_responses);
+qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
+if (data) {
+break;
+}
+}
+qemu_mutex_unlock(&monitor_lock);
+return (QMPResponse) { .mon = mon, .data = data };
+}
+
+static void monitor_qmp_bh_responder(void *opaque)
+{
+QMPResponse response;
+
+while (true) {
+response = monitor_qmp_response_pop_one();
+if (!response.data) {
+break;
+}
+monitor_json_emitter_raw(response.mon, response.data);
+qobject_decref(response.data);
+}
+}
+
 static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
 /* Limit guest-triggerable events to 1 per second */
 [QAPI_EVENT_RTC_CHANGE]= { 1000 * SCALE_MS },
@@ -595,6 +665,7 @@ static void monitor_data_init(Monitor *mon, bool skip_flush,
 mon->skip_flush = skip_flush;
 mon->use_io_thr = use_io_thr;
 mon->qmp.qmp_requests = g_queue_new();
+mon->qmp.qmp_responses = g_queue_new();
 }
 
 static void monitor_data_destroy(Monitor *mon)
@@ -609,6 +680,7 @@ static void monitor_data_destroy(Monitor *mon)
 qemu_mutex_destroy(&mon->out_lock);
 qemu_mutex_destroy(&mon->qmp.qmp_queue_lock);
 g_queue_free(mon->qmp.qmp_requests);
+g_queue_free(mon->qmp.qmp_responses);
 }
 
 char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index,
@@ -4301,6 +4373,11 @@ static GMainContext *monitor_io_context_get(void)
 return iot

Re: [Qemu-devel] [PATCH v6 3/5] fw_cfg: do DMA read operation

2017-11-16 Thread Marc-André Lureau
Hi

- Original Message -
> On Mon, Nov 13, 2017 at 08:29:56PM +0100, Marc-André Lureau wrote:
> > Modify fw_cfg_read_blob() to use DMA if the device supports it.
> > Return errors, because the operation may fail.
> > 
> > To avoid polling with unbound amount of time, the DMA operation is
> > expected to complete within 200ms, or will return ETIME error.
> > 
> > We may want to switch all the *buf addresses to use only kmalloc'ed
> > buffers (instead of using stack/image addresses with dma=false).
> > 
> > Signed-off-by: Marc-André Lureau 
> > Reviewed-by: Gabriel Somlo 
> > ---
> >  drivers/firmware/qemu_fw_cfg.c | 154
> >  -
> >  1 file changed, 137 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/firmware/qemu_fw_cfg.c
> > b/drivers/firmware/qemu_fw_cfg.c
> > index 1f3e8545dab7..2ac4cd869fe6 100644
> > --- a/drivers/firmware/qemu_fw_cfg.c
> > +++ b/drivers/firmware/qemu_fw_cfg.c
> > @@ -33,6 +33,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  MODULE_AUTHOR("Gabriel L. Somlo ");
> >  MODULE_DESCRIPTION("QEMU fw_cfg sysfs support");
> > @@ -43,12 +45,26 @@ MODULE_LICENSE("GPL");
> >  #define FW_CFG_ID 0x01
> >  #define FW_CFG_FILE_DIR   0x19
> >  
> > +#define FW_CFG_VERSION_DMA 0x02
> > +#define FW_CFG_DMA_CTL_ERROR   0x01
> > +#define FW_CFG_DMA_CTL_READ0x02
> > +#define FW_CFG_DMA_CTL_SKIP0x04
> > +#define FW_CFG_DMA_CTL_SELECT  0x08
> > +#define FW_CFG_DMA_CTL_WRITE   0x10
> > +#define FW_CFG_DMA_TIMEOUT 200 /* ms */
> > +
> >  /* size in bytes of fw_cfg signature */
> >  #define FW_CFG_SIG_SIZE 4
> >  
> >  /* fw_cfg "file name" is up to 56 characters (including terminating nul)
> >  */
> >  #define FW_CFG_MAX_FILE_PATH 56
> >  
> > +/* platform device for dma mapping */
> > +static struct device *dev;
> > +
> > +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir.
> > */
> > +static u32 fw_cfg_rev;
> > +
> >  /* fw_cfg file directory entry type */
> >  struct fw_cfg_file {
> > u32 size;
> > @@ -57,6 +73,12 @@ struct fw_cfg_file {
> > char name[FW_CFG_MAX_FILE_PATH];
> >  };
> >  
> > +struct fw_cfg_dma {
> > +   u32 control;
> > +   u32 length;
> > +   u64 address;
> > +} __packed;
> > +
> >  /* fw_cfg device i/o register addresses */
> >  static bool fw_cfg_is_mmio;
> >  static phys_addr_t fw_cfg_p_base;
> > @@ -75,12 +97,93 @@ static inline u16 fw_cfg_sel_endianness(u16 key)
> > return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key);
> >  }
> >  
> > +static inline bool fw_cfg_dma_enabled(void)
> > +{
> > +   return fw_cfg_rev & FW_CFG_VERSION_DMA && fw_cfg_reg_dma;
> > +}
> > +
> > +static bool fw_cfg_wait_for_control(struct fw_cfg_dma *d, unsigned long
> > timeout)
> > +{
> > +   ktime_t start;
> > +   ktime_t stop;
> > +
> > +   start = ktime_get();
> > +   stop = ktime_add(start, ms_to_ktime(timeout));
> > +
> > +   do {
> > +   if ((be32_to_cpu(d->control) & ~FW_CFG_DMA_CTL_ERROR) == 0)
> > +   return true;
> > +
> > +   usleep_range(50, 100);
> 
> BTW it's not nice that this is uninterruptible. I think we need a
> variant of usleep_range that is interruptible and call that here.
> Thoughts?
> 

This usleep_range() pattern is pretty common apparently.

(and it's probably better than calling yield() in a loop, like v1-3 did) 
> 
> > +   } while (ktime_before(ktime_get(), stop));
> > +
> > +   return false;
> > +}
> > +
> > +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control)
> > +{
> > +   dma_addr_t dma_addr = 0;
> > +   struct fw_cfg_dma *d;
> > +   dma_addr_t dma;
> > +   ssize_t ret = length;
> > +   enum dma_data_direction dir =
> > +   (control & FW_CFG_DMA_CTL_READ ? DMA_FROM_DEVICE : 0);
> > +
> > +   if (address && length) {
> > +   dma_addr = dma_map_single(dev, address, length, dir);
> > +   if (dma_mapping_error(NULL, dma_addr)) {
> > +   WARN(1, "%s: failed to map address\n", __func__);
> > +   return -EFAULT;
> > +   }
> > +   }
> > +
> > +   d = kmalloc(sizeof(*d), GFP_KERNEL | GFP_DMA);
> > +   if (!d) {
> > +   ret = -ENOMEM;
> > +   goto end;
> > +   }
> > +
> > +   dma = dma_map_single(dev, d, sizeof(*d), DMA_BIDIRECTIONAL);
> > +   if (dma_mapping_error(NULL, dma)) {
> > +   WARN(1, "%s: failed to map fw_cfg_dma\n", __func__);
> > +   ret = -EFAULT;
> > +   goto end;
> > +   }
> > +
> > +   *d = (struct fw_cfg_dma) {
> > +   .address = cpu_to_be64(dma_addr),
> > +   .length = cpu_to_be32(length),
> > +   .control = cpu_to_be32(control)
> > +   };
> > +
> > +   iowrite32be((u64)dma >> 32, fw_cfg_reg_dma);
> > +   iowrite32be(dma, fw_cfg_reg_dma + 4);
> > +
> > +   if (!fw_cfg_wait_for_control(d, FW_CFG_DMA_TIMEOUT)) {
> > +   WARN(1, "%s: timeout", __func__);
> > +   ret = -ETIME;
> > +   } else if (be32_to_cpu(d->cont

  1   2   3   >