Re: [Qemu-devel] [PATCH 1/4 for-2.11?] block/dirty-bitmap: add lock to bdrv_enable/disable_dirty_bitmap
13.11.2017 20:50, Eric Blake wrote: On 11/13/2017 10:20 AM, Vladimir Sementsov-Ogievskiy wrote: Like other setters here these functions should take a lock. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/dirty-bitmap.c | 4 1 file changed, 4 insertions(+) Should this patch be in 2.11? these functions are unused now, so, no, it's not necessary diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index bd04e991b1..2a0bcd9e51 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -397,15 +397,19 @@ void bdrv_remove_persistent_dirty_bitmap(BlockDriverState *bs, /* Called with BQL taken. */ void bdrv_disable_dirty_bitmap(BdrvDirtyBitmap *bitmap) { +bdrv_dirty_bitmap_lock(bitmap); assert(!bdrv_dirty_bitmap_frozen(bitmap)); bitmap->disabled = true; +bdrv_dirty_bitmap_unlock(bitmap); Why do we need this lock in addition to BQL? } /* Called with BQL taken. */ void bdrv_enable_dirty_bitmap(BdrvDirtyBitmap *bitmap) { +bdrv_dirty_bitmap_lock(bitmap); assert(!bdrv_dirty_bitmap_frozen(bitmap)); bitmap->disabled = false; +bdrv_dirty_bitmap_unlock(bitmap); Again, why do we need this in addition to BQL? The commit message needs more details about a scenario where our existing BQL lock is insufficient to prevent misuse of bitmap->disabled. -- Best regards, Vladimir
Re: [Qemu-devel] [RFC v3 27/27] tests: qmp-test: add oob test
On Wed, Nov 15, 2017 at 10:21:16AM +, Stefan Hajnoczi wrote: > On Mon, Nov 06, 2017 at 05:46:43PM +0800, Peter Xu wrote: > > +/* > > + * Try a time-consuming command, following by a OOB command, make > > + * sure we get OOB command before the time-consuming one (which is > > + * run in the parser). > > + * > > + * When writting up this test script, the only command that > > + * support OOB is migrate-incoming. It's not the best command to > > + * test OOB but we don't really have a choice here. We will check > > + * arriving order but not command errors, which does not really > > + * matter to us. > > + */ > > +qmp_async("{ 'execute': 'dump-guest-memory'," > > + " 'arguments': { 'paging': true, " > > + " 'protocol': 'file:/dev/null' }, " > > + " 'id': 'time-consuming-cmd'}"); > > +qmp_async("{ 'execute': 'migrate-incoming', " > > + " 'control': { 'run-oob': true }, " > > + " 'id': 'oob-cmd' }"); > > + > > +/* Ignore all events. Wait for 2 acks */ > > +while (acks < 2) { > > +resp = qmp_receive(); > > +if (qdict_haskey(resp, "event")) { > > +/* Skip possible events */ > > +continue; > > +} > > +cmd_id = qdict_get_str(resp, "id"); > > +if (acks == 0) { > > +/* Need to receive OOB response first */ > > +g_assert_cmpstr(cmd_id, ==, "oob-cmd"); > > +} else if (acks == 1) { > > +g_assert_cmpstr(cmd_id, ==, "time-consuming-cmd"); > > +} > > +acks++; > > +} > > This test is non-deterministic. The dump-guest-memory command could > complete first on a fast machine. > > On a slow machine this test might take a long time... > > Please introduce a test command that is deterministic. For example, > when 'x-oob-test' is invoked without 'run-oob': true it waits until > invoked again, this time with 'run-oob': true. Yes this sounds good. > > We have similar interfaces in the block layer for controlling the order > in which parallel I/O requests are processed. This allows test cases to > deterministically take specific code paths. It's great to know that I can create a command to test it. That should be much easier. Thanks, -- Peter Xu
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
Hi Jason, Windows driver will initialise only the amount of queue based on the amount of available vCPUs. So if there will be more queues in the device than we have vCPUs on the guest, the driver will not initialise “excessive” queues. This is tied to the way RSS on Windows should be implemented. Exactly as in described scenario (7 queues, but only 4 vCPUs). Best regards, Yan. > On 16 Nov 2017, at 07:53, Longpeng (Mike) wrote: > > Hi Jason & Michael, > > Do you have any idea about this problem ? > > -- > Regards, > Longpeng(Mike) > > On 2017/11/15 23:54, Longpeng(Mike) wrote: > >> 2017-11-15 23:05 GMT+08:00 Jason Wang : >>> >>> >>> On 2017年11月15日 22:55, Longpeng(Mike) wrote: Hi guys, We got a BUG report from our testers yesterday, the testing scenario was migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 queues*). We found the cause reason, and we'll report the BUG or send a fix patch to upstream if necessary( we haven't test the upstream yet, sorry... ). >>> >>> >>> Could you explain this a little bit more? >>> We want to know why the vhost_net_start() must start *total queues* ( in our VM there're 7 queues ) but not *the queues that current used* ( in our VM, guest only uses the first 4 queues because it's limited by the number of vcpus) ? Looking forward to your help, thx :) >>> >>> >>> Since the codes have been there for years and works well for kernel >>> datapath. You should really explain what's wrong. >>> >> >> OK. :) >> >> In our scenario, the Windows's virtio-net driver only use the first 4 >> queues and it >> *only set desc/avail/used table for the first 4 queues*, so in QEMU >> the desc/avail/ >> used of the last 3 queues are ZERO, but unfortunately... >> ''' >> vhost_net_start >> for (i = 0; i < total_queues; i++) >>vhost_net_start_one >> vhost_dev_start >>vhost_virtqueue_start >> ''' >> In vhost_virtqueue_start(), it will calculate the HVA of >> desc/avail/used table, so for last >> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then >> send the results >> to the user-mode backend ( we use *vhost-user* ) by >> vhost_virtqueue_set_addr(). >> >> When the EVS get these address, it will update a *idx* which will be >> treated as vq's >> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). >> >> So we get the following result after virtio-net stop: >> the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's >> last_avail_idx is NOT ZERO. >> >> At last, virtio_load() reports an error: >> ''' >> if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- >> will be TRUE >> error_report("VQ %d address 0x0 " >> "inconsistent with Host index 0x%x", >> i, vdev->vq[i].last_avail_idx); >>return -1; >> } >> ''' >> >> BTW, the problem won't appear if use Linux guest, because the Linux >> virtio-net >> driver will set all 7 queues's desc/avail/used tables. And the problem >> won't appear >> if the VM use vhost-net, because vhost-net won't update *idx* in SET_ADDR >> ioctl. >> >> Sorry for my pool English, Maybe I could describe the problem in Chinese for >> you >> in private if necessary. >> >> >>> Thanks >> >> > > > -- > Regards, > Longpeng(Mike)
[Qemu-devel] sheepdog block driver and read write error policy
Hi. I'm try to write own sheepdog compatible daemon and test it with qemu. Sometimes ago in qemu added read write error policy to allow to stop domain or continue or something else. As i see in case of sheepdog this policy is ignored and qemu try to reconnect to sheepdog daemon. If nobody wants i can try to fix this, and if policy is not specified work like now. Where i need to start to easy understand how this works in case of file raw ? Also i'm fedora user with virt-preview repo and see, that if sheepdog daemon is unavailable 2 or more minutes and started after that qemu crashed. Does i need to retest with latest qemu version? b2587932582333197c88bf663785b19f441989d7 f1af3251f885f3b8adf73ba078500f2eeefbedae 5eceb01adfbe513c0309528293b0b86e32a6e27d qemu-system-x86_64 -machine q35 -accel kvm -m 512M -vnc 0.0.0.0:1 -device virtio-scsi-pci,id=scsi0 -drive aio=native,rerror=stop,werror=stop,if=none,format=raw,id=drive-scsi-disk0,cache=none,file=sheepdog:test,discard=unmap,detect-zeroes=off -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=device-scsi-disk0 qemu-system-x86_64: failed to get the header, Invalid argument qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: failed to send a req, Bad file descriptor qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: Failed to connect socket: Connection refused qemu-system-x86_64: /builddir/build/BUILD/qemu-2.9.0/util/iov.c:167: iov_send_recv: Assertion `niov < iov_cnt' failed. Makefile:5: recipe for target 'test' failed make: *** [test] Aborted (core dumped) And if someone from fedora/redhat team is present - when new qemu version available in virt preview repo for fedora 25 or it EOL? -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API
16.11.2017 00:20, John Snow wrote: On 11/13/2017 11:20 AM, Vladimir Sementsov-Ogievskiy wrote: Hi all. There are three qmp commands, needed to implement external backup API. Using these three commands, client may do all needed bitmap management by hand: on backup start we need to do a transaction: {disable old bitmap, create new bitmap} on backup success: drop old bitmap on backup fail: enable old bitmap merge new bitmap to old bitmap drop new bitmap Can you give me an example of how you expect these commands to be used, and why they're required? I'm a little weary about how error-prone these commands might be and the potential for incorrect usage seems... high. Why do we require them? It is needed for incremental backup. It looks like bad idea to export abdicate/reclaim functionality, it is simpler and clearer to allow user to merge/enable/disable bitmaps by hand. usage is like this: 1. we have dirty bitmap bitmap0 for incremental backup. 2. prepare image fleecing (create temporary image with backing=our_disk) 3. in qmp transaction: - disable bitmap0 - create bitmap1 - start image fleecing (backup sync=none our_disk -> temp_disk) 4. export temp_disk (it looks like a snapshot) and bitmap0 through NBD === external tool can get bitmap0 and then copy some clusters through NBD === on successful backup external tool can drop bitmap0. or can save it and reuse somehow on failed backup external tool can do the following: - enable bitmap0 - merge bitmap1 to bitmap0 - drop bitmap1 -- Best regards, Vladimir
Re: [Qemu-devel] sheepdog block driver and read write error policy
On Thu, 11/16 11:11, Vasiliy Tolstov wrote: > Hi. I'm try to write own sheepdog compatible daemon and test it with qemu. > Sometimes ago in qemu added read write error policy to allow to stop > domain or continue or something else. As i see in case of sheepdog > this policy is ignored and qemu try to reconnect to sheepdog daemon. > If nobody wants i can try to fix this, and if policy is not specified > work like now. > Where i need to start to easy understand how this works in case of file raw ? The driver callbacks (sd_co_readv/sd_co_writev) should simply return error instead of retrying. > > Also i'm fedora user with virt-preview repo and see, that if sheepdog > daemon is unavailable 2 or more minutes and started after that qemu > crashed. Does i need to retest with latest qemu version? > b2587932582333197c88bf663785b19f441989d7 > f1af3251f885f3b8adf73ba078500f2eeefbedae > 5eceb01adfbe513c0309528293b0b86e32a6e27d > > > qemu-system-x86_64 -machine q35 -accel kvm -m 512M -vnc 0.0.0.0:1 > -device virtio-scsi-pci,id=scsi0 -drive > aio=native,rerror=stop,werror=stop,if=none,format=raw,id=drive-scsi-disk0,cache=none,file=sheepdog:test,discard=unmap,detect-zeroes=off > -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=device-scsi-disk0 > qemu-system-x86_64: failed to get the header, Invalid argument > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: failed to send a req, Bad file descriptor > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > qemu-system-x86_64: Failed to connect socket: Connection refused > > > > qemu-system-x86_64: /builddir/build/BUILD/qemu-2.9.0/util/iov.c:167: > iov_send_recv: Assertion `niov < iov_cnt' failed. > Makefile:5: recipe for target 'test' failed > make: *** [test] Aborted (core dumped) > > And if someone from fedora/redhat team is present - when new qemu > version available in virt preview repo for fedora 25 or it EOL? It's recommended to use qemu.git if you want to develop new features for QEMU: http://wiki.qemu.org/Contribute Fam
Re: [Qemu-devel] [PATCH v2 for-2.11] nbd/server: Fix error reporting for bad requests
16.11.2017 00:35, Eric Blake wrote: The NBD spec says an attempt to NBD_CMD_TRIM on a read-only export should fail with EPERM, as a trim has the potential to change disk contents, but we were relying on the block layer to catch that for us, which might not always give the right error (and even if it does, it does not let us pass back a sane message for structured replies). The NBD spec says an attempt to NBD_CMD_WRITE_ZEROES out of bounds should fail with ENOSPC, not EINVAL. Our check for u64 offset + u32 length wraparound up front is pointless; nothing uses offset until after the second round of sanity checks, and we can just as easily ensure there is no wraparound by checking whether offset is in bounds (since a disk size cannot exceed off_t which is 63 bits, adding a 32-bit number for a valid offset can't overflow). looks like here is another problem which you've fixed: with old code connection would be lost if this check fails in case of CMD_WRITE, as req->complete would be false. Solve all of these issues by some code motion and improved request validation. Signed-off-by: Eric Blake --- v2: actually commit the compiler-error fixes before submitting... nbd/server.c | 36 1 file changed, 12 insertions(+), 24 deletions(-) diff --git a/nbd/server.c b/nbd/server.c index df771fd42f..7d6801b427 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -1366,15 +1366,6 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, return -EIO; } -/* Check for sanity in the parameters, part 1. Defer as many - * checks as possible until after reading any NBD_CMD_WRITE - * payload, so we can try and keep the connection alive. */ I don't understand the comment. Opposite: to keep connection alive we must read the payload, even in case of sanity check fail. -if ((request->from + request->len) < request->from) { -error_setg(errp, - "integer overflow detected, you're probably being attacked"); -return -EINVAL; -} - if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE) { if (request->len > NBD_MAX_BUFFER_SIZE) { error_setg(errp, "len (%" PRIu32" ) is larger than max len (%u)", related idea here: if request->len > NBD_MAX_BUFFER_SIZE or if we failed to allocate buffer in following if, we can call nbd_drop to read CMD_WRITE payload and set req->complete = true;, to keep connection in this cases. However, it may be done later. @@ -1399,12 +1390,21 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request, request->len); } -/* Sanity checks, part 2. */ -if (request->from + request->len > client->exp->size) { +/* Sanity checks. */ +if (client->exp->nbdflags & NBD_FLAG_READ_ONLY && +(request->type == NBD_CMD_WRITE || + request->type == NBD_CMD_WRITE_ZEROES || + request->type == NBD_CMD_TRIM)) { +error_setg(errp, "Export is read-only"); +return -EROFS; +} +if (request->from > client->exp->size || +request->from + request->len > client->exp->size) { error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" PRIu32 ", Size: %" PRIu64, request->from, request->len, (uint64_t)client->exp->size); -return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL; +return (request->type == NBD_CMD_WRITE || +request->type == NBD_CMD_WRITE_ZEROES) ? -ENOSPC : -EINVAL; } valid_flags = NBD_CMD_FLAG_FUA; if (request->type == NBD_CMD_READ && client->structured_reply) { @@ -1482,12 +1482,6 @@ static coroutine_fn void nbd_trip(void *opaque) break; case NBD_CMD_WRITE: -if (exp->nbdflags & NBD_FLAG_READ_ONLY) { -error_setg(&local_err, "Export is read-only"); -ret = -EROFS; -break; -} - flags = 0; if (request.flags & NBD_CMD_FLAG_FUA) { flags |= BDRV_REQ_FUA; @@ -1500,12 +1494,6 @@ static coroutine_fn void nbd_trip(void *opaque) break; case NBD_CMD_WRITE_ZEROES: -if (exp->nbdflags & NBD_FLAG_READ_ONLY) { -error_setg(&local_err, "Export is read-only"); -ret = -EROFS; -break; -} - flags = 0; if (request.flags & NBD_CMD_FLAG_FUA) { flags |= BDRV_REQ_FUA; Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017年11月16日 13:53, Longpeng (Mike) wrote: On 2017/11/15 23:54, Longpeng(Mike) wrote: 2017-11-15 23:05 GMT+08:00 Jason Wang: On 2017年11月15日 22:55, Longpeng(Mike) wrote: Hi guys, We got a BUG report from our testers yesterday, the testing scenario was migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 queues*). We found the cause reason, and we'll report the BUG or send a fix patch to upstream if necessary( we haven't test the upstream yet, sorry... ). Could you explain this a little bit more? We want to know why the vhost_net_start() must start*total queues* ( in our VM there're 7 queues ) but not*the queues that current used* ( in our VM, guest only uses the first 4 queues because it's limited by the number of vcpus) ? Looking forward to your help, thx:) Since the codes have been there for years and works well for kernel datapath. You should really explain what's wrong. OK.:) In our scenario, the Windows's virtio-net driver only use the first 4 queues and it *only set desc/avail/used table for the first 4 queues*, so in QEMU the desc/avail/ used of the last 3 queues are ZERO, but unfortunately... ''' vhost_net_start for (i = 0; i < total_queues; i++) vhost_net_start_one vhost_dev_start vhost_virtqueue_start ''' In vhost_virtqueue_start(), it will calculate the HVA of desc/avail/used table, so for last 3 queues, it will use ZERO as the GPA to calculate the HVA, and then send the results to the user-mode backend ( we use*vhost-user* ) by vhost_virtqueue_set_addr(). When the EVS get these address, it will update a*idx* which will be treated as vq's last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). So we get the following result after virtio-net stop: the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's last_avail_idx is NOT ZERO. At last, virtio_load() reports an error: ''' if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- will be TRUE error_report("VQ %d address 0x0 " "inconsistent with Host index 0x%x", i, vdev->vq[i].last_avail_idx); return -1; } ''' BTW, the problem won't appear if use Linux guest, because the Linux virtio-net driver will set all 7 queues's desc/avail/used tables. And the problem won't appear if the VM use vhost-net, because vhost-net won't update*idx* in SET_ADDR ioctl. Just to make sure I understand here, I thought Windows guest + vhost_net hit this issue? Thanks Sorry for my pool English, Maybe I could describe the problem in Chinese for you in private if necessary. Thanks -- Regards, Longpeng(Mike)
[Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl
The bios-tables-test was writing out files that we pass to iasl in with the wrong endianness in the header when running on a big endian host. So instead of storing mixed endian information in our structures, let's keep everything in little endian and byte-swap it only when we need a value in the code. Reported-by: Daniel P. Berrange Buglink: https://bugs.launchpad.net/qemu/+bug/1724570 Suggested-by: Michael S. Tsirkin Signed-off-by: Thomas Huth --- tests/acpi-utils.h | 27 +-- tests/bios-tables-test.c | 42 ++ 2 files changed, 27 insertions(+), 42 deletions(-) diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h index f8d8723..d5ca5b6 100644 --- a/tests/acpi-utils.h +++ b/tests/acpi-utils.h @@ -28,24 +28,9 @@ typedef struct { bool tmp_files_retain; /* do not delete the temp asl/aml */ } AcpiSdtTable; -#define ACPI_READ_FIELD(field, addr) \ -do { \ -switch (sizeof(field)) { \ -case 1:\ -field = readb(addr); \ -break; \ -case 2:\ -field = readw(addr); \ -break; \ -case 4:\ -field = readl(addr); \ -break; \ -case 8:\ -field = readq(addr); \ -break; \ -default: \ -g_assert(false); \ -} \ +#define ACPI_READ_FIELD(field, addr)\ +do {\ +memread(addr, &field, sizeof(field)); \ addr += sizeof(field); \ } while (0); @@ -74,16 +59,14 @@ typedef struct { } while (0); #define ACPI_ASSERT_CMP(actual, expected) do { \ -uint32_t ACPI_ASSERT_CMP_le = cpu_to_le32(actual); \ char ACPI_ASSERT_CMP_str[5] = {}; \ -memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 4); \ +memcpy(ACPI_ASSERT_CMP_str, &actual, 4); \ g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \ } while (0) #define ACPI_ASSERT_CMP64(actual, expected) do { \ -uint64_t ACPI_ASSERT_CMP_le = cpu_to_le64(actual); \ char ACPI_ASSERT_CMP_str[9] = {}; \ -memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 8); \ +memcpy(ACPI_ASSERT_CMP_str, &actual, 8); \ g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \ } while (0) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 564da45..e28e0c9 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -96,17 +96,20 @@ static void test_acpi_rsdp_table(test_data *data) static void test_acpi_rsdt_table(test_data *data) { AcpiRsdtDescriptorRev1 *rsdt_table = &data->rsdt_table; -uint32_t addr = data->rsdp_table.rsdt_physical_address; +uint32_t addr = le32_to_cpu(data->rsdp_table.rsdt_physical_address); uint32_t *tables; int tables_nr; uint8_t checksum; +uint32_t rsdt_table_length; /* read the header */ ACPI_READ_TABLE_HEADER(rsdt_table, addr); ACPI_ASSERT_CMP(rsdt_table->signature, "RSDT"); +rsdt_table_length = le32_to_cpu(rsdt_table->length); + /* compute the table entries in rsdt */ -tables_nr = (rsdt_table->length - sizeof(AcpiRsdtDescriptorRev1)) / +tables_nr = (rsdt_table_length - sizeof(AcpiRsdtDescriptorRev1)) / sizeof(uint32_t); g_assert(tables_nr > 0); @@ -114,7 +117,7 @@ static void test_acpi_rsdt_table(test_data *data) tables = g_new0(uint32_t, tables_nr); ACPI_READ_ARRAY_PTR(tables, tables_nr, addr); -checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table->length) + +checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table_length) + acpi_calc_checksum((uint8_t *)tables, tables_nr * sizeof(uint32_t)); g_assert(!checksum); @@ -130,7 +133,7 @@ static void test_acpi_fadt_table(test_data *data) uint32_t addr; /* FADT table comes first */ -addr = data->rsdt_tables_addr[0]; +addr = le32_to_cpu(data->rsdt_tables_addr[0]); ACPI_READ_TABLE_HEADER(fadt_table, addr); ACPI_READ_FIELD(fadt_table->firmware_ctrl, addr); @@ -187,13 +190,14 @@ static void test_acpi_fadt_table(test_data *data) ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe1_block, addr); ACPI_ASSERT_CMP(fadt_table->signature, "FACP"); -g_assert(!acpi_calc_checksum((uint8_t *)fadt_table, fadt_table->length)); +g_assert(!acpi_calc_checksum((uint8_t *)fadt_table, + le32_to_cpu(fadt_table->length))); } static void test_acpi_facs_table(test_d
Re: [Qemu-devel] [PATCH v8 02/14] block/dirty-bitmap: add locked version of bdrv_release_dirty_bitmap
11.11.2017 01:52, John Snow wrote: On 10/30/2017 12:32 PM, Vladimir Sementsov-Ogievskiy wrote: It is needed to realize bdrv_dirty_bitmap_release_successor in the following patch. OK, but... Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/dirty-bitmap.c | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c index 81adbeb6d4..981f99d362 100644 --- a/block/dirty-bitmap.c +++ b/block/dirty-bitmap.c @@ -326,13 +326,13 @@ static bool bdrv_dirty_bitmap_has_name(BdrvDirtyBitmap *bitmap) return !!bdrv_dirty_bitmap_name(bitmap); } -/* Called with BQL taken. */ -static void bdrv_do_release_matching_dirty_bitmap( +/* Called within bdrv_dirty_bitmap_lock..unlock */ ...Add this so it will compile: __attribute__((__unused__)) ok +static void bdrv_do_release_matching_dirty_bitmap_locked( BlockDriverState *bs, BdrvDirtyBitmap *bitmap, bool (*cond)(BdrvDirtyBitmap *bitmap)) { BdrvDirtyBitmap *bm, *next; -bdrv_dirty_bitmaps_lock(bs); + QLIST_FOREACH_SAFE(bm, &bs->dirty_bitmaps, list, next) { if ((!bitmap || bm == bitmap) && (!cond || cond(bm))) { assert(!bm->active_iterators); @@ -344,18 +344,33 @@ static void bdrv_do_release_matching_dirty_bitmap( g_free(bm); if (bitmap) { -goto out; +return; } } } + if (bitmap) { abort(); } Do we have any style guide rules on using abort() instead of assert()? The rest of this function uses assert, and it'd be less lines to simply write: assert(!bitmap); which I think might also carry better semantic information for coverity beyond an actual runtime conditional branch. (I think. Please correct me if I am wrong, I'm a little hazy on this.) agree, but it is a preexisting code, so I'll fix it with an additional patch. +} -out: +/* Called with BQL taken. */ +static void bdrv_do_release_matching_dirty_bitmap( +BlockDriverState *bs, BdrvDirtyBitmap *bitmap, +bool (*cond)(BdrvDirtyBitmap *bitmap)) +{ +bdrv_dirty_bitmaps_lock(bs); +bdrv_do_release_matching_dirty_bitmap_locked(bs, bitmap, cond); bdrv_dirty_bitmaps_unlock(bs); } +/* Called within bdrv_dirty_bitmap_lock..unlock */ +static void bdrv_release_dirty_bitmap_locked(BlockDriverState *bs, + BdrvDirtyBitmap *bitmap) +{ +bdrv_do_release_matching_dirty_bitmap_locked(bs, bitmap, NULL); +} + /* Called with BQL taken. */ void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap) { If you agree with those two changes, you may add: ok Reviewed-by: John Snow -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 2/2] Add new PCI ID for i82559a
On 2017年11月16日 14:40, Stefan Weil wrote: Am 15.11.2017 um 14:09 schrieb Michael Nawrocki: Hi Stefan, I've got a VxWorks driver binary that explicitly looks for device ID 0x1030 (which is admittedly not ideal). It seems like the "82559 InBusiness 10/100" hardware uses this, though I've had trouble finding an official source. The following documents reference that ID: https://pci-ids.ucw.cz/read/PC/8086/1030 http://ks.pams.ncsu.edu/pub/ncsuscyld/i386/misc/src/trees/hdstg2/modules/pcitable https://cateee.net/lkddb/web-lkddb/E100.html And I found a similar post on a different mailing list that might shed some light: http://www.beowulf.org/pipermail/eepro100/2000-January/000760.html It looks like the 8255x series of devices have a number of potential IDs; maybe a property to set a specific PCI device ID would work? Thanks, Mike Yes, that might be a very general solution which could be applied to all PCI devices. It could even be extended to include the vendor ID or any PCI configuration value as well. Nevertheless the technically correct solution would be a full emulation of the EEPROM. Then we could provide an EEPROM for the 82559 InBusiness 10/100", and the data from that EEPROM would set the right PCI device ID. Jason, until we get a better solution, the last commit should be reverted before the new QEMU version is made. That commit also added a wrong help text. Regards Stefan Let me post a patch soon. Thanks
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
> -Original Message- > From: Jason Wang [mailto:jasow...@redhat.com] > Sent: Thursday, November 16, 2017 4:55 PM > To: longpeng; m...@redhat.com > Cc: Longpeng(Mike); qemu-devel@nongnu.org; Gonglei (Arei); Wangjing (King, > Euler); Huangweidong (C); stefa...@redhat.com > Subject: Re: [Question] why need to start all queues in vhost_net_start > > > > On 2017年11月16日 13:53, Longpeng (Mike) wrote: > > On 2017/11/15 23:54, Longpeng(Mike) wrote: > >> 2017-11-15 23:05 GMT+08:00 Jason Wang: > >>> On 2017年11月15日 22:55, Longpeng(Mike) wrote: > Hi guys, > > We got a BUG report from our testers yesterday, the testing scenario was > migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 > queues*). > > We found the cause reason, and we'll report the BUG or send a fix patch > to upstream if necessary( we haven't test the upstream yet, sorry... ). > >>> Could you explain this a little bit more? > >>> > We want to know why the vhost_net_start() must start*total queues* > ( in > our > VM there're 7 queues ) but not*the queues that current used* ( in our > VM, > guest > only uses the first 4 queues because it's limited by the number of vcpus) > ? > > Looking forward to your help, thx:) > >>> Since the codes have been there for years and works well for kernel > >>> datapath. You should really explain what's wrong. > >>> > >> OK.:) > >> > >> In our scenario, the Windows's virtio-net driver only use the first 4 > >> queues and it > >> *only set desc/avail/used table for the first 4 queues*, so in QEMU > >> the desc/avail/ > >> used of the last 3 queues are ZERO, but unfortunately... > >> ''' > >> vhost_net_start > >>for (i = 0; i < total_queues; i++) > >> vhost_net_start_one > >>vhost_dev_start > >> vhost_virtqueue_start > >> ''' > >> In vhost_virtqueue_start(), it will calculate the HVA of > >> desc/avail/used table, so for last > >> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then > >> send the results > >> to the user-mode backend ( we use*vhost-user* ) by > vhost_virtqueue_set_addr(). > >> > >> When the EVS get these address, it will update a*idx* which will be > >> treated as vq's > >> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). > >> > >> So we get the following result after virtio-net stop: > >>the desc/avail/used of the last 3 queues's vqs are all ZERO, but these > vqs's > >>last_avail_idx is NOT ZERO. > >> > >> At last, virtio_load() reports an error: > >> ''' > >>if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- > >> will be TRUE > >>error_report("VQ %d address 0x0 " > >> "inconsistent with Host index 0x%x", > >> i, vdev->vq[i].last_avail_idx); > >> return -1; > >> } > >> ''' > >> > >> BTW, the problem won't appear if use Linux guest, because the Linux > virtio-net > >> driver will set all 7 queues's desc/avail/used tables. And the problem > >> won't appear > >> if the VM use vhost-net, because vhost-net won't update*idx* in > SET_ADDR ioctl. > > Just to make sure I understand here, I thought Windows guest + vhost_net > hit this issue? > No, Windows guest + vhost-user/DPDK. BTW pls see virtio spec in : "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be used SHOULD be populated with receive buffers." It is not mandatory that all queues must be initialized. Thanks, -Gonglei
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017年11月16日 17:01, Gonglei (Arei) wrote: -Original Message- From: Jason Wang [mailto:jasow...@redhat.com] Sent: Thursday, November 16, 2017 4:55 PM To: longpeng; m...@redhat.com Cc: Longpeng(Mike); qemu-devel@nongnu.org; Gonglei (Arei); Wangjing (King, Euler); Huangweidong (C); stefa...@redhat.com Subject: Re: [Question] why need to start all queues in vhost_net_start On 2017年11月16日 13:53, Longpeng (Mike) wrote: On 2017/11/15 23:54, Longpeng(Mike) wrote: 2017-11-15 23:05 GMT+08:00 Jason Wang: On 2017年11月15日 22:55, Longpeng(Mike) wrote: Hi guys, We got a BUG report from our testers yesterday, the testing scenario was migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 queues*). We found the cause reason, and we'll report the BUG or send a fix patch to upstream if necessary( we haven't test the upstream yet, sorry... ). Could you explain this a little bit more? We want to know why the vhost_net_start() must start*total queues* ( in our VM there're 7 queues ) but not*the queues that current used* ( in our VM, guest only uses the first 4 queues because it's limited by the number of vcpus) ? Looking forward to your help, thx:) Since the codes have been there for years and works well for kernel datapath. You should really explain what's wrong. OK.:) In our scenario, the Windows's virtio-net driver only use the first 4 queues and it *only set desc/avail/used table for the first 4 queues*, so in QEMU the desc/avail/ used of the last 3 queues are ZERO, but unfortunately... ''' vhost_net_start for (i = 0; i < total_queues; i++) vhost_net_start_one vhost_dev_start vhost_virtqueue_start ''' In vhost_virtqueue_start(), it will calculate the HVA of desc/avail/used table, so for last 3 queues, it will use ZERO as the GPA to calculate the HVA, and then send the results to the user-mode backend ( we use*vhost-user* ) by vhost_virtqueue_set_addr(). When the EVS get these address, it will update a*idx* which will be treated as vq's last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). So we get the following result after virtio-net stop: the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's last_avail_idx is NOT ZERO. At last, virtio_load() reports an error: ''' if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- will be TRUE error_report("VQ %d address 0x0 " "inconsistent with Host index 0x%x", i, vdev->vq[i].last_avail_idx); return -1; } ''' BTW, the problem won't appear if use Linux guest, because the Linux virtio-net driver will set all 7 queues's desc/avail/used tables. And the problem won't appear if the VM use vhost-net, because vhost-net won't update*idx* in SET_ADDR ioctl. Just to make sure I understand here, I thought Windows guest + vhost_net hit this issue? No, Windows guest + vhost-user/DPDK. BTW pls see virtio spec in : "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be used SHOULD be populated with receive buffers." It is not mandatory that all queues must be initialized. Thanks, -Gonglei Interesting, vhost_net will set last_avail_idx to vq.num during SET_VRING_BASE. So I thought it should hit this. Btw, maybe we should relax the check to: if (!vdev->vq[i].vring.desc && (vdev->vq[i].last_avail_idx != vdev->vq[i].vring.num)) { Thanks
Re: [Qemu-devel] [PATCH] s390/kvm_virtio/linux-headers: remove traces of old virtio transport
On Thu, 16 Nov 2017 08:45:09 +0100 Christian Borntraeger wrote: > On 11/15/2017 06:10 PM, Cornelia Huck wrote: > > On Wed, 15 Nov 2017 16:42:23 +0100 > > Christian Borntraeger wrote: > > > >> We no longer support the old s390 transport, neither does the newest > >> Linux kernel. Remove it from the linux header script as well as the > >> s390x virtio code. We still should handle the VIRTIO_NOTIFY hypercall, > >> to tolerate early printk on older guest kernels without an sclp console. > > > > Are there any such guests still around? Wouldn't they be unable to run > > because of the missing old transport anyway? > > As far as I can see even an 4.13 will do > > static int __init s390_virtio_console_init(void) > { > if (sclp.has_vt220 || sclp.has_linemode) > return -ENODEV; > return virtio_cons_early_init(early_put_chars); > } > console_initcall(s390_virtio_console_init); > > No matter if there is the old transport or not available. > > So as soon as somebody chooses virtio-console you should see the diag500 from > the early > printk. Grmpf, and the first condition does not trigger when you don't define any sclp console. Oh well, it seems we still have to drag this along :( > > > >> We continue to ignore these events. > >> > >> Signed-off-by: Christian Borntraeger > >> --- > >> hw/s390x/s390-virtio-hcall.h | 6 ++- > >> include/standard-headers/asm-s390/kvm_virtio.h | 64 > >> -- > >> scripts/update-linux-headers.sh| 1 - 3 files > >> changed, 4 insertions(+), 67 deletions(-) delete mode 100644 > >> include/standard-headers/asm-s390/kvm_virtio.h > > I think this becomes relevant only when someone does a headers update against 4.15+, which is unlikely to happen in freeze. So I'll queue this to s390-next only.
Re: [Qemu-devel] [PATCH v1] ps2: check PS2Queue pointers in post_load routine
On 16/11/2017 08:51, P J P wrote: > From: Prasad J Pandit > > During Qemu guest migration, a destination process invokes ps2 > post_load function. In that, if 'rptr' and 'count' values were > invalid, it could lead to OOB access or infinite loop issue. > Add check to avoid it. > > Reported-by: Cyrille Chatras > Signed-off-by: Prasad J Pandit > --- > hw/input/ps2.c | 21 + > 1 file changed, 9 insertions(+), 12 deletions(-) > > Update: avoid changing field type, add range check on values > -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02815.html > > diff --git a/hw/input/ps2.c b/hw/input/ps2.c > index f388a23c8e..de171a28dd 100644 > --- a/hw/input/ps2.c > +++ b/hw/input/ps2.c > @@ -1225,24 +1225,21 @@ static void ps2_common_reset(PS2State *s) > static void ps2_common_post_load(PS2State *s) > { > PS2Queue *q = &s->queue; > -int size; > -int i; > -int tmp_data[PS2_QUEUE_SIZE]; > +uint8_t i, size; > +uint8_t tmp_data[PS2_QUEUE_SIZE]; Hi Prasad, you don't need to change the invalid values to sane ones. Instead, make ps2_common_post_load return an int (just like the .post_load member of VMStateDescription). You can then detect out of range count/rptr/wptr and return -1 for bad indices. In the callers, ps2_common_post_load(ps2); return 0; can then become simply return ps2_common_post_load(ps2); Thanks, Paolo > /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */ > -size = q->count > PS2_QUEUE_SIZE ? 0 : q->count; > +size = (q->count < 0 || q->count > PS2_QUEUE_SIZE) ? 0 : q->count; > > /* move the queue elements to the start of data array */ > -if (size > 0) { > -for (i = 0; i < size; i++) { > -/* move the queue elements to the temporary buffer */ > -tmp_data[i] = q->data[q->rptr]; > -if (++q->rptr == 256) { > -q->rptr = 0; > -} > +for (i = 0; i < size; i++) { > +if (q->rptr < 0 || q->rptr >= sizeof(q->data)) { > +q->rptr = 0; > } > -memcpy(q->data, tmp_data, size); > +tmp_data[i] = q->data[q->rptr++]; > } > +memcpy(q->data, tmp_data, size); > + > /* reset rptr/wptr/count */ > q->rptr = 0; > q->wptr = size; >
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017/11/16 16:54, Jason Wang wrote: > > > On 2017年11月16日 13:53, Longpeng (Mike) wrote: >> On 2017/11/15 23:54, Longpeng(Mike) wrote: >>> 2017-11-15 23:05 GMT+08:00 Jason Wang: On 2017年11月15日 22:55, Longpeng(Mike) wrote: > Hi guys, > > We got a BUG report from our testers yesterday, the testing scenario was > migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 > queues*). > > We found the cause reason, and we'll report the BUG or send a fix patch > to upstream if necessary( we haven't test the upstream yet, sorry... ). Could you explain this a little bit more? > We want to know why the vhost_net_start() must start*total queues* ( in > our > VM there're 7 queues ) but not*the queues that current used* ( in our VM, > guest > only uses the first 4 queues because it's limited by the number of vcpus) > ? > > Looking forward to your help, thx:) Since the codes have been there for years and works well for kernel datapath. You should really explain what's wrong. >>> OK.:) >>> >>> In our scenario, the Windows's virtio-net driver only use the first 4 >>> queues and it >>> *only set desc/avail/used table for the first 4 queues*, so in QEMU >>> the desc/avail/ >>> used of the last 3 queues are ZERO, but unfortunately... >>> ''' >>> vhost_net_start >>>for (i = 0; i < total_queues; i++) >>> vhost_net_start_one >>>vhost_dev_start >>> vhost_virtqueue_start >>> ''' >>> In vhost_virtqueue_start(), it will calculate the HVA of >>> desc/avail/used table, so for last >>> 3 queues, it will use ZERO as the GPA to calculate the HVA, and then >>> send the results >>> to the user-mode backend ( we use*vhost-user* ) by >>> vhost_virtqueue_set_addr(). >>> >>> When the EVS get these address, it will update a*idx* which will be >>> treated as vq's >>> last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). >>> >>> So we get the following result after virtio-net stop: >>>the desc/avail/used of the last 3 queues's vqs are all ZERO, but these >>> vqs's >>>last_avail_idx is NOT ZERO. >>> >>> At last, virtio_load() reports an error: >>> ''' >>>if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- >>> will be TRUE >>>error_report("VQ %d address 0x0 " >>> "inconsistent with Host index 0x%x", >>> i, vdev->vq[i].last_avail_idx); >>> return -1; >>> } >>> ''' >>> >>> BTW, the problem won't appear if use Linux guest, because the Linux >>> virtio-net >>> driver will set all 7 queues's desc/avail/used tables. And the problem >>> won't appear >>> if the VM use vhost-net, because vhost-net won't update*idx* in SET_ADDR >>> ioctl. > > Just to make sure I understand here, I thought Windows guest + vhost_net hit > this issue? > Windows guest + vhost-user hit. Windows guest + vhost-net is fine. ''' In vhost_virtqueue_start(), it will calculate the HVA of desc/avail/used tables, so for last 3 queues, it will use ZERO as the GPA to calculate the HVA, and then send the results to the user-mode backend ( we use *vhost-user* ) by vhost_virtqueue_set_addr(). ''' I think this is the root cause, it is strange, right ? > Thanks > >>> >>> Sorry for my pool English, Maybe I could describe the problem in Chinese >>> for you >>> in private if necessary. >>> >>> Thanks >> -- Regards, Longpeng(Mike) > > > . > -- Regards, Longpeng(Mike)
Re: [Qemu-devel] sheepdog block driver and read write error policy
2017-11-16 11:27 GMT+03:00 Fam Zheng : > On Thu, 11/16 11:11, Vasiliy Tolstov wrote: >> Hi. I'm try to write own sheepdog compatible daemon and test it with qemu. >> Sometimes ago in qemu added read write error policy to allow to stop >> domain or continue or something else. As i see in case of sheepdog >> this policy is ignored and qemu try to reconnect to sheepdog daemon. >> If nobody wants i can try to fix this, and if policy is not specified >> work like now. >> Where i need to start to easy understand how this works in case of file raw ? > > The driver callbacks (sd_co_readv/sd_co_writev) should simply return error > instead of retrying. > Thanks, how can i pass options to block driver? (as i understand read/write policy affects concrete hardware - scsi/ide and need to be passed down to lower level) Also about sheepdog driver, i'm re-read discussion https://patchwork.ozlabs.org/patch/501533/ and have a question - if all logic about overlapping requests and oids calculation go from qemu to sheepdog daemon, does this be slowdown or not in case of iops? (I think that if qemu driver only read/write/discard some block data it can be more simplify and have less errors, and sheepdog daemon can handle internally all overlapping and inode updates) I'm understand that this may be completely new sheepdog qemu driver, but i think that if sheepdog driver is more like nbd/rbd it can be more simple... -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] [Qemu-block] [PATCH v2] qapi: block-core: Clarify events emitted by 'block-job-cancel'
On Wed, Nov 15, 2017 at 04:56:13PM -0500, John Snow wrote: > > > On 11/15/2017 04:54 PM, Kashyap Chamarthy wrote: > > On Wed, Nov 15, 2017 at 02:15:57PM -0500, John Snow wrote: [...] > >> is it covered sufficiently in live-block-operations.rst ? > > > > I looked in there[2] too. Short answer: no. Long: In the "Live disk > > synchronization — drive-mirror and blockdev-mirror" section, I simply > > seemed to declare: > > > > "Issuing the command ``block-job-cancel`` after it emits the event > > ``BLOCK_JOB_CANCELLED``" > > > > As if that's the *only* event it emits, which is clearly not the case. > > So while at it, wonder if should I also update it > > ('live-block-operations.rst') too. > > > > It's an interesting gotcha that I wasn't really acutely aware of myself, > so having it in the doc format for API programmers who aren't > necessarily digging through our source sounds like a pleasant courtesy. Indeed, will do. (Just for my own clarity, did you imply: don't update it in block-core.json? FWIW, my first instinct is to check the QAPI documentation for such things, that's why I wrote there first :-)) Thanks for looking. [...] -- /kashyap
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017年11月16日 17:01, Gonglei (Arei) wrote: No, Windows guest + vhost-user/DPDK. BTW pls see virtio spec in : "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be used SHOULD be populated with receive buffers." It is not mandatory that all queues must be initialized. I think not, since it said we should fill receive buffers for each queue which means we should initialize all queues. May Michael can clarify on this. Thanks Thanks, -Gonglei
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
Hi Jason, On 2017/11/16 17:13, Jason Wang wrote: > > > On 2017年11月16日 17:01, Gonglei (Arei) wrote: >> No, Windows guest + vhost-user/DPDK. >> >> BTW pls see virtio spec in : >> >> "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will >> be used SHOULD be populated >> with receive buffers." >> >> It is not mandatory that all queues must be initialized. > > I think not, since it said we should fill receive buffers for each queue which > means we should initialize all queues. May Michael can clarify on this. > I think this doesn't matter, but QEMU should consider this scenario... For example, if one queues isn't initialized (Windows guest), the vring.avail=0, so vq->desc_phys=0, then vq->desc='a avail HVA'(which is the start addr of pc.ram). vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx); vq->desc_phys = a = virtio_queue_get_desc_addr(vdev, idx); vq->desc = vhost_memory_map(dev, a, &l, 0); if (!vq->desc || l != s) { r = -ENOMEM; goto fail_alloc_desc; } . r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled); if (r < 0) { r = -errno; goto fail_alloc; } Then the HVA is send to the vhost-user. I think this is wrong, because the '0' here means guest driver doesn't init this queues, it should not be used to calculate the HVA for this vq. > Thanks > >> >> Thanks, >> -Gonglei >> > > > > > . > -- Regards, Longpeng(Mike)
Re: [Qemu-devel] [PATCH v3] xen-disk: use an IOThread per instance
> -Original Message- > From: Stefano Stabellini [mailto:sstabell...@kernel.org] > Sent: 16 November 2017 01:11 > To: Paul Durrant > Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Stefano > Stabellini ; Anthony Perard > ; Kevin Wolf ; Max Reitz > > Subject: RE: [PATCH v3] xen-disk: use an IOThread per instance > > On Wed, 15 Nov 2017, Paul Durrant wrote: > > Anthony, Stefano, > > > > Ping? > > Acked-by: Stefano Stabellini > > Unless Anthony or somebody else object, I'll queue it up in my "next" > branch (which I'll send upstream after 2.11 is out). Great. Thanks, Paul > > Cheers, > > Stefano > > > > > -Original Message- > > > From: Paul Durrant [mailto:paul.durr...@citrix.com] > > > Sent: 07 November 2017 10:47 > > > To: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org > > > Cc: Paul Durrant ; Stefano Stabellini > > > ; Anthony Perard ; > > > Kevin Wolf ; Max Reitz > > > Subject: [PATCH v3] xen-disk: use an IOThread per instance > > > > > > This patch allocates an IOThread object for each xen_disk instance and > > > sets the AIO context appropriately on connect. This allows processing > > > of I/O to proceed in parallel. > > > > > > The patch also adds tracepoints into xen_disk to make it possible to > > > follow the state transtions of an instance in the log. > > > > > > Signed-off-by: Paul Durrant > > > --- > > > Cc: Stefano Stabellini > > > Cc: Anthony Perard > > > Cc: Kevin Wolf > > > Cc: Max Reitz > > > > > > v3: > > > - Use new iothread_create/destroy() functions > > > > > > v2: > > > - explicitly acquire and release AIO context in qemu_aio_complete() and > > >blk_bh() > > > --- > > > hw/block/trace-events | 7 +++ > > > hw/block/xen_disk.c | 53 > > > --- > > > 2 files changed, 53 insertions(+), 7 deletions(-) > > > > > > diff --git a/hw/block/trace-events b/hw/block/trace-events > > > index cb6767b3ee..962a3bfa24 100644 > > > --- a/hw/block/trace-events > > > +++ b/hw/block/trace-events > > > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *vdev, void *mrb, > int > > > start, int num_reqs, uint6 > > > # hw/block/hd-geometry.c > > > hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p > > > LCHS %d %d %d" > > > hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t > secs, > > > int trans) "blk %p CHS %u %u %u trans %d" > > > + > > > +# hw/block/xen_disk.c > > > +xen_disk_alloc(char *name) "%s" > > > +xen_disk_init(char *name) "%s" > > > +xen_disk_connect(char *name) "%s" > > > +xen_disk_disconnect(char *name) "%s" > > > +xen_disk_free(char *name) "%s" > > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c > > > index e431bd89e8..f74fcd42d1 100644 > > > --- a/hw/block/xen_disk.c > > > +++ b/hw/block/xen_disk.c > > > @@ -27,10 +27,12 @@ > > > #include "hw/xen/xen_backend.h" > > > #include "xen_blkif.h" > > > #include "sysemu/blockdev.h" > > > +#include "sysemu/iothread.h" > > > #include "sysemu/block-backend.h" > > > #include "qapi/error.h" > > > #include "qapi/qmp/qdict.h" > > > #include "qapi/qmp/qstring.h" > > > +#include "trace.h" > > > > > > /* - */ > > > > > > @@ -125,6 +127,9 @@ struct XenBlkDev { > > > DriveInfo *dinfo; > > > BlockBackend*blk; > > > QEMUBH *bh; > > > + > > > +IOThread*iothread; > > > +AioContext *ctx; > > > }; > > > > > > /* - */ > > > @@ -596,9 +601,12 @@ static int ioreq_runio_qemu_aio(struct ioreq > > > *ioreq); > > > static void qemu_aio_complete(void *opaque, int ret) > > > { > > > struct ioreq *ioreq = opaque; > > > +struct XenBlkDev *blkdev = ioreq->blkdev; > > > + > > > +aio_context_acquire(blkdev->ctx); > > > > > > if (ret != 0) { > > > -xen_pv_printf(&ioreq->blkdev->xendev, 0, "%s I/O error\n", > > > +xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n", > > >ioreq->req.operation == BLKIF_OP_READ ? "read" : > > > "write"); > > > ioreq->aio_errors++; > > > } > > > @@ -607,10 +615,10 @@ static void qemu_aio_complete(void *opaque, > int > > > ret) > > > if (ioreq->presync) { > > > ioreq->presync = 0; > > > ioreq_runio_qemu_aio(ioreq); > > > -return; > > > +goto done; > > > } > > > if (ioreq->aio_inflight > 0) { > > > -return; > > > +goto done; > > > } > > > > > > if (xen_feature_grant_copy) { > > > @@ -647,16 +655,19 @@ static void qemu_aio_complete(void *opaque, > int > > > ret) > > > } > > > case BLKIF_OP_READ: > > > if (ioreq->status == BLKIF_RSP_OKAY) { > > > -block_acct_done(blk_get_stats(ioreq->blkdev->blk), &ioreq- > >acct); > > > +block_acct_done(blk_get_stats(blkdev->blk), &ioreq->acct); > > > } els
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Tue, Nov 14, 2017 at 01:06:17PM -0600, Eric Blake wrote: > On 11/14/2017 11:37 AM, Wouter Verhelst wrote: > > On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote: > >> Another thought - with structured replies, we finally have a way to let > >> the client ask for the server to send resize information whenever the > >> server wants, rather than having to be polled by a new client request > >> all the time. This is possible by having the server reply with a chunk > >> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants, > >> (that is, the server never officially ends the response to the single > >> client request for on-going status, until the client sends an > >> NBD_CMD_DISC). > > > > Hrm, yeah, that could work. > > > > Minor downside of this would be that a client would now be expected to > > continue listening "forever" (probably needs to do a blocking read() or > > a select() on the socket), whereas with the current situation a client > > could get away with only reading for as long as it expects data. > > > > I don't think that should be a blocker, but it might be something we > > might want to document. > > > >> I don't think the server should go into this mode without a flag bit > >> from the client requesting it (as it potentially ties up a thread that > >> could otherwise be used for parallel processing of other requests), > > > > Yeah. I think we should probably initiate this with a BLOCK_STATUS > > message that has a flag with which we mean "don't stop sending data on > > the given region for contexts that support it". > > Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE; Eh, right -- I had forgotten about RESIZE, actually :-) > I was thinking of the open-ended command for being informed of all > server-side-initiated size changes in response to RESIZE; but your mention of > an open-ended BLOCK_STATUS has an interesting connotation of being able to > get live updates as sections of a file are dirtied. For instance, or whatever other metadata we end up sending through BLOCK_STATUS. > I also remember from talking with Vladimir during KVM Forum last month > that one of the shortfalls of the NBD protocol is that you can only ever > send a length of up to 32 bits on the command side (unless we introduce > structured commands in addition to our current work to add structured > replies); Yes, and I'm thinking we should do so. This will obviously require more negotiation. Can be done fairly easily though: - Client negotiates structured replies (don't think it makes sense to do structured requests without structured replies) - Server sets an extra transmission flag to say "I am capable of receiving extended requests" - Extended requests have a different magic number, and should have a "request length" field as well. I'm thinking we make it: magic (32b) request length (16b) type (16b) flags (64b) handle (64b) from (64b) data length(64b) (extra data) Request length in this proposal should always be at least 320. I made flags 64 bits rather than 16 as per the current format, because that way everything is aligned on a 4-byte boundary, which makes things a bit easier on some architectures (e.g., I know that sparc doesn't like unaligned 64-bit access). 64 bits for flags looks like a bit of a waste, but then if we're going to waste some bits somewhere, I guess it's best to assign them to flags. The idea is that "request length" is the length of the data that the client is sending, and "data length" is the length of the range that we're trying to deal with. A write request would thus have to have request length be (data length + 320); a read request would have request length be 320, and expect data to be returned of data length bytes. A metadata request could then tack on extra data, where request length of 320 implies "all negotiated metadata contexts", but anything more than that would imply there are some metadata IDs passed along. etc. Thoughts? [...] > > However, I could imagine that there might be some cases wherein a server > > might be able to go into such a mode for two or more metadata contexts, > > and where a client might want to go into that mode for one of them but > > not all of them, while still wanting some information from them. > > > > This could be covered with metadata context syntax, but it's annoying > > and shouldn't be necessary. > > > > I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS > > can't take a metadata context ID. Okay, there's no space for it, but > > that shouldn't have been a blocker. > > > > Thoughts? > > Nothing says the server has to reply the same length of information when > replying for multiple selected metadata contexts; but if we allow > different reply sizes all in one query, we may also need some way to > easily tell that the server has stopped sending metadata for one context > even though it is still providing additional replie
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
Here's my solution, as a nbdkit plugin written in Perl. As with Max's solution I don't bother to parse the virtual size out of the XML file, so you need to specify that on the command line otherwise the disk will be truncated to the largest extent stored in the file. Also the ‘.xva’ file must not be compressed. $ nbdkit perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 $ guestfish --ro --format=raw -a nbd://localhost -i Welcome to guestfish, the guest filesystem shell for editing virtual machine filesystems and disk images. Type: 'help' for help on commands 'man' to read the manual 'quit' to quit the shell Operating system: 8.2 /dev/sda1 mounted on / > ll / total 100 drwxr-xr-x 22 root root 4096 Jan 8 2016 . drwxr-xr-x 19 root root 4096 Nov 16 09:50 .. drwxrwxr-x 2 root root 4096 Jan 8 2016 bin drwxr-xr-x 3 root root 4096 Jan 8 2016 boot drwxr-xr-x 4 root root 4096 Jan 8 2016 dev drwxr-xr-x 87 root root 4096 Jan 8 2016 etc drwxr-xr-x 3 root root 4096 Jan 8 2016 home lrwxrwxrwx 1 root root31 Jan 8 2016 initrd.img -> /boot/initrd.img-3.16.0-4-amd64 drwxr-xr-x 14 root root 4096 Jan 8 2016 lib drwxr-xr-x 2 root root 4096 Jan 8 2016 lib64 drwx-- 2 root root 16384 Jan 8 2016 lost+found drwxr-xr-x 3 root root 4096 Jan 8 2016 media drwxr-xr-x 2 root root 4096 Jan 8 2016 mnt drwxr-xr-x 2 root root 4096 Jan 8 2016 opt drwxr-xr-x 2 root root 4096 May 4 2015 proc drwx-- 2 root root 4096 Jan 8 2016 root drwxr-xr-x 2 root root 4096 Jan 8 2016 run drwxr-xr-x 2 root root 4096 Jan 8 2016 sbin drwxr-xr-x 2 root root 4096 Jan 8 2016 srv drwxr-xr-x 2 root root 4096 Apr 6 2015 sys drwxrwxrwt 7 root root 4096 Jan 8 2016 tmp drwxr-xr-x 10 root root 4096 Jan 8 2016 usr drwxr-xr-x 11 root root 4096 Jan 8 2016 var lrwxrwxrwx 1 root root27 Jan 8 2016 vmlinuz -> boot/vmlinuz-3.16.0-4-amd64 I even managed to boot the Debian 8 guest from the sample .xva file: $ qemu-system-x86_64 -cpu host -machine accel=kvm:tcg -m 2048 -drive file=nbd:localhost:10809,format=raw,if=virtio,snapshot=on although it was pretty slow ... As mentioned before you can use this to do a qemu-img convert using captive nbdkit: $ nbdkit -U - \ perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \ --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p' HTH, Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html xva-reader.pl Description: Perl program
Re: [Qemu-devel] [PATCH 0/5] compressed block-stream
On 16/11/2017 6:26 AM, Fam Zheng wrote: On Tue, 11/14 13:16, Anton Nefedov wrote: It might be useful to compress images during block-stream; this way the user can merge compressed images of a backing chain and the result will remain compressed. I haven't looked at the patches yet so maybe the answer is obvious, but still: would the "compress" option be still necessary if what we have is blockdev-stream instead? Fam Sorry, I can't see blockdev-stream available now. How that would be different? Would it imply the blocks are pulled up to the top (active) image? Or did you mean block-commit? The option might be useful there as well, and needs to be implemented separately /Anton
Re: [Qemu-devel] [PATCH 3/5] block: support compressed write for copy-on-read
On 15/11/2017 9:49 PM, Max Reitz wrote: On 2017-11-14 11:16, Anton Nefedov wrote: Signed-off-by: Anton Nefedov --- block/io.c | 30 -- block/trace-events | 2 +- 2 files changed, 25 insertions(+), 7 deletions(-) diff --git a/block/io.c b/block/io.c index 3d5ef2c..93c6b24 100644 --- a/block/io.c +++ b/block/io.c [...] @@ -1209,6 +1220,13 @@ int coroutine_fn bdrv_co_preadv(BdrvChild *child, return ret; } +/* write compressed only makes sense with copy on read */ +if ((flags & BDRV_REQ_WRITE_COMPRESSED) && +!(flags & BDRV_REQ_COPY_ON_READ)) +{ +return -EINVAL; +} + I think the assertion in bdrv_aligned_preadv() should be enough, but either way: Reviewed-by: Max Reitz Ok, and it will fail more loudly. Will remove. bdrv_inc_in_flight(bs); /* Don't do copy-on-read if we read data before write operation */
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
2017-11-16 11:01 GMT+01:00 Richard W.M. Jones : > As mentioned before you can use this to do a qemu-img convert using > captive nbdkit: > > $ nbdkit -U - \ > perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \ > --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p' What if XVA is hosting 2 or more partitions ? Which partition are you converting with this command ?
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
2017-11-15 23:55 GMT+01:00 Max Reitz : > https://xanclic.moe/convert-xva.rb -- does this work? > (It seems to works on the two example images I found...) > > An example is in the code, you use it like this: > > $ ./convert-xva.rb ~/Downloads/stats-appliance-2.36.020502.xva Ref:73 It doesn't work on huge images: # time convert-xva.rb 20171115_193814_efff_.xva Ref:10 -O qcow2 disk1.qcow2 /usr/bin/convert-xva.rb:119:in `exec': Argument list too long - qemu-img (Errno::E2BIG) from /usr/bin/convert-xva.rb:119:in `' real 9m41.414s user 0m19.280s sys 0m3.260s
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
On Thu, Nov 16, 2017 at 11:07:10AM +0100, Gandalf Corvotempesta wrote: > 2017-11-16 11:01 GMT+01:00 Richard W.M. Jones : > > As mentioned before you can use this to do a qemu-img convert using > > captive nbdkit: > > > > $ nbdkit -U - \ > > perl script=./xva-reader.pl file=./debian8cloud.xva size=4294967296 \ > > --run 'qemu-img convert -f raw $nbd -O qcow2 /var/tmp/output.qcow2 -p' > > > What if XVA is hosting 2 or more partitions ? Which partition are you > converting with this command ? See "XXX" comments in the code, left as an exercise to the reader :-) Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Re: [Qemu-devel] [PATCH 4/5] block-stream: add compress option
On 15/11/2017 10:16 PM, Max Reitz wrote: On 2017-11-14 11:16, Anton Nefedov wrote: Signed-off-by: Anton Nefedov --- qapi/block-core.json | 4 include/block/block_int.h | 4 +++- block/stream.c| 16 blockdev.c| 13 - hmp.c | 2 ++ hmp-commands.hx | 4 ++-- 6 files changed, 35 insertions(+), 8 deletions(-) Looks good to me overall, two notes below. Max diff --git a/qapi/block-core.json b/qapi/block-core.json index ab96e34..b0d022f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2007,6 +2007,9 @@ # # @speed: the maximum speed, in bytes per second # +# @compress: true to compress data, if the target format supports it. +#(default: false). Since 2.12. +# Too many full stops (one before, one after the parentheses). Also, this makes it sound a bit like it would still be possible to specify this parameter even if the target doesn't support it (and it would just be ignored then), but that's not true. Also, the format most parameter documentation follows for block-stream is more "@parameter: Description. Default. (Since version)". So I'd suggest: "true to compress data; may only be set if the target format supports it. Defaults to false if omitted. (Since 2.12)" But I won't argue too much over the format, so the following is OK, too: "true to compress data; may only be set if the target format supports it (default: false). (Since 2.12)" Thanks, will fix. [..] @@ -3034,11 +3039,17 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device, goto out; } +if (compress && bs->drv->bdrv_co_pwritev_compressed == NULL) { +error_setg(errp, "Compression is not supported for this drive %s", + bdrv_get_device_name(bs)); I think just device instead of bdrv_get_device_name(bs) is better (or bdrv_get_device_or_node_name(bs) at least). device should be enough indeed, done. /Anton
[Qemu-devel] [PATCH v2] ps2: check PS2Queue indices in post_load routine
From: Prasad J Pandit During Qemu guest migration, a destination process invokes ps2 post_load function. In that, if 'rptr' and 'count' values were invalid, it could lead to OOB access issue. Add check to avoid it. Reported-by: Cyrille Chatras Signed-off-by: Prasad J Pandit --- hw/input/ps2.c | 39 ++- 1 file changed, 18 insertions(+), 21 deletions(-) Update: return -1(bad indices) error, instead of correcting values -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02964.html diff --git a/hw/input/ps2.c b/hw/input/ps2.c index f388a23c8e..35392d55fc 100644 --- a/hw/input/ps2.c +++ b/hw/input/ps2.c @@ -1222,32 +1222,37 @@ static void ps2_common_reset(PS2State *s) s->update_irq(s->update_arg, 0); } -static void ps2_common_post_load(PS2State *s) +static int ps2_common_post_load(PS2State *s) { PS2Queue *q = &s->queue; int size; int i; int tmp_data[PS2_QUEUE_SIZE]; +if (q->count < 0 || q->rptr < 0 || q->rptr >= sizeof(q->data)) { +return -1; +} + /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */ size = q->count > PS2_QUEUE_SIZE ? 0 : q->count; /* move the queue elements to the start of data array */ -if (size > 0) { -for (i = 0; i < size; i++) { -/* move the queue elements to the temporary buffer */ -tmp_data[i] = q->data[q->rptr]; -if (++q->rptr == 256) { -q->rptr = 0; -} +for (i = 0; i < size; i++) { +/* move the queue elements to the temporary buffer */ +tmp_data[i] = q->data[q->rptr]; +if (++q->rptr == 256) { +q->rptr = 0; } -memcpy(q->data, tmp_data, size); } +memcpy(q->data, tmp_data, size); + /* reset rptr/wptr/count */ q->rptr = 0; q->wptr = size; q->count = size; s->update_irq(s->update_arg, q->count != 0); + +return 0; } static void ps2_kbd_reset(void *opaque) @@ -1346,9 +1351,7 @@ static int ps2_kbd_post_load(void* opaque, int version_id) if (version_id == 2) s->scancode_set=2; -ps2_common_post_load(ps2); - -return 0; +return ps2_common_post_load(ps2); } static int ps2_kbd_pre_save(void *opaque) @@ -1356,9 +1359,7 @@ static int ps2_kbd_pre_save(void *opaque) PS2KbdState *s = (PS2KbdState *)opaque; PS2State *ps2 = &s->common; -ps2_common_post_load(ps2); - -return 0; +return ps2_common_post_load(ps2); } static const VMStateDescription vmstate_ps2_keyboard = { @@ -1386,9 +1387,7 @@ static int ps2_mouse_post_load(void *opaque, int version_id) PS2MouseState *s = (PS2MouseState *)opaque; PS2State *ps2 = &s->common; -ps2_common_post_load(ps2); - -return 0; +return ps2_common_post_load(ps2); } static int ps2_mouse_pre_save(void *opaque) @@ -1396,9 +1395,7 @@ static int ps2_mouse_pre_save(void *opaque) PS2MouseState *s = (PS2MouseState *)opaque; PS2State *ps2 = &s->common; -ps2_common_post_load(ps2); - -return 0; +return ps2_common_post_load(ps2); } static const VMStateDescription vmstate_ps2_mouse = { -- 2.13.6
Re: [Qemu-devel] [PATCH] colo-compare: fix the dangerous assignment
On Thu, Nov 16, 2017 at 10:28:32AM +0800, Mao Zhongyi wrote: Cc: Peter Maydell Cc: Jason Wang Cc: Zhang Chen Cc: Li Zhijian Cc: Paolo Bonzini Fixes: 8ec14402029d783720f4312ed8a925548e1dad61 Reported-by: Peter Maydell Reported-by: Paolo Bonzini Signed-off-by: Mao Zhongyi Code-wise, this looks like a valid fix to the existing code. Reviewed-by: Darren Kenny But testing wise, have you confirmed that things are behaving as you expected with the previous patch, since previously when calling colo_compare_connection(), the value of conn would have always been its initialized value of NULL. Just want to be sure that fixing this doesn't end up breaking your expected behaviour given that all your testing before would have had a NULL value in conn. Thanks, Darren. --- net/colo-compare.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index ccdcba2..1ce195f 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -179,7 +179,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) "drop packet"); } } -con = &conn; +*con = conn; return 0; } -- 2.9.4
Re: [Qemu-devel] [PATCH v1] ps2: check PS2Queue pointers in post_load routine
+-- On Thu, 16 Nov 2017, Paolo Bonzini wrote --+ | you don't need to change the invalid values to sane ones. Instead, make | ps2_common_post_load return an int (just like the .post_load member of | VMStateDescription). You can then detect out of range count/rptr/wptr | and return -1 for bad indices. | | In the callers, | | ps2_common_post_load(ps2); | | return 0; | | can then become simply | | return ps2_common_post_load(ps2); Okay; Sent a revised patch v2. Thank you. -- Prasad J Pandit / Red Hat Product Security Team 47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F
Re: [Qemu-devel] [PULL 08/10] NUMA: Enable adding NUMA node implicitly
On Wed, Nov 15, 2017 at 08:18:50PM +0200, Michael S. Tsirkin wrote: > From: Dou Liyang > > Linux and Windows need ACPI SRAT table to make memory hotplug work properly, > however currently QEMU doesn't create SRAT table if numa options aren't > present > on CLI. > > Which breaks both linux and windows guests in certain conditions: > * Windows: won't enable memory hotplug without SRAT table at all > * Linux: if QEMU is started with initial memory all below 4Gb and no SRAT > table >present, guest kernel will use nommu DMA ops, which breaks 32bit hw drivers >when memory is hotplugged and guest tries to use it with that drivers. > > Fix above issues by automatically creating a numa node when QEMU is started > with > memory hotplug enabled but without '-numa' options on CLI. > (PS: auto-create numa node only for new machine types so not to break > migration). > > Which would provide SRAT table to guests without explicit -numa options on CLI > and would allow: > * Windows: to enable memory hotplug > * Linux: switch to SWIOTLB DMA ops, to bounce DMA transfers to 32bit > allocated >buffers that legacy drivers/hw can handle. > > [Rewritten by Igor] Thanks for copying me on this. Acked-by: Thadeu Lima de Souza Cascardo > > Reported-by: Thadeu Lima de Souza Cascardo > Suggested-by: Igor Mammedov > Signed-off-by: Dou Liyang > Cc: Paolo Bonzini > Cc: Richard Henderson > Cc: Eduardo Habkost > Cc: "Michael S. Tsirkin" > Cc: Marcel Apfelbaum > Cc: Igor Mammedov > Cc: David Hildenbrand > Cc: Thomas Huth > Cc: Alistair Francis > Cc: Takao Indoh > Cc: Izumi Taku > Reviewed-by: Igor Mammedov > Reviewed-by: Michael S. Tsirkin > Signed-off-by: Michael S. Tsirkin > --- > include/hw/boards.h | 1 + > hw/i386/pc.c| 1 + > hw/i386/pc_piix.c | 1 + > hw/i386/pc_q35.c| 1 + > numa.c | 21 - > vl.c| 3 +-- > 6 files changed, 25 insertions(+), 3 deletions(-) > > diff --git a/include/hw/boards.h b/include/hw/boards.h > index 62f160e..156b16f 100644 > --- a/include/hw/boards.h > +++ b/include/hw/boards.h > @@ -197,6 +197,7 @@ struct MachineClass { > bool ignore_memory_transaction_failures; > int numa_mem_align_shift; > const char **valid_cpu_types; > +bool auto_enable_numa_with_memhp; > void (*numa_auto_assign_ram)(MachineClass *mc, NodeInfo *nodes, > int nb_nodes, ram_addr_t size); > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index fafe5ba..c3afe5b 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -2347,6 +2347,7 @@ static void pc_machine_class_init(ObjectClass *oc, void > *data) > mc->cpu_index_to_instance_props = pc_cpu_index_to_props; > mc->get_default_cpu_node_id = pc_get_default_cpu_node_id; > mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids; > +mc->auto_enable_numa_with_memhp = true; > mc->has_hotpluggable_cpus = true; > mc->default_boot_order = "cad"; > mc->hot_add_cpu = pc_hot_add_cpu; > diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c > index f79d5cb..5e47528 100644 > --- a/hw/i386/pc_piix.c > +++ b/hw/i386/pc_piix.c > @@ -446,6 +446,7 @@ static void pc_i440fx_2_10_machine_options(MachineClass > *m) > m->is_default = 0; > m->alias = NULL; > SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); > +m->auto_enable_numa_with_memhp = false; > } > > DEFINE_I440FX_MACHINE(v2_10, "pc-i440fx-2.10", NULL, > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index da3ea60..d606004 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -318,6 +318,7 @@ static void pc_q35_2_10_machine_options(MachineClass *m) > m->alias = NULL; > SET_MACHINE_COMPAT(m, PC_COMPAT_2_10); > m->numa_auto_assign_ram = numa_legacy_auto_assign_ram; > +m->auto_enable_numa_with_memhp = false; > } > > DEFINE_Q35_MACHINE(v2_10, "pc-q35-2.10", NULL, > diff --git a/numa.c b/numa.c > index 8d78d95..7151b24 100644 > --- a/numa.c > +++ b/numa.c > @@ -216,6 +216,7 @@ static void parse_numa_node(MachineState *ms, > NumaNodeOptions *node, > } > numa_info[nodenr].present = true; > max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1); > +nb_numa_nodes++; > } > > static void parse_numa_distance(NumaDistOptions *dist, Error **errp) > @@ -282,7 +283,6 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error > **errp) > if (err) { > goto end; > } > -nb_numa_nodes++; > break; > case NUMA_OPTIONS_TYPE_DIST: > parse_numa_distance(&object->u.dist, &err); > @@ -433,6 +433,25 @@ void parse_numa_opts(MachineState *ms) > exit(1); > } > > +/* > + * If memory hotplug is enabled (slots > 0) but without '-numa' > + * options explicitly on CLI, guestes will break. > + * > + * Windows: won't enable memory hotplug without SRAT table at all > + * > + * Linux: if QEMU is started with initial memory all bel
Re: [Qemu-devel] [PATCH v2] ps2: check PS2Queue indices in post_load routine
On 16/11/2017 11:16, P J P wrote: > From: Prasad J Pandit > > During Qemu guest migration, a destination process invokes ps2 > post_load function. In that, if 'rptr' and 'count' values were > invalid, it could lead to OOB access issue. Add check to avoid > it. > > Reported-by: Cyrille Chatras > Signed-off-by: Prasad J Pandit > --- > hw/input/ps2.c | 39 ++- > 1 file changed, 18 insertions(+), 21 deletions(-) > > Update: return -1(bad indices) error, instead of correcting values > -> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg02964.html > > diff --git a/hw/input/ps2.c b/hw/input/ps2.c > index f388a23c8e..35392d55fc 100644 > --- a/hw/input/ps2.c > +++ b/hw/input/ps2.c > @@ -1222,32 +1222,37 @@ static void ps2_common_reset(PS2State *s) > s->update_irq(s->update_arg, 0); > } > > -static void ps2_common_post_load(PS2State *s) > +static int ps2_common_post_load(PS2State *s) > { > PS2Queue *q = &s->queue; > int size; > int i; > int tmp_data[PS2_QUEUE_SIZE]; > > +if (q->count < 0 || q->rptr < 0 || q->rptr >= sizeof(q->data)) { > +return -1; > +} > + > /* set the useful data buffer queue size, < PS2_QUEUE_SIZE */ > size = q->count > PS2_QUEUE_SIZE ? 0 : q->count; > > /* move the queue elements to the start of data array */ > -if (size > 0) { > -for (i = 0; i < size; i++) { > -/* move the queue elements to the temporary buffer */ > -tmp_data[i] = q->data[q->rptr]; > -if (++q->rptr == 256) { > -q->rptr = 0; > -} > +for (i = 0; i < size; i++) { > +/* move the queue elements to the temporary buffer */ > +tmp_data[i] = q->data[q->rptr]; > +if (++q->rptr == 256) { > +q->rptr = 0; > } > -memcpy(q->data, tmp_data, size); > } > +memcpy(q->data, tmp_data, size); > + > /* reset rptr/wptr/count */ > q->rptr = 0; > q->wptr = size; > q->count = size; > s->update_irq(s->update_arg, q->count != 0); > + > +return 0; > } > > static void ps2_kbd_reset(void *opaque) > @@ -1346,9 +1351,7 @@ static int ps2_kbd_post_load(void* opaque, int > version_id) > if (version_id == 2) > s->scancode_set=2; > > -ps2_common_post_load(ps2); > - > -return 0; > +return ps2_common_post_load(ps2); > } > > static int ps2_kbd_pre_save(void *opaque) > @@ -1356,9 +1359,7 @@ static int ps2_kbd_pre_save(void *opaque) > PS2KbdState *s = (PS2KbdState *)opaque; > PS2State *ps2 = &s->common; > > -ps2_common_post_load(ps2); > - > -return 0; > +return ps2_common_post_load(ps2); > } > > static const VMStateDescription vmstate_ps2_keyboard = { > @@ -1386,9 +1387,7 @@ static int ps2_mouse_post_load(void *opaque, int > version_id) > PS2MouseState *s = (PS2MouseState *)opaque; > PS2State *ps2 = &s->common; > > -ps2_common_post_load(ps2); > - > -return 0; > +return ps2_common_post_load(ps2); > } > > static int ps2_mouse_pre_save(void *opaque) > @@ -1396,9 +1395,7 @@ static int ps2_mouse_pre_save(void *opaque) > PS2MouseState *s = (PS2MouseState *)opaque; > PS2State *ps2 = &s->common; > > -ps2_common_post_load(ps2); > - > -return 0; > +return ps2_common_post_load(ps2); > } > > static const VMStateDescription vmstate_ps2_mouse = { > Reviewed-by: Paolo Bonzini
Re: [Qemu-devel] [PATCH v8 10/14] migration: add postcopy migration of dirty bitmaps
15.11.2017 04:58, John Snow wrote: On 10/30/2017 12:33 PM, Vladimir Sementsov-Ogievskiy wrote: Postcopy migration of dirty bitmaps. Only named dirty bitmaps, associated with root nodes and non-root named nodes are migrated. If destination qemu is already containing a dirty bitmap with the same name as a migrated bitmap (for the same node), then, if their granularities are the same the migration will be done, otherwise the error will be generated. If destination qemu doesn't contain such bitmap it will be created. Signed-off-by: Vladimir Sementsov-Ogievskiy --- include/migration/misc.h | 3 + migration/migration.h | 3 + migration/block-dirty-bitmap.c | 734 + Ouch :\ migration/migration.c | 3 + migration/savevm.c | 2 + vl.c | 1 + migration/Makefile.objs| 1 + migration/trace-events | 14 + 8 files changed, 761 insertions(+) create mode 100644 migration/block-dirty-bitmap.c Organizationally, you introduce three new 'public' prototypes: dirty_bitmap_mig_init dirty_bitmap_mig_before_vm_start init_dirty_bitmap_incoming_migration mig_init is advertised in migration/misc.h, the other two are in migration/migration.h. The definitions for all three are in migration/block-dirty-bitmap.c In pure naivety, I find it weird to have something that you use in migration.c and advertised in migration.h actually exist separately in block-dirty-bitmap.c; but maybe this is the sanest thing to do. diff --git a/include/migration/misc.h b/include/migration/misc.h index c079b7771b..9cc539e232 100644 --- a/include/migration/misc.h +++ b/include/migration/misc.h @@ -55,4 +55,7 @@ bool migration_has_failed(MigrationState *); bool migration_in_postcopy_after_devices(MigrationState *); void migration_global_dump(Monitor *mon); +/* migration/block-dirty-bitmap.c */ +void dirty_bitmap_mig_init(void); + #endif diff --git a/migration/migration.h b/migration/migration.h index 50d1f01346..4e3ad04664 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -211,4 +211,7 @@ void migrate_send_rp_pong(MigrationIncomingState *mis, void migrate_send_rp_req_pages(MigrationIncomingState *mis, const char* rbname, ram_addr_t start, size_t len); +void dirty_bitmap_mig_before_vm_start(void); +void init_dirty_bitmap_incoming_migration(void); + #endif diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c new file mode 100644 index 00..53cb20045d --- /dev/null +++ b/migration/block-dirty-bitmap.c @@ -0,0 +1,734 @@ +/* + * Block dirty bitmap postcopy migration + * + * Copyright IBM, Corp. 2009 + * Copyright (c) 2016-2017 Parallels International GmbH + * + * Authors: + * Liran Schour + * Vladimir Sementsov-Ogievskiy + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * This file is derived from migration/block.c, so it's author and IBM copyright + * are here, although content is quite different. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + * + **** + * + * Here postcopy migration of dirty bitmaps is realized. Only named dirty + * bitmaps, associated with root nodes and non-root named nodes are migrated. Put another way, only QMP-addressable bitmaps. Nodes without a name that are not the root have no way to be addressed. + * + * If destination qemu is already containing a dirty bitmap with the same name "If the destination QEMU already contains a dirty bitmap with the same name" + * as a migrated bitmap (for the same node), then, if their granularities are + * the same the migration will be done, otherwise the error will be generated. "an error" + * + * If destination qemu doesn't contain such bitmap it will be created. "If the destination QEMU doesn't contain such a bitmap, it will be created." + * + * format of migration: + * + * # Header (shared for different chunk types) + * 1, 2 or 4 bytes: flags (see qemu_{put,put}_flags) + * [ 1 byte: node name size ] \ flags & DEVICE_NAME + * [ n bytes: node name ] / + * [ 1 byte: bitmap name size ] \ flags & BITMAP_NAME + * [ n bytes: bitmap name ] / + * + * # Start of bitmap migration (flags & START) + * header + * be64: granularity + * 1 byte: bitmap flags (corresponds to BdrvDirtyBitmap) + * bit 0- bitmap is enabled + * bit 1- bitmap is persistent + * bit 2- bitmap is autoloading + * bits 3-7 - reserved, must be zero + * + * # Complete of bitmap migration (flags & COMPLETE) + * header + * + * # Data chunk of bitmap migration + * header + * be64: start sector + * be32: number of sectors + * [ be64: buffer size ] \ ! (flags & ZEROES) + * [ n bytes: buffer] / + * + * The last chunk in stream should contain flags
Re: [Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl
On Thu, Nov 16, 2017 at 09:55:46AM +0100, Thomas Huth wrote: > The bios-tables-test was writing out files that we pass to iasl in > with the wrong endianness in the header when running on a big endian > host. So instead of storing mixed endian information in our structures, > let's keep everything in little endian and byte-swap it only when we > need a value in the code. > > Reported-by: Daniel P. Berrange > Buglink: https://bugs.launchpad.net/qemu/+bug/1724570 > Suggested-by: Michael S. Tsirkin > Signed-off-by: Thomas Huth > --- > tests/acpi-utils.h | 27 +-- > tests/bios-tables-test.c | 42 ++ > 2 files changed, 27 insertions(+), 42 deletions(-) This fixes bios-tables-test, but has broken vmgenid-tgst TEST: tests/vmgenid-test... (pid=8197) /i386/vmgenid/vmgenid/set-guid: ** ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia: assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT") FAIL GTester: last random seed: R02S1ab59ff8ca3a4e0f7ff4b8bbddb007f1 (pid=8204) /i386/vmgenid/vmgenid/set-guid-auto: ** ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia: assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT") FAIL GTester: last random seed: R02S1c3bcc9f459161566b19571adc5cb5d1 (pid=8216) Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-devel] [PATCH v5] nvme: Add tracing
I submitted it with git Nov 3 - the long lines issue with git-am should be resolved. Please let me know if there's still a problem. Thanks. On Fri, Nov 3, 2017 at 11:58 AM, Philippe Mathieu-Daudé wrote: > Cc'ing Trivial ;) > > On 11/03/2017 10:37 AM, Doug Gale wrote: >> Add trace output for commands, errors, and undefined behavior. >> Add guest error log output for undefined behavior. >> Report invalid undefined accesses to MMIO. >> Annotate unlikely error checks with unlikely. >> >> Signed-off-by: Doug Gale >> Reviewed-by: Philippe Mathieu-Daudé >> Reviewed-by: Stefan Hajnoczi >> --- >> changes v4 -> v5: add R-b tags >> hw/block/nvme.c | 349 >> ++ >> hw/block/trace-events | 93 ++ >> 2 files changed, 390 insertions(+), 52 deletions(-) >> >> diff --git a/hw/block/nvme.c b/hw/block/nvme.c >> index 441e21ed1f..4d98ed9fba 100644 >> --- a/hw/block/nvme.c >> +++ b/hw/block/nvme.c >> @@ -34,8 +34,17 @@ >> #include "qapi/visitor.h" >> #include "sysemu/block-backend.h" >> >> +#include "qemu/log.h" >> +#include "trace.h" >> #include "nvme.h" >> >> +#define NVME_GUEST_ERR(trace, fmt, ...) \ >> +do { \ >> +(trace_##trace)(__VA_ARGS__); \ >> +qemu_log_mask(LOG_GUEST_ERROR, #trace \ >> +" in %s: " fmt "\n", __func__, ## __VA_ARGS__); \ >> +} while (0) >> + >> static void nvme_process_sq(void *opaque); >> >> static void nvme_addr_read(NvmeCtrl *n, hwaddr addr, void *buf, int size) >> @@ -86,10 +95,14 @@ static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq) >> { >> if (cq->irq_enabled) { >> if (msix_enabled(&(n->parent_obj))) { >> +trace_nvme_irq_msix(cq->vector); >> msix_notify(&(n->parent_obj), cq->vector); >> } else { >> +trace_nvme_irq_pin(); >> pci_irq_pulse(&n->parent_obj); >> } >> +} else { >> +trace_nvme_irq_masked(); >> } >> } >> >> @@ -100,7 +113,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, >> QEMUIOVector *iov, uint64_t prp1, >> trans_len = MIN(len, trans_len); >> int num_prps = (len >> n->page_bits) + 1; >> >> -if (!prp1) { >> +if (unlikely(!prp1)) { >> +trace_nvme_err_invalid_prp(); >> return NVME_INVALID_FIELD | NVME_DNR; >> } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr && >> prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) { >> @@ -113,7 +127,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, >> QEMUIOVector *iov, uint64_t prp1, >> } >> len -= trans_len; >> if (len) { >> -if (!prp2) { >> +if (unlikely(!prp2)) { >> +trace_nvme_err_invalid_prp2_missing(); >> goto unmap; >> } >> if (len > n->page_size) { >> @@ -128,7 +143,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, >> QEMUIOVector *iov, uint64_t prp1, >> uint64_t prp_ent = le64_to_cpu(prp_list[i]); >> >> if (i == n->max_prp_ents - 1 && len > n->page_size) { >> -if (!prp_ent || prp_ent & (n->page_size - 1)) { >> +if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) >> { >> +trace_nvme_err_invalid_prplist_ent(prp_ent); >> goto unmap; >> } >> >> @@ -140,7 +156,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, >> QEMUIOVector *iov, uint64_t prp1, >> prp_ent = le64_to_cpu(prp_list[i]); >> } >> >> -if (!prp_ent || prp_ent & (n->page_size - 1)) { >> +if (unlikely(!prp_ent || prp_ent & (n->page_size - 1))) { >> +trace_nvme_err_invalid_prplist_ent(prp_ent); >> goto unmap; >> } >> >> @@ -154,7 +171,8 @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, >> QEMUIOVector *iov, uint64_t prp1, >> i++; >> } >> } else { >> -if (prp2 & (n->page_size - 1)) { >> +if (unlikely(prp2 & (n->page_size - 1))) { >> +trace_nvme_err_invalid_prp2_align(prp2); >> goto unmap; >> } >> if (qsg->nsg) { >> @@ -178,16 +196,20 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t >> *ptr, uint32_t len, >> QEMUIOVector iov; >> uint16_t status = NVME_SUCCESS; >> >> +trace_nvme_dma_read(prp1, prp2); >> + >> if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) { >> return NVME_INVALID_FIELD | NVME_DNR; >> } >> if (qsg.nsg > 0) { >> -if (dma_buf_read(ptr, len, &qsg)) { >> +if (unlikely(dma_buf_read(ptr, len, &qsg))) { >> +trace_nvme_err_invalid_dma(); >> status = NVME_INVALID_FIELD | NVME_DNR; >> } >> qemu_sglist_destroy(&qsg); >> } else { >> -if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) { >> +if (unlikely(qemu_iovec_t
[Qemu-devel] [Bug 1732671] [NEW] vnc websocket compatibility issue
Public bug reported: WebSocket support in VNC should allow access from VNC client through upgraded WebSocket connection. This feature is not working in IE 11/Edge with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc. The reason that IE 11/Edge fails to accept the connection upgrade is that the value equality of the `Upgrade` header field is checked in a strict case-sensitive manner in QEMU side, however, the IE/Edge does not send the exactly same string value `websocket` but a capital letter `W` instead. Defined in section 4.2.1 of rfc6455, the upgrade header field shall be treated case-insensitively. A patch shall be made in `io/channel-websock.c`, converting the value of upgrade string to lowercase before comparison is made with QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive comparison in the process. ** Affects: qemu Importance: Undecided Status: New ** Tags: vnc -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1732671 Title: vnc websocket compatibility issue Status in QEMU: New Bug description: WebSocket support in VNC should allow access from VNC client through upgraded WebSocket connection. This feature is not working in IE 11/Edge with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc. The reason that IE 11/Edge fails to accept the connection upgrade is that the value equality of the `Upgrade` header field is checked in a strict case-sensitive manner in QEMU side, however, the IE/Edge does not send the exactly same string value `websocket` but a capital letter `W` instead. Defined in section 4.2.1 of rfc6455, the upgrade header field shall be treated case-insensitively. A patch shall be made in `io/channel-websock.c`, converting the value of upgrade string to lowercase before comparison is made with QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive comparison in the process. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions
[Qemu-devel] [PATCH for-2.11] throttle-groups: forget timer and schedule next TGM on detach
tg->any_timer_armed[] must be cleared when detaching pending timers from the AioContext. Failure to do so leads to hung I/O because it looks like there are still timers pending when in fact they have been removed. Other ThrottleGroupMembers might have requests pending too so it's necessary to schedule the next TGM so it can set a timer. This patch fixes hung I/O when QEMU is launched with drives that are in the same throttling group: (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 & (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 & (qemu) stop (qemu) cont ...I/O is stuck... Signed-off-by: Stefan Hajnoczi --- block/throttle-groups.c | 12 1 file changed, 12 insertions(+) diff --git a/block/throttle-groups.c b/block/throttle-groups.c index 2587f19ca3..f26bcb5eee 100644 --- a/block/throttle-groups.c +++ b/block/throttle-groups.c @@ -593,13 +593,25 @@ void throttle_group_attach_aio_context(ThrottleGroupMember *tgm, void throttle_group_detach_aio_context(ThrottleGroupMember *tgm) { +ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts); ThrottleTimers *tt = &tgm->throttle_timers; +int i; /* Requests must have been drained */ assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0); assert(qemu_co_queue_empty(&tgm->throttled_reqs[0])); assert(qemu_co_queue_empty(&tgm->throttled_reqs[1])); +/* Kick off next ThrottleGroupMember, if necessary */ +qemu_mutex_lock(&tg->lock); +for (i = 0; i < 2; i++) { +if (timer_pending(tt->timers[i])) { +tg->any_timer_armed[i] = false; +schedule_next_request(tgm, i); +} +} +qemu_mutex_unlock(&tg->lock); + throttle_timers_detach_aio_context(tt); tgm->aio_context = NULL; } -- 2.13.6
Re: [Qemu-devel] [Qemu-block] [PATCH v4] throttle-groups: drain before detaching ThrottleState
On Mon, Nov 13, 2017 at 2:29 PM, Alberto Garcia wrote: > On Fri 10 Nov 2017 04:19:34 PM CET, Stefan Hajnoczi wrote: >> I/O requests hang after stop/cont commands at least since QEMU 2.10.0 >> with -drive iops=100: >> >> (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000 >> (qemu) stop >> (qemu) cont >> ...I/O is stuck... >> >> This happens because blk_set_aio_context() detaches the ThrottleState >> while requests may still be in flight: >> >> if (tgm->throttle_state) { >> throttle_group_detach_aio_context(tgm); >> throttle_group_attach_aio_context(tgm, new_context); >> } >> >> This patch encloses the detach/attach calls in a drained region so no >> I/O request is left hanging. Also add assertions so we don't make the >> same mistake again in the future. > > I'm wondering about the implications of this change... is it possible > now to bypass the I/O limits simply by stopping and quickly resuming the > VM? And is that a problem? bdrv_set_aio_context() already drains so this patch doesn't change existing behavior with respect to bypassing throttling. It's not ideal that certain VM lifecycle operations temporarily disable throttling, but it's a trade-off since synchronous drain is usually performance sensitive and should not take a long time. Perhaps there are ways to improve the situation, I haven't studied it in detail. Stefan
[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue
Which version of QEMU did you test this against ? It should be fixed in current GIT master AFAIK -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1732671 Title: vnc websocket compatibility issue Status in QEMU: Invalid Bug description: WebSocket support in VNC should allow access from VNC client through upgraded WebSocket connection. This feature is not working in IE 11/Edge with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc. The reason that IE 11/Edge fails to accept the connection upgrade is that the value equality of the `Upgrade` header field is checked in a strict case-sensitive manner in QEMU side, however, the IE/Edge does not send the exactly same string value `websocket` but a capital letter `W` instead. Defined in section 4.2.1 of rfc6455, the upgrade header field shall be treated case-insensitively. A patch shall be made in `io/channel-websock.c`, converting the value of upgrade string to lowercase before comparison is made with QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive comparison in the process. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions
Re: [Qemu-devel] [PATCH for-2.11] tests/bios-tables-test: Fix endianess problems when passing data to iasl
On 16.11.2017 11:54, Daniel P. Berrange wrote: > On Thu, Nov 16, 2017 at 09:55:46AM +0100, Thomas Huth wrote: >> The bios-tables-test was writing out files that we pass to iasl in >> with the wrong endianness in the header when running on a big endian >> host. So instead of storing mixed endian information in our structures, >> let's keep everything in little endian and byte-swap it only when we >> need a value in the code. >> >> Reported-by: Daniel P. Berrange >> Buglink: https://bugs.launchpad.net/qemu/+bug/1724570 >> Suggested-by: Michael S. Tsirkin >> Signed-off-by: Thomas Huth >> --- >> tests/acpi-utils.h | 27 +-- >> tests/bios-tables-test.c | 42 ++ >> 2 files changed, 27 insertions(+), 42 deletions(-) > > This fixes bios-tables-test, but has broken vmgenid-tgst > > TEST: tests/vmgenid-test... (pid=8197) > /i386/vmgenid/vmgenid/set-guid: ** > ERROR:/builddir/build/BUILD/qemu-2.10.0/tests/vmgenid-test.c:62:acpi_find_vgia: > assertion failed (ACPI_ASSERT_CMP_str == "RSDT"): ("" == "RSDT") > FAIL Bummer. Sorry, I've should have run the whole test suite afterwards, I only checked the bios-tables-test :-/ I'll send a v2 with a fix... Thomas
[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue
I think it should have been fixed in 33badfd. Sorry for the noise. ** Changed in: qemu Status: New => Invalid -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1732671 Title: vnc websocket compatibility issue Status in QEMU: Invalid Bug description: WebSocket support in VNC should allow access from VNC client through upgraded WebSocket connection. This feature is not working in IE 11/Edge with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc. The reason that IE 11/Edge fails to accept the connection upgrade is that the value equality of the `Upgrade` header field is checked in a strict case-sensitive manner in QEMU side, however, the IE/Edge does not send the exactly same string value `websocket` but a capital letter `W` instead. Defined in section 4.2.1 of rfc6455, the upgrade header field shall be treated case-insensitively. A patch shall be made in `io/channel-websock.c`, converting the value of upgrade string to lowercase before comparison is made with QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive comparison in the process. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017年11月16日 16:11, Yan Vugenfirer wrote: Hi Jason, Windows driver will initialise only the amount of queue based on the amount of available vCPUs. So if there will be more queues in the device than we have vCPUs on the guest, the driver will not initialise “excessive” queues. This is tied to the way RSS on Windows should be implemented. Exactly as in described scenario (7 queues, but only 4 vCPUs). I see but is there any chance that driver can use the left 3 queues (e .g cpu hotplug)? Thanks Best regards, Yan. On 16 Nov 2017, at 07:53, Longpeng (Mike) wrote: Hi Jason & Michael, Do you have any idea about this problem ? -- Regards, Longpeng(Mike) On 2017/11/15 23:54, Longpeng(Mike) wrote: 2017-11-15 23:05 GMT+08:00 Jason Wang : On 2017年11月15日 22:55, Longpeng(Mike) wrote: Hi guys, We got a BUG report from our testers yesterday, the testing scenario was migrating a VM (Windows guest, *4 vcpus*, 4GB, vhost-user net: *7 queues*). We found the cause reason, and we'll report the BUG or send a fix patch to upstream if necessary( we haven't test the upstream yet, sorry... ). Could you explain this a little bit more? We want to know why the vhost_net_start() must start *total queues* ( in our VM there're 7 queues ) but not *the queues that current used* ( in our VM, guest only uses the first 4 queues because it's limited by the number of vcpus) ? Looking forward to your help, thx :) Since the codes have been there for years and works well for kernel datapath. You should really explain what's wrong. OK. :) In our scenario, the Windows's virtio-net driver only use the first 4 queues and it *only set desc/avail/used table for the first 4 queues*, so in QEMU the desc/avail/ used of the last 3 queues are ZERO, but unfortunately... ''' vhost_net_start for (i = 0; i < total_queues; i++) vhost_net_start_one vhost_dev_start vhost_virtqueue_start ''' In vhost_virtqueue_start(), it will calculate the HVA of desc/avail/used table, so for last 3 queues, it will use ZERO as the GPA to calculate the HVA, and then send the results to the user-mode backend ( we use *vhost-user* ) by vhost_virtqueue_set_addr(). When the EVS get these address, it will update a *idx* which will be treated as vq's last_avail_idx when virtio-net stop ( pls see vhost_virtqueue_stop() ). So we get the following result after virtio-net stop: the desc/avail/used of the last 3 queues's vqs are all ZERO, but these vqs's last_avail_idx is NOT ZERO. At last, virtio_load() reports an error: ''' if (!vdev->vq[i].vring.desc && vdev->vq[i].last_avail_idx) { // <-- will be TRUE error_report("VQ %d address 0x0 " "inconsistent with Host index 0x%x", i, vdev->vq[i].last_avail_idx); return -1; } ''' BTW, the problem won't appear if use Linux guest, because the Linux virtio-net driver will set all 7 queues's desc/avail/used tables. And the problem won't appear if the VM use vhost-net, because vhost-net won't update *idx* in SET_ADDR ioctl. Sorry for my pool English, Maybe I could describe the problem in Chinese for you in private if necessary. Thanks -- Regards, Longpeng(Mike)
[Qemu-devel] [PULL 04/11] thread-posix: fix qemu_rec_mutex_trylock macro
From: "Emilio G. Cota" We never noticed because it has no users. Signed-off-by: Emilio G. Cota Message-Id: <1510273811-13419-1-git-send-email-c...@braap.org> Signed-off-by: Paolo Bonzini --- include/qemu/thread-posix.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h index f4296d31c4..f3f47e426f 100644 --- a/include/qemu/thread-posix.h +++ b/include/qemu/thread-posix.h @@ -7,7 +7,7 @@ typedef QemuMutex QemuRecMutex; #define qemu_rec_mutex_destroy qemu_mutex_destroy #define qemu_rec_mutex_lock qemu_mutex_lock -#define qemu_rec_mutex_try_lock qemu_mutex_try_lock +#define qemu_rec_mutex_trylock qemu_mutex_trylock #define qemu_rec_mutex_unlock qemu_mutex_unlock struct QemuMutex { -- 2.14.3
[Qemu-devel] [PULL 03/11] Makefile: simpler/faster "make help"
From: Philippe Mathieu-Daudé Using obscure black magic introduced in eaa2ddbb767 :) In an out-of-tree directory, running "../configure && make help" will generate some required files (.mak), then clone some submodules, compile at least the capstone submodule, generate QMP and Trace files, and finally display the help. On an outdated computer (Sun Blade workstation), running "make help" took more than 5h :) With this patch it took roughly 37min. Suggested-by: Daniel P. Berrange Signed-off-by: Philippe Mathieu-Daudé Message-Id: <20171108032052.20029-1-f4...@amsat.org> Reviewed-by: Laurent Vivier Reviewed-by: Stefan Hajnoczi Reviewed-by: Daniel P. Berrange Signed-off-by: Paolo Bonzini --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ec73acfa9a..143ac81736 100644 --- a/Makefile +++ b/Makefile @@ -6,7 +6,7 @@ BUILD_DIR=$(CURDIR) # Before including a proper config-host.mak, assume we are in the source tree SRC_PATH=. -UNCHECKED_GOALS := %clean TAGS cscope ctags docker docker-% +UNCHECKED_GOALS := %clean TAGS cscope ctags docker docker-% help # All following code might depend on configuration variables ifneq ($(wildcard config-host.mak),) -- 2.14.3
[Qemu-devel] [PULL 00/11] Miscellaneous patches for QEMU 2.11-rc2
The following changes since commit d24aaf2a2915424962fb101142f28fa4307f4740: Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2017-11-06 11:24:14 +) are available in the Git repository at: git://github.com/bonzini/qemu.git tags/for-upstream for you to fetch changes up to 4950b1a766a16bd3feef4c9471b9794e6fe0e1b2: fix scripts/update-linux-headers.sh here document (2017-11-15 16:27:01 +0100) Miscellaneous bugfixes Dariusz Stojaczyk (1): vhost-user-scsi: add missing virtqueue_size param Dr. David Alan Gilbert (1): ioapic/tracing: Remove last DPRINTFs Emilio G. Cota (1): thread-posix: fix qemu_rec_mutex_trylock macro Gerd Hoffmann (1): fix scripts/update-linux-headers.sh here document Max Reitz (1): util/stats64: Fix min/max comparisons Mike Nawrocki (1): Enable 8-byte wide MMIO for 16550 serial devices Paolo Bonzini (1): exec: Do not resolve subpage in mru_section Pavel Dovgalyuk (2): cpu-exec: don't overwrite exception_index cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay Philippe Mathieu-Daudé (1): Makefile: simpler/faster "make help" Wanpeng Li (1): target-i386: adds PV_TLB_FLUSH CPUID feature bit Makefile| 2 +- accel/tcg/cpu-exec.c| 99 - exec.c | 12 ++--- hw/char/serial.c| 8 +++- hw/intc/ioapic.c| 17 ++- hw/intc/trace-events| 5 ++- hw/scsi/vhost-user-scsi.c | 2 + include/qemu/thread-posix.h | 2 +- scripts/update-linux-headers.sh | 2 +- target/i386/cpu.c | 2 +- util/stats64.c | 4 +- 11 files changed, 81 insertions(+), 74 deletions(-) -- 2.14.3
[Qemu-devel] [PULL 02/11] ioapic/tracing: Remove last DPRINTFs
From: "Dr. David Alan Gilbert" Remove the last few DPRINTFs from hw/intc/ioapic.c and turn them into tracing. In one case it's a new trace, in the others it's just adding a parameter to the existing traces. Signed-off-by: Dr. David Alan Gilbert Message-Id: <20171102180310.24760-1-dgilb...@redhat.com> Reviewed-by: Peter Xu Reviewed-by: Stefan Hajnoczi Signed-off-by: Paolo Bonzini --- hw/intc/ioapic.c | 17 +++-- hw/intc/trace-events | 5 +++-- 2 files changed, 6 insertions(+), 16 deletions(-) diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c index 37c4386ae3..36139a4db6 100644 --- a/hw/intc/ioapic.c +++ b/hw/intc/ioapic.c @@ -35,15 +35,6 @@ #include "hw/i386/x86-iommu.h" #include "trace.h" -//#define DEBUG_IOAPIC - -#ifdef DEBUG_IOAPIC -#define DPRINTF(fmt, ...) \ -do { printf("ioapic: " fmt , ## __VA_ARGS__); } while (0) -#else -#define DPRINTF(fmt, ...) -#endif - #define APIC_DELIVERY_MODE_SHIFT 8 #define APIC_POLARITY_SHIFT 14 #define APIC_TRIG_MODE_SHIFT 15 @@ -157,7 +148,7 @@ static void ioapic_set_irq(void *opaque, int vector, int level) * to GSI 2. GSI maps to ioapic 1-1. This is not * the cleanest way of doing it but it should work. */ -DPRINTF("%s: %s vec %x\n", __func__, level ? "raise" : "lower", vector); +trace_ioapic_set_irq(vector, level); if (vector == 0) { vector = 2; } @@ -290,11 +281,10 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size) } } } -DPRINTF("read: %08x = %08x\n", s->ioregsel, val); break; } -trace_ioapic_mem_read(addr, size, val); +trace_ioapic_mem_read(addr, s->ioregsel, size, val); return val; } @@ -335,7 +325,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val, int index; addr &= 0xff; -trace_ioapic_mem_write(addr, size, val); +trace_ioapic_mem_write(addr, s->ioregsel, size, val); switch (addr) { case IOAPIC_IOREGSEL: @@ -345,7 +335,6 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val, if (size != 4) { break; } -DPRINTF("write: %08x = %08" PRIx64 "\n", s->ioregsel, val); switch (s->ioregsel) { case IOAPIC_REG_ID: s->id = (val >> IOAPIC_ID_SHIFT) & IOAPIC_ID_MASK; diff --git a/hw/intc/trace-events b/hw/intc/trace-events index b86f242b0f..b298fac7c6 100644 --- a/hw/intc/trace-events +++ b/hw/intc/trace-events @@ -18,8 +18,9 @@ apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x" ioapic_set_remote_irr(int n) "set remote irr for pin %d" ioapic_clear_remote_irr(int n, int vector) "clear remote irr for pin %d vector %d" ioapic_eoi_broadcast(int vector) "EOI broadcast for vector %d" -ioapic_mem_read(uint8_t addr, uint8_t size, uint32_t val) "ioapic mem read addr 0x%"PRIx8" size 0x%"PRIx8" retval 0x%"PRIx32 -ioapic_mem_write(uint8_t addr, uint8_t size, uint32_t val) "ioapic mem write addr 0x%"PRIx8" size 0x%"PRIx8" val 0x%"PRIx32 +ioapic_mem_read(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) "ioapic mem read addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" retval 0x%"PRIx32 +ioapic_mem_write(uint8_t addr, uint8_t regsel, uint8_t size, uint32_t val) "ioapic mem write addr 0x%"PRIx8" regsel: 0x%"PRIx8" size 0x%"PRIx8" val 0x%"PRIx32 +ioapic_set_irq(int vector, int level) "vector: %d level: %d" # hw/intc/slavio_intctl.c slavio_intctl_mem_readl(uint32_t cpu, uint64_t addr, uint32_t ret) "read cpu %d reg 0x%"PRIx64" = 0x%x" -- 2.14.3
[Qemu-devel] [PULL 09/11] util/stats64: Fix min/max comparisons
From: Max Reitz stat64_min_slow() and stat64_max_slow() compare the wrong way. This makes iotest 136 fail with clang and -m32. Signed-off-by: Max Reitz Message-Id: <20171114232223.25207-1-mre...@redhat.com> Signed-off-by: Paolo Bonzini --- util/stats64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/util/stats64.c b/util/stats64.c index 9968fcceac..389c365a9e 100644 --- a/util/stats64.c +++ b/util/stats64.c @@ -91,7 +91,7 @@ bool stat64_min_slow(Stat64 *s, uint64_t value) low = atomic_read(&s->low); orig = ((uint64_t)high << 32) | low; -if (orig < value) { +if (value < orig) { /* We have to set low before high, just like stat64_min reads * high before low. The value may become higher temporarily, but * stat64_get does not notice (it takes the lock) and the only ill @@ -120,7 +120,7 @@ bool stat64_max_slow(Stat64 *s, uint64_t value) low = atomic_read(&s->low); orig = ((uint64_t)high << 32) | low; -if (orig > value) { +if (value > orig) { /* We have to set low before high, just like stat64_max reads * high before low. The value may become lower temporarily, but * stat64_get does not notice (it takes the lock) and the only ill -- 2.14.3
[Qemu-devel] [PULL 06/11] vhost-user-scsi: add missing virtqueue_size param
From: Dariusz Stojaczyk Commit 5c0919d0 [1] introduced virtqueue_size parameter for common virtio-scsi path, without updaing the vhost-user-scsi code. vhost-user-scsi devices right now report size 0 for each vq. This patch introduces virtqueue_size param to vhost-user-scsi, that can now be set by the user. However, the most importantly, it now has a default value of 128 (same as QEMU's virtio-scsi). [1] 5c0919d0 ("virtio-scsi: Add virtqueue_size parameter allowing virtqueue size to be set.") Change-Id: I70e87eab702ebf1196c028dbf17d54fdc0c89a14 Signed-off-by: Dariusz Stojaczyk Message-Id: <1510676916-76409-1-git-send-email-dariuszx.stojac...@intel.com> Signed-off-by: Paolo Bonzini --- hw/scsi/vhost-user-scsi.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c index 500fa6a067..f7561e23fa 100644 --- a/hw/scsi/vhost-user-scsi.c +++ b/hw/scsi/vhost-user-scsi.c @@ -135,6 +135,8 @@ static Property vhost_user_scsi_properties[] = { DEFINE_PROP_CHR("chardev", VirtIOSCSICommon, conf.chardev), DEFINE_PROP_UINT32("boot_tpgt", VirtIOSCSICommon, conf.boot_tpgt, 0), DEFINE_PROP_UINT32("num_queues", VirtIOSCSICommon, conf.num_queues, 1), +DEFINE_PROP_UINT32("virtqueue_size", VirtIOSCSICommon, conf.virtqueue_size, + 128), DEFINE_PROP_UINT32("max_sectors", VirtIOSCSICommon, conf.max_sectors, 0x), DEFINE_PROP_UINT32("cmd_per_lun", VirtIOSCSICommon, conf.cmd_per_lun, 128), -- 2.14.3
[Qemu-devel] [PULL 01/11] Enable 8-byte wide MMIO for 16550 serial devices
From: Mike Nawrocki Some drivers for the PPMC7400 PowerPC evaluation board accesses the serial registers through the floating point unit (stfd/ldfd), which is an 8-byte wide access. This patch enables that behavior. Signed-off-by: Mike Nawrocki Message-Id: <20171106161039.32596-1-michael.nawro...@gtri.gatech.edu> Signed-off-by: Paolo Bonzini --- hw/char/serial.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/hw/char/serial.c b/hw/char/serial.c index 376bd2f240..eb72191ee7 100644 --- a/hw/char/serial.c +++ b/hw/char/serial.c @@ -1005,7 +1005,7 @@ static void serial_mm_write(void *opaque, hwaddr addr, uint64_t value, unsigned size) { SerialState *s = opaque; -value &= ~0u >> (32 - (size * 8)); +value &= 255; serial_ioport_write(s, addr >> s->it_shift, value, 1); } @@ -1014,16 +1014,22 @@ static const MemoryRegionOps serial_mm_ops[3] = { .read = serial_mm_read, .write = serial_mm_write, .endianness = DEVICE_NATIVE_ENDIAN, +.valid.max_access_size = 8, +.impl.max_access_size = 8, }, [DEVICE_LITTLE_ENDIAN] = { .read = serial_mm_read, .write = serial_mm_write, .endianness = DEVICE_LITTLE_ENDIAN, +.valid.max_access_size = 8, +.impl.max_access_size = 8, }, [DEVICE_BIG_ENDIAN] = { .read = serial_mm_read, .write = serial_mm_write, .endianness = DEVICE_BIG_ENDIAN, +.valid.max_access_size = 8, +.impl.max_access_size = 8, }, }; -- 2.14.3
[Qemu-devel] [PULL 05/11] target-i386: adds PV_TLB_FLUSH CPUID feature bit
From: Wanpeng Li Adds PV_TLB_FLUSH CPUID feature bit. Cc: Paolo Bonzini Cc: Radim KrÄmář Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Wanpeng Li Message-Id: <1510299947-11287-1-git-send-email-wanpeng...@hotmail.com> Signed-off-by: Paolo Bonzini --- target/i386/cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 6f21a5e518..ecebc5a70a 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -347,7 +347,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .feat_names = { "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock", "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt", -NULL, NULL, NULL, NULL, +NULL, "kvm-pv-tlb-flush", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, -- 2.14.3
Re: [Qemu-devel] [Question] why need to start all queues in vhost_net_start
On 2017年11月16日 17:32, Longpeng (Mike) wrote: Hi Jason, On 2017/11/16 17:13, Jason Wang wrote: On 2017年11月16日 17:01, Gonglei (Arei) wrote: No, Windows guest + vhost-user/DPDK. BTW pls see virtio spec in : "If VIRTIO_NET_F_MQ is negotiated, each of receiveq1. . .receiveqN that will be used SHOULD be populated with receive buffers." It is not mandatory that all queues must be initialized. I think not, since it said we should fill receive buffers for each queue which means we should initialize all queues. May Michael can clarify on this. I think this doesn't matter, but QEMU should consider this scenario... For example, if one queues isn't initialized (Windows guest), the vring.avail=0, so vq->desc_phys=0, then vq->desc='a avail HVA'(which is the start addr of pc.ram). vq->desc_size = s = l = virtio_queue_get_desc_size(vdev, idx); vq->desc_phys = a = virtio_queue_get_desc_addr(vdev, idx); vq->desc = vhost_memory_map(dev, a, &l, 0); if (!vq->desc || l != s) { r = -ENOMEM; goto fail_alloc_desc; } . r = vhost_virtqueue_set_addr(dev, vq, vhost_vq_index, dev->log_enabled); if (r < 0) { r = -errno; goto fail_alloc; } Then the HVA is send to the vhost-user. I think this is wrong, because the '0' here means guest driver doesn't init this queues, it should not be used to calculate the HVA for this vq. Yes, workaround is not hard if windows driver won't use the left 3 queues any more. But we should have a complete solution. The main problem is when vhost need to be started. For legacy device, there's no easy way to detect whether or not a specific virtqueue is ready to be used. For modern device, we can probably do this through queue_enable (but this is not implemented in current code). Thanks Thanks Thanks, -Gonglei .
[Qemu-devel] [PULL 07/11] cpu-exec: don't overwrite exception_index
From: Pavel Dovgalyuk This patch adds a condition before overwriting exception_index fiels. It is needed when exception_index is already set to some meaningful value. Signed-off-by: Pavel Dovgalyuk Message-Id: <20171114081812.27640.26372.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini --- accel/tcg/cpu-exec.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 61297f8f4a..0473055a08 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -594,7 +594,9 @@ static inline bool cpu_handle_interrupt(CPUState *cpu, if (unlikely(atomic_read(&cpu->exit_request) || (use_icount && cpu->icount_decr.u16.low + cpu->icount_extra == 0))) { atomic_set(&cpu->exit_request, 0); -cpu->exception_index = EXCP_INTERRUPT; +if (cpu->exception_index == -1) { +cpu->exception_index = EXCP_INTERRUPT; +} return true; } -- 2.14.3
[Qemu-devel] [PULL 08/11] cpu-exec: avoid cpu_exec_nocache infinite loop with record/replay
From: Pavel Dovgalyuk This patch ensures that icount_decr.u32.high is clear before calling cpu_exec_nocache when exception is pending. Because the exception is caused by the first instruction in the block and it cannot be executed without resetting the flag. There are two parts in the fix. First, clear icount_decr.u32.high in cpu_handle_interrupt (just before processing the "dependent" request, stored in cpu->interrupt_request or cpu->exit_request) rather than cpu_loop_exec_tb; this ensures that cpu_handle_exception is always reached with zero icount_decr.u32.high unless another interrupt has happened in the meanwhile. Second, try to cause the exception at the beginning of cpu_handle_exception, and exit immediately if the TB cannot execute. With this change, interrupts are processed and cpu_exec_nocache can make process. Signed-off-by: Maria Klimushenkova Signed-off-by: Pavel Dovgalyuk Message-Id: <20171114081818.27640.33165.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini --- accel/tcg/cpu-exec.c | 95 +--- 1 file changed, 54 insertions(+), 41 deletions(-) diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 0473055a08..f3de96f346 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -470,48 +470,51 @@ static inline void cpu_handle_debug_exception(CPUState *cpu) static inline bool cpu_handle_exception(CPUState *cpu, int *ret) { -if (cpu->exception_index >= 0) { -if (cpu->exception_index >= EXCP_INTERRUPT) { -/* exit request from the cpu execution loop */ -*ret = cpu->exception_index; -if (*ret == EXCP_DEBUG) { -cpu_handle_debug_exception(cpu); -} -cpu->exception_index = -1; -return true; -} else { +if (cpu->exception_index < 0) { +#ifndef CONFIG_USER_ONLY +if (replay_has_exception() + && cpu->icount_decr.u16.low + cpu->icount_extra == 0) { +/* try to cause an exception pending in the log */ +cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true); +} +#endif +if (cpu->exception_index < 0) { +return false; +} +} + +if (cpu->exception_index >= EXCP_INTERRUPT) { +/* exit request from the cpu execution loop */ +*ret = cpu->exception_index; +if (*ret == EXCP_DEBUG) { +cpu_handle_debug_exception(cpu); +} +cpu->exception_index = -1; +return true; +} else { #if defined(CONFIG_USER_ONLY) -/* if user mode only, we simulate a fake exception - which will be handled outside the cpu execution - loop */ +/* if user mode only, we simulate a fake exception + which will be handled outside the cpu execution + loop */ #if defined(TARGET_I386) +CPUClass *cc = CPU_GET_CLASS(cpu); +cc->do_interrupt(cpu); +#endif +*ret = cpu->exception_index; +cpu->exception_index = -1; +return true; +#else +if (replay_exception()) { CPUClass *cc = CPU_GET_CLASS(cpu); +qemu_mutex_lock_iothread(); cc->do_interrupt(cpu); -#endif -*ret = cpu->exception_index; +qemu_mutex_unlock_iothread(); cpu->exception_index = -1; +} else if (!replay_has_interrupt()) { +/* give a chance to iothread in replay mode */ +*ret = EXCP_INTERRUPT; return true; -#else -if (replay_exception()) { -CPUClass *cc = CPU_GET_CLASS(cpu); -qemu_mutex_lock_iothread(); -cc->do_interrupt(cpu); -qemu_mutex_unlock_iothread(); -cpu->exception_index = -1; -} else if (!replay_has_interrupt()) { -/* give a chance to iothread in replay mode */ -*ret = EXCP_INTERRUPT; -return true; -} -#endif } -#ifndef CONFIG_USER_ONLY -} else if (replay_has_exception() - && cpu->icount_decr.u16.low + cpu->icount_extra == 0) { -/* try to cause an exception pending in the log */ -cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true); -*ret = -1; -return true; #endif } @@ -522,6 +525,19 @@ static inline bool cpu_handle_interrupt(CPUState *cpu, TranslationBlock **last_tb) { CPUClass *cc = CPU_GET_CLASS(cpu); +int32_t insns_left; + +/* Clear the interrupt flag now since we're processing + * cpu->interrupt_request and cpu->exit_request. + */ +insns_left = atomic_read(&cpu->icount_decr.u32); +atomic_set(&cpu->icount_decr.u16.high, 0); +if (unlikely(insns_left < 0)) { +/* Ensure the zeroing of icount_decr comes before the next read + * of cpu->exit_request or cpu->interrupt_request. +
Re: [Qemu-devel] [PATCH] block: Fix error path in bdrv_backing_update_filename()
Am 15.11.2017 um 20:41 hat Peter Maydell geschrieben: > On 6 November 2017 at 16:55, Kevin Wolf wrote: > > error_setg_errno() takes a positive errno code. > > > > Signed-off-by: Kevin Wolf > > --- > > block.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/block.c b/block.c > > index 684cb018da..f6415547fe 100644 > > --- a/block.c > > +++ b/block.c > > @@ -998,7 +998,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, > > BlockDriverState *base, > > ret = bdrv_change_backing_file(parent, filename, > > base->drv ? base->drv->format_name : > > ""); > > if (ret < 0) { > > -error_setg_errno(errp, ret, "Could not update backing file link"); > > +error_setg_errno(errp, -ret, "Could not update backing file link"); > > } > > > > if (!(orig_flags & BDRV_O_RDWR)) { > > -- > > 2.13.6 > > Just noticed going through coverity warnings that this didn't > make it into rc1 -- could we get it in for rc2, please? Yes. I was sick at the start of the week and Max didn't include my queue in his pull request, so all the patches I had queued missed -rc1 and will be sent for -rc2 instead. Kevin
[Qemu-devel] [PULL 10/11] exec: Do not resolve subpage in mru_section
This fixes a crash caused by picking the wrong memory region in address_space_lookup_region seen with client code accessing a device model that uses alias memory regions. The expensive part of address_space_lookup_region anyway is phys_page_find; performance-wise it is okay to repeat the subsequent subpage lookup. Signed-off-by: BALATON Zoltan Message-Id: <20171114225941.07270745...@zero.eik.bme.hu> Signed-off-by: Paolo Bonzini --- exec.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/exec.c b/exec.c index 97a24a875e..3bb9fcf257 100644 --- a/exec.c +++ b/exec.c @@ -410,22 +410,16 @@ static MemoryRegionSection *address_space_lookup_region(AddressSpaceDispatch *d, { MemoryRegionSection *section = atomic_read(&d->mru_section); subpage_t *subpage; -bool update; -if (section && section != &d->map.sections[PHYS_SECTION_UNASSIGNED] && -section_covers_addr(section, addr)) { -update = false; -} else { +if (!section || section == &d->map.sections[PHYS_SECTION_UNASSIGNED] || +!section_covers_addr(section, addr)) { section = phys_page_find(d, addr); -update = true; +atomic_set(&d->mru_section, section); } if (resolve_subpage && section->mr->subpage) { subpage = container_of(section->mr, subpage_t, iomem); section = &d->map.sections[subpage->sub_section[SUBPAGE_IDX(addr)]]; } -if (update) { -atomic_set(&d->mru_section, section); -} return section; } -- 2.14.3
[Qemu-devel] [PULL 11/11] fix scripts/update-linux-headers.sh here document
From: Gerd Hoffmann The minus sign after << causes the shell to strip only preceding tabs, not spaces. Signed-off-by: Gerd Hoffmann Message-Id: <20171110090354.29608-1-kra...@redhat.com> Fixes: 40bf8e9aede0f9105a9e1e4aaf17b20aaa55f9a0 Reviewed-by: Roman Kagan Reviewed-by: Stefan Hajnoczi Tested-by: Christian Borntraeger Signed-off-by: Paolo Bonzini --- scripts/update-linux-headers.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index ad80fe3fca..76fd894a77 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -106,7 +106,7 @@ for arch in $ARCHLIST; do if [ $arch = x86 ]; then cat <<-EOF >"$output/include/standard-headers/asm-x86/hyperv.h" /* this is a temporary placeholder until kvm_para.h stops including it */ -EOF +EOF cp "$tmpdir/include/asm/unistd_32.h" "$output/linux-headers/asm-x86/" cp "$tmpdir/include/asm/unistd_x32.h" "$output/linux-headers/asm-x86/" cp "$tmpdir/include/asm/unistd_64.h" "$output/linux-headers/asm-x86/" -- 2.14.3
[Qemu-devel] [PATCH v2] tests/bios-tables-test: Fix endianess problems when passing data to iasl
The bios-tables-test was writing out files that we pass to iasl in with the wrong endianness in the header when running on a big endian host. So instead of storing mixed endian information in our structures, let's keep everything in little endian and byte-swap it only when we need a value in the code. Reported-by: Daniel P. Berrange Buglink: https://bugs.launchpad.net/qemu/+bug/1724570 Suggested-by: Michael S. Tsirkin Signed-off-by: Thomas Huth --- v2: Fixed vmgenid-test which was accidentially broken in v1 tests/acpi-utils.h | 27 +-- tests/bios-tables-test.c | 42 ++ tests/vmgenid-test.c | 22 -- 3 files changed, 39 insertions(+), 52 deletions(-) diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h index f8d8723..d5ca5b6 100644 --- a/tests/acpi-utils.h +++ b/tests/acpi-utils.h @@ -28,24 +28,9 @@ typedef struct { bool tmp_files_retain; /* do not delete the temp asl/aml */ } AcpiSdtTable; -#define ACPI_READ_FIELD(field, addr) \ -do { \ -switch (sizeof(field)) { \ -case 1:\ -field = readb(addr); \ -break; \ -case 2:\ -field = readw(addr); \ -break; \ -case 4:\ -field = readl(addr); \ -break; \ -case 8:\ -field = readq(addr); \ -break; \ -default: \ -g_assert(false); \ -} \ +#define ACPI_READ_FIELD(field, addr)\ +do {\ +memread(addr, &field, sizeof(field)); \ addr += sizeof(field); \ } while (0); @@ -74,16 +59,14 @@ typedef struct { } while (0); #define ACPI_ASSERT_CMP(actual, expected) do { \ -uint32_t ACPI_ASSERT_CMP_le = cpu_to_le32(actual); \ char ACPI_ASSERT_CMP_str[5] = {}; \ -memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 4); \ +memcpy(ACPI_ASSERT_CMP_str, &actual, 4); \ g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \ } while (0) #define ACPI_ASSERT_CMP64(actual, expected) do { \ -uint64_t ACPI_ASSERT_CMP_le = cpu_to_le64(actual); \ char ACPI_ASSERT_CMP_str[9] = {}; \ -memcpy(ACPI_ASSERT_CMP_str, &ACPI_ASSERT_CMP_le, 8); \ +memcpy(ACPI_ASSERT_CMP_str, &actual, 8); \ g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \ } while (0) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 564da45..e28e0c9 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -96,17 +96,20 @@ static void test_acpi_rsdp_table(test_data *data) static void test_acpi_rsdt_table(test_data *data) { AcpiRsdtDescriptorRev1 *rsdt_table = &data->rsdt_table; -uint32_t addr = data->rsdp_table.rsdt_physical_address; +uint32_t addr = le32_to_cpu(data->rsdp_table.rsdt_physical_address); uint32_t *tables; int tables_nr; uint8_t checksum; +uint32_t rsdt_table_length; /* read the header */ ACPI_READ_TABLE_HEADER(rsdt_table, addr); ACPI_ASSERT_CMP(rsdt_table->signature, "RSDT"); +rsdt_table_length = le32_to_cpu(rsdt_table->length); + /* compute the table entries in rsdt */ -tables_nr = (rsdt_table->length - sizeof(AcpiRsdtDescriptorRev1)) / +tables_nr = (rsdt_table_length - sizeof(AcpiRsdtDescriptorRev1)) / sizeof(uint32_t); g_assert(tables_nr > 0); @@ -114,7 +117,7 @@ static void test_acpi_rsdt_table(test_data *data) tables = g_new0(uint32_t, tables_nr); ACPI_READ_ARRAY_PTR(tables, tables_nr, addr); -checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table->length) + +checksum = acpi_calc_checksum((uint8_t *)rsdt_table, rsdt_table_length) + acpi_calc_checksum((uint8_t *)tables, tables_nr * sizeof(uint32_t)); g_assert(!checksum); @@ -130,7 +133,7 @@ static void test_acpi_fadt_table(test_data *data) uint32_t addr; /* FADT table comes first */ -addr = data->rsdt_tables_addr[0]; +addr = le32_to_cpu(data->rsdt_tables_addr[0]); ACPI_READ_TABLE_HEADER(fadt_table, addr); ACPI_READ_FIELD(fadt_table->firmware_ctrl, addr); @@ -187,13 +190,14 @@ static void test_acpi_fadt_table(test_data *data) ACPI_READ_GENERIC_ADDRESS(fadt_table->xgpe1_block, addr); ACPI_ASSERT_CMP(fadt_table->signature, "FACP"); -g_assert(!acpi_calc_checksum((uint8_t *)fadt_table, fadt_table->length)); +g_assert(!acpi_calc_checksum((uint8_t *)fadt_table
[Qemu-devel] [PATCH] Revert "Add new PCI ID for i82559a"
This reverts commit 5e89dc01133f8f5e621f6b66b356c6f37d31dafb since: - we should use ID in the spec instead the one used by OEM - in the future, we should allow changing id through either property or EEPROM file. Cc: Stefan Weil Cc: Michael Nawrocki Cc: Peter Maydell Cc: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/net/eepro100.c| 13 - include/hw/compat.h | 4 include/hw/pci/pci.h | 1 - qemu-options.hx | 2 +- 4 files changed, 1 insertion(+), 19 deletions(-) diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c index a63ed2c..91dd058 100644 --- a/hw/net/eepro100.c +++ b/hw/net/eepro100.c @@ -132,7 +132,6 @@ typedef struct { const char *name; const char *desc; uint16_t device_id; -uint16_t alt_device_id; uint8_t revision; uint16_t subsystem_vendor_id; uint16_t subsystem_id; @@ -277,7 +276,6 @@ typedef struct { /* Quasi static device properties (no need to save them). */ uint16_t stats_size; bool has_extended_tcb_support; -bool use_alt_device_id; } EEPRO100State; /* Word indices in EEPROM. */ @@ -1857,14 +1855,6 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error **errp) TRACE(OTHER, logout("\n")); -/* By default, the i82559a adapter uses the legacy PCI ID (for the - * i82557). This allows the PCI ID to be changed to the alternate - * i82559 ID if needed. - */ -if (s->use_alt_device_id && strcmp(info->name, "i82559a") == 0) { -pci_config_set_device_id(s->dev.config, info->alt_device_id); -} - s->device = info->device; e100_pci_reset(s, &local_err); @@ -1984,7 +1974,6 @@ static E100PCIDeviceInfo e100_devices[] = { .desc = "Intel i82559A Ethernet", .device = i82559A, .device_id = PCI_DEVICE_ID_INTEL_82557, -.alt_device_id = PCI_DEVICE_ID_INTEL_82559, .revision = 0x06, .stats_size = 80, .has_extended_tcb_support = true, @@ -2078,8 +2067,6 @@ static E100PCIDeviceInfo *eepro100_get_class(EEPRO100State *s) static Property e100_properties[] = { DEFINE_NIC_PROPERTIES(EEPRO100State, conf), -DEFINE_PROP_BOOL("x-use-alt-device-id", EEPRO100State, use_alt_device_id, - true), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/compat.h b/include/hw/compat.h index f96212c..cf389b4 100644 --- a/include/hw/compat.h +++ b/include/hw/compat.h @@ -10,10 +10,6 @@ .driver = "virtio-tablet-device",\ .property = "wheel-axis",\ .value= "false",\ -},{\ -.driver = "i82559a",\ -.property = "x-use-alt-device-id",\ -.value= "false",\ }, #define HW_COMPAT_2_9 \ diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index f30e2cf..8d02a0a 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -70,7 +70,6 @@ extern bool pci_available; /* Intel (0x8086) */ #define PCI_DEVICE_ID_INTEL_82551IT 0x1209 #define PCI_DEVICE_ID_INTEL_825570x1229 -#define PCI_DEVICE_ID_INTEL_825590x1030 #define PCI_DEVICE_ID_INTEL_82801IR 0x2922 /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */ diff --git a/qemu-options.hx b/qemu-options.hx index a39c7e4..3728e9b 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2047,7 +2047,7 @@ that the card should have; this option currently only affects virtio cards; set @var{v} = 0 to disable MSI-X. If no @option{-net} option is specified, a single NIC is created. QEMU can emulate several different models of network card. Valid values for @var{type} are -@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559a}, @code{i82559er}, +@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559er}, @code{ne2k_pci}, @code{ne2k_isa}, @code{pcnet}, @code{rtl8139}, @code{e1000}, @code{smc91c111}, @code{lance} and @code{mcf_fec}. Not all devices are supported on all targets. Use @code{-net nic,model=help} -- 2.7.4
[Qemu-devel] QEMU terminates during reboot after memory unplug with vhost=on
Hi, I'm resurrecting a thread about a QEMU crash we're still hitting on ppc64. It was reported to the list by Bharata 2 months ago: https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg03685.html "Hi, QEMU hits the below assert qemu-system-ppc64: used ring relocated for ring 2 qemu-system-ppc64: qemu/hw/virtio/vhost.c:649: vhost_commit: Assertion `r >= 0' failed. in the following scenario: 1. Boot guest with vhost=on -netdev tap,id=mynet0,script=qemu-ifup,downscript=qemu-ifdown,vhost=on -device virtio-net-pci,netdev=mynet0 2. Hot add a DIMM device 3. Reboot When the guest reboots, we can see vhost_virtqueue_start:vq->used_phys getting assigned an address that falls in the hotplugged memory range. 4. Remove the DIMM device Guest refuses the removal as the hotplugged memory is under use. 5. Reboot QEMU forces the removal of the DIMM device during reset and that's when we hit the above assert. Any pointers on why we are hitting this assert ? Shouldn't vhost be done with using the hotplugged memory when we hit reset ? Regards, Bharata." #0 0x7760eff0 in raise () from /lib64/libc.so.6 #1 0x7761136c in abort () from /lib64/libc.so.6 #2 0x77604c44 in __assert_fail_base () from /lib64/libc.so.6 #3 0x77604d34 in __assert_fail () from /lib64/libc.so.6 #4 0x10161138 in vhost_commit (listener=0x11469e88) at /home/greg/Work/qemu/qemu-spapr/hw/virtio/vhost.c:650 #5 0x100917fc in memory_region_transaction_commit () at /home/greg/Work/qemu/qemu-spapr/memory.c:1094 #6 0x10096748 in memory_region_del_subregion (mr=0x1143eed0, subregion=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/memory.c:2337 #7 0x104a9aec in pc_dimm_memory_unplug (dev=0x11445c50, hpms=0x1143eec0, mr=0x116f1920) at /home/greg/Work/qemu/qemu-spapr/hw/mem/pc-dimm.c:126 #8 0x10180454 in spapr_lmb_release (dev=0x11445c50) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:3151 #9 0x101a397c in spapr_drc_release (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:401 #10 0x101a3ba0 in spapr_drc_reset (drc=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:439 #11 0x101a3c88 in drc_reset (opaque=0x116b9cc0) at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr_drc.c:460 #12 0x10447380 in qemu_devices_reset () at /home/greg/Work/qemu/qemu-spapr/hw/core/reset.c:69 #13 0x1017ae80 in ppc_spapr_reset () at /home/greg/Work/qemu/qemu-spapr/hw/ppc/spapr.c:1445 #14 0x10377c60 in qemu_system_reset (reason=SHUTDOWN_CAUSE_HOST_QMP) at /home/greg/Work/qemu/qemu-spapr/vl.c:1788 #15 0x103785ac in main_loop_should_exit () at /home/greg/Work/qemu/qemu-spapr/vl.c:1962 #16 0x10378708 in main_loop () at /home/greg/Work/qemu/qemu-spapr/vl.c:1999 #17 0x10382c54 in main (argc=21, argv=0x7098, envp=0x7148) at /home/greg/Work/qemu/qemu-spapr/vl.c:4897 This basically happens because on pseries, like x86, we usually wait for the guest to eject the DIMM before actually removing it, BUT, unlike x86, we force the removal on reset. This is handled by a DRC object which registers a handler with qemu_register_reset(). At reset time, the machine calls qemu_devices_reset() but unfortunately, the DRC reset handler gets called BEFORE the VirtIONet device one. The vhost device is still active and it doesn't like the ring addresses to change while in this state. Michael, The assert() has been around since the beginning, at a time I believe there was no such thing as memory hot-unplug. Now that memory can go away at reset time, is it really legitimate to crash QEMU if vhost detects a ring address change ? David, Depending on Michael's answer (or anyone else that knows vhost well enough), I'm not sure we can fix that properly for 2.11. A possible fix/workaround could be to reset DRCs (at least the LMB ones) after qemu_devices_reset(), so that we're sure vhost is stopped. Thoughts ? -- Greg
Re: [Qemu-devel] [PATCH] Revert "Add new PCI ID for i82559a"
Am 16.11.2017 um 13:20 schrieb Jason Wang: > This reverts commit 5e89dc01133f8f5e621f6b66b356c6f37d31dafb since: > > - we should use ID in the spec instead the one used by OEM > - in the future, we should allow changing id through either property > or EEPROM file. > > Cc: Stefan Weil > Cc: Michael Nawrocki > Cc: Peter Maydell > Cc: Michael S. Tsirkin > Signed-off-by: Jason Wang > --- > hw/net/eepro100.c| 13 - > include/hw/compat.h | 4 > include/hw/pci/pci.h | 1 - > qemu-options.hx | 2 +- > 4 files changed, 1 insertion(+), 19 deletions(-) > > diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c > index a63ed2c..91dd058 100644 > --- a/hw/net/eepro100.c > +++ b/hw/net/eepro100.c > @@ -132,7 +132,6 @@ typedef struct { > const char *name; > const char *desc; > uint16_t device_id; > -uint16_t alt_device_id; > uint8_t revision; > uint16_t subsystem_vendor_id; > uint16_t subsystem_id; > @@ -277,7 +276,6 @@ typedef struct { > /* Quasi static device properties (no need to save them). */ > uint16_t stats_size; > bool has_extended_tcb_support; > -bool use_alt_device_id; > } EEPRO100State; > > /* Word indices in EEPROM. */ > @@ -1857,14 +1855,6 @@ static void e100_nic_realize(PCIDevice *pci_dev, Error > **errp) > > TRACE(OTHER, logout("\n")); > > -/* By default, the i82559a adapter uses the legacy PCI ID (for the > - * i82557). This allows the PCI ID to be changed to the alternate > - * i82559 ID if needed. > - */ > -if (s->use_alt_device_id && strcmp(info->name, "i82559a") == 0) { > -pci_config_set_device_id(s->dev.config, info->alt_device_id); > -} > - > s->device = info->device; > > e100_pci_reset(s, &local_err); > @@ -1984,7 +1974,6 @@ static E100PCIDeviceInfo e100_devices[] = { > .desc = "Intel i82559A Ethernet", > .device = i82559A, > .device_id = PCI_DEVICE_ID_INTEL_82557, > -.alt_device_id = PCI_DEVICE_ID_INTEL_82559, > .revision = 0x06, > .stats_size = 80, > .has_extended_tcb_support = true, > @@ -2078,8 +2067,6 @@ static E100PCIDeviceInfo > *eepro100_get_class(EEPRO100State *s) > > static Property e100_properties[] = { > DEFINE_NIC_PROPERTIES(EEPRO100State, conf), > -DEFINE_PROP_BOOL("x-use-alt-device-id", EEPRO100State, use_alt_device_id, > - true), > DEFINE_PROP_END_OF_LIST(), > }; > > diff --git a/include/hw/compat.h b/include/hw/compat.h > index f96212c..cf389b4 100644 > --- a/include/hw/compat.h > +++ b/include/hw/compat.h > @@ -10,10 +10,6 @@ > .driver = "virtio-tablet-device",\ > .property = "wheel-axis",\ > .value= "false",\ > -},{\ > -.driver = "i82559a",\ > -.property = "x-use-alt-device-id",\ > -.value= "false",\ > }, > > #define HW_COMPAT_2_9 \ > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h > index f30e2cf..8d02a0a 100644 > --- a/include/hw/pci/pci.h > +++ b/include/hw/pci/pci.h > @@ -70,7 +70,6 @@ extern bool pci_available; > /* Intel (0x8086) */ > #define PCI_DEVICE_ID_INTEL_82551IT 0x1209 > #define PCI_DEVICE_ID_INTEL_825570x1229 > -#define PCI_DEVICE_ID_INTEL_825590x1030 > #define PCI_DEVICE_ID_INTEL_82801IR 0x2922 > > /* Red Hat / Qumranet (for QEMU) -- see pci-ids.txt */ > diff --git a/qemu-options.hx b/qemu-options.hx > index a39c7e4..3728e9b 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -2047,7 +2047,7 @@ that the card should have; this option currently only > affects virtio cards; set > @var{v} = 0 to disable MSI-X. If no @option{-net} option is specified, a > single > NIC is created. QEMU can emulate several different models of network card. > Valid values for @var{type} are > -@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559a}, > @code{i82559er}, > +@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559er}, > @code{ne2k_pci}, @code{ne2k_isa}, @code{pcnet}, @code{rtl8139}, > @code{e1000}, @code{smc91c111}, @code{lance} and @code{mcf_fec}. > Not all devices are supported on all targets. Use @code{-net nic,model=help} Reviewed-by: Stefan Weil Thanks, Stefan
[Qemu-devel] [Bug 1732671] Re: vnc websocket compatibility issue
No problem, it is a valid bug report, since we've not actually released the fix yet, so changing status. ** Changed in: qemu Status: Invalid => Fix Committed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1732671 Title: vnc websocket compatibility issue Status in QEMU: Fix Committed Bug description: WebSocket support in VNC should allow access from VNC client through upgraded WebSocket connection. This feature is not working in IE 11/Edge with noVNC HTML5 client, in contrast to that in Firefox/Safari, etc. The reason that IE 11/Edge fails to accept the connection upgrade is that the value equality of the `Upgrade` header field is checked in a strict case-sensitive manner in QEMU side, however, the IE/Edge does not send the exactly same string value `websocket` but a capital letter `W` instead. Defined in section 4.2.1 of rfc6455, the upgrade header field shall be treated case-insensitively. A patch shall be made in `io/channel-websock.c`, converting the value of upgrade string to lowercase before comparison is made with QIO_CHANNEL_WEBSOCK_UPGRADE_WEBSOCKET, to allow case-insensitive comparison in the process. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1732671/+subscriptions
Re: [Qemu-devel] [PATCH] colo-compare: fix the dangerous assignment
On 11/16/2017 06:13 PM, Darren Kenny wrote: On Thu, Nov 16, 2017 at 10:28:32AM +0800, Mao Zhongyi wrote: Cc: Peter Maydell Cc: Jason Wang Cc: Zhang Chen Cc: Li Zhijian Cc: Paolo Bonzini Fixes: 8ec14402029d783720f4312ed8a925548e1dad61 Reported-by: Peter Maydell Reported-by: Paolo Bonzini Signed-off-by: Mao Zhongyi Code-wise, this looks like a valid fix to the existing code. Reviewed-by: Darren Kenny Hi, Darren But testing wise, have you confirmed that things are behaving as you expected with the previous patch, since previously when calling colo_compare_connection(), the value of conn would have always been its initialized value of NULL. Well, in my test machine the code like *con = conn, but when I made the patch on another machine I wrote the code con = &conn carelessly. Just want to be sure that fixing this doesn't end up breaking your expected behaviour given that all your testing before would have had a NULL value in conn. Thanks for the kind reminder. Thanks, Mao Thanks, Darren. --- net/colo-compare.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/colo-compare.c b/net/colo-compare.c index ccdcba2..1ce195f 100644 --- a/net/colo-compare.c +++ b/net/colo-compare.c @@ -179,7 +179,7 @@ static int packet_enqueue(CompareState *s, int mode, Connection **con) "drop packet"); } } -con = &conn; +*con = conn; return 0; } -- 2.9.4
[Qemu-devel] weird behavior of libvirt/QEMU with virtserial ports
Hello, All! We observe the following behavior for QEMU configured by libvirt to use guest agent as usual for the guest without virtio-serial driver (Windows or the guest remaining in BIOS stage). In QEMU on first connect to listen character device socket the listen socket is removed from poll just after the accept(). virtio_serial_guest_ready() returns 0 and the descriptor of the connected Unix socket is removed from poll and it will not be present in poll() until the guest will initialize the driver and change the state of the serial to "guest connected". In libvirt connect() to guest agent is performed on restart and is run under VM state lock. Connec() is blocking and can wait forever as - accept queue in QEMU is not polled - it will exhaust sooner or later. In this case libvirt can not perform ANY operation on that VM. Weird! The problem should be addressed from both sides IMHO, as it is bad to keep stale connection in QEMU. IMHO we should tweak io_watch_poll_prepare() to register for G_IO_ERR | G_IO_HUP | G_IO_NVAL even if the read is disable and handle connection closing gracefully. Though there are some questions remaining. What should we do with the data in the socket queue? Are we free to drop it as the requester has been left? In the other case we should fix each qemu client including libvirtd. Alternatively we can still poll listen socket and close any connections if there is active one. Any opinion? Den P.S. Quick reproducer VM config, as usual Quick reproducer: strace socat - UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0 ^C (interrupt) strace socat - UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0 ^C (interrupt) strace socat - UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0 ^C (interrupt) strace socat - UNIX-CONNECT:/var/lib/libvirt/qemu/channel/target/domain-20-vm1/org.qemu.guest_agent.0 ^C (interrupt) Normal behavior of the 'socat' is passed connect() routine and waiting on select(). Wrong behavior - hang in 'connect'.
[Qemu-devel] [Bug 1732679] [NEW] Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian 1:2.10.0+dfsg-2)
Public bug reported: Ubuntu 17.04 qemu 2.10.1(Debian 1:2.10.0+dfsg-2) gns3 2.0.3 NX-OSv 9k 7.0.3.I6.1 - No such issue with previous qemu 2.8.x - the issue does not seem to come from the debian packaging - the issue does not seem to come from GNS3 either, as confirmed by Jeremy Grossmann at https://github.com/GNS3/gns3-server/issues/1193#issuecomment-344240460 Either some parameters usage have changed (for instance -bios) (which would make qemu not backwards compatible) or there is an issue with qemu itself. The configuration parameters are: ``` "compute_id": "local", "console": 2010, "console_type": "telnet", "first_port_name": "mgmt0", "height": 48, "label": { "rotation": 0, "style": "font-family: TypeWriter;font-size: 10.0;font-weight: bold;fill: #00;fill-opacity: 1.0;", "text": "NX_OSv_9k_Spine_31", "x": -54, "y": -25 }, "name": "NX_OSv_9k_Spine_31", "node_id": "8d01119a-0adc-41bc-950b-c5639db7708c", "node_type": "qemu", "port_name_format": "Ethernet1/{port1}", "port_segment_size": 0, "properties": { "acpi_shutdown": false, "adapter_type": "e1000", "adapters": 10, "bios_image": "", "bios_image_md5sum": null, "boot_priority": "c", "cdrom_image": "", "cdrom_image_md5sum": null, "cpu_throttling": 0, "cpus": 2, "hda_disk_image": "NX-OSv-9k-7.0.3.I6.1.free.qcow2", "hda_disk_image_md5sum": "18bb991b814a508d1190575f99deed99", "hda_disk_interface": "ide", "hdb_disk_image": "", "hdb_disk_image_md5sum": null, "hdb_disk_interface": "ide", "hdc_disk_image": "", "hdc_disk_image_md5sum": null, "hdc_disk_interface": "ide", "hdd_disk_image": "", "hdd_disk_image_md5sum": null, "hdd_disk_interface": "ide", "initrd": "", "initrd_md5sum": null, "kernel_command_line": "", "kernel_image": "", "kernel_image_md5sum": null, "legacy_networking": false, "mac_address": "00:07:00:03:16:01", "options": "-nographic -enable-kvm -cpu host -machine q35 -smp cpus=2 -bios /usr/share/ovmf/OVMF.fd", "platform": "x86_64", "process_priority": "normal", "qemu_path": "/usr/bin/qemu-system-x86_64", "ram": 6144, "usage": "" ``` The logs are: - [execution log](https://github.com/GNS3/gns3-server/files/1381651/qemu.log.txt) - [terminal log](https://github.com/GNS3/gns3-server/files/1381660/terminal.log.txt) With the latest qemu, I can boot: - Cisco IOSv 15.6(2)T - Cisco IOSv-L2 15.2(20170321:233949) - Cisco CSR 1000v 16.5.1b - Cisco ASAv 9.6(2) The major difference with NX-OSv 9k is the bios parameter: ```-bios /usr/share/ovmf/OVMF.fd```: ``` ll /usr/share/ovmf/OVMF.fd -rw-r--r-- 1 root root 2097152 Dec 9 2016 /usr/share/ovmf/OVMF.fd ``` A normal boot log with qemu 2.8.1 is available [here](https://github.com/GNS3/gns3-server/files/1381729/terminal.log.2.8.txt) Highlighting the differences: qemu 2.8.1 on the left, qemu 2.10.1 on the right hand side with the same boot parameters  ** Affects: qemu Importance: Undecided Status: New ** Summary changed: - Cisco NX-OSv 9k crashes during boot with latest qemu + Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian 1:2.10.0+dfsg-2) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1732679 Title: Cisco NX-OSv 9k crashes during boot with qemu 2.10.1(Debian 1:2.10.0+dfsg-2) Status in QEMU: New Bug description: Ubuntu 17.04 qemu 2.10.1(Debian 1:2.10.0+dfsg-2) gns3 2.0.3 NX-OSv 9k 7.0.3.I6.1 - No such issue with previous qemu 2.8.x - the issue does not seem to come from the debian packaging - the issue does not seem to come from GNS3 either, as confirmed by Jeremy Grossmann at https://github.com/GNS3/gns3-server/issues/1193#issuecomment-344240460 Either some parameters usage have changed (for instance -bios) (which would make qemu not backwards compatible) or there is an issue with qemu itself. The configuration parameters are:
Re: [Qemu-devel] [PATCH v6 3/5] fw_cfg: do DMA read operation
Hi - Original Message - > On Mon, Nov 13, 2017 at 08:27:48PM +0100, Marc-André Lureau wrote: > > Modify fw_cfg_read_blob() to use DMA if the device supports it. > > Return errors, because the operation may fail. > > > > To avoid polling with unbound amount of time, the DMA operation is > > expected to complete within 200ms, or will return ETIME error. > > > > We may want to switch all the *buf addresses to use only kmalloc'ed > > buffers (instead of using stack/image addresses with dma=false). > > > > Signed-off-by: Marc-André Lureau > > Reviewed-by: Gabriel Somlo > > This is causing issues for some people. I have reverted it for now. I didn't know you merged it, where did it go? If you are talking about kernel test robot, it seems to be the timeout. I picked FW_CFG_DMA_TIMEOUT 200ms by guess, we may want to extend that for busy hosts. How much? several seconds? > I wonder why bother with DMA reads. Can we limit DMA to writes? Well, we could, but DMA reads are supposedly quite faster. And if DMA read is broken, there are good chances that DMA write will too. > > --- > > drivers/firmware/qemu_fw_cfg.c | 154 > > - > > 1 file changed, 137 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/firmware/qemu_fw_cfg.c > > b/drivers/firmware/qemu_fw_cfg.c > > index 1f3e8545dab7..2ac4cd869fe6 100644 > > --- a/drivers/firmware/qemu_fw_cfg.c > > +++ b/drivers/firmware/qemu_fw_cfg.c > > @@ -33,6 +33,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > MODULE_AUTHOR("Gabriel L. Somlo "); > > MODULE_DESCRIPTION("QEMU fw_cfg sysfs support"); > > @@ -43,12 +45,26 @@ MODULE_LICENSE("GPL"); > > #define FW_CFG_ID 0x01 > > #define FW_CFG_FILE_DIR 0x19 > > > > +#define FW_CFG_VERSION_DMA 0x02 > > +#define FW_CFG_DMA_CTL_ERROR 0x01 > > +#define FW_CFG_DMA_CTL_READ0x02 > > +#define FW_CFG_DMA_CTL_SKIP0x04 > > +#define FW_CFG_DMA_CTL_SELECT 0x08 > > +#define FW_CFG_DMA_CTL_WRITE 0x10 > > +#define FW_CFG_DMA_TIMEOUT 200 /* ms */ > > + > > /* size in bytes of fw_cfg signature */ > > #define FW_CFG_SIG_SIZE 4 > > > > /* fw_cfg "file name" is up to 56 characters (including terminating nul) > > */ > > #define FW_CFG_MAX_FILE_PATH 56 > > > > +/* platform device for dma mapping */ > > +static struct device *dev; > > + > > +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. > > */ > > +static u32 fw_cfg_rev; > > + > > /* fw_cfg file directory entry type */ > > struct fw_cfg_file { > > u32 size; > > @@ -57,6 +73,12 @@ struct fw_cfg_file { > > char name[FW_CFG_MAX_FILE_PATH]; > > }; > > > > +struct fw_cfg_dma { > > + u32 control; > > + u32 length; > > + u64 address; > > +} __packed; > > + > > /* fw_cfg device i/o register addresses */ > > static bool fw_cfg_is_mmio; > > static phys_addr_t fw_cfg_p_base; > > @@ -75,12 +97,93 @@ static inline u16 fw_cfg_sel_endianness(u16 key) > > return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key); > > } > > > > +static inline bool fw_cfg_dma_enabled(void) > > +{ > > + return fw_cfg_rev & FW_CFG_VERSION_DMA && fw_cfg_reg_dma; > > +} > > + > > +static bool fw_cfg_wait_for_control(struct fw_cfg_dma *d, unsigned long > > timeout) > > +{ > > + ktime_t start; > > + ktime_t stop; > > + > > + start = ktime_get(); > > + stop = ktime_add(start, ms_to_ktime(timeout)); > > + > > + do { > > + if ((be32_to_cpu(d->control) & ~FW_CFG_DMA_CTL_ERROR) == 0) > > + return true; > > + > > + usleep_range(50, 100); > > + } while (ktime_before(ktime_get(), stop)); > > + > > + return false; > > +} > > + > > +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control) > > +{ > > + dma_addr_t dma_addr = 0; > > + struct fw_cfg_dma *d; > > + dma_addr_t dma; > > + ssize_t ret = length; > > + enum dma_data_direction dir = > > + (control & FW_CFG_DMA_CTL_READ ? DMA_FROM_DEVICE : 0); > > + > > + if (address && length) { > > + dma_addr = dma_map_single(dev, address, length, dir); > > + if (dma_mapping_error(NULL, dma_addr)) { > > + WARN(1, "%s: failed to map address\n", __func__); > > + return -EFAULT; > > + } > > + } > > + > > + d = kmalloc(sizeof(*d), GFP_KERNEL | GFP_DMA); > > + if (!d) { > > + ret = -ENOMEM; > > + goto end; > > + } > > + > > + dma = dma_map_single(dev, d, sizeof(*d), DMA_BIDIRECTIONAL); > > + if (dma_mapping_error(NULL, dma)) { > > + WARN(1, "%s: failed to map fw_cfg_dma\n", __func__); > > + ret = -EFAULT; > > + goto end; > > + } > > + > > + *d = (struct fw_cfg_dma) { > > + .address = cpu_to_be64(dma_addr), > > + .length = cpu_to_be32(length), > > + .control = cpu_to_be32(control) > > + }; > > + > > + iowrite32be((u64)dma >> 32, fw_cfg_reg_dma); > > + io
Re: [Qemu-devel] [PULL 0/4] Merge tpm 2017/11/15 v1
On 15 November 2017 at 12:31, Stefan Berger wrote: > This pull request is for 2.11 and extends documentation as well as fixes > bugs related to concurrency and failure mode. > > The following changes since commit 4ffa88c99c54d2a30f79e3dbecec50b023eff1c8: > > Merge remote-tracking branch > 'remotes/berrange/tags/pull-qcrypto-2017-11-08-1' into staging (2017-11-10 > 16:01:35 +) > > are available in the git repository at: > > git://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2017-11-15-1 > > for you to fetch changes up to 6cd65969da57eaa9bdff07b80ecca2becce0597a: > > tpm_tis: Return 0 for every register in case of failure mode (2017-11-15 > 06:47:35 -0500) > > > Merge tpm 2017/11/15 v1 > > > Marc-André Lureau (1): > tpm-emulator: protect concurrent ctrl_chr access > > Stefan Berger (3): > specs: Extend TPM spec with TPM emulator description > tpm_tis: Return TPM_VERSION_UNSPEC in case of BE failure > tpm_tis: Return 0 for every register in case of failure mode > > docs/specs/tpm.txt| 79 > +++ > hw/tpm/tpm_emulator.c | 44 +--- > hw/tpm/tpm_tis.c | 6 +++- > 3 files changed, 112 insertions(+), 17 deletions(-) Applied, thanks. -- PMM
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
On Wed, 15 Nov 2017 21:41:20 +0100 Gandalf Corvotempesta wrote: > 2017-11-15 21:29 GMT+01:00 Richard W.M. Jones : > > Gandalf, is there an XVA file publically available (pref. not > > too big) that we can look at? > > I can try to provide one, but it's simple: > > # tar tvf 20160630_124823_aa72_.xva.gz | head -n 50 > -- 0/0 42353 1970-01-01 01:00 ova.xml > -- 0/0 1048576 1970-01-01 01:00 Ref:175/ > -- 0/0 40 1970-01-01 01:00 Ref:175/.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0001 > -- 0/0 40 1970-01-01 01:00 Ref:175/0001.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0003 > -- 0/0 40 1970-01-01 01:00 Ref:175/0003.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0004 > -- 0/0 40 1970-01-01 01:00 Ref:175/0004.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0005 > -- 0/0 40 1970-01-01 01:00 Ref:175/0005.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0006 > -- 0/0 40 1970-01-01 01:00 Ref:175/0006.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0007 > -- 0/0 40 1970-01-01 01:00 Ref:175/0007.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0009 > -- 0/0 40 1970-01-01 01:00 Ref:175/0009.checksum > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0010 > -- 0/0 40 1970-01-01 01:00 Ref:175/0010.checksum > > > You can ignore the ova.xml and just use the "Ref:175" directory. > Inside the XVA you'll fine one "Ref" directory for each virtual disk > (ref number is different for each disk) > Inside each Ref directory, you'll get tons of 1MB file, corrisponding > to the RAW image. > You have to merge these files in a single raw file with just an > exception: numbers are not sequential. > as you can see above, we have: , 0001, 0003 > > The 0002 is missing, because it's totally blank. XenServer doesn't > export any empty block, thus it will skip the corrisponding 1MB file. > When building the raw image, you have to replace empty blocks with 1MB > full of zeros. > > This is (in addition to the tar extract) the most time-consuming part. > Currently I'm rebuilding a 250GB image, with tons of files to be > merge. > If qemu-img can be patched to automatically convert this kind of > format, I can save about 3 hours (30 minutes for extracting the > tarball, and about 2 hours to merge 250-300GB image) > Hi, I like what Richard and Max came up with. Pretty nifty solutions. While there's nothing wrong with them I decided to take my own shot at it. Since the blocks in tar file are pieces of raw image there is no conversion happening. All in all it's just moving bytes from one place to another. That means there shouldn't be a need for any heavy machinery, right? :) Attached is a shell script that uses dd to do the byte-shuffling. I'm pretty sure this could even be safely parallelized by running multiple instances of dd at the same time (e.g. with xargs). But I did not try that. Tomas -- Tomáš Golembiovský xva2raw.sh Description: application/shellscript
[Qemu-devel] [RFC v4 02/27] qobject: introduce qobject_get_try_str()
A quick way to fetch string from qobject when it's a QString. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- include/qapi/qmp/qstring.h | 1 + qobject/qstring.c | 11 +++ 2 files changed, 12 insertions(+) diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h index 34278bd639..12ae4e1c29 100644 --- a/include/qapi/qmp/qstring.h +++ b/include/qapi/qmp/qstring.h @@ -28,6 +28,7 @@ QString *qstring_from_substr(const char *str, int start, int end); size_t qstring_get_length(const QString *qstring); const char *qstring_get_str(const QString *qstring); const char *qstring_get_try_str(const QString *qstring); +const char *qobject_get_try_str(const QObject *qstring); void qstring_append_int(QString *qstring, int64_t value); void qstring_append(QString *qstring, const char *str); void qstring_append_chr(QString *qstring, int c); diff --git a/qobject/qstring.c b/qobject/qstring.c index 9df04e3c7b..bbe689a9e3 100644 --- a/qobject/qstring.c +++ b/qobject/qstring.c @@ -139,6 +139,17 @@ const char *qstring_get_try_str(const QString *qstring) } /** + * qobject_get_try_str(): Return a pointer of the backstore string + * + * NOTE: the string will only be returned if the object is valid, and + * its type is QString, otherwise NULL is returned. + */ +const char *qobject_get_try_str(const QObject *qstring) +{ +return qstring_get_try_str(qobject_to_qstring(qstring)); +} + +/** * qstring_destroy_obj(): Free all memory allocated by a QString * object */ -- 2.13.6
[Qemu-devel] [RFC v4 00/27] QMP: out-of-band (OOB) execution support
v4: - drop first patch to fix IOWatchPool [Stefan, Dan] - add s-o-b where missing, and newly got r-bs - fix English error in commit msg [Fam] - some tunes on request-dropped event: [Stefan] - firstly let 'id' be any type, meanwhile make sure "id" is there as long as OOB is enabled for the monitor. - some comments fix - add new command "x-oob-test" for testing oob commands [Stefan] - simplify the test codes to use new x-oob-test - flush response queue before cleanup monitors This series was born from this one: https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg04310.html The idea comes from Markus Armbruster and the discussion we had in the thread. It's not a super ideal solution (I believe Markus had been thinking hard to keep everything in order meanwhile trying to satisfy the migration requirement), but AFAIU it's currently the best. What is OOB? It's the shortcut of Out-Of-Band execution, its name is given by Markus. It's a way to quickly execute a QMP request. Say, originally QMP is going throw these steps: JSON Parser --> QMP Dispatcher --> Respond /|\(2)(3) | (1) | \|/ (4) +- main thread + The requests are executed by the so-called QMP-dispatcher after the JSON is parsed. If OOB is on, we run the command directly in the parser and quickly returns. This series changed the QMP handling logic by moving the command parsing and responding phases into IOThreads, so to be more accurate, after the series the above graph would change into this: queue/kick queue/kick JSON Parser ==> QMP Dispatcher => Responder /|\ | (3) /|\ | (4) | /|\ (1) | | (2)| | | | | || \|/ (6)| |(5) | |main thread | | | || | | +> monitor IO thread <---+ | +---/ \--+ New Interfaces == QMP Introspection Changes - When do query-qmp-schema, we were getting something like: {"name": "migrate-incoming", "ret-type": "17", "meta-type": "command", "arg-type": "89"} Now we will get a new key named "allow-oob": {"name": "migrate-incoming", "ret-type": "17", "allow-oob": true, "meta-type": "command", "arg-type": "89"} Which shows whether a command supports OOB. QMP Negociation Changes --- We were running "qmp_capabilities" always without parameters like: {"execute": "qmp_capabilities"} Now we can enable capabilities (we don't have any capability before this series) like OOB using: {"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}} Only after we explicitly enable OOB capability can we send requests in OOB manner. Otherwise we'll have exactly the same QMP session as before, just like when OOB is not there. When OOB is enabled, it's possible that OOB reply reaches faster than previous command, so clients should be prepared. Trigger OOB execution - Let's take migrate-incoming as example. The old command looks like: {"execute": "migrate-incoming", "arguments" : {"uri": "xx"}} To execute a command with OOB execution, we need to specify it in the QMP request in the extra "control" key: {"execute": "migrate-incoming", "arguments" : {"uri": "xx"}, "control" { "run-oob": true } } Then the command will be run in parser, and it can preempt other commands. Others = The last patch of OOB test may need some attention. I used dump-guest-memory as a time-consuming command to test whether OOB is working, and the only command I can test now is "migrate-incoming". I hope that is a "okay" solution for unit tests. Any other suggestions on that would be welcomed. Please review. Thanks. Peter Xu (27): qobject: introduce qstring_get_try_str() qobject: introduce qobject_get_try_str() qobject: let object_property_get_str() use new API monitor: move skip_flush into monitor_data_init qjson: add "opaque" field to JSONMessageParser monitor: move the cur_mon hack deeper for QMP monitor: unify global init monitor: let mon_list be tail queue monitor: create monitor dedicate iothread monitor: allow to use IO thread for parsing qmp: introduce QMPCapability qmp: negociate QMP capabilities qmp: introduce some capability helpers monitor: introduce monitor_qmp_respond() monitor: let monitor_{suspend|resume} thread safe monitor: separate QMP parser and dispatcher qmp: add new event "request-dropped" monitor: send event when request queue full qapi: introduce new cmd option "allow-oob" qmp: support out-of-band (oob) execution qmp: let migrate-incoming allow out-of-band qmp: isolate responses into io thread monitor: enable IO thread for (qmp & !
[Qemu-devel] [RFC v4 03/27] qobject: let object_property_get_str() use new API
We can simplify object_property_get_str() using the new qobject_get_try_str(). Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- qom/object.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/qom/object.c b/qom/object.c index c58c52d518..9cbeb51f0b 100644 --- a/qom/object.c +++ b/qom/object.c @@ -1116,18 +1116,15 @@ char *object_property_get_str(Object *obj, const char *name, Error **errp) { QObject *ret = object_property_get_qobject(obj, name, errp); -QString *qstring; char *retval; if (!ret) { return NULL; } -qstring = qobject_to_qstring(ret); -if (!qstring) { + +retval = g_strdup(qobject_get_try_str(ret)); +if (!retval) { error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name, "string"); -retval = NULL; -} else { -retval = g_strdup(qstring_get_str(qstring)); } qobject_decref(ret); -- 2.13.6
[Qemu-devel] [RFC v4 07/27] monitor: unify global init
There are many places for monitor init its globals, at least: - monitor_init_qmp_commands() at the very beginning - single function to init monitor_lock - in the first entry of monitor_init() using "is_first_init" Unify them a bit. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- include/monitor/monitor.h | 2 +- monitor.c | 25 ++--- vl.c | 3 ++- 3 files changed, 13 insertions(+), 17 deletions(-) diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h index 83ea4a1aaf..3a5128ab1b 100644 --- a/include/monitor/monitor.h +++ b/include/monitor/monitor.h @@ -16,7 +16,7 @@ extern Monitor *cur_mon; bool monitor_cur_is_qmp(void); -void monitor_init_qmp_commands(void); +void monitor_init_globals(void); void monitor_init(Chardev *chr, int flags); void monitor_cleanup(void); diff --git a/monitor.c b/monitor.c index 322dfb5f31..ac5313023b 100644 --- a/monitor.c +++ b/monitor.c @@ -1001,7 +1001,7 @@ static void qmp_unregister_commands_hack(void) #endif } -void monitor_init_qmp_commands(void) +static void monitor_init_qmp_commands(void) { /* * Two command lists: @@ -4034,6 +4034,14 @@ static void sortcmdlist(void) qsort((void *)info_cmds, array_num, elem_size, compare_mon_cmd); } +void monitor_init_globals(void) +{ +monitor_init_qmp_commands(); +monitor_qapi_event_init(); +sortcmdlist(); +qemu_mutex_init(&monitor_lock); +} + /* These functions just adapt the readline interface in a typesafe way. We * could cast function pointers but that discards compiler checks. */ @@ -4074,23 +4082,10 @@ void error_vprintf_unless_qmp(const char *fmt, va_list ap) } } -static void __attribute__((constructor)) monitor_lock_init(void) -{ -qemu_mutex_init(&monitor_lock); -} - void monitor_init(Chardev *chr, int flags) { -static int is_first_init = 1; -Monitor *mon; - -if (is_first_init) { -monitor_qapi_event_init(); -sortcmdlist(); -is_first_init = 0; -} +Monitor *mon = g_malloc(sizeof(*mon)); -mon = g_malloc(sizeof(*mon)); monitor_data_init(mon, false); qemu_chr_fe_init(&mon->chr, chr, &error_abort); diff --git a/vl.c b/vl.c index 7372424fa7..f1c6ef6d95 100644 --- a/vl.c +++ b/vl.c @@ -3144,7 +3144,6 @@ int main(int argc, char **argv, char **envp) qemu_init_exec_dir(argv[0]); module_call_init(MODULE_INIT_QOM); -monitor_init_qmp_commands(); qemu_add_opts(&qemu_drive_opts); qemu_add_drive_opts(&qemu_legacy_drive_opts); @@ -4692,6 +4691,8 @@ int main(int argc, char **argv, char **envp) parse_numa_opts(current_machine); +monitor_init_globals(); + if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, NULL)) { exit(1); -- 2.13.6
[Qemu-devel] [RFC v4 01/27] qobject: introduce qstring_get_try_str()
The only difference from qstring_get_str() is that it allows the qstring to be NULL. If so, NULL is returned. CC: Eric Blake CC: Markus Armbruster Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- include/qapi/qmp/qstring.h | 1 + qobject/qstring.c | 10 ++ 2 files changed, 11 insertions(+) diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h index 10076b7c8c..34278bd639 100644 --- a/include/qapi/qmp/qstring.h +++ b/include/qapi/qmp/qstring.h @@ -27,6 +27,7 @@ QString *qstring_from_str(const char *str); QString *qstring_from_substr(const char *str, int start, int end); size_t qstring_get_length(const QString *qstring); const char *qstring_get_str(const QString *qstring); +const char *qstring_get_try_str(const QString *qstring); void qstring_append_int(QString *qstring, int64_t value); void qstring_append(QString *qstring, const char *str); void qstring_append_chr(QString *qstring, int c); diff --git a/qobject/qstring.c b/qobject/qstring.c index 5da7b5f37c..9df04e3c7b 100644 --- a/qobject/qstring.c +++ b/qobject/qstring.c @@ -129,6 +129,16 @@ const char *qstring_get_str(const QString *qstring) } /** + * qstring_get_try_str(): Return a pointer to the stored string + * + * NOTE: will return NULL if qstring is not provided. + */ +const char *qstring_get_try_str(const QString *qstring) +{ +return qstring ? qstring_get_str(qstring) : NULL; +} + +/** * qstring_destroy_obj(): Free all memory allocated by a QString * object */ -- 2.13.6
[Qemu-devel] [RFC v4 09/27] monitor: create monitor dedicate iothread
Create one IOThread for the monitors, prepared to handle all the input/output IOs using existing iothread framework. Signed-off-by: Peter Xu --- monitor.c | 29 + 1 file changed, 29 insertions(+) diff --git a/monitor.c b/monitor.c index a70ab5606b..4ce9828fab 100644 --- a/monitor.c +++ b/monitor.c @@ -76,6 +76,7 @@ #include "qmp-introspect.h" #include "sysemu/qtest.h" #include "sysemu/cpus.h" +#include "sysemu/iothread.h" #include "qemu/cutils.h" #include "qapi/qmp/dispatch.h" @@ -208,6 +209,12 @@ struct Monitor { QTAILQ_ENTRY(Monitor) entry; }; +struct MonitorGlobal { +IOThread *mon_iothread; +}; + +static struct MonitorGlobal mon_global; + /* QMP checker flags */ #define QMP_ACCEPT_UNKNOWNS 1 @@ -4034,12 +4041,24 @@ static void sortcmdlist(void) qsort((void *)info_cmds, array_num, elem_size, compare_mon_cmd); } +static GMainContext *monitor_io_context_get(void) +{ +return iothread_get_g_main_context(mon_global.mon_iothread); +} + +static void monitor_iothread_init(void) +{ +mon_global.mon_iothread = iothread_create("monitor_iothread", + &error_abort); +} + void monitor_init_globals(void) { monitor_init_qmp_commands(); monitor_qapi_event_init(); sortcmdlist(); qemu_mutex_init(&monitor_lock); +monitor_iothread_init(); } /* These functions just adapt the readline interface in a typesafe way. We @@ -4117,6 +4136,13 @@ void monitor_cleanup(void) { Monitor *mon, *next; +/* + * We need to explicitly stop the iothread (but not destroy it), + * cleanup the monitor resources, then destroy the iothread. See + * again on the glib bug mentioned in 2b316774f6 for a reason. + */ +iothread_stop(mon_global.mon_iothread); + qemu_mutex_lock(&monitor_lock); QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) { QTAILQ_REMOVE(&mon_list, mon, entry); @@ -4124,6 +4150,9 @@ void monitor_cleanup(void) g_free(mon); } qemu_mutex_unlock(&monitor_lock); + +iothread_destroy(mon_global.mon_iothread); +mon_global.mon_iothread = NULL; } QemuOptsList qemu_mon_opts = { -- 2.13.6
[Qemu-devel] [RFC v4 05/27] qjson: add "opaque" field to JSONMessageParser
It'll be passed to emit() as well when it happens. Since at it, add a typedef for the emitter function. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- include/qapi/qmp/json-streamer.h | 10 -- monitor.c| 7 --- qga/main.c | 5 +++-- qobject/json-streamer.c | 6 -- qobject/qjson.c | 5 +++-- tests/libqtest.c | 5 +++-- 6 files changed, 25 insertions(+), 13 deletions(-) diff --git a/include/qapi/qmp/json-streamer.h b/include/qapi/qmp/json-streamer.h index 00d8a23af8..9198b67342 100644 --- a/include/qapi/qmp/json-streamer.h +++ b/include/qapi/qmp/json-streamer.h @@ -23,9 +23,14 @@ typedef struct JSONToken { char str[]; } JSONToken; +struct JSONMessageParser; +typedef void (*JSONMessageEmitFunc)(struct JSONMessageParser *parser, +GQueue *tokens, void *opaque); + typedef struct JSONMessageParser { -void (*emit)(struct JSONMessageParser *parser, GQueue *tokens); +JSONMessageEmitFunc emit; +void *opaque; JSONLexer lexer; int brace_count; int bracket_count; @@ -34,7 +39,8 @@ typedef struct JSONMessageParser } JSONMessageParser; void json_message_parser_init(JSONMessageParser *parser, - void (*func)(JSONMessageParser *, GQueue *)); + JSONMessageEmitFunc func, + void *opaque); int json_message_parser_feed(JSONMessageParser *parser, const char *buffer, size_t size); diff --git a/monitor.c b/monitor.c index 3940737c1c..ab80d32c70 100644 --- a/monitor.c +++ b/monitor.c @@ -3808,7 +3808,8 @@ static int monitor_can_read(void *opaque) return (mon->suspend_cnt == 0) ? 1 : 0; } -static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens) +static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, + void *opaque) { QObject *req, *rsp = NULL, *id = NULL; QDict *qdict = NULL; @@ -3955,7 +3956,7 @@ static void monitor_qmp_event(void *opaque, int event) break; case CHR_EVENT_CLOSED: json_message_parser_destroy(&mon->qmp.parser); -json_message_parser_init(&mon->qmp.parser, handle_qmp_command); +json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL); mon_refcount--; monitor_fdsets_cleanup(); break; @@ -4105,7 +4106,7 @@ void monitor_init(Chardev *chr, int flags) qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read, monitor_qmp_event, NULL, mon, NULL, true); qemu_chr_fe_set_echo(&mon->chr, true); -json_message_parser_init(&mon->qmp.parser, handle_qmp_command); +json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL); } else { qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read, monitor_event, NULL, mon, NULL, true); diff --git a/qga/main.c b/qga/main.c index 62a62755bd..3b5ebbc1ee 100644 --- a/qga/main.c +++ b/qga/main.c @@ -593,7 +593,8 @@ static void process_command(GAState *s, QDict *req) } /* handle requests/control events coming in over the channel */ -static void process_event(JSONMessageParser *parser, GQueue *tokens) +static void process_event(JSONMessageParser *parser, GQueue *tokens, + void *opaque) { GAState *s = container_of(parser, GAState, parser); QDict *qdict; @@ -1320,7 +1321,7 @@ static int run_agent(GAState *s, GAConfig *config, int socket_activation) s->command_state = ga_command_state_new(); ga_command_state_init(s, s->command_state); ga_command_state_init_all(s->command_state); -json_message_parser_init(&s->parser, process_event); +json_message_parser_init(&s->parser, process_event, NULL); #ifndef _WIN32 if (!register_signal_handlers()) { diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c index c51c2021f9..8fc1b15321 100644 --- a/qobject/json-streamer.c +++ b/qobject/json-streamer.c @@ -102,18 +102,20 @@ out_emit: */ tokens = parser->tokens; parser->tokens = g_queue_new(); -parser->emit(parser, tokens); +parser->emit(parser, tokens, parser->opaque); parser->token_size = 0; } void json_message_parser_init(JSONMessageParser *parser, - void (*func)(JSONMessageParser *, GQueue *)) + JSONMessageEmitFunc func, + void *opaque) { parser->emit = func; parser->brace_count = 0; parser->bracket_count = 0; parser->tokens = g_queue_new(); parser->token_size = 0; +parser->opaque = opaque; json_lexer_init(&parser->lexer, json_message_process_token); } diff --git a/qobject/qjson.c b/qobject/qjson.c index 2e0930884e..f9766febe3 100644 --- a/qobject/qjson.c +++ b/qobj
[Qemu-devel] [RFC v4 10/27] monitor: allow to use IO thread for parsing
For each Monitor, add one field "use_io_thr" to show whether it will be using the dedicated monitor IO thread to handle input/output. When set, monitor IO parsing work will be offloaded to dedicated monitor IO thread, rather than the original main loop thread. This only works for QMP. HMP will always be run on main loop thread. Currently we're still keeping use_io_thr to off always. Will turn it on later at some point. One thing to mention is that we cannot set use_io_thr for every QMP monitors. The problem is that MUXed typed chardevs may not work well with it now. When MUX is used, frontend of chardev can be the monitor plus something else. The only thing we know would be safe to be run outside main thread so far is the monitor frontend. All the rest of the frontends should still be run in main thread only. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 38 ++ 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/monitor.c b/monitor.c index 4ce9828fab..685938b9f2 100644 --- a/monitor.c +++ b/monitor.c @@ -191,6 +191,7 @@ struct Monitor { int flags; int suspend_cnt; bool skip_flush; +bool use_io_thr; QemuMutex out_lock; QString *outbuf; @@ -575,7 +576,8 @@ static void monitor_qapi_event_init(void) static void handle_hmp_command(Monitor *mon, const char *cmdline); -static void monitor_data_init(Monitor *mon, bool skip_flush) +static void monitor_data_init(Monitor *mon, bool skip_flush, + bool use_io_thr) { memset(mon, 0, sizeof(Monitor)); qemu_mutex_init(&mon->out_lock); @@ -583,6 +585,7 @@ static void monitor_data_init(Monitor *mon, bool skip_flush) /* Use *mon_cmds by default. */ mon->cmd_table = mon_cmds; mon->skip_flush = skip_flush; +mon->use_io_thr = use_io_thr; } static void monitor_data_destroy(Monitor *mon) @@ -603,7 +606,7 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index, char *output = NULL; Monitor *old_mon, hmp; -monitor_data_init(&hmp, true); +monitor_data_init(&hmp, true, false); old_mon = cur_mon; cur_mon = &hmp; @@ -4104,8 +4107,9 @@ void error_vprintf_unless_qmp(const char *fmt, va_list ap) void monitor_init(Chardev *chr, int flags) { Monitor *mon = g_malloc(sizeof(*mon)); +GMainContext *context; -monitor_data_init(mon, false); +monitor_data_init(mon, false, false); qemu_chr_fe_init(&mon->chr, chr, &error_abort); mon->flags = flags; @@ -4117,19 +4121,37 @@ void monitor_init(Chardev *chr, int flags) monitor_read_command(mon, 0); } +if (mon->use_io_thr) { +/* + * When use_io_thr is set, we use the global shared dedicated + * IO thread for this monitor to handle input/output. + */ +context = monitor_io_context_get(); +/* We should have inited globals before reaching here. */ +assert(context); +} else { +/* The default main loop, which is the main thread */ +context = NULL; +} + +/* + * Add the monitor before running it (which is triggered by + * qemu_chr_fe_set_handlers), otherwise one monitor may find + * itself not on the mon_list when running. + */ +qemu_mutex_lock(&monitor_lock); +QTAILQ_INSERT_HEAD(&mon_list, mon, entry); +qemu_mutex_unlock(&monitor_lock); + if (monitor_is_qmp(mon)) { qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read, - monitor_qmp_event, NULL, mon, NULL, true); + monitor_qmp_event, NULL, mon, context, true); qemu_chr_fe_set_echo(&mon->chr, true); json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon); } else { qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read, monitor_event, NULL, mon, NULL, true); } - -qemu_mutex_lock(&monitor_lock); -QLIST_INSERT_HEAD(&mon_list, mon, entry); -qemu_mutex_unlock(&monitor_lock); } void monitor_cleanup(void) -- 2.13.6
[Qemu-devel] [RFC v4 04/27] monitor: move skip_flush into monitor_data_init
It's part of the data init. Collect it. Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index e36fb5308d..3940737c1c 100644 --- a/monitor.c +++ b/monitor.c @@ -568,13 +568,14 @@ static void monitor_qapi_event_init(void) static void handle_hmp_command(Monitor *mon, const char *cmdline); -static void monitor_data_init(Monitor *mon) +static void monitor_data_init(Monitor *mon, bool skip_flush) { memset(mon, 0, sizeof(Monitor)); qemu_mutex_init(&mon->out_lock); mon->outbuf = qstring_new(); /* Use *mon_cmds by default. */ mon->cmd_table = mon_cmds; +mon->skip_flush = skip_flush; } static void monitor_data_destroy(Monitor *mon) @@ -595,8 +596,7 @@ char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index, char *output = NULL; Monitor *old_mon, hmp; -monitor_data_init(&hmp); -hmp.skip_flush = true; +monitor_data_init(&hmp, true); old_mon = cur_mon; cur_mon = &hmp; @@ -4089,7 +4089,7 @@ void monitor_init(Chardev *chr, int flags) } mon = g_malloc(sizeof(*mon)); -monitor_data_init(mon); +monitor_data_init(mon, false); qemu_chr_fe_init(&mon->chr, chr, &error_abort); mon->flags = flags; -- 2.13.6
[Qemu-devel] [RFC v4 06/27] monitor: move the cur_mon hack deeper for QMP
In monitor_qmp_read(), we have the hack to temporarily replace the cur_mon pointer. Now we move this hack deeper inside the QMP dispatcher routine since the Monitor pointer can be passed in to that using the new JSON Parser opaque field now. This does not make much sense as a single patch. However, this will be a big step for the next patch, when the QMP dispatcher routine will be split from the QMP parser. Reviewed-by: Eric Blake Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/monitor.c b/monitor.c index ab80d32c70..322dfb5f31 100644 --- a/monitor.c +++ b/monitor.c @@ -3813,7 +3813,7 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, { QObject *req, *rsp = NULL, *id = NULL; QDict *qdict = NULL; -Monitor *mon = cur_mon; +Monitor *mon = opaque, *old_mon; Error *err = NULL; req = json_parser_parse_err(tokens, NULL, &err); @@ -3838,8 +3838,13 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, QDECREF(req_json); } +old_mon = cur_mon; +cur_mon = mon; + rsp = qmp_dispatch(cur_mon->qmp.commands, req); +cur_mon = old_mon; + if (mon->qmp.commands == &qmp_cap_negotiation_commands) { qdict = qdict_get_qdict(qobject_to_qdict(rsp), "error"); if (qdict @@ -3876,13 +3881,9 @@ err_out: static void monitor_qmp_read(void *opaque, const uint8_t *buf, int size) { -Monitor *old_mon = cur_mon; - -cur_mon = opaque; - -json_message_parser_feed(&cur_mon->qmp.parser, (const char *) buf, size); +Monitor *mon = opaque; -cur_mon = old_mon; +json_message_parser_feed(&mon->qmp.parser, (const char *) buf, size); } static void monitor_read(void *opaque, const uint8_t *buf, int size) @@ -3956,7 +3957,7 @@ static void monitor_qmp_event(void *opaque, int event) break; case CHR_EVENT_CLOSED: json_message_parser_destroy(&mon->qmp.parser); -json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL); +json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon); mon_refcount--; monitor_fdsets_cleanup(); break; @@ -4106,7 +4107,7 @@ void monitor_init(Chardev *chr, int flags) qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_qmp_read, monitor_qmp_event, NULL, mon, NULL, true); qemu_chr_fe_set_echo(&mon->chr, true); -json_message_parser_init(&mon->qmp.parser, handle_qmp_command, NULL); +json_message_parser_init(&mon->qmp.parser, handle_qmp_command, mon); } else { qemu_chr_fe_set_handlers(&mon->chr, monitor_can_read, monitor_read, monitor_event, NULL, mon, NULL, true); -- 2.13.6
[Qemu-devel] [RFC v4 13/27] qmp: introduce some capability helpers
Introduce qmp_cap_enabled() and qmp_oob_enabled() helpers. Signed-off-by: Peter Xu --- monitor.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/monitor.c b/monitor.c index da03d1370f..655c25041c 100644 --- a/monitor.c +++ b/monitor.c @@ -1038,6 +1038,16 @@ static void monitor_init_qmp_commands(void) qmp_marshal_qmp_capabilities, QCO_NO_OPTIONS); } +static bool qmp_cap_enabled(Monitor *mon, QMPCapability cap) +{ +return mon->qmp.qmp_caps[cap]; +} + +static bool qmp_oob_enabled(Monitor *mon) +{ +return qmp_cap_enabled(mon, QMP_CAPABILITY_OOB); +} + static void qmp_caps_check(Monitor *mon, QMPCapabilityList *list, Error **errp) { -- 2.13.6
[Qemu-devel] [RFC v4 18/27] monitor: send event when request queue full
Set maximum QMP request queue length to 8. If queue full, instead of queue the command, we directly return a "request-dropped" event, telling client that specific command is dropped. Note that this flow control mechanism is only valid if OOB is enabled. If it's not, the effective queue length will always be 1, which strictly follows original behavior of QMP command handling (which never drop messages). Signed-off-by: Peter Xu --- monitor.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/monitor.c b/monitor.c index 0e2ff114cc..92e846e195 100644 --- a/monitor.c +++ b/monitor.c @@ -4029,6 +4029,8 @@ static void monitor_qmp_bh_dispatcher(void *data) } } +#define QMP_REQ_QUEUE_LEN_MAX (8) + static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, void *opaque) { @@ -4061,6 +4063,19 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, req_obj->id = id; req_obj->req = req; +if (qmp_oob_enabled(mon)) { +/* Drop the request if queue is full. */ +if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) { +qapi_event_send_request_dropped(id, +REQUEST_DROP_REASON_QUEUE_FULL, +NULL); +qobject_decref(id); +qobject_decref(req); +g_free(req_obj); +return; +} +} + /* * Put the request to the end of queue so that requests will be * handled in time order. Ownership for req_obj, req, id, -- 2.13.6
[Qemu-devel] [RFC v4 12/27] qmp: negociate QMP capabilities
After this patch, we will allow QMP clients to enable QMP capabilities when sending the first "qmp_capabilities" command. Originally we are starting QMP session with no arguments like: { "execute": "qmp_capabilities" } Now we can enable some QMP capabilities using (take OOB as example, which is the only one capability that we support): { "execute": "qmp_capabilities", "argument": { "enable": [ "oob" ] } } When the "argument" key is not provided, no capability is enabled. For capability "oob", the monitor needs to be run on dedicated IO thread, otherwise the command will fail. For example, trying to enable OOB on a MUXed typed QMP monitor will fail. One thing to mention is that, QMP capabilities are per-monitor, and also when the connection is closed due to some reason, the capabilities will be reset. Signed-off-by: Peter Xu --- monitor.c| 57 +++- qapi-schema.json | 15 --- 2 files changed, 68 insertions(+), 4 deletions(-) diff --git a/monitor.c b/monitor.c index 468303b472..da03d1370f 100644 --- a/monitor.c +++ b/monitor.c @@ -167,6 +167,7 @@ typedef struct { * mode. */ QmpCommandList *commands; +bool qmp_caps[QMP_CAPABILITY__MAX]; } MonitorQMP; /* @@ -1037,8 +1038,42 @@ static void monitor_init_qmp_commands(void) qmp_marshal_qmp_capabilities, QCO_NO_OPTIONS); } -void qmp_qmp_capabilities(Error **errp) +static void qmp_caps_check(Monitor *mon, QMPCapabilityList *list, + Error **errp) +{ +for (; list; list = list->next) { +assert(list->value < QMP_CAPABILITY__MAX); +switch (list->value) { +case QMP_CAPABILITY_OOB: +if (!mon->use_io_thr) { +/* + * Out-Of-Band only works with monitors that are + * running on dedicated IOThread. + */ +error_setg(errp, "This monitor does not support " + "Out-Of-Band (OOB)"); +return; +} +break; +default: +break; +} +} +} + +/* This function should only be called after capabilities are checked. */ +static void qmp_caps_apply(Monitor *mon, QMPCapabilityList *list) +{ +for (; list; list = list->next) { +mon->qmp.qmp_caps[list->value] = true; +} +} + +void qmp_qmp_capabilities(bool has_enable, QMPCapabilityList *enable, + Error **errp) { +Error *local_err = NULL; + if (cur_mon->qmp.commands == &qmp_commands) { error_set(errp, ERROR_CLASS_COMMAND_NOT_FOUND, "Capabilities negotiation is already complete, command " @@ -1046,6 +1081,20 @@ void qmp_qmp_capabilities(Error **errp) return; } +/* Enable QMP capabilities provided by the guest if applicable. */ +if (has_enable) { +qmp_caps_check(cur_mon, enable, &local_err); +if (local_err) { +/* + * Failed check on either of the capabilities will fail + * the apply of all. + */ +error_propagate(errp, local_err); +return; +} +qmp_caps_apply(cur_mon, enable); +} + cur_mon->qmp.commands = &qmp_commands; } @@ -3963,6 +4012,11 @@ static QObject *get_qmp_greeting(void) return QOBJECT(result); } +static void monitor_qmp_caps_reset(Monitor *mon) +{ +memset(mon->qmp.qmp_caps, 0, sizeof(mon->qmp.qmp_caps)); +} + static void monitor_qmp_event(void *opaque, int event) { QObject *data; @@ -3971,6 +4025,7 @@ static void monitor_qmp_event(void *opaque, int event) switch (event) { case CHR_EVENT_OPENED: mon->qmp.commands = &qmp_cap_negotiation_commands; +monitor_qmp_caps_reset(mon); data = get_qmp_greeting(); monitor_json_emitter(mon, data); qobject_decref(data); diff --git a/qapi-schema.json b/qapi-schema.json index 03201578b4..531fd4c0db 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -102,21 +102,30 @@ # # Enable QMP capabilities. # -# Arguments: None. +# Arguments: +# +# @enable:List of QMPCapabilities to enable, which is optional. +# If not provided, it means no QMP capabilities will be +# enabled. (since 2.12) # # Example: # -# -> { "execute": "qmp_capabilities" } +# -> { "execute": "qmp_capabilities", +# "arguments": { "enable": [ "oob" ] } } # <- { "return": {} } # # Notes: This command is valid exactly when first connecting: it must be # issued before any other command will be accepted, and will fail once the # monitor is accepting other commands. (see qemu docs/interop/qmp-spec.txt) # +# QMP client needs to explicitly enable QMP capabilities, otherwise +# all the QMP capabilities will be turned off by default. +# # Since: 0.13 # ## -{ 'command': 'qmp_capabilities' } +{ 'command': 'qmp_capabilities', + 'data': { '*enable'
[Qemu-devel] [RFC v4 08/27] monitor: let mon_list be tail queue
It was QLIST. I want to use this list to do monitor priority job later, which need tail insertion ability. So switching to a tail queue. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/monitor.c b/monitor.c index ac5313023b..a70ab5606b 100644 --- a/monitor.c +++ b/monitor.c @@ -205,7 +205,7 @@ struct Monitor { void *password_opaque; mon_cmd_t *cmd_table; QLIST_HEAD(,mon_fd_t) fds; -QLIST_ENTRY(Monitor) entry; +QTAILQ_ENTRY(Monitor) entry; }; /* QMP checker flags */ @@ -214,7 +214,7 @@ struct Monitor { /* Protects mon_list, monitor_event_state. */ static QemuMutex monitor_lock; -static QLIST_HEAD(mon_list, Monitor) mon_list; +static QTAILQ_HEAD(mon_list, Monitor) mon_list; static QLIST_HEAD(mon_fdsets, MonFdset) mon_fdsets; static int mon_refcount; @@ -415,7 +415,7 @@ static void monitor_qapi_event_emit(QAPIEvent event, QDict *qdict) Monitor *mon; trace_monitor_protocol_event_emit(event, qdict); -QLIST_FOREACH(mon, &mon_list, entry) { +QTAILQ_FOREACH(mon, &mon_list, entry) { if (monitor_is_qmp(mon) && mon->qmp.commands != &qmp_cap_negotiation_commands) { monitor_json_emitter(mon, QOBJECT(qdict)); @@ -4118,8 +4118,8 @@ void monitor_cleanup(void) Monitor *mon, *next; qemu_mutex_lock(&monitor_lock); -QLIST_FOREACH_SAFE(mon, &mon_list, entry, next) { -QLIST_REMOVE(mon, entry); +QTAILQ_FOREACH_SAFE(mon, &mon_list, entry, next) { +QTAILQ_REMOVE(&mon_list, mon, entry); monitor_data_destroy(mon); g_free(mon); } -- 2.13.6
[Qemu-devel] [RFC v4 14/27] monitor: introduce monitor_qmp_respond()
A tiny refactoring, preparing to split the QMP dispatcher away. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 48 +++- 1 file changed, 31 insertions(+), 17 deletions(-) diff --git a/monitor.c b/monitor.c index 655c25041c..ec03f1b232 100644 --- a/monitor.c +++ b/monitor.c @@ -3877,6 +3877,36 @@ static int monitor_can_read(void *opaque) return (mon->suspend_cnt == 0) ? 1 : 0; } +/* + * When rsp/err/id is passed in, the function will be responsible for + * the cleanup. + */ +static void monitor_qmp_respond(Monitor *mon, QObject *rsp, +Error *err, QObject *id) +{ +QDict *qdict = NULL; + +if (err) { +qdict = qdict_new(); +qdict_put_obj(qdict, "error", qmp_build_error_object(err)); +error_free(err); +rsp = QOBJECT(qdict); +} + +if (rsp) { +if (id) { +/* This is for the qdict below. */ +qobject_incref(id); +qdict_put_obj(qobject_to_qdict(rsp), "id", id); +} + +monitor_json_emitter(mon, rsp); +} + +qobject_decref(id); +qobject_decref(rsp); +} + static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, void *opaque) { @@ -3927,24 +3957,8 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, } err_out: -if (err) { -qdict = qdict_new(); -qdict_put_obj(qdict, "error", qmp_build_error_object(err)); -error_free(err); -rsp = QOBJECT(qdict); -} +monitor_qmp_respond(mon, rsp, err, id); -if (rsp) { -if (id) { -qdict_put_obj(qobject_to_qdict(rsp), "id", id); -id = NULL; -} - -monitor_json_emitter(mon, rsp); -} - -qobject_decref(id); -qobject_decref(rsp); qobject_decref(req); } -- 2.13.6
[Qemu-devel] [RFC v4 21/27] qmp: let migrate-incoming allow out-of-band
So it can get rid of being run on main thread. Signed-off-by: Peter Xu --- qapi/migration.json | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/qapi/migration.json b/qapi/migration.json index bbc4671ded..95098072dd 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1063,7 +1063,8 @@ # <- { "return": {} } # ## -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } } +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' }, + 'allow-oob': true } ## # @xen-save-devices-state: -- 2.13.6
[Qemu-devel] [RFC v4 15/27] monitor: let monitor_{suspend|resume} thread safe
Monitor code now can be run in more than one thread. Let the suspend and resume code be thread safe. Reviewed-by: Fam Zheng Signed-off-by: Peter Xu --- monitor.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/monitor.c b/monitor.c index ec03f1b232..30f9cd80de 100644 --- a/monitor.c +++ b/monitor.c @@ -4003,7 +4003,7 @@ int monitor_suspend(Monitor *mon) { if (!mon->rs) return -ENOTTY; -mon->suspend_cnt++; +atomic_inc(&mon->suspend_cnt); return 0; } @@ -4011,8 +4011,9 @@ void monitor_resume(Monitor *mon) { if (!mon->rs) return; -if (--mon->suspend_cnt == 0) +if (atomic_dec_fetch(&mon->suspend_cnt) == 0) { readline_show_prompt(mon->rs); +} } static QObject *get_qmp_greeting(void) -- 2.13.6
[Qemu-devel] [RFC v4 19/27] qapi: introduce new cmd option "allow-oob"
Here "oob" stands for "Out-Of-Band". When "allow-oob" is set, it means the command allows out-of-band execution. The "oob" idea is proposed by Markus Armbruster in following thread: https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg02057.html This new "allow-oob" boolean will be exposed by "query-qmp-schema" as well for command entries, so that QMP clients can know which command can be used as out-of-band calls. For example the command "migrate" originally looks like: {"name": "migrate", "ret-type": "17", "meta-type": "command", "arg-type": "86"} And it'll be changed into: {"name": "migrate", "ret-type": "17", "allow-oob": false, "meta-type": "command", "arg-type": "86"} This patch only provides the QMP interface level changes. It does not contains the real out-of-band execution implementation yet. Suggested-by: Markus Armbruster Signed-off-by: Peter Xu --- include/qapi/qmp/dispatch.h| 1 + qapi/introspect.json | 6 +- scripts/qapi-commands.py | 19 ++- scripts/qapi-introspect.py | 10 -- scripts/qapi.py| 15 ++- scripts/qapi2texi.py | 2 +- tests/qapi-schema/test-qapi.py | 2 +- 7 files changed, 40 insertions(+), 15 deletions(-) diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h index 20578dcd48..b76798800c 100644 --- a/include/qapi/qmp/dispatch.h +++ b/include/qapi/qmp/dispatch.h @@ -23,6 +23,7 @@ typedef enum QmpCommandOptions { QCO_NO_OPTIONS = 0x0, QCO_NO_SUCCESS_RESP = 0x1, +QCO_ALLOW_OOB = 0x2, } QmpCommandOptions; typedef struct QmpCommand diff --git a/qapi/introspect.json b/qapi/introspect.json index 5b3e6e9d78..57cc137627 100644 --- a/qapi/introspect.json +++ b/qapi/introspect.json @@ -259,12 +259,16 @@ # # @ret-type: the name of the command's result type. # +# @allow-oob: whether the command allows out-of-band execution. +# (Since: 2.11) +# # TODO: @success-response (currently irrelevant, because it's QGA, not QMP) # # Since: 2.5 ## { 'struct': 'SchemaInfoCommand', - 'data': { 'arg-type': 'str', 'ret-type': 'str' } } + 'data': { 'arg-type': 'str', 'ret-type': 'str', +'allow-oob': 'bool' } } ## # @SchemaInfoEvent: diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py index 974d0a4a80..b2b0bc0510 100644 --- a/scripts/qapi-commands.py +++ b/scripts/qapi-commands.py @@ -192,10 +192,18 @@ out: return ret -def gen_register_command(name, success_response): -options = 'QCO_NO_OPTIONS' +def gen_register_command(name, success_response, allow_oob): +options = [] + if not success_response: -options = 'QCO_NO_SUCCESS_RESP' +options += ['QCO_NO_SUCCESS_RESP'] +if allow_oob: +options += ['QCO_ALLOW_OOB'] + +if not options: +options = ['QCO_NO_OPTIONS'] + +options = " | ".join(options) ret = mcgen(''' qmp_register_command(cmds, "%(name)s", @@ -241,7 +249,7 @@ class QAPISchemaGenCommandVisitor(QAPISchemaVisitor): self._visited_ret_types = None def visit_command(self, name, info, arg_type, ret_type, - gen, success_response, boxed): + gen, success_response, boxed, allow_oob): if not gen: return self.decl += gen_command_decl(name, arg_type, boxed, ret_type) @@ -250,7 +258,8 @@ class QAPISchemaGenCommandVisitor(QAPISchemaVisitor): self.defn += gen_marshal_output(ret_type) self.decl += gen_marshal_decl(name) self.defn += gen_marshal(name, arg_type, boxed, ret_type) -self._regy += gen_register_command(name, success_response) +self._regy += gen_register_command(name, success_response, + allow_oob) (input_file, output_dir, do_c, do_h, prefix, opts) = parse_command_line() diff --git a/scripts/qapi-introspect.py b/scripts/qapi-introspect.py index 032bcea491..9fbf88b644 100644 --- a/scripts/qapi-introspect.py +++ b/scripts/qapi-introspect.py @@ -28,6 +28,11 @@ def to_json(obj, level=0): to_json(obj[key], level + 1)) for key in sorted(obj.keys())] ret = '{' + ', '.join(elts) + '}' +elif isinstance(obj, bool): +if obj: +ret = 'true' +else: +ret = 'false' else: assert False# not implemented if level == 1: @@ -154,12 +159,13 @@ const char %(c_name)s[] = %(c_string)s; for m in variants.variants]}) def visit_command(self, name, info, arg_type, ret_type, - gen, success_response, boxed): + gen, success_response, boxed, allow_oob): arg_type = arg_type or self._schema.the_empty_object_type ret_type = ret_type or self._schema.the_empty_object_type self._gen_json(name, 'command', {'arg-type': self._u
[Qemu-devel] [RFC v4 11/27] qmp: introduce QMPCapability
There was no QMP capabilities defined. Define the first "oob" as capability to allow out-of-band messages. Also, touch up qmp-test.c to test the new bits. Signed-off-by: Peter Xu --- monitor.c| 15 +-- qapi-schema.json | 13 + tests/qmp-test.c | 10 +- 3 files changed, 35 insertions(+), 3 deletions(-) diff --git a/monitor.c b/monitor.c index 685938b9f2..468303b472 100644 --- a/monitor.c +++ b/monitor.c @@ -3944,12 +3944,23 @@ void monitor_resume(Monitor *mon) static QObject *get_qmp_greeting(void) { +QDict *result = qdict_new(), *qmp = qdict_new(); +QList *cap_list = qlist_new(); QObject *ver = NULL; +QMPCapability cap; + +qdict_put(result, "QMP", qmp); qmp_marshal_query_version(NULL, &ver, NULL); +qdict_put_obj(qmp, "version", ver); + +for (cap = 0; cap < QMP_CAPABILITY__MAX; cap++) { +qlist_append(cap_list, qstring_from_str( + QMPCapability_str(cap))); +} +qdict_put(qmp, "capabilities", cap_list); -return qobject_from_jsonf("{'QMP': {'version': %p, 'capabilities': []}}", - ver); +return QOBJECT(result); } static void monitor_qmp_event(void *opaque, int event) diff --git a/qapi-schema.json b/qapi-schema.json index 18457954a8..03201578b4 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -119,6 +119,19 @@ { 'command': 'qmp_capabilities' } ## +# @QMPCapability: +# +# QMP supported capabilities to be broadcasted to the clients. +# +# @oob: QMP ability to support Out-Of-Band requests. +# +# Since: 2.12 +# +## +{ 'enum': 'QMPCapability', + 'data': [ 'oob' ] } + +## # @VersionTriple: # # A three-part version number. diff --git a/tests/qmp-test.c b/tests/qmp-test.c index c5a5c10b41..292c5f135a 100644 --- a/tests/qmp-test.c +++ b/tests/qmp-test.c @@ -17,6 +17,7 @@ #include "qapi/qobject-input-visitor.h" #include "qapi/util.h" #include "qapi/visitor.h" +#include "qapi/qmp/qstring.h" const char common_args[] = "-nodefaults -machine none"; @@ -75,6 +76,8 @@ static void test_qmp_protocol(void) { QDict *resp, *q, *ret; QList *capabilities; +const QListEntry *entry; +QString *qstr; global_qtest = qtest_init_without_qmp_handshake(common_args); @@ -84,7 +87,12 @@ static void test_qmp_protocol(void) g_assert(q); test_version(qdict_get(q, "version")); capabilities = qdict_get_qlist(q, "capabilities"); -g_assert(capabilities && qlist_empty(capabilities)); +g_assert(capabilities); +entry = qlist_first(capabilities); +g_assert(entry); +qstr = qobject_to_qstring(entry->value); +g_assert(qstr); +g_assert_cmpstr(qstring_get_str(qstr), ==, "oob"); QDECREF(resp); /* Test valid command before handshake */ -- 2.13.6
[Qemu-devel] [RFC v4 17/27] qmp: add new event "request-dropped"
This event will be emitted if one QMP request is dropped. Along, declare an enum for the reasons. Signed-off-by: Peter Xu --- qapi-schema.json | 34 ++ 1 file changed, 34 insertions(+) diff --git a/qapi-schema.json b/qapi-schema.json index 531fd4c0db..9d2625b6b3 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3222,3 +3222,37 @@ # Since: 2.11 ## { 'command': 'watchdog-set-action', 'data' : {'action': 'WatchdogAction'} } + +## +# @RequestDropReason: +# +# Reasons that caused one command to be dropped. +# +# @queue-full: the queue of request is full. +# +# Since: 2.12 +## +{ 'enum': 'RequestDropReason', + 'data': [ 'queue-full' ] } + +## +# @REQUEST_DROPPED: +# +# Emitted when one QMP request is dropped due to some reason. +# REQUEST_DROPPED is only emitted when the oob capability is enabled. +# +# @id: The dropped command's "id" field. +# +# @reason: The reason why the request is dropped. +# +# Since: 2.12 +# +# Example: +# +# { "event": "REQUEST_DROPPED", +# "data": {"result": {"id": "libvirt-102", +# "reason": "queue-full" } } } +# +## +{ 'event': 'REQUEST_DROPPED' , + 'data': { 'id': 'any', 'reason': 'RequestDropReason' } } -- 2.13.6
[Qemu-devel] [RFC v4 23/27] monitor: enable IO thread for (qmp & !mux) typed
Start to use dedicate IO thread for QMP monitors that are not using MUXed chardev. Signed-off-by: Peter Xu --- monitor.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index b18d9d696c..5e0b8ed9c5 100644 --- a/monitor.c +++ b/monitor.c @@ -36,6 +36,7 @@ #include "net/net.h" #include "net/slirp.h" #include "chardev/char-fe.h" +#include "chardev/char-mux.h" #include "ui/qemu-spice.h" #include "sysemu/numa.h" #include "monitor/monitor.h" @@ -4456,7 +4457,7 @@ void monitor_init(Chardev *chr, int flags) Monitor *mon = g_malloc(sizeof(*mon)); GMainContext *context; -monitor_data_init(mon, false, false); +monitor_data_init(mon, false, !CHARDEV_IS_MUX(chr)); qemu_chr_fe_init(&mon->chr, chr, &error_abort); mon->flags = flags; -- 2.13.6
Re: [Qemu-devel] [Libguestfs] [qemu-img] support for XVA
On Thu, 16 Nov 2017 13:56:16 +0100 Tomáš Golembiovský wrote: > On Wed, 15 Nov 2017 21:41:20 +0100 > Gandalf Corvotempesta wrote: > > > 2017-11-15 21:29 GMT+01:00 Richard W.M. Jones : > > > Gandalf, is there an XVA file publically available (pref. not > > > too big) that we can look at? > > > > I can try to provide one, but it's simple: > > > > # tar tvf 20160630_124823_aa72_.xva.gz | head -n 50 > > -- 0/0 42353 1970-01-01 01:00 ova.xml > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/ > > -- 0/0 40 1970-01-01 01:00 Ref:175/.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0001 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0001.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0003 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0003.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0004 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0004.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0005 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0005.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0006 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0006.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0007 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0007.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0009 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0009.checksum > > -- 0/0 1048576 1970-01-01 01:00 Ref:175/0010 > > -- 0/0 40 1970-01-01 01:00 Ref:175/0010.checksum > > > > > > You can ignore the ova.xml and just use the "Ref:175" directory. > > Inside the XVA you'll fine one "Ref" directory for each virtual disk > > (ref number is different for each disk) > > Inside each Ref directory, you'll get tons of 1MB file, corrisponding > > to the RAW image. > > You have to merge these files in a single raw file with just an > > exception: numbers are not sequential. > > as you can see above, we have: , 0001, 0003 > > > > The 0002 is missing, because it's totally blank. XenServer doesn't > > export any empty block, thus it will skip the corrisponding 1MB file. > > When building the raw image, you have to replace empty blocks with 1MB > > full of zeros. > > > > This is (in addition to the tar extract) the most time-consuming part. > > Currently I'm rebuilding a 250GB image, with tons of files to be > > merge. > > If qemu-img can be patched to automatically convert this kind of > > format, I can save about 3 hours (30 minutes for extracting the > > tarball, and about 2 hours to merge 250-300GB image) > > > > Hi, > > I like what Richard and Max came up with. Pretty nifty solutions. > While there's nothing wrong with them I decided to take my own shot at > it. Since the blocks in tar file are pieces of raw image there is no > conversion happening. All in all it's just moving bytes from one place > to another. That means there shouldn't be a need for any heavy > machinery, right? :) > > Attached is a shell script that uses dd to do the byte-shuffling. > I'm pretty sure this could even be safely parallelized by running > multiple instances of dd at the same time (e.g. with xargs). But I did > not try that. Oh yes, and one more thing: as with the other solutions I didn't bother reading the XML for the target size. So resize may be necessary afterwards. Tomas -- Tomáš Golembiovský
[Qemu-devel] [RFC v4 16/27] monitor: separate QMP parser and dispatcher
Originally QMP goes throw these steps: JSON Parser --> QMP Dispatcher --> Respond /|\(2)(3) | (1) | \|/ (4) +- main thread + This patch does this: JSON Parser QMP Dispatcher --> Respond /|\ | /|\ (4) | | | (2)| (3)| (5) (1) | +-> | \|/ +- main thread <---+ So the parsing job and the dispatching job is isolated now. It gives us a chance in following up patches to totally move the parser outside. The isloation is done using one QEMUBH. Only one dispatcher QEMUBH is used for all the monitors. Signed-off-by: Peter Xu --- monitor.c | 179 ++ 1 file changed, 157 insertions(+), 22 deletions(-) diff --git a/monitor.c b/monitor.c index 30f9cd80de..0e2ff114cc 100644 --- a/monitor.c +++ b/monitor.c @@ -168,6 +168,10 @@ typedef struct { */ QmpCommandList *commands; bool qmp_caps[QMP_CAPABILITY__MAX]; +/* Protects qmp request/response queue. */ +QemuMutex qmp_queue_lock; +/* Input queue that holds all the parsed QMP requests */ +GQueue *qmp_requests; } MonitorQMP; /* @@ -213,6 +217,8 @@ struct Monitor { struct MonitorGlobal { IOThread *mon_iothread; +/* Bottom half to dispatch the requests received from IO thread */ +QEMUBH *qmp_dispatcher_bh; }; static struct MonitorGlobal mon_global; @@ -582,11 +588,13 @@ static void monitor_data_init(Monitor *mon, bool skip_flush, { memset(mon, 0, sizeof(Monitor)); qemu_mutex_init(&mon->out_lock); +qemu_mutex_init(&mon->qmp.qmp_queue_lock); mon->outbuf = qstring_new(); /* Use *mon_cmds by default. */ mon->cmd_table = mon_cmds; mon->skip_flush = skip_flush; mon->use_io_thr = use_io_thr; +mon->qmp.qmp_requests = g_queue_new(); } static void monitor_data_destroy(Monitor *mon) @@ -599,6 +607,8 @@ static void monitor_data_destroy(Monitor *mon) g_free(mon->rs); QDECREF(mon->outbuf); qemu_mutex_destroy(&mon->out_lock); +qemu_mutex_destroy(&mon->qmp.qmp_queue_lock); +g_queue_free(mon->qmp.qmp_requests); } char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index, @@ -3907,29 +3917,31 @@ static void monitor_qmp_respond(Monitor *mon, QObject *rsp, qobject_decref(rsp); } -static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, - void *opaque) +struct QMPRequest { +/* Owner of the request */ +Monitor *mon; +/* "id" field of the request */ +QObject *id; +/* Request object to be handled */ +QObject *req; +}; +typedef struct QMPRequest QMPRequest; + +/* + * Dispatch one single QMP request. The function will free the req_obj + * and objects inside it before return. + */ +static void monitor_qmp_dispatch_one(QMPRequest *req_obj) { -QObject *req, *rsp = NULL, *id = NULL; +Monitor *mon, *old_mon; +QObject *req, *rsp = NULL, *id; QDict *qdict = NULL; -Monitor *mon = opaque, *old_mon; -Error *err = NULL; -req = json_parser_parse_err(tokens, NULL, &err); -if (!req && !err) { -/* json_parser_parse_err() sucks: can fail without setting @err */ -error_setg(&err, QERR_JSON_PARSING); -} -if (err) { -goto err_out; -} +req = req_obj->req; +mon = req_obj->mon; +id = req_obj->id; -qdict = qobject_to_qdict(req); -if (qdict) { -id = qdict_get(qdict, "id"); -qobject_incref(id); -qdict_del(qdict, "id"); -} /* else will fail qmp_dispatch() */ +g_free(req_obj); if (trace_event_get_state_backends(TRACE_HANDLE_QMP_COMMAND)) { QString *req_json = qobject_to_json(req); @@ -3940,7 +3952,7 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, old_mon = cur_mon; cur_mon = mon; -rsp = qmp_dispatch(cur_mon->qmp.commands, req); +rsp = qmp_dispatch(mon->qmp.commands, req); cur_mon = old_mon; @@ -3956,12 +3968,122 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, } } -err_out: -monitor_qmp_respond(mon, rsp, err, id); +/* Respond if necessary */ +monitor_qmp_respond(mon, rsp, NULL, id); + +/* This pairs with the monitor_suspend() in handle_qmp_command(). */ +if (!qmp_oob_enabled(mon)) { +monitor_resume(mon); +} qobject_decref(req); } +/* + * Pop one QMP request from monitor queues, return NULL if not found. + * We are using round-robin fasion to pop the request, to avoid + * processing command only on a very busy monitor. To achieve that, + * when we processed one request on specific monitor, we put that + * monitor to the end of mon_list queue. + */ +static QMPRequest *monitor_qmp_requests_pop_one(void) +{ +QMPRequest *req_obj = NULL; +Monitor *mon; + +
[Qemu-devel] [RFC v4 25/27] docs: update QMP documents for OOB commands
Update both the developer and spec for the new QMP OOB (Out-Of-Band) command. Signed-off-by: Peter Xu --- docs/devel/qapi-code-gen.txt | 51 +++- docs/interop/qmp-spec.txt| 35 +- 2 files changed, 76 insertions(+), 10 deletions(-) diff --git a/docs/devel/qapi-code-gen.txt b/docs/devel/qapi-code-gen.txt index f04c63fe82..8597fdb087 100644 --- a/docs/devel/qapi-code-gen.txt +++ b/docs/devel/qapi-code-gen.txt @@ -556,7 +556,8 @@ following example objects: Usage: { 'command': STRING, '*data': COMPLEX-TYPE-NAME-OR-DICT, '*returns': TYPE-NAME, '*boxed': true, - '*gen': false, '*success-response': false } + '*gen': false, '*success-response': false, + '*allow-oob': false } Commands are defined by using a dictionary containing several members, where three members are most common. The 'command' member is a @@ -636,6 +637,44 @@ possible, the command expression should include the optional key 'success-response' with boolean value false. So far, only QGA makes use of this member. +Most of the QMP commands are handled sequentially in such a order: +Firstly, the JSON Parser parses the command request into some internal +message, delivers the message to QMP dispatchers. Secondly, the QMP +dispatchers will handle the commands one by one in time order, respond +when necessary. For some commands that always complete "quickly" can +instead be executed directly during parsing, at the QMP client's +request. This kind of commands that allow direct execution is called +"out-of-band" ("oob" as shortcut) commands. The response can overtake +prior in-band commands' responses. By default, commands are always +in-band. We need to explicitly specify "allow-oob" to "True" to show +that one command can be run out-of-band. + +One thing to mention for developers is that, although out-of-band +execution of commands benefit from quick and asynchronous execution, +it need to satisfy at least the following: + +(1) It is extremely quick and never blocks, so that its execution will +not block parsing routine of any other monitors. + +(2) It does not need BQL, since the parser can be run without BQL, +while the dispatcher is always with BQL held. + +If not, the command is not suitable to be allowed to run out-of-band, +and it should set its "allow-oob" to "False". Whether a command is +allowed to run out-of-band can also be introspected using +query-qmp-schema command. Please see the section "Client JSON +Protocol introspection" for more information. + +To execute a command in out-of-band way, we need to specify the +"control" field in the request, with "run-oob" set to true. Example: + + => { "execute": "command-support-oob", + "arguments": { ... }, + "control": { "run-oob": true } } + <= { "return": { } } + +Without it, even the commands that supports out-of-band execution will +still be run in-band. === Events === @@ -739,10 +778,12 @@ references by name. QAPI schema definitions not reachable that way are omitted. The SchemaInfo for a command has meta-type "command", and variant -members "arg-type" and "ret-type". On the wire, the "arguments" -member of a client's "execute" command must conform to the object type -named by "arg-type". The "return" member that the server passes in a -success response conforms to the type named by "ret-type". +members "arg-type", "ret-type" and "allow-oob". On the wire, the +"arguments" member of a client's "execute" command must conform to the +object type named by "arg-type". The "return" member that the server +passes in a success response conforms to the type named by +"ret-type". When "allow-oob" is set, it means the command supports +out-of-band execution. If the command takes no arguments, "arg-type" names an object type without members. Likewise, if the command returns nothing, "ret-type" diff --git a/docs/interop/qmp-spec.txt b/docs/interop/qmp-spec.txt index f8b5356015..c5c02589a9 100644 --- a/docs/interop/qmp-spec.txt +++ b/docs/interop/qmp-spec.txt @@ -78,21 +78,32 @@ The greeting message format is: - The "capabilities" member specify the availability of features beyond the baseline specification; the order of elements in this array has no particular significance, so a client must search the entire array - when looking for a particular capability + when looking for a particular capability. 2.2.1 Capabilities -- -As of the date this document was last revised, no server or client -capability strings have been defined. +Currently supported capabilities are: +- "oob": it means the QMP server supports "Out-Of-Band" command + execution. For more detail, please see "run-oob" parameter in + "Issuing Commands" section below. Not all commands allow this "oob" + execution. One can know whether one command supports "oob" by + "query-qmp-schema" command. + +QMP clients can get a list of supported QMP capabili
Re: [Qemu-devel] [PATCH for-2.11] throttle-groups: forget timer and schedule next TGM on detach
On Thu 16 Nov 2017 12:21:50 PM CET, Stefan Hajnoczi wrote: > tg->any_timer_armed[] must be cleared when detaching pending timers from > the AioContext. Failure to do so leads to hung I/O because it looks > like there are still timers pending when in fact they have been removed. > > Other ThrottleGroupMembers might have requests pending too so it's > necessary to schedule the next TGM so it can set a timer. > > This patch fixes hung I/O when QEMU is launched with drives that are in > the same throttling group: > > (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct bs=512 & > (guest)$ dd if=/dev/zero of=/dev/vdc oflag=direct bs=512 & > (qemu) stop > (qemu) cont > ...I/O is stuck... > > Signed-off-by: Stefan Hajnoczi Reviewed-by: Alberto Garcia Berto
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v17 6/6] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
On 11/16/2017 04:32 AM, Michael S. Tsirkin wrote: On Fri, Nov 03, 2017 at 04:13:06PM +0800, Wei Wang wrote: Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the support of reporting hints of guest free pages to the host via virtio-balloon. The host requests the guest to report the free pages by sending commands via the virtio-balloon configuration registers. When the guest starts to report, the first element added to the free page vq is a sequence id of the start reporting command. The id is given by the host, and it indicates whether the following free pages correspond to the command. For example, the host may stop the report and start again with a new command id. The obsolete pages for the previous start command can be detected by the id dismatching on the host. The id is added to the vq using an output buffer, and the free pages are added to the vq using input buffer. Here are some explainations about the added configuration registers: - host2guest_cmd: a register used by the host to send commands to the guest. - guest2host_cmd: written by the guest to ACK to the host about the commands that have been received. The host will clear the corresponding bits on the host2guest_cmd register. The guest also uses this register to send commands to the host (e.g. when finish free page reporting). - free_page_cmd_id: the sequence id of the free page report command given by the host. Signed-off-by: Wei Wang Signed-off-by: Liang Li Cc: Michael S. Tsirkin Cc: Michal Hocko --- + +static void report_free_page(struct work_struct *work) +{ + struct virtio_balloon *vb; + + vb = container_of(work, struct virtio_balloon, report_free_page_work); + report_free_page_cmd_id(vb); + walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages); + /* +* The last few free page blocks that were added may not reach the +* batch size, but need a kick to notify the device to handle them. +*/ + virtqueue_kick(vb->free_page_vq); + report_free_page_end(vb); +} + I think there's an issue here: if pages are poisoned and hypervisor subsequently drops them, testing them after allocation will trigger a false positive. The specific configuration: PAGE_POISONING on PAGE_POISONING_NO_SANITY off PAGE_POISONING_ZERO off Solutions: 1. disable the feature in that configuration suggested as an initial step Thanks for the finding. Similar to this option: I'm thinking could we make walk_free_mem_block() simply return if that option is on? That is, at the beginning of the function: if (!page_poisoning_enabled()) return; I think in most usages, people would not choose to use the poisoning option due to the added overhead. Probably we could make it a separate fix patch of this report following patch 5 to explain the above reasons in the commit. 2. pass poison value to host so it can validate page content before it drops it 3. pass poison value to host so it can init allocated pages with that value In fact one nice side effect would be that unmap becomes safe even though free list is not locked anymore. I haven't got this point yet, how would it bring performance benefit? It would be interesting to see whether this last has any value performance-wise. Best, Wei
[Qemu-devel] [RFC v4 26/27] tests: qmp-test: verify command batching
OOB introduced DROP event for flow control. This should not affect old QMP clients. Add a command batching check to make sure of it. Signed-off-by: Peter Xu --- tests/qmp-test.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/tests/qmp-test.c b/tests/qmp-test.c index 292c5f135a..729ec59b0a 100644 --- a/tests/qmp-test.c +++ b/tests/qmp-test.c @@ -78,6 +78,7 @@ static void test_qmp_protocol(void) QList *capabilities; const QListEntry *entry; QString *qstr; +int i; global_qtest = qtest_init_without_qmp_handshake(common_args); @@ -135,6 +136,24 @@ static void test_qmp_protocol(void) g_assert_cmpint(qdict_get_int(resp, "id"), ==, 2); QDECREF(resp); +/* + * Test command batching. In current test OOB is not enabled, we + * should be able to run as many commands in batch as we like. + * Using 16 (>8, which is OOB queue length) to make sure OOB + * won't break existing clients. + */ +for (i = 0; i < 16; i++) { +qmp_async("{ 'execute': 'query-version' }"); +} +/* Verify the replies to make sure no command is dropped. */ +for (i = 0; i < 16; i++) { +resp = qmp_receive(); +/* It should never be dropped. Each of them should be a reply. */ +g_assert(qdict_haskey(resp, "return")); +g_assert(!qdict_haskey(resp, "event")); +QDECREF(resp); +} + qtest_end(); } -- 2.13.6
[Qemu-devel] [RFC v4 20/27] qmp: support out-of-band (oob) execution
Having "allow-oob" to true for a command does not mean that this command will always be run in out-of-band mode. The out-of-band quick path will only be executed if we specify the extra "run-oob" flag when sending the QMP request: { "execute": "command-that-allows-oob", "arguments": { ... }, "control": { "run-oob": true } } The "control" key is introduced to store this extra flag. "control" field is used to store arguments that are shared by all the commands, rather than command specific arguments. Let "run-oob" be the first. Signed-off-by: Peter Xu --- include/qapi/qmp/dispatch.h | 1 + monitor.c | 24 qapi/qmp-dispatch.c | 39 +++ trace-events| 2 ++ 4 files changed, 66 insertions(+) diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h index b76798800c..ee2b8ce576 100644 --- a/include/qapi/qmp/dispatch.h +++ b/include/qapi/qmp/dispatch.h @@ -49,6 +49,7 @@ bool qmp_command_is_enabled(const QmpCommand *cmd); const char *qmp_command_name(const QmpCommand *cmd); bool qmp_has_success_response(const QmpCommand *cmd); QObject *qmp_build_error_object(Error *err); +bool qmp_is_oob(const QObject *request); typedef void (*qmp_cmd_callback_fn)(QmpCommand *cmd, void *opaque); diff --git a/monitor.c b/monitor.c index 92e846e195..ccd350dca2 100644 --- a/monitor.c +++ b/monitor.c @@ -4025,6 +4025,7 @@ static void monitor_qmp_bh_dispatcher(void *data) if (!req_obj) { break; } +trace_monitor_qmp_cmd_in_band(qobject_get_try_str(req_obj->id)); monitor_qmp_dispatch_one(req_obj); } } @@ -4051,9 +4052,25 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, return; } +if (!qmp_oob_enabled(mon) && qmp_is_oob(req)) { +error_setg(&err, "Out-Of-Band is only allowed " + "when OOB is enabled."); +monitor_qmp_respond(mon, NULL, err, NULL); +qobject_decref(req); +return; +} + qdict = qobject_to_qdict(req); if (qdict) { id = qdict_get(qdict, "id"); +/* When OOB is enabled, the "id" field is mandatory. */ +if (qmp_oob_enabled(mon) && !id) { +error_setg(&err, "Out-Of-Band capability requires that " + "every command contains an 'id' field."); +monitor_qmp_respond(mon, NULL, err, NULL); +qobject_decref(req); +return; +} qobject_incref(id); qdict_del(qdict, "id"); } /* else will fail qmp_dispatch() */ @@ -4064,6 +4081,13 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens, req_obj->req = req; if (qmp_oob_enabled(mon)) { +if (qmp_is_oob(req)) { +/* Out-Of-Band (OOB) requests are executed directly in parser. */ + trace_monitor_qmp_cmd_out_of_band(qobject_get_try_str(req_obj->id)); +monitor_qmp_dispatch_one(req_obj); +return; +} + /* Drop the request if queue is full. */ if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) { qapi_event_send_request_dropped(id, diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c index b41fa174fe..ed7e5d5a75 100644 --- a/qapi/qmp-dispatch.c +++ b/qapi/qmp-dispatch.c @@ -52,6 +52,12 @@ static QDict *qmp_dispatch_check_obj(const QObject *request, Error **errp) "QMP input member 'arguments' must be an object"); return NULL; } +} else if (!strcmp(arg_name, "control")) { +if (qobject_type(arg_obj) != QTYPE_QDICT) { +error_setg(errp, + "QMP input member 'control' must be a dict"); +return NULL; +} } else { error_setg(errp, "QMP input member '%s' is unexpected", arg_name); @@ -94,6 +100,11 @@ static QObject *do_qmp_dispatch(QmpCommandList *cmds, QObject *request, return NULL; } +if (qmp_is_oob(request) && !(cmd->options & QCO_ALLOW_OOB)) { +error_setg(errp, "The command %s does not support OOB", command); +return NULL; +} + if (!qdict_haskey(dict, "arguments")) { args = qdict_new(); } else { @@ -122,6 +133,34 @@ QObject *qmp_build_error_object(Error *err) error_get_pretty(err)); } +/* + * Detect whether a request should be run out-of-band, by quickly + * peeking at whether we have: { "control": { "run-oob": True } }. By + * default commands are run in-band. + */ +bool qmp_is_oob(const QObject *request) +{ +QDict *dict; +QBool *bool_obj; + +dict = qobject_to_qdict(request); +if (!dict) { +return false; +} + +dict = qdict_get_qdict(dict, "control"); +if (!dict) { +return false; +} + +bool_obj = qobject_to_qb
Re: [Qemu-devel] [PATCH 5/5] iotest 030: add compressed block-stream test
On 15/11/2017 10:51 PM, Max Reitz wrote: On 2017-11-14 11:16, Anton Nefedov wrote: Signed-off-by: Anton Nefedov --- tests/qemu-iotests/030 | 69 +- tests/qemu-iotests/030.out | 4 +-- 2 files changed, 70 insertions(+), 3 deletions(-) diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030 index 1883894..52cbe5d 100755 --- a/tests/qemu-iotests/030 +++ b/tests/qemu-iotests/030 @@ -21,7 +21,7 @@ import time import os import iotests -from iotests import qemu_img, qemu_io +from iotests import qemu_img, qemu_img_pipe, qemu_io backing_img = os.path.join(iotests.test_dir, 'backing.img') mid_img = os.path.join(iotests.test_dir, 'mid.img') @@ -800,5 +800,72 @@ class TestSetSpeed(iotests.QMPTestCase): self.cancel_and_wait() +class TestCompressed(iotests.QMPTestCase): +supported_fmts = ['qcow2'] +cluster_size = 64 * 1024; +image_len = 1 * 1024 * 1024; + +def setUp(self): +qemu_img('create', backing_img, str(TestCompressed.image_len)) First, this is missing the '-f raw', if you want to keep it in line with the next command. +qemu_io('-f', 'raw', '-c', 'write -P 1 0 ' + str(TestCompressed.image_len), backing_img) But why would you want this to be raw? Copied this from another test in this file :) The source image format does not really matter. Will fix though. +qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img) + +# write '4' in every 4th cluster +step=4 I'd prefer spaces around operators. (Also in _pattern() and in the call to pattern ([1,4,2] should be [1, 4, 2]).) +for i in range(TestCompressed.image_len / TestCompressed.cluster_size / step + 1): As far as I know, range() has an actual step parameter, maybe that would be simpler -- and I don't know what the +1 is supposed to mean. How about "for ofs in range(0, TestCompressed.image_len, TestCompressed.cluster_size * step)"? Would that work? It does, thank you. +qemu_io('-c', 'write -P %d %d %d' % +(step, step * i * TestCompressed.cluster_size,> + TestCompressed.cluster_size), +test_img) + +self.vm = iotests.VM().add_drive(test_img) +self.vm.launch() + +def tearDown(self): +os.remove(test_img) +os.remove(backing_img) + +def _pattern(self, x, divs): +return divs[-1] if not x%divs[-1] else self._pattern(x, divs[:-1]) I have no idea what this function does. ...OK, having looked into how it's used, I understand better. A comment would certainly be helpful, though. Maybe also call it differently. It doesn't really return a pattern. What this function does is it returns the first integer from @divs (starting from the end, which is weird) which is a divider of @x. So maybe call it _first_divider(self, x, dividers), that would help already. Yeah, I really had to comment. Exactly, it starts from the end as I meant it to accept numbers in the order they were written. So 'first_divider' wouldn't be right. Not that important though, will rename and reorder the input. + +def test_compressed(self): +self.assert_no_active_block_jobs() + +result = self.vm.qmp('block-stream', device='drive0', compress=True) +if iotests.imgfmt not in TestCompressed.supported_fmts: +self.assert_qmp( +result, 'error/desc', +'Compression is not supported for this drive drive0') +return +self.assert_qmp(result, 'return', {}) + +# write '2' in every 2nd cluster +step = 2 +for i in range(TestCompressed.image_len / TestCompressed.cluster_size / step + 1): +result = self.vm.qmp('human-monitor-command', + command_line= + 'qemu-io drive0 \"write -P %d %d %d\"' % Just " instead of \" would be enough, I think. Done. + (step, step * i * TestCompressed.cluster_size, + TestCompressed.cluster_size)) +self.assert_qmp(result, 'return', "") + +self.wait_until_completed() +self.assert_no_active_block_jobs() + +self.vm.shutdown() + +for i in range(TestCompressed.image_len / TestCompressed.cluster_size): +outp = qemu_io('-c', 'read -P %d %d %d' % + (self._pattern(i, [1,4,2]), The 4 is useless, because everything that's divisible by 4 is divisible by 2 already. Also, I don't quite see what this should achieve exactly. You first write the backing image full of 1s, OK. Then you write 4s to every fourth cluster in the top image. Then you start streaming, and then you write 2s to ever second cluster in the top image, which overwrites all of the 4s you wrote. Do you maybe not want to write 2 into ev
[Qemu-devel] [RFC v4 22/27] qmp: isolate responses into io thread
For those monitors who has enabled IO thread, we'll offload the responding procedure into IO thread. The main reason is that chardev is not thread safe, and we need to do all the read/write IOs in the same thread. For use_io_thr=true monitors, that thread is the IO thread. We do this isolation in similar pattern as what we have done to the request queue: we first create one response queue for each monitor, then instead of reply directly in main thread, we queue the responses and kick the IO thread to do the rest of the job for us. A funny thing after doing this is that, when the QMP clients send "quit" to QEMU, it's possible that we close the IOThread even earlier than replying to that "quit". So another thing we need to do before cleaning up the monitors is that we need to flush the response queue (we don't need to do that for command queue; after all we are quitting) to make sure replies for handled commands are always flushed back to clients. Signed-off-by: Peter Xu --- monitor.c | 97 ++- 1 file changed, 96 insertions(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index ccd350dca2..b18d9d696c 100644 --- a/monitor.c +++ b/monitor.c @@ -172,6 +172,8 @@ typedef struct { QemuMutex qmp_queue_lock; /* Input queue that holds all the parsed QMP requests */ GQueue *qmp_requests; +/* Output queue contains all the QMP responses in order */ +GQueue *qmp_responses; } MonitorQMP; /* @@ -219,6 +221,8 @@ struct MonitorGlobal { IOThread *mon_iothread; /* Bottom half to dispatch the requests received from IO thread */ QEMUBH *qmp_dispatcher_bh; +/* Bottom half to deliver the responses back to clients */ +QEMUBH *qmp_respond_bh; }; static struct MonitorGlobal mon_global; @@ -395,7 +399,8 @@ int monitor_fprintf(FILE *stream, const char *fmt, ...) return 0; } -static void monitor_json_emitter(Monitor *mon, const QObject *data) +static void monitor_json_emitter_raw(Monitor *mon, + QObject *data) { QString *json; @@ -409,6 +414,71 @@ static void monitor_json_emitter(Monitor *mon, const QObject *data) QDECREF(json); } +static void monitor_json_emitter(Monitor *mon, QObject *data) +{ +if (mon->use_io_thr) { +/* + * If using IO thread, we need to queue the item so that IO + * thread will do the rest for us. Take refcount so that + * caller won't free the data (which will be finally freed in + * responder thread). + */ +qobject_incref(data); +qemu_mutex_lock(&mon->qmp.qmp_queue_lock); +g_queue_push_tail(mon->qmp.qmp_responses, (void *)data); +qemu_mutex_unlock(&mon->qmp.qmp_queue_lock); +qemu_bh_schedule(mon_global.qmp_respond_bh); +} else { +/* + * If not using monitor IO thread, then we are in main thread. + * Do the emission right away. + */ +monitor_json_emitter_raw(mon, data); +} +} + +struct QMPResponse { +Monitor *mon; +QObject *data; +}; +typedef struct QMPResponse QMPResponse; + +/* + * Return one QMPResponse. The response is only valid if + * response.data is not NULL. + */ +static QMPResponse monitor_qmp_response_pop_one(void) +{ +Monitor *mon; +QObject *data = NULL; + +qemu_mutex_lock(&monitor_lock); +QTAILQ_FOREACH(mon, &mon_list, entry) { +qemu_mutex_lock(&mon->qmp.qmp_queue_lock); +data = g_queue_pop_head(mon->qmp.qmp_responses); +qemu_mutex_unlock(&mon->qmp.qmp_queue_lock); +if (data) { +break; +} +} +qemu_mutex_unlock(&monitor_lock); +return (QMPResponse) { .mon = mon, .data = data }; +} + +static void monitor_qmp_bh_responder(void *opaque) +{ +QMPResponse response; + +while (true) { +response = monitor_qmp_response_pop_one(); +if (!response.data) { +break; +} +monitor_json_emitter_raw(response.mon, response.data); +qobject_decref(response.data); +} +} + static MonitorQAPIEventConf monitor_qapi_event_conf[QAPI_EVENT__MAX] = { /* Limit guest-triggerable events to 1 per second */ [QAPI_EVENT_RTC_CHANGE]= { 1000 * SCALE_MS }, @@ -595,6 +665,7 @@ static void monitor_data_init(Monitor *mon, bool skip_flush, mon->skip_flush = skip_flush; mon->use_io_thr = use_io_thr; mon->qmp.qmp_requests = g_queue_new(); +mon->qmp.qmp_responses = g_queue_new(); } static void monitor_data_destroy(Monitor *mon) @@ -609,6 +680,7 @@ static void monitor_data_destroy(Monitor *mon) qemu_mutex_destroy(&mon->out_lock); qemu_mutex_destroy(&mon->qmp.qmp_queue_lock); g_queue_free(mon->qmp.qmp_requests); +g_queue_free(mon->qmp.qmp_responses); } char *qmp_human_monitor_command(const char *command_line, bool has_cpu_index, @@ -4301,6 +4373,11 @@ static GMainContext *monitor_io_context_get(void) return iot
Re: [Qemu-devel] [PATCH v6 3/5] fw_cfg: do DMA read operation
Hi - Original Message - > On Mon, Nov 13, 2017 at 08:29:56PM +0100, Marc-André Lureau wrote: > > Modify fw_cfg_read_blob() to use DMA if the device supports it. > > Return errors, because the operation may fail. > > > > To avoid polling with unbound amount of time, the DMA operation is > > expected to complete within 200ms, or will return ETIME error. > > > > We may want to switch all the *buf addresses to use only kmalloc'ed > > buffers (instead of using stack/image addresses with dma=false). > > > > Signed-off-by: Marc-André Lureau > > Reviewed-by: Gabriel Somlo > > --- > > drivers/firmware/qemu_fw_cfg.c | 154 > > - > > 1 file changed, 137 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/firmware/qemu_fw_cfg.c > > b/drivers/firmware/qemu_fw_cfg.c > > index 1f3e8545dab7..2ac4cd869fe6 100644 > > --- a/drivers/firmware/qemu_fw_cfg.c > > +++ b/drivers/firmware/qemu_fw_cfg.c > > @@ -33,6 +33,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > MODULE_AUTHOR("Gabriel L. Somlo "); > > MODULE_DESCRIPTION("QEMU fw_cfg sysfs support"); > > @@ -43,12 +45,26 @@ MODULE_LICENSE("GPL"); > > #define FW_CFG_ID 0x01 > > #define FW_CFG_FILE_DIR 0x19 > > > > +#define FW_CFG_VERSION_DMA 0x02 > > +#define FW_CFG_DMA_CTL_ERROR 0x01 > > +#define FW_CFG_DMA_CTL_READ0x02 > > +#define FW_CFG_DMA_CTL_SKIP0x04 > > +#define FW_CFG_DMA_CTL_SELECT 0x08 > > +#define FW_CFG_DMA_CTL_WRITE 0x10 > > +#define FW_CFG_DMA_TIMEOUT 200 /* ms */ > > + > > /* size in bytes of fw_cfg signature */ > > #define FW_CFG_SIG_SIZE 4 > > > > /* fw_cfg "file name" is up to 56 characters (including terminating nul) > > */ > > #define FW_CFG_MAX_FILE_PATH 56 > > > > +/* platform device for dma mapping */ > > +static struct device *dev; > > + > > +/* fw_cfg revision attribute, in /sys/firmware/qemu_fw_cfg top-level dir. > > */ > > +static u32 fw_cfg_rev; > > + > > /* fw_cfg file directory entry type */ > > struct fw_cfg_file { > > u32 size; > > @@ -57,6 +73,12 @@ struct fw_cfg_file { > > char name[FW_CFG_MAX_FILE_PATH]; > > }; > > > > +struct fw_cfg_dma { > > + u32 control; > > + u32 length; > > + u64 address; > > +} __packed; > > + > > /* fw_cfg device i/o register addresses */ > > static bool fw_cfg_is_mmio; > > static phys_addr_t fw_cfg_p_base; > > @@ -75,12 +97,93 @@ static inline u16 fw_cfg_sel_endianness(u16 key) > > return fw_cfg_is_mmio ? cpu_to_be16(key) : cpu_to_le16(key); > > } > > > > +static inline bool fw_cfg_dma_enabled(void) > > +{ > > + return fw_cfg_rev & FW_CFG_VERSION_DMA && fw_cfg_reg_dma; > > +} > > + > > +static bool fw_cfg_wait_for_control(struct fw_cfg_dma *d, unsigned long > > timeout) > > +{ > > + ktime_t start; > > + ktime_t stop; > > + > > + start = ktime_get(); > > + stop = ktime_add(start, ms_to_ktime(timeout)); > > + > > + do { > > + if ((be32_to_cpu(d->control) & ~FW_CFG_DMA_CTL_ERROR) == 0) > > + return true; > > + > > + usleep_range(50, 100); > > BTW it's not nice that this is uninterruptible. I think we need a > variant of usleep_range that is interruptible and call that here. > Thoughts? > This usleep_range() pattern is pretty common apparently. (and it's probably better than calling yield() in a loop, like v1-3 did) > > > + } while (ktime_before(ktime_get(), stop)); > > + > > + return false; > > +} > > + > > +static ssize_t fw_cfg_dma_transfer(void *address, u32 length, u32 control) > > +{ > > + dma_addr_t dma_addr = 0; > > + struct fw_cfg_dma *d; > > + dma_addr_t dma; > > + ssize_t ret = length; > > + enum dma_data_direction dir = > > + (control & FW_CFG_DMA_CTL_READ ? DMA_FROM_DEVICE : 0); > > + > > + if (address && length) { > > + dma_addr = dma_map_single(dev, address, length, dir); > > + if (dma_mapping_error(NULL, dma_addr)) { > > + WARN(1, "%s: failed to map address\n", __func__); > > + return -EFAULT; > > + } > > + } > > + > > + d = kmalloc(sizeof(*d), GFP_KERNEL | GFP_DMA); > > + if (!d) { > > + ret = -ENOMEM; > > + goto end; > > + } > > + > > + dma = dma_map_single(dev, d, sizeof(*d), DMA_BIDIRECTIONAL); > > + if (dma_mapping_error(NULL, dma)) { > > + WARN(1, "%s: failed to map fw_cfg_dma\n", __func__); > > + ret = -EFAULT; > > + goto end; > > + } > > + > > + *d = (struct fw_cfg_dma) { > > + .address = cpu_to_be64(dma_addr), > > + .length = cpu_to_be32(length), > > + .control = cpu_to_be32(control) > > + }; > > + > > + iowrite32be((u64)dma >> 32, fw_cfg_reg_dma); > > + iowrite32be(dma, fw_cfg_reg_dma + 4); > > + > > + if (!fw_cfg_wait_for_control(d, FW_CFG_DMA_TIMEOUT)) { > > + WARN(1, "%s: timeout", __func__); > > + ret = -ETIME; > > + } else if (be32_to_cpu(d->cont