Re: [Qemu-devel] [RFC] pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations
On 06.02.15 03:54, David Gibson wrote:
> On Thu, Feb 05, 2015 at 12:55:45PM +0100, Alexander Graf wrote:
>>
>> On 05.02.15 12:30, David Gibson wrote:
>>> On Thu, Feb 05, 2015 at 11:22:13AM +0100, Alexander Graf wrote:
> [snip]
>> [snip]
>>
>>> +    ret1 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_LOAD);
>>> +    if (ret1 != 0) {
>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_LOAD in KVM:"
>>> +                " %s\n", strerror(errno));
>>> +    }
>>> +
>>> +    ret2 = kvmppc_enable_hcall(kvm_state, H_LOGICAL_CI_STORE);
>>> +    if (ret2 != 0) {
>>> +        fprintf(stderr, "Warning: error enabling H_LOGICAL_CI_STORE in KVM:"
>>> +                " %s\n", strerror(errno));
>>> +    }
>>> +
>>> +    if ((ret1 != 0) || (ret2 != 0)) {
>>> +        fprintf(stderr, "Warning: Couldn't enable H_LOGICAL_CI_* in KVM, SLOF"
>>> +                " may be unable to operate devices with in-kernel emulation\n");
>>> +    }
>>
>> You'll always get these warnings if you're running on an old (meaning
>> current upstream) kernel, which could be annoying.
>
> True.
>
>> Is there any way to tell whether you have configured any devices which
>> need the in-kernel MMIO emulation and only warn if you have?
>
> In theory, I guess so.  In practice I can't see how you'd enumerate
> all devices that might require kernel intervention without something
> horribly invasive.

We could WARN_ONCE in QEMU if we emulate such a hypercall, but its
handler is io_mem_unassigned (or we add another minimum priority huge
memory region on all 64 bits of address space that reports the
breakage).

>>> Would that work for the virtio+iothread case?  I had the impression
>>> the kernel handled notification region was layered over the qemu
>>> emulated region in that case.
>>
>> IIRC we don't have a way to call back into kvm saying "please write to
>> this in-kernel device".  But we could at least defer the warning to a
>> point where we know that we actually hit it.
>
> Right, but I'm saying we might miss the warning in cases where we want
> it, because the KVM device is shadowed by a qemu device, so qemu won't
> see the IO as unassigned or unhandled.
>
> In particular, I think that will happen in the case of virtio-blk with
> iothread, which is the simplest case in which to observe the problem.
> The virtio-blk device exists in qemu and is functional, but we rely on
> KVM catching the queue notification MMIO before it reaches the qemu
> implementation of the rest of the device's IO space.

But in that case the VM stays functional and will merely see a
performance hit when using virtio in SLOF, no?  I don't think that's a
problem worth worrying users about.

>>>
>>> Alas, no.  The iothread stuff *relies* on the in-kernel notification,
>>> so it will not work if the IO gets punted to qemu.  This is the whole
>>> reason for the in-kernel hcall implementation.
>>
>> So at least with vhost-net the in-kernel trapping is optional.  If we
>> happen to get MMIO into QEMU, we'll just handle it there.
>>
>> Enlighten me why the iothread stuff can't handle it that way too.
>
> So, as I understand it, it could, but it doesn't.  Working out how to
> fix it properly requires better understanding of the dataplane code
> than I currently possess.
>
> So, using virtio-blk as the example case.  Normally the queue notify
> mmio will get routed by the general virtio code to
> virtio_blk_handle_output().
>
> In the case of dataplane, that just calls
> virtio_blk_data_plane_start().  So the first time we get a vq notify,
> the dataplane is started.
> That sets up the host notifier (VirtioBusClass::set_host_notifier ->
> virtio_pci_set_host_notifier -> virtio_pci_set_host_notifier_internal
> -> memory_region_add_eventfd() -> memory_region_transaction_commit()
> -> address_space_update_ioeventfds ->
> address_space_add_del_ioeventfds -> kvm_mem_ioeventfd_add ->
> kvm_set_ioeventfd_mmio -> KVM_IOEVENTFD ioctl)
>
> From this point on further calls to virtio_blk_handle_output() are
> IIUC a "can't happen", because vq notifies should go to the eventfd
> instead, where they will kick the iothread.
>
> So, with SLOF, the first request is ok - it hits
> virtio_blk_handle_output() which starts the iothread which goes on to
> process the request.
>
> On the second request, however, we get back into
> virtio_blk_data_plane_start() which sees the iothread is already
> running and aborts.  I think it is assuming that this must be the
> result of a race with another vcpu starting the dataplane, and so
> assumes the racing thread
Re: [Qemu-devel] [PATCH v4] sheepdog: selectable object size support
(2015/02/06 11:18), Liu Yuan wrote:
> On Wed, Feb 04, 2015 at 01:54:19PM +0900, Teruaki Ishizaki wrote:
>> (2015/02/02 15:52), Liu Yuan wrote:
>>> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
>>>> Previously, the qemu block driver of sheepdog used a hard-coded VDI
>>>> object size.  This patch enables users to handle the
>>>> "block_size_shift" value for calculating the VDI object size.
>>>>
>>>> When you start qemu, you don't need to specify an additional command
>>>> option.  But when you create a VDI which doesn't have the default
>>>> object size with the qemu-img command, you specify the
>>>> block_size_shift option.  If you want to create a VDI of 8MB
>>>> (1 << 23) object size, you need to specify the following command
>>>> option.
>>>>
>>>>  # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
>>>
>>> Is it possible to make this option more user friendly?  such as
>>>
>>>  $ qemu-img create -o object_size=8M sheepdog:test 1G
>>
>> At first, I thought that the object_size was user friendly.
>> But, Sheepdog already has the value of block_size_shift in the inode
>> layout, which means the same thing as object_size.
>>
>> 'object_size' doesn't always fit exactly in 'block_size_shift'.
>> On the other hand, 'block_size_shift' always fits exactly in
>> 'object_size'.
>>
>> I think that the existing layout shouldn't be changed easily, and it
>> seems difficult for users to specify an object_size value that fits
>> exactly in 'block_size_shift'.
>
> I don't think we need to change the layout.  block_size_shift is what
> QEMU talks to sheep, and QEMU options are what users talk to QEMU.  We
> can convert the user friendly object size into block_size_shift
> internally in the driver before sending it to the sheep daemon, no?

For example, if users specify 12MB for the object size,
block_size_shift doesn't come out as an exact integer.

I suppose that normally an administrator formats the sheepdog cluster
specifying block_size_shift, and users usually run the qemu-img command
without a block_size_shift option.  I think that the block_size_shift
option of the qemu-img command is used for experimental purposes.

Thanks,
Teruaki Ishizaki
Re: [Qemu-devel] [PATCH v2 0/3] bootdevice: change the boot order validation logic
On 2015/2/3 20:34, Gonglei (Arei) wrote:
> From: Gonglei
>
> The reset logic can be done by both machine reset and the boot
> handler.  So we shouldn't return an error when the boot handler
> callback isn't set, in patch 1.
>
> Patch 2 checks boot order argument validation before the VM runs.
>
> Patch 3 passes &error_abort instead of NULL.
>
> v2 -> v1:
> - add patch 2 suggested by Markus.
> - rework patch 3. (Markus)
> - add R-by in patch 1.
>

Markus, any thoughts?  Thanks.

Regards,
-Gonglei

> Gonglei (3):
>   bootdevice: remove the check about boot_set_handler
>   bootdevice: check boot order argument validation before vm running
>   bootdevice: add check in restore_boot_order()
>
>  bootdevice.c | 12
>  vl.c         | 13 +++--
>  2 files changed, 15 insertions(+), 10 deletions(-)
>
Re: [Qemu-devel] Intel X86 hardware transactional memory
On 5 February 2015 at 05:09, Patrick Williams III wrote:
> Consider two virtual addresses that point to the same physical
> address.  One thread uses the first virtual address in a transaction;
> another thread writes to the second virtual address while the
> transaction is going on.  This should cause the transaction to fail
> because the same physical address was involved.

Incidentally, comparison against physical addresses, not virtual
addresses, is also the correct semantics for ARM load/store exclusive
instructions.  At the moment we implement those with a mechanism that
is completely architecturally wrong but happens to work for the
stereotypical uses of those insns.  If we ever wanted to do it right
then we'd need some way of catching "some other CPU did a write to the
physaddr we have an exclusive lock for".

[Somebody pointed this out to me on IRC last night.]

-- PMM
Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS
On 03.02.2015 at 16:33, Denis V. Lunev wrote:
> On 03/02/15 17:30, Peter Lieven wrote:
>> On 03.02.2015 at 14:29, Denis V. Lunev wrote:
>>> On 03/02/15 15:12, Peter Lieven wrote:

we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:

INT_MAX
INT_MAX >> BDRV_SECTOR_BITS
UINT_MAX >> BDRV_SECTOR_BITS
SIZE_MAX >> BDRV_SECTOR_BITS

This patch introduces a macro for the maximal allowed sectors
per request and uses it at several places.

Signed-off-by: Peter Lieven
---
 block.c               | 19 ---
 hw/block/virtio-blk.c |  4 ++--
 include/block/block.h |  3 +++
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 8272ef9..4e58b35 100644
--- a/block.c
+++ b/block.c
@@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
 static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
                               int nb_sectors)
 {
-    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EIO;
     }

@@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
         .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
     };

-    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
     }

     for (;;) {
-        nb_sectors = target_sectors - sector_num;
+        nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
         if (nb_sectors <= 0) {
             return 0;
         }
-        if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-            nb_sectors = INT_MAX / BDRV_SECTOR_SIZE;
-        }
         ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n);
         if (ret < 0) {
             error_report("error getting block status at sector %" PRId64 ": %s",

@@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
-    if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     struct iovec iov = {0};
     int ret = 0;

-    int max_write_zeroes = bs->bl.max_write_zeroes ?
-                           bs->bl.max_write_zeroes : INT_MAX;
+    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
+                                        BDRV_REQUEST_MAX_SECTORS);

     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;

@@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
-    if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return 0;
     }

-    max_discard = bs->bl.max_discard ? bs->bl.max_discard : INT_MAX;
+    max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);

     while (nb_sectors > 0) {
         int ret;
         int num = nb_sectors;

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8c51a29..1a8a176 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
     }

     max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-    max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX);
+    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);

     qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
           &multireq_compare);

@@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
     uint64_t total_sectors;

-    if (nb_sectors > INT_MAX) {
+    if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
Re: [Qemu-devel] [PATCH v2 1/3] bootdevice: remove the check about boot_set_handler
writes:
> From: Gonglei
>
> The reset logic can be done by both machine reset and
> boot handler.  So we shouldn't return an error when the boot
> handler callback isn't set.
>
> Signed-off-by: Gonglei
> Reviewed-by: Alexander Graf
> ---
>  bootdevice.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/bootdevice.c b/bootdevice.c
> index 5914417..52d3f9e 100644
> --- a/bootdevice.c
> +++ b/bootdevice.c
> @@ -51,19 +51,15 @@ void qemu_boot_set(const char *boot_order, Error **errp)
>  {
>      Error *local_err = NULL;
>
> -    if (!boot_set_handler) {
> -        error_setg(errp, "no function defined to set boot device list for"
> -                   " this architecture");
> -        return;
> -    }
> -
>      validate_bootdevices(boot_order, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
>      }
>
> -    boot_set_handler(boot_set_opaque, boot_order, errp);
> +    if (boot_set_handler) {
> +        boot_set_handler(boot_set_opaque, boot_order, errp);
> +    }
>  }
>
>  void validate_bootdevices(const char *devices, Error **errp)

Two callers:

* HMP command boot_set

  Before your patch: command fails when the target doesn't support
  changing the boot order.

  After your patch: command silently does nothing.  I'm afraid that's a
  regression.

  Aside: looks like there's no QMP command.

* restore_boot_order()

  No change yet, because restore_boot_order() ignores errors.  But PATCH
  3 will make it abort on error.  I guess that's why you make the change
  here.

To avoid the regression, you could drop PATCH 1, and change PATCH 3 to
something like

-    qemu_boot_set(normal_boot_order, NULL);
+    if (boot_set_handler) {
+        qemu_boot_set(normal_boot_order, &error_abort);
+    }

There are other ways, but this looks like the simplest one.
Re: [Qemu-devel] [PATCH v2 2/3] bootdevice: check boot order argument validation before vm running
writes:
> From: Gonglei
>
> Only one of -boot's 'once' and 'order' options can take effect at a
> time, that is to say initial startup processing can check only one.
> And pc.c's set_boot_dev() fails when its boot order argument is
> invalid.  This patch provides a solution to fix this problem:
>
> 1. If "once" is given, register reset handler to restore boot order.
>
> 2. Pass the normal boot order to machine creation.  Should fail when
>    the normal boot order is invalid.
>
> 3. If "once" is given, set it with qemu_boot_set().  Fails when the
>    once boot order is invalid.
>
> 4. Start the machine.
>
> 5. On reset, the reset handler calls qemu_boot_set() to restore boot
>    order.  Should never fail.
>
> Suggested-by: Markus Armbruster
> Signed-off-by: Gonglei
> ---
>  vl.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index 983259b..0d90d98 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2734,6 +2734,7 @@ int main(int argc, char **argv, char **envp)
>      const char *initrd_filename;
>      const char *kernel_filename, *kernel_cmdline;
>      const char *boot_order;
> +    const char *once = NULL;

Consider renaming once to boot_once.  In its much larger scope, the boot
context isn't obvious anymore, so a more telling name would be in order.

>      DisplayState *ds;
>      int cyls, heads, secs, translation;
>      QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
> @@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
>      opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
>      if (opts) {
>          char *normal_boot_order;
> -        const char *order, *once;
> +        const char *order;
>          Error *local_err = NULL;
>
>          order = qemu_opt_get(opts, "order");
> @@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
>              exit(1);
>          }
>          normal_boot_order = g_strdup(boot_order);
> -        boot_order = once;
>          qemu_register_reset(restore_boot_order, normal_boot_order);
>      }
>

normal_boot_order can be eliminated now.

            }
            qemu_register_reset(restore_boot_order, strdup(order));
        }

Even better, move qemu_register_reset()...

> @@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)
>
>      net_check_clients();
>
> +    if (once) {
> +        Error *local_err = NULL;
> +        qemu_boot_set(once, &local_err);
> +        if (local_err) {
> +            error_report("%s", error_get_pretty(local_err));
> +            exit(1);
> +        }

... here:

        qemu_register_reset(restore_boot_order, strdup(boot_order));

> +    }
> +
>      ds = init_displaystate();
>
>      /* init local displays */

Clearer, don't you think?
Re: [Qemu-devel] [PATCH v2 1/3] bootdevice: remove the check about boot_set_handler
On 2015/2/6 16:56, Markus Armbruster wrote:
> writes:
>
>> From: Gonglei
>>
>> The reset logic can be done by both machine reset and
>> boot handler.  So we shouldn't return an error when the boot
>> handler callback isn't set.
>>
>> Signed-off-by: Gonglei
>> Reviewed-by: Alexander Graf
>> ---
>>  bootdevice.c | 10 +++---
>>  1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/bootdevice.c b/bootdevice.c
>> index 5914417..52d3f9e 100644
>> --- a/bootdevice.c
>> +++ b/bootdevice.c
>> @@ -51,19 +51,15 @@ void qemu_boot_set(const char *boot_order, Error **errp)
>>  {
>>      Error *local_err = NULL;
>>
>> -    if (!boot_set_handler) {
>> -        error_setg(errp, "no function defined to set boot device list for"
>> -                   " this architecture");
>> -        return;
>> -    }
>> -
>>      validate_bootdevices(boot_order, &local_err);
>>      if (local_err) {
>>          error_propagate(errp, local_err);
>>          return;
>>      }
>>
>> -    boot_set_handler(boot_set_opaque, boot_order, errp);
>> +    if (boot_set_handler) {
>> +        boot_set_handler(boot_set_opaque, boot_order, errp);
>> +    }
>>  }
>>
>>  void validate_bootdevices(const char *devices, Error **errp)
>
> Two callers:
>
> * HMP command boot_set
>
>   Before your patch: command fails when the target doesn't support
>   changing the boot order.
>
>   After your patch: command silently does nothing.  I'm afraid that's
>   a regression.
>
>   Aside: looks like there's no QMP command.

Yes.

> * restore_boot_order()
>
>   No change yet, because restore_boot_order() ignores errors.  But
>   PATCH 3 will make it abort on error.  I guess that's why you make
>   the change here.

Actually, I added this patch based on the previous conversation:

  It duplicates the reset logic as well as information holder locality
  (machine struct vs parameter).

http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg04172.html
and
http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg04292.html

Regards,
-Gonglei

> To avoid the regression, you could drop PATCH 1, and change PATCH 3 to
> something like
>
> -    qemu_boot_set(normal_boot_order, NULL);
> +    if (boot_set_handler) {
> +        qemu_boot_set(normal_boot_order, &error_abort);
> +    }
>
> There are other ways, but this looks like the simplest one.
Re: [Qemu-devel] [PATCH v2 2/3] bootdevice: check boot order argument validation before vm running
On 2015/2/6 17:06, Markus Armbruster wrote:
> writes:
>
>> From: Gonglei
>>
>> Only one of -boot's 'once' and 'order' options can take effect at a
>> time, that is to say initial startup processing can check only one.
>> And pc.c's set_boot_dev() fails when its boot order argument is
>> invalid.  This patch provides a solution to fix this problem:
>>
>> 1. If "once" is given, register reset handler to restore boot order.
>>
>> 2. Pass the normal boot order to machine creation.  Should fail when
>>    the normal boot order is invalid.
>>
>> 3. If "once" is given, set it with qemu_boot_set().  Fails when the
>>    once boot order is invalid.
>>
>> 4. Start the machine.
>>
>> 5. On reset, the reset handler calls qemu_boot_set() to restore boot
>>    order.  Should never fail.
>>
>> Suggested-by: Markus Armbruster
>> Signed-off-by: Gonglei
>> ---
>>  vl.c | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index 983259b..0d90d98 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2734,6 +2734,7 @@ int main(int argc, char **argv, char **envp)
>>      const char *initrd_filename;
>>      const char *kernel_filename, *kernel_cmdline;
>>      const char *boot_order;
>> +    const char *once = NULL;
>
> Consider renaming once to boot_once.  In its much larger scope, the
> boot context isn't obvious anymore, so a more telling name would be in
> order.

Agree.

>>      DisplayState *ds;
>>      int cyls, heads, secs, translation;
>>      QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
>> @@ -4046,7 +4047,7 @@ int main(int argc, char **argv, char **envp)
>>      opts = qemu_opts_find(qemu_find_opts("boot-opts"), NULL);
>>      if (opts) {
>>          char *normal_boot_order;
>> -        const char *order, *once;
>> +        const char *order;
>>          Error *local_err = NULL;
>>
>>          order = qemu_opt_get(opts, "order");
>> @@ -4067,7 +4068,6 @@ int main(int argc, char **argv, char **envp)
>>              exit(1);
>>          }
>>          normal_boot_order = g_strdup(boot_order);
>> -        boot_order = once;
>>          qemu_register_reset(restore_boot_order, normal_boot_order);
>>      }
>>
>
> normal_boot_order can be eliminated now.
>
>             }
>             qemu_register_reset(restore_boot_order, strdup(order));
>         }
>
> Even better, move qemu_register_reset()...
>
>> @@ -4246,6 +4246,15 @@ int main(int argc, char **argv, char **envp)
>>
>>      net_check_clients();
>>
>> +    if (once) {
>> +        Error *local_err = NULL;
>> +        qemu_boot_set(once, &local_err);
>> +        if (local_err) {
>> +            error_report("%s", error_get_pretty(local_err));
>> +            exit(1);
>> +        }
>
> ... here:
>
>         qemu_register_reset(restore_boot_order, strdup(boot_order));
>
>> +    }
>> +
>>      ds = init_displaystate();
>>
>>      /* init local displays */
>
> Clearer, don't you think?

Yes, it's cool. :)

Regards,
-Gonglei
Re: [Qemu-devel] [PATCH 3/4] net: del hub port when peer is deleted
On Thu, Feb 5, 2015 at 10:25 PM, Stefan Hajnoczi wrote:
> On Mon, Feb 02, 2015 at 03:06:37PM +0800, Jason Wang wrote:
>> We should del hub port when peer is deleted since it will not be
>> reused and will only be freed during exit.
>>
>> Signed-off-by: Jason Wang
>> ---
>>  net/net.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/net.c b/net/net.c
>> index 7acc162..74e651e 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -996,6 +996,8 @@ void net_host_device_remove(Monitor *mon, const QDict *qdict)
>>          error_report("invalid host network device '%s'", device);
>>          return;
>>      }
>> +
>> +    qemu_del_net_client(nc->peer);
>>      qemu_del_net_client(nc);
>>  }
>
> If qmp_netdev_del() is used the hub port will stay alive.

This is true if it has a peer.  And the port will be freed during the
deletion of its peer.  If no peer, it will be deleted soon.  This is
consistent with the behaviour of other types of netdevs.

> Should the peer deletion happen in qemu_del_net_client(), similar to
> the existing NIC peer check?
>
>     /* If there is a peer NIC, delete and cleanup client, but do not free. */
>     if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) {
>
> This way the hub port is consistently deleted when its peer is
> deleted.

Not sure, but if management always does netdev_del after device_del,
it will get an error.

Thanks

> Stefan
Re: [Qemu-devel] [PATCH] block: introduce BDRV_REQUEST_MAX_SECTORS
On 06/02/15 11:49, Peter Lieven wrote:
On 03.02.2015 at 16:33, Denis V. Lunev wrote:
On 03/02/15 17:30, Peter Lieven wrote:
On 03.02.2015 at 14:29, Denis V. Lunev wrote:
On 03/02/15 15:12, Peter Lieven wrote:

we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:

INT_MAX
INT_MAX >> BDRV_SECTOR_BITS
UINT_MAX >> BDRV_SECTOR_BITS
SIZE_MAX >> BDRV_SECTOR_BITS

This patch introduces a macro for the maximal allowed sectors
per request and uses it at several places.

Signed-off-by: Peter Lieven
---
 block.c               | 19 ---
 hw/block/virtio-blk.c |  4 ++--
 include/block/block.h |  3 +++
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 8272ef9..4e58b35 100644
--- a/block.c
+++ b/block.c
@@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
 static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
                               int nb_sectors)
 {
-    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EIO;
     }

@@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
         .iov_len = nb_sectors * BDRV_SECTOR_SIZE,
     };

-    if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
     }

     for (;;) {
-        nb_sectors = target_sectors - sector_num;
+        nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS);
         if (nb_sectors <= 0) {
             return 0;
         }
-        if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) {
-            nb_sectors = INT_MAX / BDRV_SECTOR_SIZE;
-        }
         ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n);
         if (ret < 0) {
             error_report("error getting block status at sector %" PRId64 ": %s",

@@ -3167,7 +3164,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
-    if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     struct iovec iov = {0};
     int ret = 0;

-    int max_write_zeroes = bs->bl.max_write_zeroes ?
-                           bs->bl.max_write_zeroes : INT_MAX;
+    int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
+                                        BDRV_REQUEST_MAX_SECTORS);

     while (nb_sectors > 0 && !ret) {
         int num = nb_sectors;

@@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
-    if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) {
+    if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return -EINVAL;
     }

@@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
         return 0;
     }

-    max_discard = bs->bl.max_discard ? bs->bl.max_discard : INT_MAX;
+    max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS);

     while (nb_sectors > 0) {
         int ret;
         int num = nb_sectors;

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8c51a29..1a8a176 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
     }

     max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
-    max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX);
+    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);

     qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
           &multireq_compare);

@@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
     uint64_t nb_sectors = size >> BDRV_SECTOR_BITS;
     uint64_t total_sectors;

-    if (nb_sectors > INT_MAX) {
+    if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
         return false;
     }
     if (sector & dev->sector_mask) {

diff --git a/include/block/block.h b/include/block/block.h
index 3082d2b..25a6d62 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -83,6 +83,9 @@ typedef enum {
 #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)
 #define BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1)

+#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
+
On Thu, Feb 05, 2015 at 03:17:19PM +0100, Kevin Wolf wrote:
>> By never allowing the hit count to go down to zero, we make sure
>> that all unused entries are chosen first before a valid one is
>> discarded.
>
> But does this actually improve a lot?  cache_hits is only 0 for the
> first few accesses and it never becomes 0 again after that.  The
> result might be that soon all the entries have cache_hits == 1, and
> we get the same problem as you're describing - only the first few
> entries will be reused.

I wanted to measure the actual impact of this, so I modified the
current code to implement a quick and dirty LRU algorithm and made
some numbers.

First of all, it's true that if the cache is not big enough and
there's a lot of cache misses, the entries being reused are always the
first ones, whereas with LRU all of them are evicted at some point (I
actually measured this and the results are very clear).

However, this doesn't seem to translate into actual performance
improvements in my tests.

I compared all three algorithms (the original one, my patched version
and my LRU version) using a 20GB disk image and different L2 cache
sizes.  I tested using fio doing random reads for one minute.  Here
are the results:

|---------------+----------+---------+--------|
|               |       Average IOPS          |
|---------------+----------+---------+--------|
| Cache entries | Original | Patched | LRU    |
|---------------+----------+---------+--------|
| 16 (8GB)      | 4.0 K    | 4.9 K   | 5.1 K  |
| 24 (12GB)     | 4.1 K    | 7.2 K   | 7.3 K  |
| 32 (16GB)     | 4.1 K    | 12.8 K  | 12.7 K |
| 40 (20GB)     | 4.0 K    | 64.0 K  | 63.6 K |
|---------------+----------+---------+--------|

And these are the results with a 40GB disk image (I skipped the
original code in this case since its limits are clear):

|---------------+---------+--------|
|               |  Average IOPS    |
|---------------+---------+--------|
| Cache entries | Patched | LRU    |
|---------------+---------+--------|
| 16 (8GB)      | 3.7 K   | 3.7 K  |
| 40 (20GB)     | 5.4 K   | 5.8 K  |
| 72 (36GB)     | 20.8 K  | 21.4 K |
| 78 (39GB)     | 42.6 K  | 42.4 K |
| 80 (40GB)     | 64.2 K  | 64.1 K |
|---------------+---------+--------|

So it seems that under this scenario, if the cache is not big enough
for the whole disk, the eviction algorithm doesn't really make a
difference.

This makes sense because, since I'm doing random reads, the previous
usage of a cache entry doesn't have an influence on the probability
that it will be used again.  Under a different scenario the results
might be different, but I don't have a use case with data to prove
that.  That's why I chose this simple change to the current algorithm
rather than attempting a complete rewrite.

Berto
Re: [Qemu-devel] [v4 02/13] migration: Add the framework of multi-thread compression
* Liang Li (liang.z...@intel.com) wrote:
> Add the code to create and destroy the multiple threads that will
> be used to do data compression.  Some functions are left empty for
> clarity; their code will be added later.
>
> Signed-off-by: Liang Li
> Signed-off-by: Yang Zhang

Reviewed-by: Dr. David Alan Gilbert

> ---
>  arch_init.c                   | 79 ++-
>  include/migration/migration.h |  9 +
>  migration/migration.c         | 37
>  3 files changed, 124 insertions(+), 1 deletion(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 89c8fa4..1831f1a 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -332,6 +332,68 @@ static uint64_t migration_dirty_pages;
>  static uint32_t last_version;
>  static bool ram_bulk_stage;
>
> +struct CompressParam {
> +    /* To be done */
> +};
> +typedef struct CompressParam CompressParam;
> +
> +static CompressParam *comp_param;
> +static bool quit_thread;
> +
> +static void *do_data_compress(void *opaque)
> +{
> +    while (!quit_thread) {
> +
> +        /* To be done */
> +
> +    }
> +
> +    return NULL;
> +}
> +
> +static inline void terminate_compression_threads(void)
> +{
> +    quit_thread = true;
> +
> +    /* To be done */
> +}
> +
> +void migrate_compress_threads_join(MigrationState *s)
> +{
> +    int i, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return;
> +    }
> +    terminate_compression_threads();
> +    thread_count = migrate_compress_threads();
> +    for (i = 0; i < thread_count; i++) {
> +        qemu_thread_join(s->compress_thread + i);
> +    }
> +    g_free(s->compress_thread);
> +    g_free(comp_param);
> +    s->compress_thread = NULL;
> +    comp_param = NULL;
> +}
> +
> +void migrate_compress_threads_create(MigrationState *s)
> +{
> +    int i, thread_count;
> +
> +    if (!migrate_use_compression()) {
> +        return;
> +    }
> +    quit_thread = false;
> +    thread_count = migrate_compress_threads();
> +    s->compress_thread = g_new0(QemuThread, thread_count);
> +    comp_param = g_new0(CompressParam, thread_count);
> +    for (i = 0; i < thread_count; i++) {
> +        qemu_thread_create(s->compress_thread + i, "compress",
> +                           do_data_compress, comp_param + i,
> +                           QEMU_THREAD_JOINABLE);
> +    }
> +}
> +
>  /* Update the xbzrle cache to reflect a page that's been sent as all 0.
>   * The important thing is that a stale (not-yet-0'd) page be replaced
>   * by the new data.
> @@ -645,6 +707,16 @@ static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset,
>      return bytes_sent;
>  }
>
> +static int ram_save_compressed_page(QEMUFile *f, RAMBlock *block,
> +                                    ram_addr_t offset, bool last_stage)
> +{
> +    int bytes_sent = 0;
> +
> +    /* To be done */
> +
> +    return bytes_sent;
> +}
> +
>  /*
>   * ram_find_and_save_block: Finds a page to send and sends it to f
>   *
> @@ -679,7 +751,12 @@ static int ram_find_and_save_block(QEMUFile *f, bool last_stage)
>              ram_bulk_stage = false;
>          }
>      } else {
> -        bytes_sent = ram_save_page(f, block, offset, last_stage);
> +        if (migrate_use_compression()) {
> +            bytes_sent = ram_save_compressed_page(f, block, offset,
> +                                                  last_stage);
> +        } else {
> +            bytes_sent = ram_save_page(f, block, offset, last_stage);
> +        }
>
>          /* if page is unmodified, continue to the next */
>          if (bytes_sent > 0) {
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index 3cb5ba8..daf6c81 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -49,6 +49,9 @@ struct MigrationState
>      QemuThread thread;
>      QEMUBH *cleanup_bh;
>      QEMUFile *file;
> +    QemuThread *compress_thread;
> +    int compress_thread_count;
> +    int compress_level;
>
>      int state;
>      MigrationParams params;
> @@ -107,6 +110,8 @@ bool migration_has_finished(MigrationState *);
>  bool migration_has_failed(MigrationState *);
>  MigrationState *migrate_get_current(void);
>
> +void migrate_compress_threads_create(MigrationState *s);
> +void migrate_compress_threads_join(MigrationState *s);
>  uint64_t ram_bytes_remaining(void);
>  uint64_t ram_bytes_transferred(void);
>  uint64_t ram_bytes_total(void);
> @@ -156,6 +161,10 @@ int64_t migrate_xbzrle_cache_size(void);
>
>  int64_t xbzrle_cache_resize(int64_t new_size);
>
> +bool migrate_use_compression(void);
> +int migrate_compress_level(void);
> +int migrate_compress_threads(void);
> +
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_load_hook(QEMUFile *f, uint64_t flags);
> diff --git a/migration/migration.c b/migration/migr
Re: [Qemu-devel] [v4 03/13] migration: Add the framework of multi-thread decompression
* Liang Li (liang.z...@intel.com) wrote: > Add the code to create and destroy the multiple threads those will be > used to do data decompression. Left some functions empty just to keep > clearness, and the code will be added later. Reviewed-by: Dr. David Alan Gilbert > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang > --- > arch_init.c | 75 > +++ > include/migration/migration.h | 4 +++ > migration/migration.c | 16 + > 3 files changed, 95 insertions(+) > > diff --git a/arch_init.c b/arch_init.c > index 1831f1a..ed34eb3 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > #ifndef _WIN32 > #include > #include > @@ -126,6 +127,7 @@ static uint64_t bitmap_sync_count; > #define RAM_SAVE_FLAG_CONTINUE 0x20 > #define RAM_SAVE_FLAG_XBZRLE 0x40 > /* 0x80 is reserved in migration.h start with 0x100 next */ > +#define RAM_SAVE_FLAG_COMPRESS_PAGE0x100 > > static struct defconfig_file { > const char *filename; > @@ -337,8 +339,16 @@ struct CompressParam { > }; > typedef struct CompressParam CompressParam; > > +struct DecompressParam { > +/* To be done */ > +}; > +typedef struct DecompressParam DecompressParam; > + > static CompressParam *comp_param; > static bool quit_thread; > +static DecompressParam *decomp_param; > +static QemuThread *decompress_threads; > +static uint8_t *compressed_data_buf; > > static void *do_data_compress(void *opaque) > { > @@ -1128,10 +1138,58 @@ void ram_handle_compressed(void *host, uint8_t ch, > uint64_t size) > } > } > > +static void *do_data_decompress(void *opaque) > +{ > +while (!quit_thread) { > +/* To be done */ > +} > + > +return NULL; > +} > + > +void migrate_decompress_threads_create(int count) > +{ > +int i; > + > +decompress_threads = g_new0(QemuThread, count); > +decomp_param = g_new0(DecompressParam, count); > +compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); > +quit_thread = false; > +for (i = 0; i < count; i++) { > 
+qemu_thread_create(decompress_threads + i, "decompress", > + do_data_decompress, decomp_param + i, > + QEMU_THREAD_JOINABLE); > +} > +} > + > +void migrate_decompress_threads_join(void) > +{ > +int i, thread_count; > + > +quit_thread = true; > +thread_count = migrate_decompress_threads(); > +for (i = 0; i < thread_count; i++) { > +qemu_thread_join(decompress_threads + i); > +} > +g_free(decompress_threads); > +g_free(decomp_param); > +g_free(compressed_data_buf); > +decompress_threads = NULL; > +decomp_param = NULL; > +compressed_data_buf = NULL; > +} > + > +static void decompress_data_with_multi_threads(uint8_t *compbuf, > + void *host, int len) > +{ > +/* To be done */ > +} > + > static int ram_load(QEMUFile *f, void *opaque, int version_id) > { > int flags = 0, ret = 0; > static uint64_t seq_iter; > +int len = 0; > > seq_iter++; > > @@ -1208,6 +1266,23 @@ static int ram_load(QEMUFile *f, void *opaque, int > version_id) > > qemu_get_buffer(f, host, TARGET_PAGE_SIZE); > break; > +case RAM_SAVE_FLAG_COMPRESS_PAGE: > +host = host_from_stream_offset(f, addr, flags); > +if (!host) { > +error_report("Invalid RAM offset " RAM_ADDR_FMT, addr); > +ret = -EINVAL; > +break; > +} > + > +len = qemu_get_be32(f); > +if (len < 0 || len > compressBound(TARGET_PAGE_SIZE)) { > +error_report("Invalid compressed data length: %d", len); > +ret = -EINVAL; > +break; > +} > +qemu_get_buffer(f, compressed_data_buf, len); > +decompress_data_with_multi_threads(compressed_data_buf, host, > len); > +break; > case RAM_SAVE_FLAG_XBZRLE: > host = host_from_stream_offset(f, addr, flags); > if (!host) { > diff --git a/include/migration/migration.h b/include/migration/migration.h > index daf6c81..0c4f21c 100644 > --- a/include/migration/migration.h > +++ b/include/migration/migration.h > @@ -51,6 +51,7 @@ struct MigrationState > QEMUFile *file; > QemuThread *compress_thread; > int compress_thread_count; > +int decompress_thread_count; > int compress_level; > > int state; > @@ -112,6 +113,8 @@ 
MigrationState *migrate_get_current(void); > > void migrate_compress_threads_create(MigrationState *s); > void migrate_compress_threads_join(MigrationState *s); > +void migrate_decompress_threads_create(int count); > +void migrate_decompress_threads_join(void); > uint64_t ram_bytes_remaining(void); > uint64_t ram_bytes_transferred(void); > uint64_t ra
Re: [Qemu-devel] [PATCH v2 1/1] Print PID and time in stderr traces
On Tue, Jan 20, 2015 at 09:41:15AM +, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" > > When debugging migration it's useful to know the PID of > each trace message so you can figure out if it came from the source > or the destination. > > Printing the time makes it easy to do latency measurements or timings > between trace points. > > Signed-off-by: Dr. David Alan Gilbert > --- > > v2 > 0 pad usec part > > scripts/tracetool/backend/stderr.py | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) Thanks, applied to my tracing tree: https://github.com/stefanha/qemu/commits/tracing Stefan
Re: [Qemu-devel] [v4 04/13] qemu-file: Add compression functions to QEMUFile
* Liang Li (liang.z...@intel.com) wrote: > qemu_put_compression_data() compress the data and put it to QEMUFile. > qemu_put_qemu_file() put the data in the buffer of source QEMUFile to > destination QEMUFile. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang > --- > include/migration/qemu-file.h | 3 +++ > migration/qemu-file.c | 39 +++ > 2 files changed, 42 insertions(+) > > diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h > index d843c00..2204fb9 100644 > --- a/include/migration/qemu-file.h > +++ b/include/migration/qemu-file.h > @@ -159,6 +159,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v); > void qemu_put_be64(QEMUFile *f, uint64_t v); > int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset); > int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size); > +size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size, > + int level); > +int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src); > /* > * Note that you can only peek continuous bytes from where the current > pointer > * is; you aren't guaranteed to be able to peak to +n bytes unless you've > diff --git a/migration/qemu-file.c b/migration/qemu-file.c > index edc2830..de2da2d 100644 > --- a/migration/qemu-file.c > +++ b/migration/qemu-file.c > @@ -21,6 +21,7 @@ > * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > * THE SOFTWARE. > */ > +#include > #include "qemu-common.h" > #include "qemu/iov.h" > #include "qemu/sockets.h" > @@ -529,3 +530,41 @@ uint64_t qemu_get_be64(QEMUFile *f) > v |= qemu_get_be32(f); > return v; > } > + > +/* compress size bytes of data start at p with specific compression > + * leve and store the compressed data to the buffer of f. 
Typo: 'leve' > + */ > + > +size_t qemu_put_compression_data(QEMUFile *f, const uint8_t *p, size_t size, > + int level) > +{ > +size_t blen = IO_BUF_SIZE - f->buf_index - sizeof(int32_t); > + > +if (blen < compressBound(size)) { > +return 0; > +} Imagine that there were only 3 bytes space in the buffer; so that IO_BUF_SIZE - f->buf_index = 3 then you subtract sizeof(int32_t) and get -1, but size_t is unsigned so the calculation works out as a very big number, and that comparison fails and it carries on to call compress2. > +if (compress2(f->buf + f->buf_index + sizeof(int32_t), &blen, (Bytef *)p, > + size, level) != Z_OK) { > +error_report("Compress Failed!"); > +return 0; > +} > +qemu_put_be32(f, blen); > +f->buf_index += blen; > +return blen + sizeof(int32_t); > +} > + > +/* Put the data in the buffer of f_src to the buffer of f_des, and > + * then reset the buf_index of f_src to 0. > + */ > + > +int qemu_put_qemu_file(QEMUFile *f_des, QEMUFile *f_src) > +{ > +int len = 0; > + > +if (f_src->buf_index > 0) { > +len = f_src->buf_index; > +qemu_put_buffer(f_des, f_src->buf, f_src->buf_index); > +f_src->buf_index = 0; > +} > +return len; > +} > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PULL 00/10] Migration queue
On 5 February 2015 at 16:24, Juan Quintela wrote: > Hi > > This is the migration queue. Thanks to Amit for doing almost all the work. > There were a Makefile missing dependency to make test-vmstate compile with > the json changes, > Alex agreed with the changes. > > List of things: > - vmstate checker fix (amit) > - better tracing and errors (dgilbert) > - json description for migration stream (alex) > - mc146818rtc fix for subsection (Zhang). > > Please apply > > Thanks, Juan. > > The following changes since commit cd07b19307bd185dccfd39052ac66d2730b32857: > > Merge remote-tracking branch > 'remotes/pmaydell/tags/pull-target-arm-20150205' into staging (2015-02-05 > 14:22:51 +) > > are available in the git repository at: > > git://github.com/juanquintela/qemu.git tags/migration/20150205 > > for you to fetch changes up to bb426311901776b95b021cece831b69dce4ef5ee: > > fix mc146818rtc wrong subsection name to avoid vmstate_subsection_load() > fail (2015-02-05 17:16:14 +0100) > > > migration/next for 20150205 > > Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH v2 10/19] qtest/ahci: add ahci_write_fis
On Thu, Feb 05, 2015 at 11:19:16AM -0500, John Snow wrote: > > > On 02/05/2015 08:29 AM, Stefan Hajnoczi wrote: > >On Tue, Feb 03, 2015 at 04:46:30PM -0500, John Snow wrote: > >>Similar to ahci_set_command_header, add a helper that takes an > >>in-memory representation of a command FIS and writes it to guest > >>memory, handling endianness as-needed. > >> > >>Signed-off-by: John Snow > >>--- > >> tests/ahci-test.c | 2 +- > >> tests/libqos/ahci.c | 10 ++ > >> tests/libqos/ahci.h | 1 + > >> 3 files changed, 12 insertions(+), 1 deletion(-) > >> > >>diff --git a/tests/ahci-test.c b/tests/ahci-test.c > >>index 211274e..658956d 100644 > >>--- a/tests/ahci-test.c > >>+++ b/tests/ahci-test.c > >>@@ -728,7 +728,7 @@ static void ahci_test_identify(AHCIQState *ahci) > >> g_assert_cmphex(ahci_px_rreg(ahci, i, AHCI_PX_IS), ==, 0); > >> > >> /* Commit the Command FIS to the Command Table */ > >>-memwrite(table, &fis, sizeof(fis)); > >>+ahci_write_fis(ahci, &fis, table); > >> > >> /* Commit the PRD entry to the Command Table */ > >> memwrite(table + 0x80, &prd, sizeof(prd)); > >>diff --git a/tests/libqos/ahci.c b/tests/libqos/ahci.c > >>index ec72627..7336781 100644 > >>--- a/tests/libqos/ahci.c > >>+++ b/tests/libqos/ahci.c > >>@@ -464,6 +464,16 @@ void ahci_destroy_command(AHCIQState *ahci, uint8_t > >>px, uint8_t cx) > >> ahci->port[px].prdtl[cx] = 0; > >> } > >> > >>+void ahci_write_fis(AHCIQState *ahci, RegH2DFIS *fis, uint64_t addr) > >>+{ > >>+RegH2DFIS tmp = *fis; > >>+ > >>+/* All other FIS fields are 8 bit and do not need to be flipped. */ > >>+tmp.count = cpu_to_le16(tmp.count); > >>+ > >>+memwrite(addr, &tmp, sizeof(tmp)); > >>+} > > > >This patch looks wrong because tmp.count is byteswapped now but not > >before. It actually works because the value is 0 so we never bothered > >to assign it explicitly. > > > >I do wonder about the 'aux' field in the FIS struct. It's uint32_t. > >Although the tests never access it, should that field be byteswapped? 
> > > >Stefan > > > > The Aux field(s) is/are used for some NCQ subcommands, and the formatting > varies per-command, so it's not (at the moment) possible to byte swap it > automatically ahead of time. > > So the answer is "sometimes, maybe, but we're not using it right now." > > Also, yes, count /was/ wrong before. It's right now :) since the IDENTIFY > test as it currently stands is very literal and script-ish, there was no > need to swap the bits there before. The DMA test requires this, though. > > If you'd like, I can add a comment or a note that the AUX fields are > currently ignored. We discussed the aux field on IRC. Since its structure depends on the command, it would be more appropriate to make it uint8_t aux[4] and perhaps introduce a union if we actually start using that field. That way it's clear that this is not a uint32_t that needs byteswapping. Stefan
Re: [Qemu-devel] value of VIRTQUEUE_MAX_SIZE
On Thu, Feb 05, 2015 at 03:29:17PM +0100, Peter Lieven wrote: > Am 05.02.2015 um 15:00 schrieb Stefan Hajnoczi: > >On Fri, Jan 30, 2015 at 10:08:02PM +0100, Peter Lieven wrote: > >>Just wondering if VIRTQUEUE_MAX_SIZE in include/hw/virtio/virtio.h should > >>not be equal to IOV_MAX instead of the hardcoded 1024? > >The vring queue size is guest-visible to some extent (e.g. vring memory > >layout). Tying it to a constant that is defined by the host operating > >system could lead to problems (e.g. live migration between different > >hosts). > > > >Anyway, all of the virtio devices have a queue size that is less than or > >equal to VIRTQUEUE_MAX_SIZE (and there is an assertion to check this in > >virtio_add_queue()). > > > >Guests are supposed to honor the vring queue size, although indirect > >descriptors seem to be able to use up to VIRTQUEUE_MAX_SIZE according to > >my understanding of QEMU's virtio.c code. > > > >Why would you like to use IOV_MAX? > > The idea was that IOV_MAX is the limit in case of at least virtio-blk. The > host > will not support more than IOV_MAX iovecs passed to a block request. Is there an issue in practice? Stefan
Re: [Qemu-devel] value of VIRTQUEUE_MAX_SIZE
Am 06.02.2015 um 11:42 schrieb Stefan Hajnoczi: > On Thu, Feb 05, 2015 at 03:29:17PM +0100, Peter Lieven wrote: >> Am 05.02.2015 um 15:00 schrieb Stefan Hajnoczi: >>> On Fri, Jan 30, 2015 at 10:08:02PM +0100, Peter Lieven wrote: Just wondering if VIRTQUEUE_MAX_SIZE in include/hw/virtio/virtio.h should not be equal to IOV_MAX instead of the hardcoded 1024? >>> The vring queue size is guest-visible to some extent (e.g. vring memory >>> layout). Tying it to a constant that is defined by the host operating >>> system could lead to problems (e.g. live migration between different >>> hosts). >>> >>> Anyway, all of the virtio devices have a queue size that is less than or >>> equal to VIRTQUEUE_MAX_SIZE (and there is an assertion to check this in >>> virtio_add_queue()). >>> >>> Guests are supposed to honor the vring queue size, although indirect >>> descriptors seem to be able to use up to VIRTQUEUE_MAX_SIZE according to >>> my understanding of QEMU's virtio.c code. >>> >>> Why would you like to use IOV_MAX? >> The idea was that IOV_MAX is the limit in case of at least virtio-blk. The >> host >> will not support more than IOV_MAX iovecs passed to a block request. > Is there an issue in practice? If there is no platform where IOV_MAX is less than 1024 then not. Peter
Re: [Qemu-devel] [v4 05/13] arch_init: Alloc and free data struct for compression
* Liang Li (liang.z...@intel.com) wrote: > Define the data structure and variables used to do multiple thread > compression, and add the code to initialize and free them. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang Reviewed-by: Dr. David Alan Gilbert > --- > arch_init.c | 34 +- > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/arch_init.c b/arch_init.c > index ed34eb3..87c4947 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -335,7 +335,12 @@ static uint32_t last_version; > static bool ram_bulk_stage; > > struct CompressParam { > -/* To be done */ > +bool busy; > +QEMUFile *file; > +QemuMutex mutex; > +QemuCond cond; > +RAMBlock *block; > +ram_addr_t offset; > }; > typedef struct CompressParam CompressParam; > > @@ -345,6 +350,14 @@ struct DecompressParam { > typedef struct DecompressParam DecompressParam; > > static CompressParam *comp_param; > +/* comp_done_cond is used to wake up the migration thread when > + * one of the compression threads has finished the compression. > + * comp_done_lock is used to co-work with comp_done_cond. 
> + */ > +static QemuMutex *comp_done_lock; > +static QemuCond *comp_done_cond; > +/* The empty QEMUFileOps will be used by file in CompressParam */ > +static const QEMUFileOps empty_ops = { }; > static bool quit_thread; > static DecompressParam *decomp_param; > static QemuThread *decompress_threads; > @@ -379,11 +392,20 @@ void migrate_compress_threads_join(MigrationState *s) > thread_count = migrate_compress_threads(); > for (i = 0; i < thread_count; i++) { > qemu_thread_join(s->compress_thread + i); > +qemu_fclose(comp_param[i].file); > +qemu_mutex_destroy(&comp_param[i].mutex); > +qemu_cond_destroy(&comp_param[i].cond); > } > +qemu_mutex_destroy(comp_done_lock); > +qemu_cond_destroy(comp_done_cond); > g_free(s->compress_thread); > g_free(comp_param); > +g_free(comp_done_cond); > +g_free(comp_done_lock); > s->compress_thread = NULL; > comp_param = NULL; > +comp_done_cond = NULL; > +comp_done_lock = NULL; > } > > void migrate_compress_threads_create(MigrationState *s) > @@ -397,7 +419,17 @@ void migrate_compress_threads_create(MigrationState *s) > thread_count = migrate_compress_threads(); > s->compress_thread = g_new0(QemuThread, thread_count); > comp_param = g_new0(CompressParam, thread_count); > +comp_done_cond = g_new0(QemuCond, 1); > +comp_done_lock = g_new0(QemuMutex, 1); > +qemu_cond_init(comp_done_cond); > +qemu_mutex_init(comp_done_lock); > for (i = 0; i < thread_count; i++) { > +/* com_param[i].file is just used as a dummy buffer to save data, set > + * it's ops to empty. > + */ > +comp_param[i].file = qemu_fopen_ops(NULL, &empty_ops); > +qemu_mutex_init(&comp_param[i].mutex); > +qemu_cond_init(&comp_param[i].cond); > qemu_thread_create(s->compress_thread + i, "compress", > do_data_compress, comp_param + i, > QEMU_THREAD_JOINABLE); > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [v4 06/13] arch_init: Add and free data struct for decompression
* Liang Li (liang.z...@intel.com) wrote: > Define the data structure and variables used to do multiple thread > decompression, and add the code to initialize and free them. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang Reviewed-by: Dr. David Alan Gilbert > --- > arch_init.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch_init.c b/arch_init.c > index 87c4947..500f299 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -345,7 +345,12 @@ struct CompressParam { > typedef struct CompressParam CompressParam; > > struct DecompressParam { > -/* To be done */ > +bool busy; > +QemuMutex mutex; > +QemuCond cond; > +void *des; > +uint8 *compbuf; > +int len; > }; > typedef struct DecompressParam DecompressParam; > > @@ -1188,6 +1193,9 @@ void migrate_decompress_threads_create(int count) > compressed_data_buf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); > quit_thread = false; > for (i = 0; i < count; i++) { > +qemu_mutex_init(&decomp_param[i].mutex); > +qemu_cond_init(&decomp_param[i].cond); > +decomp_param[i].compbuf = g_malloc0(compressBound(TARGET_PAGE_SIZE)); > qemu_thread_create(decompress_threads + i, "decompress", > do_data_decompress, decomp_param + i, > QEMU_THREAD_JOINABLE); > @@ -1202,6 +1210,9 @@ void migrate_decompress_threads_join(void) > thread_count = migrate_decompress_threads(); > for (i = 0; i < thread_count; i++) { > qemu_thread_join(decompress_threads + i); > +qemu_mutex_destroy(&decomp_param[i].mutex); > +qemu_cond_destroy(&decomp_param[i].cond); > +g_free(decomp_param[i].compbuf); > } > g_free(decompress_threads); > g_free(decomp_param); > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PATCH V2] block: introduce BDRV_REQUEST_MAX_SECTORS
we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patch introduces a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven --- v1->v2: use macro in bdrv_check_byte_request also [Den] block.c | 21 + hw/block/virtio-blk.c |4 ++-- include/block/block.h |3 +++ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/block.c b/block.c index 8272ef9..49e0073 100644 --- a/block.c +++ b/block.c @@ -2647,7 +2647,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, { int64_t len; -if (size > INT_MAX) { +if (size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) { return -EIO; } @@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EIO; } @@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf, .iov_len = nb_sectors * BDRV_SECTOR_SIZE, }; -if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags) } for (;;) { -nb_sectors = target_sectors - sector_num; +nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS); if (nb_sectors <= 0) { return 0; } -if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { -nb_sectors = INT_MAX / BDRV_SECTOR_SIZE; -} ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n); if (ret < 0) { error_report("error getting block status at sector %" PRId64 ": %s", @@ -3167,7 +3164,7 @@ static int coroutine_fn
bdrv_co_do_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, struct iovec iov = {0}; int ret = 0; -int max_write_zeroes = bs->bl.max_write_zeroes ? - bs->bl.max_write_zeroes : INT_MAX; +int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes, +BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0 && !ret) { int num = nb_sectors; @@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs->bl.max_discard ? 
bs->bl.max_discard : INT_MAX; +max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0) { int ret; int num = nb_sectors; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8c51a29..1a8a176 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb) } max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk); -max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX); +max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS); qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs), &multireq_compare); @@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, uint64_t nb_sectors = size >> BDRV_SECTOR_BITS; uint64_t total_sectors; -if (nb_sectors > INT_MAX) { +if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return false; } if (sector & dev->sector_mask) { diff --git a/include/block/block.h b/include/block/block.h index 3082d2b..25a6d62 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -83,6 +83,9 @@ typedef enum { #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS) #define BDRV_SECTOR_MASK ~(BDRV_SECTOR_SIZE - 1) +#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \ +
Re: [Qemu-devel] [v4 07/13] migration: Split the function ram_save_page
* Liang Li (liang.z...@intel.com) wrote: > Split the function ram_save_page for code reuse purpose. That's better, but I still think there is an XBZRLE problem; see below. > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang > --- > arch_init.c | 102 > +--- > 1 file changed, 56 insertions(+), 46 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index 500f299..eae082b 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -595,6 +595,58 @@ static void migration_bitmap_sync_range(ram_addr_t > start, ram_addr_t length) > } > } > > +static int save_zero_and_xbzrle_page(QEMUFile *f, uint8_t **current_data, > + RAMBlock *block, ram_addr_t offset, > + bool last_stage, bool *send_async) > +{ > +int bytes_sent = -1; > +int cont, ret; > +ram_addr_t current_addr; > + > +cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE : 0; > + > +/* In doubt sent page as normal */ > +ret = ram_control_save_page(f, block->offset, > +offset, TARGET_PAGE_SIZE, &bytes_sent); > + > +XBZRLE_cache_lock(); > + > +current_addr = block->offset + offset; > +if (ret != RAM_SAVE_CONTROL_NOT_SUPP) { > +if (ret != RAM_SAVE_CONTROL_DELAYED) { > +if (bytes_sent > 0) { > +acct_info.norm_pages++; > +} else if (bytes_sent == 0) { > +acct_info.dup_pages++; > +} > +} > +} else if (is_zero_range(*current_data, TARGET_PAGE_SIZE)) { > +acct_info.dup_pages++; > +bytes_sent = save_block_hdr(f, block, offset, cont, > +RAM_SAVE_FLAG_COMPRESS); > +qemu_put_byte(f, 0); > +bytes_sent++; > +/* Must let xbzrle know, otherwise a previous (now 0'd) cached > + * page would be stale > + */ > +xbzrle_cache_zero_page(current_addr); > +} else if (!ram_bulk_stage && migrate_use_xbzrle()) { > +bytes_sent = save_xbzrle_page(f, current_data, current_addr, block, > + offset, cont, last_stage); > +if (!last_stage) { > +/* Can't send this cached data async, since the cache page > + * might get updated before it gets to the wire > + */ > +if (send_async != NULL) { > +*send_async = false; > +} > +} > +} > + > 
+XBZRLE_cache_unlock(); I think this is too soon; when save_xbzrle_page updates current_data to point to a page from the cache, the cache data is still in use by this point, so we must be careful that the cache couldn't get resized until after the qemu_put_buffer below. Thus this lock must be held until after that. Dave > +return bytes_sent; > +} > > /* Needs iothread lock! */ > /* Fix me: there are too many global variables used in migration process. */ > @@ -685,60 +737,20 @@ static void migration_bitmap_sync(void) > * > * Returns: Number of bytes written. > */ > -static int ram_save_page(QEMUFile *f, RAMBlock* block, ram_addr_t offset, > +static int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset, > bool last_stage) > { > int bytes_sent; > int cont; > -ram_addr_t current_addr; > MemoryRegion *mr = block->mr; > uint8_t *p; > -int ret; > bool send_async = true; > > -cont = (block == last_sent_block) ? RAM_SAVE_FLAG_CONTINUE : 0; > - > p = memory_region_get_ram_ptr(mr) + offset; > - > -/* In doubt sent page as normal */ > -bytes_sent = -1; > -ret = ram_control_save_page(f, block->offset, > - offset, TARGET_PAGE_SIZE, &bytes_sent); > - > -XBZRLE_cache_lock(); > - > -current_addr = block->offset + offset; > -if (ret != RAM_SAVE_CONTROL_NOT_SUPP) { > -if (ret != RAM_SAVE_CONTROL_DELAYED) { > -if (bytes_sent > 0) { > -acct_info.norm_pages++; > -} else if (bytes_sent == 0) { > -acct_info.dup_pages++; > -} > -} > -} else if (is_zero_range(p, TARGET_PAGE_SIZE)) { > -acct_info.dup_pages++; > -bytes_sent = save_block_hdr(f, block, offset, cont, > -RAM_SAVE_FLAG_COMPRESS); > -qemu_put_byte(f, 0); > -bytes_sent++; > -/* Must let xbzrle know, otherwise a previous (now 0'd) cached > - * page would be stale > - */ > -xbzrle_cache_zero_page(current_addr); > -} else if (!ram_bulk_stage && migrate_use_xbzrle()) { > -bytes_sent = save_xbzrle_page(f, &p, current_addr, block, > - offset, cont, last_stage); > -if (!last_stage) { > -/* Can't send this cached data async, 
since the cache page > - * might get updated before it gets
Re: [Qemu-devel] [PATCH] block: Give always priority to unused entries in the qcow2 L2 cache
Am 06.02.2015 um 10:44 hat Alberto Garcia geschrieben: > On Thu, Feb 05, 2015 at 03:17:19PM +0100, Kevin Wolf wrote: > > > > By never allowing the hit count to go down to zero, we make sure > > > that all unused entries are chosen first before a valid one is > > > discarded. > > > > But does this actually improve a lot? cache_hits is only 0 for the > > first few accesses and it never becomes 0 again after that. The > > result might be that soon all the entries have cache_hits == 1, and > > we get the same problem as you're describing - only the first few > > entries will be reused. > > I wanted to measure the actual impact of this, so I modified the > current code to implement a quick and dirty LRU algorithm and made > some numbers. Cool, thanks. I think switching to an LRU algorithm is an option that might make sense for the general case, so this is something to consider. > First of all, it's true that if the cache is not big enough and > there's a lot of cache misses, the entries being reused are always the > first ones, whereas with LRU all of them are evicted at some point (I > actually measured this and the results are very clear). Good to have this confirmed. > However this doesn't seem to translate in actual performance > improvements in my tests. > > I compared all three algorithms (the original one, my patched version > and my LRU version) using a 20GB disk image and different L2 cache > sizes. I tested using fio doing random reads for one minute. 
Here are the results:

> |---------------+----------+---------+--------|
> |               |        Average IOPS        |
> |---------------+----------+---------+--------|
> | Cache entries | Original | Patched | LRU    |
> |---------------+----------+---------+--------|
> | 16 (8GB)      | 4.0 K    | 4.9 K   | 5.1 K  |
> | 24 (12GB)     | 4.1 K    | 7.2 K   | 7.3 K  |
> | 32 (16GB)     | 4.1 K    | 12.8 K  | 12.7 K |
> | 40 (20GB)     | 4.0 K    | 64.0 K  | 63.6 K |
> |---------------+----------+---------+--------|
>
> And these are the results with a 40GB disk image (I skipped the
> original code in this case since its limits are clear):
>
> |---------------+---------+--------|
> |               |  Average IOPS    |
> |---------------+---------+--------|
> | Cache entries | Patched | LRU    |
> |---------------+---------+--------|
> | 16 (8GB)      | 3.7 K   | 3.7 K  |
> | 40 (20GB)     | 5.4 K   | 5.8 K  |
> | 72 (36GB)     | 20.8 K  | 21.4 K |
> | 78 (39GB)     | 42.6 K  | 42.4 K |
> | 80 (40GB)     | 64.2 K  | 64.1 K |
> |---------------+---------+--------|
>
> So it seems that under this scenario, if the cache is not big enough
> for the whole disk, the eviction algorithm doesn't really make a
> difference.
>
> This makes sense because since I'm doing random reads, the previous
> usage of a cache entry doesn't have an influence on the probability
> that it will be used again.

Indeed, random reads across the whole disk are the worst case for caching. Your patches and LRU both "only" help in that they increase the share of cached metadata and therefore the chance of randomly picking something that is cached. It doesn't really matter which part of the metadata is cached.

> Under a different scenario the results might be different but I
> don't have a use case with data to prove that. That's why I chose
> this simple change to the current algorithm rather than attempting a
> complete rewrite.

I was going to suggest a test run for a different scenario, like with many interleaved sequential reads, but thinking a bit more about how to do it so that the eviction algorithm could make a significant change, it occurred to me that it might not be all that easy (and therefore possibly not practically relevant either).
So yes, I think you have convinced me that your patch is a worthwhile improvement that can be made independently from changes to the eviction strategy. Kevin
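A toy sketch may make the policy under discussion concrete (names are made up for this sketch; the real logic lives in qcow2's cache code): if a used entry's hit count is never allowed to age back to zero, every never-used entry is guaranteed to be evicted before any entry that has ever served a hit.

```c
#include <assert.h>

#define TOY_CACHE_SIZE 4

typedef struct ToyEntry {
    int cache_hits;   /* 0 only while the entry has never been used */
} ToyEntry;

/* Evict the entry with the lowest hit count, then age the counters.
 * Because aging floors a used entry's count at 1, never-used entries
 * (count 0) always lose the comparison and are reclaimed first. */
static int pick_victim(ToyEntry *c, int n)
{
    int i, victim = 0;

    for (i = 1; i < n; i++) {
        if (c[i].cache_hits < c[victim].cache_hits) {
            victim = i;
        }
    }
    for (i = 0; i < n; i++) {
        if (c[i].cache_hits > 1) {
            c[i].cache_hits /= 2;   /* floor of 1, never back to 0 */
        }
    }
    return victim;
}
```

Dropping the `> 1` guard (letting counts decay to 0) reproduces the situation Kevin describes, where a long-idle used entry can tie with a never-used one.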
[Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev CC: Peter Lieven CC: Kevin Wolf --- block/nbd.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/nbd.c b/block/nbd.c index 04cc845..dda0b0b 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -301,6 +301,12 @@ static int nbd_co_flush(BlockDriverState *bs) return nbd_client_session_co_flush(&s->client); } +static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) +{ +bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS; +bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS; +} + static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { @@ -396,6 +402,7 @@ static BlockDriver bdrv_nbd = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -413,6 +420,7 @@ static BlockDriver bdrv_nbd_tcp = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, .bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, @@ -430,6 +438,7 @@ static BlockDriver bdrv_nbd_unix = { .bdrv_close = nbd_close, .bdrv_co_flush_to_os= nbd_co_flush, .bdrv_co_discard= nbd_co_discard, +.bdrv_refresh_limits= nbd_refresh_limits, .bdrv_getlength = nbd_getlength, 
.bdrv_detach_aio_context= nbd_detach_aio_context, .bdrv_attach_aio_context= nbd_attach_aio_context, -- 1.9.1
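The overflow this patch guards against is easy to show in isolation. A rough sketch (the helper name is made up; only the 32-bit `len` field of `nbd_request` is taken from the patch): a sector count one above `UINT32_MAX >> BDRV_SECTOR_BITS` no longer round-trips through a `uint32_t` byte length.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9

/* nbd_request.len is uint32_t, so a single request can describe at
 * most UINT32_MAX bytes; return 1 if the byte length survives being
 * stored in that field, 0 if it silently overflows. */
static int len_fits_nbd_request(int64_t nb_sectors)
{
    uint64_t bytes = (uint64_t)nb_sectors << BDRV_SECTOR_BITS;
    uint32_t wire_len = (uint32_t)bytes;  /* what the struct would carry */

    return wire_len == bytes;
}
```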
Re: [Qemu-devel] [PATCH V2] block: introduce BDRV_REQUEST_MAX_SECTORS
On 06/02/15 13:54, Peter Lieven wrote: we check and adjust request sizes at several places with sometimes inconsistent checks or default values: INT_MAX INT_MAX >> BDRV_SECTOR_BITS UINT_MAX >> BDRV_SECTOR_BITS SIZE_MAX >> BDRV_SECTOR_BITS This patch introduces a macro for the maximal allowed sectors per request and uses it at several places. Signed-off-by: Peter Lieven --- v1->v2: use macro in bdrv_check_byte_request also [Den] block.c | 21 + hw/block/virtio-blk.c |4 ++-- include/block/block.h |3 +++ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/block.c b/block.c index 8272ef9..49e0073 100644 --- a/block.c +++ b/block.c @@ -2647,7 +2647,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, { int64_t len; -if (size > INT_MAX) { +if (size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) { return -EIO; } @@ -2671,7 +2671,7 @@ static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset, static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num, int nb_sectors) { -if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EIO; } @@ -2758,7 +2758,7 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf, .iov_len = nb_sectors * BDRV_SECTOR_SIZE, }; -if (nb_sectors < 0 || nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -2826,13 +2826,10 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags) } for (;;) { -nb_sectors = target_sectors - sector_num; +nb_sectors = MIN(target_sectors - sector_num, BDRV_REQUEST_MAX_SECTORS); if (nb_sectors <= 0) { return 0; } -if (nb_sectors > INT_MAX / BDRV_SECTOR_SIZE) { -nb_sectors = INT_MAX / BDRV_SECTOR_SIZE; -} ret = bdrv_get_block_status(bs, sector_num, nb_sectors, &n); if (ret < 0) { error_report("error getting block status at sector %" PRId64 ": %s", @@ -3167,7 +3164,7 @@
static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -3202,8 +3199,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs, struct iovec iov = {0}; int ret = 0; -int max_write_zeroes = bs->bl.max_write_zeroes ? - bs->bl.max_write_zeroes : INT_MAX; +int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes, +BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0 && !ret) { int num = nb_sectors; @@ -3458,7 +3455,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, BdrvRequestFlags flags) { -if (nb_sectors < 0 || nb_sectors > (INT_MAX >> BDRV_SECTOR_BITS)) { +if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return -EINVAL; } @@ -5120,7 +5117,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } -max_discard = bs->bl.max_discard ? 
bs->bl.max_discard : INT_MAX; +max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0) { int ret; int num = nb_sectors; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 8c51a29..1a8a176 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -381,7 +381,7 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb) } max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk); -max_xfer_len = MIN_NON_ZERO(max_xfer_len, INT_MAX); +max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS); qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs), &multireq_compare); @@ -447,7 +447,7 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev, uint64_t nb_sectors = size >> BDRV_SECTOR_BITS; uint64_t total_sectors; -if (nb_sectors > INT_MAX) { +if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) { return false; } if (sector & dev->sector_mask) { diff --git a/include/block/block.h b/include/block/block.h index 3082d2b..25a6d62 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -83,6 +83,9 @@ typedef enum { #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS) #define BDRV_SECTOR
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: > nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t > as the length in bytes of the data to discard due to the following > definition: > > struct nbd_request { > uint32_t magic; > uint32_t type; > uint64_t handle; > uint64_t from; > uint32_t len; <-- the length of data to be discarded, in bytes > } QEMU_PACKED; > > Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to > avoid overflow. > > NBD read/write code uses the same structure for transfers. Fix > max_transfer_length accordingly. > > Signed-off-by: Denis V. Lunev > CC: Peter Lieven > CC: Kevin Wolf Thanks, I have applied both Peter's and your patch. Can you guys please check whether the current state of my block branch is correct or whether I forgot to include or remove some patch? By the way, I don't think this NBD patch is strictly necessary as you'll have a hard time finding a platform where INT_MAX > UINT32_MAX, but I think it's good documentation at least and a safeguard if we ever decide to lift the general block layer restrictions. Kevin
Re: [Qemu-devel] [PATCH 1/5] virtio: rename VIRTIO_PCI_QUEUE_MAX to VIRTIO_QUEUE_MAX
On Fri, 6 Feb 2015 15:54:09 +0800 Jason Wang wrote: > VIRTIO_PCI_QUEUE_MAX was not specific to pci, so rename it to > VIRTIO_QUEUE_MAX. > > Cc: Amit Shah > Cc: Anthony Liguori > Cc: Michael S. Tsirkin > Cc: Alexander Graf > Cc: Richard Henderson > Cc: Cornelia Huck > Cc: Christian Borntraeger > Cc: Paolo Bonzini > Signed-off-by: Jason Wang > --- > hw/char/virtio-serial-bus.c | 2 +- > hw/s390x/s390-virtio-bus.c | 4 ++-- > hw/s390x/s390-virtio-ccw.c | 2 +- > hw/s390x/virtio-ccw.c| 10 +- > hw/scsi/virtio-scsi.c| 4 ++-- > hw/virtio/virtio-mmio.c | 4 ++-- > hw/virtio/virtio-pci.c | 10 +- > hw/virtio/virtio.c | 26 +- > include/hw/s390x/s390_flic.h | 2 +- > include/hw/virtio/virtio.h | 2 +- > 10 files changed, 33 insertions(+), 33 deletions(-) While I certainly agree with the patch on its own, I read through your complete patch series as well and I have some additional comments. Your patch series bumps up the maximum number of virtqueues for all transports, but I think we need to consider carefully what this means for each transport. For virtio-ccw, for example, we'll run into trouble when the guest uses classic queue indicators: These are limited to 64 per device (as I assumed 64 virtqueues per device initially and they fit quite nicely into a 64 bit integer). If the guest uses adapter interrupts with two-stage indicators, we should probably be fine. The guest will hopefully care about that, but we need to describe failure modes for this in the spec. For s390-virtio, I frankly have no idea what large numbers of queues will do :) Maybe we just run out of space for devices much earlier. You seem to need some changes for this to work for virtio-pci as well, don't you?
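Cornelia's 64-queue ceiling for classic indicators follows directly from the representation: each virtqueue gets one bit in a single 64-bit word. A toy allocator (hypothetical code, not the actual s390x implementation) shows exactly where it runs out:

```c
#include <assert.h>
#include <stdint.h>

/* Claim the next free bit in the per-device indicator word; with one
 * bit per virtqueue, queue 65 and beyond simply cannot be signalled. */
static int claim_indicator_bit(uint64_t *indicators)
{
    int i;

    for (i = 0; i < 64; i++) {
        if (!(*indicators & (1ULL << i))) {
            *indicators |= 1ULL << i;
            return i;
        }
    }
    return -1;   /* out of classic indicator bits */
}
```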
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
On 06/02/15 14:53, Kevin Wolf wrote: Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev CC: Peter Lieven CC: Kevin Wolf Thanks, I have applied both Peter's and your patch. Can you guys please check whether the current state of my block branch is correct or whether I forgot to include or remove some patch? can you give me tree URL? By the way, I don't think this NBD patch is strictly necessary as you'll have a hard time finding a platform where INT_MAX > UINT32_MAX, but I think it's good documentation at least and a safeguard if we ever decide to lift the general block layer restrictions. Kevin nope, it is absolutely mandatory stdint.h: /* Limit of `size_t' type. */ # if __WORDSIZE == 64 # define SIZE_MAX (18446744073709551615UL) # else # define SIZE_MAX (4294967295U) # endif Den
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 12:59 schrieb Denis V. Lunev: > On 06/02/15 14:53, Kevin Wolf wrote: >> Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: >>> nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t >>> as the length in bytes of the data to discard due to the following >>> definition: >>> >>> struct nbd_request { >>> uint32_t magic; >>> uint32_t type; >>> uint64_t handle; >>> uint64_t from; >>> uint32_t len; <-- the length of data to be discarded, in bytes >>> } QEMU_PACKED; >>> >>> Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to >>> avoid overflow. >>> >>> NBD read/write code uses the same structure for transfers. Fix >>> max_transfer_length accordingly. >>> >>> Signed-off-by: Denis V. Lunev >>> CC: Peter Lieven >>> CC: Kevin Wolf >> Thanks, I have applied both Peter's and your patch. Can you guys please >> check whether the current state of my block branch is correct or whether >> I forgot to include or remove some patch? > can you give me tree URL? > >> By the way, I don't think this NBD patch is strictly necessary as you'll >> have a hard time finding a platform where INT_MAX > UINT32_MAX, but I >> think it's good documentation at least and a safeguard if we ever decide >> to lift the general block layer restrictions. >> >> Kevin > nope, it is absolutely mandatory > > stdint.h: > > /* Limit of `size_t' type. */ > # if __WORDSIZE == 64 > # define SIZE_MAX (18446744073709551615UL) > # else > # define SIZE_MAX (4294967295U) > # endif > > Den Yes, but we limit to MIN(SIZE_MAX >> BDRV_SECTOR_BITS, INT_MAX >> BDRV_SECTOR_BITS) anyway. I do not know if there is a platform where INT_MAX is 2^63 - 1 and not 2^31 - 1 ? We can keep Dens patch in. Just in case. It doesn't hurt. Peter
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 12:59 hat Denis V. Lunev geschrieben: > On 06/02/15 14:53, Kevin Wolf wrote: > >Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: > >>nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t > >>as the length in bytes of the data to discard due to the following > >>definition: > >> > >>struct nbd_request { > >> uint32_t magic; > >> uint32_t type; > >> uint64_t handle; > >> uint64_t from; > >> uint32_t len; <-- the length of data to be discarded, in bytes > >>} QEMU_PACKED; > >> > >>Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to > >>avoid overflow. > >> > >>NBD read/write code uses the same structure for transfers. Fix > >>max_transfer_length accordingly. > >> > >>Signed-off-by: Denis V. Lunev > >>CC: Peter Lieven > >>CC: Kevin Wolf > >Thanks, I have applied both Peter's and your patch. Can you guys please > >check whether the current state of my block branch is correct or whether > >I forgot to include or remove some patch? > can you give me tree URL? Sure: git: git://repo.or.cz/qemu/kevin.git block Web: http://repo.or.cz/w/qemu/kevin.git/shortlog/refs/heads/block > >By the way, I don't think this NBD patch is strictly necessary as you'll > >have a hard time finding a platform where INT_MAX > UINT32_MAX, but I > >think it's good documentation at least and a safeguard if we ever decide > >to lift the general block layer restrictions. > > > >Kevin > nope, it is absolutely mandatory > > stdint.h: > > /* Limit of `size_t' type. */ > # if __WORDSIZE == 64 > # define SIZE_MAX (18446744073709551615UL) > # else > # define SIZE_MAX (4294967295U) > # endif But Peter defined it like this: #define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \ INT_MAX >> BDRV_SECTOR_BITS) And having integers with more than 32 bits is at least unusual. I don't know of any platform that has them. Anyway, as I said, your patch is good documentation, so I'm happy to apply it nevertheless. Kevin
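Kevin's point can be checked mechanically. With the macro as Peter defined it (reproduced below purely for illustration), the `INT_MAX` term is the one that bites: `SIZE_MAX` is at least 2^32 - 1 on both 32- and 64-bit targets, so `SIZE_MAX >> 9` always exceeds `INT_MAX >> 9` wherever `int` is 32 bits.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* As defined in Peter's BDRV_REQUEST_MAX_SECTORS patch */
#define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \
                                     INT_MAX >> BDRV_SECTOR_BITS)
```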
Re: [Qemu-devel] [v4 08/13] migration: Add the core code of multi-thread compression
* Liang Li (liang.z...@intel.com) wrote: > Implement the core logic of the multiple thread compression. At this > point, multiple thread compression can't co-work with xbzrle yet. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang > --- > arch_init.c | 167 > +--- > 1 file changed, 159 insertions(+), 8 deletions(-) > > diff --git a/arch_init.c b/arch_init.c > index eae082b..b8bdb16 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -364,16 +364,31 @@ static QemuCond *comp_done_cond; > /* The empty QEMUFileOps will be used by file in CompressParam */ > static const QEMUFileOps empty_ops = { }; > static bool quit_thread; > +static int one_byte_count; Please add a comment here about what one_byte_count is; it's not obvious, but I can't think of a better name > static DecompressParam *decomp_param; > static QemuThread *decompress_threads; > static uint8_t *compressed_data_buf; > > +static int do_compress_ram_page(CompressParam *param); > + > static void *do_data_compress(void *opaque) > { > -while (!quit_thread) { > - > -/* To be done */ > +CompressParam *param = opaque; > > +while (!quit_thread) { This is something I missed on 02/ - can you rename 'quit_thread' to comp_quit_thread or something, so it's obvious which thread. > +qemu_mutex_lock(&param->mutex); > +while (!param->busy) { > +qemu_cond_wait(&param->cond, &param->mutex); > +if (quit_thread) { > +break; > +} > +} > +qemu_mutex_unlock(&param->mutex); > +do_compress_ram_page(param); > +qemu_mutex_lock(comp_done_lock); > +param->busy = false; > +qemu_cond_signal(comp_done_cond); > +qemu_mutex_unlock(comp_done_lock); This is interestingly different from your previous version; param->mutex used to be held all of the time except during the cond_wait itself. I'm also worried about the quit_thread behaviour; is there any guarantee that 'terminate_compression_threads' is called while this code is in the param->cond cond_wait?
If terminate_compression_threads was called while the thread was busy, then the cond_signal on param->cond would be too early. I'm thinking perhaps you need to check quit_thread before the cond_wait as well? (It's mostly error cases and migrate_cancel I'm worried about here). > } > > return NULL; > @@ -381,9 +396,13 @@ static void *do_data_compress(void *opaque) > > static inline void terminate_compression_threads(void) > { > -quit_thread = true; > +int idx, thread_count; > > -/* To be done */ > +thread_count = migrate_compress_threads(); > +quit_thread = true; > +for (idx = 0; idx < thread_count; idx++) { > +qemu_cond_signal(&comp_param[idx].cond); > +} > } > > void migrate_compress_threads_join(MigrationState *s) > @@ -764,12 +783,144 @@ static int ram_save_page(QEMUFile *f, RAMBlock *block, > ram_addr_t offset, > return bytes_sent; > } > > +static int do_compress_ram_page(CompressParam *param) > +{ > +int bytes_sent, cont; > +int blen; > +uint8_t *p; > +RAMBlock *block = param->block; > +ram_addr_t offset = param->offset; > + > +cont = (block == last_sent_block) ? 
RAM_SAVE_FLAG_CONTINUE : 0; > +p = memory_region_get_ram_ptr(block->mr) + offset; > + > +bytes_sent = save_block_hdr(param->file, block, offset, cont, > +RAM_SAVE_FLAG_COMPRESS_PAGE); > +blen = qemu_put_compression_data(param->file, p, TARGET_PAGE_SIZE, > + migrate_compress_level()); > +bytes_sent += blen; > +atomic_inc(&acct_info.norm_pages); > + > +return bytes_sent; > +} > + > +static inline void start_compression(CompressParam *param) > +{ > +qemu_mutex_lock(&param->mutex); > +param->busy = true; > +qemu_cond_signal(&param->cond); > +qemu_mutex_unlock(&param->mutex); > +} > + > + > +static uint64_t bytes_transferred; > + > +static void flush_compressed_data(QEMUFile *f) > +{ > +int idx, len, thread_count; > + > +if (!migrate_use_compression()) { > +return; > +} > +thread_count = migrate_compress_threads(); > +for (idx = 0; idx < thread_count; idx++) { > +if (comp_param[idx].busy) { > +qemu_mutex_lock(comp_done_lock); > +while (comp_param[idx].busy) { > +qemu_cond_wait(comp_done_cond, comp_done_lock); > +} > +qemu_mutex_unlock(comp_done_lock); > +} > +len = qemu_put_qemu_file(f, comp_param[idx].file); > +bytes_transferred += len; > +} > +if ((one_byte_count > 0) && (bytes_transferred > one_byte_count)) { > +bytes_transferred -= one_byte_count; > +one_byte_count = 0; > +} > +} > + > +static inline void set_compress_params(CompressParam *param, RAMBlock *block, > + ram_addr_t offset) > +{
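The race Dave is probing at is the classic lost-wakeup problem: `qemu_cond_signal` (like `pthread_cond_signal`) wakes nobody if no thread is currently waiting, so signalling quit while the worker is busy compressing leaves the flag set but the thread parked on its next wait, unless the flag is set under the worker's mutex and re-tested around every wait. A minimal pthreads sketch of the safe shape (toy names and structure, not the patch's code):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Worker {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool busy;
    bool quit;
} Worker;

static void *worker_fn(void *opaque)
{
    Worker *w = opaque;

    pthread_mutex_lock(&w->lock);
    while (!w->quit) {            /* quit tested with the lock held */
        if (!w->busy) {
            pthread_cond_wait(&w->cond, &w->lock);
            continue;             /* re-test quit after every wakeup */
        }
        /* ... compress one page here ... */
        w->busy = false;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static void worker_quit(Worker *w)
{
    pthread_mutex_lock(&w->lock); /* flag and signal under the lock */
    w->quit = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}
```

Because `worker_quit` holds the lock while setting the flag, the worker is guaranteed to be either before its lock acquisition (and will see `quit` immediately) or inside `cond_wait` (and will receive the signal); there is no window in which the signal can be lost.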
Re: [Qemu-devel] [v2][RFC][PATCH] virtio: uniform virtio device IDs
On Fri, 6 Feb 2015 13:41:26 +0800 Tiejun Chen wrote: > Actually we define these device IDs in virtio standard, so > we'd better put them into one common place to manage conveniently. > Here I also add VIRTIO_ID_RESERVE according to virtio spec. > > Signed-off-by: Tiejun Chen > --- > hw/9pfs/virtio-9p.h| 2 -- > include/hw/virtio/virtio-balloon.h | 3 --- > include/hw/virtio/virtio-blk.h | 3 --- > include/hw/virtio/virtio-rng.h | 3 --- > include/hw/virtio/virtio-scsi.h| 3 --- > include/hw/virtio/virtio-serial.h | 3 --- > include/hw/virtio/virtio.h | 16 > pc-bios/s390-ccw/virtio.h | 8 +--- > 8 files changed, 17 insertions(+), 24 deletions(-) > > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h > index f24997d..9ad6bb2 100644 > --- a/include/hw/virtio/virtio.h > +++ b/include/hw/virtio/virtio.h > @@ -23,6 +23,22 @@ > #include "hw/virtio/virtio-9p.h" > #endif > > +/* Refer to Linux's linux/virtio_ids.h */ Why not refer to the virtio spec instead? :) And maybe add in the ids that already have been reserved. > + > +enum virtio_dev_type { > +VIRTIO_ID_RESERVED = 0, /* invalid virtio device */ > +VIRTIO_ID_NET = 1, /* virtio net */ > +VIRTIO_ID_BLOCK = 2, /* virtio block */ > +VIRTIO_ID_CONSOLE = 3, /* virtio console */ > +VIRTIO_ID_RNG = 4, /* virtio rng */ > +VIRTIO_ID_BALLOON = 5, /* virtio balloon */ /* virtio balloon (legacy) */ > +VIRTIO_ID_RPMSG = 7, /* virtio remote processor messaging */ > +VIRTIO_ID_SCSI = 8, /* virtio scsi */ > +VIRTIO_ID_9P = 9, /* 9p virtio console */ > +VIRTIO_ID_RPROC_SERIAL = 11, /* virtio remoteproc serial link */ > +VIRTIO_ID_CAIF = 12, /* Virtio caif */ > +}; > + > /* from Linux's linux/virtio_config.h */ > > /* Status byte for guest to report progress, and synchronize features. 
*/ > diff --git a/pc-bios/s390-ccw/virtio.h b/pc-bios/s390-ccw/virtio.h > index c23466b..2eabcb4 100644 > --- a/pc-bios/s390-ccw/virtio.h > +++ b/pc-bios/s390-ccw/virtio.h > @@ -11,6 +11,7 @@ > #ifndef VIRTIO_H > #define VIRTIO_H > > +#include "hw/virtio/virtio.h" This won't work, the bios can't use the common headers. > #include "s390-ccw.h" > > /* Status byte for guest to report progress, and synchronize features. */ > @@ -23,13 +24,6 @@ > /* We've given up on this device. */ > #define VIRTIO_CONFIG_S_FAILED 0x80 > > -enum virtio_dev_type { > -VIRTIO_ID_NET = 1, > -VIRTIO_ID_BLOCK = 2, > -VIRTIO_ID_CONSOLE = 3, > -VIRTIO_ID_BALLOON = 5, > -}; Even though this one is incomplete; but we don't need anything but the block id anyway. > - > struct virtio_dev_header { > enum virtio_dev_type type : 8; > u8 num_vq;
Re: [Qemu-devel] [v4 09/13] migration: Make compression co-work with xbzrle
* Liang Li (liang.z...@intel.com) wrote: > Now, multiple thread compression can co-work with xbzrle. when > xbzrle is on, multiple thread compression will only work at the > first round of RAM data sync. > > Signed-off-by: Liang Li > Signed-off-by: Yang Zhang > --- > arch_init.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > diff --git a/arch_init.c b/arch_init.c > index b8bdb16..8ef0315 100644 > --- a/arch_init.c > +++ b/arch_init.c > @@ -364,6 +364,7 @@ static QemuCond *comp_done_cond; > /* The empty QEMUFileOps will be used by file in CompressParam */ > static const QEMUFileOps empty_ops = { }; > static bool quit_thread; > +static bool compression_switch_on; Yes, OK; a bit of an odd name, but I can see what it's trying to do. Reviewed-by: Dr. David Alan Gilbert > static int one_byte_count; > static DecompressParam *decomp_param; > static QemuThread *decompress_threads; > @@ -440,6 +441,7 @@ void migrate_compress_threads_create(MigrationState *s) > return; > } > quit_thread = false; > +compression_switch_on = true; > thread_count = migrate_compress_threads(); > s->compress_thread = g_new0(QemuThread, thread_count); > comp_param = g_new0(CompressParam, thread_count); > @@ -957,9 +959,16 @@ static int ram_find_and_save_block(QEMUFile *f, bool > last_stage) > block = QTAILQ_FIRST(&ram_list.blocks); > complete_round = true; > ram_bulk_stage = false; > +if (migrate_use_xbzrle()) { > +/* If xbzrle is on, stop using the data compression at > this > + * point. In theory, xbzrle can do better than > compression. > + */ > +flush_compressed_data(f); > +compression_switch_on = false; > +} > } > } else { > -if (migrate_use_compression()) { > +if (compression_switch_on && migrate_use_compression()) { > bytes_sent = ram_save_compressed_page(f, block, offset, >last_stage); > } else { > -- > 1.9.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 12:53 schrieb Kevin Wolf: > Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: >> nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t >> as the length in bytes of the data to discard due to the following >> definition: >> >> struct nbd_request { >> uint32_t magic; >> uint32_t type; >> uint64_t handle; >> uint64_t from; >> uint32_t len; <-- the length of data to be discarded, in bytes >> } QEMU_PACKED; >> >> Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to >> avoid overflow. >> >> NBD read/write code uses the same structure for transfers. Fix >> max_transfer_length accordingly. >> >> Signed-off-by: Denis V. Lunev >> CC: Peter Lieven >> CC: Kevin Wolf > Thanks, I have applied both Peter's and your patch. Can you guys please > check whether the current state of my block branch is correct or whether > I forgot to include or remove some patch? Looks good from my point of view. Just to be sure: does it have to be if (size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) or if (size > (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)) ? If the latter is right, can you please fix that line in my patch? I am afk now. Thanks, Peter
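To answer the precedence question directly: in C, `<<` binds tighter than the relational operators, so `size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS` already parses as the parenthesized form; the two spellings are equivalent and the extra parentheses are purely for readability. A quick sketch (the macro is simplified here; the real one also involves `SIZE_MAX`):

```c
#include <limits.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9
/* Simplified for this sketch */
#define BDRV_REQUEST_MAX_SECTORS (INT_MAX >> BDRV_SECTOR_BITS)

/* '<<' has higher precedence than '>', so these parse identically */
static int check_unparenthesized(uint64_t size)
{
    return size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;
}

static int check_parenthesized(uint64_t size)
{
    return size > (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS);
}
```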
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
On 06/02/15 15:07, Kevin Wolf wrote: Am 06.02.2015 um 12:59 hat Denis V. Lunev geschrieben: On 06/02/15 14:53, Kevin Wolf wrote: Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev CC: Peter Lieven CC: Kevin Wolf Thanks, I have applied both Peter's and your patch. Can you guys please check whether the current state of my block branch is correct or whether I forgot to include or remove some patch? can you give me tree URL? Sure: git: git://repo.or.cz/qemu/kevin.git block Web: http://repo.or.cz/w/qemu/kevin.git/shortlog/refs/heads/block By the way, I don't think this NBD patch is strictly necessary as you'll have a hard time finding a platform where INT_MAX > UINT32_MAX, but I think it's good documentation at least and a safeguard if we ever decide to lift the general block layer restrictions. Kevin nope, it is absolutely mandatory stdint.h: /* Limit of `size_t' type. */ # if __WORDSIZE == 64 # define SIZE_MAX (18446744073709551615UL) # else # define SIZE_MAX (4294967295U) # endif But Peter defined it like this: #define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \ INT_MAX >> BDRV_SECTOR_BITS) And having integers with more the 32 bits is at least unusual. I don't know of any platform that has them. Anyway, as I said, your patch is good documentation, so I'm happy to apply it nevertheless. Kevin I have misinterpreted this. Actually I think then the limit should be MAX() rather then MIN() as the stack is ready to size_t transfers. 
In the other case there is no need at all to use this construction: INT_MAX will always be less than SIZE_MAX. I do not know of any platform where this is violated. Den
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
On 06/02/15 15:22, Peter Lieven wrote: Am 06.02.2015 um 13:17 schrieb Denis V. Lunev: On 06/02/15 15:07, Kevin Wolf wrote: Am 06.02.2015 um 12:59 hat Denis V. Lunev geschrieben: On 06/02/15 14:53, Kevin Wolf wrote: Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t as the length in bytes of the data to discard due to the following definition: struct nbd_request { uint32_t magic; uint32_t type; uint64_t handle; uint64_t from; uint32_t len; <-- the length of data to be discarded, in bytes } QEMU_PACKED; Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to avoid overflow. NBD read/write code uses the same structure for transfers. Fix max_transfer_length accordingly. Signed-off-by: Denis V. Lunev CC: Peter Lieven CC: Kevin Wolf Thanks, I have applied both Peter's and your patch. Can you guys please check whether the current state of my block branch is correct or whether I forgot to include or remove some patch? can you give me tree URL? Sure: git: git://repo.or.cz/qemu/kevin.git block Web: http://repo.or.cz/w/qemu/kevin.git/shortlog/refs/heads/block By the way, I don't think this NBD patch is strictly necessary as you'll have a hard time finding a platform where INT_MAX > UINT32_MAX, but I think it's good documentation at least and a safeguard if we ever decide to lift the general block layer restrictions. Kevin nope, it is absolutely mandatory stdint.h: /* Limit of `size_t' type. */ # if __WORDSIZE == 64 # define SIZE_MAX (18446744073709551615UL) # else # define SIZE_MAX (4294967295U) # endif But Peter defined it like this: #define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \ INT_MAX >> BDRV_SECTOR_BITS) And having integers with more the 32 bits is at least unusual. I don't know of any platform that has them. Anyway, as I said, your patch is good documentation, so I'm happy to apply it nevertheless. Kevin I have misinterpreted this. 
Actually I think then the limit should be MAX() rather then MIN() as the stack is ready to size_t transfers. In the other case there is no need at all to use this construction. INT_MAX will be always less than SIZE_MAX. I do not know any platform where this is violated. That doesn't work. All internal routines have int (32-bit) as type for nb_sectors whereas size_t is often 64-bit. I also think that INT_MAX is always less than SIZE_MAX, but I would leave it in just to be absolutely sure. Its evaluated at compile time and will not hurt. Peter OK :) let it be. Staying on safe side is always good. Kevin, all my stuff we have agreed on is applied. Regards, Den
Re: [Qemu-devel] [v4 10/13] migration: Add the core code for decompression
* Liang Li (liang.z...@intel.com) wrote:
> Implement the core logic of multiple thread decompression,
> the decompression can work now.
>
> Signed-off-by: Liang Li
> Signed-off-by: Yang Zhang
> ---
>  arch_init.c | 49 +++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
>
> diff --git a/arch_init.c b/arch_init.c
> index 8ef0315..549fdbb 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -814,6 +814,13 @@ static inline void start_compression(CompressParam *param)
>      qemu_mutex_unlock(&param->mutex);
>  }
>
> +static inline void start_decompression(DecompressParam *param)
> +{
> +    qemu_mutex_lock(&param->mutex);
> +    param->busy = true;
> +    qemu_cond_signal(&param->cond);
> +    qemu_mutex_unlock(&param->mutex);
> +}
>
>  static uint64_t bytes_transferred;
>
> @@ -1347,8 +1354,27 @@ void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
>
>  static void *do_data_decompress(void *opaque)
>  {
> +    DecompressParam *param = opaque;
> +    size_t pagesize;
> +
>      while (!quit_thread) {
> -        /* To be done */
> +        qemu_mutex_lock(&param->mutex);
> +        while (!param->busy) {
> +            qemu_cond_wait(&param->cond, &param->mutex);
> +            if (quit_thread) {
> +                break;
> +            }
> +            pagesize = TARGET_PAGE_SIZE;
> +            /* uncompress() will return failed in some case, especially
> +             * when the page is dirted when doing the compression, it's
> +             * not a problem because the dirty page will be retransferred
> +             * and uncompress() won't break the data in other pages.
> +             */
> +            uncompress((Bytef *)param->des, &pagesize,
> +                       (const Bytef *)param->compbuf, param->len);
> +            param->busy = false;
> +        }
> +        qemu_mutex_unlock(&param->mutex);
>      }

Again, similar question to the compress thread; what ensures that the
cond_signal for quit_thread happens while this loop is at the cond_wait?
Again mainly in error cases.

Dave

>
>      return NULL;
> @@ -1379,6 +1405,9 @@ void migrate_decompress_threads_join(void)
>      quit_thread = true;
>      thread_count = migrate_decompress_threads();
>      for (i = 0; i < thread_count; i++) {
> +        qemu_cond_signal(&decomp_param[i].cond);
> +    }
> +    for (i = 0; i < thread_count; i++) {
>          qemu_thread_join(decompress_threads + i);
>          qemu_mutex_destroy(&decomp_param[i].mutex);
>          qemu_cond_destroy(&decomp_param[i].cond);
> @@ -1395,7 +1424,23 @@ void migrate_decompress_threads_join(void)
>  static void decompress_data_with_multi_threads(uint8_t *compbuf,
>                                                 void *host, int len)
>  {
> -    /* To be done */
> +    int idx, thread_count;
> +
> +    thread_count = migrate_decompress_threads();
> +    while (true) {
> +        for (idx = 0; idx < thread_count; idx++) {
> +            if (!decomp_param[idx].busy) {
> +                memcpy(decomp_param[idx].compbuf, compbuf, len);
> +                decomp_param[idx].des = host;
> +                decomp_param[idx].len = len;
> +                start_decompression(&decomp_param[idx]);
> +                break;
> +            }
> +        }
> +        if (idx < thread_count) {
> +            break;
> +        }
> +    }
>  }
>
>  static int ram_load(QEMUFile *f, void *opaque, int version_id)
> --
> 1.9.1

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 13:17 schrieb Denis V. Lunev: > On 06/02/15 15:07, Kevin Wolf wrote: >> Am 06.02.2015 um 12:59 hat Denis V. Lunev geschrieben: >>> On 06/02/15 14:53, Kevin Wolf wrote: Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: > nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t > as the length in bytes of the data to discard due to the following > definition: > > struct nbd_request { > uint32_t magic; > uint32_t type; > uint64_t handle; > uint64_t from; > uint32_t len; <-- the length of data to be discarded, in bytes > } QEMU_PACKED; > > Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to > avoid overflow. > > NBD read/write code uses the same structure for transfers. Fix > max_transfer_length accordingly. > > Signed-off-by: Denis V. Lunev > CC: Peter Lieven > CC: Kevin Wolf Thanks, I have applied both Peter's and your patch. Can you guys please check whether the current state of my block branch is correct or whether I forgot to include or remove some patch? >>> can you give me tree URL? >> Sure: >> >> git: git://repo.or.cz/qemu/kevin.git block >> Web: http://repo.or.cz/w/qemu/kevin.git/shortlog/refs/heads/block >> By the way, I don't think this NBD patch is strictly necessary as you'll have a hard time finding a platform where INT_MAX > UINT32_MAX, but I think it's good documentation at least and a safeguard if we ever decide to lift the general block layer restrictions. Kevin >>> nope, it is absolutely mandatory >>> >>> stdint.h: >>> >>> /* Limit of `size_t' type. */ >>> # if __WORDSIZE == 64 >>> # define SIZE_MAX (18446744073709551615UL) >>> # else >>> # define SIZE_MAX (4294967295U) >>> # endif >> But Peter defined it like this: >> >> #define BDRV_REQUEST_MAX_SECTORS MIN(SIZE_MAX >> BDRV_SECTOR_BITS, \ >> INT_MAX >> BDRV_SECTOR_BITS) >> >> And having integers with more the 32 bits is at least unusual. I don't >> know of any platform that has them. 
>> >> Anyway, as I said, your patch is good documentation, so I'm happy to >> apply it nevertheless. >> >> Kevin > I have misinterpreted this. > > Actually I think then the limit should be MAX() rather then MIN() > as the stack is ready to size_t transfers. In the other case > there is no need at all to use this construction. INT_MAX will be > always less than SIZE_MAX. I do not know any platform > where this is violated. That doesn't work. All internal routines have int (32-bit) as type for nb_sectors whereas size_t is often 64-bit. I also think that INT_MAX is always less than SIZE_MAX, but I would leave it in just to be absolutely sure. Its evaluated at compile time and will not hurt. Peter
Re: [Qemu-devel] [PATCH 1/1] nbd: fix max_discard/max_transfer_length
Am 06.02.2015 um 13:16 hat Peter Lieven geschrieben: > Am 06.02.2015 um 12:53 schrieb Kevin Wolf: > > Am 06.02.2015 um 12:24 hat Denis V. Lunev geschrieben: > >> nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t > >> as the length in bytes of the data to discard due to the following > >> definition: > >> > >> struct nbd_request { > >> uint32_t magic; > >> uint32_t type; > >> uint64_t handle; > >> uint64_t from; > >> uint32_t len; <-- the length of data to be discarded, in bytes > >> } QEMU_PACKED; > >> > >> Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to > >> avoid overflow. > >> > >> NBD read/write code uses the same structure for transfers. Fix > >> max_transfer_length accordingly. > >> > >> Signed-off-by: Denis V. Lunev > >> CC: Peter Lieven > >> CC: Kevin Wolf > > Thanks, I have applied both Peter's and your patch. Can you guys please > > check whether the current state of my block branch is correct or whether > > I forgot to include or remove some patch? > > Looks good from my point of view. > > Just to be sure has it to be > > if (size > BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS) > > or > > if (size > (BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS)) > > ? > > If the latter is right, can you please fix that line in my patch. I am afk > now. Both versions are correct. Kevin
Re: [Qemu-devel] [PATCH] blkdebug: fix "once" rule
On 2015-02-05 at 19:28, John Snow wrote:
> On 02/05/2015 06:43 PM, Max Reitz wrote:
>> On 2015-02-05 at 18:15, John Snow wrote:
>>> The blkdebug scripts are currently engineered so that when a debug
>>> event occurs, a prefilter browses a master list of parsed rules for a
>>> certain event and adds them to an "active list" of rules to be used
>>> for the forthcoming action, provided the events and state numbers
>>> match.
>>>
>>> Then, once the request is received, the last active rule is used to
>>> inject an error if certain parameters match. This active list is
>>> cleared every time the prefilter injects a new rule for the first
>>> time during a debug event.
>>>
>>> The "once" rule currently causes the error injection, if it is
>>> triggered, to only clear the active list. This is insufficient for
>>> preventing future injections of the same rule. This patch /deletes/
>>> the rule from the list that the prefilter browses, so it is gone for
>>> good.
>>>
>>> Lastly, this impacts iotests 026. Several ENOSPC tests that used
>>> "once" can be seen to have output that shows multiple failure
>>> messages. After this patch, the error messages tend to be smaller and
>>> less severe, but the injection can still be seen to be working. Patch
>>> the expected output to expect the smaller error messages.
>>>
>>> Signed-off-by: John Snow
>>> ---
>>>  block/blkdebug.c           |  9 +
>>>  tests/qemu-iotests/026.out | 24
>>>  2 files changed, 9 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/block/blkdebug.c b/block/blkdebug.c
>>> index 9ce35cd..d30c44c 100644
>>> --- a/block/blkdebug.c
>>> +++ b/block/blkdebug.c
>>> @@ -473,14 +473,15 @@ static BlockAIOCB *inject_error(BlockDriverState *bs,
>>>      struct BlkdebugAIOCB *acb;
>>>      QEMUBH *bh;
>>>
>>> -    if (rule->options.inject.once) {
>>> -        QSIMPLEQ_INIT(&s->active_rules);
>>> -    }
>>> -
>>>      if (rule->options.inject.immediately) {
>>>          return NULL;
>>>      }
>>>
>>> +    if (rule->options.inject.once) {
>>> +        QSIMPLEQ_INIT(&s->active_rules);
>> This will still delete all other currently active rules, which I don't
>> think is what is intended.
>>
>> Case in point why this is not really right:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off
>> $ ./qemu-io -c 'aio_write 0 64k' "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio'},{'event':'write_aio','state':42,'once':true}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>> aio_write failed: Input/output error
>> Failed to flush the L2 table cache: Input/output error
>> Failed to flush the refcount block cache: Input/output error
>>
>> So, here we get three errors; the non-once rule stayed active. Now
>> let's just turn the rules around:
>>
>> $ ./qemu-img create -f qcow2 test.qcow2 64M
>> Formatting 'test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off
>> $ ./qemu-io -c 'aio_write 0 64k' "json:{'driver':'qcow2','file':{'driver':'blkdebug','image':{'driver':'file','filename':'test.qcow2'},'inject-error':[{'event':'write_aio','state':42,'once':true},{'event':'write_aio'}],'set-state':[{'event':'cluster_alloc','new_state':42}]}}"
>> aio_write failed: Input/output error
>>
>> Hmmm... So what happens is that in the first case, the non-"once" rule
>> had precedence, so the list of active rules was not changed. In the
>> second case, however, the "once" rule had precedence, therefore all
>> active rules (including the non-"once" rule) were deleted.
>>
>> I think it's fine that the "once" rule does not get deleted in the
>> first case, but whether the non-"once" rule stays active should not be
>> influenced by the "once" rule.
>>
>> Note that this issue was not introduced by this patch, but I think
>> this patch should fix it. :-)
>
> I was really trying to avoid touching how weird the blkdebug rules look
> and just make my own test happier, but you've forced my hand, villain!
>
>> Max
>>
>>> +        remove_rule(rule);
>>> +    }
>>> +
>
> OK, this gets sort of weird, because we need to think about the
> lifetimes of certain errors, and what we actually want to have happen.
>
> Here's what your first script does:
>
> blkdebug_debug_event -- [reftable_load]
> blkdebug_debug_event -- [l2_load]
> blkdebug_debug_event -- [cluster_alloc]
> Setting new_state: 42
> blkdebug_debug_event -- [refblock_alloc]
> blkdebug_debug_event -- [refblock_load]
> blkdebug_debug_event -- [write_aio]
> Clearing active_rules ...
> Adding a rule for [write_aio]
> Adding a rule for [write_aio]
> blkdebug_debug_event -- [pwritev]
> An error is being injected for aio_writev.
> blkdebug_debug_event -- [pwritev_done]
> aio_write failed: Input/output error
> blkdebug_debug_event -- [flush_to_os]
> An error is being injected for aio_flush.
> An error is being injected for aio_flush.
> blkdebug_debug_event -- [refblock_update_part]
> blkdebug_debug_event -- [pwritev]
> An error is being injected for aio_writev.
> blkdebug_debug_event -- [pwritev_done]
> Failed to flush the L2 table cache: Input/output error
> Failed to flush the refcount block cach
Re: [Qemu-devel] value of VIRTQUEUE_MAX_SIZE
On 02/06/2015 03:45 AM, Peter Lieven wrote: >>> The idea was that IOV_MAX is the limit in case of at least virtio-blk. The >>> host >>> will not support more than IOV_MAX iovecs passed to a block request. >> Is there an issue in practice? > > If there is no platform where IOV_MAX is less than 1024 then not. POSIX allows for a system with IOV_MAX as small as 16. But these days, systems we care about probably have a much larger value; alas, I don't know off-hand what actual systems use to say if 1024 is a common minimum. -- Eric Blake   eblake redhat com   +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [RFC v2 2/8] vfio-pci: add aer capability support
On Fri, 2015-02-06 at 15:03 +0800, Chen Fan wrote: > On 02/03/2015 04:15 AM, Alex Williamson wrote: > > On Wed, 2015-01-28 at 16:37 +0800, Chen Fan wrote: > >> if we detect the aer capability in vfio device, then > >> we should initialize the vfio device aer rigister bits. > >> so guest OS can set this bits as needed. > > s/rigister/register/ > > > >> Signed-off-by: Chen Fan > >> --- > >> hw/vfio/pci.c | 72 > >> +++ > >> 1 file changed, 72 insertions(+) > >> > >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > >> index 014a92c..2072261 100644 > >> --- a/hw/vfio/pci.c > >> +++ b/hw/vfio/pci.c > >> @@ -2670,6 +2670,73 @@ static int vfio_add_capabilities(VFIOPCIDevice > >> *vdev) > >> return vfio_add_std_cap(vdev, pdev->config[PCI_CAPABILITY_LIST]); > >> } > >> > >> +static void vfio_pci_aer_init(VFIOPCIDevice *vdev) > >> +{ > >> +PCIDevice *pdev = &vdev->pdev; > >> +PCIExpressDevice *exp = &pdev->exp; > >> +uint16_t offset = exp->aer_cap; > >> + > >> +if (!offset) { > >> +return; > >> +} > >> + > > All of these need to be documented with comments. 
> > > >> +memset(pdev->wmask + offset, 0, PCI_ERR_SIZEOF); > >> +memset(pdev->w1cmask + offset, 0, PCI_ERR_SIZEOF); > >> +memset(pdev->cmask + offset, 0xFF, PCI_ERR_SIZEOF); > >> + > >> +pci_long_test_and_set_mask(pdev->wmask + exp->exp_cap + > >> PCI_EXP_DEVCTL, > >> + PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE > >> | > >> + PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE); > >> +pci_long_test_and_set_mask(pdev->w1cmask + exp->exp_cap + > >> PCI_EXP_DEVSTA, > >> + PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | > >> + PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD); > >> + > >> +pci_set_long(pdev->w1cmask + offset + PCI_ERR_UNCOR_STATUS, > >> + PCI_ERR_UNC_SUPPORTED); > >> +pci_set_long(pdev->wmask + offset + PCI_ERR_UNCOR_SEVER, > >> + PCI_ERR_UNC_SUPPORTED); > >> +pci_long_test_and_set_mask(pdev->w1cmask + offset + > >> PCI_ERR_COR_STATUS, > >> + PCI_ERR_COR_STATUS); > >> +pci_set_long(pdev->wmask + offset + PCI_ERR_COR_MASK, > >> + PCI_ERR_COR_SUPPORTED); > >> + > >> +pci_set_long(pdev->wmask + offset + PCI_ERR_CAP, > >> + PCI_ERR_CAP_ECRC_GENE | PCI_ERR_CAP_ECRC_CHKE | > >> + PCI_ERR_CAP_MHRE); > >> +} > >> + > >> +static int vfio_add_ext_capabilities(VFIOPCIDevice *vdev) > >> +{ > >> +PCIDevice *pdev = &vdev->pdev; > >> +PCIExpressDevice *exp; > >> +uint32_t header; > >> +uint16_t next = PCI_CONFIG_SPACE_SIZE; > >> + > >> +if (pci_config_size(pdev) <= PCI_CONFIG_SPACE_SIZE) { > >> +return 0; > >> +} > >> + > >> +header = pci_get_long(pdev->config + next); > >> +while (header) { > >> +switch (PCI_EXT_CAP_ID(header)) { > >> +case PCI_EXT_CAP_ID_ERR: > >> + exp = &pdev->exp; > >> + exp->aer_cap = next; > > Shouldn't we call pcie_aer_init() here? > I am afraid pcie_aer_init() maybe impact the corresponding values > in pdev->config. Then we should do cleanup after calling pcie_aer_init() or modify pcie_aer_init() to do what we need. Doing raw aer manipulation outside of core support just causes maintenance issues down the road. 
Thanks, Alex > >> + > >> + vfio_pci_aer_init(vdev); > >> + break; > >> +}; > >> + > >> +next = PCI_EXT_CAP_NEXT(header); > >> +if (!next) { > >> +return 0; > >> +} > >> +header = pci_get_long(pdev->config + next); > > I'd like to see this look more like vfio_add_std_cap(), registering > > every capability with the QEMU PCIe-core and setting up emulation to > > allow QEMU to skip capabilities that it doesn't want to expose. > > > >> +} > >> + > >> +return 0; > >> +} > >> + > >> static void vfio_pci_pre_reset(VFIOPCIDevice *vdev) > >> { > >> PCIDevice *pdev = &vdev->pdev; > >> @@ -3296,6 +3363,11 @@ static int vfio_initfn(PCIDevice *pdev) > >> goto out_teardown; > >> } > >> > >> +ret = vfio_add_ext_capabilities(vdev); > >> +if (ret) { > >> +goto out_teardown; > >> +} > >> + > > Why not extend vfio_add_capabilities()? It specifically calls > > vfio_add_std_cap() in order that there could be a vfio_add_ext_cap(). > it is a good idea. > > Thanks, > Chen > > > > > >> /* QEMU emulates all of MSI & MSIX */ > >> if (pdev->cap_present & QEMU_PCI_CAP_MSIX) { > >> memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff, > > > > > > . > > >
[Qemu-devel] [PULL 1/1] trace: Print PID and time in stderr traces
From: "Dr. David Alan Gilbert" When debugging migration it's useful to know the PID of each trace message so you can figure out if it came from the source or the destination. Printing the time makes it easy to do latency measurements or timings between trace points. Signed-off-by: Dr. David Alan Gilbert Message-id: 1421746875-9962-1-git-send-email-dgilb...@redhat.com Signed-off-by: Stefan Hajnoczi --- scripts/tracetool/backend/stderr.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/scripts/tracetool/backend/stderr.py b/scripts/tracetool/backend/stderr.py index 2a1e906..ca58054 100644 --- a/scripts/tracetool/backend/stderr.py +++ b/scripts/tracetool/backend/stderr.py @@ -21,6 +21,9 @@ PUBLIC = True def generate_h_begin(events): out('#include ', +'#include ', +'#include ', +'#include ', '#include "trace/control.h"', '') @@ -31,7 +34,12 @@ def generate_h(event): argnames = ", " + argnames out('if (trace_event_get_state(%(event_id)s)) {', -'fprintf(stderr, "%(name)s " %(fmt)s "\\n" %(argnames)s);', +'struct timeval _now;', +'gettimeofday(&_now, NULL);', +'fprintf(stderr, "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",', +'getpid(),', +'(size_t)_now.tv_sec, (size_t)_now.tv_usec', +'%(argnames)s);', '}', event_id="TRACE_" + event.name.upper(), name=event.name, -- 2.1.0
[Qemu-devel] [PULL 0/1] Tracing patches
The following changes since commit 16017c48547960539fcadb1f91d252124f442482: softfloat: Clarify license status (2015-01-29 16:45:45 +) are available in the git repository at: git://github.com/stefanha/qemu.git tags/tracing-pull-request for you to fetch changes up to dd9fe29c80b8a35f12d98928a97be3aded80cf69: trace: Print PID and time in stderr traces (2015-02-06 10:27:22 +) Dr. David Alan Gilbert (1): trace: Print PID and time in stderr traces scripts/tracetool/backend/stderr.py | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) -- 2.1.0
Re: [Qemu-devel] [PATCH 2/2] e1000: unconditionally enable bus mastering
On Wed, Jan 07, 2015 at 04:08:29PM +, Stefan Hajnoczi wrote: > On Thu, Dec 18, 2014 at 05:22:19PM +0800, Amos Kong wrote: > > After enabled network debug of e1000 in Win2012-64r2 guest, > > Bus mastering of e1000 can't be enabled by e1000 driver. It > > caused guest can't get IP address. > > > > # bcdedit /debug on > > # bcdedit /dbgsettings net hostip:192.168.122.100 port:5 > > (We can use non-existed IP here, it's just used to pass the > >setup, not really use it) > > > > If we disable debug function, e1000 driver can enable bus > > mastering bit successfully, guest network is fine. > > > > This patch changed e1000 backend to enalbe bus mastering > > unconditionally as a workaround. > > > > Signed-off-by: Amos Kong > > --- > > hw/net/e1000.c | 11 +-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/hw/net/e1000.c b/hw/net/e1000.c > > index ec9224b..82829ae 100644 > > --- a/hw/net/e1000.c > > +++ b/hw/net/e1000.c > > @@ -1544,8 +1544,15 @@ static void e1000_write_config(PCIDevice *pci_dev, > > uint32_t address, > > > > pci_default_write_config(pci_dev, address, val, len); > > > > -if (range_covers_byte(address, len, PCI_COMMAND) && > > -(pci_dev->config[PCI_COMMAND] & PCI_COMMAND_MASTER)) { > > +if (range_covers_byte(address, len, PCI_COMMAND)) { > > +/* > > + * Some guest (eg: Win2012-64r2) doesn't enable bus mastering > > + * correctly, it caused guest network down. So we unconditionally > > + * enable PCI bus mastering and BM memory region for e1000 as > > + * a workaround. > > + */ > > +pci_dev->config[PCI_COMMAND] |= PCI_COMMAND_MASTER; > > +memory_region_set_enabled(&pci_dev->bus_master_enable_region, > > true); > > qemu_flush_queued_packets(qemu_get_queue(s->nic)); > > start_xmit(s); > > } > > This is weird. > > Are you sure there's not some guest behavior missing like the NIC option > ROM leaving bus mastering enabled after the BIOS/EFI has booted, causing > Windows debug to work on physical machines? 
> > Before we merge a hack like this we should understand the problem 100%. Any new insights into what is going on here? Stefan
Re: [Qemu-devel] [PATCH] block: Give always priority to unused entries in the qcow2 L2 cache
On Fri, Feb 06, 2015 at 12:18:07PM +0100, Kevin Wolf wrote: > > I wanted to measure the actual impact of this, so I modified the > > current code to implement a quick and dirty LRU algorithm and made > > some numbers. > > Cool, thanks. I think switching to an LRU algorithm is an option > that might make sense for the general case, so this is something to > consider. I'm anyway sharing the implementation in case you want to give it a look. > I was going to suggest a test run for a different scenario, like > with many interleaved sequential reads, but thinking a bit more > about how to do it so that the eviction algorithm could make a > significant change, it occurred to me that it might not be all that > easy (and therefore possibly not practically relevant either). Yes, that was exactly my point. Thanks a lot for your comments and quick review, Berto diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c index b115549..0dd0676 100644 --- a/block/qcow2-cache.c +++ b/block/qcow2-cache.c @@ -28,11 +28,11 @@ #include "trace.h" typedef struct Qcow2CachedTable { -void* table; -int64_t offset; -booldirty; -int cache_hits; -int ref; +void*table; +int64_t offset; +bool dirty; +unsigned lru_count; +int ref; } Qcow2CachedTable; struct Qcow2Cache { @@ -40,6 +40,7 @@ struct Qcow2Cache { struct Qcow2Cache* depends; int size; booldepends_on_flush; +unsignedmax_lru_count; }; Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables) @@ -228,16 +229,18 @@ int qcow2_cache_empty(BlockDriverState *bs, Qcow2Cache *c) for (i = 0; i < c->size; i++) { assert(c->entries[i].ref == 0); c->entries[i].offset = 0; -c->entries[i].cache_hits = 0; +c->entries[i].lru_count = 0; } +c->max_lru_count = 0; + return 0; } static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c) { int i; -int min_count = INT_MAX; +int min_count = UINT_MAX; int min_index = -1; @@ -246,15 +249,9 @@ static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c) continue; } -if (c->entries[i].cache_hits < min_count) 
{ +if (c->entries[i].lru_count < min_count) { min_index = i; -min_count = c->entries[i].cache_hits; -} - -/* Give newer hits priority */ -/* TODO Check how to optimize the replacement strategy */ -if (c->entries[i].cache_hits > 1) { -c->entries[i].cache_hits /= 2; +min_count = c->entries[i].lru_count; } } @@ -310,17 +307,26 @@ static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c, } } -/* Give the table some hits for the start so that it won't be replaced - * immediately. The number 32 is completely arbitrary. */ -c->entries[i].cache_hits = 32; +c->entries[i].lru_count = ++c->max_lru_count; c->entries[i].offset = offset; /* And return the right table */ found: -c->entries[i].cache_hits++; +if (c->entries[i].lru_count < c->max_lru_count) { +c->entries[i].lru_count = ++c->max_lru_count; +} c->entries[i].ref++; *table = c->entries[i].table; +/* Reset LRU counters before they overflow */ +if (c->max_lru_count == UINT_MAX) { +unsigned f; +for (f = 0; f < c->size; f++) { +c->entries[f].lru_count = 0; +} +c->max_lru_count = 0; +} + trace_qcow2_cache_get_done(qemu_coroutine_self(), c == s->l2_table_cache, i);
Re: [Qemu-devel] [PATCH] net: synchronize net_host_device_remove with host_net_remove_completion
On Mon, Jan 19, 2015 at 12:27:11PM +0100, Paolo Bonzini wrote: > On 02/01/2015 17:20, Paolo Bonzini wrote: > >> > > >> > The assert can be dropped completely since the code already has an > >> > equivalent assert: > >> > > >> > queues = qemu_find_net_clients_except(nc->name, ncs, > >> > NET_CLIENT_OPTIONS_KIND_NIC, > >> > MAX_QUEUE_NUM); > >> > assert(queues != 0); <-- fail if type == NET_CLIENT_OPTIONS_KIND_NIC > > I left it on purpose for documentation, but I'll send v2 next week that > > removes it. > > Actually it's not the same. If you have "-netdev user,id=e1000 -device > e1000,netdev=e1000" you will be able to call qemu_del_net_client on the > NIC, and it will _not_ fail if the assertion is removed. I don't follow. If you call qemu_del_net_client(e1000_nic) then qemu_find_net_clients_except(nc->name, ncs, NET_CLIENT_OPTIONS_KIND_NIC, MAX_QUEUE_NUM) returns 0. This causes the assert(queues != 0) to fail. Can you explain the scenario? Stefan
[Qemu-devel] [PATCH] bitops.h: sextract64() return type should be int64_t, not uint64_t
The documentation for sextract64() claims that the return type is an int64_t, but the code itself disagrees. Fix the return type to conform to the documentation and to bring it into line with sextract32(), which returns int32_t. Signed-off-by: Peter Maydell --- I quickly eyeballed all the callers and I don't think any of them were relying on the unsignedness of the return type. --- include/qemu/bitops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index 181bd46..90ca8df 100644 --- a/include/qemu/bitops.h +++ b/include/qemu/bitops.h @@ -354,7 +354,7 @@ static inline int32_t sextract32(uint32_t value, int start, int length) * Returns: the sign extended value of the bit field extracted from the * input value. */ -static inline uint64_t sextract64(uint64_t value, int start, int length) +static inline int64_t sextract64(uint64_t value, int start, int length) { assert(start >= 0 && length > 0 && length <= 64 - start); /* Note that this implementation relies on right shift of signed -- 1.9.1
Re: [Qemu-devel] [PATCH] rtl8139: simplify timer logic
On Tue, Jan 20, 2015 at 03:44:59PM +0100, Paolo Bonzini wrote: > Pavel Dovgalyuk reports that TimerExpire and the timer are not restored > correctly on the receiving end of migration. > > It is not clear to me whether this is really the case, but we can take > the occasion to get rid of the complicated code that computes PCSTimeout > on the fly upon changes to IntrStatus/IntrMask. Just always keep a > timer running, it will fire every ~130 seconds at most if the interrupt > is masked with TimerInt != 0. > > This makes rtl8139_set_next_tctr_time idempotent (when the virtual clock > is stopped between two calls, as is the case during migration). > > Tested with Frediano's qtest. > > Signed-off-by: Paolo Bonzini > --- > hw/net/rtl8139.c | 77 > > 1 file changed, 27 insertions(+), 50 deletions(-) Thanks, applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan
Re: [Qemu-devel] [PATCH 3/4] net: del hub port when peer is deleted
On Fri, Feb 06, 2015 at 09:37:54AM +0800, Jason Wang wrote: > On Thu, Feb 5, 2015 at 10:25 PM, Stefan Hajnoczi wrote: > >On Mon, Feb 02, 2015 at 03:06:37PM +0800, Jason Wang wrote: > >> We should del hub port when peer is deleted since it will not be reused > >> and will only be freed during exit. > >> Signed-off-by: Jason Wang > >> --- > >> net/net.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> diff --git a/net/net.c b/net/net.c > >> index 7acc162..74e651e 100644 > >> --- a/net/net.c > >> +++ b/net/net.c > >> @@ -996,6 +996,8 @@ void net_host_device_remove(Monitor *mon, const > >>QDict *qdict) > >> error_report("invalid host network device '%s'", device); > >> return; > >> } > >> + > >> +qemu_del_net_client(nc->peer); > >> qemu_del_net_client(nc); > >> } > > > >If qmp_netdev_del() is used the hub port will stay alive. > > This is true if it has a peer. And the port will be freed during the > deletion of its peer. If no peer, it will be deleted soon. This is > consistent with the behaviors of other type of netdevs. > > > >Should the peer deletion happen in qemu_del_net_client(), similar to the > >existing NIC peer check? > > > > /* If there is a peer NIC, delete and cleanup client, but do not free. > >*/ > > if (nc->peer && nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) { > > > >This way the hub port is consistently deleted when its peer is deleted. > > Not sure, but if management always do netdev_del after device_del, it will > get an error. Okay, I will merge this patch as-is. Stefan
Re: [Qemu-devel] [PATCH 1/4] monitor: print hub port name during info network
On Mon, Feb 02, 2015 at 03:06:35PM +0800, Jason Wang wrote: > Signed-off-by: Jason Wang > --- > net/hub.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) Thanks, applied to my net tree: https://github.com/stefanha/qemu/commits/net Stefan
Re: [Qemu-devel] [PATCH v2] qcow2: Rewrite qcow2_alloc_bytes()
Am 05.02.2015 um 16:58 hat Max Reitz geschrieben: > qcow2_alloc_bytes() is a function with insufficient error handling and > an unnecessary goto. This patch rewrites it. > > Signed-off-by: Max Reitz > --- > v2: > - s/free_cluster_index/free_byte_index/ [Eric] > - added an assertion at the start of the function that > s->free_byte_offset is either 0 or points to the tail of a cluster > (but never to the start) > - use ROUND_UP() instead of start_of_cluster() + cluster_size [Eric] > - added an assertion that s->free_byte_offset is set before using it > [Eric] > --- > block/qcow2-refcount.c | 77 > +- > 1 file changed, 45 insertions(+), 32 deletions(-) > > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c > index 9afdb40..eede60d 100644 > --- a/block/qcow2-refcount.c > +++ b/block/qcow2-refcount.c > @@ -759,46 +759,51 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, > uint64_t offset, > int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size) > { > BDRVQcowState *s = bs->opaque; > -int64_t offset, cluster_offset; > -int free_in_cluster; > +int64_t offset, new_cluster = 0, cluster_end; > +size_t free_in_cluster; > > BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES); > assert(size > 0 && size <= s->cluster_size); > -if (s->free_byte_offset == 0) { > -offset = qcow2_alloc_clusters(bs, s->cluster_size); > -if (offset < 0) { > -return offset; > +assert(!s->free_byte_offset || offset_into_cluster(s, > s->free_byte_offset)); > + > +if (s->free_byte_offset) { > +int refcount = qcow2_get_refcount(bs, > +s->free_byte_offset >> s->cluster_bits); > +if (refcount < 0) { > +return refcount; > +} > + > +if (refcount == 0x) { > +s->free_byte_offset = 0; > } > -s->free_byte_offset = offset; > } > - redo: > + > free_in_cluster = s->cluster_size - > offset_into_cluster(s, s->free_byte_offset); > -if (size <= free_in_cluster) { > -/* enough space in current cluster */ > -offset = s->free_byte_offset; > -s->free_byte_offset += size; > -free_in_cluster -= size; > -if 
(free_in_cluster == 0) > -s->free_byte_offset = 0; > -if (offset_into_cluster(s, offset) != 0) > -qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1, > - QCOW2_DISCARD_NEVER); > -} else { > -offset = qcow2_alloc_clusters(bs, s->cluster_size); > -if (offset < 0) { > -return offset; > + > +if (!s->free_byte_offset || free_in_cluster < size) { > +new_cluster = qcow2_alloc_clusters(bs, s->cluster_size); The code could perhaps become a bit nicer if you used alloc_clusters_noref() here... > +if (new_cluster < 0) { > +return new_cluster; > +} > + > +cluster_end = ROUND_UP(s->free_byte_offset, s->cluster_size); > +if (!s->free_byte_offset || cluster_end != new_cluster) { > +s->free_byte_offset = new_cluster; > } > -cluster_offset = start_of_cluster(s, s->free_byte_offset); > -if ((cluster_offset + s->cluster_size) == offset) { > -/* we are lucky: contiguous data */ > -offset = s->free_byte_offset; > -qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1, > - QCOW2_DISCARD_NEVER); > -s->free_byte_offset += size; > -} else { > -s->free_byte_offset = offset; > -goto redo; > +} > + > +assert(s->free_byte_offset); > +if (offset_into_cluster(s, s->free_byte_offset)) { ...because this block could become unconditional then, ... > +int ret = qcow2_update_cluster_refcount(bs, > +s->free_byte_offset >> s->cluster_bits, 1, > +QCOW2_DISCARD_NEVER); ...here you could use update_refcount() with the actual byte count (which also avoids two unnecessary shifts)... > +if (ret < 0) { > +if (new_cluster > 0) { > +qcow2_free_clusters(bs, new_cluster, s->cluster_size, > +QCOW2_DISCARD_OTHER); > +} ...and this part wouldn't be needed because update_refcount() already tries to fail atomically. > +return ret; > } > } > > @@ -807,6 +812,14 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size) > * be flushed before the caller's L2 table updates. > */ It would also simplify the two lines of this comment that aren't in the patch context any more. 
;-) > qcow2_cache_set_dependency(bs, s->l2_table_cache, > s->refcount_block_cache); > + > +offset = s->free_byte_offset; > + > +s->free_byte_offset += size; > +if (!offset_into_cluster(s, s->free_byte_offset)) { > +
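The allocation strategy being reworked in the patch above can be summarised in a few lines. Below is a deliberately simplified, self-contained sketch of the idea: bytes are handed out from the tail of the current cluster, and a fresh cluster is started when the remaining space is too small. The flat `next_free_cluster` counter and `CLUSTER_SIZE` are assumptions of this sketch standing in for `qcow2_alloc_clusters()` and the image geometry; the real qcow2 code additionally has to manage refcounts and handle errors, which is what the review is about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SIZE 65536u

static uint64_t next_free_cluster = CLUSTER_SIZE; /* fake qcow2_alloc_clusters() */
static uint64_t free_byte_offset;                 /* 0 = no partially used cluster */

static uint64_t offset_into_cluster(uint64_t off)
{
    return off & (CLUSTER_SIZE - 1);
}

/* Hand out "size" bytes, reusing the tail of the current cluster when it
 * is large enough, otherwise starting a fresh cluster. */
int64_t alloc_bytes(size_t size)
{
    assert(size > 0 && size <= CLUSTER_SIZE);
    /* Invariant discussed in the review: free_byte_offset is either 0 or
     * points into (never at the start of) a cluster. */
    assert(free_byte_offset == 0 || offset_into_cluster(free_byte_offset));

    uint64_t free_in_cluster =
        CLUSTER_SIZE - offset_into_cluster(free_byte_offset);
    if (free_byte_offset == 0 || free_in_cluster < size) {
        free_byte_offset = next_free_cluster;     /* "allocate" a new cluster */
        next_free_cluster += CLUSTER_SIZE;
    }

    uint64_t offset = free_byte_offset;
    free_byte_offset += size;
    if (offset_into_cluster(free_byte_offset) == 0) {
        free_byte_offset = 0;                     /* cluster used up exactly */
    }
    return (int64_t)offset;
}
```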
Re: [Qemu-devel] [PATCH] block: Give always priority to unused entries in the qcow2 L2 cache
On 05.02.2015 at 13:55, Alberto Garcia wrote: > The current algorithm to replace entries from the L2 cache gives > priority to newer hits by dividing the hit count of all existing > entries by two every time there is a cache miss. > > However, if there are several cache misses the hit count of the > existing entries can easily go down to 0. This will result in those > entries being replaced even when there are others that have never been > used. > > This problem is more noticeable with larger disk images and cache > sizes, since the chances of having several misses before the cache is > full are higher. > > If we make sure that the hit count can never go down to 0 again, > unused entries will always have priority. > > Signed-off-by: Alberto Garcia Thanks, applied to the block branch. Kevin
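The policy change described in the commit message is small enough to sketch in isolation. In the model below, hit counters decay on every miss but never drop below 1 once an entry has been used, so a never-used entry (count 0) always loses the comparison and is evicted first. The array layout and sizes are illustrative, not the actual qcow2 cache structures.

```c
#include <assert.h>

/* Pick the eviction victim among n cache entries, decaying the hit
 * counters as the current algorithm does on a miss, but never letting a
 * used entry's counter reach 0 again. */
int pick_victim(unsigned hits[], int n)
{
    int victim = 0;
    for (int i = 0; i < n; i++) {
        if (hits[i] > 1) {
            hits[i] /= 2;           /* decay, but never down to 0 */
        }
        if (hits[i] < hits[victim]) {
            victim = i;
        }
    }
    return victim;                  /* entry with the lowest count */
}
```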
[Qemu-devel] [PULL 0/6] Net patches
The following changes since commit 16017c48547960539fcadb1f91d252124f442482: softfloat: Clarify license status (2015-01-29 16:45:45 +) are available in the git repository at: git://github.com/stefanha/qemu.git tags/net-pull-request for you to fetch changes up to 2c4681f512822b4aa35371683e164d4818f21dce: monitor: more accurate completion for host_net_remove() (2015-02-06 14:06:45 +) Jason Wang (4): monitor: print hub port name during info network net: remove the wrong comment in net_init_hubport() net: del hub port when peer is deleted monitor: more accurate completion for host_net_remove() Paolo Bonzini (1): rtl8139: simplify timer logic Stefan Hajnoczi (1): MAINTAINERS: add Jason Wang as net subsystem maintainer MAINTAINERS | 1 + hw/net/rtl8139.c | 77 monitor.c| 5 net/hub.c| 6 +++-- net/net.c| 2 ++ 5 files changed, 39 insertions(+), 52 deletions(-) -- 2.1.0
[Qemu-devel] [PULL 3/6] monitor: print hub port name during info network
From: Jason Wang Signed-off-by: Jason Wang Message-id: 1422860798-17495-1-git-send-email-jasow...@redhat.com Signed-off-by: Stefan Hajnoczi --- net/hub.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/hub.c b/net/hub.c index 7e0f2d6..ef09d5f 100644 --- a/net/hub.c +++ b/net/hub.c @@ -245,9 +245,12 @@ void net_hub_info(Monitor *mon) QLIST_FOREACH(hub, &hubs, next) { monitor_printf(mon, "hub %d\n", hub->id); QLIST_FOREACH(port, &hub->ports, next) { +monitor_printf(mon, " \\ %s", port->nc.name); if (port->nc.peer) { -monitor_printf(mon, " \\ "); +monitor_printf(mon, ": "); print_net_client(mon, port->nc.peer); +} else { +monitor_printf(mon, "\n"); } } } -- 2.1.0
[Qemu-devel] [PULL 1/6] MAINTAINERS: add Jason Wang as net subsystem maintainer
Jason Wang will be co-maintaining the QEMU net subsystem with me. He has contributed improvements and reviewed patches over the past years as part of working on virtio-net and virtualized networking. Jason has already been backing me up with patch reviews. For the time being I will continue to submit pull requests. Signed-off-by: Stefan Hajnoczi Acked-by: Michael S. Tsirkin Acked-by: Jason Wang --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index fd335a4..cd1052f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -858,6 +858,7 @@ T: git git://repo.or.cz/qemu/qmp-unstable.git queue/qmp Network device layer M: Anthony Liguori M: Stefan Hajnoczi +M: Jason Wang S: Maintained F: net/ T: git git://github.com/stefanha/qemu.git net -- 2.1.0
[Qemu-devel] [PULL 4/6] net: remove the wrong comment in net_init_hubport()
From: Jason Wang Not only nic could be the one to peer. Signed-off-by: Jason Wang Message-id: 1422860798-17495-2-git-send-email-jasow...@redhat.com Signed-off-by: Stefan Hajnoczi --- net/hub.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/hub.c b/net/hub.c index ef09d5f..2b60ab9 100644 --- a/net/hub.c +++ b/net/hub.c @@ -288,7 +288,6 @@ int net_init_hubport(const NetClientOptions *opts, const char *name, assert(opts->kind == NET_CLIENT_OPTIONS_KIND_HUBPORT); hubport = opts->hubport; -/* Treat hub port like a backend, NIC must be the one to peer */ if (peer) { return -EINVAL; } -- 2.1.0
[Qemu-devel] [PULL 2/6] rtl8139: simplify timer logic
From: Paolo Bonzini Pavel Dovgalyuk reports that TimerExpire and the timer are not restored correctly on the receiving end of migration. It is not clear to me whether this is really the case, but we can take the occasion to get rid of the complicated code that computes PCSTimeout on the fly upon changes to IntrStatus/IntrMask. Just always keep a timer running, it will fire every ~130 seconds at most if the interrupt is masked with TimerInt != 0. This makes rtl8139_set_next_tctr_time idempotent (when the virtual clock is stopped between two calls, as is the case during migration). Tested with Frediano's qtest. Signed-off-by: Paolo Bonzini Message-id: 1421765099-26190-1-git-send-email-pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi --- hw/net/rtl8139.c | 77 1 file changed, 27 insertions(+), 50 deletions(-) diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 6fa9e0a..b7b87a6 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -508,7 +508,6 @@ typedef struct RTL8139State { /* PCI interrupt timer */ QEMUTimer *timer; -int64_t TimerExpire; MemoryRegion bar_io; MemoryRegion bar_mem; @@ -520,7 +519,7 @@ typedef struct RTL8139State { /* Writes tally counters to memory via DMA */ static void RTL8139TallyCounters_dma_write(RTL8139State *s, dma_addr_t tc_addr); -static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time); +static void rtl8139_set_next_tctr_time(RTL8139State *s); static void prom9346_decode_command(EEprom9346 *eeprom, uint8_t command) { @@ -1282,6 +1281,7 @@ static void rtl8139_reset(DeviceState *d) s->TCTR = 0; s->TimerInt = 0; s->TCTR_base = 0; +rtl8139_set_next_tctr_time(s); /* reset tally counters */ RTL8139TallyCounters_clear(&s->tally_counters); @@ -2652,7 +2652,6 @@ static void rtl8139_IntrMask_write(RTL8139State *s, uint32_t val) s->IntrMask = val; -rtl8139_set_next_tctr_time(s, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)); rtl8139_update_irq(s); } @@ -2687,13 +2686,7 @@ static void rtl8139_IntrStatus_write(RTL8139State *s, 
uint32_t val) rtl8139_update_irq(s); s->IntrStatus = newStatus; -/* - * Computing if we miss an interrupt here is not that correct but - * considered that we should have had already an interrupt - * and probably emulated is slower is better to assume this resetting was - * done before testing on previous rtl8139_update_irq lead to IRQ losing - */ -rtl8139_set_next_tctr_time(s, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)); +rtl8139_set_next_tctr_time(s); rtl8139_update_irq(s); #endif @@ -2701,8 +2694,6 @@ static void rtl8139_IntrStatus_write(RTL8139State *s, uint32_t val) static uint32_t rtl8139_IntrStatus_read(RTL8139State *s) { -rtl8139_set_next_tctr_time(s, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)); - uint32_t ret = s->IntrStatus; DPRINTF("IntrStatus read(w) val=0x%04x\n", ret); @@ -2885,43 +2876,32 @@ static void rtl8139_io_writew(void *opaque, uint8_t addr, uint32_t val) } } -static void rtl8139_set_next_tctr_time(RTL8139State *s, int64_t current_time) +static void rtl8139_set_next_tctr_time(RTL8139State *s) { -int64_t pci_time, next_time; -uint32_t low_pci; +const uint64_t ns_per_period = +muldiv64(0x1LL, get_ticks_per_sec(), PCI_FREQUENCY); DPRINTF("entered rtl8139_set_next_tctr_time\n"); -if (s->TimerExpire && current_time >= s->TimerExpire) { -s->IntrStatus |= PCSTimeout; -rtl8139_update_irq(s); -} - -/* Set QEMU timer only if needed that is - * - TimerInt <> 0 (we have a timer) - * - mask = 1 (we want an interrupt timer) - * - irq = 0 (irq is not already active) - * If any of above change we need to compute timer again - * Also we must check if timer is passed without QEMU timer +/* This function is called at least once per period, so it is a good + * place to update the timer base. + * + * After one iteration of this loop the value in the Timer register does + * not change, but the device model is counting up by 2^32 ticks (approx. + * 130 seconds). 
*/ -s->TimerExpire = 0; -if (!s->TimerInt) { -return; -} - -pci_time = muldiv64(current_time - s->TCTR_base, PCI_FREQUENCY, -get_ticks_per_sec()); -low_pci = pci_time & 0x; -pci_time = pci_time - low_pci + s->TimerInt; -if (low_pci >= s->TimerInt) { -pci_time += 0x1LL; +while (s->TCTR_base + ns_per_period <= qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)) { +s->TCTR_base += ns_per_period; } -next_time = s->TCTR_base + muldiv64(pci_time, get_ticks_per_sec(), -PCI_FREQUENCY); -s->TimerExpire = next_time; -if ((s->IntrMask & PCSTimeout) != 0 && (s->IntrStatus & PCSTimeout) == 0) { -timer_mo
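The key property of the rewritten rtl8139 timer code above is that the base update is idempotent. Isolated from the device model, the loop looks like the sketch below: the base is advanced by whole periods until it lies within one period of "now", so calling it a second time with the same clock value changes nothing, which is what makes it safe to re-run around migration while the virtual clock is stopped.

```c
#include <assert.h>
#include <stdint.h>

/* Advance the timer base by whole periods; repeated calls with the same
 * "now" are no-ops (idempotent), mirroring rtl8139_set_next_tctr_time. */
uint64_t advance_timer_base(uint64_t base, uint64_t now, uint64_t ns_per_period)
{
    while (base + ns_per_period <= now) {
        base += ns_per_period;
    }
    return base;
}
```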
[Qemu-devel] [PULL 5/6] net: del hub port when peer is deleted
From: Jason Wang We should del hub port when peer is deleted since it will not be reused and will only be freed during exit. Signed-off-by: Jason Wang Message-id: 1422860798-17495-3-git-send-email-jasow...@redhat.com Signed-off-by: Stefan Hajnoczi --- net/net.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/net.c b/net/net.c index 7acc162..74e651e 100644 --- a/net/net.c +++ b/net/net.c @@ -996,6 +996,8 @@ void net_host_device_remove(Monitor *mon, const QDict *qdict) error_report("invalid host network device '%s'", device); return; } + +qemu_del_net_client(nc->peer); qemu_del_net_client(nc); } -- 2.1.0
Re: [Qemu-devel] [PATCH v2] qcow2: Rewrite qcow2_alloc_bytes()
On 2015-02-06 at 09:08, Kevin Wolf wrote: On 05.02.2015 at 16:58, Max Reitz wrote: qcow2_alloc_bytes() is a function with insufficient error handling and an unnecessary goto. This patch rewrites it. Signed-off-by: Max Reitz --- v2: - s/free_cluster_index/free_byte_index/ [Eric] - added an assertion at the start of the function that s->free_byte_offset is either 0 or points to the tail of a cluster (but never to the start) - use ROUND_UP() instead of start_of_cluster() + cluster_size [Eric] - added an assertion that s->free_byte_offset is set before using it [Eric] --- block/qcow2-refcount.c | 77 +- 1 file changed, 45 insertions(+), 32 deletions(-) [snip] The patch looks correct to me. Let me know if you'd like to address the point I made above, or if I should apply it as it is. Good question. On one hand, I like your suggestion because it would indeed make the code shorter. On the other hand, I somehow feel better using functions that are prefixed qcow2_* because I feel it might make the code harder to read if we use qcow2_alloc_clusters() sometimes and alloc_clusters_noref() at other times, and qcow2_update_cluster_refcount() sometimes and update_refcount() at other times. But then again, it might make sense to have this function mirror qcow2_alloc_clusters(), which does use alloc_clusters_noref() and update_refcount() (of course). So I think I'll go with your suggestion, thanks! Max
[Qemu-devel] [PULL 6/6] monitor: more accurate completion for host_net_remove()
From: Jason Wang Current completion for host_net_remove will show hub ports and clients that were not peered with hub ports. Fix this. Cc: Luiz Capitulino Signed-off-by: Jason Wang Message-id: 1422860798-17495-4-git-send-email-jasow...@redhat.com Signed-off-by: Stefan Hajnoczi --- monitor.c | 5 + 1 file changed, 5 insertions(+) diff --git a/monitor.c b/monitor.c index 7e4f605..e6dc50a 100644 --- a/monitor.c +++ b/monitor.c @@ -4597,8 +4597,13 @@ void host_net_remove_completion(ReadLineState *rs, int nb_args, const char *str) count = qemu_find_net_clients_except(NULL, ncs, NET_CLIENT_OPTIONS_KIND_NIC, 255); for (i = 0; i < count; i++) { +int id; const char *name; +if (ncs[i]->info->type == NET_CLIENT_OPTIONS_KIND_HUBPORT || +net_hub_id_for_client(ncs[i], &id)) { +continue; +} name = ncs[i]->name; if (!strncmp(str, name, len)) { readline_add_completion(rs, name); -- 2.1.0
[Qemu-devel] [PATCH 0/4] target-arm: fix various clang UB sanitizer warnings
This patchset fixes a collection of warnings emitted by the clang undefined behaviour sanitizer in the course of booting an AArch64 Linux guest to a shell prompt. These are all various kinds of bad shift (shifting into the sign bit, left shifting negative values, shifting by more than the size of the data type). There remains one set of warnings which I'm not sure how to approach. We have an enum for the valid syndrome register EC field values: enum arm_exception_class { EC_UNCATEGORIZED = 0x00, [...] EC_VECTORCATCH= 0x3a, EC_AA64_BKPT = 0x3c, }; The EC field is a 6 bit field at the top of the 32 bit syndrome register, so we define #define ARM_EL_EC_SHIFT 26 and have utility functions that assemble syndrome values like: static inline uint32_t syn_aa64_bkpt(uint32_t imm16) { return (EC_AA64_BKPT << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0x); } Unfortunately this is UB, because C says that enums can be signed types, and if the enum is signed then "EC_AA64_BKPT << 26" is shifting into the sign bit of a signed type. I can't see any nice way of avoiding this. Possible nasty ways: (1) Drop the enum and just use old-fashioned #define EC_AA64_BKPT 0x3cU since then we can force them to be unsigned (2) Cast the constant to unsigned at point of use (3) Keep the enum and add defines wrapping actual use #define EC_AA64_BKPT ((unsigned)EC_AA64_BKPT) I guess there's also (4) Ignore the problem and trust that the compiler developers will never decide to use this bit of undefined behaviour. My preference is (1) I suppose (we don't actually use the enum for anything except to define the values, so if it's not providing helpful definitions we should drop it). Opinions? 
Peter Maydell (4): target-arm: A64: Fix shifts into sign bit target-arm: A64: Fix handling of rotate in logic_imm_decode_wmask target-arm: A64: Avoid left shifting negative integers in disas_pc_rel_addr target-arm: A64: Avoid signed shifts in disas_ldst_pair() target-arm/translate-a64.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) -- 1.9.1
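Option (1) from the cover letter can be sketched directly: plain unsigned defines instead of the enum, so assembling a syndrome value shifts an unsigned type and never touches the sign bit of a signed int. The `ARM_EL_IL` position (bit 25) and the `0xffff` immediate mask below are assumptions based on the AArch64 syndrome layout, not quoted from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned defines avoid the enum-signedness UB described above. */
#define EC_AA64_BKPT    0x3cU
#define ARM_EL_EC_SHIFT 26
#define ARM_EL_IL       (1U << 25)   /* assumed IL bit position */

static inline uint32_t syn_aa64_bkpt(uint32_t imm16)
{
    /* Well-defined: EC_AA64_BKPT is unsigned, so shifting it by 26 can
     * never land in the sign bit of a signed int. */
    return (EC_AA64_BKPT << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & 0xffffU);
}
```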
[Qemu-devel] [PATCH 2/4] target-arm: A64: Fix handling of rotate in logic_imm_decode_wmask
The code in logic_imm_decode_wmask attempts to rotate a mask value within the bottom 'e' bits of the value with mask = (mask >> r) | (mask << (e - r)); This has two issues: * if the element size is 64 then a rotate by zero results in a shift left by 64, which is undefined behaviour * if the element size is smaller than 64 then this will leave junk in the value at bit 'e' and above, which is not valid input to bitfield_replicate(). As it happens, the bits at bit 'e' to '2e - r' are exactly the ones which bitfield_replicate is going to copy in there, so this isn't a "wrong code generated" bug, but it's confusing and if we ever put an assert in bitfield_replicate it would fire on valid guest code. Fix the former by not doing anything if r is zero, and the latter by masking with bitmask64(e). Signed-off-by: Peter Maydell --- target-arm/translate-a64.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index d3801c5..94b3bf4 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -2820,7 +2820,10 @@ static bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn, * by r within the element (which is e bits wide)... */ mask = bitmask64(s + 1); -mask = (mask >> r) | (mask << (e - r)); +if (r) { +mask = (mask >> r) | (mask << (e - r)); +mask &= bitmask64(e); +} /* ...then replicate the element over the whole 64 bit value */ mask = bitfield_replicate(mask, e); *result = mask; -- 1.9.1
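The fixed rotate from the patch above, extracted into a standalone helper: the rotate is skipped entirely when r == 0 (a rotate by zero with e == 64 would otherwise shift left by 64, which is UB), and the junk at and above bit e is masked off afterwards so only the e-bit element survives.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t bitmask64(unsigned length)
{
    return ~0ULL >> (64 - length);      /* assumes 1 <= length <= 64 */
}

/* Rotate "mask" right by r within its low e bits, with the two fixes
 * described in the commit message. */
uint64_t rotate_element(uint64_t mask, unsigned e, unsigned r)
{
    if (r) {
        mask = (mask >> r) | (mask << (e - r));
        mask &= bitmask64(e);           /* drop bits at and above bit e */
    }
    return mask;
}
```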
[Qemu-devel] [PATCH 4/4] target-arm: A64: Avoid signed shifts in disas_ldst_pair()
Avoid shifting potentially negative signed offset values in disas_ldst_pair() by keeping the offset in a uint64_t rather than an int64_t. Signed-off-by: Peter Maydell --- target-arm/translate-a64.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 68c5b23..9f54501 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -1917,7 +1917,7 @@ static void disas_ldst_pair(DisasContext *s, uint32_t insn) int rt = extract32(insn, 0, 5); int rn = extract32(insn, 5, 5); int rt2 = extract32(insn, 10, 5); -int64_t offset = sextract32(insn, 15, 7); +uint64_t offset = sextract64(insn, 15, 7); int index = extract32(insn, 23, 2); bool is_vector = extract32(insn, 26, 1); bool is_load = extract32(insn, 22, 1); -- 1.9.1
Re: [Qemu-devel] [PULL 0/1] Tracing patches
On 6 February 2015 at 13:45, Stefan Hajnoczi wrote: > The following changes since commit 16017c48547960539fcadb1f91d252124f442482: > > softfloat: Clarify license status (2015-01-29 16:45:45 +) > > are available in the git repository at: > > git://github.com/stefanha/qemu.git tags/tracing-pull-request > > for you to fetch changes up to dd9fe29c80b8a35f12d98928a97be3aded80cf69: > > trace: Print PID and time in stderr traces (2015-02-06 10:27:22 +) > > > > Applied, thanks. -- PMM
[Qemu-devel] [PATCH 1/4] target-arm: A64: Fix shifts into sign bit
Fix attempts to shift into the sign bit of an int, which is undefined behaviour in C and warned about by the clang sanitizer. Signed-off-by: Peter Maydell --- target-arm/translate-a64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index acf4b16..d3801c5 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -1077,7 +1077,7 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn) { uint64_t addr = s->pc + sextract32(insn, 0, 26) * 4 - 4; -if (insn & (1 << 31)) { +if (insn & (1U << 31)) { /* C5.6.26 BL Branch with link */ tcg_gen_movi_i64(cpu_reg(s, 30), s->pc); } @@ -1271,7 +1271,7 @@ static void gen_get_nzcv(TCGv_i64 tcg_rt) TCGv_i32 nzcv = tcg_temp_new_i32(); /* build bit 31, N */ -tcg_gen_andi_i32(nzcv, cpu_NF, (1 << 31)); +tcg_gen_andi_i32(nzcv, cpu_NF, (1U << 31)); /* build bit 30, Z */ tcg_gen_setcondi_i32(TCG_COND_EQ, tmp, cpu_ZF, 0); tcg_gen_deposit_i32(nzcv, nzcv, tmp, 30, 1); @@ -1296,7 +1296,7 @@ static void gen_set_nzcv(TCGv_i64 tcg_rt) tcg_gen_trunc_i64_i32(nzcv, tcg_rt); /* bit 31, N */ -tcg_gen_andi_i32(cpu_NF, nzcv, (1 << 31)); +tcg_gen_andi_i32(cpu_NF, nzcv, (1U << 31)); /* bit 30, Z */ tcg_gen_andi_i32(cpu_ZF, nzcv, (1 << 30)); tcg_gen_setcondi_i32(TCG_COND_EQ, cpu_ZF, cpu_ZF, 0); -- 1.9.1
[Qemu-devel] [PATCH 3/4] target-arm: A64: Avoid left shifting negative integers in disas_pc_rel_addr
Shifting a negative integer left is undefined behaviour in C. Avoid it by assembling and shifting the offset fields as unsigned values and then sign extending as the final action. Signed-off-by: Peter Maydell --- target-arm/translate-a64.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 94b3bf4..68c5b23 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -2662,11 +2662,12 @@ static void disas_pc_rel_adr(DisasContext *s, uint32_t insn) { unsigned int page, rd; uint64_t base; -int64_t offset; +uint64_t offset; page = extract32(insn, 31, 1); /* SignExtend(immhi:immlo) -> offset */ -offset = ((int64_t)sextract32(insn, 5, 19) << 2) | extract32(insn, 29, 2); +offset = sextract64(insn, 5, 19); +offset = offset << 2 | extract32(insn, 29, 2); rd = extract32(insn, 0, 5); base = s->pc - 4; -- 1.9.1
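The pattern used by the fix above, shown end to end: sign-extend the 19-bit immhi field to a full 64-bit value first, then do the left shift on the (now unsigned) uint64_t, so no negative value is ever shifted left. The `sextract64()` below is a simplified stand-in for QEMU's bitops helper; it relies on arithmetic right shift of signed values, which QEMU assumes of its supported compilers.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sextract64(uint64_t value, int start, int length)
{
    /* Shift the field to the top, then arithmetic-shift it back down. */
    return (uint64_t)((int64_t)(value << (64 - length - start))
                      >> (64 - length));
}

static uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

/* SignExtend(immhi:immlo) as in disas_pc_rel_adr, UB-free. */
uint64_t adr_offset(uint32_t insn)
{
    uint64_t offset = sextract64(insn, 5, 19);      /* immhi, sign-extended */
    return offset << 2 | extract32(insn, 29, 2);    /* append immlo */
}
```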
[Qemu-devel] [PATCH 2/2] openpic: convert to vmstate
Signed-off-by: Mark Cave-Ayland --- hw/intc/openpic.c | 253 + 1 file changed, 119 insertions(+), 134 deletions(-) diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c index 2a3144f..1ceac97 100644 --- a/hw/intc/openpic.c +++ b/hw/intc/openpic.c @@ -207,6 +207,7 @@ typedef enum IRQType { typedef struct IRQQueue { unsigned long *queue; +int32_t queue_size; /* Only used for VMSTATE_BITMAP */ int next; int priority; } IRQQueue; @@ -242,6 +243,15 @@ typedef struct IRQSource { #define IDR_EP 0x8000 /* external pin */ #define IDR_CI 0x4000 /* critical interrupt */ +typedef struct OpenPICTimer { +uint32_t tccr; /* Global timer current count register */ +uint32_t tbcr; /* Global timer base count register */ +} OpenPICTimer; + +typedef struct OpenPICMSI { +uint32_t msir; /* Shared Message Signaled Interrupt Register */ +} OpenPICMSI; + typedef struct IRQDest { int32_t ctpr; /* CPU current task priority */ IRQQueue raised; @@ -290,14 +300,9 @@ typedef struct OpenPICState { IRQDest dst[MAX_CPU]; uint32_t nb_cpus; /* Timer registers */ -struct { -uint32_t tccr; /* Global timer current count register */ -uint32_t tbcr; /* Global timer base count register */ -} timers[OPENPIC_MAX_TMR]; +OpenPICTimer timers[OPENPIC_MAX_TMR]; /* Shared MSI registers */ -struct { -uint32_t msir; /* Shared Message Signaled Interrupt Register */ -} msi[MAX_MSI]; +OpenPICMSI msi[MAX_MSI]; uint32_t max_irq; uint32_t irq_ipi0; uint32_t irq_tim0; @@ -1289,130 +1294,6 @@ static const MemoryRegionOps openpic_summary_ops_be = { }, }; -static void openpic_save_IRQ_queue(QEMUFile* f, IRQQueue *q) -{ -unsigned int i; - -for (i = 0; i < BITS_TO_LONGS(IRQQUEUE_SIZE_BITS); i++) { -/* Always put the lower half of a 64-bit long first, in case we - * restore on a 32-bit host. The least significant bits correspond - * to lower IRQ numbers in the bitmap. 
- */ -qemu_put_be32(f, (uint32_t)q->queue[i]); -#if LONG_MAX > 0x7FFF -qemu_put_be32(f, (uint32_t)(q->queue[i] >> 32)); -#endif -} - -qemu_put_sbe32s(f, &q->next); -qemu_put_sbe32s(f, &q->priority); -} - -static void openpic_save(QEMUFile* f, void *opaque) -{ -OpenPICState *opp = (OpenPICState *)opaque; -unsigned int i; - -qemu_put_be32s(f, &opp->gcr); -qemu_put_be32s(f, &opp->vir); -qemu_put_be32s(f, &opp->pir); -qemu_put_be32s(f, &opp->spve); -qemu_put_be32s(f, &opp->tfrr); - -qemu_put_be32s(f, &opp->nb_cpus); - -for (i = 0; i < opp->nb_cpus; i++) { -qemu_put_sbe32s(f, &opp->dst[i].ctpr); -openpic_save_IRQ_queue(f, &opp->dst[i].raised); -openpic_save_IRQ_queue(f, &opp->dst[i].servicing); -qemu_put_buffer(f, (uint8_t *)&opp->dst[i].outputs_active, -sizeof(opp->dst[i].outputs_active)); -} - -for (i = 0; i < OPENPIC_MAX_TMR; i++) { -qemu_put_be32s(f, &opp->timers[i].tccr); -qemu_put_be32s(f, &opp->timers[i].tbcr); -} - -for (i = 0; i < opp->max_irq; i++) { -qemu_put_be32s(f, &opp->src[i].ivpr); -qemu_put_be32s(f, &opp->src[i].idr); -qemu_put_be32s(f, &opp->src[i].destmask); -qemu_put_sbe32s(f, &opp->src[i].last_cpu); -qemu_put_sbe32s(f, &opp->src[i].pending); -} -} - -static void openpic_load_IRQ_queue(QEMUFile* f, IRQQueue *q) -{ -unsigned int i; - -for (i = 0; i < BITS_TO_LONGS(IRQQUEUE_SIZE_BITS); i++) { -unsigned long val; - -val = qemu_get_be32(f); -#if LONG_MAX > 0x7FFF -val <<= 32; -val |= qemu_get_be32(f); -#endif - -q->queue[i] = val; -} - -qemu_get_sbe32s(f, &q->next); -qemu_get_sbe32s(f, &q->priority); -} - -static int openpic_load(QEMUFile* f, void *opaque, int version_id) -{ -OpenPICState *opp = (OpenPICState *)opaque; -unsigned int i, nb_cpus; - -if (version_id != 2) { -return -EINVAL; -} - -qemu_get_be32s(f, &opp->gcr); -qemu_get_be32s(f, &opp->vir); -qemu_get_be32s(f, &opp->pir); -qemu_get_be32s(f, &opp->spve); -qemu_get_be32s(f, &opp->tfrr); - -qemu_get_be32s(f, &nb_cpus); -if (opp->nb_cpus != nb_cpus) { -return -EINVAL; -} -assert(nb_cpus > 0 && 
nb_cpus <= MAX_CPU); - -for (i = 0; i < opp->nb_cpus; i++) { -qemu_get_sbe32s(f, &opp->dst[i].ctpr); -openpic_load_IRQ_queue(f, &opp->dst[i].raised); -openpic_load_IRQ_queue(f, &opp->dst[i].servicing); -qemu_get_buffer(f, (uint8_t *)&opp->dst[i].outputs_active, -sizeof(opp->dst[i].outputs_active)); -} - -for (i = 0; i < OPENPIC_MAX_TMR; i++) { -qemu_get_be32s(f, &opp->timers[i].tccr); -qemu_get_be32s(f, &opp->timers[i].tbcr); -} - -for (i = 0; i < opp->max_irq; i++) { -
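The portability trick in the removed openpic_save_IRQ_queue() is worth isolating: each word of the bitmap is written as 32-bit halves, least significant half first, so a bitmap saved on a 64-bit host restores correctly on a 32-bit host (and vice versa). In this sketch a plain uint32_t array stands in for the QEMUFile stream; the real code uses qemu_put_be32()/qemu_get_be32().

```c
#include <assert.h>
#include <stdint.h>

/* Write one 64-bit bitmap word as two 32-bit halves, lower half first
 * (the least significant bits correspond to lower IRQ numbers). */
void save_long(uint32_t out[2], uint64_t val)
{
    out[0] = (uint32_t)val;          /* lower half first */
    out[1] = (uint32_t)(val >> 32);  /* upper half (64-bit hosts only) */
}

uint64_t load_long(const uint32_t in[2])
{
    return (uint64_t)in[1] << 32 | in[0];
}

uint64_t roundtrip(uint64_t val)
{
    uint32_t buf[2];
    save_long(buf, val);
    return load_long(buf);
}
```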
[Qemu-devel] [PATCH 0/2] openpic: convert to vmstate
This patchset follows on from my ppc loadvm/savevm work and converts openpic over to vmstate. With these patches applied, I can successfully run a savevm/loadvm pair under -M mac99 within OpenBIOS, with the same result as with the previous patchset applied. Alex: if you're happy with this, I could fix up the previous patch in my ppc loadvm/savevm patchset and append these two patches to the end of the series? Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (2): openpic: switch IRQQueue queue from inline to bitmap openpic: convert to vmstate hw/intc/openpic.c | 274 ++--- 1 file changed, 133 insertions(+), 141 deletions(-) -- 1.7.10.4
[Qemu-devel] [PATCH 1/2] openpic: switch IRQQueue queue from inline to bitmap
This is in preparation for using VMSTATE_BITMAP in a followup vmstate migration patch. Signed-off-by: Mark Cave-Ayland --- hw/intc/openpic.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c index 4194cef..2a3144f 100644 --- a/hw/intc/openpic.c +++ b/hw/intc/openpic.c @@ -200,11 +200,13 @@ typedef enum IRQType { IRQ_TYPE_FSLSPECIAL,/* FSL timer/IPI interrupt, edge, no polarity */ } IRQType; +/* Round up to the nearest 64 IRQs so that the queue length + * won't change when moving between 32 and 64 bit hosts. + */ +#define IRQQUEUE_SIZE_BITS ((OPENPIC_MAX_IRQ + 63) & ~63) + typedef struct IRQQueue { -/* Round up to the nearest 64 IRQs so that the queue length - * won't change when moving between 32 and 64 bit hosts. - */ -unsigned long queue[BITS_TO_LONGS((OPENPIC_MAX_IRQ + 63) & ~63)]; +unsigned long *queue; int next; int priority; } IRQQueue; @@ -1291,7 +1293,7 @@ static void openpic_save_IRQ_queue(QEMUFile* f, IRQQueue *q) { unsigned int i; -for (i = 0; i < ARRAY_SIZE(q->queue); i++) { +for (i = 0; i < BITS_TO_LONGS(IRQQUEUE_SIZE_BITS); i++) { /* Always put the lower half of a 64-bit long first, in case we * restore on a 32-bit host. The least significant bits correspond * to lower IRQ numbers in the bitmap. 
@@ -1345,7 +1347,7 @@ static void openpic_load_IRQ_queue(QEMUFile* f, IRQQueue *q) { unsigned int i; -for (i = 0; i < ARRAY_SIZE(q->queue); i++) { +for (i = 0; i < BITS_TO_LONGS(IRQQUEUE_SIZE_BITS); i++) { unsigned long val; val = qemu_get_be32(f); @@ -1444,12 +1446,14 @@ static void openpic_reset(DeviceState *d) write_IRQreg_idr(opp, i, opp->idr_reset); } /* Initialise IRQ destinations */ -for (i = 0; i < MAX_CPU; i++) { +for (i = 0; i < opp->nb_cpus; i++) { opp->dst[i].ctpr = 15; -memset(&opp->dst[i].raised, 0, sizeof(IRQQueue)); opp->dst[i].raised.next = -1; -memset(&opp->dst[i].servicing, 0, sizeof(IRQQueue)); +opp->dst[i].raised.priority = 0; +bitmap_clear(opp->dst[i].raised.queue, 0, IRQQUEUE_SIZE_BITS); opp->dst[i].servicing.next = -1; +opp->dst[i].servicing.priority = 0; +bitmap_clear(opp->dst[i].servicing.queue, 0, IRQQUEUE_SIZE_BITS); } /* Initialise timers */ for (i = 0; i < OPENPIC_MAX_TMR; i++) { @@ -1629,6 +1633,9 @@ static void openpic_realize(DeviceState *dev, Error **errp) for (j = 0; j < OPENPIC_OUTPUT_NB; j++) { sysbus_init_irq(d, &opp->dst[i].irqs[j]); } + +opp->dst[i].raised.queue = bitmap_new(IRQQUEUE_SIZE_BITS); +opp->dst[i].servicing.queue = bitmap_new(IRQQUEUE_SIZE_BITS); } register_savevm(dev, "openpic", 0, 2, -- 1.7.10.4
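The IRQQUEUE_SIZE_BITS expression introduced above, isolated: it rounds the IRQ count up to the next multiple of 64 so the queue bitmap occupies the same number of bits on 32- and 64-bit hosts.

```c
#include <assert.h>

/* Round n up to the nearest multiple of 64, as in
 * ((OPENPIC_MAX_IRQ + 63) & ~63). */
static inline unsigned round_up_to_64(unsigned n)
{
    return (n + 63) & ~63u;
}
```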
Re: [Qemu-devel] [PATCH 1/7] softfloat: Fix sNaN handling in FP conversion operations
On Thu, 5 Feb 2015, Peter Maydell wrote: > > Fix sNaN handling in floating-point format conversion operations, which > > are classified by the IEEE 754-2008 standard as general-computational > > operations [1]: > > > > "5.4 formatOf general-computational operations > > > > "5.4.2 Conversion operations for floating-point formats and decimal > > character sequences > > > > "Implementations shall provide the following formatOf conversion > > operations from all supported floating-point formats to all supported > > floating-point formats, as well as conversions to and from decimal > > character sequences. These operations shall not propagate non-canonical > > results. Some format conversion operations produce results in a > > different radix than the operands." > > > > according to the quietening requirement [2] set by the same standard: > > > > "7.2 Invalid operation > > > > "For operations producing results in floating-point format, the default > > result of an operation that signals the invalid operation exception > > shall be a quiet NaN that should provide some diagnostic information > > (see 6.2). > > > > "These operations are: > > a) any general-computational or signaling-computational operation > > on a signaling NaN (see 6.2), except for some conversions (see > > 5.12)" > > > > and the reference above is [3]: > > > > "5.12 Details of conversion between floating-point data and external > > character sequences" > > > > so does not apply to conversions between a pair of floating-point formats. > > > > Therefore quieten any sNaN encountered in floating-point format > > conversions, in the usual manner. > > > > References: > > > > [1] "IEEE Standard for Floating-Point Arithmetic", IEEE Computer > > Society, IEEE Std 754-2008, 29 August 2008, pp. 21-22 > > > > [2] same, p. 37 > > > > [3] same, p. 30 > > > > Signed-off-by: Maciej W. 
Rozycki > > --- > > This is in particular how MIPS hardware operates, other processors > > supposedly do the same if they claim compliance to IEEE 754. > > ARM currently handles this issue by calling the _maybe_silence_nan() > function in all target-specific helper functions that call > float-to-float conversion functions (but that has its own > issues, see below). > > Have you looked at the other architectures that use these > functions to convert float values to see what their NaN > handling semantics are? I agree about what the spec says, > but we may well have targets which are not fully IEEE > compliant and rely on the current semantics (x86 springs to > mind). I think the IEEE 754 standard-compliant behaviour shouldn't be left to target helpers; it would be bound to cause discrepancies and chaos. I wouldn't expect every target maintainer to be fluent in IEEE 754 arcana, or even have access to the standard's normative document. What I think would make sense here is that instead of, say, `float32_to_float64' making a call to `float64_maybe_silence_nan' directly, we'd have a static inline function or a macro called, say, `float64_convert_silence_nan' invoked at the point where my patch currently calls the former. That function in turn would call the former function by default, and could be overridden for individual targets that require something else. And I think we shouldn't be making our decisions based on x86 despite its ubiquity -- it is, especially as far as the x87 implementation is concerned, the earliest attempt to implement IEEE 754, with all the resulting oddities. Or more specifically not even that, but an FP implementation that (as of the 8087 FPU) Intel wanted to become the basis of the upcoming IEEE 754 standard, and which mostly succeeded in principle, but not in some details, some of which were later corrected with the 80287 and 80387 implementations, and some of which had to stay, for compatibility reasons.
For one the x87 doesn't really implement proper single or double arithmetic. While there are two precision control bits in its control register, not all arithmetic operations respect them, making their calculation unconditionally in the extended precision instead. And even these that do only respect them for the significand and not the exponent[1]: " * The precision control (PC) bits (bits 9-8) can be used to set the MCP internal operating precision of the significand at less than the default of 64 bits (extended precision). This can be useful in providing compatibility with early generation arithmetic processors of smaller precision. PC affects only the instructions ADD, SUB, DIV, MUL, and SQRT. For all other instructions, either the precision is determined by the opcode or extended precision is used. " So I think that any x87 intricacies are best handled either by hooks similar to `float64_convert_silence_nan' I propose or by individual implementation of whole operations where the actions required diverge from the standard so much as to make the use of such hooks infeasible. Besides are you
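At the bit level, the quietening step under discussion is simple for IEEE 754 binary64: a NaN has all exponent bits set and a non-zero fraction, and the top fraction bit (bit 51) distinguishes quiet (1) from signaling (0) in the convention used by ARM, x86, and MIPS with NAN2008. Silencing a sNaN therefore just sets that bit. The sketch below models what softfloat's float64_maybe_silence_nan() does for such targets; it is an illustration, not the function itself (and legacy MIPS uses the inverted convention).

```c
#include <assert.h>
#include <stdint.h>

/* Quieten a binary64 NaN in its raw bit representation; non-NaN values
 * (including infinities) pass through unchanged. */
uint64_t silence_nan64(uint64_t bits)
{
    const uint64_t exp_mask  = 0x7FF0000000000000ULL;
    const uint64_t frac_mask = 0x000FFFFFFFFFFFFFULL;
    const uint64_t quiet_bit = 0x0008000000000000ULL;  /* fraction bit 51 */

    if ((bits & exp_mask) == exp_mask && (bits & frac_mask)) {
        bits |= quiet_bit;      /* NaN: force the quiet bit */
    }
    return bits;
}
```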
[Qemu-devel] [PATCH v3] qcow2: Rewrite qcow2_alloc_bytes()
qcow2_alloc_bytes() is a function with insufficient error handling and an unnecessary goto. This patch rewrites it. Signed-off-by: Max Reitz --- v3: - Use alloc_clusters_noref() and update_refcount() [Kevin] - Only modify s->free_byte_offset if the function is successful; this is now necessary because update_refcount() is called unconditionally and thus, if it failed and alloc_clusters_noref() had been called and had returned a non-contiguous offset, s->free_byte_offset would point to an unallocated cluster's head, which is both wrong in itself and would also violate the assertion at the beginning of the function --- block/qcow2-refcount.c | 78 +- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index 9afdb40..9b80ca7 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -759,54 +759,54 @@ int qcow2_alloc_clusters_at(BlockDriverState *bs, uint64_t offset, int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size) { BDRVQcowState *s = bs->opaque; -int64_t offset, cluster_offset; -int free_in_cluster; +int64_t offset; +size_t free_in_cluster; +int ret; BLKDBG_EVENT(bs->file, BLKDBG_CLUSTER_ALLOC_BYTES); assert(size > 0 && size <= s->cluster_size); -if (s->free_byte_offset == 0) { -offset = qcow2_alloc_clusters(bs, s->cluster_size); -if (offset < 0) { -return offset; +assert(!s->free_byte_offset || offset_into_cluster(s, s->free_byte_offset)); + +offset = s->free_byte_offset; + +if (offset) { +int refcount = qcow2_get_refcount(bs, offset >> s->cluster_bits); +if (refcount < 0) { +return refcount; } -s->free_byte_offset = offset; -} - redo: -free_in_cluster = s->cluster_size - -offset_into_cluster(s, s->free_byte_offset); -if (size <= free_in_cluster) { -/* enough space in current cluster */ -offset = s->free_byte_offset; -s->free_byte_offset += size; -free_in_cluster -= size; -if (free_in_cluster == 0) -s->free_byte_offset = 0; -if (offset_into_cluster(s, offset) != 0) 
-qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1, - QCOW2_DISCARD_NEVER); -} else { -offset = qcow2_alloc_clusters(bs, s->cluster_size); -if (offset < 0) { -return offset; + +if (refcount == 0xffff) { +offset = 0; } -cluster_offset = start_of_cluster(s, s->free_byte_offset); -if ((cluster_offset + s->cluster_size) == offset) { -/* we are lucky: contiguous data */ -offset = s->free_byte_offset; -qcow2_update_cluster_refcount(bs, offset >> s->cluster_bits, 1, - QCOW2_DISCARD_NEVER); -s->free_byte_offset += size; -} else { -s->free_byte_offset = offset; -goto redo; +} + +free_in_cluster = s->cluster_size - offset_into_cluster(s, offset); +if (!offset || free_in_cluster < size) { +int64_t new_cluster = alloc_clusters_noref(bs, s->cluster_size); +if (new_cluster < 0) { +return new_cluster; +} + +if (!offset || ROUND_UP(offset, s->cluster_size) != new_cluster) { +offset = new_cluster; } } -/* The cluster refcount was incremented, either by qcow2_alloc_clusters() - * or explicitly by qcow2_update_cluster_refcount(). Refcount blocks must - * be flushed before the caller's L2 table updates. - */ +assert(offset); +ret = update_refcount(bs, offset, size, 1, QCOW2_DISCARD_NEVER); +if (ret < 0) { +return ret; +} + +/* The cluster refcount was incremented; refcount blocks must be flushed + * before the caller's L2 table updates. */ qcow2_cache_set_dependency(bs, s->l2_table_cache, s->refcount_block_cache); + +s->free_byte_offset = offset + size; +if (!offset_into_cluster(s, s->free_byte_offset)) { +s->free_byte_offset = 0; +} + return offset; } -- 2.1.0
Re: [Qemu-devel] [PATCH v11 12/13] qemu-iotests: Add tests for drive-backup sync=dirty-bitmap
On 12.01.2015 19:31, John Snow wrote: From: Fam Zheng Signed-off-by: Fam Zheng Signed-off-by: John Snow --- tests/qemu-iotests/056| 33 ++--- tests/qemu-iotests/056.out| 4 ++-- tests/qemu-iotests/iotests.py | 8 3 files changed, 40 insertions(+), 5 deletions(-) diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056 index 54e4bd0..6f8aa9a 100755 --- a/tests/qemu-iotests/056 +++ b/tests/qemu-iotests/056 @@ -23,17 +23,17 @@ import time import os import iotests -from iotests import qemu_img, qemu_io, create_image +from iotests import qemu_img, qemu_img_map_assert, qemu_io, create_image backing_img = os.path.join(iotests.test_dir, 'backing.img') test_img = os.path.join(iotests.test_dir, 'test.img') target_img = os.path.join(iotests.test_dir, 'target.img') -class TestSyncModesNoneAndTop(iotests.QMPTestCase): +class TestSyncModes(iotests.QMPTestCase): image_len = 64 * 1024 * 1024 # MB def setUp(self): -create_image(backing_img, TestSyncModesNoneAndTop.image_len) +create_image(backing_img, TestSyncModes.image_len) qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, test_img) qemu_io('-c', 'write -P0x41 0 512', test_img) qemu_io('-c', 'write -P0xd5 1M 32k', test_img) @@ -64,6 +64,33 @@ class TestSyncModesNoneAndTop(iotests.QMPTestCase): self.assertTrue(iotests.compare_images(test_img, target_img), 'target image does not match source after backup') +def test_sync_dirty_bitmap_missing(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('drive-backup', device='drive0', sync='dirty-bitmap', + format=iotests.imgfmt, target=target_img) +self.assert_qmp(result, 'error/class', 'GenericError') + +def test_sync_dirty_bitmap_not_found(self): +self.assert_no_active_block_jobs() +result = self.vm.qmp('drive-backup', device='drive0', sync='dirty-bitmap', + bitmap='unknown', + format=iotests.imgfmt, target=target_img) +self.assert_qmp(result, 'error/class', 'GenericError') + +def test_sync_dirty_bitmap(self): +self.assert_no_active_block_jobs() 
+result = self.vm.qmp('block-dirty-bitmap-add', node_ref='drive0', name='bitmap0') +self.assert_qmp(result, 'return', {}) +self.vm.hmp_qemu_io('drive0', 'write -P0x5a 0 512') +self.vm.hmp_qemu_io('drive0', 'write -P0x5a 48M 512') +result = self.vm.qmp('drive-backup', device='drive0', sync='dirty-bitmap', + bitmap='bitmap0', + format=iotests.imgfmt, target=target_img) +self.assert_qmp(result, 'return', {}) +self.wait_until_completed(check_offset=False) +self.assert_no_active_block_jobs() +qemu_img_map_assert(target_img, [0, 0x3000000]) + def test_cancel_sync_none(self): self.assert_no_active_block_jobs() diff --git a/tests/qemu-iotests/056.out b/tests/qemu-iotests/056.out index fbc63e6..914e373 100644 --- a/tests/qemu-iotests/056.out +++ b/tests/qemu-iotests/056.out @@ -1,5 +1,5 @@ -.. +..... -- -Ran 2 tests +Ran 5 tests OK diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index f57f154..95bb959 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -55,6 +55,14 @@ def qemu_img_pipe(*args): '''Run qemu-img and return its output''' return subprocess.Popen(qemu_img_args + list(args), stdout=subprocess.PIPE).communicate()[0] +def qemu_img_map_assert(img, offsets): +'''Run qemu-img map on img and check the mapped ranges''' +offs = [] +for line in qemu_img_pipe('map', img).splitlines()[1:]: +offset, length, mapped, fname = line.split() +offs.append(int(offset, 16)) +assert set(offs) == set(offsets), "mapped offsets in image '%s' not equal to '%s'" % (str(offs), str(offsets)) + def qemu_io(*args): '''Run qemu-io and return the stdout data''' args = qemu_io_args + list(args) why not just check with qemu-io -c "write -P0x5a 0 512" for symmetry? -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH 1/7] softfloat: Fix sNaN handling in FP conversion operations
On 6 February 2015 at 14:37, Maciej W. Rozycki wrote: > On Thu, 5 Feb 2015, Peter Maydell wrote: >> Have you looked at the other architectures that use these >> functions to convert float values to see what their NaN >> handling semantics are? I agree about what the spec says, >> but we may well have targets which are not fully IEEE >> compliant and rely on the current semantics (x86 springs to >> mind). > > I think the IEEE 754 standard-compliant behaviour shouldn't be left to > target helpers, it would be bound to cause discrepancies and chaos. I > wouldn't expect every target maintainer to be fluent in IEEE 754 arcana, > or even have access to the standard's normative document. > > What I think would make sense here is instead of say `float32_to_float64' > making a call to `float64_maybe_silence_nan' directly, we'd have a static > inline function or a macro called say `float64_convert_silence_nan' > invoked where the former is proposed in my patch. That function in turn > would call the former function by default, and could be overridden by > something else for individual targets that require it. This still doesn't work for ARM, because you can't do the silencing after conversion. You have to have a way for targets to put in target-specific behaviour directly in the 64-to-32 conversion process, before data is lost in the narrowing. > And I think we shouldn't be making our decisions based on x86 despite its > ubiquity -- it is, especially as far as the x87 implementation is concerned, the > earliest attempt to implement IEEE 754 with all the oddities resulting. > Or more specifically not even that but more of an FP implementation that > (as of the 8087 FPU) Intel wanted to become the basis of the upcoming IEEE > 754 standard, and mostly succeeded in principle, but not in some details, > some of which were later corrected with the 80287 and 80387 > implementations, and some of which had to stay, for compatibility reasons. 
My point is that you mustn't regress other targets. So either you need to retain the same behaviour for targets which don't specifically add new code/flags/whatever, or you need to go through and look up all the other architecture specifications to check that you haven't broken them (ie that any changes you make are actually improving their accuracy). This patch isn't doing the first of those, so you need to do the second... > Besides are you sure? I was asking if you'd checked all the targets whose behaviour you're changing, not making a definite "you have broken x86" statement. x86 is just the most likely because as you say it is the weirdest and least compliant. >> Also in this area, ARM has a problem where we don't give the >> correct answers for fp-to-fp conversions from a higher precision >> to a lower precision where the input is a signaling NaN whose >> non-zero mantissa bits are all at the low end such that the >> truncation of the mantissa into the lower precision format >> would drop them all. The result is the wrong value of quiet NaN >> (we return the default NaN, which has the sign bit clear, but the >> FPConvertNaN pseudocode specifies that we should effectively get >> the default NaN but with the same sign bit as the input SNaN). >> >> I think this means that: >> (1) we want to put handling of silencing the signaling NaNs >> into the NaN conversion functions themselves (since it's >> too late to do it correctly once the commonNaNtoFloat16 >> function has thrown away data) >> (2) that handling needs to be target specific, as most details >> of quiet vs signaling NaNs are >> >> I note in passing that the float*ToCommonNaN and commonNaNToFloat* >> functions are used only in the back-to-back call sequence of >>return commonNaNToFloatX(floatYToCommonNaN(...)) >> for handling the NaN inputs on FP-to-FP conversion. 
> > I believe my proposal addresses your concerns in a burden-free way for > target maintainers who look after processors that implement the IEEE 754 > standard as it stands. I don't, which is why I made the comment above. If you're going to fix up NaN handling in the float-to-float conversion routines we should do it in a way that lets us handle the behaviour of target CPUs we care about. thanks -- PMM
Re: [Qemu-devel] [PATCH] net: synchronize net_host_device_remove with host_net_remove_completion
On 06/02/2015 14:54, Stefan Hajnoczi wrote: > On Mon, Jan 19, 2015 at 12:27:11PM +0100, Paolo Bonzini wrote: >> On 02/01/2015 17:20, Paolo Bonzini wrote: > > The assert can be dropped completely since the code already has an > equivalent assert: > > queues = qemu_find_net_clients_except(nc->name, ncs, > NET_CLIENT_OPTIONS_KIND_NIC, > MAX_QUEUE_NUM); > assert(queues != 0); <-- fail if type == NET_CLIENT_OPTIONS_KIND_NIC >>> I left it on purpose for documentation, but I'll send v2 next week that >>> removes it. >> >> Actually it's not the same. If you have "-netdev user,id=e1000 -device >> e1000,netdev=e1000" you will be able to call qemu_del_net_client on the >> NIC, and it will _not_ fail if the assertion is removed. > > I don't follow. > > If you call qemu_del_net_client(e1000_nic) then > qemu_find_net_clients_except(nc->name, ncs, NET_CLIENT_OPTIONS_KIND_NIC, > MAX_QUEUE_NUM) returns 0. This causes the assert(queues != 0) to fail. NICs and other clients are in separate namespaces. So if you do -netdev user,id=e1000 -device e1000,netdev=e1000,id=e1000 you have two NetClients named "e1000". If you call (by mistake) qemu_del_net_client(e1000_nic), qemu_find_net_clients_except will return the SLIRP client and the assertion will not fail. So you need a separate assertion. Paolo
Re: [Qemu-devel] [RFC PATCH v3] tests: rtl8139: test timers and interrupt
This patch is marked as RFC, but I think it should be included. Stefan/Jason, can you pull it in the next net pull request? Paolo On 08/01/2015 19:38, Frediano Ziglio wrote: > Test behaviour of timers and interrupts related to timeouts. > > Signed-off-by: Frediano Ziglio > --- > tests/Makefile | 2 +- > tests/rtl8139-test.c | 181 > +++ > 2 files changed, 182 insertions(+), 1 deletion(-) > > This patch was derived from a test I did while implementing timer in > rtl8139 code. Now that there is support for integrated testing I converted > it. The test was tested on a real NIC. > > As it's the first test I wrote, I don't know if the syntax and details are > fine. For instance should I remove the nop test? Should I split my test? > > Changed from v2: > - style (variable declaration, Perl script not able to spot it) > > Changed from v1: > - style > > diff --git a/tests/Makefile b/tests/Makefile > index e4ddb6a..8858407 100644 > --- a/tests/Makefile > +++ b/tests/Makefile > @@ -320,7 +320,7 @@ tests/tmp105-test$(EXESUF): tests/tmp105-test.o > $(libqos-omap-obj-y) > tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y) > tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y) > tests/e1000-test$(EXESUF): tests/e1000-test.o > -tests/rtl8139-test$(EXESUF): tests/rtl8139-test.o > +tests/rtl8139-test$(EXESUF): tests/rtl8139-test.o $(libqos-pc-obj-y) > tests/pcnet-test$(EXESUF): tests/pcnet-test.o > tests/eepro100-test$(EXESUF): tests/eepro100-test.o > tests/vmxnet3-test$(EXESUF): tests/vmxnet3-test.o > diff --git a/tests/rtl8139-test.c b/tests/rtl8139-test.c > index f6a1be3..4e0bf02 100644 > --- a/tests/rtl8139-test.c > +++ b/tests/rtl8139-test.c > @@ -10,19 +10,200 @@ > #include > #include > #include "libqtest.h" > +#include "libqos/pci-pc.h" > #include "qemu/osdep.h" > +#include "qemu-common.h" > > /* Tests only initialization so far. 
TODO: Replace with functional tests */ > static void nop(void) > { > } > > +#define CLK 33000000 > +#define NS_PER_SEC 1000000000ULL > + > +static QPCIBus *pcibus; > +static QPCIDevice *dev; > +static void *dev_base; > + > +static void save_fn(QPCIDevice *dev, int devfn, void *data) > +{ > +QPCIDevice **pdev = (QPCIDevice **) data; > + > +*pdev = dev; > +} > + > +static QPCIDevice *get_device(void) > +{ > +QPCIDevice *dev; > + > +pcibus = qpci_init_pc(); > +qpci_device_foreach(pcibus, 0x10ec, 0x8139, save_fn, &dev); > +g_assert(dev != NULL); > + > +return dev; > +} > + > +#define PORT(name, len, val) \ > +static unsigned __attribute__((unused)) in_##name(void) \ > +{ \ > +unsigned res = qpci_io_read##len(dev, dev_base+(val)); \ > +g_test_message("*%s -> %x\n", #name, res); \ > +return res; \ > +} \ > +static void out_##name(unsigned v) \ > +{ \ > +g_test_message("%x -> *%s\n", v, #name); \ > +qpci_io_write##len(dev, dev_base+(val), v); \ > +} > + > +PORT(Timer, l, 0x48) > +PORT(IntrMask, w, 0x3c) > +PORT(IntrStatus, w, 0x3E) > +PORT(TimerInt, l, 0x54) > + > +#define fatal(...) do { g_test_message(__VA_ARGS__); g_assert(0); } while (0) > + > +static void test_timer(void) > +{ > +const unsigned from = 0.95 * CLK; > +const unsigned to = 1.6 * CLK; > +unsigned prev, curr, next; > +unsigned cnt, diff; > + > +out_IntrMask(0); > + > +in_IntrStatus(); > +in_Timer(); > +in_Timer(); > + > +/* Test 1. test counter continue and continue */ > +out_TimerInt(0); /* disable timer */ > +out_IntrStatus(0x4000); > +out_Timer(12345); /* reset timer to 0 */ > +curr = in_Timer(); > +if (curr > 0.1 * CLK) { > +fatal("time too big %u\n", curr); > +} > +for (cnt = 0; ; ) { > +clock_step(1 * NS_PER_SEC); > +prev = curr; > +curr = in_Timer(); > + > +/* test skip is in a specific range */ > +diff = (curr-prev) & 0xffffffffu; > +if (diff < from || diff > to) { > +fatal("Invalid diff %u (%u-%u)\n", diff, from, to); > +} > +if (curr < prev && ++cnt == 3) { > +break; > +} > +} > + > +/* Test 2. 
Check we didn't get an interrupt with TimerInt == 0 */ > +if (in_IntrStatus() & 0x4000) { > +fatal("got an interrupt\n"); > +} > + > +/* Test 3. Setting TimerInt to 1 and Timer to 0 get interrupt */ > +out_TimerInt(1); > +out_Timer(0); > +clock_step(40); > +if ((in_IntrStatus() & 0x4000) == 0) { > +fatal("we should have an interrupt here!\n"); > +} > + > +/* Test 3. Check acknowledge */ > +out_IntrStatus(0x4000); > +if (in_IntrStatus() & 0x4000) { > +fatal("got an interrupt\n"); > +} > + > +/* Test. Status set after Timer reset */ > +out_Timer(0); > +out_TimerInt(0); > +out_IntrStatus(0x4000); > +curr = in_Timer(); > +out_TimerInt(curr + 0.5 * CLK); > +clock_step(1 * NS_PER_SEC); > +out_Timer(0); > +
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/2] openpic: convert to vmstate
On 06.02.15 15:36, Mark Cave-Ayland wrote: > This patchset follows on from my ppc loadvm/savevm work and converts > openpic over to vmstate. > > With these patches applied, I can successfully run a savevm/loadvm pair > under -M mac99 within OpenBIOS, the same result as with the previous > patchset applied. > > Alex: if you're happy with this, I could fixup the previous patch in > my ppc loadvm/savevm patchset and append these two patches to the end > of the series? Sounds perfect! Reviewed-by: Alexander Graf Alex
Re: [Qemu-devel] [PATCH v3] qcow2: Rewrite qcow2_alloc_bytes()
Am 06.02.2015 um 15:39 hat Max Reitz geschrieben: > qcow2_alloc_bytes() is a function with insufficient error handling and > an unnecessary goto. This patch rewrites it. > > Signed-off-by: Max Reitz > --- > v3: > - Use alloc_clusters_noref() and update_refcount() [Kevin] > - Only modify s->free_byte_offset if the function is successful; this is > now necessary because update_refcount() is called unconditionally and > thus, if it failed and alloc_clusters_noref() had been called and > had returned a non-contiguous offset, s->free_byte_offset would point > to an unallocated cluster's head, which is both wrong in itself and > would also violate the assertion at the beginning of the function Looks much nicer, I think. :-) Thanks, applied to the block branch. Kevin
Re: [Qemu-devel] [PATCH v3] qcow2: Rewrite qcow2_alloc_bytes()
On 02/06/2015 07:39 AM, Max Reitz wrote: > qcow2_alloc_bytes() is a function with insufficient error handling and > an unnecessary goto. This patch rewrites it. > > Signed-off-by: Max Reitz > --- > v3: > - Use alloc_clusters_noref() and update_refcount() [Kevin] Ouch. Not done quite right. Kevin, you may want to remove this from your staging area, while Max works on a v4. > + > +free_in_cluster = s->cluster_size - offset_into_cluster(s, offset); > +if (!offset || free_in_cluster < size) { Let's consider all four possibilities: Case 1: !offset we won't be modifying any existing clusters. When we are done, the new cluster needs a refcount of 1 Case 2: free_in_cluster >= size we will only be modifying an existing cluster, assume it starts with refcount of 1. When we are done, it needs a refcount of 2 Case 3: free_in_cluster < size, allocation is not contiguous we won't be modifying any existing clusters. When we are done, the new cluster needs a refcount of 1 Case 4: free_in_cluster < size, allocation is contiguous we will be placing data into both an existing cluster and a new one; assume the existing cluster starts with a refcount of 1. When we are done, the old cluster needs a refcount of 2, and the new cluster needs a refcount of 1 > +int64_t new_cluster = alloc_clusters_noref(bs, s->cluster_size); This says the new cluster has a refcount of 0. > +if (new_cluster < 0) { > +return new_cluster; > +} > + > +if (!offset || ROUND_UP(offset, s->cluster_size) != new_cluster) { > +offset = new_cluster; > } > } > > -/* The cluster refcount was incremented, either by qcow2_alloc_clusters() > - * or explicitly by qcow2_update_cluster_refcount(). Refcount blocks > must > - * be flushed before the caller's L2 table updates. > - */ > +assert(offset); > +ret = update_refcount(bs, offset, size, 1, QCOW2_DISCARD_NEVER); Case 1: This incremented the new cluster. Good Case 2: This incremented the old cluster. Good Case 3: This incremented the new cluster. 
Good Case 4: This incremented the old cluster. But the new cluster remains at refcount 0. BAD. v2 got it right, because it always put the NEW cluster at refcount 1 (if there was a new cluster), then incremented the old cluster when needed. > +if (ret < 0) { > +return ret; > +} > + > +/* The cluster refcount was incremented; refcount blocks must be flushed > + * before the caller's L2 table updates. */ > qcow2_cache_set_dependency(bs, s->l2_table_cache, > s->refcount_block_cache); > + > +s->free_byte_offset = offset + size; > +if (!offset_into_cluster(s, s->free_byte_offset)) { > +s->free_byte_offset = 0; > +} > + > return offset; > } > > -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH v3] qcow2: Rewrite qcow2_alloc_bytes()
On 02/06/2015 08:27 AM, Eric Blake wrote: >> >> -/* The cluster refcount was incremented, either by >> qcow2_alloc_clusters() >> - * or explicitly by qcow2_update_cluster_refcount(). Refcount blocks >> must >> - * be flushed before the caller's L2 table updates. >> - */ >> +assert(offset); >> +ret = update_refcount(bs, offset, size, 1, QCOW2_DISCARD_NEVER); > > Case 1: This incremented the new cluster. Good > Case 2: This incremented the old cluster. Good > Case 3: This incremented the new cluster. Good > Case 4: This incremented the old cluster. But the new cluster remains at > refcount 0. BAD. Wait. Maybe I'm confused. You are requesting an update_refcount() across size bytes, and given the offset, that means that the code will round up to cover BOTH clusters in one call. Does update_refcount() properly increment from [ 1, 0 ] to [ 2, 1 ] when given a 2-cluster size (when offset, size is rounded up to cluster boundaries)? If so, then there is no bug after all.
Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/2] openpic: convert to vmstate
On 06/02/15 14:51, Alexander Graf wrote: > On 06.02.15 15:36, Mark Cave-Ayland wrote: >> This patchset follows on from my ppc loadvm/savevm work and converts >> openpic over to vmstate. >> >> With these patches applied, I can successfully run a savevm/loadvm pair >> under -M mac99 within OpenBIOS, the same result as with the previous >> patchset applied. >> >> Alex: if you're happy with this, I could fixup the previous patch in >> my ppc loadvm/savevm patchset and append these two patches to the end >> of the series? > > Sounds perfect! > > Reviewed-by: Alexander Graf Great! Assuming Juan and Amit have no further comments, I'll repost the augmented series in a couple of days. ATB, Mark.
Re: [Qemu-devel] [PATCH 0/3] softfloat: Remove STATUS macros
On 2 February 2015 at 21:37, Richard Henderson wrote: > On 02/02/2015 12:31 PM, Peter Maydell wrote: >> Peter Maydell (3): >> softfloat: Expand out the STATUS_PARAM macro >> softfloat: expand out STATUS_VAR >> softfloat: expand out STATUS macro >> >> fpu/softfloat-specialize.h | 124 ++-- >> fpu/softfloat.c| 1609 >> >> include/fpu/softfloat.h| 371 +- >> target-mips/msa_helper.c | 74 +- >> 4 files changed, 1188 insertions(+), 990 deletions(-) > > Reviewed-by: Richard Henderson Thanks. Unless anybody wants to object I think I'll apply these to master today (with the removal of the space you point out in patch 1), since they're kind of disruptive and I think that will overall minimise peoples' rebasing pain. -- PMM
Re: [Qemu-devel] [PATCH 1/2] target-i386: Simplify listflags() function
On 03/02/2015 19:08, Eduardo Habkost wrote: > listflags() had lots of unnecessary complexity. Instead of printing to a > buffer that will be immediately printed, simply call the printing > function directly. Also, remove the fbits and flags arguments that were > always set to the same value. Also, there's no need to list the flags in > reverse order. > > Signed-off-by: Eduardo Habkost > --- > target-i386/cpu.c | 42 ++ > 1 file changed, 14 insertions(+), 28 deletions(-) > > diff --git a/target-i386/cpu.c b/target-i386/cpu.c > index 3a9b32e..39d2fda 100644 > --- a/target-i386/cpu.c > +++ b/target-i386/cpu.c > @@ -1911,34 +1911,19 @@ static void x86_cpu_parse_featurestr(CPUState *cs, > char *features, > } > } > > -/* generate a composite string into buf of all cpuid names in featureset > - * selected by fbits. indicate truncation at bufsize in the event of > overflow. > - * if flags, suppress names undefined in featureset. > +/* Print all cpuid feature names in featureset > */ > -static void listflags(char *buf, int bufsize, uint32_t fbits, > - const char **featureset, uint32_t flags) > -{ > -const char **p = &featureset[31]; > -char *q, *b, bit; > -int nc; > - > -b = 4 <= bufsize ? buf + (bufsize -= 3) - 1 : NULL; > -*buf = '\0'; > -for (q = buf, bit = 31; fbits && bufsize; --p, fbits &= ~(1 << bit), > --bit) > -if (fbits & 1 << bit && (*p || !flags)) { > -if (*p) > -nc = snprintf(q, bufsize, "%s%s", q == buf ? "" : " ", *p); > -else > -nc = snprintf(q, bufsize, "%s[%d]", q == buf ? "" : " ", > bit); > -if (bufsize <= nc) { > -if (b) { > -memcpy(b, "...", sizeof("...")); > -} > -return; > -} > -q += nc; > -bufsize -= nc; > +static void listflags(FILE *f, fprintf_function print, const char > **featureset) > +{ > +int bit; > +bool first = true; > + > +for (bit = 0; bit < 32; bit++) { > +if (featureset[bit]) { > +print(f, "%s%s", first ? "" : " ", featureset[bit]); > +first = false; > } > +} > } > > /* generate CPU information. 
*/ > @@ -1963,8 +1948,9 @@ void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf) > for (i = 0; i < ARRAY_SIZE(feature_word_info); i++) { > FeatureWordInfo *fw = &feature_word_info[i]; > > -listflags(buf, sizeof(buf), (uint32_t)~0, fw->feat_names, 1); > -(*cpu_fprintf)(f, " %s\n", buf); > +(*cpu_fprintf)(f, " "); > +listflags(f, cpu_fprintf, fw->feat_names); > +(*cpu_fprintf)(f, "\n"); > } > } > > Reviewed-by: Paolo Bonzini
Re: [Qemu-devel] [PATCH 0/4] target-arm: fix various clang UB sanitizer warnings
On Fri, Feb 6, 2015 at 8:34 AM, Peter Maydell wrote: > This patchset fixes a collection of warnings emitted by the clang > undefined behaviour sanitizer in the course of booting an AArch64 > Linux guest to a shell prompt. These are all various kinds of bad > shift (shifting into the sign bit, left shifting negative values, > shifting by more than the size of the data type). > > There remains one set of warnings which I'm not sure how to > approach. We have an enum for the valid syndrome register EC > field values: > > enum arm_exception_class { > EC_UNCATEGORIZED = 0x00, > [...] > EC_VECTORCATCH = 0x3a, > EC_AA64_BKPT = 0x3c, > }; > > The EC field is a 6 bit field at the top of the 32 bit syndrome > register, so we define > > #define ARM_EL_EC_SHIFT 26 > > and have utility functions that assemble syndrome values like: > > static inline uint32_t syn_aa64_bkpt(uint32_t imm16) > { > return (EC_AA64_BKPT << ARM_EL_EC_SHIFT) | ARM_EL_IL | (imm16 & > 0xffff); > } > > Unfortunately this is UB, because C says that enums can be > signed types, and if the enum is signed then "EC_AA64_BKPT << 26" > is shifting into the sign bit of a signed type. > > I can't see any nice way of avoiding this. Possible nasty ways: > > (1) Drop the enum and just use old-fashioned > #define EC_AA64_BKPT 0x3cU > since then we can force them to be unsigned > (2) Cast the constant to unsigned at point of use > (3) Keep the enum and add defines wrapping actual use > #define EC_AA64_BKPT ((unsigned)EC_AA64_BKPT) > I guess there's also > (4) Ignore the problem and trust that the compiler developers will > never decide to use this bit of undefined behaviour. > > My preference is (1) I suppose (we don't actually use the > enum for anything except to define the values, so if it's > not providing helpful definitions we should drop it). > > Opinions? > Similar to the #define approach, but possibly with the benefits of the enum. What if you declare each enum separately as const unsigned int? 
> > Peter Maydell (4): > target-arm: A64: Fix shifts into sign bit > target-arm: A64: Fix handling of rotate in logic_imm_decode_wmask > target-arm: A64: Avoid left shifting negative integers in > disas_pc_rel_addr > target-arm: A64: Avoid signed shifts in disas_ldst_pair() > > target-arm/translate-a64.c | 18 +++--- > 1 file changed, 11 insertions(+), 7 deletions(-) > > -- > 1.9.1 > > >
Re: [Qemu-devel] [PATCH v2 0/1] dataplane vs. endianness
On Mon, Jan 26, 2015 at 05:26:41PM +0100, Cornelia Huck wrote: > Stefan: > > Here's v2 of my endianness patch for dataplane, with the extraneous > vdev argument dropped from get_desc(). > > I originally planned to send my virtio-1 patchset as well, but I haven't > found the time for it; therefore, I think this should be applied > independently. > > David: I take it your r-b still holds? > > Cornelia Huck (1): > dataplane: endianness-aware accesses > > hw/block/dataplane/virtio-blk.c | 4 +- > hw/scsi/virtio-scsi-dataplane.c | 2 +- > hw/virtio/Makefile.objs | 2 +- > hw/virtio/dataplane/Makefile.objs | 2 +- > hw/virtio/dataplane/vring.c | 53 --- > include/hw/virtio/dataplane/vring-accessors.h | 75 > +++ > include/hw/virtio/dataplane/vring.h | 14 + > 7 files changed, 117 insertions(+), 35 deletions(-) > create mode 100644 include/hw/virtio/dataplane/vring-accessors.h I will merge this into the block tree for the Friday, 13th of February pull request. Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] [PULL 0/6] Net patches
On 6 February 2015 at 14:10, Stefan Hajnoczi wrote: > The following changes since commit 16017c48547960539fcadb1f91d252124f442482: > > softfloat: Clarify license status (2015-01-29 16:45:45 +) > > are available in the git repository at: > > git://github.com/stefanha/qemu.git tags/net-pull-request > > for you to fetch changes up to 2c4681f512822b4aa35371683e164d4818f21dce: > > monitor: more accurate completion for host_net_remove() (2015-02-06 > 14:06:45 +) > > > > Applied, thanks. -- PMM
Re: [Qemu-devel] [PATCH 0/4] target-arm: fix various clang UB sanitizer warnings
On 02/06/2015 07:57 AM, Greg Bellows wrote: > Similar to the #define approach, but possibly with the benefits of the enum. > What if you declare each enum separately as const unsigned int? This is not C++. That won't do what you want. r~
Re: [Qemu-devel] [PATCH v3 04/19] libqos/ahci: Add command header helpers
On Thu, Feb 05, 2015 at 12:41:15PM -0500, John Snow wrote: > Adds command header helper functions: > -ahci_command_header_set > -ahci_command_header_get, > -ahci_command_destroy, and > -ahci_cmd_pick > > These helpers help to quickly manage the command header information in > the AHCI device. > > ahci_command_header_set and get will store or retrieve an AHCI command > header, respectively. > > ahci_cmd_pick chooses the first available but least recently used > command slot to allow us to cycle through the available command slots. > > ahci_command_destroy obliterates all information contained within a > given slot's command header, and frees its associated command table, > but not its DMA buffer! > > Lastly, the command table pointer fields (dba and dbau) are merged into > a single 64bit value to make managing 64bit tests simpler. > > Signed-off-by: John Snow > --- > tests/ahci-test.c | 43 -- > tests/libqos/ahci.c | 75 > + > tests/libqos/ahci.h | 17 > 3 files changed, 110 insertions(+), 25 deletions(-) Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] [PATCH v3 10/19] qtest/ahci: add ahci_write_fis
On Thu, Feb 05, 2015 at 12:41:21PM -0500, John Snow wrote:
> Similar to ahci_set_command_header, add a helper that takes an
> in-memory representation of a command FIS and writes it to guest
> memory, handling endianness as-needed.
>
> Signed-off-by: John Snow
> ---
>  tests/ahci-test.c   |  2 +-
>  tests/libqos/ahci.c | 14 ++
>  tests/libqos/ahci.h |  3 ++-
>  3 files changed, 17 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] Google Summer of Code 2015 - Implement Mac OS 9 support
On Thu, Feb 05, 2015 at 11:07:40AM -0500, Programmingkid wrote:
> Implement support for Mac OS 9 in QEMU.
>
> QEMU has gone a long way in emulating a Macintosh. But we can still improve.
> Adding support for Mac OS 9 would be a great improvement. This would allow
> everyone who misses their older applications to be reacquainted with them. It
> would also expand QEMU's abilities.

What is missing? Please explain which features need to be added so Mac
OS 9 can run.

Stefan
Re: [Qemu-devel] [PATCH 0/4] target-arm: fix various clang UB sanitizer warnings
On 02/06/2015 06:34 AM, Peter Maydell wrote:
> I can't see any nice way of avoiding this. Possible nasty ways:
>
> (1) Drop the enum and just use old-fashioned
>        #define EC_AA64_BKPT 0x3cU
>     since then we can force them to be unsigned
> (2) Cast the constant to unsigned at point of use
> (3) Keep the enum and add defines wrapping actual use
>        #define EC_AA64_BKPT ((unsigned)EC_AA64_BKPT)
>
> I guess there's also
> (4) Ignore the problem and trust that the compiler developers will
>     never decide to use this bit of undefined behaviour.

It should be enough to simply add the unsigned suffix to the integers
as they are, forcing the underlying type to be unsigned. At least
that's suggested by the simple test case

  extern void link_error(void);
  enum X { bot = 1u } x = bot;
  int main() { if (-x < 0) link_error(); return 0; }

  $ clang -O z.c
  z.c:3:21: warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare]
  int main() { if (-x < 0) link_error(); return 0; }
                   ~~ ^ ~
  $ gcc -O -Wall z.c
  $

r~
Re: [Qemu-devel] [PATCH v3] qcow2: Rewrite qcow2_alloc_bytes()
On 06.02.2015 at 16:31, Eric Blake wrote:
> On 02/06/2015 08:27 AM, Eric Blake wrote:
>
>>>
>>> -/* The cluster refcount was incremented, either by qcow2_alloc_clusters()
>>> - * or explicitly by qcow2_update_cluster_refcount(). Refcount blocks must
>>> - * be flushed before the caller's L2 table updates.
>>> - */
>>> +assert(offset);
>>> +ret = update_refcount(bs, offset, size, 1, QCOW2_DISCARD_NEVER);
>>
>> Case 1: This incremented the new cluster. Good
>> Case 2: This incremented the old cluster. Good
>> Case 3: This incremented the new cluster. Good
>> Case 4: This incremented the old cluster. But the new cluster remains at
>> refcount 0. BAD.
>
> Wait. Maybe I'm confused. You are requesting an update_refcount()
> across size bytes, and given the offset, that means that the code will
> round up to cover BOTH clusters in one call. Does update_refcount()
> properly increment from [ 1, 0 ] to [ 2, 1 ] when given a 2-cluster
> size (when offset, size is rounded up to cluster boundaries)? If so,
> then there is no bug after all.

Yes, this is what it should be doing. Letting one update_refcount()
call (more or less) atomically increment both refcounts was the whole
point of my suggestion anyway, because that is what enables the
simplification.

Kevin
Re: [Qemu-devel] [PATCH 2/5] virtio: increase VIRITO_QUEUE_MAX to 513
On Fri, Feb 06, 2015 at 03:54:10PM +0800, Jason Wang wrote:

Subject has a typo, s/VIRITO/VIRTIO/

> Recent linux kernel supports up to 256 tap queues. Increase the limit
> to 513 (256 * 2 + 1 (ctrl vq)).

For other reviewers: sizeof(VirtQueue) on x86_64 is 88 bytes. We waste
memory but it's only on the order of 40 KB, so I think this is okay.
Re: [Qemu-devel] [PATCH 0/5] Support more virtio queues
On Fri, Feb 06, 2015 at 03:54:08PM +0800, Jason Wang wrote:
> We currently limit the max virtio queues to 64. This is not sufficient
> to support some multiqueue devices (e.g. recent Linux supports up to 256
> tap queues). So this series tries to let virtio support more queues.
>
> Not much work needs to be done except:
>
> - Patch 1 renames VIRTIO_PCI_QUEUE_MAX to a more generic name:
>   VIRTIO_QUEUE_MAX
> - Patch 2 simply increases the limitation to 513 (256 queue pairs)
> - Patch 3 fixes a bug in virtio_del_queue()
> - Patch 4 tries to remove the hard-coded MSI-X bar size (4096) and allows up to
>   2048 MSI-X entries.
> - Patch 5 adds a boolean property so that virtio-net can calculate
>   the MSI-X bar size based on the number of MSI-X vectors and keep
>   migration compatibility with legacy versions whose bar size is 4096.
>
> With this series, we can support up to 256 queues. Since x86 can only
> allow about 240 interrupt vectors for MSI-X, the current Linux driver
> can only have about 80 queue pairs with their private MSI-X interrupt
> vectors. With IRQ sharing between queue pairs (RFC posted in
> https://lkml.org/lkml/2014/12/25/169), the Linux driver can have up
> to about 186 queue pairs with their private MSI-X interrupt vectors.

Worth mentioning that the change to VIRTIO_QUEUE_MAX is safe from a
live migration perspective: Old QEMUs will not accept migration from
new QEMUs launched with >64 queues:

  Invalid number of PCI queues: 0x41

Other than that, live migration works fine.

Stefan
Re: [Qemu-devel] [v2][RFC][PATCH] virtio: uniform virtio device IDs
On Fri, Feb 06, 2015 at 01:41:26PM +0800, Tiejun Chen wrote:
> Actually we define these device IDs in the virtio standard, so
> we'd better put them into one common place to manage them conveniently.
> Here I also add VIRTIO_ID_RESERVE according to the virtio spec.
>
> Signed-off-by: Tiejun Chen
> ---
>  hw/9pfs/virtio-9p.h                |  2 --
>  include/hw/virtio/virtio-balloon.h |  3 ---
>  include/hw/virtio/virtio-blk.h     |  3 ---
>  include/hw/virtio/virtio-rng.h     |  3 ---
>  include/hw/virtio/virtio-scsi.h    |  3 ---
>  include/hw/virtio/virtio-serial.h  |  3 ---
>  include/hw/virtio/virtio.h         | 16
>  pc-bios/s390-ccw/virtio.h          |  8 +---
>  8 files changed, 17 insertions(+), 24 deletions(-)

Reviewed-by: Stefan Hajnoczi
[Qemu-devel] [PULL 01/42] Restore atapi_dma flag across migration
From: "Dr. David Alan Gilbert"

If a migration happens just after the guest has kicked off an ATAPI
command and kicked off DMA, we lose the atapi_dma flag, and the
destination tries to complete the command as PIO rather than DMA. This
upsets Linux; modern libata based kernels stumble and recover OK, older
kernels end up passing bad data to userspace.

Signed-off-by: Dr. David Alan Gilbert
Reviewed-by: John Snow
Signed-off-by: Kevin Wolf
---
 hw/ide/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index d4af5e2..ac3f015 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2417,6 +2417,7 @@ static int ide_drive_pio_post_load(void *opaque, int version_id)
     s->end_transfer_func = transfer_end_table[s->end_transfer_fn_idx];
     s->data_ptr = s->io_buffer + s->cur_io_buffer_offset;
     s->data_end = s->data_ptr + s->cur_io_buffer_len;
+    s->atapi_dma = s->feature & 1; /* as per cmd_packet */
     return 0;
 }
--
1.8.3.1
[Qemu-devel] [PULL 00/42] Block patches
The following changes since commit cebbae86b4f7ee3d3dd9df906b97d269e70d9cc7:

  Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into staging (2015-02-06 14:35:52 +)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 8c44dfbc62a50a8bc4113f199b8662861f757591:

  qcow2: Rewrite qcow2_alloc_bytes() (2015-02-06 17:24:22 +0100)

Block patches for 2.3

Alberto Garcia (1):
      block: Give always priority to unused entries in the qcow2 L2 cache

Denis V. Lunev (7):
      block/raw-posix: create translate_err helper to merge errno values
      block/raw-posix: create do_fallocate helper
      block/raw-posix: refactor handle_aiocb_write_zeroes a bit
      block: use fallocate(FALLOC_FL_ZERO_RANGE) in handle_aiocb_write_zeroes
      block/raw-posix: call plain fallocate in handle_aiocb_write_zeroes
      block: use fallocate(FALLOC_FL_PUNCH_HOLE) & fallocate(0) to write zeroes
      nbd: fix max_discard/max_transfer_length

Don Slutz (1):
      qemu-img: Add QEMU_PKGVERSION to QEMU_IMG_VERSION

Dr. David Alan Gilbert (2):
      Restore atapi_dma flag across migration
      atapi migration: Throw recoverable error to avoid recovery

Fam Zheng (2):
      qed: Really remove unused field QEDAIOCB.finished
      qemu-iotests: Fix supported_oses check

Francesco Romani (1):
      block: add event when disk usage exceeds threshold

Jeff Cody (1):
      block: fix off-by-one error in qcow and qcow2

Max Reitz (6):
      iotests: Specify format for qemu-nbd
      iotests: Fix 083
      iotests: Fix 100 for nbd
      iotests: Fix 104 for NBD
      nbd: Improve error messages
      qcow2: Rewrite qcow2_alloc_bytes()

Peter Lieven (7):
      block: change default for discard and write zeroes to INT_MAX
      block: add accounting for merged requests
      hw/virtio-blk: add a constant for max number of merged requests
      block-backend: expose bs->bl.max_transfer_length
      virtio-blk: introduce multiread
      virtio-blk: add a knob to disable request merging
      block: introduce BDRV_REQUEST_MAX_SECTORS

Peter Wu (12):
      block/dmg: properly detect the UDIF trailer
      block/dmg: extract mish block decoding functionality
      block/dmg: extract processing of resource forks
      block/dmg: process a buffer instead of reading ints
      block/dmg: validate chunk size to avoid overflow
      block/dmg: process XML plists
      block/dmg: set virtual size to a non-zero value
      block/dmg: fix sector data offset calculation
      block/dmg: use SectorNumber from BLKX header
      block/dmg: factor out block type check
      block/dmg: support bzip2 block entry types
      block/dmg: improve zeroes handling

Stefan Hajnoczi (2):
      qed: check for header size overflow
      qemu-iotests: add 116 invalid QED input file tests

 block.c                         |  35 +--
 block/Makefile.objs             |   2 +
 block/accounting.c              |   7 +
 block/block-backend.c           |   5 +
 block/dmg.c                     | 502 ++-
 block/nbd-client.c              |   4 +-
 block/nbd-client.h              |   2 +-
 block/nbd.c                     |  11 +-
 block/qapi.c                    |   5 +
 block/qcow.c                    |   2 +-
 block/qcow2-cache.c             |   4 +-
 block/qcow2-refcount.c          |  78 +++---
 block/qcow2.c                   |   2 +-
 block/qed.c                     |   5 +
 block/qed.h                     |   1 -
 block/raw-posix.c               | 125 +++---
 block/write-threshold.c         | 125 ++
 configure                       |  50
 hmp.c                           |   6 +-
 hw/block/dataplane/virtio-blk.c |   8 +-
 hw/block/virtio-blk.c           | 299 +++
 hw/ide/atapi.c                  |  17 ++
 hw/ide/core.c                   |   1 +
 hw/ide/internal.h               |   2 +
 hw/ide/pci.c                    |  11 +
 include/block/accounting.h      |   3 +
 include/block/block.h           |   3 +
 include/block/block_int.h       |   4 +
 include/block/nbd.h             |   2 +-
 include/block/write-threshold.h |  64 +
 include/hw/virtio/virtio-blk.h  |  18 +-
 include/sysemu/block-backend.h  |   1 +
 nbd.c                           |  42 ++--
 qapi/block-core.json            |  60 -
 qemu-img.c                      |   2 +-
 qemu-nbd.c                      |   7 +-
 qmp-commands.hx                 |  54 -
 tests/Makefile                  |   3 +
 tests/qemu-iotests/067.out      |   5 +
 tests/qemu-iotests/083          |   3 +-
 tests/qemu-iotests/083.out      |  81 +++
 tests/qemu-iotests/100          |  12 +
 tests/qemu-iotests/104          |   9 +-
 tests/qemu-iotests/116          |  96
 tests/qemu-iotests
[Qemu-devel] [PULL 10/42] qemu-img: Add QEMU_PKGVERSION to QEMU_IMG_VERSION
From: Don Slutz

This is the same way vl.c handles this.

Signed-off-by: Don Slutz
Reviewed-by: Eric Blake
Signed-off-by: Kevin Wolf
---
 qemu-img.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4e9a7f5..e148af8 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -35,7 +35,7 @@
 #include "block/qapi.h"
 #include

-#define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION \
+#define QEMU_IMG_VERSION "qemu-img version " QEMU_VERSION QEMU_PKGVERSION \
     ", Copyright (c) 2004-2008 Fabrice Bellard\n"

 typedef struct img_cmd_t {
--
1.8.3.1