Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()
On 07/28/2011 12:44 AM, Blue Swirl wrote: On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino wrote: > This function should be used when the VM is not supposed to resume > execution (eg. by issuing 'cont' monitor command). > > Today, we allow the user to resume execution even when: > >o the guest shuts down and -no-shutdown is used >o there's a kvm internal error >o loading the VM state with -loadvm or "loadvm" in the monitor fails > > I think only badness can happen from the cases above. I'd suppose a system_reset should bring the system back to sanity and then clear vm_permanent_stopped (where's -ly?) except maybe for KVM internal error if that can't be recovered. Then it would not very permanent anymore, so the name would need adjusting. Currently, all kvm internal errors are recoverable by reset (and possibly by fiddling with memory/registers). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention
This might be a bit late comment... On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote: > diff --git a/hw/pci_host.c b/hw/pci_host.c > index 728e2d4..bfdc321 100644 > --- a/hw/pci_host.c > +++ b/hw/pci_host.c > @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus > *bus, uint32_t addr) > return pci_find_device(bus, bus_num, devfn); > } > > +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr, > + uint32_t limit, uint32_t val, uint32_t len) > +{ > +assert(len <= 4); > +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr)); > +} > + > +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr, > +uint32_t limit, uint32_t len) > +{ > +assert(len <= 4); > +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr)); > +} > + Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < addr. So we need explicit "if (limit < addr) return;". Here's the patch for pci branch. >From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001 Message-Id: <75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp> From: Isaku Yamahata Date: Thu, 28 Jul 2011 16:20:28 +0900 Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common This patch adds boundary check in pci_host_config_read/write_common() Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < addr. So we need explicit "if (limit <= addr) return;" Signed-off-by: Isaku Yamahata --- hw/pci_host.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/hw/pci_host.c b/hw/pci_host.c index 2e8a29f..71fd3a1 100644 --- a/hw/pci_host.c +++ b/hw/pci_host.c @@ -51,6 +51,9 @@ void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr, uint32_t limit, uint32_t val, uint32_t len) { assert(len <= 4); +if (limit <= addr) { +return; +} pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr)); } @@ -58,6 +61,9 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr, uint32_t limit, uint32_t len) { assert(len <= 4); +if (limit <= addr) { +return 0; +} return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr)); } -- 1.7.1.1 -- yamahata
Re: [Qemu-devel] [PATCH v2 3/5] balloon: Ignore negative balloon values
Amit Shah writes: > Negative balloon values don't make sense, ignore them. Actually, they aren't ignored, they're rejected.
Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks
Amit Shah writes: > Add an exit handler that will free up RAM and unregister the savevm > section after a virtio-balloon device is unplugged. > > Signed-off-by: Amit Shah > --- > hw/virtio-balloon.c |5 + > hw/virtio-pci.c | 11 ++- > hw/virtio.h |1 + > 3 files changed, 16 insertions(+), 1 deletions(-) > > diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c > index 26ee364..0ce0049 100644 > --- a/hw/virtio-balloon.c > +++ b/hw/virtio-balloon.c > @@ -297,3 +297,8 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev) > > return &s->vdev; > } > + > +void virtio_balloon_exit(VirtIODevice *vdev) > +{ > +virtio_cleanup(vdev); > +} > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c > index ca5f125..316bf92 100644 > --- a/hw/virtio-pci.c > +++ b/hw/virtio-pci.c > @@ -795,6 +795,15 @@ static int virtio_balloon_init_pci(PCIDevice *pci_dev) > return 0; > } > > +static int virtio_balloon_exit_pci(PCIDevice *pci_dev) > +{ > +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev); > + > +virtio_pci_stop_ioeventfd(proxy); > +virtio_balloon_exit(proxy->vdev); > +return virtio_exit_pci(pci_dev); > +} > + Same code in every other virtio_*_exit_pci(). Suggests there's something wrong with the generic code. Outside the scope of this series, of course. [...]
Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model
On 07/27/2011 10:01 PM, Anthony Liguori wrote: That's milkymist, not GoldFish. Oh, Goldfish is fake. It's not real hardware. The enumerator device is not a real device. It's weird because it's imaginary and was designed specifically within QEMU. It's not a good example for discussing modelling. There are plenty of PV interfaces implemented by QEMU. Would you say the same of virtio? Paolo
Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks
Amit Shah writes: > On (Thu) 28 Jul 2011 [11:47:15], Amit Shah wrote: >> Add an exit handler that will free up RAM and unregister the savevm >> section after a virtio-balloon device is unplugged. > > This commit message should be changed; I'll do that in the pull > request I send out. You mean drop "unregister the savevm", because it has become PATCH 5/5?
[Qemu-devel] [PATCH 0/8] qcow2 cleanup for coroutine-block branch
This is a collection of patches that remove QCowAIOCB, useless with coroutines. It apply to Kevin coroutine-block branch. I leave all step I did to remove the structure, feel free to collapse some of them. I tested with iotests and got no changes (026 fails like coroutine-block branch). Frediano Ziglio (8): qcow2: removed unused fields qcow2: removed cur_nr_sectors field in QCowAIOCB qcow2: remove l2meta from QCowAIOCB qcow2: remove cluster_offset from QCowAIOCB qcow2: remove common from QCowAIOCB qcow2: reindent and use while before the big jump qcow2: removed QCowAIOCB entirely qcow2: remove memory leak block/qcow2.c | 396 - 1 files changed, 167 insertions(+), 229 deletions(-)
[Qemu-devel] [PATCH 1/8] qcow2: removed unused fields
Signed-off-by: Frediano Ziglio --- block/qcow2.c | 10 +++--- 1 files changed, 3 insertions(+), 7 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index edc068e..5c454bb 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -381,11 +381,8 @@ typedef struct QCowAIOCB { uint64_t bytes_done; uint64_t cluster_offset; uint8_t *cluster_data; -bool is_write; QEMUIOVector hd_qiov; -QEMUBH *bh; QCowL2Meta l2meta; -QLIST_ENTRY(QCowAIOCB) next_depend; } QCowAIOCB; /* @@ -517,13 +514,12 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, BlockDriverCompletionFunc *cb, - void *opaque, int is_write, QCowAIOCB *acb) + void *opaque, QCowAIOCB *acb) { memset(acb, 0, sizeof(*acb)); acb->common.bs = bs; acb->sector_num = sector_num; acb->qiov = qiov; -acb->is_write = is_write; qemu_iovec_init(&acb->hd_qiov, qiov->niov); @@ -543,7 +539,7 @@ static int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num, QCowAIOCB acb; int ret; -qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, 0, &acb); +qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, &acb); qemu_co_mutex_lock(&s->lock); do { @@ -658,7 +654,7 @@ static int qcow2_co_writev(BlockDriverState *bs, QCowAIOCB acb; int ret; -qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, 1, &acb); +qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, &acb); s->cluster_cache_offset = -1; /* disable compressed cache */ qemu_co_mutex_lock(&s->lock); -- 1.7.1
[Qemu-devel] [PATCH 2/8] qcow2: removed cur_nr_sectors field in QCowAIOCB
Signed-off-by: Frediano Ziglio --- block/qcow2.c | 98 + 1 files changed, 43 insertions(+), 55 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 5c454bb..0cf4465 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -377,7 +377,6 @@ typedef struct QCowAIOCB { int64_t sector_num; QEMUIOVector *qiov; int remaining_sectors; -int cur_nr_sectors;/* number of sectors in current iteration */ uint64_t bytes_done; uint64_t cluster_offset; uint8_t *cluster_data; @@ -395,42 +394,22 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) BDRVQcowState *s = bs->opaque; int index_in_cluster, n1; int ret; - -/* post process the read buffer */ -if (!acb->cluster_offset) { -/* nothing to do */ -} else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) { -/* nothing to do */ -} else { -if (s->crypt_method) { -qcow2_encrypt_sectors(s, acb->sector_num, acb->cluster_data, -acb->cluster_data, acb->cur_nr_sectors, 0, &s->aes_decrypt_key); -qemu_iovec_reset(&acb->hd_qiov); -qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, -acb->cur_nr_sectors * 512); -qemu_iovec_from_buffer(&acb->hd_qiov, acb->cluster_data, -512 * acb->cur_nr_sectors); -} -} - -acb->remaining_sectors -= acb->cur_nr_sectors; -acb->sector_num += acb->cur_nr_sectors; -acb->bytes_done += acb->cur_nr_sectors * 512; +int cur_nr_sectors; /* number of sectors in current iteration */ if (acb->remaining_sectors == 0) { /* request completed */ return 0; } -/* prepare next AIO request */ -acb->cur_nr_sectors = acb->remaining_sectors; +/* prepare next request */ +cur_nr_sectors = acb->remaining_sectors; if (s->crypt_method) { -acb->cur_nr_sectors = MIN(acb->cur_nr_sectors, +cur_nr_sectors = MIN(cur_nr_sectors, QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); } ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9, -&acb->cur_nr_sectors, &acb->cluster_offset); +&cur_nr_sectors, &acb->cluster_offset); if (ret < 0) { return ret; } @@ -439,14 +418,14 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) qemu_iovec_reset(&acb->hd_qiov); qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, -acb->cur_nr_sectors * 512); +cur_nr_sectors * 512); if (!acb->cluster_offset) { if (bs->backing_hd) { /* read from the base image */ n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov, -acb->sector_num, acb->cur_nr_sectors); +acb->sector_num, cur_nr_sectors); if (n1 > 0) { BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO); qemu_co_mutex_unlock(&s->lock); @@ -457,11 +436,9 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) return ret; } } -return 1; } else { /* Note: in this case, no need to wait */ -qemu_iovec_memset(&acb->hd_qiov, 0, 512 * acb->cur_nr_sectors); -return 1; +qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors); } } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) { /* add AIO support for compressed blocks ? */ @@ -472,9 +449,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) qemu_iovec_from_buffer(&acb->hd_qiov, s->cluster_cache + index_in_cluster * 512, -512 * acb->cur_nr_sectors); - -return 1; +512 * cur_nr_sectors); } else { if ((acb->cluster_offset & 511) != 0) { return -EIO; @@ -490,24 +465,37 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); } -assert(acb->cur_nr_sectors <= +assert(cur_nr_sectors <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); qemu_iovec_reset(&acb->hd_qiov); qemu_iovec_add(&acb->hd_qiov, acb->cluster_data, -512 * acb->cur_nr_sectors); +512 * cur_nr_sectors); } BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); qemu_co_mutex_unlock(&s->lock); ret = bdrv_co_readv(bs->file, (acb->cluster_offset >> 9) + index_in_cluster, -acb->cur_nr_sectors, &acb->hd_qiov); +cur_nr_sectors, &acb->hd_qiov); qemu_co_mutex_lock(&s->lock); if (ret < 0) { return ret; } +if (s->crypt_method) { +qcow2_encrypt_sectors(s, acb->sector_num, acb->cluster_data, +acb->cluster_data, cur_nr_sectors, 0, &s->aes_decrypt_key);
[Qemu-devel] [PATCH 3/8] qcow2: remove l2meta from QCowAIOCB
Signed-off-by: Frediano Ziglio --- block/qcow2.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 0cf4465..adf31ce 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -381,7 +381,6 @@ typedef struct QCowAIOCB { uint64_t cluster_offset; uint8_t *cluster_data; QEMUIOVector hd_qiov; -QCowL2Meta l2meta; } QCowAIOCB; /* @@ -514,8 +513,6 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num, acb->bytes_done = 0; acb->remaining_sectors = nb_sectors; acb->cluster_offset = 0; -acb->l2meta.nb_clusters = 0; -qemu_co_queue_init(&acb->l2meta.dependent_requests); return acb; } @@ -566,6 +563,10 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) int n_end; int ret; int cur_nr_sectors; /* number of sectors in current iteration */ +QCowL2Meta l2meta; + +l2meta.nb_clusters = 0; +qemu_co_queue_init(&l2meta.dependent_requests); if (acb->remaining_sectors == 0) { /* request completed */ @@ -579,12 +580,12 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors; ret = qcow2_alloc_cluster_offset(bs, acb->sector_num << 9, -index_in_cluster, n_end, &cur_nr_sectors, &acb->l2meta); +index_in_cluster, n_end, &cur_nr_sectors, &l2meta); if (ret < 0) { return ret; } -acb->cluster_offset = acb->l2meta.cluster_offset; +acb->cluster_offset = l2meta.cluster_offset; assert((acb->cluster_offset & 511) == 0); qemu_iovec_reset(&acb->hd_qiov); @@ -618,9 +619,9 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) return ret; } -ret = qcow2_alloc_cluster_link_l2(bs, &acb->l2meta); +ret = qcow2_alloc_cluster_link_l2(bs, &l2meta); -run_dependent_requests(s, &acb->l2meta); +run_dependent_requests(s, &l2meta); if (ret < 0) { return ret; -- 1.7.1
[Qemu-devel] [PATCH 8/8] qcow2: remove memory leak
Signed-off-by: Frediano Ziglio --- block/qcow2.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 6073568..aed8da7 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -492,6 +492,7 @@ fail: qemu_co_mutex_unlock(&s->lock); qemu_iovec_destroy(&hd_qiov); +qemu_free(cluster_data); return ret; } @@ -604,6 +605,7 @@ fail: qemu_co_mutex_unlock(&s->lock); qemu_iovec_destroy(&hd_qiov); +qemu_free(cluster_data); return ret; } -- 1.7.1
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Am 27.07.2011 21:57, schrieb Christoph Hellwig: > On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote: >> Also I notice that combining XFS, Linux AIO, O_DIRECT and O_DSYNC give >> impressive performance but currently there is no way to specify all that >> flags together cause nocache enable O_DIRECT while O_DSYNC is enabled with >> writethrough. > > Indeed. This has come up a few times, and actually is a mostly trivial > task. Maybe we should give up waiting for -blockdev and separate cache > mode settings and allow a nocache-writethrough or similar mode now? It's > going to be around 10 lines of code + documentation. I understand that there may be reasons for using O_DIRECT | O_DSYNC, but what is the explanation for O_DSYNC improving performance? Christoph, on another note: Can we rely on Linux AIO never returning short writes except on EOF? Currently we return -EINVAL in this case, so I hope it's true or we wouldn't return the correct error code. The reason why I'm asking is because I want to allow reads across EOF for growable images and pad with zeros (the synchronous code does this today in order to allow bdrv_pread/pwrite to work, and when we start using coroutines in the block layer, these cases will hit the AIO paths). Kevin
Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug
Amit Shah writes: > Migrating after unplugging a virtio-balloon device resulted in an error > message on the destination: > > Unknown savevm section or instance ':00:04.0/virtio-balloon' 0 > load of migration failed > > Fix this by unregistering the section on device unplug. > > Signed-off-by: Amit Shah > --- > hw/virtio-balloon.c |4 > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c > index 0ce0049..072a88a 100644 > --- a/hw/virtio-balloon.c > +++ b/hw/virtio-balloon.c > @@ -45,6 +45,7 @@ typedef struct VirtIOBalloon > size_t stats_vq_offset; > MonitorCompletion *stats_callback; > void *stats_opaque_callback_data; > +DeviceState *qdev; > } VirtIOBalloon; All the other virtio device structs already have such a pointer back to the proxy. Suggests that it should live in VirtIODevice, and be set up in generic code. Again, outside the scope of this series. I hate the virtio pointer thicket. [...]
Re: [Qemu-devel] [PATCH v2 0/5] balloon: fix memleaks, invalid arguments, unplug
Amit Shah writes: > Hello, > > This series is on top of the other balloon series for which I sent a > pull request on Tuesday. > > This series fixes memleak on exit, unregisters the savevm section on > unplug, disallows negative values as ballooning targets and doesn't > allow multiple balloon device registrations. > > v2 contains small tweaks suggested: > - Error is shown by balloon.c instead of virtio-balloon.c in patch 1 >(mst) > - Filter out negative input in do_balloon() and add qerror message >(Markus) > - Separate out savevm section unregistering from memleak fix Reviewed-by: Markus Armbruster
[Qemu-devel] [PATCH 7/8] qcow2: removed QCowAIOCB entirely
Signed-off-by: Frediano Ziglio --- block/qcow2.c | 207 ++--- 1 files changed, 80 insertions(+), 127 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index dfb969e..6073568 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -372,83 +372,77 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov, return n1; } -typedef struct QCowAIOCB { -BlockDriverState *bs; -int64_t sector_num; -QEMUIOVector *qiov; -int remaining_sectors; -uint64_t bytes_done; -uint8_t *cluster_data; -QEMUIOVector hd_qiov; -} QCowAIOCB; - -/* - * Returns 0 when the request is completed successfully, 1 when there is still - * a part left to do and -errno in error cases. - */ -static int qcow2_aio_read_cb(QCowAIOCB *acb) +static int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num, + int remaining_sectors, QEMUIOVector *qiov) { -BlockDriverState *bs = acb->bs; BDRVQcowState *s = bs->opaque; int index_in_cluster, n1; int ret; int cur_nr_sectors; /* number of sectors in current iteration */ uint64_t cluster_offset = 0; +uint64_t bytes_done = 0; +QEMUIOVector hd_qiov; +uint8_t *cluster_data = NULL; -while (acb->remaining_sectors != 0) { +qemu_iovec_init(&hd_qiov, qiov->niov); + +qemu_co_mutex_lock(&s->lock); + +while (remaining_sectors != 0) { /* prepare next request */ -cur_nr_sectors = acb->remaining_sectors; +cur_nr_sectors = remaining_sectors; if (s->crypt_method) { cur_nr_sectors = MIN(cur_nr_sectors, QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); } -ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9, +ret = qcow2_get_cluster_offset(bs, sector_num << 9, &cur_nr_sectors, &cluster_offset); if (ret < 0) { -return ret; +goto fail; } -index_in_cluster = acb->sector_num & (s->cluster_sectors - 1); +index_in_cluster = sector_num & (s->cluster_sectors - 1); -qemu_iovec_reset(&acb->hd_qiov); -qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, +qemu_iovec_reset(&hd_qiov); +qemu_iovec_copy(&hd_qiov, qiov, bytes_done, cur_nr_sectors * 512); if (!cluster_offset) { if (bs->backing_hd) { /* read from the base image */ -n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov, -acb->sector_num, cur_nr_sectors); +n1 = qcow2_backing_read1(bs->backing_hd, &hd_qiov, +sector_num, cur_nr_sectors); if (n1 > 0) { BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO); qemu_co_mutex_unlock(&s->lock); -ret = bdrv_co_readv(bs->backing_hd, acb->sector_num, -n1, &acb->hd_qiov); +ret = bdrv_co_readv(bs->backing_hd, sector_num, +n1, &hd_qiov); qemu_co_mutex_lock(&s->lock); if (ret < 0) { -return ret; +goto fail; } } } else { /* Note: in this case, no need to wait */ -qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors); +qemu_iovec_memset(&hd_qiov, 0, 512 * cur_nr_sectors); } } else if (cluster_offset & QCOW_OFLAG_COMPRESSED) { /* add AIO support for compressed blocks ? */ ret = qcow2_decompress_cluster(bs, cluster_offset); if (ret < 0) { -return ret; +goto fail; } -qemu_iovec_from_buffer(&acb->hd_qiov, +qemu_iovec_from_buffer(&hd_qiov, s->cluster_cache + index_in_cluster * 512, 512 * cur_nr_sectors); } else { if ((cluster_offset & 511) != 0) { -return -EIO; +ret = -EIO; +goto fail; } if (s->crypt_method) { @@ -456,15 +450,15 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) * For encrypted images, read everything into a temporary * contiguous buffer on which the AES functions can work. */ -if (!acb->cluster_data) { -acb->cluster_data = +if (!cluster_data) { +cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); } assert(cur_nr_sectors <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); -qemu_iovec_reset(&acb->hd_qiov); -qemu_iovec_add(&acb->hd_qiov, acb->cluster_data
[Qemu-devel] [PATCH 4/8] qcow2: remove cluster_offset from QCowAIOCB
Signed-off-by: Frediano Ziglio --- block/qcow2.c | 22 +++--- 1 files changed, 11 insertions(+), 11 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index adf31ce..446946e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -378,7 +378,6 @@ typedef struct QCowAIOCB { QEMUIOVector *qiov; int remaining_sectors; uint64_t bytes_done; -uint64_t cluster_offset; uint8_t *cluster_data; QEMUIOVector hd_qiov; } QCowAIOCB; @@ -394,6 +393,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) int index_in_cluster, n1; int ret; int cur_nr_sectors; /* number of sectors in current iteration */ +uint64_t cluster_offset = 0; if (acb->remaining_sectors == 0) { /* request completed */ @@ -408,7 +408,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) } ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9, -&cur_nr_sectors, &acb->cluster_offset); +&cur_nr_sectors, &cluster_offset); if (ret < 0) { return ret; } @@ -419,7 +419,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, cur_nr_sectors * 512); -if (!acb->cluster_offset) { +if (!cluster_offset) { if (bs->backing_hd) { /* read from the base image */ @@ -439,9 +439,9 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) /* Note: in this case, no need to wait */ qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors); } -} else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) { +} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) { /* add AIO support for compressed blocks ? */ -ret = qcow2_decompress_cluster(bs, acb->cluster_offset); +ret = qcow2_decompress_cluster(bs, cluster_offset); if (ret < 0) { return ret; } @@ -450,7 +450,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) s->cluster_cache + index_in_cluster * 512, 512 * cur_nr_sectors); } else { -if ((acb->cluster_offset & 511) != 0) { +if ((cluster_offset & 511) != 0) { return -EIO; } @@ -474,7 +474,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); qemu_co_mutex_unlock(&s->lock); ret = bdrv_co_readv(bs->file, -(acb->cluster_offset >> 9) + index_in_cluster, +(cluster_offset >> 9) + index_in_cluster, cur_nr_sectors, &acb->hd_qiov); qemu_co_mutex_lock(&s->lock); if (ret < 0) { @@ -512,7 +512,6 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num, acb->bytes_done = 0; acb->remaining_sectors = nb_sectors; -acb->cluster_offset = 0; return acb; } @@ -564,6 +563,7 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) int ret; int cur_nr_sectors; /* number of sectors in current iteration */ QCowL2Meta l2meta; +uint64_t cluster_offset; l2meta.nb_clusters = 0; qemu_co_queue_init(&l2meta.dependent_requests); @@ -585,8 +585,8 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) return ret; } -acb->cluster_offset = l2meta.cluster_offset; -assert((acb->cluster_offset & 511) == 0); +cluster_offset = l2meta.cluster_offset; +assert((cluster_offset & 511) == 0); qemu_iovec_reset(&acb->hd_qiov); qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, @@ -612,7 +612,7 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb) BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO); qemu_co_mutex_unlock(&s->lock); ret = bdrv_co_writev(bs->file, - (acb->cluster_offset >> 9) + index_in_cluster, + (cluster_offset >> 9) + index_in_cluster, cur_nr_sectors, &acb->hd_qiov); qemu_co_mutex_lock(&s->lock); if (ret < 0) { -- 1.7.1
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote: > On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote: > > making > > sure no lib is calling any I/O function to be able to defreeze the > > filesystems later, making sure the oom killer or a wrong kill -9 > > $RANDOM isn't killing the agent by mistake while the I/O is blocked > > and the copy is going. > > Yes with the current API if the agent is killed while the filesystems > are frozen we are screwed. > > I have just submitted patches that implement a new API that should make > the virtualization use case more reliable. Basically, I am adding a new > ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns > a file descriptor; as long as that file descriptor is held open, the > filesystem remains open. If the freeze file descriptor is closed (be it > through a explicit call to close(2) or as part of process exit > housekeeping) the associated filesystem is automatically thawed. > > - fsfreeze: add ioctl to create a fd for freeze control > http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2 > - fsfreeze: add freeze fd ioctls > http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2 This is probably how the API should have been implemented originally instead of FIFREEZE/FITHAW. It looks a bit overkill though, I would think it'd be enough to have the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw by closing the file without requiring any of the FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases for those if you implemented it, maybe to check if root is stepping on its own toes by checking if the fs is already freezed before freezing it and returning failure if it is, running ioctl instead of opening closing the file isn't necessarily better. At the very least the get_user(should_freeze, argp) doesn't seem so necessary, it just complicates the ioctl API a bit without much gain, I think it'd be cleaner if the FS_FREEZE_FD was the only way to freeze then. It's certainly a nice reliability improvement and safer API. Now if you add a file descriptor to epoll/poll that userland can open and talk to, to know when a fsfreeze is asked on a certain fs, a fsfreeze userland agent (not virt related too) could open it and start the scripts if that filesystem is being fsfreezed before calling freeze_super(). Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze without any dependency on a virt specific guest agent. Maybe Christoph's right there are filesystems in userland (not sure how the storage is related, it's all about filesystems and apps as far I can see, and it's all blkdev agnostic) that may make things more complicated, but those usually have a kernel backend too (like fuse). I may not see the full picture of the filesystem in userland or how the storage agent in guest userland relates to this. If you believe having libvirt talking QMP/QAPI over a virtio-serial vmchannel with some virt specific guest userland agent bypassing qemu entirely is better, that's ok with me, but there should be a strong reason for it because the paravirt_fsfreeze.ko approach with a small qemu backend and a qemu monitor command that starts paravirt-fsfreeze in guest before going ahead blocking all I/O (to provide backwards compatibility and reliable snapshots to guest OS that won't have the paravirt fsfreeze too) looks more reliable, more compact and simpler to use to me. I'll be surely ok either ways though. Thanks, Andrea
[Qemu-devel] Volume key in qcow3?
Hi, I noted that AES encryption using qcow2 just use the password given as as key (and also truncating it to 16 bytes == 128 bits). This is prone to brute force attacks and is not also easy to change password (you have to decrypt and encrypt again the entire image). LUKS and EncFS use another way. They generate a random key (the "volume key") then use the password you give to encrypt N times (where N is decided by security level or automatically based on time to decrypt the volume key. To change the password just give the old one, get the volume key and encrypt again using the new one. LUKS support also multiple "slots" to allow multiple password and even using an external key file. Obviously this require an additional extension to qcow2 so I think it require a new qcow3 format. Frediano
Re: [Qemu-devel] [PATCH 00/55] Block layer cleanup & fixes
Am 20.07.2011 18:23, schrieb Markus Armbruster: > This patch series looks bigger than it is. All the patches are small > and hopefully easy to review. > > Objectives: > > * Push BlockDriverState members locked, tray_open, media_changed into > device models, where they belong. > > * BlockDriverState member removable is a confusing mess, replace it. > > * Improve eject -f. > > Also clean up minor messes as they get in the way. > > Part I: Preliminaries > PATCH 01-09: Assorted block layer fixes and cleanup Thanks, applied patches 1-9 to the block branch. > Part II: Move tray state to device models > PATCH 10-18: IDE tray open/closed > PATCH 19-22: SCSI tray open/closed > PATCH 23-25: block layer kill tray_open > PATCH 26-27: IDE tray lock > PATCH 28-29: SCSI tray lock > PATCH 30-32: block layer kill locked > PATCH 33-35: IDE & SCSI synchronize physical tray on init > PATCH 36-38: IDE & SCSI migrate tray state > PATCH 39-41: block layer & fdc media_changed > > Part III: Replace removable > PATCH 42-44 clean up inappopriate uses of removable > PATCH 45-46 replace remaining users > > Part IV: Reduce unclean use of block_int.h > PATCH 47-53 > > PART V: Improve eject -f > PATCH 54-55 I'll wait for a v2 on top of the block branch for the rest of the series. Kevin
[Qemu-devel] [PATCH 5/8] qcow2: remove common from QCowAIOCB
Signed-off-by: Frediano Ziglio --- block/qcow2.c |8 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 446946e..c7445cc 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -373,7 +373,7 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov, } typedef struct QCowAIOCB { -BlockDriverAIOCB common; +BlockDriverState *bs; int64_t sector_num; QEMUIOVector *qiov; int remaining_sectors; @@ -388,7 +388,7 @@ typedef struct QCowAIOCB { */ static int qcow2_aio_read_cb(QCowAIOCB *acb) { -BlockDriverState *bs = acb->common.bs; +BlockDriverState *bs = acb->bs; BDRVQcowState *s = bs->opaque; int index_in_cluster, n1; int ret; @@ -504,7 +504,7 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num, void *opaque, QCowAIOCB *acb) { memset(acb, 0, sizeof(*acb)); -acb->common.bs = bs; +acb->bs = bs; acb->sector_num = sector_num; acb->qiov = qiov; @@ -556,7 +556,7 @@ static void run_dependent_requests(BDRVQcowState *s, QCowL2Meta *m) */ static int qcow2_aio_write_cb(QCowAIOCB *acb) { -BlockDriverState *bs = acb->common.bs; +BlockDriverState *bs = acb->bs; BDRVQcowState *s = bs->opaque; int index_in_cluster; int n_end; -- 1.7.1
Re: [Qemu-devel] [PATCH v5] Add support for Zipit Z2 machine
On Monday 11 July 2011 18:26:37 Vasily Khoruzhick wrote: > On Wednesday 06 July 2011 16:52:49 Vasily Khoruzhick wrote: > > Zipit Z2 is small PXA270 based handheld. > > Ping? One more ping.
Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.
On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu wrote: > On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi wrote: >> On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu wrote: >>> On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti >>> wrote: On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote: > Welcome to give me your comments, thanks. > > Signed-off-by: Zhi Yong Wu > --- > Makefile.objs | 2 +- > block.c | 288 > +++-- > block.h | 1 - > block/blk-queue.c | 116 + > block/blk-queue.h | 70 + > block_int.h | 28 + > blockdev.c | 21 > qemu-config.c | 24 + > qemu-option.c | 17 +++ > qemu-option.h | 1 + > qemu-options.hx | 1 + > 11 files changed, 559 insertions(+), 10 deletions(-) > create mode 100644 block/blk-queue.c > create mode 100644 block/blk-queue.h > > diff --git a/Makefile.objs b/Makefile.objs > index 9f99ed4..06f2033 100644 > --- a/Makefile.objs > +++ b/Makefile.objs > @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o > cloop.o dmg.o bochs.o vpc.o vv > block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o > qcow2-snapshot.o qcow2-cache.o > block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o > qed-cluster.o > block-nested-y += qed-check.o > -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o > +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o > blk-queue.o > block-nested-$(CONFIG_WIN32) += raw-win32.o > block-nested-$(CONFIG_POSIX) += raw-posix.o > block-nested-$(CONFIG_CURL) += curl.o > diff --git a/block.c b/block.c > index 24a25d5..e54e59c 100644 > --- a/block.c > +++ b/block.c > @@ -29,6 +29,9 @@ > #include "module.h" > #include "qemu-objects.h" > > +#include "qemu-timer.h" > +#include "block/blk-queue.h" > + > #ifdef CONFIG_BSD > #include > #include > @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t > sector_num, > static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, > const uint8_t *buf, int nb_sectors); > > +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, > + bool is_write, double elapsed_time, uint64_t *wait); > +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, > + double elapsed_time, uint64_t *wait); > +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, > + bool is_write, uint64_t *wait); > + > static QTAILQ_HEAD(, BlockDriverState) bdrv_states = > QTAILQ_HEAD_INITIALIZER(bdrv_states); > > @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename) > } > #endif > > +static int bdrv_io_limits_enable(BlockIOLimit *io_limits) > +{ > + if ((io_limits->bps[0] == 0) > + && (io_limits->bps[1] == 0) > + && (io_limits->bps[2] == 0) > + && (io_limits->iops[0] == 0) > + && (io_limits->iops[1] == 0) > + && (io_limits->iops[2] == 0)) { > + return 0; > + } > + > + return 1; > +} > + > /* check if the path starts with ":" */ > static int path_has_protocol(const char *path) > { > @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size, > } > } > > +static void bdrv_block_timer(void *opaque) > +{ > + BlockDriverState *bs = opaque; > + BlockQueue *queue = bs->block_queue; > + > + while (!QTAILQ_EMPTY(&queue->requests)) { > + BlockIORequest *request; > + int ret; > + > + request = QTAILQ_FIRST(&queue->requests); > + QTAILQ_REMOVE(&queue->requests, request, entry); > + > + ret = qemu_block_queue_handler(request); > + if (ret == 0) { > + QTAILQ_INSERT_HEAD(&queue->requests, request, entry); > + break; > + } > + > + qemu_free(request); > + } > +} > + > void bdrv_register(BlockDriver *bdrv) > { > if (!bdrv->bdrv_aio_readv) { > @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char > *filename, int flags, > bs->change_cb(bs->change_opaque, CHANGE_MEDIA); > } > > + /* throttling disk I/O limits */ > + if (bdrv_io_limits_enable(&bs->io_limits)) { > + bs->block_queue = qemu_new_block_queue(); > + bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, > bs); > + > + bs->slice_start[0] = qemu_get_clock_ns(vm_clock); > + bs->slice_star
[Qemu-devel] [PATCH 6/8] qcow2: reindent and use while before the big jump
prepare to remove read/write callbacks Signed-off-by: Frediano Ziglio --- block/qcow2.c | 272 - 1 files changed, 135 insertions(+), 137 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index c7445cc..dfb969e 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -395,107 +395,105 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb) int cur_nr_sectors; /* number of sectors in current iteration */ uint64_t cluster_offset = 0; -if (acb->remaining_sectors == 0) { -/* request completed */ -return 0; -} - -/* prepare next request */ -cur_nr_sectors = acb->remaining_sectors; -if (s->crypt_method) { -cur_nr_sectors = MIN(cur_nr_sectors, -QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); -} +while (acb->remaining_sectors != 0) { -ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9, -&cur_nr_sectors, &cluster_offset); -if (ret < 0) { -return ret; -} - -index_in_cluster = acb->sector_num & (s->cluster_sectors - 1); - -qemu_iovec_reset(&acb->hd_qiov); -qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, -cur_nr_sectors * 512); - -if (!cluster_offset) { - -if (bs->backing_hd) { -/* read from the base image */ -n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov, -acb->sector_num, cur_nr_sectors); -if (n1 > 0) { -BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO); -qemu_co_mutex_unlock(&s->lock); -ret = bdrv_co_readv(bs->backing_hd, acb->sector_num, -n1, &acb->hd_qiov); -qemu_co_mutex_lock(&s->lock); -if (ret < 0) { -return ret; -} -} -} else { -/* Note: in this case, no need to wait */ -qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors); +/* prepare next request */ +cur_nr_sectors = acb->remaining_sectors; +if (s->crypt_method) { +cur_nr_sectors = MIN(cur_nr_sectors, +QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); } -} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) { -/* add AIO support for compressed blocks ? */ -ret = qcow2_decompress_cluster(bs, cluster_offset); + +ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9, +&cur_nr_sectors, &cluster_offset); if (ret < 0) { return ret; } -qemu_iovec_from_buffer(&acb->hd_qiov, -s->cluster_cache + index_in_cluster * 512, -512 * cur_nr_sectors); -} else { -if ((cluster_offset & 511) != 0) { -return -EIO; -} +index_in_cluster = acb->sector_num & (s->cluster_sectors - 1); -if (s->crypt_method) { -/* - * For encrypted images, read everything into a temporary - * contiguous buffer on which the AES functions can work. - */ -if (!acb->cluster_data) { -acb->cluster_data = -qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); +qemu_iovec_reset(&acb->hd_qiov); +qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done, +cur_nr_sectors * 512); + +if (!cluster_offset) { + +if (bs->backing_hd) { +/* read from the base image */ +n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov, +acb->sector_num, cur_nr_sectors); +if (n1 > 0) { +BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO); +qemu_co_mutex_unlock(&s->lock); +ret = bdrv_co_readv(bs->backing_hd, acb->sector_num, +n1, &acb->hd_qiov); +qemu_co_mutex_lock(&s->lock); +if (ret < 0) { +return ret; +} +} +} else { +/* Note: in this case, no need to wait */ +qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors); +} +} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) { +/* add AIO support for compressed blocks ? */ +ret = qcow2_decompress_cluster(bs, cluster_offset); +if (ret < 0) { +return ret; } -assert(cur_nr_sectors <= -QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors); -qemu_iovec_reset(&acb->hd_qiov); -qemu_iovec_add(&acb->hd_qiov, acb->cluster_data, +qemu_iovec_from_buffer(&acb->hd_qiov, +s->cluster_cache + index_in_cluster * 512, 512 * cur_nr_sectors); -} +} else { +if ((cluster_offset & 511) != 0) { +
Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.
On Thu, Jul 28, 2011 at 9:20 AM, Stefan Hajnoczi wrote: > On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu wrote: >> On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi wrote: >>> On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu wrote: On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti wrote: > On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote: >> Welcome to give me your comments, thanks. >> >> Signed-off-by: Zhi Yong Wu >> --- >> Makefile.objs | 2 +- >> block.c | 288 >> +++-- >> block.h | 1 - >> block/blk-queue.c | 116 + >> block/blk-queue.h | 70 + >> block_int.h | 28 + >> blockdev.c | 21 >> qemu-config.c | 24 + >> qemu-option.c | 17 +++ >> qemu-option.h | 1 + >> qemu-options.hx | 1 + >> 11 files changed, 559 insertions(+), 10 deletions(-) >> create mode 100644 block/blk-queue.c >> create mode 100644 block/blk-queue.h >> >> diff --git a/Makefile.objs b/Makefile.objs >> index 9f99ed4..06f2033 100644 >> --- a/Makefile.objs >> +++ b/Makefile.objs >> @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o >> cloop.o dmg.o bochs.o vpc.o vv >> block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o >> qcow2-snapshot.o qcow2-cache.o >> block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o >> qed-cluster.o >> block-nested-y += qed-check.o >> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o >> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o >> blk-queue.o >> block-nested-$(CONFIG_WIN32) += raw-win32.o >> block-nested-$(CONFIG_POSIX) += raw-posix.o >> block-nested-$(CONFIG_CURL) += curl.o >> diff --git a/block.c b/block.c >> index 24a25d5..e54e59c 100644 >> --- a/block.c >> +++ b/block.c >> @@ -29,6 +29,9 @@ >> #include "module.h" >> #include "qemu-objects.h" >> >> +#include "qemu-timer.h" >> +#include "block/blk-queue.h" >> + >> #ifdef CONFIG_BSD >> #include >> #include >> @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t >> sector_num, >> static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, >> const uint8_t *buf, int nb_sectors); >> >> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, >> + bool is_write, double elapsed_time, uint64_t *wait); >> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, >> + double elapsed_time, uint64_t *wait); >> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, >> + bool is_write, uint64_t *wait); >> + >> static QTAILQ_HEAD(, BlockDriverState) bdrv_states = >> QTAILQ_HEAD_INITIALIZER(bdrv_states); >> >> @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename) >> } >> #endif >> >> +static int bdrv_io_limits_enable(BlockIOLimit *io_limits) >> +{ >> + if ((io_limits->bps[0] == 0) >> + && (io_limits->bps[1] == 0) >> + && (io_limits->bps[2] == 0) >> + && (io_limits->iops[0] == 0) >> + && (io_limits->iops[1] == 0) >> + && (io_limits->iops[2] == 0)) { >> + return 0; >> + } >> + >> + return 1; >> +} >> + >> /* check if the path starts with ":" */ >> static int path_has_protocol(const char *path) >> { >> @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size, >> } >> } >> >> +static void bdrv_block_timer(void *opaque) >> +{ >> + BlockDriverState *bs = opaque; >> + BlockQueue *queue = bs->block_queue; >> + >> + while (!QTAILQ_EMPTY(&queue->requests)) { >> + BlockIORequest *request; >> + int ret; >> + >> + request = QTAILQ_FIRST(&queue->requests); >> + QTAILQ_REMOVE(&queue->requests, request, entry); >> + >> + ret = qemu_block_queue_handler(request); >> + if (ret == 0) { >> + QTAILQ_INSERT_HEAD(&queue->requests, request, entry); >> + break; >> + } >> + >> + qemu_free(request); >> + } >> +} >> + >> void bdrv_register(BlockDriver *bdrv) >> { >> if (!bdrv->bdrv_aio_readv) { >> @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char >> *filename, int flags, >> bs->change_cb(bs->change_opaque, CHANGE_MEDIA); >> } >> >> + /* throttling disk I/O limits */ >> + if (bdrv_io_limits_enable(&bs->io_limits)) { >> + bs->block_queue = qemu_new_block_queue(); >
Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention
On Thu, Jul 28, 2011 at 04:23:24PM +0900, Isaku Yamahata wrote: > This might be a bit late comment... > > On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote: > > diff --git a/hw/pci_host.c b/hw/pci_host.c > > index 728e2d4..bfdc321 100644 > > --- a/hw/pci_host.c > > +++ b/hw/pci_host.c > > @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus > > *bus, uint32_t addr) > > return pci_find_device(bus, bus_num, devfn); > > } > > > > +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr, > > + uint32_t limit, uint32_t val, uint32_t len) > > +{ > > +assert(len <= 4); > > +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr)); > > +} > > + > > +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr, > > +uint32_t limit, uint32_t len) > > +{ > > +assert(len <= 4); > > +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr)); > > +} > > + > > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < > addr. > So we need explicit "if (limit < addr) return;". > Here's the patch for pci branch. > > >From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001 > Message-Id: > <75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp> > From: Isaku Yamahata > Date: Thu, 28 Jul 2011 16:20:28 +0900 > Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common > > This patch adds boundary check in pci_host_config_read/write_common() > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < > addr. > So we need explicit "if (limit <= addr) return;" > > Signed-off-by: Isaku Yamahata I don't see a problem with this, but could you please clarify when does this happen? I think this is only possible for a pci device behind an express root. If so, this belongs in pcie_host.c I'd also like this info to be recorded in the commit log. > --- > hw/pci_host.c |6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diff --git a/hw/pci_host.c b/hw/pci_host.c > index 2e8a29f..71fd3a1 100644 > --- a/hw/pci_host.c > +++ b/hw/pci_host.c > @@ -51,6 +51,9 @@ void pci_host_config_write_common(PCIDevice *pci_dev, > uint32_t addr, >uint32_t limit, uint32_t val, uint32_t len) > { > assert(len <= 4); > +if (limit <= addr) { > +return; > +} > pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr)); > } > > @@ -58,6 +61,9 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, > uint32_t addr, > uint32_t limit, uint32_t len) > { > assert(len <= 4); > +if (limit <= addr) { > +return 0; > +} > return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr)); > } > > -- > 1.7.1.1 > > -- > yamahata
Re: [Qemu-devel] [PATCH] add QEMU_LD_PREFIX environment variable
On Sat, Jul 23, 2011 at 07:47:49AM +0200, josch wrote: > This could be avoided by setting the proposed environment variable > QEMU_LD_PREFIX to the just > created debian rootfs. As mentioned earlier, the usage of the -L option > is not possible in this scenario because qemu-user is only implicitly > called by the binfmt mechanism. What worries me here is that we are beginning to add a enviroment variable for each and every command line option of qemu linux-user. I think it would be better to have a wrapper binary to be registered as the binfmt runner. Alternatively we should have a generic setup for mapping enviroment variables to command line options. Now we get special per-option code every time someone needs to setup a command line option from binfmt. Riku
Re: [Qemu-devel] [PATCH v2 0/5] balloon: fix memleaks, invalid arguments, unplug
On Thu, Jul 28, 2011 at 11:47:11AM +0530, Amit Shah wrote: > Hello, > > This series is on top of the other balloon series for which I sent a > pull request on Tuesday. > > This series fixes memleak on exit, unregisters the savevm section on > unplug, disallows negative values as ballooning targets and doesn't > allow multiple balloon device registrations. > > v2 contains small tweaks suggested: > - Error is shown by balloon.c instead of virtio-balloon.c in patch 1 >(mst) > - Filter out negative input in do_balloon() and add qerror message >(Markus) > - Separate out savevm section unregistering from memleak fix > > > Amit Shah (5): > balloon: Don't allow multiple balloon handler registrations > virtio-balloon: Check if balloon registration failed > balloon: Ignore negative balloon values > virtio-balloon: Add exit handler, fix memleaks > virtio-balloon: Unregister savevm section on device unplug Acked-by: Michael S. Tsirkin > balloon.c | 20 +--- > balloon.h |4 ++-- > hw/virtio-balloon.c | 18 +- > hw/virtio-pci.c | 14 +- > hw/virtio.h |1 + > 5 files changed, 50 insertions(+), 7 deletions(-) > > -- > 1.7.6
Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.
On Thu, Jul 28, 2011 at 4:25 PM, Stefan Hajnoczi wrote: > On Thu, Jul 28, 2011 at 9:20 AM, Stefan Hajnoczi wrote: >> On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu wrote: >>> On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi wrote: On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu wrote: > On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti > wrote: >> On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote: >>> Welcome to give me your comments, thanks. >>> >>> Signed-off-by: Zhi Yong Wu >>> --- >>> Makefile.objs | 2 +- >>> block.c | 288 >>> +++-- >>> block.h | 1 - >>> block/blk-queue.c | 116 + >>> block/blk-queue.h | 70 + >>> block_int.h | 28 + >>> blockdev.c | 21 >>> qemu-config.c | 24 + >>> qemu-option.c | 17 +++ >>> qemu-option.h | 1 + >>> qemu-options.hx | 1 + >>> 11 files changed, 559 insertions(+), 10 deletions(-) >>> create mode 100644 block/blk-queue.c >>> create mode 100644 block/blk-queue.h >>> >>> diff --git a/Makefile.objs b/Makefile.objs >>> index 9f99ed4..06f2033 100644 >>> --- a/Makefile.objs >>> +++ b/Makefile.objs >>> @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o >>> cloop.o dmg.o bochs.o vpc.o vv >>> block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o >>> qcow2-snapshot.o qcow2-cache.o >>> block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o >>> qed-cluster.o >>> block-nested-y += qed-check.o >>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o >>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o >>> blk-queue.o >>> block-nested-$(CONFIG_WIN32) += raw-win32.o >>> block-nested-$(CONFIG_POSIX) += raw-posix.o >>> block-nested-$(CONFIG_CURL) += curl.o >>> diff --git a/block.c b/block.c >>> index 24a25d5..e54e59c 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -29,6 +29,9 @@ >>> #include "module.h" >>> #include "qemu-objects.h" >>> >>> +#include "qemu-timer.h" >>> +#include "block/blk-queue.h" >>> + >>> #ifdef CONFIG_BSD >>> #include >>> #include >>> @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, >>> int64_t sector_num, >>> static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, >>> const uint8_t *buf, int nb_sectors); >>> >>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int >>> nb_sectors, >>> + bool is_write, double elapsed_time, uint64_t *wait); >>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool >>> is_write, >>> + double elapsed_time, uint64_t *wait); >>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, >>> + bool is_write, uint64_t *wait); >>> + >>> static QTAILQ_HEAD(, BlockDriverState) bdrv_states = >>> QTAILQ_HEAD_INITIALIZER(bdrv_states); >>> >>> @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename) >>> } >>> #endif >>> >>> +static int bdrv_io_limits_enable(BlockIOLimit *io_limits) >>> +{ >>> + if ((io_limits->bps[0] == 0) >>> + && (io_limits->bps[1] == 0) >>> + && (io_limits->bps[2] == 0) >>> + && (io_limits->iops[0] == 0) >>> + && (io_limits->iops[1] == 0) >>> + && (io_limits->iops[2] == 0)) { >>> + return 0; >>> + } >>> + >>> + return 1; >>> +} >>> + >>> /* check if the path starts with ":" */ >>> static int path_has_protocol(const char *path) >>> { >>> @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size, >>> } >>> } >>> >>> +static void bdrv_block_timer(void *opaque) >>> +{ >>> + BlockDriverState *bs = opaque; >>> + BlockQueue *queue = bs->block_queue; >>> + >>> + while (!QTAILQ_EMPTY(&queue->requests)) { >>> + BlockIORequest *request; >>> + int ret; >>> + >>> + request = QTAILQ_FIRST(&queue->requests); >>> + QTAILQ_REMOVE(&queue->requests, request, entry); >>> + >>> + ret = qemu_block_queue_handler(request); >>> + if (ret == 0) { >>> + QTAILQ_INSERT_HEAD(&queue->requests, request, entry); >>> + break; >>> + } >>> + >>> + qemu_free(request); >>> + } >>> +} >>> + >>> void bdrv_register(BlockDriver *bdrv) >>> { >>> if (!bdrv->bdrv_aio_readv) { >>> @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char >>> *filename, int flags, >>> bs->change_cb(bs->change_opaque,
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 18:40, Andrea Arcangeli wrote: >> Another thing to note is that snapshotting is not necessarily something >> > that should be completely transparent to the guest. One of the planned >> > future features for the guest agent (mentioned in the snapshot wiki, and >> > a common use case that I've seen come up elsewhere as well in the >> > context of database applications), is a way for userspace applications >> > to register callbacks to be made in the event of a freeze (dumping >> > application-managed caches to disk and things along that line). The > Not sure if the scripts are really needed or if they would just open a > brand new fsfreeze specific unix domain socket (created by the > database) to tell the database to freeze. > > If the latter is the case, then it'd be better rather than changing > the database to open unix domain socket so the script can connect to > it when invoked (or maybe to just add some new function to the > protocol of an existing open unix domain socket), to instead change > the database to open a /dev/virtio-fsfreeze device, created by the > virtio-fsfreeze.ko virtio driver through udev. The database would poll > it, and it could read the request to freeze, and write into it that it > finished freezing when done. Then when all openers of the device > freezed, the virtio-fsfreeze.ko would go ahead freezing all the > filesystems, and then tell qemu when it's finished freezing. Then qemu > can finally block all the I/O and tell libvirt to go ahead with the > snapshot. I think it could also be a combined operation, ie. having the freeze happen in the kernel, but doing the callouts using a userspace daemon. I like the userspace daemon for the callouts because it allows providing a more sophisticated API than if we provide just a socket like interface. In addition the callout is less critical wrt crashes than the fsfreeze operations. Cheers, Jes
Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel
On 07/27/11 20:36, Christoph Hellwig wrote: > Initiating the freeze from kernelspace doesn't make much sense. With > virtio we could add in-band freeze request to the protocol, and although > that would be a major change in that way virtio-blk works right now it's > at least doable. But all other "real" storage targets only communicate > with their initators over out of band procotols that are entirely handled > in userspace, and given their high-level nature better are - that is if > we know them at all given how vendors like to keep this secrete IP > closed and just offer userspace management tools in binary form. > > building new infrastructure in the kernel just for virtio, while needing > to duplicate the same thing in userspace for all real storage seems like > a really bad idea. That is in addition to the userspace freeze notifier > similar to what e.g. Windows has - if the freeze process is driven from > userspace it's much easier to handle those properly compared to requiring > kernel upcalls. > The freeze operation would really just be a case of walking the list of mounted file systems and calling the FIFREEZE ioctl operation on them. I wouldn't anticipate doing anything else in a virtio-fsfreeze.ko module. Cheers, Jes
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
On Wed, Jul 27, 2011 at 5:02 PM, Anthony Liguori wrote: > On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote: >> >> On Wed, Jul 27, 2011 at 1:58 PM, Anthony Liguori >> wrote: Index: qemu/hmp-commands.hx === --- qemu.orig/hmp-commands.hx +++ qemu/hmp-commands.hx @@ -70,6 +70,20 @@ but should be used with extreme caution. resizes image files, it can not resize block devices like LVM volumes. ETEXI + { + .name = "block_set", + .args_type = "device:B,device:O", + .params = "device [prop=value][,...]", + .help = "Change block device parameters [hostcache=on/off]", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_block_set, + }, +STEXI +@item block_set @var{config} +@findex block_set +Change block device parameters (eg: hostcache=on/off) while guest is running. +ETEXI + >>> >>> block_set_hostcache() please. >>> >>> Multiplexing commands is generally a bad idea. It weakens typing. In >>> the >>> absence of a generic way to set block device properties, implementing >>> properties as generic in the QMP layer seems like a bad idea to me. >> >> The idea behind block_set was to have a unified interface for changing >> block device parameters at runtime. This prevents us from reinventing >> new commands from scratch. For example, block I/O throttling is >> already queued up to add run-time parameters. >> >> Without a unified command we have a bulkier QMP/HMP interface, >> duplicated code, and possibly inconsistencies in syntax between the >> commands. Isn't the best way to avoid these problems a unified >> interface? >> >> I understand the lack of type safety concern but in this case we >> already have to manually pull parsed arguments (i.e. cast to specific >> types and deal with invalid input). To me this is a reason *for* >> using a unified interface like block_set. > > Think about it from a client perspective. How do I determine which > properties are supported by this version of QEMU? I have no way to identify > programmatically what arguments are valid for block_set. > > OTOH, if you have strong types like block_set_hostcache, query-commands > tells me exactly what's supported. Use query-block and see if 'hostcache' is there. If yes, then the hostcache parameter is available. If we allow BlockDrivers to have their own runtime parameters then query-commands does not tell you anything because the specific BlockDriver may or may not support that runtime parameter - you need to use query-block. Stefan
Re: [Qemu-devel] [PATCH 00/55] Block layer cleanup & fixes
On (Wed) 20 Jul 2011 [18:23:34], Markus Armbruster wrote: > This patch series looks bigger than it is. All the patches are small > and hopefully easy to review. Nice series! ACK the entire series (barring the non-qdev users connecting to bdrv and scsi patches, which I didn't quite follow). Amit
Re: [Qemu-devel] [PATCH v2 3/5] balloon: Ignore negative balloon values
On (Thu) 28 Jul 2011 [09:31:53], Markus Armbruster wrote: > Amit Shah writes: > > > Negative balloon values don't make sense, ignore them. > > Actually, they aren't ignored, they're rejected. Will update commit log. Amit
Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks
On (Thu) 28 Jul 2011 [09:39:40], Markus Armbruster wrote: > Amit Shah writes: > > > On (Thu) 28 Jul 2011 [11:47:15], Amit Shah wrote: > >> Add an exit handler that will free up RAM and unregister the savevm > >> section after a virtio-balloon device is unplugged. > > > > This commit message should be changed; I'll do that in the pull > > request I send out. > > You mean drop "unregister the savevm", because it has become PATCH 5/5? Right. Amit
Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug
On (Thu) 28 Jul 2011 [09:45:44], Markus Armbruster wrote: > Amit Shah writes: > > > Migrating after unplugging a virtio-balloon device resulted in an error > > message on the destination: > > > > Unknown savevm section or instance ':00:04.0/virtio-balloon' 0 > > load of migration failed > > > > Fix this by unregistering the section on device unplug. > > > > Signed-off-by: Amit Shah > > --- > > hw/virtio-balloon.c |4 > > 1 files changed, 4 insertions(+), 0 deletions(-) > > > > diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c > > index 0ce0049..072a88a 100644 > > --- a/hw/virtio-balloon.c > > +++ b/hw/virtio-balloon.c > > @@ -45,6 +45,7 @@ typedef struct VirtIOBalloon > > size_t stats_vq_offset; > > MonitorCompletion *stats_callback; > > void *stats_opaque_callback_data; > > +DeviceState *qdev; > > } VirtIOBalloon; > > All the other virtio device structs already have such a pointer back to > the proxy. Suggests that it should live in VirtIODevice, and be set up > in generic code. Again, outside the scope of this series. > > I hate the virtio pointer thicket. Yep :-( Amit
[Qemu-devel] [PULL v2][ALSO STABLE] balloon: multiple fixes and cleanups
Hello, This new pull request carries the first 7 patches from the earlier pull request that cleaned up balloon code and fixed a use-after-free segfault on issuing 'balloon 0'. This series adds 5 more patches that fix: - a memleak on unplug, - migration after unplug, - negative balloon values by rejecting them It also rejects creating multiple balloon devices. Please pull. Makes sense to pull these in for stable as well. The following changes since commit c886edfb851c0c590d4e77f058f2ec8ed95ad1b5: Let users select their pythons (2011-07-25 16:50:12 +) are available in the git repository at: git://git.kernel.org/pub/scm/virt/qemu/amit/misc.git for-anthony Amit Shah (12): balloon: Make functions, local vars static balloon: Add braces around if statements balloon: Simplify code flow virtio-balloon: Separate status handling into separate function balloon: Separate out stat and balloon handling balloon: Fix header comment; add Copyright virtio-balloon: Fix header comment; add Copyright balloon: Don't allow multiple balloon handler registrations virtio-balloon: Check if balloon registration failed balloon: Reject negative balloon values virtio-balloon: Add exit handler, fix memleaks virtio-balloon: Unregister savevm section on device unplug balloon.c | 61 ++--- balloon.h | 12 +++- hw/virtio-balloon.c | 76 ++- hw/virtio-pci.c | 14 +- hw/virtio.h |1 + 5 files changed, 109 insertions(+), 55 deletions(-) Amit
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
On 07/27/2011 09:32 PM, Anthony Liguori wrote: On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote: On Wed, Jul 27, 2011 at 1:58 PM, Anthony Liguori wrote: Index: qemu/hmp-commands.hx === --- qemu.orig/hmp-commands.hx +++ qemu/hmp-commands.hx @@ -70,6 +70,20 @@ but should be used with extreme caution. resizes image files, it can not resize block devices like LVM volumes. ETEXI + { + .name = "block_set", + .args_type = "device:B,device:O", + .params = "device [prop=value][,...]", + .help = "Change block device parameters [hostcache=on/off]", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_block_set, + }, +STEXI +@item block_set @var{config} +@findex block_set +Change block device parameters (eg: hostcache=on/off) while guest is running. +ETEXI + block_set_hostcache() please. Multiplexing commands is generally a bad idea. It weakens typing. In the absence of a generic way to set block device properties, implementing properties as generic in the QMP layer seems like a bad idea to me. The idea behind block_set was to have a unified interface for changing block device parameters at runtime. This prevents us from reinventing new commands from scratch. For example, block I/O throttling is already queued up to add run-time parameters. Without a unified command we have a bulkier QMP/HMP interface, duplicated code, and possibly inconsistencies in syntax between the commands. Isn't the best way to avoid these problems a unified interface? I understand the lack of type safety concern but in this case we already have to manually pull parsed arguments (i.e. cast to specific types and deal with invalid input). To me this is a reason *for* using a unified interface like block_set. Think about it from a client perspective. How do I determine which properties are supported by this version of QEMU? I have no way to identify programmatically what arguments are valid for block_set. User can programmatically find out valid parameters for block_set. Internally, validation of prop=value is done against -drive parameter list and then, only the valid/implemented commands (for now, hostcache) are accepted from that list. Please find execution output attached: (qemu) info block ide0-hd0: removable=0 hostcache=1 file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw encrypted=0 ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted] sd0: removable=1 locked=0 hostcache=1 [not inserted] (qemu) block block_resize block_set block_passwd (qemu) block_set block_set: block device name expected (qemu) block block_resize block_set block_passwd (qemu) help block_set block_set device [prop=value][,...] -- Change block device parameters [hostcache=on/off] (qemu) block_set ide Device 'ide' not found (qemu) block_set ide0-hd0 Usage: block_set device [prop=value][,...] (qemu) block_set ide0-hd0 cache Invalid parameter 'cache' (qemu) block_set ide0-hd0 cache=on Parameter 'hostcache' expects on/off (qemu) block_set ide0-hd0 hostcache=on (qemu) block_set ide0-hd0 hostcache=off (qemu) info block ide0-hd0: removable=0 hostcache=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw encrypted=0 ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted] sd0: removable=1 locked=0 hostcache=1 [not inserted] When we add further parameters, "usage" string can be enhanced to include those parameters for informing user. - Thanks, Supriya OTOH, if you have strong types like block_set_hostcache, query-commands tells me exactly what's supported. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] Block layer roadmap
Am 27.07.2011 14:37, schrieb Stefan Hajnoczi: > Hi, > Here is a list of block layer and storage changes that have been discussed. > It > is useful to have a roadmap of changes in order to avoid duplication, allow > more developers to contribute, and to communicate the direction of storage in > QEMU. > > I suggest we first do a braindump of all changes that have been discussed. > Later we can discuss specific changes and if/when they fit into the roadmap - > please don't jump into discussion about specific changes yet. > > Kevin: I hope this a useful starting point. Here are all the items > that I am aware of: > > =Material for next QEMU release= > > Coroutines in the block layer [Kevin] > * Programming model to simplify block drivers without blocking QEMU threads > > Generic copy-on-read [Stefan] > * Populate image file to avoid fetching same block from backing file > again later > > Generic image streaming [Stefan] > * Make block_stream commands available for all image formats that > support backing files > > Live block copy [Marcelo/Kevin/Stefan?] > * Copy the contents of an image file while a guest is using it > > In-place qcow2 <-> qed conversion [Devin, GSoC 2011]: > * Fast conversion between qcow2 and qed image formats without copy all data > > VMDK enhancements [Fam, GSoC 2011] > * Implement latest VMDK specs to support modern image files > > Block I/O limits [Zhi Yong] > * Resource control for guest I/O bandwidth/iops consumption Another item that just came up on IRC again: Allow guests to toggle WCE, so that we finally can get rid of the cache=writethrough default. We should definitely get this done before 1.0. Kevin
[Qemu-devel] [PATCH 0.16 v3 0/3] make endian-independent unaligned memory access functions available to libhw
Functions like ldl_be_p and ldl_le_p are currently used only as building blocks for {ld,st}XX_p. As such, they are in cpu-all.h even though they have absolutely no dependency on the target. In order to make them globally available, this series moves them to bswap.h instead. An interesting part of this is that there are functions also for floating point values. Leaving them in cpu-all.h would be possible but untidy. In fact handling these is easy, but it requires to make softfloat.h target-dependent as well. This is what patch 2 does. v2->v3: rebase v1->v2: rebase, use softfloat-specialize.h instead of introducing softfloat-target.h Paolo Bonzini (3): move WORDS_ALIGNED to qemu-common.h softfloat: change default nan definitions to variables move unaligned memory access functions to bswap.h Makefile.hw|2 +- bswap.h| 474 configure |3 +- cpu-all.h | 446 +- cpu-common.h |4 - fpu/softfloat-specialize.h | 72 +++ fpu/softfloat.h| 60 +- qemu-common.h |4 + 8 files changed, 562 insertions(+), 503 deletions(-) -- 1.7.6
[Qemu-devel] [PATCH 0.16 v3 1/3] move WORDS_ALIGNED to qemu-common.h
This is not a CPU interface, and a configure test would not be too precise. So just add it to qemu-common.h. Signed-off-by: Paolo Bonzini --- cpu-common.h |4 qemu-common.h |4 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/cpu-common.h b/cpu-common.h index 44b04b3..16c9f4f 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -3,10 +3,6 @@ /* CPU interfaces that are target indpendent. */ -#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || defined(__hppa__) || defined(__ia64__) -#define WORDS_ALIGNED -#endif - #ifdef TARGET_PHYS_ADDR_BITS #include "targphys.h" #endif diff --git a/qemu-common.h b/qemu-common.h index ba55719..43ebaca 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -5,6 +5,10 @@ #include "compiler.h" #include "config-host.h" +#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || defined(__hppa__) || defined(__ia64__) +#define WORDS_ALIGNED +#endif + #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR) typedef struct QEMUTimer QEMUTimer; -- 1.7.6
[Qemu-devel] [PATCH 0.16 v3 2/3] softfloat: change default nan definitions to variables
Most definitions in softfloat.h are really target-independent, but the file is not because it includes definitions of the default NaN values. Change those to variables to allow including softfloat.h from files that are not compiled per-target. By making them const, the compiler is allowed to optimize them into softfloat functions that use them. Signed-off-by: Paolo Bonzini --- fpu/softfloat-specialize.h | 72 fpu/softfloat.h| 60 +--- 2 files changed, 81 insertions(+), 51 deletions(-) diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h index c7d35a1..c165205 100644 --- a/fpu/softfloat-specialize.h +++ b/fpu/softfloat-specialize.h @@ -35,6 +35,78 @@ these four paragraphs for those parts of this code that are retained. =*/ +#if defined(TARGET_MIPS) || defined(TARGET_SH4) || defined(TARGET_UNICORE32) +#define SNAN_BIT_IS_ONE1 +#else +#define SNAN_BIT_IS_ONE0 +#endif + +/* +| The pattern for a default generated half-precision NaN. +**/ +#if defined(TARGET_ARM) +const float16 float16_default_nan = const_float16(0x7E00); +#elif SNAN_BIT_IS_ONE +const float16 float16_default_nan = const_float16(0x7DFF); +#else +const float16 float16_default_nan = const_float16(0xFE00); +#endif + +/* +| The pattern for a default generated single-precision NaN. +**/ +#if defined(TARGET_SPARC) +const float32 float32_default_nan = const_float32(0x7FFF); +#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) +const float32 float32_default_nan = const_float32(0x7FC0); +#elif SNAN_BIT_IS_ONE +const float32 float32_default_nan = const_float32(0x7FBF); +#else +const float32 float32_default_nan = const_float32(0xFFC0); +#endif + +/* +| The pattern for a default generated double-precision NaN. +**/ +#if defined(TARGET_SPARC) +const float64 float64_default_nan = const_float64(LIT64( 0x7FFF )); +#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) +const float64 float64_default_nan = const_float64(LIT64( 0x7FF8 )); +#elif SNAN_BIT_IS_ONE +const float64 float64_default_nan = const_float64(LIT64( 0x7FF7 )); +#else +const float64 float64_default_nan = const_float64(LIT64( 0xFFF8 )); +#endif + +/* +| The pattern for a default generated extended double-precision NaN. +**/ +#if SNAN_BIT_IS_ONE +#define floatx80_default_nan_high 0x7FFF +#define floatx80_default_nan_low LIT64( 0xBFFF ) +#else +#define floatx80_default_nan_high 0x +#define floatx80_default_nan_low LIT64( 0xC000 ) +#endif + +const floatx80 floatx80_default_nan = make_floatx80(floatx80_default_nan_high, +floatx80_default_nan_low); + +/* +| The pattern for a default generated quadruple-precision NaN. The `high' and +| `low' values hold the most- and least-significant bits, respectively. +**/ +#if SNAN_BIT_IS_ONE +#define float128_default_nan_high LIT64( 0x7FFF7FFF ) +#define float128_default_nan_low LIT64( 0x ) +#else +#define float128_default_nan_high LIT64( 0x8000 ) +#define float128_default_nan_low LIT64( 0x ) +#endif + +const float128 float128_default_nan = make_float128(float128_default_nan_high, +float128_default_nan_low); + /* | Raises the exceptions specified by `flags'. Floating-point traps can be | defined here if desired. It is currently not possible for such a trap diff --git a/fpu/softfloat.h b/fpu/softfloat.h index bde2500..3bb7d8f 100644 --- a/fpu/softfloat.h +++ b/fpu/softfloat.h @@ -43,7 +43,7 @@ these four paragraphs for those parts of this code that are retained. #endif #include -#include "config.h" +#include "config-host.h" /* | Each of the following `typedef's defines the most convenient type that holds @@ -68,12 +68,6 @@ typedef int64_t int64;
[Qemu-devel] [PATCH 0.16 v3 3/3] move unaligned memory access functions to bswap.h
This is just code movement, and moving the fpu/ include path from target-dependent to target-independent Make variables. Signed-off-by: Paolo Bonzini --- Makefile.hw |2 +- bswap.h | 474 +++ configure |3 +- cpu-all.h | 446 +--- 4 files changed, 477 insertions(+), 448 deletions(-) diff --git a/Makefile.hw b/Makefile.hw index b9181ab..659e441 100644 --- a/Makefile.hw +++ b/Makefile.hw @@ -9,7 +9,7 @@ include $(SRC_PATH)/rules.mak $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw) -QEMU_CFLAGS+=-I.. -I$(SRC_PATH)/fpu +QEMU_CFLAGS+=-I.. include $(SRC_PATH)/Makefile.objs diff --git a/bswap.h b/bswap.h index 82a7951..f41bebe 100644 --- a/bswap.h +++ b/bswap.h @@ -11,6 +11,8 @@ #include #else +#include "softfloat.h" + #ifdef CONFIG_BYTESWAP_H #include #else @@ -237,4 +239,476 @@ static inline uint32_t qemu_bswap_len(uint32_t value, int len) return bswap32(value) >> (32 - 8 * len); } +typedef union { +float32 f; +uint32_t l; +} CPU_FloatU; + +typedef union { +float64 d; +#if defined(HOST_WORDS_BIGENDIAN) +struct { +uint32_t upper; +uint32_t lower; +} l; +#else +struct { +uint32_t lower; +uint32_t upper; +} l; +#endif +uint64_t ll; +} CPU_DoubleU; + +typedef union { + floatx80 d; + struct { + uint64_t lower; + uint16_t upper; + } l; +} CPU_LDoubleU; + +typedef union { +float128 q; +#if defined(HOST_WORDS_BIGENDIAN) +struct { +uint32_t upmost; +uint32_t upper; +uint32_t lower; +uint32_t lowest; +} l; +struct { +uint64_t upper; +uint64_t lower; +} ll; +#else +struct { +uint32_t lowest; +uint32_t lower; +uint32_t upper; +uint32_t upmost; +} l; +struct { +uint64_t lower; +uint64_t upper; +} ll; +#endif +} CPU_QuadU; + +/* unaligned/endian-independent pointer access */ + +/* + * the generic syntax is: + * + * load: ld{type}{sign}{size}{endian}_p(ptr) + * + * store: st{type}{size}{endian}_p(ptr, val) + * + * Note there are small differences with the softmmu access API! + * + * type is: + * (empty): integer access + * f: float access + * + * sign is: + * (empty): for floats or 32 bit size + * u: unsigned + * s: signed + * + * size is: + * b: 8 bits + * w: 16 bits + * l: 32 bits + * q: 64 bits + * + * endian is: + * (empty): 8 bit access + * be : big endian + * le : little endian + */ +static inline int ldub_p(const void *ptr) +{ +return *(uint8_t *)ptr; +} + +static inline int ldsb_p(const void *ptr) +{ +return *(int8_t *)ptr; +} + +static inline void stb_p(void *ptr, int v) +{ +*(uint8_t *)ptr = v; +} + +/* NOTE: on arm, putting 2 in /proc/sys/debug/alignment so that the + kernel handles unaligned load/stores may give better results, but + it is a system wide setting : bad */ +#if defined(HOST_WORDS_BIGENDIAN) || defined(WORDS_ALIGNED) + +/* conservative code for little endian unaligned accesses */ +static inline int lduw_le_p(const void *ptr) +{ +#ifdef _ARCH_PPC +int val; +__asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr)); +return val; +#else +const uint8_t *p = ptr; +return p[0] | (p[1] << 8); +#endif +} + +static inline int ldsw_le_p(const void *ptr) +{ +#ifdef _ARCH_PPC +int val; +__asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr)); +return (int16_t)val; +#else +const uint8_t *p = ptr; +return (int16_t)(p[0] | (p[1] << 8)); +#endif +} + +static inline int ldl_le_p(const void *ptr) +{ +#ifdef _ARCH_PPC +int val; +__asm__ __volatile__ ("lwbrx %0,0,%1" : "=r" (val) : "r" (ptr)); +return val; +#else +const uint8_t *p = ptr; +return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24); +#endif +} + +static inline uint64_t ldq_le_p(const void *ptr) +{ +const uint8_t *p = ptr; +uint32_t v1, v2; +v1 = ldl_le_p(p); +v2 = ldl_le_p(p + 4); +return v1 | ((uint64_t)v2 << 32); +} + +static inline void stw_le_p(void *ptr, int v) +{ +#ifdef _ARCH_PPC +__asm__ __volatile__ ("sthbrx %1,0,%2" : "=m" (*(uint16_t *)ptr) : "r" (v), "r" (ptr)); +#else +uint8_t *p = ptr; +p[0] = v; +p[1] = v >> 8; +#endif +} + +static inline void stl_le_p(void *ptr, int v) +{ +#ifdef _ARCH_PPC +__asm__ __volatile__ ("stwbrx %1,0,%2" : "=m" (*(uint32_t *)ptr) : "r" (v), "r" (ptr)); +#else +uint8_t *p = ptr; +p[0] = v; +p[1] = v >> 8; +p[2] = v >> 16; +p[3] = v >> 24; +#endif +} + +static inline void stq_le_p(void *ptr, uint64_t v) +{ +uint8_t *p = ptr; +stl_le_p(p, (uint32_t)v); +stl_le_p(p + 4, v >> 32); +} + +/* float access */ + +static inline float32 ldfl_le_p(const void *ptr) +{ +union { +float32 f; +uint32_t i; +} u; +u.i = ldl_le_p(ptr); +retur
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
Am 27.07.2011 16:51, schrieb Stefan Hajnoczi: > 2011/7/27 Michael Tokarev : >> 27.07.2011 15:30, Supriya Kannery wrote: >>> New command "block_set" added for dynamically changing any of the block >>> device parameters. For now, dynamic setting of hostcache params using this >>> command is implemented. Other block device parameter changes, can be >>> integrated in similar lines. >>> >>> Signed-off-by: Supriya Kannery >>> >>> --- >>> block.c | 54 + >>> block.h |2 + >>> blockdev.c | 61 >>> >>> blockdev.h |1 >>> hmp-commands.hx | 14 >>> qemu-config.c | 13 +++ >>> qemu-option.c | 25 ++ >>> qemu-option.h |2 + >>> qmp-commands.hx | 28 + >>> 9 files changed, 200 insertions(+) >>> >>> Index: qemu/block.c >>> === >>> --- qemu.orig/block.c >>> +++ qemu/block.c >>> @@ -651,6 +651,34 @@ unlink_and_fail: >>> return ret; >>> } >>> >>> +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags) >>> +{ >>> +BlockDriver *drv = bs->drv; >>> +int ret = 0, open_flags; >>> + >>> +/* Quiesce IO for the given block device */ >>> +qemu_aio_flush(); >>> +if (bdrv_flush(bs)) { >>> +qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name); >>> +return ret; >>> +} >>> +open_flags = bs->open_flags; >>> +bdrv_close(bs); >>> + >>> +ret = bdrv_open(bs, bs->filename, bdrv_flags, drv); >>> +if (ret < 0) { >>> +/* Reopen failed. Try to open with original flags */ >>> +qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename); >>> +ret = bdrv_open(bs, bs->filename, open_flags, drv); >>> +if (ret < 0) { >>> +/* Reopen failed with orig and modified flags */ >>> +abort(); >>> +} >> >> Can we please avoid this stuff completely? Just keep the >> old device open still, until you're sure new one is ok. >> >> Or else it will be quite dangerous command in many cases. >> For example, after -runas/-chroot, or additional selinux >> settings or whatnot. And in this case, instead of merely >> returning an error, we'll see abort(). Boom. > > Slight complication for image formats that use a dirty bit here. QED > has a dirty bit. VMDK also specifies one but we don't implement it > today. > > If the image file is dirty then all its metadata will be scanned for > consistency when it is opened. This increases the bdrv_open() time > and hence the downtime of the VM. So it is not ideal to open the > image file twice, even though there is no consistency problem. In general I really like understatements, but opening an image file twice isn't only "not ideal", but should be considered verboten. We're still doing it during migration and it means that in upstream qemu any non-raw images will be corrupted. > I'll think about this some more, there are a couple of solutions like > keeping only the file descriptor around, introducing a flush command > that makes sure the file is in a clean state, or changing QED to not > do this. Changing the format drivers doesn't really look like the right solution. Keeping the fd around looks okay, we can probably achieve this by introducing a bdrv_reopen function. It means that we may need to reopen the format layer, but it can't really fail. Kevin
Re: [Qemu-devel] [V5 Patch 1/4]Qemu: Enhance "info block" to display host cache setting
On 07/27/2011 07:49 PM, Stefan Hajnoczi wrote: On Wed, Jul 27, 2011 at 12:30 PM, Supriya Kannery wrote: Enhance "info block" to display hostcache setting for each block device. Example: (qemu) info block ide0-hd0: type=hd removable=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 Enhanced to display "hostcache" setting: (qemu) info block ide0-hd0: type=hd removable=0 hostcache=true file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 This description is outdated, should be hostcache=1 instead of hostcache=true. @@ -1749,13 +1757,18 @@ void bdrv_info(Monitor *mon, QObject **r QObject *bs_obj; bs_obj = qobject_from_jsonf("{ 'device': %s, 'type': 'unknown', " -"'removable': %i, 'locked': %i }", -bs->device_name, bs->removable, -bs->locked); + "'removable': %i, 'locked': %i, " + "'hostcache': %s }", + bs->device_name, bs->removable, + bs->locked, + (bs->open_flags& BDRV_O_NOCACHE) ? + "false" : "true"); Please use the same bool-from-int format specifier ('%i') as the other fields for consistency. The value can be !(bs->open_flags& BDRV_O_NOCACHE). ok + +QDict *bs_dict = qobject_to_qdict(bs_obj); +qdict_put(bs_dict, "open_flags", qint_from_int(bs->open_flags)); if (bs->drv) { QObject *obj; -QDict *bs_dict = qobject_to_qdict(bs_obj); obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, " "'encrypted': %i }", My copy of the code has the following a few lines further down: qdict_put_obj(bs_dict, "inserted", obj); Removing bs_dict would break the build. Am I missing something? qdict_put_obj() for "inserted" is still there further down in the code. Here, QDict *bs_dict = qobject_to_qdict(bs_obj); is moved up to enable placing of "open_flags" into the dict. We need "open_flags" in dict to calculate hostcache value for printing in monitor. Code for placing "inserted" is not touched. Index: qemu/qmp-commands.hx === --- qemu.orig/qmp-commands.hx +++ qemu/qmp-commands.hx @@ -1131,6 +1131,7 @@ Each json-object contain the following: - Possible values: "unknown" - "removable": true if the device is removable, false otherwise (json-bool) - "locked": true if the device is locked, false otherwise (json-bool) +- "hostcache": true if hostcache enabled, false otherwise (json-bool) Let's explain what "hostcache" means: - "hostcache": true if host page cache is enabled, false otherwise (json-bool) ok -Thanks, Supriya Stefan
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
Hi, At Mon, 25 Jul 2011 19:22:38 +0200, Artyom Tarasenko wrote: > clear interrupt request if the interrupt priority < CPU pil > clear hardware interrupt request if interrupts are disabled Not directly related to the fix, but I'd like to note a problem of hw/sun4u.c interrupt code: The interrupt code probably mixes hardware interrupts and software interrupts. %pil is for software interrupts (interrupt_level_n traps). %pil can not mask hardware interrupts (interrupt_vector traps); the CPU raises interrupt_vector traps even on %pil=15. But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts seem to be masked by %pil. > Signed-off-by: Artyom Tarasenko > --- > hw/sun4u.c |6 -- > 1 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/hw/sun4u.c b/hw/sun4u.c > index d7dcaf0..7f95aeb 100644 > --- a/hw/sun4u.c > +++ b/hw/sun4u.c > @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) > pil |= 1 << 14; > } > > -if (!pil) { > +if (pil < (2 << env->psrpil)){ > if (env->interrupt_request & CPU_INTERRUPT_HARD) { > CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", > env->interrupt_index); > @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) > break; > } > } > -} else { > +} else if (env->interrupt_request & CPU_INTERRUPT_HARD) { > CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x > softint=%08x " > "current interrupt %x\n", > pil, env->pil_in, env->softint, env->interrupt_index); > +env->interrupt_index = 0; > +cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); > } > } > > -- > 1.7.3.4 Tsuneo Saito
Re: [Qemu-devel] Block layer roadmap
On Wed, Jul 27, 2011 at 1:37 PM, Stefan Hajnoczi wrote: > =Changes where I am not aware of firm plans= qcow2 online resize * Handle snapshots * Support shrinking qed online resize * Support shrinking
[Qemu-devel] [PATCH v3 1/2] The command line support for QEMU disk I/O limits
Signed-off-by: Zhi Yong Wu --- Makefile.objs |2 +- blockdev.c | 22 ++ qemu-config.c | 24 qemu-option.c | 17 + qemu-option.h |1 + qemu-options.hx |1 + 6 files changed, 66 insertions(+), 1 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 9f99ed4..06f2033 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/blockdev.c b/blockdev.c index c263663..aff6bb2 100644 --- a/blockdev.c +++ b/blockdev.c @@ -238,6 +238,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) int on_read_error, on_write_error; const char *devaddr; DriveInfo *dinfo; +BlockIOLimit io_limits; +bool iol_flag = false; +const char *iol_opts[7] = {"bps", "bps_rd", "bps_wr", +"iops", "iops_rd", "iops_wr"}; int is_extboot = 0; int snapshot = 0; int ret; @@ -372,6 +376,19 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) return NULL; } +/* disk io limits */ +iol_flag = qemu_opt_io_limits_enable_flag(opts, iol_opts); +if (iol_flag) { +memset(&io_limits, 0, sizeof(BlockIOLimit)); + +io_limits.bps[2] = qemu_opt_get_number(opts, "bps", 0); +io_limits.bps[0] = qemu_opt_get_number(opts, "bps_rd", 0); +io_limits.bps[1] = qemu_opt_get_number(opts, "bps_wr", 0); +io_limits.iops[2] = qemu_opt_get_number(opts, "iops", 0); +io_limits.iops[0] = qemu_opt_get_number(opts, "iops_rd", 0); +io_limits.iops[1] = qemu_opt_get_number(opts, "iops_wr", 0); +} + on_write_error = BLOCK_ERR_STOP_ENOSPC; if ((buf = qemu_opt_get(opts, "werror")) != NULL) { if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) { @@ -483,6 +500,11 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error); +/* throttling disk io limits */ +if (iol_flag) { +bdrv_set_io_limits(dinfo->bdrv, &io_limits); +} + switch(type) { case IF_IDE: case IF_SCSI: diff --git a/qemu-config.c b/qemu-config.c index efa892c..9232bbb 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -82,6 +82,30 @@ static QemuOptsList qemu_drive_opts = { .name = "boot", .type = QEMU_OPT_BOOL, .help = "make this a boot drive", +},{ +.name = "iops", +.type = QEMU_OPT_NUMBER, +.help = "limit total I/O operations per second", +},{ +.name = "iops_rd", +.type = QEMU_OPT_NUMBER, +.help = "limit read operations per second", +},{ +.name = "iops_wr", +.type = QEMU_OPT_NUMBER, +.help = "limit write operations per second", +},{ +.name = "bps", +.type = QEMU_OPT_NUMBER, +.help = "limit total bytes per second", +},{ +.name = "bps_rd", +.type = QEMU_OPT_NUMBER, +.help = "limit read bytes per second", +},{ +.name = "bps_wr", +.type = QEMU_OPT_NUMBER, +.help = "limit write bytes per second", }, { /* end of list */ } }, diff --git a/qemu-option.c b/qemu-option.c index 65db542..c5aa96a 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -562,6 +562,23 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint64_t defval) return opt->value.uint; } +bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts) +{ +int i; +uint64_t opt_val = 0; +bool iol_flag= false; + +for (i = 0; iol_opts[i]; i++) { +opt_val = qemu_opt_get_number(opts, iol_opts[i], 0); +if (opt_val != 0) { +iol_flag = true; +break; +} +} + +return iol_flag; +} + uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval) { QemuOpt *opt = qemu_opt_find(opts, name); diff --git a/qemu-option.h b/qemu-option.h index b515813..fc909f9 100644 --- a/qemu-option.h +++ b/qemu-option.h @@ -107,6 +107,7 @@ struct QemuOptsList { const char *qemu_opt_get(QemuOpts *opts, const char *name); int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval); uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint
[Qemu-devel] [PATCH v3 2/2] The implement of block queue, block timer, and I/O throttling algorithm
Note: 1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. 2.) When "dd" command is issued in guest, if its option bs is set to a large value such as "bs=1024K", the result speed will slightly bigger than the limits. For these problems, if you have nice thought, pls let us know.:) Signed-off-by: Zhi Yong Wu --- block.c | 302 +-- block.h |1 - block_int.h | 29 ++ 3 files changed, 323 insertions(+), 9 deletions(-) diff --git a/block.c b/block.c index 24a25d5..c4c631b 100644 --- a/block.c +++ b/block.c @@ -29,6 +29,9 @@ #include "module.h" #include "qemu-objects.h" +#include "qemu-timer.h" +#include "block/blk-queue.h" + #ifdef CONFIG_BSD #include #include @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num, static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors); +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, +bool is_write, double elapsed_time, uint64_t *wait); +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, +double elapsed_time, uint64_t *wait); +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, +bool is_write, uint64_t *wait); + static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename) } #endif +static int bdrv_io_limits_enable(BlockIOLimit *io_limits) +{ +if ((io_limits->bps[0] == 0) + && (io_limits->bps[1] == 0) + && (io_limits->bps[2] == 0) + && (io_limits->iops[0] == 0) + && (io_limits->iops[1] == 0) + && (io_limits->iops[2] == 0)) { +return 0; +} + +return 1; +} + /* check if the path starts with ":" */ static int path_has_protocol(const char *path) { @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size, } } +static void bdrv_block_timer(void *opaque) +{ +BlockDriverState *bs = opaque; +BlockQueue *queue = bs->block_queue; + +while (!QTAILQ_EMPTY(&queue->requests)) { +BlockIORequest *request; +int ret; + +request = QTAILQ_FIRST(&queue->requests); +QTAILQ_REMOVE(&queue->requests, request, entry); + +ret = qemu_block_queue_handler(request); +if (ret == 0) { +QTAILQ_INSERT_HEAD(&queue->requests, request, entry); +break; +} + +qemu_free(request); +} +} + void bdrv_register(BlockDriver *bdrv) { if (!bdrv->bdrv_aio_readv) { @@ -642,6 +688,19 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, bs->change_cb(bs->change_opaque, CHANGE_MEDIA); } +/* throttling disk I/O limits */ +if (bdrv_io_limits_enable(&bs->io_limits)) { +bs->req_from_queue = false; +bs->block_queue= qemu_new_block_queue(); +bs->block_timer= qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs); + +bs->slice_start[0] = qemu_get_clock_ns(vm_clock); +bs->slice_start[1] = qemu_get_clock_ns(vm_clock); + +bs->slice_end[0] = qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME; +bs->slice_end[1] = qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME; +} + return 0; unlink_and_fail: @@ -680,6 +739,16 @@ void bdrv_close(BlockDriverState *bs) if (bs->change_cb) bs->change_cb(bs->change_opaque, CHANGE_MEDIA); } + +/* throttling disk I/O limits */ +if (bs->block_queue) { +qemu_del_block_queue(bs->block_queue); +} + +if (bs->block_timer) { +qemu_del_timer(bs->block_timer); +qemu_free_timer(bs->block_timer); +} } void bdrv_close_all(void) @@ -1312,6 +1381,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs, *psecs = bs->secs; } +/* throttling disk io limits */ +void bdrv_set_io_limits(BlockDriverState *bs, +BlockIOLimit *io_limits) +{ +memset(&bs->io_limits, 0, sizeof(BlockIOLimit)); +bs->io_limits = *io_limits; +} + /* Recognize floppy formats */ typedef struct FDFormat { FDriveType drive; @@ -2111,6 +2188,165 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn) return buf; } +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, + bool is_write, double elapsed_time, uint64_t *wait) { +uint64_t bps_limit = 0; +double bytes_limit, bytes_disp, bytes_res; +double slice_time, wait_time; + +if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) { +bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]; +} else if (bs->io_limits.bps[is_write]) { +bps_limit = bs->io_limits.bps[is_write]; +} else { +if (wait) { +
[Qemu-devel] [PATCH v3 0/2] The intro for QEMU disk I/O limits
The main goal of the patch is to effectively cap the disk I/O speed or counts of one single VM.It is only one draft, so it unavoidably has some drawbacks, if you catch them, please let me know. The patch will mainly introduce one block I/O throttling algorithm, one timer and one block queue for each I/O limits enabled drive. When a block request is coming in, the throttling algorithm will check if its I/O rate or counts exceed the limits; if yes, then it will enqueue to the block queue; The timer will periodically handle the I/O requests in it. Some available features follow as below: (1) global bps limit. -drive bps=xxxin bytes/s (2) only read bps limit -drive bps_rd=xxx in bytes/s (3) only write bps limit -drive bps_wr=xxx in bytes/s (4) global iops limit -drive iops=xxx in ios/s (5) only read iops limit -drive iops_rd=xxxin ios/s (6) only write iops limit -drive iops_wr=xxxin ios/s (7) the combination of some limits. -drive bps=xxx,iops=xxx Known Limitations: (1) #1 can not coexist with #2, #3 (2) #4 can not coexist with #5, #6 (3) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. Zhi Yong Wu (2): v3: Added the code for extending slice time, and modified the method to compute wait time for the timer. The command line support for QEMU disk I/O limits The implement of block queue, block timer, and I/O throttling algorithm v2: The codes V2 for QEMU disk I/O limits. Modified the codes mainly based on stefan's comments. v1: Submit the codes for QEMU disk I/O limits. Only a code draft. Makefile.objs |2 +- block.c | 302 +-- block.h |1 - block_int.h | 29 ++ blockdev.c | 22 qemu-config.c | 24 + qemu-option.c | 17 +++ qemu-option.h |1 + qemu-options.hx |1 + 9 files changed, 389 insertions(+), 10 deletions(-) -- 1.7.2.3
Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug
On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote: > I hate the virtio pointer thicket. The problem is that each device is both a virtio pci device and a virtio net device. -- MST
Re: [Qemu-devel] [PATCH] add QEMU_LD_PREFIX environment variable
Hi, On Thu, Jul 28, 2011 at 11:41:09AM +0300, Riku Voipio wrote: > On Sat, Jul 23, 2011 at 07:47:49AM +0200, josch wrote: > > This could be avoided by setting the proposed environment variable > > QEMU_LD_PREFIX to the just > > created debian rootfs. As mentioned earlier, the usage of the -L option > > is not possible in this scenario because qemu-user is only implicitly > > called by the binfmt mechanism. > > What worries me here is that we are beginning to add a enviroment > variable for each and every command line option of qemu linux-user. Are there other environment variables? I didnt see any? (well besides QEMU_STRACE that is) > I think it would be better to have a wrapper binary to be registered > as the binfmt runner. If you need help with writing something - dont hesitate to ask for help. I'm very interested in having this functionality working because of the reasons I gave in my initial mail. [1] > Alternatively we should have a generic setup for mapping enviroment > variables to command line options. Now we get special per-option code > every time someone needs to setup a command line option from binfmt. What other options are there that would be interesting for binfmt? I was also sending this patch to qemu-devel list a month ago but got no reply. [2] As I said - if you would like a wrapper or a generic setup for mapping env variables to commandline parameters and dont have time to do it yourself, dont hesitate to delegate some work to me :) @Geert Stappers: you are patching bsd-user/main.c and darwin-user/main.c as well. I take it that you did test your changes on those platforms? does it work there as well? I have no clue of darwin but is it really useful there? cheers, josch [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632192#5 [2] http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg00459.html
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
Hi, On Thu, Jul 28, 2011 at 12:31 PM, wrote: > Hi, > > At Mon, 25 Jul 2011 19:22:38 +0200, > Artyom Tarasenko wrote: > >> clear interrupt request if the interrupt priority < CPU pil >> clear hardware interrupt request if interrupts are disabled > > Not directly related to the fix, but I'd like to note a problem > of hw/sun4u.c interrupt code: > > The interrupt code probably mixes hardware interrupts and > software interrupts. > %pil is for software interrupts (interrupt_level_n traps). > %pil can not mask hardware interrupts (interrupt_vector traps); > the CPU raises interrupt_vector traps even on %pil=15. > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts > seem to be masked by %pil. The interrupt_vector traps are currently not implemented, are they? So it's hard to tell whether they are masked. >> Signed-off-by: Artyom Tarasenko >> --- >> hw/sun4u.c | 6 -- >> 1 files changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c >> index d7dcaf0..7f95aeb 100644 >> --- a/hw/sun4u.c >> +++ b/hw/sun4u.c >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) >> pil |= 1 << 14; >> } >> >> - if (!pil) { >> + if (pil < (2 << env->psrpil)){ >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", >> env->interrupt_index); >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) >> break; >> } >> } >> - } else { >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x >> softint=%08x " >> "current interrupt %x\n", >> pil, env->pil_in, env->softint, >> env->interrupt_index); >> + env->interrupt_index = 0; >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); >> } >> } >> >> -- >> 1.7.3.4 > > > > Tsuneo Saito > -- Regards, Artyom Tarasenko solaris/sparc under qemu blog: http://tyom.blogspot.com/
[Qemu-devel] [PATCH] libcacard: use INSTALL_DATA for data
Signed-off-by: Alon Levy --- libcacard/Makefile |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/libcacard/Makefile b/libcacard/Makefile index 15205b5..33c1302 100644 --- a/libcacard/Makefile +++ b/libcacard/Makefile @@ -56,9 +56,8 @@ install-libcacard: libcacard.pc libcacard.la vscclient $(INSTALL_DIR) "$(DESTDIR)$(bindir)" libtool --mode=install $(INSTALL_PROG) vscclient "$(DESTDIR)$(bindir)" libtool --mode=install $(INSTALL_PROG) libcacard.la "$(DESTDIR)$(libdir)" - libtool --mode=install $(INSTALL_PROG) libcacard.pc "$(DESTDIR)$(libdir)/pkgconfig" + libtool --mode=install $(INSTALL_DATA) libcacard.pc "$(DESTDIR)$(libdir)/pkgconfig" for inc in *.h; do \ - libtool --mode=install $(INSTALL_PROG) $(libcacard_srcpath)/$$inc "$(DESTDIR)$(libcacard_includedir)"; \ + libtool --mode=install $(INSTALL_DATA) $(libcacard_srcpath)/$$inc "$(DESTDIR)$(libcacard_includedir)"; \ done - endif -- 1.7.6
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
At Thu, 28 Jul 2011 13:51:08 +0200, Artyom Tarasenko wrote: > On Thu, Jul 28, 2011 at 12:31 PM, wrote: > > Hi, > > > > At Mon, 25 Jul 2011 19:22:38 +0200, > > Artyom Tarasenko wrote: > > > >> clear interrupt request if the interrupt priority < CPU pil > >> clear hardware interrupt request if interrupts are disabled > > > > Not directly related to the fix, but I'd like to note a problem > > of hw/sun4u.c interrupt code: > > > > The interrupt code probably mixes hardware interrupts and > > software interrupts. > > %pil is for software interrupts (interrupt_level_n traps). > > %pil can not mask hardware interrupts (interrupt_vector traps); > > the CPU raises interrupt_vector traps even on %pil=15. > > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts > > seem to be masked by %pil. > > The interrupt_vector traps are currently not implemented, are they? > So it's hard to tell whether they are masked. Yes, interrupt_vector is not implemented yet. I failed to explain the problem. The problem is that cpu_set_irqs() should raise interrupt_vector traps but it raises interrupt_level_n traps actually. sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as the 1st argument. The allocated irqs (the irq variable) are passed to pci_apb_init(). APB should generate interrupt_vector traps (hardware interrupts), not the interrupt_vector_n traps. The interrupts from APB would be reported by cpu_set_irq(), but cpu_set_irq() seems to generate interrupt_vector_n traps. > >> Signed-off-by: Artyom Tarasenko > >> --- > >> hw/sun4u.c | 6 -- > >> 1 files changed, 4 insertions(+), 2 deletions(-) > >> > >> diff --git a/hw/sun4u.c b/hw/sun4u.c > >> index d7dcaf0..7f95aeb 100644 > >> --- a/hw/sun4u.c > >> +++ b/hw/sun4u.c > >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) > >> pil |= 1 << 14; > >> } > >> > >> - if (!pil) { > >> + if (pil < (2 << env->psrpil)){ > >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { > >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", > >> env->interrupt_index); > >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) > >> break; > >> } > >> } > >> - } else { > >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { > >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x > >> softint=%08x " > >> "current interrupt %x\n", > >> pil, env->pil_in, env->softint, > >> env->interrupt_index); > >> + env->interrupt_index = 0; > >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); > >> } > >> } > >> > >> -- > >> 1.7.3.4 > > > > > > > > Tsuneo Saito > > > > > > -- > Regards, > Artyom Tarasenko > > solaris/sparc under qemu blog: http://tyom.blogspot.com/ Tsuneo Saito
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote: > Another item that just came up on IRC again: Allow guests to toggle WCE, > so that we finally can get rid of the cache=writethrough default. We > should definitely get this done before 1.0. Guest toggling is nice and cool, but the real feature is to allow it on the command line. As mentioned about 10 times before guests generally don't fiddle with it without the administrator triggering it manually. So specifying it in the qemu config is the much better interface for 95% of the use cases.
Re: [Qemu-devel] Block layer roadmap
On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: > Coroutines in the block layer [Kevin] > * Programming model to simplify block drivers without blocking QEMU threads Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading?
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
2011/7/28 : > > At Thu, 28 Jul 2011 13:51:08 +0200, > Artyom Tarasenko wrote: > > On Thu, Jul 28, 2011 at 12:31 PM, wrote: > > > Hi, > > > > > > At Mon, 25 Jul 2011 19:22:38 +0200, > > > Artyom Tarasenko wrote: > > > > > >> clear interrupt request if the interrupt priority < CPU pil > > >> clear hardware interrupt request if interrupts are disabled > > > > > > Not directly related to the fix, but I'd like to note a problem > > > of hw/sun4u.c interrupt code: > > > > > > The interrupt code probably mixes hardware interrupts and > > > software interrupts. > > > %pil is for software interrupts (interrupt_level_n traps). > > > %pil can not mask hardware interrupts (interrupt_vector traps); > > > the CPU raises interrupt_vector traps even on %pil=15. > > > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts > > > seem to be masked by %pil. > > > > The interrupt_vector traps are currently not implemented, are they? > > So it's hard to tell whether they are masked. > > Yes, interrupt_vector is not implemented yet. Ah, I intended to write: No, interrupt_vector is not implemented yet. I'm very sorry for my engrish :-( > I failed to explain the problem. > The problem is that cpu_set_irqs() should raise interrupt_vector > traps but it raises interrupt_level_n traps actually. > > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as > the 1st argument. The allocated irqs (the irq variable) are > passed to pci_apb_init(). APB should generate interrupt_vector > traps (hardware interrupts), not the interrupt_vector_n traps. > The interrupts from APB would be reported by cpu_set_irq(), > but cpu_set_irq() seems to generate interrupt_vector_n traps. > > > >> Signed-off-by: Artyom Tarasenko > > >> --- > > >> hw/sun4u.c | 6 -- > > >> 1 files changed, 4 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/hw/sun4u.c b/hw/sun4u.c > > >> index d7dcaf0..7f95aeb 100644 > > >> --- a/hw/sun4u.c > > >> +++ b/hw/sun4u.c > > >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) > > >> pil |= 1 << 14; > > >> } > > >> > > >> - if (!pil) { > > >> + if (pil < (2 << env->psrpil)){ > > >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { > > >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", > > >> env->interrupt_index); > > >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) > > >> break; > > >> } > > >> } > > >> - } else { > > >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { > > >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x > > >> softint=%08x " > > >> "current interrupt %x\n", > > >> pil, env->pil_in, env->softint, > > >> env->interrupt_index); > > >> + env->interrupt_index = 0; > > >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); > > >> } > > >> } > > >> > > >> -- > > >> 1.7.3.4 > > > > > > > > > > > > Tsuneo Saito > > > > > > > > > > > -- > > Regards, > > Artyom Tarasenko > > > > solaris/sparc under qemu blog: http://tyom.blogspot.com/ > > > Tsuneo Saito
Re: [Qemu-devel] Block layer roadmap
2011/7/28 Christoph Hellwig : > On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: >> Coroutines in the block layer [Kevin] >> * Programming model to simplify block drivers without blocking QEMU threads > > Can anyone explain what the whole point of this is? It really just is > a bit of syntactic sugar for the current async state machines. What does > it buy us over going for real threading? > This has nothing (or few) to do with threads. Instead of splitting functions in callbacks at every synchronous function it allow to write more readable code. For instance after changing qcow code from current to coroutine you remove about 400 lines (about 30%). This will help maintaining code and develop more complicated optimizations. About threading you can do threading using AIO and using coroutines. Frediano
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote: > > Indeed. This has come up a few times, and actually is a mostly trivial > > task. Maybe we should give up waiting for -blockdev and separate cache > > mode settings and allow a nocache-writethrough or similar mode now? It's > > going to be around 10 lines of code + documentation. > > I understand that there may be reasons for using O_DIRECT | O_DSYNC, but > what is the explanation for O_DSYNC improving performance? There isn't any, at least for modern Linux. O_DSYNC at this point is equivalent to a range fdatasync for each write call, and given that we're doing O_DIRECT the ranges flush doesn't matter. If you do have a modern host and an old guest it might end up beeing faster because the barrier implementation in Linux used to suck so badly, but that's not inhrent to the I/O model. If you guest however doesn't support cache flushes at all O_DIRECT | O_DSYNC is the only sane model to use for local filesystems and block devices. > Christoph, on another note: Can we rely on Linux AIO never returning > short writes except on EOF? Currently we return -EINVAL in this case, so > I hope it's true or we wouldn't return the correct error code. More or less. There's one corner case for all Linux I/O, and that is only writes up to INT_MAX are supported, and larger writes (and reads) get truncated to it. It's pretty nasty, but Linux has been vocally opposed to fixing this issue.
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 02:15:38PM +0200, Frediano Ziglio wrote: > > This has nothing (or few) to do with threads. Instead of splitting > functions in callbacks at every synchronous function it allow to write > more readable code. Thanks for repeating my statement that it doesn't fix the real thing but just adds syntactic sugar.
Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug
"Michael S. Tsirkin" writes: > On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote: >> I hate the virtio pointer thicket. > > The problem is that each device is both a virtio pci > device and a virtio net device. In my possibly naive opinion, virtio-FOO-pci is a PCI device, and has a common virtio part.
Re: [Qemu-devel] Block layer roadmap
Am 28.07.2011 14:08, schrieb Christoph Hellwig: > On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote: >> Another item that just came up on IRC again: Allow guests to toggle WCE, >> so that we finally can get rid of the cache=writethrough default. We >> should definitely get this done before 1.0. > > Guest toggling is nice and cool, but the real feature is to allow it on > the command line. As mentioned about 10 times before guests generally don't > fiddle with it without the administrator triggering it manually. So > specifying it in the qemu config is the much better interface for 95% of > the use cases. I don't really care about guest toggling. I care about the default cache mode, and Anthony said that he doesn't agree with changing it unless guests can toggle it. Kevin
Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug
On Thu, Jul 28, 2011 at 02:21:32PM +0200, Markus Armbruster wrote: > "Michael S. Tsirkin" writes: > > > On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote: > >> I hate the virtio pointer thicket. > > > > The problem is that each device is both a virtio pci > > device and a virtio net device. > > In my possibly naive opinion, virtio-FOO-pci is a PCI device, and has a > common virtio part. Yes, but it also has a comon virtio-pci part with other virtio-FOO-pci devices, and a common virtio-net part with virtio-net-. -- MST
Re: [Qemu-devel] Block layer roadmap
Am 28.07.2011 14:09, schrieb Christoph Hellwig: > On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: >> Coroutines in the block layer [Kevin] >> * Programming model to simplify block drivers without blocking QEMU threads > > Can anyone explain what the whole point of this is? It really just is > a bit of syntactic sugar for the current async state machines. What does > it buy us over going for real threading? The only current block driver that really does everything in an async state machine is qed. It's definitely not nice code, and having to convert all of the other block drivers to this would be a lot of work. So if it only means that we're making things async that would block the VCPU until now, isn't that a great improvement already? The advantage compared to threading is that it allows an easy and incremental transition. Kevin
Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation
Am 28.07.2011 14:15, schrieb Christoph Hellwig: >> Christoph, on another note: Can we rely on Linux AIO never returning >> short writes except on EOF? Currently we return -EINVAL in this case, so "short reads" I meant, of course. >> I hope it's true or we wouldn't return the correct error code. > > More or less. There's one corner case for all Linux I/O, and that is > only writes up to INT_MAX are supported, and larger writes (and reads) > get truncated to it. It's pretty nasty, but Linux has been vocally > opposed to fixing this issue. I think we can safely ignore this. So just replacing the current ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be okay, right? (Of course using the qiov versions, but you get the idea) Kevin
Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model
On 07/28/2011 02:36 AM, Paolo Bonzini wrote: On 07/27/2011 10:01 PM, Anthony Liguori wrote: That's milkymist, not GoldFish. Oh, Goldfish is fake. It's not real hardware. The enumerator device is not a real device. It's weird because it's imaginary and was designed specifically within QEMU. It's not a good example for discussing modelling. There are plenty of PV interfaces implemented by QEMU. Would you say the same of virtio? Virtio was designed to look like real hardware. I would say that trying to fit XenStore into QOM would not be a good exercise. Regards, Anthony Liguori Paolo
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
On 07/28/2011 05:13 AM, Supriya Kannery wrote: On 07/27/2011 09:32 PM, Anthony Liguori wrote: On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote: On Wed, Jul 27, 2011 at 1:58 PM, Anthony Liguori wrote: Index: qemu/hmp-commands.hx === --- qemu.orig/hmp-commands.hx +++ qemu/hmp-commands.hx @@ -70,6 +70,20 @@ but should be used with extreme caution. resizes image files, it can not resize block devices like LVM volumes. ETEXI + { + .name = "block_set", + .args_type = "device:B,device:O", + .params = "device [prop=value][,...]", + .help = "Change block device parameters [hostcache=on/off]", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_block_set, + }, +STEXI +@item block_set @var{config} +@findex block_set +Change block device parameters (eg: hostcache=on/off) while guest is running. +ETEXI + block_set_hostcache() please. Multiplexing commands is generally a bad idea. It weakens typing. In the absence of a generic way to set block device properties, implementing properties as generic in the QMP layer seems like a bad idea to me. The idea behind block_set was to have a unified interface for changing block device parameters at runtime. This prevents us from reinventing new commands from scratch. For example, block I/O throttling is already queued up to add run-time parameters. Without a unified command we have a bulkier QMP/HMP interface, duplicated code, and possibly inconsistencies in syntax between the commands. Isn't the best way to avoid these problems a unified interface? I understand the lack of type safety concern but in this case we already have to manually pull parsed arguments (i.e. cast to specific types and deal with invalid input). To me this is a reason *for* using a unified interface like block_set. Think about it from a client perspective. How do I determine which properties are supported by this version of QEMU? I have no way to identify programmatically what arguments are valid for block_set. User can programmatically find out valid parameters for block_set. Internally, validation of prop=value is done against -drive parameter list and then, only the valid/implemented commands (for now, hostcache) are accepted from that list. Please find execution output attached: (qemu) info block ide0-hd0: removable=0 hostcache=1 file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw encrypted=0 ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted] sd0: removable=1 locked=0 hostcache=1 [not inserted] (qemu) block block_resize block_set block_passwd (qemu) block_set That's HMP btw, not QMP. block_set: block device name expected (qemu) block block_resize block_set block_passwd (qemu) help block_set block_set device [prop=value][,...] -- Change block device parameters [hostcache=on/off] Parsing help text is not introspection! Regards, Anthony Liguori (qemu) block_set ide Device 'ide' not found (qemu) block_set ide0-hd0 Usage: block_set device [prop=value][,...] (qemu) block_set ide0-hd0 cache Invalid parameter 'cache' (qemu) block_set ide0-hd0 cache=on Parameter 'hostcache' expects on/off (qemu) block_set ide0-hd0 hostcache=on (qemu) block_set ide0-hd0 hostcache=off (qemu) info block ide0-hd0: removable=0 hostcache=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2 encrypted=0 floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw encrypted=0 ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted] sd0: removable=1 locked=0 hostcache=1 [not inserted] When we add further parameters, "usage" string can be enhanced to include those parameters for informing user. - Thanks, Supriya OTOH, if you have strong types like block_set_hostcache, query-commands tells me exactly what's supported. Regards, Anthony Liguori Stefan
Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention
On Thu, Jul 28, 2011 at 11:40:21AM +0300, Michael S. Tsirkin wrote: > On Thu, Jul 28, 2011 at 04:23:24PM +0900, Isaku Yamahata wrote: > > This might be a bit late comment... > > > > On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote: > > > diff --git a/hw/pci_host.c b/hw/pci_host.c > > > index 728e2d4..bfdc321 100644 > > > --- a/hw/pci_host.c > > > +++ b/hw/pci_host.c > > > @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus > > > *bus, uint32_t addr) > > > return pci_find_device(bus, bus_num, devfn); > > > } > > > > > > +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr, > > > + uint32_t limit, uint32_t val, uint32_t len) > > > +{ > > > +assert(len <= 4); > > > +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr)); > > > +} > > > + > > > +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr, > > > +uint32_t limit, uint32_t len) > > > +{ > > > +assert(len <= 4); > > > +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr)); > > > +} > > > + > > > > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < > > addr. > > So we need explicit "if (limit < addr) return;". > > Here's the patch for pci branch. > > > > >From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001 > > Message-Id: > > <75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp> > > From: Isaku Yamahata > > Date: Thu, 28 Jul 2011 16:20:28 +0900 > > Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common > > > > This patch adds boundary check in pci_host_config_read/write_common() > > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < > > addr. > > So we need explicit "if (limit <= addr) return;" > > > > Signed-off-by: Isaku Yamahata > > I don't see a problem with this, but could you please clarify when does > this happen? I think this is only possible for a pci device > behind an express root. If so, this belongs in pcie_host.c Right. I'll move the check into pcie_host.c -- yamahata
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
On Thu, Jul 28, 2011 at 2:03 PM, wrote: > At Thu, 28 Jul 2011 13:51:08 +0200, > Artyom Tarasenko wrote: >> On Thu, Jul 28, 2011 at 12:31 PM, wrote: >> > Hi, >> > >> > At Mon, 25 Jul 2011 19:22:38 +0200, >> > Artyom Tarasenko wrote: >> > >> >> clear interrupt request if the interrupt priority < CPU pil >> >> clear hardware interrupt request if interrupts are disabled >> > >> > Not directly related to the fix, but I'd like to note a problem >> > of hw/sun4u.c interrupt code: >> > >> > The interrupt code probably mixes hardware interrupts and >> > software interrupts. >> > %pil is for software interrupts (interrupt_level_n traps). >> > %pil can not mask hardware interrupts (interrupt_vector traps); >> > the CPU raises interrupt_vector traps even on %pil=15. >> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts >> > seem to be masked by %pil. >> >> The interrupt_vector traps are currently not implemented, are they? >> So it's hard to tell whether they are masked. > > Yes, interrupt_vector is not implemented yet. > I failed to explain the problem. > The problem is that cpu_set_irqs() should raise interrupt_vector > traps but it raises interrupt_level_n traps actually. > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as > the 1st argument. The allocated irqs (the irq variable) are > passed to pci_apb_init(). APB should generate interrupt_vector > traps (hardware interrupts), not the interrupt_vector_n traps. Yes, this is true. But it's more complicated than this: cpu_check_irqs also checks tick/stick/hstick interrupts. They should produce the interrupt_level_n traps as they currently do. The patch merely fixes the problem of hanging on a interrupt_vector_n trap if the trap handler uses pil for interrupt masking. The problem exists independently from interrupt_vector trap generation (as you pointed out). Do you have objections to this patch in its current form? > The interrupts from APB would be reported by cpu_set_irq(), > but cpu_set_irq() seems to generate interrupt_vector_n traps. For me it's not obvious. The interrupt vector not just one line, but the vector, which is written in the corresponding CPU register (also missing in the current qemu implementation). On the real hardware the vector is created by the IOMMU (PBM/APB/...). If qemu intends to support multiple chipsets, we should keep it the way it's done on the real hardware (for instance the interrupt vectors for on-board devices on Ultra-1 and E6500 are not the same). I'd suggest APB shall use some other interface for communicating interrupts to the CPU. Something like cpu_receive_ivec(interrupt_vector). >> >> Signed-off-by: Artyom Tarasenko >> >> --- >> >> hw/sun4u.c | 6 -- >> >> 1 files changed, 4 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c >> >> index d7dcaf0..7f95aeb 100644 >> >> --- a/hw/sun4u.c >> >> +++ b/hw/sun4u.c >> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) >> >> pil |= 1 << 14; >> >> } >> >> >> >> - if (!pil) { >> >> + if (pil < (2 << env->psrpil)){ >> >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", >> >> env->interrupt_index); >> >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) >> >> break; >> >> } >> >> } >> >> - } else { >> >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x >> >> softint=%08x " >> >> "current interrupt %x\n", >> >> pil, env->pil_in, env->softint, >> >> env->interrupt_index); >> >> + env->interrupt_index = 0; >> >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); >> >> } >> >> } >> >> >> >> -- >> >> 1.7.3.4 >> > >> > >> > >> > Tsuneo Saito >> > >> >> >> >> -- >> Regards, >> Artyom Tarasenko >> >> solaris/sparc under qemu blog: http://tyom.blogspot.com/ > > > Tsuneo Saito > -- Regards, Artyom Tarasenko solaris/sparc under qemu blog: http://tyom.blogspot.com/
Re: [Qemu-devel] Block layer roadmap
On 07/28/2011 07:09 AM, Christoph Hellwig wrote: On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: Coroutines in the block layer [Kevin] * Programming model to simplify block drivers without blocking QEMU threads Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading? It is threading--just with a common locking model where a single big lock is held to make up for the fact that most of QEMU isn't reentrant. By restructuring the code to be threaded, we can incrementally remove the big lock if we audit for use of non-reentrant functions and introduce more granular locking. The whole ucontext/setjmp thing is just an optimization. I would hope it entirely disappears long term as we promote coroutines to full threads. Regards, Anthony Liguori
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf wrote: > Am 28.07.2011 14:09, schrieb Christoph Hellwig: >> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: >>> Coroutines in the block layer [Kevin] >>> * Programming model to simplify block drivers without blocking QEMU threads >> >> Can anyone explain what the whole point of this is? It really just is >> a bit of syntactic sugar for the current async state machines. What does >> it buy us over going for real threading? > > The only current block driver that really does everything in an async > state machine is qed. It's definitely not nice code, and having to > convert all of the other block drivers to this would be a lot of work. Thanks Kevin :). I do agree with the clumsiness of async callback programming - a lot of code is spent bundling and unbundling variables to pass between functions, not to mention that the control flow is much harder to follow. > So if it only means that we're making things async that would block the > VCPU until now, isn't that a great improvement already? > > The advantage compared to threading is that it allows an easy and > incremental transition. Also remember that we already have a threads-based implementation of coroutines. Later on we can do fine-grained locking and switch to threads directly instead of coroutines, if need be. Stefan
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
On Thu, Jul 28, 2011 at 2:10 PM, Tsuneo Saito wrote: > 2011/7/28 : >> >> At Thu, 28 Jul 2011 13:51:08 +0200, >> Artyom Tarasenko wrote: >> > On Thu, Jul 28, 2011 at 12:31 PM, wrote: >> > > Hi, >> > > >> > > At Mon, 25 Jul 2011 19:22:38 +0200, >> > > Artyom Tarasenko wrote: >> > > >> > >> clear interrupt request if the interrupt priority < CPU pil >> > >> clear hardware interrupt request if interrupts are disabled >> > > >> > > Not directly related to the fix, but I'd like to note a problem >> > > of hw/sun4u.c interrupt code: >> > > >> > > The interrupt code probably mixes hardware interrupts and >> > > software interrupts. >> > > %pil is for software interrupts (interrupt_level_n traps). >> > > %pil can not mask hardware interrupts (interrupt_vector traps); >> > > the CPU raises interrupt_vector traps even on %pil=15. >> > > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts >> > > seem to be masked by %pil. >> > >> > The interrupt_vector traps are currently not implemented, are they? >> > So it's hard to tell whether they are masked. >> >> Yes, interrupt_vector is not implemented yet. > > Ah, I intended to write: > No, interrupt_vector is not implemented yet. > I'm very sorry for my engrish :-( Not having English as my native language, I even haven't noticed something wrong here. Despite what they say the International language is not English, but broken English. :) >> I failed to explain the problem. >> The problem is that cpu_set_irqs() should raise interrupt_vector >> traps but it raises interrupt_level_n traps actually. >> >> sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as >> the 1st argument. The allocated irqs (the irq variable) are >> passed to pci_apb_init(). APB should generate interrupt_vector >> traps (hardware interrupts), not the interrupt_vector_n traps. >> The interrupts from APB would be reported by cpu_set_irq(), >> but cpu_set_irq() seems to generate interrupt_vector_n traps. >> >> > >> Signed-off-by: Artyom Tarasenko >> > >> --- >> > >> hw/sun4u.c | 6 -- >> > >> 1 files changed, 4 insertions(+), 2 deletions(-) >> > >> >> > >> diff --git a/hw/sun4u.c b/hw/sun4u.c >> > >> index d7dcaf0..7f95aeb 100644 >> > >> --- a/hw/sun4u.c >> > >> +++ b/hw/sun4u.c >> > >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) >> > >> pil |= 1 << 14; >> > >> } >> > >> >> > >> - if (!pil) { >> > >> + if (pil < (2 << env->psrpil)){ >> > >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> > >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", >> > >> env->interrupt_index); >> > >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) >> > >> break; >> > >> } >> > >> } >> > >> - } else { >> > >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> > >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x >> > >> softint=%08x " >> > >> "current interrupt %x\n", >> > >> pil, env->pil_in, env->softint, >> > >> env->interrupt_index); >> > >> + env->interrupt_index = 0; >> > >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); >> > >> } >> > >> } >> > >> >> > >> -- >> > >> 1.7.3.4 >> > > >> > > >> > > >> > > Tsuneo Saito >> > > >> > >> > >> > >> > -- >> > Regards, >> > Artyom Tarasenko >> > >> > solaris/sparc under qemu blog: http://tyom.blogspot.com/ >> >> >> Tsuneo Saito > -- Regards, Artyom Tarasenko solaris/sparc under qemu blog: http://tyom.blogspot.com/
Re: [Qemu-devel] Block layer roadmap
Am 28.07.2011 14:54, schrieb Stefan Hajnoczi: > On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf wrote: >> Am 28.07.2011 14:09, schrieb Christoph Hellwig: >>> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: Coroutines in the block layer [Kevin] * Programming model to simplify block drivers without blocking QEMU threads >>> >>> Can anyone explain what the whole point of this is? It really just is >>> a bit of syntactic sugar for the current async state machines. What does >>> it buy us over going for real threading? >> >> The only current block driver that really does everything in an async >> state machine is qed. It's definitely not nice code, and having to >> convert all of the other block drivers to this would be a lot of work. > > Thanks Kevin :). I certainly didn't mean to attack your code or even yourself. It's not that qed is done particularly bad or anything. That the code isn't really nice is just the natural result of the callback-based programming model. >> So if it only means that we're making things async that would block the >> VCPU until now, isn't that a great improvement already? >> >> The advantage compared to threading is that it allows an easy and >> incremental transition. > > Also remember that we already have a threads-based implementation of > coroutines. Later on we can do fine-grained locking and switch to > threads directly instead of coroutines, if need be. Might be an option for the future, but for now there's enough left to do to take real advantage of coroutines. Kevin
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
On Thu, Jul 28, 2011 at 11:23 AM, Kevin Wolf wrote: > Am 27.07.2011 16:51, schrieb Stefan Hajnoczi: >> 2011/7/27 Michael Tokarev : >>> 27.07.2011 15:30, Supriya Kannery wrote: New command "block_set" added for dynamically changing any of the block device parameters. For now, dynamic setting of hostcache params using this command is implemented. Other block device parameter changes, can be integrated in similar lines. Signed-off-by: Supriya Kannery --- block.c | 54 + block.h | 2 + blockdev.c | 61 blockdev.h | 1 hmp-commands.hx | 14 qemu-config.c | 13 +++ qemu-option.c | 25 ++ qemu-option.h | 2 + qmp-commands.hx | 28 + 9 files changed, 200 insertions(+) Index: qemu/block.c === --- qemu.orig/block.c +++ qemu/block.c @@ -651,6 +651,34 @@ unlink_and_fail: return ret; } +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags) +{ + BlockDriver *drv = bs->drv; + int ret = 0, open_flags; + + /* Quiesce IO for the given block device */ + qemu_aio_flush(); + if (bdrv_flush(bs)) { + qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name); + return ret; + } + open_flags = bs->open_flags; + bdrv_close(bs); + + ret = bdrv_open(bs, bs->filename, bdrv_flags, drv); + if (ret < 0) { + /* Reopen failed. Try to open with original flags */ + qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename); + ret = bdrv_open(bs, bs->filename, open_flags, drv); + if (ret < 0) { + /* Reopen failed with orig and modified flags */ + abort(); + } >>> >>> Can we please avoid this stuff completely? Just keep the >>> old device open still, until you're sure new one is ok. >>> >>> Or else it will be quite dangerous command in many cases. >>> For example, after -runas/-chroot, or additional selinux >>> settings or whatnot. And in this case, instead of merely >>> returning an error, we'll see abort(). Boom. >> >> Slight complication for image formats that use a dirty bit here. QED >> has a dirty bit. VMDK also specifies one but we don't implement it >> today. >> >> If the image file is dirty then all its metadata will be scanned for >> consistency when it is opened. This increases the bdrv_open() time >> and hence the downtime of the VM. So it is not ideal to open the >> image file twice, even though there is no consistency problem. > > In general I really like understatements, but opening an image file > twice isn't only "not ideal", but should be considered verboten. Point taken. >> I'll think about this some more, there are a couple of solutions like >> keeping only the file descriptor around, introducing a flush command >> that makes sure the file is in a clean state, or changing QED to not >> do this. > > Changing the format drivers doesn't really look like the right solution. > > Keeping the fd around looks okay, we can probably achieve this by > introducing a bdrv_reopen function. It means that we may need to reopen > the format layer, but it can't really fail. My concern is that this assumes a single file descriptor. For vmdk there may be multiple split files. I'm almost starting to think bdrv_reopen() should be recursive down the BlockDriverState tree. Stefan
Re: [Qemu-devel] Block layer roadmap
On Thu, Jul 28, 2011 at 2:10 PM, Kevin Wolf wrote: > Am 28.07.2011 14:54, schrieb Stefan Hajnoczi: >> On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf wrote: >>> Am 28.07.2011 14:09, schrieb Christoph Hellwig: On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote: > Coroutines in the block layer [Kevin] > * Programming model to simplify block drivers without blocking QEMU > threads Can anyone explain what the whole point of this is? It really just is a bit of syntactic sugar for the current async state machines. What does it buy us over going for real threading? >>> >>> The only current block driver that really does everything in an async >>> state machine is qed. It's definitely not nice code, and having to >>> convert all of the other block drivers to this would be a lot of work. >> >> Thanks Kevin :). > > I certainly didn't mean to attack your code or even yourself. It's not > that qed is done particularly bad or anything. That the code isn't > really nice is just the natural result of the callback-based programming > model. No worries, no offence taken :) Stefan
Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change
Am 28.07.2011 15:10, schrieb Stefan Hajnoczi: > On Thu, Jul 28, 2011 at 11:23 AM, Kevin Wolf wrote: >> Am 27.07.2011 16:51, schrieb Stefan Hajnoczi: >>> I'll think about this some more, there are a couple of solutions like >>> keeping only the file descriptor around, introducing a flush command >>> that makes sure the file is in a clean state, or changing QED to not >>> do this. >> >> Changing the format drivers doesn't really look like the right solution. >> >> Keeping the fd around looks okay, we can probably achieve this by >> introducing a bdrv_reopen function. It means that we may need to reopen >> the format layer, but it can't really fail. > > My concern is that this assumes a single file descriptor. For vmdk > there may be multiple split files. > > I'm almost starting to think bdrv_reopen() should be recursive down > the BlockDriverState tree. Yes, VMDK would have to call bdrv_reopen() for each file that it uses. Kevin
[Qemu-devel] [PATCH RESEND 2/3] Move the xenfb pointer handler to the connected method
From: John Haxby Ensure that we read "request-abs-pointer" after the frontend has written it. This means that we will correctly set up an ansolute or relative pointer handler correctly. Signed-off-by: John Haxby Acked-by: Stefano Stabellini --- hw/xenfb.c | 19 ++- 1 files changed, 14 insertions(+), 5 deletions(-) diff --git a/hw/xenfb.c b/hw/xenfb.c index c001ab9..d964318 100644 --- a/hw/xenfb.c +++ b/hw/xenfb.c @@ -356,10 +356,6 @@ static int input_initialise(struct XenDevice *xendev) struct XenInput *in = container_of(xendev, struct XenInput, c.xendev); int rc; -if (xenstore_read_fe_int(xendev, "request-abs-pointer", - &in->abs_pointer_wanted) == -1) - in->abs_pointer_wanted = 0; - if (!in->c.ds) { char *vfb = xenstore_read_str(NULL, "device/vfb"); if (vfb == NULL) { @@ -377,10 +373,22 @@ static int input_initialise(struct XenDevice *xendev) return rc; qemu_add_kbd_event_handler(xenfb_key_event, in); +return 0; +} + +static void input_connected(struct XenDevice *xendev) +{ +struct XenInput *in = container_of(xendev, struct XenInput, c.xendev); + +if (xenstore_read_fe_int(xendev, "request-abs-pointer", + &in->abs_pointer_wanted) == -1) + in->abs_pointer_wanted = 0; + +if (in->qmouse) + qemu_remove_mouse_event_handler(in->qmouse); in->qmouse = qemu_add_mouse_event_handler(xenfb_mouse_event, in, in->abs_pointer_wanted, "Xen PVFB Mouse"); -return 0; } static void input_disconnect(struct XenDevice *xendev) @@ -960,6 +968,7 @@ struct XenDevOps xen_kbdmouse_ops = { .size = sizeof(struct XenInput), .init = input_init, .initialise = input_initialise, +.connected = input_connected, .disconnect = input_disconnect, .event = input_event, }; -- 1.7.2.3
[Qemu-devel] [PATCH RESEND 1/3] Introduce a new 'connected' xendev op called when Connected.
From: John Haxby Rename the existing xendev 'connect' op to 'initialised' and introduce a new 'connected' op. This new op, if defined, is called when the backend is connected. Note that since there is no state transition this may be called more than once. Signed-off-by: John Haxby Acked-by: Stefano Stabellini --- hw/xen_backend.c | 39 +-- hw/xen_backend.h |3 ++- hw/xen_console.c |4 ++-- hw/xen_disk.c|2 +- hw/xen_nic.c |2 +- hw/xenfb.c |8 6 files changed, 43 insertions(+), 15 deletions(-) diff --git a/hw/xen_backend.c b/hw/xen_backend.c index d881fa2..c506dfe 100644 --- a/hw/xen_backend.c +++ b/hw/xen_backend.c @@ -421,13 +421,13 @@ static int xen_be_try_init(struct XenDevice *xendev) } /* - * Try to connect xendev. Depends on the frontend being ready + * Try to initialise xendev. Depends on the frontend being ready * for it (shared ring and evtchn info in xenstore, state being * Initialised or Connected). * * Goes to Connected on success. */ -static int xen_be_try_connect(struct XenDevice *xendev) +static int xen_be_try_initialise(struct XenDevice *xendev) { int rc = 0; @@ -441,11 +441,11 @@ static int xen_be_try_connect(struct XenDevice *xendev) } } -if (xendev->ops->connect) { -rc = xendev->ops->connect(xendev); +if (xendev->ops->initialise) { +rc = xendev->ops->initialise(xendev); } if (rc != 0) { -xen_be_printf(xendev, 0, "connect() failed\n"); +xen_be_printf(xendev, 0, "initialise() failed\n"); return rc; } @@ -454,6 +454,28 @@ static int xen_be_try_connect(struct XenDevice *xendev) } /* + * Try to let xendev know that it is connected. Depends on the + * frontend being Connected. Note that this may be called more + * than once since the backend state is not modified. + */ +static void xen_be_try_connected(struct XenDevice *xendev) +{ +if (!xendev->ops->connected) + return; + +if (xendev->fe_state != XenbusStateConnected) { + if (xendev->ops->flags & DEVOPS_FLAG_IGNORE_STATE) { + xen_be_printf(xendev, 2, "frontend not ready, ignoring\n"); + } else { + xen_be_printf(xendev, 2, "frontend not ready (yet)\n"); + return; + } +} + +xendev->ops->connected(xendev); +} + +/* * Teardown connection. * * Goes to Closed when done. @@ -508,7 +530,12 @@ void xen_be_check_state(struct XenDevice *xendev) rc = xen_be_try_init(xendev); break; case XenbusStateInitWait: -rc = xen_be_try_connect(xendev); +rc = xen_be_try_initialise(xendev); +break; +case XenbusStateConnected: +/* xendev->be_state doesn't change */ +xen_be_try_connected(xendev); +rc = -1; break; case XenbusStateClosed: rc = xen_be_try_reset(xendev); diff --git a/hw/xen_backend.h b/hw/xen_backend.h index 6401c85..3305630 100644 --- a/hw/xen_backend.h +++ b/hw/xen_backend.h @@ -21,7 +21,8 @@ struct XenDevOps { uint32_t flags; void (*alloc)(struct XenDevice *xendev); int (*init)(struct XenDevice *xendev); -int (*connect)(struct XenDevice *xendev); +int (*initialise)(struct XenDevice *xendev); +void (*connected)(struct XenDevice *xendev); void (*event)(struct XenDevice *xendev); void (*disconnect)(struct XenDevice *xendev); int (*free)(struct XenDevice *xendev); diff --git a/hw/xen_console.c b/hw/xen_console.c index 8ef104c..c6bea81 100644 --- a/hw/xen_console.c +++ b/hw/xen_console.c @@ -212,7 +212,7 @@ out: return ret; } -static int con_connect(struct XenDevice *xendev) +static int con_initialise(struct XenDevice *xendev) { struct XenConsole *con = container_of(xendev, struct XenConsole, xendev); int limit; @@ -273,7 +273,7 @@ struct XenDevOps xen_console_ops = { .size = sizeof(struct XenConsole), .flags = DEVOPS_FLAG_IGNORE_STATE, .init = con_init, -.connect= con_connect, +.initialise = con_initialise, .event = con_event, .disconnect = con_disconnect, }; diff --git a/hw/xen_disk.c b/hw/xen_disk.c index add815f..4abdaf9 100644 --- a/hw/xen_disk.c +++ b/hw/xen_disk.c @@ -846,7 +846,7 @@ struct XenDevOps xen_blkdev_ops = { .flags = DEVOPS_FLAG_NEED_GNTDEV, .alloc = blk_alloc, .init = blk_init, -.connect= blk_connect, +.initialise= blk_connect, .disconnect = blk_disconnect, .event = blk_event, .free = blk_free, diff --git a/hw/xen_nic.c b/hw/xen_nic.c index ff86491..9a97d3e 100644 --- a/hw/xen_nic.c +++ b/hw/xen_nic.c @@ -433,7 +433,7 @@ struct XenDevOps xen_netdev_ops = { .size = sizeof(struct XenNetDev), .flags = DEVOPS_FLAG_NEED_GNTDEV, .init = net_init, -.connect
[Qemu-devel] [PATCH RESEND 3/3] xen: use uint64_t instead of target_ulong in cpu_ioreq_move
From: Stefano Stabellini cpu_ioreq_move might move 8 bytes at a time so we must make sure that the temporary variable can hold 8 bytes. Signed-off-by: Stefano Stabellini --- xen-all.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/xen-all.c b/xen-all.c index 9eaeac1..3611e19 100644 --- a/xen-all.c +++ b/xen-all.c @@ -620,7 +620,7 @@ static void cpu_ioreq_move(ioreq_t *req) } } } else { -target_ulong tmp; +uint64_t tmp; if (req->dir == IOREQ_READ) { for (i = 0; i < req->count; i++) { -- 1.7.2.3
Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()
On Thu, 28 Jul 2011 10:23:22 +0300 Avi Kivity wrote: > On 07/28/2011 12:44 AM, Blue Swirl wrote: > > On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino > > wrote: > > > This function should be used when the VM is not supposed to resume > > > execution (eg. by issuing 'cont' monitor command). > > > > > > Today, we allow the user to resume execution even when: > > > > > >o the guest shuts down and -no-shutdown is used > > >o there's a kvm internal error > > >o loading the VM state with -loadvm or "loadvm" in the monitor fails > > > > > > I think only badness can happen from the cases above. > > > > I'd suppose a system_reset should bring the system back to sanity and > > then clear vm_permanent_stopped (where's -ly?) What's -ly? > > except maybe for KVM > > internal error if that can't be recovered. Then it would not very > > permanent anymore, so the name would need adjusting. > > Currently, all kvm internal errors are recoverable by reset (and > possibly by fiddling with memory/registers). Ok, but a poweroff in the guest isn't recoverable with system_reset right? Or does it depend on the guest? I get funny results if qemu is started with -no-shutdown and I run cont after a poweroff in a F15 guest. Sometimes qemu will exit after a few seconds, sometimes 'info status' will say 'running'.
Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()
On 07/28/2011 04:31 PM, Luiz Capitulino wrote: On Thu, 28 Jul 2011 10:23:22 +0300 Avi Kivity wrote: > On 07/28/2011 12:44 AM, Blue Swirl wrote: > > On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino wrote: > > > This function should be used when the VM is not supposed to resume > > > execution (eg. by issuing 'cont' monitor command). > > > > > > Today, we allow the user to resume execution even when: > > > > > > o the guest shuts down and -no-shutdown is used > > > o there's a kvm internal error > > > o loading the VM state with -loadvm or "loadvm" in the monitor fails > > > > > > I think only badness can happen from the cases above. > > > > I'd suppose a system_reset should bring the system back to sanity and > > then clear vm_permanent_stopped (where's -ly?) What's -ly? permanent-ly. > > except maybe for KVM > > internal error if that can't be recovered. Then it would not very > > permanent anymore, so the name would need adjusting. > > Currently, all kvm internal errors are recoverable by reset (and > possibly by fiddling with memory/registers). Ok, but a poweroff in the guest isn't recoverable with system_reset right? Or does it depend on the guest? Right, it's not recoverable if you shut the power down where the tractor beam is coupled to the main reactor. I get funny results if qemu is started with -no-shutdown and I run cont after a poweroff in a F15 guest. Sometimes qemu will exit after a few seconds, sometimes 'info status' will say 'running'. Yes, best fixed. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 0/4] Set of patches for chrooted environment
This set of patches helps to use qemu-linux-user in a chrooted environment. It mostly allows to define the default cpu model as we can't use '-cpu' argument. The last one defines enviromnent variables to be able to use log file and gdb server ('-d' and '-g' arguments). NOTE: I saw some comments in the mailing list about environment variables, if patch #4 dislikes, I've also a patch providing a "qemu-wrapper" with the same behavior. [PATCH 1/4] linux-user: define default cpu model in configure instead of linux-user/main.c [PATCH 2/4] linux-user: specify the cpu model during configure [PATCH 3/4] linux-user,m68k: display default cpu [PATCH 4/4] linux-user: define new environment variables
[Qemu-devel] [PATCH 2/4] linux-user: specify the cpu model during configure
This patch allows to set the default cpu model for a given architecture, for instance: configure --target-list=m68k-linux-user --m68k-default-cpu=m68040 Signed-off-by: Laurent Vivier --- configure |9 + 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/configure b/configure index c74a5f9..d54f0ed 100755 --- a/configure +++ b/configure @@ -527,6 +527,10 @@ for opt do ;; --target-list=*) target_list="$optarg" ;; + --*-default-cpu=*) +tmp=`expr "x$opt" : 'x--\(.*\)-default-cpu=.*'` +eval ${tmp}_default_cpu="$optarg" + ;; --enable-trace-backend=*) trace_backend="$optarg" ;; --with-trace-file=*) trace_file="$optarg" @@ -916,6 +920,7 @@ echo " use %M for cpu name [$interp_prefix]" echo " --target-list=LIST set target list (default: build everything)" echo "Available targets: $default_target_list" | \ fold -s -w 53 | sed -e 's/^/ /' +echo " --ARCH-default-cpu=CPU set the default cpu for a given architecture" echo "" echo "Advanced options (experts only):" echo " --source-path=PATH path of source code [$source_path]" @@ -3291,6 +3296,10 @@ case "$target_arch2" in exit 1 ;; esac +tmp_target_default_cpu=`eval echo \\$${target_arch2}_default_cpu` +if [ "x$tmp_target_default_cpu" != "x" ] ; then + target_default_cpu="$tmp_target_default_cpu" +fi echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak -- 1.7.4.1
[Qemu-devel] [PATCH 3/4] linux-user,m68k: display default cpu
Signed-off-by: Laurent Vivier --- target-m68k/helper.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/target-m68k/helper.c b/target-m68k/helper.c index a936fe7..f5d33cd 100644 --- a/target-m68k/helper.c +++ b/target-m68k/helper.c @@ -57,6 +57,11 @@ void m68k_cpu_list(FILE *f, fprintf_function cpu_fprintf) unsigned int i; for (i = 0; m68k_cpu_defs[i].name; i++) { +if (strcmp(m68k_cpu_defs[i].name, TARGET_DEFAULT_CPU) == 0) { +(*cpu_fprintf)(f, " >"); +} else { +(*cpu_fprintf)(f, " "); +} (*cpu_fprintf)(f, "%s\n", m68k_cpu_defs[i].name); } } -- 1.7.4.1
[Qemu-devel] [PATCH 4/4] linux-user: define new environment variables
QEMU_GDB=port allows to define gdb server port to wait on. QEMU_DEBUG=options allows to activate log file (like -d options) Signed-off-by: Laurent Vivier --- linux-user/main.c | 11 --- qemu-doc.texi |4 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/linux-user/main.c b/linux-user/main.c index 7180cee..ff1720b 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2815,8 +2815,10 @@ static void usage(void) "-strace log system calls\n" "\n" "Environment variables:\n" - "QEMU_STRACE Print system calls and arguments similar to the\n" - " 'strace' program. Enable by setting to any value.\n" + "QEMU_STRACEPrint system calls and arguments similar to the\n" + " 'strace' program. Enable by setting to any value.\n" + "QEMU_DEBUG=options Activate log. Use same options as '-d' options\n" + "QEMU_GDB=port Wait gdb connection to port\n" "You can use -E and -U options to set/unset environment variables\n" "for target process. It is possible to provide several variables\n" "by repeating the option. For example:\n" @@ -2872,7 +2874,7 @@ int main(int argc, char **argv, char **envp) const char *filename; const char *cpu_model; const char *log_file = DEBUG_LOGFILE; -const char *log_mask = NULL; +const char *log_mask = getenv("QEMU_DEBUG"); struct target_pt_regs regs1, *regs = ®s1; struct image_info info1, *info = &info1; struct linux_binprm bprm; @@ -2919,6 +2921,9 @@ int main(int argc, char **argv, char **envp) #if defined(cpudef_setup) cpudef_setup(); /* parse cpu definitions in target config file (TBD) */ #endif +if (getenv("QEMU_GDB")) { + gdbstub_port = atoi(getenv("QEMU_GDB")); +} optind = 1; for(;;) { diff --git a/qemu-doc.texi b/qemu-doc.texi index 47e1991..330f9d0 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -2285,6 +2285,10 @@ space emulator hasn't implemented ptrace). At the moment this is incomplete. All system calls that don't have a specific argument format are printed with information for six arguments. Many flag-style arguments don't have decoders and will show up as numbers. +@item QEMU_DEBUG=options +Activate log. Use same options as '-d' options. +@item QEMU_GDB=port +Wait gdb connection to port. @end table @node Other binaries -- 1.7.4.1
[Qemu-devel] [PATCH 1/4] linux-user: define default cpu model in configure instead of linux-user/main.c
Signed-off-by: Laurent Vivier --- configure | 15 +++ linux-user/main.c | 34 +- 2 files changed, 16 insertions(+), 33 deletions(-) diff --git a/configure b/configure index fb8819b..c74a5f9 100755 --- a/configure +++ b/configure @@ -3075,6 +3075,7 @@ target_dir="$target" config_target_mak=$target_dir/config-target.mak target_arch2=`echo $target | cut -d '-' -f 1` target_bigendian="no" +target_default_cpu="any" case "$target_arch2" in armeb|lm32|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|s390x|sh4eb|sparc|sparc64|sparc32plus) @@ -3151,11 +3152,13 @@ TARGET_ABI_DIR="" case "$target_arch2" in i386) target_phys_bits=64 +target_default_cpu="qemu32" ;; x86_64) TARGET_BASE_ARCH=i386 target_phys_bits=64 target_long_alignment=8 +target_default_cpu="qemu64" ;; alpha) target_phys_bits=64 @@ -3173,6 +3176,7 @@ case "$target_arch2" in cris) target_nptl="yes" target_phys_bits=32 +target_default_cpu="" ;; lm32) target_phys_bits=32 @@ -3198,12 +3202,14 @@ case "$target_arch2" in echo "TARGET_ABI_MIPSO32=y" >> $config_target_mak target_nptl="yes" target_phys_bits=64 +target_default_cpu="24Kf" ;; mipsn32|mipsn32el) TARGET_ARCH=mipsn32 TARGET_BASE_ARCH=mips echo "TARGET_ABI_MIPSN32=y" >> $config_target_mak target_phys_bits=64 +target_default_cpu="20Kc" ;; mips64|mips64el) TARGET_ARCH=mips64 @@ -3211,12 +3217,14 @@ case "$target_arch2" in echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak target_phys_bits=64 target_long_alignment=8 +target_default_cpu="20Kc" ;; ppc) gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml power-spe.xml" target_phys_bits=32 target_nptl="yes" target_libs_softmmu="$fdt_libs" +target_default_cpu="750" ;; ppcemb) TARGET_BASE_ARCH=ppc @@ -3225,6 +3233,7 @@ case "$target_arch2" in target_phys_bits=64 target_nptl="yes" target_libs_softmmu="$fdt_libs" +target_default_cpu="750" ;; ppc64) TARGET_BASE_ARCH=ppc @@ -3233,6 +3242,7 @@ case "$target_arch2" in target_phys_bits=64 target_long_alignment=8 target_libs_softmmu="$fdt_libs" +target_default_cpu="970fx" ;; ppc64abi32) TARGET_ARCH=ppc64 @@ -3242,6 +3252,7 @@ case "$target_arch2" in gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml power-spe.xml" target_phys_bits=64 target_libs_softmmu="$fdt_libs" +target_default_cpu="750" ;; sh4|sh4eb) TARGET_ARCH=sh4 @@ -3251,11 +3262,13 @@ case "$target_arch2" in ;; sparc) target_phys_bits=64 +target_default_cpu="Fujitsu MB86904" ;; sparc64) TARGET_BASE_ARCH=sparc target_phys_bits=64 target_long_alignment=8 +target_default_cpu="TI UltraSparc II" ;; sparc32plus) TARGET_ARCH=sparc64 @@ -3263,6 +3276,7 @@ case "$target_arch2" in TARGET_ABI_DIR=sparc echo "TARGET_ABI32=y" >> $config_target_mak target_phys_bits=64 +target_default_cpu="Fujitsu MB86904" ;; s390x) target_nptl="yes" @@ -3281,6 +3295,7 @@ echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak echo "TARGET_LLONG_ALIGNMENT=$target_llong_alignment" >> $config_target_mak +echo "TARGET_DEFAULT_CPU=\"$target_default_cpu\"" >> $config_target_mak echo "TARGET_ARCH=$TARGET_ARCH" >> $config_target_mak target_arch_name="`echo $TARGET_ARCH | tr '[:lower:]' '[:upper:]'`" echo "TARGET_$target_arch_name=y" >> $config_target_mak diff --git a/linux-user/main.c b/linux-user/main.c index 2135b9c..7180cee 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -3082,39 +3082,7 @@ int main(int argc, char **argv, char **envp) init_paths(interp_prefix); if (cpu_model == NULL) { -#if defined(TARGET_I386) -#ifdef TARGET_X86_64 -cpu_model = "qemu64"; -#else -cpu_model = "qemu32"; -#endif -#elif defined(TARGET_ARM) -cpu_model = "any"; -#elif defined(TARGET_UNICORE32) -cpu_model = "any"; -#elif defined(TARGET_M68K) -cpu_model = "any"; -#elif defined(TARGET_SPARC) -#ifdef TARGET_SPARC64 -cpu_model = "TI UltraSparc II"; -#else -cpu_model = "Fujitsu MB86904"; -#endif -#elif defined(TARGET_MIPS) -#if defined(TARGET_ABI_MIPSN32) || defined(TARGET_ABI_MIPSN64) -cpu_model = "20Kc"; -#else -cpu_model = "24Kf"; -#endif -#elif defined(TARGET_PPC) -#ifdef TARGET_PPC64 -cpu_model = "970fx"; -#else -cpu_model = "750"; -#endif -#else -cpu_model = "any"; -#endif +cpu_model = TARGET_DEFAULT_CPU; } cpu_exec_init_all(0); /* NOTE: we need to init the CPU at this stage to get -- 1.7.4.1
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
At Thu, 28 Jul 2011 14:50:57 +0200, Artyom Tarasenko wrote: > On Thu, Jul 28, 2011 at 2:03 PM, wrote: > > At Thu, 28 Jul 2011 13:51:08 +0200, > > Artyom Tarasenko wrote: > >> On Thu, Jul 28, 2011 at 12:31 PM, wrote: > >> > Hi, > >> > > >> > At Mon, 25 Jul 2011 19:22:38 +0200, > >> > Artyom Tarasenko wrote: > >> > > >> >> clear interrupt request if the interrupt priority < CPU pil > >> >> clear hardware interrupt request if interrupts are disabled > >> > > >> > Not directly related to the fix, but I'd like to note a problem > >> > of hw/sun4u.c interrupt code: > >> > > >> > The interrupt code probably mixes hardware interrupts and > >> > software interrupts. > >> > %pil is for software interrupts (interrupt_level_n traps). > >> > %pil can not mask hardware interrupts (interrupt_vector traps); > >> > the CPU raises interrupt_vector traps even on %pil=15. > >> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts > >> > seem to be masked by %pil. > >> > >> The interrupt_vector traps are currently not implemented, are they? > >> So it's hard to tell whether they are masked. > > > > Yes, interrupt_vector is not implemented yet. > > I failed to explain the problem. > > The problem is that cpu_set_irqs() should raise interrupt_vector > > traps but it raises interrupt_level_n traps actually. > > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as > > the 1st argument. The allocated irqs (the irq variable) are > > passed to pci_apb_init(). APB should generate interrupt_vector > > traps (hardware interrupts), not the interrupt_vector_n traps. > > Yes, this is true. But it's more complicated than this: cpu_check_irqs > also checks tick/stick/hstick interrupts. They should produce the > interrupt_level_n traps as they currently do. That's right. tick/stick/hstick must raise interrupt_level_n traps. > The patch merely fixes the problem of hanging on a interrupt_vector_n > trap if the trap handler uses pil for interrupt masking. The problem > exists independently from interrupt_vector trap generation (as you > pointed out). I understand what is the problem that your patch is going to fix. Thanks for the explanation. > Do you have objections to this patch in its current form? No, I don't have any objections. > > The interrupts from APB would be reported by cpu_set_irq(), > > but cpu_set_irq() seems to generate interrupt_vector_n traps. > > For me it's not obvious. The interrupt vector not just one line, but > the vector, which is written in the corresponding CPU register (also > missing in the current qemu implementation). On the real hardware the > vector is created by the IOMMU (PBM/APB/...). If qemu intends to > support multiple chipsets, we should keep it the way it's done on the > real hardware (for instance the interrupt vectors for on-board devices > on Ultra-1 and E6500 are not the same). Sorry, I can't keep up with this vector thing... Does the CPU receive hardware interrupts as interrupt_vector traps (trap type=0x60) regardless of the kind of the interrupt controller, doesn't it? > I'd suggest APB shall use some other interface for communicating > interrupts to the CPU. Something like > cpu_receive_ivec(interrupt_vector). > > >> >> Signed-off-by: Artyom Tarasenko > >> >> --- > >> >> hw/sun4u.c | 6 -- > >> >> 1 files changed, 4 insertions(+), 2 deletions(-) > >> >> > >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c > >> >> index d7dcaf0..7f95aeb 100644 > >> >> --- a/hw/sun4u.c > >> >> +++ b/hw/sun4u.c > >> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) > >> >> pil |= 1 << 14; > >> >> } > >> >> > >> >> - if (!pil) { > >> >> + if (pil < (2 << env->psrpil)){ > >> >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { > >> >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", > >> >> env->interrupt_index); > >> >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env) > >> >> break; > >> >> } > >> >> } > >> >> - } else { > >> >> + } else if (env->interrupt_request & CPU_INTERRUPT_HARD) { > >> >> CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x > >> >> softint=%08x " > >> >> "current interrupt %x\n", > >> >> pil, env->pil_in, env->softint, > >> >> env->interrupt_index); > >> >> + env->interrupt_index = 0; > >> >> + cpu_reset_interrupt(env, CPU_INTERRUPT_HARD); > >> >> } > >> >> } > >> >> > >> >> -- > >> >> 1.7.3.4 > >> > > >> > > >> > > >> > Tsuneo Saito > >> > > >> > >> > >> > >> -- > >> Regards, > >> Artyom Tarasenko > >> > >> solaris/sparc under qemu blog: http://tyom.blogspot.com/ > > > > > > Tsuneo Saito > > > > -- > Regards, > Artyom Tarasenko > > solaris/sparc under qemu blog: http://tyom.blogspot.com/ Tsuneo Saito
[Qemu-devel] [PATCH] [RFC] qcow2: group refcount updates during cow
Well, I think this is the first real improve patch. Is more a RFC than a patch. Yes, some lines are terrible! It collapses refcount decrement during cow. >From a first check time executing 015 test passed from about 600 seconds to 70. This at least prove that refcount updates counts! Some doubt: 1- place the code in qcow2-refcount.c as it update only refcount and not cluster? 2- allow some sort of "begin transaction" / "commit" / "rollback" like databases instead? 3- allow changing tables from different coroutines? 1) If you have a sequence like (1, 2, 4) probably these clusters are all in the same l2 table but with this code you get two write instead of one. I'm thinking about a function in qcow2-refcount.c that accept an array of cluster instead of a start + len. Signed-off-by: Frediano Ziglio --- block/qcow2-cluster.c | 36 ++-- 1 files changed, 34 insertions(+), 2 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 81cf77d..da17365 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -675,10 +675,42 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) * Also flush bs->file to get the right order for L2 and refcount update. */ if (j != 0) { +int64_t old_start = 0, old_end = -2; +int count = 0; for (i = 0; i < j; i++) { -qcow2_free_any_clusters(bs, -be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1); +old_cluster[i] = be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED; } +// XXX sort old_cluster +for (i = 0; i < j; i++) { +int64_t cluster = old_cluster[i]; + +/* group if contiguos */ +if (old_end + 1 == (cluster >> s->cluster_bits)) { +++old_end; +continue; +} + +/* handle */ +if (old_end > 0) { +qcow2_free_any_clusters(bs, old_start << s->cluster_bits, old_end - old_start + 1); +count += old_end - old_start + 1; +} +old_end = -2; + +/* handle compressed separately */ +if ((cluster & QCOW_OFLAG_COMPRESSED)) { +qcow2_free_any_clusters(bs, cluster, 1); +continue; +} + +/* start a new group */ +old_start = old_end = cluster >> s->cluster_bits; +} +if (old_end > 0) { +qcow2_free_any_clusters(bs, old_start << s->cluster_bits, old_end - old_start + 1); +count += old_end - old_start + 1; +} +assert(count == j); } ret = 0; -- 1.7.1
Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model
On 07/28/2011 02:46 PM, Anthony Liguori wrote: There are plenty of PV interfaces implemented by QEMU. Would you say the same of virtio? Virtio was designed to look like real hardware. I would say that trying to fit XenStore into QOM would not be a good exercise. No doubt about that. :) I'd put a lot more hope into Goldfish though. Paolo
Re: [Qemu-devel] [PATCH 00/12] bugfix and qdevify NAND and ONENAND devices
Ping on this one too, since I'm nearly ready with the next chunk of patches (18 patches fixing bugs and adding NAND support to omap_gpmc) and they obviously depend on this set. -- PMM On 15 July 2011 15:58, Peter Maydell wrote: > This patchseries is more goodies from the Meego tree. Specifically, various > bug fixes to hw/nand and hw/onenand, plus conversion of both to sysbus. > > [An advance note on the next thing due to come out of the pipe: qdeving > nand and onenand allows us to move to qdevification of omap_gpmc, which > takes both of those (and other things) as 'downstream' devices. That > change will make it accept arbitrary SysBus devices as its 'downstream' > devices rather than the current setup of taking an opaque pointer and a > pair of resize/unmap function pointers. (There are also patches which add > NAND support so you can tell omap_gpmc "this is a NAND device" and it then > uses the nand_setpins/nand_getpins/nand_setio interface.) The rationale for > using SysBus there rather than defining some new kind of bus is that you > want to be able to plug any random thing into it; for instance the Overo > board has a lan9118 ethernet controller hanging off the GPMC. ] > > A note on dependencies: > * This patch depends on the omap gpio patchset (although more textually > than seriously semantically) > * the six nand patches and the six onenand patches are broadly independent > of each other > > Juha Riihimäki (9): > hw/nand: Support large NAND devices > hw/nand: Support devices wider than 8 bits > hw/nand: Support multiple reads following READ STATUS > hw/nand: qdevify > onenand: Handle various ID fields separately > onenand: Ignore zero writes to boot command space > hw/onenand: program actions can only clear bits > hw/sysbus: Add sysbus_mmio_unmap() for unmapping a region > hw/onenand: qdevify > > Peter Maydell (3): > hw/nand: Pass block device state to init function > hw/nand: Writing to NAND can only clear bits > onenand: Pass BlockDriverState to init function > > hw/axis_dev88.c | 8 +- > hw/flash.h | 19 ++-- > hw/nand.c | 345 --- > hw/nseries.c | 13 ++- > hw/onenand.c | 373 > +++ > hw/spitz.c | 6 +- > hw/sysbus.c | 17 +++ > hw/sysbus.h | 1 + > hw/tc6393xb.c | 7 +- > 9 files changed, 561 insertions(+), 228 deletions(-)
Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model
On 07/28/2011 08:50 AM, Paolo Bonzini wrote: On 07/28/2011 02:46 PM, Anthony Liguori wrote: There are plenty of PV interfaces implemented by QEMU. Would you say the same of virtio? Virtio was designed to look like real hardware. I would say that trying to fit XenStore into QOM would not be a good exercise. No doubt about that. :) I'd put a lot more hope into Goldfish though. What's unclear to me about the Goldfish enumerator is whether it should be filled out through interaction with hardware devices or via some other mechanism. In many ways, it's similar to ACPI and a Device Tree. In both of those cases, firmware actually is responsible for constructing those tables. Regards, Anthony Liguori Paolo
[Qemu-devel] [Bug 524447] Re: virsh save is very slow
Michael: Yes, that is the correct patch. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/524447 Title: virsh save is very slow Status in libvirt virtualization API: Unknown Status in QEMU: Fix Released Status in “libvirt” package in Ubuntu: Invalid Status in “qemu-kvm” package in Ubuntu: Fix Released Status in “libvirt” source package in Lucid: Won't Fix Status in “qemu-kvm” source package in Lucid: Won't Fix Status in “libvirt” source package in Maverick: Won't Fix Status in “qemu-kvm” source package in Maverick: Won't Fix Bug description: == SRU Justification: 1. impact: 'qemu save' is slow 2. how addressed: a patch upstream fixes the case when a file does not announce when it is ready. 3. patch: see the patch in linked bzr trees 4. TEST CASE: see comment #4 for a specific recipe 5. regression potential: this patch only touches the vm save path. == As reported here: http://www.redhat.com/archives/libvir- list/2009-December/msg00203.html "virsh save" is very slow - it writes the image at around 1MB/sec on my test system. (I think I saw a bug report for this issue on Fedora's bugzilla, but I can't find it now...) Confirmed under Karmic. To manage notifications about this bug go to: https://bugs.launchpad.net/libvirt/+bug/524447/+subscriptions
Re: [Qemu-devel] [PATCH] [RFC] qcow2: group refcount updates during cow
Am 28.07.2011 15:50, schrieb Frediano Ziglio: > Well, I think this is the first real improve patch. > Is more a RFC than a patch. Yes, some lines are terrible! > It collapses refcount decrement during cow. > From a first check time executing 015 test passed from about 600 seconds > to 70. > This at least prove that refcount updates counts! > Some doubt: > 1- place the code in qcow2-refcount.c as it update only refcount and not > cluster? > 2- allow some sort of "begin transaction" / "commit" / "rollback" like > databases instead? > 3- allow changing tables from different coroutines? > > 1) If you have a sequence like (1, 2, 4) probably these clusters are all in > the same l2 table but with this code you get two write instead of one. > I'm thinking about a function in qcow2-refcount.c that accept an array of > cluster > instead of a start + len. > > Signed-off-by: Frediano Ziglio I think what you're seeing is actually just one special case of a more general problem. The problem is that we're interpreting writethrough stricter than required. The semantics that we really need is that on completion of a request, all of its data and metadata must be flushed to disk. There is no requirement that we flush all intermediate states. My recent update to qcow2_update_snapshot_refcount() is just another case of the same problem. I think the solution should be similar to what I did there, i.e. switch the cache to writeback mode while we're operating on it and switch back when we're done. We should probably have functions that make both of this a one-liner (I think here we have some similarity to your begin/commit idea). With the right functions, this could become as easy as this (might need better function names, but you get the idea): diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index 882f50a..45b67b1 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -612,6 +612,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) if (m->nb_clusters == 0) return 0; +qcow2_cache_disable_writethrough(bs); + old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t)); /* copy content of unmodified sectors */ @@ -683,6 +685,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) ret = 0; err: +qcow2_cache_restore_writethrough(bs); qemu_free(old_cluster); return ret; } Kevin
Re: [Qemu-devel] Volume key in qcow3?
Am 28.07.2011 10:05, schrieb Frediano Ziglio: > Hi, > I noted that AES encryption using qcow2 just use the password given > as as key (and also truncating it to 16 bytes == 128 bits). > This is prone to brute force attacks and is not also easy to change > password (you have to decrypt and encrypt again the entire image). > LUKS and EncFS use another way. They generate a random key (the > "volume key") then use the password you give to encrypt N times (where > N is decided by security level or automatically based on time to > decrypt the volume key. To change the password just give the old one, > get the volume key and encrypt again using the new one. LUKS support > also multiple "slots" to allow multiple password and even using an > external key file. > Obviously this require an additional extension to qcow2 so I think it > require a new qcow3 format. Yes, once we have qcow3, adding things like this should be easy enough. I think the idea makes sense. Another thing to consider with encryption is that we don't encrypt metadata currently. I'm not entirely sure if this is a good or a bad thing. Metadata is relatively predictable and I think that might hurt the encryption? Though I'm really not an expert in this area. Kevin
Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()
On 2011-07-28 15:37, Avi Kivity wrote: > On 07/28/2011 04:31 PM, Luiz Capitulino wrote: >> On Thu, 28 Jul 2011 10:23:22 +0300 >> Avi Kivity wrote: >> >> > On 07/28/2011 12:44 AM, Blue Swirl wrote: >> > > On Wed, Jul 27, 2011 at 9:42 PM, Luiz >> Capitulino wrote: >> > > > This function should be used when the VM is not supposed to >> resume >> > > > execution (eg. by issuing 'cont' monitor command). >> > > > >> > > > Today, we allow the user to resume execution even when: >> > > > >> > > > o the guest shuts down and -no-shutdown is used >> > > > o there's a kvm internal error >> > > > o loading the VM state with -loadvm or "loadvm" in the >> monitor fails >> > > > >> > > > I think only badness can happen from the cases above. >> > > >> > > I'd suppose a system_reset should bring the system back to >> sanity and >> > > then clear vm_permanent_stopped (where's -ly?) >> >> What's -ly? >> > > permanent-ly. > >> > > except maybe for KVM >> > > internal error if that can't be recovered. Then it would not very >> > > permanent anymore, so the name would need adjusting. >> > >> > Currently, all kvm internal errors are recoverable by reset (and >> > possibly by fiddling with memory/registers). >> >> Ok, but a poweroff in the guest isn't recoverable with system_reset >> right? Or does it depend on the guest? > > Right, it's not recoverable if you shut the power down where the tractor > beam is coupled to the main reactor. system_reset will bring all emulated devices back into their power-on state - unless we have remaining bugs to fix. Actually, one may consider issuing this reset automatically on vm_start after "permant" vm_stop. Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] [Bug 524447] Re: virsh save is very slow
I just wanted to point out that we've this patch in Debian since ages, and it's been included in upstream for a long time too. Added a debbug reference for this as well. ** Bug watch added: Debian Bug tracker #597517 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597517 ** Also affects: qemu-kvm (Debian) via http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597517 Importance: Unknown Status: Unknown -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/524447 Title: virsh save is very slow Status in libvirt virtualization API: Unknown Status in QEMU: Fix Released Status in “libvirt” package in Ubuntu: Invalid Status in “qemu-kvm” package in Ubuntu: Fix Released Status in “libvirt” source package in Lucid: Won't Fix Status in “qemu-kvm” source package in Lucid: Won't Fix Status in “libvirt” source package in Maverick: Won't Fix Status in “qemu-kvm” source package in Maverick: Won't Fix Status in “qemu-kvm” package in Debian: Unknown Bug description: == SRU Justification: 1. impact: 'qemu save' is slow 2. how addressed: a patch upstream fixes the case when a file does not announce when it is ready. 3. patch: see the patch in linked bzr trees 4. TEST CASE: see comment #4 for a specific recipe 5. regression potential: this patch only touches the vm save path. == As reported here: http://www.redhat.com/archives/libvir- list/2009-December/msg00203.html "virsh save" is very slow - it writes the image at around 1MB/sec on my test system. (I think I saw a bug report for this issue on Fedora's bugzilla, but I can't find it now...) Confirmed under Karmic. To manage notifications about this bug go to: https://bugs.launchpad.net/libvirt/+bug/524447/+subscriptions
Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model
On 07/28/2011 04:03 PM, Anthony Liguori wrote: No doubt about that. :) I'd put a lot more hope into Goldfish though. What's unclear to me about the Goldfish enumerator is whether it should be filled out through interaction with hardware devices or via some other mechanism. In many ways, it's similar to ACPI and a Device Tree. In both of those cases, firmware actually is responsible for constructing those tables. Yes, it is a flat device tree. Since it supports hotplug (at least in theory, the Android emulator predates qdev so it doesn't support it), I would say it is more similar to PCI configuration space. The difference is that IRQ numbers and MMIO base addresses are handed out by hardware (by a piece of the SoC) rather than by the firmware. So yes, the hardware would have some kind of bus to talk to the devices and arbitrate hotplug/hotunplug. The only peculiarity being that the bus enumerator hardcodes itself in the list it exposes, in addition to the devices on the bus. But that still means that the devices have two views: 1) the enumerator's view is either "this is my name, my MMIO base, my IRQ base" or "I need 4k of MMIO and 1 IRQ line, please tell me where/which are those", depending on the device; 2) the PIC's view is "please bring this IRQ line up/down" (the device says which line, since the enumerator can assign those dynamically). The PIC's view is more complicated than a Pin, and more similar to ISA. Paolo
Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u
On Thu, Jul 28, 2011 at 3:40 PM, wrote: > At Thu, 28 Jul 2011 14:50:57 +0200, > Artyom Tarasenko wrote: >> On Thu, Jul 28, 2011 at 2:03 PM, wrote: >> > At Thu, 28 Jul 2011 13:51:08 +0200, >> > Artyom Tarasenko wrote: >> >> On Thu, Jul 28, 2011 at 12:31 PM, wrote: >> >> > Hi, >> >> > >> >> > At Mon, 25 Jul 2011 19:22:38 +0200, >> >> > Artyom Tarasenko wrote: >> >> > >> >> >> clear interrupt request if the interrupt priority < CPU pil >> >> >> clear hardware interrupt request if interrupts are disabled >> >> > >> >> > Not directly related to the fix, but I'd like to note a problem >> >> > of hw/sun4u.c interrupt code: >> >> > >> >> > The interrupt code probably mixes hardware interrupts and >> >> > software interrupts. >> >> > %pil is for software interrupts (interrupt_level_n traps). >> >> > %pil can not mask hardware interrupts (interrupt_vector traps); >> >> > the CPU raises interrupt_vector traps even on %pil=15. >> >> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts >> >> > seem to be masked by %pil. >> >> >> >> The interrupt_vector traps are currently not implemented, are they? >> >> So it's hard to tell whether they are masked. >> > >> > Yes, interrupt_vector is not implemented yet. >> > I failed to explain the problem. >> > The problem is that cpu_set_irqs() should raise interrupt_vector >> > traps but it raises interrupt_level_n traps actually. >> > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as >> > the 1st argument. The allocated irqs (the irq variable) are >> > passed to pci_apb_init(). APB should generate interrupt_vector >> > traps (hardware interrupts), not the interrupt_vector_n traps. >> >> Yes, this is true. But it's more complicated than this: cpu_check_irqs >> also checks tick/stick/hstick interrupts. They should produce the >> interrupt_level_n traps as they currently do. > > That's right. > tick/stick/hstick must raise interrupt_level_n traps. > >> The patch merely fixes the problem of hanging on a interrupt_vector_n >> trap if the trap handler uses pil for interrupt masking. The problem >> exists independently from interrupt_vector trap generation (as you >> pointed out). > > I understand what is the problem that your patch is going to fix. > Thanks for the explanation. > Sorry, I haven't implied that you don't. Meant to say "As you pointed out, the problem exists independently...". >> Do you have objections to this patch in its current form? > > No, I don't have any objections. > >> > The interrupts from APB would be reported by cpu_set_irq(), >> > but cpu_set_irq() seems to generate interrupt_vector_n traps. >> >> For me it's not obvious. The interrupt vector not just one line, but >> the vector, which is written in the corresponding CPU register (also >> missing in the current qemu implementation). On the real hardware the >> vector is created by the IOMMU (PBM/APB/...). If qemu intends to >> support multiple chipsets, we should keep it the way it's done on the >> real hardware (for instance the interrupt vectors for on-board devices >> on Ultra-1 and E6500 are not the same). > > Sorry, I can't keep up with this vector thing... > Does the CPU receive hardware interrupts as interrupt_vector traps > (trap type=0x60) regardless of the kind of the interrupt controller, > doesn't it? It does indeed, but it also stores the interrupt vector identifying the initiator device, in a CPU register readable with asi 0x7f . What would APB pass to the cpu_set_irq? I see the three following variants: a) it passes the PCI interrupt id, which is translated to the interrupt vector in cpu_set_irq() b) it passes the vector. This implies that 2048 (0-0x7ff) CPU interrupts have to be allocated. c) hack combining a+b: allocate only the interrupts known to be used and translate an internal interrupt id to a vector. The variant "a" is bad because it doesn't allow support for different chipsets. The variant "b" is bad because qemu has to allocate way too many interrupts. Only few of them will be used actually. The variant "c" is bad, well, because it's a hack. That's why I suggest using another interface between APB and CPU. Have I overlooked something? >> I'd suggest APB shall use some other interface for communicating >> interrupts to the CPU. Something like >> cpu_receive_ivec(interrupt_vector). >> >> >> >> Signed-off-by: Artyom Tarasenko >> >> >> --- >> >> >> hw/sun4u.c | 6 -- >> >> >> 1 files changed, 4 insertions(+), 2 deletions(-) >> >> >> >> >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c >> >> >> index d7dcaf0..7f95aeb 100644 >> >> >> --- a/hw/sun4u.c >> >> >> +++ b/hw/sun4u.c >> >> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env) >> >> >> pil |= 1 << 14; >> >> >> } >> >> >> >> >> >> - if (!pil) { >> >> >> + if (pil < (2 << env->psrpil)){ >> >> >> if (env->interrupt_request & CPU_INTERRUPT_HARD) { >> >> >> CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n", >> >> >> en
[Qemu-devel] [PATCH] lm832x: Take DeviceState pointer in lm832x_key_event()
Since lm832x has been qdev'ified, its users will generally have a DeviceState pointer rather than an i2c_slave pointer, so adjust lm832x_key_event's prototype to suit. This allows the n810 (its only user) to actually pass a correct pointer to it rather than NULL. The effect is that we no longer segfault when a key is pressed. Signed-off-by: Peter Maydell --- NB: this patch depends on the OMAP GPIO v2 patchset. hw/i2c.h |2 +- hw/lm832x.c |4 ++-- hw/nseries.c |7 +++ 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/hw/i2c.h b/hw/i2c.h index 5514402..9381d01 100644 --- a/hw/i2c.h +++ b/hw/i2c.h @@ -72,6 +72,6 @@ void wm8750_set_bclk_in(void *opaque, int new_hz); void tmp105_set(i2c_slave *i2c, int temp); /* lm832x.c */ -void lm832x_key_event(i2c_slave *i2c, int key, int state); +void lm832x_key_event(DeviceState *dev, int key, int state); #endif diff --git a/hw/lm832x.c b/hw/lm832x.c index 590a4cc..992ce49 100644 --- a/hw/lm832x.c +++ b/hw/lm832x.c @@ -474,9 +474,9 @@ static int lm8323_init(i2c_slave *i2c) return 0; } -void lm832x_key_event(struct i2c_slave *i2c, int key, int state) +void lm832x_key_event(DeviceState *dev, int key, int state) { -LM823KbdState *s = (LM823KbdState *) i2c; +LM823KbdState *s = FROM_I2C_SLAVE(LM823KbdState, I2C_SLAVE_FROM_QDEV(dev)); if ((s->status & INT_ERROR) && (s->error & ERR_FIFOOVR)) return; diff --git a/hw/nseries.c b/hw/nseries.c index 32f2f53..45b52bb 100644 --- a/hw/nseries.c +++ b/hw/nseries.c @@ -45,7 +45,7 @@ struct n800_s { i2c_bus *i2c; int keymap[0x80]; -i2c_slave *kbd; +DeviceState *kbd; TUSBState *usb; void *retu; @@ -362,7 +362,6 @@ static int n810_keys[0x80] = { static void n810_kbd_setup(struct n800_s *s) { qemu_irq kbd_irq = qdev_get_gpio_in(s->cpu->gpio, N810_KEYBOARD_GPIO); -DeviceState *dev; int i; for (i = 0; i < 0x80; i ++) @@ -375,8 +374,8 @@ static void n810_kbd_setup(struct n800_s *s) /* Attach the LM8322 keyboard to the I2C bus, * should happen in n8x0_i2c_setup and s->kbd be initialised here. */ -dev = i2c_create_slave(s->i2c, "lm8323", N810_LM8323_ADDR); -qdev_connect_gpio_out(dev, 0, kbd_irq); +s->kbd = i2c_create_slave(s->i2c, "lm8323", N810_LM8323_ADDR); +qdev_connect_gpio_out(s->kbd, 0, kbd_irq); } /* LCD MIPI DBI-C controller (URAL) */ -- 1.7.1
Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.
On Thu, Jul 28, 2011 at 12:24:48PM +0800, Zhi Yong Wu wrote: > On Wed, Jul 27, 2011 at 11:49 PM, Marcelo Tosatti wrote: > > On Wed, Jul 27, 2011 at 06:17:15PM +0800, Zhi Yong Wu wrote: > >> >> + wait_time = 1; > >> >> + } > >> >> + > >> >> + wait_time = wait_time + (slice_time - elapsed_time); > >> >> + if (wait) { > >> >> + *wait = wait_time * BLOCK_IO_SLICE_TIME * 10 + 1; > >> >> + } > >> > > >> > The guest can keep submitting requests where "wait_time = 1" above, > >> > and the timer will be rearmed continuously in the future. > > > > This is wrong. > > > >> Can't you > >> > simply arm the timer to the next slice start? _Some_ data must be > >> > transfered by then, anyway (and nothing can be transfered earlier than > >> > that). > > > > This is valid. > > > >> Sorry, i have got what you mean. Can you elaborate in more detail? > > > > Sorry, the bug i mentioned about timer being rearmed does not exist. > > > > But arming the timer for the last request as its done is > > confusing/unnecessary. > > > > The timer processes queued requests, but the timer is armed accordingly > > to the last queued request in the slice. For example, if a request is > > submitted 1ms before the slice ends, the timer is armed 100ms + > > (slice_time - elapsed_time) in the future. > > If the timer is simply amred to the next slice start, this timer will > be a periodic timer, either the I/O rate can not be throttled under > the limits, or the enqueued request can be delayed to handled, this > will lower I/O rate seriously than the limits. Yes, periodic but disarmed when there is no queueing. I don't understand your point about low I/O rate. > Maybe the slice time should be variable with current I/O rate. What do > you think of it? Not sure if its necessary. The slice should be small to avoid excessive work on timer context, but the advantage of increasing the slice is very small if any. BTW, 10ms seems a better default than 100ms.