Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()

2011-07-28 Thread Avi Kivity

On 07/28/2011 12:44 AM, Blue Swirl wrote:

On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino  wrote:
>  This function should be used when the VM is not supposed to resume
>  execution (eg. by issuing 'cont' monitor command).
>
>  Today, we allow the user to resume execution even when:
>
>o the guest shuts down and -no-shutdown is used
>o there's a kvm internal error
>o loading the VM state with -loadvm or "loadvm" in the monitor fails
>
>  I think only badness can happen from the cases above.

I'd suppose a system_reset should bring the system back to sanity and
then clear vm_permanent_stopped (where's -ly?) except maybe for KVM
internal error if that can't be recovered. Then it would not very
permanent anymore, so the name would need adjusting.


Currently, all kvm internal errors are recoverable by reset (and 
possibly by fiddling with memory/registers).


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention

2011-07-28 Thread Isaku Yamahata
This might be a bit late comment...

On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote:
> diff --git a/hw/pci_host.c b/hw/pci_host.c
> index 728e2d4..bfdc321 100644
> --- a/hw/pci_host.c
> +++ b/hw/pci_host.c
> @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus 
> *bus, uint32_t addr)
>  return pci_find_device(bus, bus_num, devfn);
>  }
>  
> +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr,
> + uint32_t limit, uint32_t val, uint32_t len)
> +{
> +assert(len <= 4);
> +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr));
> +}
> +
> +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr,
> +uint32_t limit, uint32_t len)
> +{
> +assert(len <= 4);
> +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr));
> +}
> +

Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < addr.
So we need explicit "if (limit < addr) return;".
Here's the patch for pci branch.

>From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001
Message-Id: 
<75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp>
From: Isaku Yamahata 
Date: Thu, 28 Jul 2011 16:20:28 +0900
Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common

This patch adds boundary check in pci_host_config_read/write_common()
Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < addr.
So we need explicit "if (limit <= addr) return;"

Signed-off-by: Isaku Yamahata 
---
 hw/pci_host.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/pci_host.c b/hw/pci_host.c
index 2e8a29f..71fd3a1 100644
--- a/hw/pci_host.c
+++ b/hw/pci_host.c
@@ -51,6 +51,9 @@ void pci_host_config_write_common(PCIDevice *pci_dev, 
uint32_t addr,
   uint32_t limit, uint32_t val, uint32_t len)
 {
 assert(len <= 4);
+if (limit <= addr) {
+return;
+}
 pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr));
 }
 
@@ -58,6 +61,9 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, 
uint32_t addr,
  uint32_t limit, uint32_t len)
 {
 assert(len <= 4);
+if (limit <= addr) {
+return 0;
+}
 return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr));
 }
 
-- 
1.7.1.1

-- 
yamahata



Re: [Qemu-devel] [PATCH v2 3/5] balloon: Ignore negative balloon values

2011-07-28 Thread Markus Armbruster
Amit Shah  writes:

> Negative balloon values don't make sense, ignore them.

Actually, they aren't ignored, they're rejected.



Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks

2011-07-28 Thread Markus Armbruster
Amit Shah  writes:

> Add an exit handler that will free up RAM and unregister the savevm
> section after a virtio-balloon device is unplugged.
>
> Signed-off-by: Amit Shah 
> ---
>  hw/virtio-balloon.c |5 +
>  hw/virtio-pci.c |   11 ++-
>  hw/virtio.h |1 +
>  3 files changed, 16 insertions(+), 1 deletions(-)
>
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index 26ee364..0ce0049 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -297,3 +297,8 @@ VirtIODevice *virtio_balloon_init(DeviceState *dev)
>  
>  return &s->vdev;
>  }
> +
> +void virtio_balloon_exit(VirtIODevice *vdev)
> +{
> +virtio_cleanup(vdev);
> +}
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index ca5f125..316bf92 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -795,6 +795,15 @@ static int virtio_balloon_init_pci(PCIDevice *pci_dev)
>  return 0;
>  }
>  
> +static int virtio_balloon_exit_pci(PCIDevice *pci_dev)
> +{
> +VirtIOPCIProxy *proxy = DO_UPCAST(VirtIOPCIProxy, pci_dev, pci_dev);
> +
> +virtio_pci_stop_ioeventfd(proxy);
> +virtio_balloon_exit(proxy->vdev);
> +return virtio_exit_pci(pci_dev);
> +}
> +

Same code in every other virtio_*_exit_pci().  Suggests there's
something wrong with the generic code.  Outside the scope of this
series, of course.

[...]



Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model

2011-07-28 Thread Paolo Bonzini

On 07/27/2011 10:01 PM, Anthony Liguori wrote:




That's milkymist, not GoldFish.


Oh, Goldfish is fake. It's not real hardware.

The enumerator device is not a real device.  It's weird because it's
imaginary and was designed specifically within QEMU.

It's not a good example for discussing modelling.


There are plenty of PV interfaces implemented by QEMU.  Would you say 
the same of virtio?


Paolo



Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks

2011-07-28 Thread Markus Armbruster
Amit Shah  writes:

> On (Thu) 28 Jul 2011 [11:47:15], Amit Shah wrote:
>> Add an exit handler that will free up RAM and unregister the savevm
>> section after a virtio-balloon device is unplugged.
>
> This commit message should be changed; I'll do that in the pull
> request I send out.

You mean drop "unregister the savevm", because it has become PATCH 5/5?



[Qemu-devel] [PATCH 0/8] qcow2 cleanup for coroutine-block branch

2011-07-28 Thread Frediano Ziglio
This is a collection of patches that remove QCowAIOCB, useless with
coroutines.
It apply to Kevin coroutine-block branch.
I leave all step I did to remove the structure, feel free to collapse
some of them.
I tested with iotests and got no changes (026 fails like coroutine-block
branch).

Frediano Ziglio (8):
  qcow2: removed unused fields
  qcow2: removed cur_nr_sectors field in QCowAIOCB
  qcow2: remove l2meta from QCowAIOCB
  qcow2: remove cluster_offset from QCowAIOCB
  qcow2: remove common from QCowAIOCB
  qcow2: reindent and use while before the big jump
  qcow2: removed QCowAIOCB entirely
  qcow2: remove memory leak

 block/qcow2.c |  396 -
 1 files changed, 167 insertions(+), 229 deletions(-)




[Qemu-devel] [PATCH 1/8] qcow2: removed unused fields

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |   10 +++---
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index edc068e..5c454bb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -381,11 +381,8 @@ typedef struct QCowAIOCB {
 uint64_t bytes_done;
 uint64_t cluster_offset;
 uint8_t *cluster_data;
-bool is_write;
 QEMUIOVector hd_qiov;
-QEMUBH *bh;
 QCowL2Meta l2meta;
-QLIST_ENTRY(QCowAIOCB) next_depend;
 } QCowAIOCB;
 
 /*
@@ -517,13 +514,12 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num,
   QEMUIOVector *qiov, int nb_sectors,
   BlockDriverCompletionFunc *cb,
-  void *opaque, int is_write, QCowAIOCB *acb)
+  void *opaque, QCowAIOCB *acb)
 {
 memset(acb, 0, sizeof(*acb));
 acb->common.bs = bs;
 acb->sector_num = sector_num;
 acb->qiov = qiov;
-acb->is_write = is_write;
 
 qemu_iovec_init(&acb->hd_qiov, qiov->niov);
 
@@ -543,7 +539,7 @@ static int qcow2_co_readv(BlockDriverState *bs, int64_t 
sector_num,
 QCowAIOCB acb;
 int ret;
 
-qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, 0, &acb);
+qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, &acb);
 
 qemu_co_mutex_lock(&s->lock);
 do {
@@ -658,7 +654,7 @@ static int qcow2_co_writev(BlockDriverState *bs,
 QCowAIOCB acb;
 int ret;
 
-qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, 1, &acb);
+qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, NULL, NULL, &acb);
 s->cluster_cache_offset = -1; /* disable compressed cache */
 
 qemu_co_mutex_lock(&s->lock);
-- 
1.7.1




[Qemu-devel] [PATCH 2/8] qcow2: removed cur_nr_sectors field in QCowAIOCB

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |   98 +
 1 files changed, 43 insertions(+), 55 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5c454bb..0cf4465 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -377,7 +377,6 @@ typedef struct QCowAIOCB {
 int64_t sector_num;
 QEMUIOVector *qiov;
 int remaining_sectors;
-int cur_nr_sectors;/* number of sectors in current iteration */
 uint64_t bytes_done;
 uint64_t cluster_offset;
 uint8_t *cluster_data;
@@ -395,42 +394,22 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster, n1;
 int ret;
-
-/* post process the read buffer */
-if (!acb->cluster_offset) {
-/* nothing to do */
-} else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
-/* nothing to do */
-} else {
-if (s->crypt_method) {
-qcow2_encrypt_sectors(s, acb->sector_num,  acb->cluster_data,
-acb->cluster_data, acb->cur_nr_sectors, 0, 
&s->aes_decrypt_key);
-qemu_iovec_reset(&acb->hd_qiov);
-qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
-acb->cur_nr_sectors * 512);
-qemu_iovec_from_buffer(&acb->hd_qiov, acb->cluster_data,
-512 * acb->cur_nr_sectors);
-}
-}
-
-acb->remaining_sectors -= acb->cur_nr_sectors;
-acb->sector_num += acb->cur_nr_sectors;
-acb->bytes_done += acb->cur_nr_sectors * 512;
+int cur_nr_sectors; /* number of sectors in current iteration */
 
 if (acb->remaining_sectors == 0) {
 /* request completed */
 return 0;
 }
 
-/* prepare next AIO request */
-acb->cur_nr_sectors = acb->remaining_sectors;
+/* prepare next request */
+cur_nr_sectors = acb->remaining_sectors;
 if (s->crypt_method) {
-acb->cur_nr_sectors = MIN(acb->cur_nr_sectors,
+cur_nr_sectors = MIN(cur_nr_sectors,
 QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
 }
 
 ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
-&acb->cur_nr_sectors, &acb->cluster_offset);
+&cur_nr_sectors, &acb->cluster_offset);
 if (ret < 0) {
 return ret;
 }
@@ -439,14 +418,14 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 
 qemu_iovec_reset(&acb->hd_qiov);
 qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
-acb->cur_nr_sectors * 512);
+cur_nr_sectors * 512);
 
 if (!acb->cluster_offset) {
 
 if (bs->backing_hd) {
 /* read from the base image */
 n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov,
-acb->sector_num, acb->cur_nr_sectors);
+acb->sector_num, cur_nr_sectors);
 if (n1 > 0) {
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
 qemu_co_mutex_unlock(&s->lock);
@@ -457,11 +436,9 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 return ret;
 }
 }
-return 1;
 } else {
 /* Note: in this case, no need to wait */
-qemu_iovec_memset(&acb->hd_qiov, 0, 512 * acb->cur_nr_sectors);
-return 1;
+qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors);
 }
 } else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
 /* add AIO support for compressed blocks ? */
@@ -472,9 +449,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 
 qemu_iovec_from_buffer(&acb->hd_qiov,
 s->cluster_cache + index_in_cluster * 512,
-512 * acb->cur_nr_sectors);
-
-return 1;
+512 * cur_nr_sectors);
 } else {
 if ((acb->cluster_offset & 511) != 0) {
 return -EIO;
@@ -490,24 +465,37 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
 }
 
-assert(acb->cur_nr_sectors <=
+assert(cur_nr_sectors <=
 QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
 qemu_iovec_reset(&acb->hd_qiov);
 qemu_iovec_add(&acb->hd_qiov, acb->cluster_data,
-512 * acb->cur_nr_sectors);
+512 * cur_nr_sectors);
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
 qemu_co_mutex_unlock(&s->lock);
 ret = bdrv_co_readv(bs->file,
 (acb->cluster_offset >> 9) + index_in_cluster,
-acb->cur_nr_sectors, &acb->hd_qiov);
+cur_nr_sectors, &acb->hd_qiov);
 qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
 return ret;
 }
+if (s->crypt_method) {
+qcow2_encrypt_sectors(s, acb->sector_num,  acb->cluster_data,
+acb->cluster_data, cur_nr_sectors, 0, &s->aes_decrypt_key);

[Qemu-devel] [PATCH 3/8] qcow2: remove l2meta from QCowAIOCB

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0cf4465..adf31ce 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -381,7 +381,6 @@ typedef struct QCowAIOCB {
 uint64_t cluster_offset;
 uint8_t *cluster_data;
 QEMUIOVector hd_qiov;
-QCowL2Meta l2meta;
 } QCowAIOCB;
 
 /*
@@ -514,8 +513,6 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
 acb->bytes_done = 0;
 acb->remaining_sectors = nb_sectors;
 acb->cluster_offset = 0;
-acb->l2meta.nb_clusters = 0;
-qemu_co_queue_init(&acb->l2meta.dependent_requests);
 return acb;
 }
 
@@ -566,6 +563,10 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 int n_end;
 int ret;
 int cur_nr_sectors; /* number of sectors in current iteration */
+QCowL2Meta l2meta;
+
+l2meta.nb_clusters = 0;
+qemu_co_queue_init(&l2meta.dependent_requests);
 
 if (acb->remaining_sectors == 0) {
 /* request completed */
@@ -579,12 +580,12 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 n_end = QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors;
 
 ret = qcow2_alloc_cluster_offset(bs, acb->sector_num << 9,
-index_in_cluster, n_end, &cur_nr_sectors, &acb->l2meta);
+index_in_cluster, n_end, &cur_nr_sectors, &l2meta);
 if (ret < 0) {
 return ret;
 }
 
-acb->cluster_offset = acb->l2meta.cluster_offset;
+acb->cluster_offset = l2meta.cluster_offset;
 assert((acb->cluster_offset & 511) == 0);
 
 qemu_iovec_reset(&acb->hd_qiov);
@@ -618,9 +619,9 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 return ret;
 }
 
-ret = qcow2_alloc_cluster_link_l2(bs, &acb->l2meta);
+ret = qcow2_alloc_cluster_link_l2(bs, &l2meta);
 
-run_dependent_requests(s, &acb->l2meta);
+run_dependent_requests(s, &l2meta);
 
 if (ret < 0) {
 return ret;
-- 
1.7.1




[Qemu-devel] [PATCH 8/8] qcow2: remove memory leak

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 6073568..aed8da7 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -492,6 +492,7 @@ fail:
 qemu_co_mutex_unlock(&s->lock);
 
 qemu_iovec_destroy(&hd_qiov);
+qemu_free(cluster_data);
 
 return ret;
 }
@@ -604,6 +605,7 @@ fail:
 qemu_co_mutex_unlock(&s->lock);
 
 qemu_iovec_destroy(&hd_qiov);
+qemu_free(cluster_data);
 
 return ret;
 }
-- 
1.7.1




Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Kevin Wolf
Am 27.07.2011 21:57, schrieb Christoph Hellwig:
> On Wed, Jul 27, 2011 at 09:52:51PM +0200, Frediano Ziglio wrote:
>> Also I notice that combining XFS, Linux AIO, O_DIRECT and O_DSYNC give 
>> impressive performance but currently there is no way to specify all that 
>> flags together cause nocache enable O_DIRECT while O_DSYNC is enabled with 
>> writethrough.
> 
> Indeed.  This has come up a few times, and actually is a mostly trivial
> task.  Maybe we should give up waiting for -blockdev and separate cache
> mode settings and allow a nocache-writethrough or similar mode now?  It's
> going to be around 10 lines of code + documentation.

I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
what is the explanation for O_DSYNC improving performance?

Christoph, on another note: Can we rely on Linux AIO never returning
short writes except on EOF? Currently we return -EINVAL in this case, so
I hope it's true or we wouldn't return the correct error code.

The reason why I'm asking is because I want to allow reads across EOF
for growable images and pad with zeros (the synchronous code does this
today in order to allow bdrv_pread/pwrite to work, and when we start
using coroutines in the block layer, these cases will hit the AIO paths).

Kevin



Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug

2011-07-28 Thread Markus Armbruster
Amit Shah  writes:

> Migrating after unplugging a virtio-balloon device resulted in an error
> message on the destination:
>
> Unknown savevm section or instance ':00:04.0/virtio-balloon' 0
> load of migration failed
>
> Fix this by unregistering the section on device unplug.
>
> Signed-off-by: Amit Shah 
> ---
>  hw/virtio-balloon.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> index 0ce0049..072a88a 100644
> --- a/hw/virtio-balloon.c
> +++ b/hw/virtio-balloon.c
> @@ -45,6 +45,7 @@ typedef struct VirtIOBalloon
>  size_t stats_vq_offset;
>  MonitorCompletion *stats_callback;
>  void *stats_opaque_callback_data;
> +DeviceState *qdev;
>  } VirtIOBalloon;

All the other virtio device structs already have such a pointer back to
the proxy.  Suggests that it should live in VirtIODevice, and be set up
in generic code.  Again, outside the scope of this series.

I hate the virtio pointer thicket.

[...]



Re: [Qemu-devel] [PATCH v2 0/5] balloon: fix memleaks, invalid arguments, unplug

2011-07-28 Thread Markus Armbruster
Amit Shah  writes:

> Hello,
>
> This series is on top of the other balloon series for which I sent a
> pull request on Tuesday.
>
> This series fixes memleak on exit, unregisters the savevm section on
> unplug, disallows negative values as ballooning targets and doesn't
> allow multiple balloon device registrations.
>
> v2 contains small tweaks suggested:
>  - Error is shown by balloon.c instead of virtio-balloon.c in patch 1
>(mst)
>  - Filter out negative input in do_balloon() and add qerror message
>(Markus)
>  - Separate out savevm section unregistering from memleak fix

Reviewed-by: Markus Armbruster 



[Qemu-devel] [PATCH 7/8] qcow2: removed QCowAIOCB entirely

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |  207 ++---
 1 files changed, 80 insertions(+), 127 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dfb969e..6073568 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -372,83 +372,77 @@ int qcow2_backing_read1(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return n1;
 }
 
-typedef struct QCowAIOCB {
-BlockDriverState *bs;
-int64_t sector_num;
-QEMUIOVector *qiov;
-int remaining_sectors;
-uint64_t bytes_done;
-uint8_t *cluster_data;
-QEMUIOVector hd_qiov;
-} QCowAIOCB;
-
-/*
- * Returns 0 when the request is completed successfully, 1 when there is still
- * a part left to do and -errno in error cases.
- */
-static int qcow2_aio_read_cb(QCowAIOCB *acb)
+static int qcow2_co_readv(BlockDriverState *bs, int64_t sector_num,
+  int remaining_sectors, QEMUIOVector *qiov)
 {
-BlockDriverState *bs = acb->bs;
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster, n1;
 int ret;
 int cur_nr_sectors; /* number of sectors in current iteration */
 uint64_t cluster_offset = 0;
+uint64_t bytes_done = 0;
+QEMUIOVector hd_qiov;
+uint8_t *cluster_data = NULL;
 
-while (acb->remaining_sectors != 0) {
+qemu_iovec_init(&hd_qiov, qiov->niov);
+
+qemu_co_mutex_lock(&s->lock);
+
+while (remaining_sectors != 0) {
 
 /* prepare next request */
-cur_nr_sectors = acb->remaining_sectors;
+cur_nr_sectors = remaining_sectors;
 if (s->crypt_method) {
 cur_nr_sectors = MIN(cur_nr_sectors,
 QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
 }
 
-ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
+ret = qcow2_get_cluster_offset(bs, sector_num << 9,
 &cur_nr_sectors, &cluster_offset);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 
-index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
+index_in_cluster = sector_num & (s->cluster_sectors - 1);
 
-qemu_iovec_reset(&acb->hd_qiov);
-qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
+qemu_iovec_reset(&hd_qiov);
+qemu_iovec_copy(&hd_qiov, qiov, bytes_done,
 cur_nr_sectors * 512);
 
 if (!cluster_offset) {
 
 if (bs->backing_hd) {
 /* read from the base image */
-n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov,
-acb->sector_num, cur_nr_sectors);
+n1 = qcow2_backing_read1(bs->backing_hd, &hd_qiov,
+sector_num, cur_nr_sectors);
 if (n1 > 0) {
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
 qemu_co_mutex_unlock(&s->lock);
-ret = bdrv_co_readv(bs->backing_hd, acb->sector_num,
-n1, &acb->hd_qiov);
+ret = bdrv_co_readv(bs->backing_hd, sector_num,
+n1, &hd_qiov);
 qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 }
 } else {
 /* Note: in this case, no need to wait */
-qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors);
+qemu_iovec_memset(&hd_qiov, 0, 512 * cur_nr_sectors);
 }
 } else if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
 /* add AIO support for compressed blocks ? */
 ret = qcow2_decompress_cluster(bs, cluster_offset);
 if (ret < 0) {
-return ret;
+goto fail;
 }
 
-qemu_iovec_from_buffer(&acb->hd_qiov,
+qemu_iovec_from_buffer(&hd_qiov,
 s->cluster_cache + index_in_cluster * 512,
 512 * cur_nr_sectors);
 } else {
 if ((cluster_offset & 511) != 0) {
-return -EIO;
+ret = -EIO;
+goto fail;
 }
 
 if (s->crypt_method) {
@@ -456,15 +450,15 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
  * For encrypted images, read everything into a temporary
  * contiguous buffer on which the AES functions can work.
  */
-if (!acb->cluster_data) {
-acb->cluster_data =
+if (!cluster_data) {
+cluster_data =
 qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * 
s->cluster_size);
 }
 
 assert(cur_nr_sectors <=
 QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
-qemu_iovec_reset(&acb->hd_qiov);
-qemu_iovec_add(&acb->hd_qiov, acb->cluster_data

[Qemu-devel] [PATCH 4/8] qcow2: remove cluster_offset from QCowAIOCB

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |   22 +++---
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index adf31ce..446946e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -378,7 +378,6 @@ typedef struct QCowAIOCB {
 QEMUIOVector *qiov;
 int remaining_sectors;
 uint64_t bytes_done;
-uint64_t cluster_offset;
 uint8_t *cluster_data;
 QEMUIOVector hd_qiov;
 } QCowAIOCB;
@@ -394,6 +393,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 int index_in_cluster, n1;
 int ret;
 int cur_nr_sectors; /* number of sectors in current iteration */
+uint64_t cluster_offset = 0;
 
 if (acb->remaining_sectors == 0) {
 /* request completed */
@@ -408,7 +408,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 }
 
 ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
-&cur_nr_sectors, &acb->cluster_offset);
+&cur_nr_sectors, &cluster_offset);
 if (ret < 0) {
 return ret;
 }
@@ -419,7 +419,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
 cur_nr_sectors * 512);
 
-if (!acb->cluster_offset) {
+if (!cluster_offset) {
 
 if (bs->backing_hd) {
 /* read from the base image */
@@ -439,9 +439,9 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 /* Note: in this case, no need to wait */
 qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors);
 }
-} else if (acb->cluster_offset & QCOW_OFLAG_COMPRESSED) {
+} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
 /* add AIO support for compressed blocks ? */
-ret = qcow2_decompress_cluster(bs, acb->cluster_offset);
+ret = qcow2_decompress_cluster(bs, cluster_offset);
 if (ret < 0) {
 return ret;
 }
@@ -450,7 +450,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 s->cluster_cache + index_in_cluster * 512,
 512 * cur_nr_sectors);
 } else {
-if ((acb->cluster_offset & 511) != 0) {
+if ((cluster_offset & 511) != 0) {
 return -EIO;
 }
 
@@ -474,7 +474,7 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
 qemu_co_mutex_unlock(&s->lock);
 ret = bdrv_co_readv(bs->file,
-(acb->cluster_offset >> 9) + index_in_cluster,
+(cluster_offset >> 9) + index_in_cluster,
 cur_nr_sectors, &acb->hd_qiov);
 qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
@@ -512,7 +512,6 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
 
 acb->bytes_done = 0;
 acb->remaining_sectors = nb_sectors;
-acb->cluster_offset = 0;
 return acb;
 }
 
@@ -564,6 +563,7 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 int ret;
 int cur_nr_sectors; /* number of sectors in current iteration */
 QCowL2Meta l2meta;
+uint64_t cluster_offset;
 
 l2meta.nb_clusters = 0;
 qemu_co_queue_init(&l2meta.dependent_requests);
@@ -585,8 +585,8 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 return ret;
 }
 
-acb->cluster_offset = l2meta.cluster_offset;
-assert((acb->cluster_offset & 511) == 0);
+cluster_offset = l2meta.cluster_offset;
+assert((cluster_offset & 511) == 0);
 
 qemu_iovec_reset(&acb->hd_qiov);
 qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
@@ -612,7 +612,7 @@ static int qcow2_aio_write_cb(QCowAIOCB *acb)
 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
 qemu_co_mutex_unlock(&s->lock);
 ret = bdrv_co_writev(bs->file,
- (acb->cluster_offset >> 9) + index_in_cluster,
+ (cluster_offset >> 9) + index_in_cluster,
  cur_nr_sectors, &acb->hd_qiov);
 qemu_co_mutex_lock(&s->lock);
 if (ret < 0) {
-- 
1.7.1




Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Andrea Arcangeli
On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote:
> On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
> > making
> > sure no lib is calling any I/O function to be able to defreeze the
> > filesystems later, making sure the oom killer or a wrong kill -9
> > $RANDOM isn't killing the agent by mistake while the I/O is blocked
> > and the copy is going.
> 
> Yes with the current API if the agent is killed while the filesystems
> are frozen we are screwed.
> 
> I have just submitted patches that implement a new API that should make
> the virtualization use case more reliable. Basically, I am adding a new
> ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns
> a file descriptor; as long as that file descriptor is held open, the
> filesystem remains open. If the freeze file descriptor is closed (be it
> through a explicit call to close(2) or as part of process exit
> housekeeping) the associated filesystem is automatically thawed.
> 
> - fsfreeze: add ioctl to create a fd for freeze control
>   http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
> - fsfreeze: add freeze fd ioctls
>   http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2

This is probably how the API should have been implemented originally
instead of FIFREEZE/FITHAW.

It looks a bit overkill though, I would think it'd be enough to have
the fsfreeze forced at FIGETFREEZEFD, and the only way to thaw by
closing the file without requiring any of the
FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD. But I guess you have use cases
for those if you implemented it, maybe to check if root is stepping on
its own toes by checking if the fs is already freezed before freezing
it and returning failure if it is, running ioctl instead of opening
closing the file isn't necessarily better. At the very least the
get_user(should_freeze, argp) doesn't seem so necessary, it just
complicates the ioctl API a bit without much gain, I think it'd be
cleaner if the FS_FREEZE_FD was the only way to freeze then.

It's certainly a nice reliability improvement and safer API.

Now if you add a file descriptor to epoll/poll that userland can open
and talk to, to know when a fsfreeze is asked on a certain fs, a
fsfreeze userland agent (not virt related too) could open it and start
the scripts if that filesystem is being fsfreezed before calling
freeze_super().

Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze
without any dependency on a virt specific guest agent.

Maybe Christoph's right there are filesystems in userland (not sure
how the storage is related, it's all about filesystems and apps as far
I can see, and it's all blkdev agnostic) that may make things more
complicated, but those usually have a kernel backend too (like
fuse). I may not see the full picture of the filesystem in userland or
how the storage agent in guest userland relates to this.

If you believe having libvirt talking QMP/QAPI over a virtio-serial
vmchannel with some virt specific guest userland agent bypassing qemu
entirely is better, that's ok with me, but there should be a strong
reason for it because the paravirt_fsfreeze.ko approach with a small
qemu backend and a qemu monitor command that starts paravirt-fsfreeze
in guest before going ahead blocking all I/O (to provide backwards
compatibility and reliable snapshots to guest OS that won't have the
paravirt fsfreeze too) looks more reliable, more compact and simpler
to use to me. I'll be surely ok either ways though.

Thanks,
Andrea



[Qemu-devel] Volume key in qcow3?

2011-07-28 Thread Frediano Ziglio
Hi,
  I noted that AES encryption using qcow2 just use the password given
as as key (and also truncating it to 16 bytes == 128 bits).
This is prone to brute force attacks and is not also easy to change
password (you have to decrypt and encrypt again the entire image).
LUKS and EncFS use another way. They generate a random key (the
"volume key") then use the password you give to encrypt N times (where
N is decided by security level or automatically based on time to
decrypt the volume key. To change the password just give the old one,
get the volume key and encrypt again using the new one. LUKS support
also multiple "slots" to allow multiple password and even using an
external key file.
Obviously this require an additional extension to qcow2 so I think it
require a new qcow3 format.

Frediano



Re: [Qemu-devel] [PATCH 00/55] Block layer cleanup & fixes

2011-07-28 Thread Kevin Wolf
Am 20.07.2011 18:23, schrieb Markus Armbruster:
> This patch series looks bigger than it is.  All the patches are small
> and hopefully easy to review.
> 
> Objectives:
> 
> * Push BlockDriverState members locked, tray_open, media_changed into
>   device models, where they belong.
> 
> * BlockDriverState member removable is a confusing mess, replace it.
> 
> * Improve eject -f.
> 
> Also clean up minor messes as they get in the way.
> 
> Part I: Preliminaries
> PATCH 01-09: Assorted block layer fixes and cleanup

Thanks, applied patches 1-9 to the block branch.

> Part II: Move tray state to device models
> PATCH 10-18: IDE tray open/closed
> PATCH 19-22: SCSI tray open/closed
> PATCH 23-25: block layer kill tray_open
> PATCH 26-27: IDE tray lock
> PATCH 28-29: SCSI tray lock
> PATCH 30-32: block layer kill locked
> PATCH 33-35: IDE & SCSI synchronize physical tray on init
> PATCH 36-38: IDE & SCSI migrate tray state
> PATCH 39-41: block layer & fdc media_changed
> 
> Part III: Replace removable
> PATCH 42-44 clean up inappopriate uses of removable
> PATCH 45-46 replace remaining users
> 
> Part IV: Reduce unclean use of block_int.h
> PATCH 47-53
> 
> PART V: Improve eject -f
> PATCH 54-55

I'll wait for a v2 on top of the block branch for the rest of the series.

Kevin



[Qemu-devel] [PATCH 5/8] qcow2: remove common from QCowAIOCB

2011-07-28 Thread Frediano Ziglio

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 446946e..c7445cc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -373,7 +373,7 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector 
*qiov,
 }
 
 typedef struct QCowAIOCB {
-BlockDriverAIOCB common;
+BlockDriverState *bs;
 int64_t sector_num;
 QEMUIOVector *qiov;
 int remaining_sectors;
@@ -388,7 +388,7 @@ typedef struct QCowAIOCB {
  */
 static int qcow2_aio_read_cb(QCowAIOCB *acb)
 {
-BlockDriverState *bs = acb->common.bs;
+BlockDriverState *bs = acb->bs;
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster, n1;
 int ret;
@@ -504,7 +504,7 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
   void *opaque, QCowAIOCB *acb)
 {
 memset(acb, 0, sizeof(*acb));
-acb->common.bs = bs;
+acb->bs = bs;
 acb->sector_num = sector_num;
 acb->qiov = qiov;
 
@@ -556,7 +556,7 @@ static void run_dependent_requests(BDRVQcowState *s, 
QCowL2Meta *m)
  */
 static int qcow2_aio_write_cb(QCowAIOCB *acb)
 {
-BlockDriverState *bs = acb->common.bs;
+BlockDriverState *bs = acb->bs;
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster;
 int n_end;
-- 
1.7.1




Re: [Qemu-devel] [PATCH v5] Add support for Zipit Z2 machine

2011-07-28 Thread Vasily Khoruzhick
On Monday 11 July 2011 18:26:37 Vasily Khoruzhick wrote:
> On Wednesday 06 July 2011 16:52:49 Vasily Khoruzhick wrote:
> > Zipit Z2 is small PXA270 based handheld.
> 
> Ping?

One more ping.



Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu  wrote:
> On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi  wrote:
>> On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu  wrote:
>>> On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti  
>>> wrote:
 On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote:
> Welcome to give me your comments, thanks.
>
> Signed-off-by: Zhi Yong Wu 
> ---
>  Makefile.objs     |    2 +-
>  block.c           |  288 
> +++--
>  block.h           |    1 -
>  block/blk-queue.c |  116 +
>  block/blk-queue.h |   70 +
>  block_int.h       |   28 +
>  blockdev.c        |   21 
>  qemu-config.c     |   24 +
>  qemu-option.c     |   17 +++
>  qemu-option.h     |    1 +
>  qemu-options.hx   |    1 +
>  11 files changed, 559 insertions(+), 10 deletions(-)
>  create mode 100644 block/blk-queue.c
>  create mode 100644 block/blk-queue.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 9f99ed4..06f2033 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o 
> cloop.o dmg.o bochs.o vpc.o vv
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
> qcow2-snapshot.o qcow2-cache.o
>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o 
> qed-cluster.o
>  block-nested-y += qed-check.o
> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
> blk-queue.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>  block-nested-$(CONFIG_CURL) += curl.o
> diff --git a/block.c b/block.c
> index 24a25d5..e54e59c 100644
> --- a/block.c
> +++ b/block.c
> @@ -29,6 +29,9 @@
>  #include "module.h"
>  #include "qemu-objects.h"
>
> +#include "qemu-timer.h"
> +#include "block/blk-queue.h"
> +
>  #ifdef CONFIG_BSD
>  #include 
>  #include 
> @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
> sector_num,
>  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
>                           const uint8_t *buf, int nb_sectors);
>
> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
> +        double elapsed_time, uint64_t *wait);
> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
> +        bool is_write, uint64_t *wait);
> +
>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>
> @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename)
>  }
>  #endif
>
> +static int bdrv_io_limits_enable(BlockIOLimit *io_limits)
> +{
> +    if ((io_limits->bps[0] == 0)
> +         && (io_limits->bps[1] == 0)
> +         && (io_limits->bps[2] == 0)
> +         && (io_limits->iops[0] == 0)
> +         && (io_limits->iops[1] == 0)
> +         && (io_limits->iops[2] == 0)) {
> +        return 0;
> +    }
> +
> +    return 1;
> +}
> +
>  /* check if the path starts with ":" */
>  static int path_has_protocol(const char *path)
>  {
> @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size,
>      }
>  }
>
> +static void bdrv_block_timer(void *opaque)
> +{
> +    BlockDriverState *bs = opaque;
> +    BlockQueue *queue = bs->block_queue;
> +
> +    while (!QTAILQ_EMPTY(&queue->requests)) {
> +        BlockIORequest *request;
> +        int ret;
> +
> +        request = QTAILQ_FIRST(&queue->requests);
> +        QTAILQ_REMOVE(&queue->requests, request, entry);
> +
> +        ret = qemu_block_queue_handler(request);
> +        if (ret == 0) {
> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
> +            break;
> +        }
> +
> +        qemu_free(request);
> +    }
> +}
> +
>  void bdrv_register(BlockDriver *bdrv)
>  {
>      if (!bdrv->bdrv_aio_readv) {
> @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char 
> *filename, int flags,
>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>      }
>
> +    /* throttling disk I/O limits */
> +    if (bdrv_io_limits_enable(&bs->io_limits)) {
> +        bs->block_queue = qemu_new_block_queue();
> +        bs->block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, 
> bs);
> +
> +        bs->slice_start[0] = qemu_get_clock_ns(vm_clock);
> +        bs->slice_star

[Qemu-devel] [PATCH 6/8] qcow2: reindent and use while before the big jump

2011-07-28 Thread Frediano Ziglio
prepare to remove read/write callbacks

Signed-off-by: Frediano Ziglio 
---
 block/qcow2.c |  272 -
 1 files changed, 135 insertions(+), 137 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c7445cc..dfb969e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -395,107 +395,105 @@ static int qcow2_aio_read_cb(QCowAIOCB *acb)
 int cur_nr_sectors; /* number of sectors in current iteration */
 uint64_t cluster_offset = 0;
 
-if (acb->remaining_sectors == 0) {
-/* request completed */
-return 0;
-}
-
-/* prepare next request */
-cur_nr_sectors = acb->remaining_sectors;
-if (s->crypt_method) {
-cur_nr_sectors = MIN(cur_nr_sectors,
-QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
-}
+while (acb->remaining_sectors != 0) {
 
-ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
-&cur_nr_sectors, &cluster_offset);
-if (ret < 0) {
-return ret;
-}
-
-index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
-
-qemu_iovec_reset(&acb->hd_qiov);
-qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
-cur_nr_sectors * 512);
-
-if (!cluster_offset) {
-
-if (bs->backing_hd) {
-/* read from the base image */
-n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov,
-acb->sector_num, cur_nr_sectors);
-if (n1 > 0) {
-BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
-qemu_co_mutex_unlock(&s->lock);
-ret = bdrv_co_readv(bs->backing_hd, acb->sector_num,
-n1, &acb->hd_qiov);
-qemu_co_mutex_lock(&s->lock);
-if (ret < 0) {
-return ret;
-}
-}
-} else {
-/* Note: in this case, no need to wait */
-qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors);
+/* prepare next request */
+cur_nr_sectors = acb->remaining_sectors;
+if (s->crypt_method) {
+cur_nr_sectors = MIN(cur_nr_sectors,
+QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
 }
-} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
-/* add AIO support for compressed blocks ? */
-ret = qcow2_decompress_cluster(bs, cluster_offset);
+
+ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
+&cur_nr_sectors, &cluster_offset);
 if (ret < 0) {
 return ret;
 }
 
-qemu_iovec_from_buffer(&acb->hd_qiov,
-s->cluster_cache + index_in_cluster * 512,
-512 * cur_nr_sectors);
-} else {
-if ((cluster_offset & 511) != 0) {
-return -EIO;
-}
+index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
 
-if (s->crypt_method) {
-/*
- * For encrypted images, read everything into a temporary
- * contiguous buffer on which the AES functions can work.
- */
-if (!acb->cluster_data) {
-acb->cluster_data =
-qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+qemu_iovec_reset(&acb->hd_qiov);
+qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
+cur_nr_sectors * 512);
+
+if (!cluster_offset) {
+
+if (bs->backing_hd) {
+/* read from the base image */
+n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov,
+acb->sector_num, cur_nr_sectors);
+if (n1 > 0) {
+BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING_AIO);
+qemu_co_mutex_unlock(&s->lock);
+ret = bdrv_co_readv(bs->backing_hd, acb->sector_num,
+n1, &acb->hd_qiov);
+qemu_co_mutex_lock(&s->lock);
+if (ret < 0) {
+return ret;
+}
+}
+} else {
+/* Note: in this case, no need to wait */
+qemu_iovec_memset(&acb->hd_qiov, 0, 512 * cur_nr_sectors);
+}
+} else if (cluster_offset & QCOW_OFLAG_COMPRESSED) {
+/* add AIO support for compressed blocks ? */
+ret = qcow2_decompress_cluster(bs, cluster_offset);
+if (ret < 0) {
+return ret;
 }
 
-assert(cur_nr_sectors <=
-QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
-qemu_iovec_reset(&acb->hd_qiov);
-qemu_iovec_add(&acb->hd_qiov, acb->cluster_data,
+qemu_iovec_from_buffer(&acb->hd_qiov,
+s->cluster_cache + index_in_cluster * 512,
 512 * cur_nr_sectors);
-}
+} else {
+if ((cluster_offset & 511) != 0) {
+ 

Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 9:20 AM, Stefan Hajnoczi  wrote:
> On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu  wrote:
>> On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi  wrote:
>>> On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu  wrote:
 On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti  
 wrote:
> On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote:
>> Welcome to give me your comments, thanks.
>>
>> Signed-off-by: Zhi Yong Wu 
>> ---
>>  Makefile.objs     |    2 +-
>>  block.c           |  288 
>> +++--
>>  block.h           |    1 -
>>  block/blk-queue.c |  116 +
>>  block/blk-queue.h |   70 +
>>  block_int.h       |   28 +
>>  blockdev.c        |   21 
>>  qemu-config.c     |   24 +
>>  qemu-option.c     |   17 +++
>>  qemu-option.h     |    1 +
>>  qemu-options.hx   |    1 +
>>  11 files changed, 559 insertions(+), 10 deletions(-)
>>  create mode 100644 block/blk-queue.c
>>  create mode 100644 block/blk-queue.h
>>
>> diff --git a/Makefile.objs b/Makefile.objs
>> index 9f99ed4..06f2033 100644
>> --- a/Makefile.objs
>> +++ b/Makefile.objs
>> @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o 
>> cloop.o dmg.o bochs.o vpc.o vv
>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
>> qcow2-snapshot.o qcow2-cache.o
>>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o 
>> qed-cluster.o
>>  block-nested-y += qed-check.o
>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
>> blk-queue.o
>>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>>  block-nested-$(CONFIG_CURL) += curl.o
>> diff --git a/block.c b/block.c
>> index 24a25d5..e54e59c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -29,6 +29,9 @@
>>  #include "module.h"
>>  #include "qemu-objects.h"
>>
>> +#include "qemu-timer.h"
>> +#include "block/blk-queue.h"
>> +
>>  #ifdef CONFIG_BSD
>>  #include 
>>  #include 
>> @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
>> sector_num,
>>  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
>>                           const uint8_t *buf, int nb_sectors);
>>
>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
>> +        double elapsed_time, uint64_t *wait);
>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>> +        bool is_write, uint64_t *wait);
>> +
>>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>>
>> @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename)
>>  }
>>  #endif
>>
>> +static int bdrv_io_limits_enable(BlockIOLimit *io_limits)
>> +{
>> +    if ((io_limits->bps[0] == 0)
>> +         && (io_limits->bps[1] == 0)
>> +         && (io_limits->bps[2] == 0)
>> +         && (io_limits->iops[0] == 0)
>> +         && (io_limits->iops[1] == 0)
>> +         && (io_limits->iops[2] == 0)) {
>> +        return 0;
>> +    }
>> +
>> +    return 1;
>> +}
>> +
>>  /* check if the path starts with ":" */
>>  static int path_has_protocol(const char *path)
>>  {
>> @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size,
>>      }
>>  }
>>
>> +static void bdrv_block_timer(void *opaque)
>> +{
>> +    BlockDriverState *bs = opaque;
>> +    BlockQueue *queue = bs->block_queue;
>> +
>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>> +        BlockIORequest *request;
>> +        int ret;
>> +
>> +        request = QTAILQ_FIRST(&queue->requests);
>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>> +
>> +        ret = qemu_block_queue_handler(request);
>> +        if (ret == 0) {
>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>> +            break;
>> +        }
>> +
>> +        qemu_free(request);
>> +    }
>> +}
>> +
>>  void bdrv_register(BlockDriver *bdrv)
>>  {
>>      if (!bdrv->bdrv_aio_readv) {
>> @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char 
>> *filename, int flags,
>>              bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
>>      }
>>
>> +    /* throttling disk I/O limits */
>> +    if (bdrv_io_limits_enable(&bs->io_limits)) {
>> +        bs->block_queue = qemu_new_block_queue();
>

Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention

2011-07-28 Thread Michael S. Tsirkin
On Thu, Jul 28, 2011 at 04:23:24PM +0900, Isaku Yamahata wrote:
> This might be a bit late comment...
> 
> On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote:
> > diff --git a/hw/pci_host.c b/hw/pci_host.c
> > index 728e2d4..bfdc321 100644
> > --- a/hw/pci_host.c
> > +++ b/hw/pci_host.c
> > @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus 
> > *bus, uint32_t addr)
> >  return pci_find_device(bus, bus_num, devfn);
> >  }
> >  
> > +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr,
> > + uint32_t limit, uint32_t val, uint32_t len)
> > +{
> > +assert(len <= 4);
> > +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr));
> > +}
> > +
> > +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr,
> > +uint32_t limit, uint32_t len)
> > +{
> > +assert(len <= 4);
> > +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr));
> > +}
> > +
> 
> Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < 
> addr.
> So we need explicit "if (limit < addr) return;".
> Here's the patch for pci branch.
> 
> >From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001
> Message-Id: 
> <75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp>
> From: Isaku Yamahata 
> Date: Thu, 28 Jul 2011 16:20:28 +0900
> Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common
> 
> This patch adds boundary check in pci_host_config_read/write_common()
> Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < 
> addr.
> So we need explicit "if (limit <= addr) return;"
> 
> Signed-off-by: Isaku Yamahata 

I don't see a problem with this, but could you please clarify when does
this happen? I think this is only possible for a pci device
behind an express root. If so, this belongs in pcie_host.c

I'd also like this info to be recorded in the commit log.

> ---
>  hw/pci_host.c |6 ++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/pci_host.c b/hw/pci_host.c
> index 2e8a29f..71fd3a1 100644
> --- a/hw/pci_host.c
> +++ b/hw/pci_host.c
> @@ -51,6 +51,9 @@ void pci_host_config_write_common(PCIDevice *pci_dev, 
> uint32_t addr,
>uint32_t limit, uint32_t val, uint32_t len)
>  {
>  assert(len <= 4);
> +if (limit <= addr) {
> +return;
> +}
>  pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr));
>  }
>  
> @@ -58,6 +61,9 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, 
> uint32_t addr,
>   uint32_t limit, uint32_t len)
>  {
>  assert(len <= 4);
> +if (limit <= addr) {
> +return 0;
> +}
>  return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr));
>  }
>  
> -- 
> 1.7.1.1
> 
> -- 
> yamahata



Re: [Qemu-devel] [PATCH] add QEMU_LD_PREFIX environment variable

2011-07-28 Thread Riku Voipio
On Sat, Jul 23, 2011 at 07:47:49AM +0200, josch wrote:
> This could be avoided by setting the proposed environment variable 
> QEMU_LD_PREFIX to the just
> created debian rootfs. As mentioned earlier, the usage of the -L option
> is not possible in this scenario because qemu-user is only implicitly
> called by the binfmt mechanism.

What worries me here is that we are beginning to add a enviroment variable
for each and every command line option of qemu linux-user. I think it would
be better to have a wrapper binary to be registered as the binfmt runner.
Alternatively we should have a generic setup for mapping enviroment variables
to command line options. Now we get special per-option code every time someone 
needs to setup a command line option from binfmt.

Riku



Re: [Qemu-devel] [PATCH v2 0/5] balloon: fix memleaks, invalid arguments, unplug

2011-07-28 Thread Michael S. Tsirkin
On Thu, Jul 28, 2011 at 11:47:11AM +0530, Amit Shah wrote:
> Hello,
> 
> This series is on top of the other balloon series for which I sent a
> pull request on Tuesday.
> 
> This series fixes memleak on exit, unregisters the savevm section on
> unplug, disallows negative values as ballooning targets and doesn't
> allow multiple balloon device registrations.
> 
> v2 contains small tweaks suggested:
>  - Error is shown by balloon.c instead of virtio-balloon.c in patch 1
>(mst)
>  - Filter out negative input in do_balloon() and add qerror message
>(Markus)
>  - Separate out savevm section unregistering from memleak fix
> 
> 
> Amit Shah (5):
>   balloon: Don't allow multiple balloon handler registrations
>   virtio-balloon: Check if balloon registration failed
>   balloon: Ignore negative balloon values
>   virtio-balloon: Add exit handler, fix memleaks
>   virtio-balloon: Unregister savevm section on device unplug


Acked-by: Michael S. Tsirkin 

>  balloon.c   |   20 +---
>  balloon.h   |4 ++--
>  hw/virtio-balloon.c |   18 +-
>  hw/virtio-pci.c |   14 +-
>  hw/virtio.h |1 +
>  5 files changed, 50 insertions(+), 7 deletions(-)
> 
> -- 
> 1.7.6



Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.

2011-07-28 Thread Zhi Yong Wu
On Thu, Jul 28, 2011 at 4:25 PM, Stefan Hajnoczi  wrote:
> On Thu, Jul 28, 2011 at 9:20 AM, Stefan Hajnoczi  wrote:
>> On Thu, Jul 28, 2011 at 6:43 AM, Zhi Yong Wu  wrote:
>>> On Wed, Jul 27, 2011 at 8:58 PM, Stefan Hajnoczi  wrote:
 On Wed, Jul 27, 2011 at 11:17 AM, Zhi Yong Wu  wrote:
> On Wed, Jul 27, 2011 at 3:26 AM, Marcelo Tosatti  
> wrote:
>> On Tue, Jul 26, 2011 at 04:59:06PM +0800, Zhi Yong Wu wrote:
>>> Welcome to give me your comments, thanks.
>>>
>>> Signed-off-by: Zhi Yong Wu 
>>> ---
>>>  Makefile.objs     |    2 +-
>>>  block.c           |  288 
>>> +++--
>>>  block.h           |    1 -
>>>  block/blk-queue.c |  116 +
>>>  block/blk-queue.h |   70 +
>>>  block_int.h       |   28 +
>>>  blockdev.c        |   21 
>>>  qemu-config.c     |   24 +
>>>  qemu-option.c     |   17 +++
>>>  qemu-option.h     |    1 +
>>>  qemu-options.hx   |    1 +
>>>  11 files changed, 559 insertions(+), 10 deletions(-)
>>>  create mode 100644 block/blk-queue.c
>>>  create mode 100644 block/blk-queue.h
>>>
>>> diff --git a/Makefile.objs b/Makefile.objs
>>> index 9f99ed4..06f2033 100644
>>> --- a/Makefile.objs
>>> +++ b/Makefile.objs
>>> @@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o 
>>> cloop.o dmg.o bochs.o vpc.o vv
>>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
>>> qcow2-snapshot.o qcow2-cache.o
>>>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o 
>>> qed-cluster.o
>>>  block-nested-y += qed-check.o
>>> -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>> +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
>>> blk-queue.o
>>>  block-nested-$(CONFIG_WIN32) += raw-win32.o
>>>  block-nested-$(CONFIG_POSIX) += raw-posix.o
>>>  block-nested-$(CONFIG_CURL) += curl.o
>>> diff --git a/block.c b/block.c
>>> index 24a25d5..e54e59c 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -29,6 +29,9 @@
>>>  #include "module.h"
>>>  #include "qemu-objects.h"
>>>
>>> +#include "qemu-timer.h"
>>> +#include "block/blk-queue.h"
>>> +
>>>  #ifdef CONFIG_BSD
>>>  #include 
>>>  #include 
>>> @@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, 
>>> int64_t sector_num,
>>>  static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
>>>                           const uint8_t *buf, int nb_sectors);
>>>
>>> +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int 
>>> nb_sectors,
>>> +        bool is_write, double elapsed_time, uint64_t *wait);
>>> +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool 
>>> is_write,
>>> +        double elapsed_time, uint64_t *wait);
>>> +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
>>> +        bool is_write, uint64_t *wait);
>>> +
>>>  static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
>>>      QTAILQ_HEAD_INITIALIZER(bdrv_states);
>>>
>>> @@ -90,6 +100,20 @@ int is_windows_drive(const char *filename)
>>>  }
>>>  #endif
>>>
>>> +static int bdrv_io_limits_enable(BlockIOLimit *io_limits)
>>> +{
>>> +    if ((io_limits->bps[0] == 0)
>>> +         && (io_limits->bps[1] == 0)
>>> +         && (io_limits->bps[2] == 0)
>>> +         && (io_limits->iops[0] == 0)
>>> +         && (io_limits->iops[1] == 0)
>>> +         && (io_limits->iops[2] == 0)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    return 1;
>>> +}
>>> +
>>>  /* check if the path starts with ":" */
>>>  static int path_has_protocol(const char *path)
>>>  {
>>> @@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size,
>>>      }
>>>  }
>>>
>>> +static void bdrv_block_timer(void *opaque)
>>> +{
>>> +    BlockDriverState *bs = opaque;
>>> +    BlockQueue *queue = bs->block_queue;
>>> +
>>> +    while (!QTAILQ_EMPTY(&queue->requests)) {
>>> +        BlockIORequest *request;
>>> +        int ret;
>>> +
>>> +        request = QTAILQ_FIRST(&queue->requests);
>>> +        QTAILQ_REMOVE(&queue->requests, request, entry);
>>> +
>>> +        ret = qemu_block_queue_handler(request);
>>> +        if (ret == 0) {
>>> +            QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
>>> +            break;
>>> +        }
>>> +
>>> +        qemu_free(request);
>>> +    }
>>> +}
>>> +
>>>  void bdrv_register(BlockDriver *bdrv)
>>>  {
>>>      if (!bdrv->bdrv_aio_readv) {
>>> @@ -642,6 +688,15 @@ int bdrv_open(BlockDriverState *bs, const char 
>>> *filename, int flags,
>>>              bs->change_cb(bs->change_opaque,

Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Jes Sorensen
On 07/27/11 18:40, Andrea Arcangeli wrote:
>> Another thing to note is that snapshotting is not necessarily something 
>> > that should be completely transparent to the guest. One of the planned 
>> > future features for the guest agent (mentioned in the snapshot wiki, and 
>> > a common use case that I've seen come up elsewhere as well in the 
>> > context of database applications), is a way for userspace applications 
>> > to register callbacks to be made in the event of a freeze (dumping 
>> > application-managed caches to disk and things along that line). The 
> Not sure if the scripts are really needed or if they would just open a
> brand new fsfreeze specific unix domain socket (created by the
> database) to tell the database to freeze.
> 
> If the latter is the case, then it'd be better rather than changing
> the database to open unix domain socket so the script can connect to
> it when invoked (or maybe to just add some new function to the
> protocol of an existing open unix domain socket), to instead change
> the database to open a /dev/virtio-fsfreeze device, created by the
> virtio-fsfreeze.ko virtio driver through udev. The database would poll
> it, and it could read the request to freeze, and write into it that it
> finished freezing when done. Then when all openers of the device
> freezed, the virtio-fsfreeze.ko would go ahead freezing all the
> filesystems, and then tell qemu when it's finished freezing. Then qemu
> can finally block all the I/O and tell libvirt to go ahead with the
> snapshot.

I think it could also be a combined operation, ie. having the freeze
happen in the kernel, but doing the callouts using a userspace daemon. I
like the userspace daemon for the callouts because it allows providing a
more sophisticated API than if we provide just a socket like interface.
In addition the callout is less critical wrt crashes than the fsfreeze
operations.

Cheers,
Jes



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Jes Sorensen
On 07/27/11 20:36, Christoph Hellwig wrote:
> Initiating the freeze from kernelspace doesn't make much sense.  With
> virtio we could add in-band freeze request to the protocol, and although
> that would be a major change in that way virtio-blk works right now it's
> at least doable.  But all other "real" storage targets only communicate
> with their initators over out of band procotols that are entirely handled
> in userspace, and given their high-level nature better are - that is if
> we know them at all given how vendors like to keep this secrete IP
> closed and just offer userspace management tools in binary form.
> 
> building new infrastructure in the kernel just for virtio, while needing
> to duplicate the same thing in userspace for all real storage seems like
> a really bad idea.  That is in addition to the userspace freeze notifier
> similar to what e.g. Windows has - if the freeze process is driven from
> userspace it's much easier to handle those properly compared to requiring
> kernel upcalls.
> 

The freeze operation would really just be a case of walking the list of
mounted file systems and calling the FIFREEZE ioctl operation on them. I
wouldn't anticipate doing anything else in a virtio-fsfreeze.ko module.

Cheers,
Jes




Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Stefan Hajnoczi
On Wed, Jul 27, 2011 at 5:02 PM, Anthony Liguori  wrote:
> On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote:
>>
>> On Wed, Jul 27, 2011 at 1:58 PM, Anthony Liguori
>>  wrote:

 Index: qemu/hmp-commands.hx
 ===
 --- qemu.orig/hmp-commands.hx
 +++ qemu/hmp-commands.hx
 @@ -70,6 +70,20 @@ but should be used with extreme caution.
  resizes image files, it can not resize block devices like LVM volumes.
  ETEXI

 +    {
 +        .name       = "block_set",
 +        .args_type  = "device:B,device:O",
 +        .params     = "device [prop=value][,...]",
 +        .help       = "Change block device parameters
 [hostcache=on/off]",
 +        .user_print = monitor_user_noop,
 +        .mhandler.cmd_new = do_block_set,
 +    },
 +STEXI
 +@item block_set @var{config}
 +@findex block_set
 +Change block device parameters (eg: hostcache=on/off) while guest is
 running.
 +ETEXI
 +
>>>
>>> block_set_hostcache() please.
>>>
>>> Multiplexing commands is generally a bad idea.  It weakens typing.  In
>>> the
>>> absence of a generic way to set block device properties, implementing
>>> properties as generic in the QMP layer seems like a bad idea to me.
>>
>> The idea behind block_set was to have a unified interface for changing
>> block device parameters at runtime.  This prevents us from reinventing
>> new commands from scratch.  For example, block I/O throttling is
>> already queued up to add run-time parameters.
>>
>> Without a unified command we have a bulkier QMP/HMP interface,
>> duplicated code, and possibly inconsistencies in syntax between the
>> commands.  Isn't the best way to avoid these problems a unified
>> interface?
>>
>> I understand the lack of type safety concern but in this case we
>> already have to manually pull parsed arguments (i.e. cast to specific
>> types and deal with invalid input).  To me this is a reason *for*
>> using a unified interface like block_set.
>
> Think about it from a client perspective.  How do I determine which
> properties are supported by this version of QEMU?  I have no way to identify
> programmatically what arguments are valid for block_set.
>
> OTOH, if you have strong types like block_set_hostcache, query-commands
> tells me exactly what's supported.

Use query-block and see if 'hostcache' is there.  If yes, then the
hostcache parameter is available.  If we allow BlockDrivers to have
their own runtime parameters then query-commands does not tell you
anything because the specific BlockDriver may or may not support that
runtime parameter - you need to use query-block.

Stefan



Re: [Qemu-devel] [PATCH 00/55] Block layer cleanup & fixes

2011-07-28 Thread Amit Shah
On (Wed) 20 Jul 2011 [18:23:34], Markus Armbruster wrote:
> This patch series looks bigger than it is.  All the patches are small
> and hopefully easy to review.

Nice series!

ACK the entire series (barring the non-qdev users connecting to bdrv
and scsi patches, which I didn't quite follow).

Amit



Re: [Qemu-devel] [PATCH v2 3/5] balloon: Ignore negative balloon values

2011-07-28 Thread Amit Shah
On (Thu) 28 Jul 2011 [09:31:53], Markus Armbruster wrote:
> Amit Shah  writes:
> 
> > Negative balloon values don't make sense, ignore them.
> 
> Actually, they aren't ignored, they're rejected.

Will update commit log.

Amit



Re: [Qemu-devel] [PATCH v2 4/5] virtio-balloon: Add exit handler, fix memleaks

2011-07-28 Thread Amit Shah
On (Thu) 28 Jul 2011 [09:39:40], Markus Armbruster wrote:
> Amit Shah  writes:
> 
> > On (Thu) 28 Jul 2011 [11:47:15], Amit Shah wrote:
> >> Add an exit handler that will free up RAM and unregister the savevm
> >> section after a virtio-balloon device is unplugged.
> >
> > This commit message should be changed; I'll do that in the pull
> > request I send out.
> 
> You mean drop "unregister the savevm", because it has become PATCH 5/5?

Right.

Amit



Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug

2011-07-28 Thread Amit Shah
On (Thu) 28 Jul 2011 [09:45:44], Markus Armbruster wrote:
> Amit Shah  writes:
> 
> > Migrating after unplugging a virtio-balloon device resulted in an error
> > message on the destination:
> >
> > Unknown savevm section or instance ':00:04.0/virtio-balloon' 0
> > load of migration failed
> >
> > Fix this by unregistering the section on device unplug.
> >
> > Signed-off-by: Amit Shah 
> > ---
> >  hw/virtio-balloon.c |4 
> >  1 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
> > index 0ce0049..072a88a 100644
> > --- a/hw/virtio-balloon.c
> > +++ b/hw/virtio-balloon.c
> > @@ -45,6 +45,7 @@ typedef struct VirtIOBalloon
> >  size_t stats_vq_offset;
> >  MonitorCompletion *stats_callback;
> >  void *stats_opaque_callback_data;
> > +DeviceState *qdev;
> >  } VirtIOBalloon;
> 
> All the other virtio device structs already have such a pointer back to
> the proxy.  Suggests that it should live in VirtIODevice, and be set up
> in generic code.  Again, outside the scope of this series.
> 
> I hate the virtio pointer thicket.

Yep :-(

Amit



[Qemu-devel] [PULL v2][ALSO STABLE] balloon: multiple fixes and cleanups

2011-07-28 Thread Amit Shah
Hello,

This new pull request carries the first 7 patches from the earlier
pull request that cleaned up balloon code and fixed a use-after-free
segfault on issuing 'balloon 0'.

This series adds 5 more patches that fix:
 - a memleak on unplug, 
 - migration after unplug,
 - negative balloon values by rejecting them

It also rejects creating multiple balloon devices.

Please pull.

Makes sense to pull these in for stable as well.

The following changes since commit c886edfb851c0c590d4e77f058f2ec8ed95ad1b5:

  Let users select their pythons (2011-07-25 16:50:12 +)

are available in the git repository at:
  git://git.kernel.org/pub/scm/virt/qemu/amit/misc.git for-anthony

Amit Shah (12):
  balloon: Make functions, local vars static
  balloon: Add braces around if statements
  balloon: Simplify code flow
  virtio-balloon: Separate status handling into separate function
  balloon: Separate out stat and balloon handling
  balloon: Fix header comment; add Copyright
  virtio-balloon: Fix header comment; add Copyright
  balloon: Don't allow multiple balloon handler registrations
  virtio-balloon: Check if balloon registration failed
  balloon: Reject negative balloon values
  virtio-balloon: Add exit handler, fix memleaks
  virtio-balloon: Unregister savevm section on device unplug

 balloon.c   |   61 ++---
 balloon.h   |   12 +++-
 hw/virtio-balloon.c |   76 ++-
 hw/virtio-pci.c |   14 +-
 hw/virtio.h |1 +
 5 files changed, 109 insertions(+), 55 deletions(-)


Amit



Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Supriya Kannery

On 07/27/2011 09:32 PM, Anthony Liguori wrote:

On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote:

On Wed, Jul 27, 2011 at 1:58 PM, Anthony
Liguori wrote:

Index: qemu/hmp-commands.hx
===
--- qemu.orig/hmp-commands.hx
+++ qemu/hmp-commands.hx
@@ -70,6 +70,20 @@ but should be used with extreme caution.
resizes image files, it can not resize block devices like LVM volumes.
ETEXI

+ {
+ .name = "block_set",
+ .args_type = "device:B,device:O",
+ .params = "device [prop=value][,...]",
+ .help = "Change block device parameters
[hostcache=on/off]",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd_new = do_block_set,
+ },
+STEXI
+@item block_set @var{config}
+@findex block_set
+Change block device parameters (eg: hostcache=on/off) while guest is
running.
+ETEXI
+


block_set_hostcache() please.

Multiplexing commands is generally a bad idea. It weakens typing. In the
absence of a generic way to set block device properties, implementing
properties as generic in the QMP layer seems like a bad idea to me.


The idea behind block_set was to have a unified interface for changing
block device parameters at runtime. This prevents us from reinventing
new commands from scratch. For example, block I/O throttling is
already queued up to add run-time parameters.

Without a unified command we have a bulkier QMP/HMP interface,
duplicated code, and possibly inconsistencies in syntax between the
commands. Isn't the best way to avoid these problems a unified
interface?

I understand the lack of type safety concern but in this case we
already have to manually pull parsed arguments (i.e. cast to specific
types and deal with invalid input). To me this is a reason *for*
using a unified interface like block_set.


Think about it from a client perspective. How do I determine which
properties are supported by this version of QEMU? I have no way to
identify programmatically what arguments are valid for block_set.



   User can programmatically find out valid parameters for
block_set. Internally, validation of prop=value is done against -drive
parameter list and then, only the valid/implemented commands (for now, 
hostcache) are accepted from that list. Please find execution output 
attached:


(qemu) info block
ide0-hd0: removable=0 hostcache=1 file=../rhel6-32.qcow2 ro=0 drv=qcow2 
encrypted=0
floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw 
encrypted=0

ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted]
sd0: removable=1 locked=0 hostcache=1 [not inserted]
(qemu) block
block_resize  block_set block_passwd
(qemu) block_set
block_set: block device name expected
(qemu) block
block_resize  block_set block_passwd
(qemu) help block_set
block_set device [prop=value][,...] -- Change block device parameters 
[hostcache=on/off]

(qemu) block_set ide
Device 'ide' not found
(qemu) block_set ide0-hd0
Usage: block_set device [prop=value][,...]
(qemu) block_set ide0-hd0 cache
Invalid parameter 'cache'
(qemu) block_set ide0-hd0 cache=on
Parameter 'hostcache' expects on/off
(qemu) block_set ide0-hd0 hostcache=on
(qemu) block_set ide0-hd0 hostcache=off
(qemu) info block
ide0-hd0: removable=0 hostcache=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2 
encrypted=0
floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw 
encrypted=0

ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted]
sd0: removable=1 locked=0 hostcache=1 [not inserted]


 When we add further parameters, "usage" string can be enhanced to
include those parameters for informing user.

- Thanks, Supriya


OTOH, if you have strong types like block_set_hostcache, query-commands
tells me exactly what's supported.

Regards,

Anthony Liguori



Stefan









Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 27.07.2011 14:37, schrieb Stefan Hajnoczi:
> Hi,
> Here is a list of block layer and storage changes that have been discussed.  
> It
> is useful to have a roadmap of changes in order to avoid duplication, allow
> more developers to contribute, and to communicate the direction of storage in
> QEMU.
> 
> I suggest we first do a braindump of all changes that have been discussed.
> Later we can discuss specific changes and if/when they fit into the roadmap -
> please don't jump into discussion about specific changes yet.
> 
> Kevin: I hope this a useful starting point.  Here are all the items
> that I am aware of:
> 
> =Material for next QEMU release=
> 
> Coroutines in the block layer [Kevin]
>  * Programming model to simplify block drivers without blocking QEMU threads
> 
> Generic copy-on-read [Stefan]
>  * Populate image file to avoid fetching same block from backing file
> again later
> 
> Generic image streaming [Stefan]
>  * Make block_stream commands available for all image formats that
> support backing files
> 
> Live block copy [Marcelo/Kevin/Stefan?]
>  * Copy the contents of an image file while a guest is using it
> 
> In-place qcow2 <-> qed conversion [Devin, GSoC 2011]:
>  * Fast conversion between qcow2 and qed image formats without copy all data
> 
> VMDK enhancements [Fam, GSoC 2011]
>  * Implement latest VMDK specs to support modern image files
> 
> Block I/O limits [Zhi Yong]
>  * Resource control for guest I/O bandwidth/iops consumption

Another item that just came up on IRC again: Allow guests to toggle WCE,
so that we finally can get rid of the cache=writethrough default. We
should definitely get this done before 1.0.

Kevin



[Qemu-devel] [PATCH 0.16 v3 0/3] make endian-independent unaligned memory access functions available to libhw

2011-07-28 Thread Paolo Bonzini
Functions like ldl_be_p and ldl_le_p are currently used only as building
blocks for {ld,st}XX_p.  As such, they are in cpu-all.h even though they
have absolutely no dependency on the target.

In order to make them globally available, this series moves them to
bswap.h instead.

An interesting part of this is that there are functions also for floating
point values.  Leaving them in cpu-all.h would be possible but untidy.
In fact handling these is easy, but it requires to make softfloat.h
target-dependent as well.  This is what patch 2 does.

v2->v3:
rebase

v1->v2:
rebase, use softfloat-specialize.h instead of introducing
softfloat-target.h

Paolo Bonzini (3):
  move WORDS_ALIGNED to qemu-common.h
  softfloat: change default nan definitions to variables
  move unaligned memory access functions to bswap.h

 Makefile.hw|2 +-
 bswap.h|  474 
 configure  |3 +-
 cpu-all.h  |  446 +-
 cpu-common.h   |4 -
 fpu/softfloat-specialize.h |   72 +++
 fpu/softfloat.h|   60 +-
 qemu-common.h  |4 +
 8 files changed, 562 insertions(+), 503 deletions(-)

-- 
1.7.6




[Qemu-devel] [PATCH 0.16 v3 1/3] move WORDS_ALIGNED to qemu-common.h

2011-07-28 Thread Paolo Bonzini
This is not a CPU interface, and a configure test would not be too
precise.  So just add it to qemu-common.h.

Signed-off-by: Paolo Bonzini 
---
 cpu-common.h  |4 
 qemu-common.h |4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index 44b04b3..16c9f4f 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -3,10 +3,6 @@
 
 /* CPU interfaces that are target indpendent.  */
 
-#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || 
defined(__hppa__) || defined(__ia64__)
-#define WORDS_ALIGNED
-#endif
-
 #ifdef TARGET_PHYS_ADDR_BITS
 #include "targphys.h"
 #endif
diff --git a/qemu-common.h b/qemu-common.h
index ba55719..43ebaca 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -5,6 +5,10 @@
 #include "compiler.h"
 #include "config-host.h"
 
+#if defined(__arm__) || defined(__sparc__) || defined(__mips__) || 
defined(__hppa__) || defined(__ia64__)
+#define WORDS_ALIGNED
+#endif
+
 #define TFR(expr) do { if ((expr) != -1) break; } while (errno == EINTR)
 
 typedef struct QEMUTimer QEMUTimer;
-- 
1.7.6





[Qemu-devel] [PATCH 0.16 v3 2/3] softfloat: change default nan definitions to variables

2011-07-28 Thread Paolo Bonzini
Most definitions in softfloat.h are really target-independent, but the
file is not because it includes definitions of the default NaN values.
Change those to variables to allow including softfloat.h from files that
are not compiled per-target.  By making them const, the compiler is
allowed to optimize them into softfloat functions that use them.

Signed-off-by: Paolo Bonzini 
---
 fpu/softfloat-specialize.h |   72 
 fpu/softfloat.h|   60 +---
 2 files changed, 81 insertions(+), 51 deletions(-)

diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h
index c7d35a1..c165205 100644
--- a/fpu/softfloat-specialize.h
+++ b/fpu/softfloat-specialize.h
@@ -35,6 +35,78 @@ these four paragraphs for those parts of this code that are 
retained.
 
 =*/
 
+#if defined(TARGET_MIPS) || defined(TARGET_SH4) || defined(TARGET_UNICORE32)
+#define SNAN_BIT_IS_ONE1
+#else
+#define SNAN_BIT_IS_ONE0
+#endif
+
+/*
+| The pattern for a default generated half-precision NaN.
+**/
+#if defined(TARGET_ARM)
+const float16 float16_default_nan = const_float16(0x7E00);
+#elif SNAN_BIT_IS_ONE
+const float16 float16_default_nan = const_float16(0x7DFF);
+#else
+const float16 float16_default_nan = const_float16(0xFE00);
+#endif
+
+/*
+| The pattern for a default generated single-precision NaN.
+**/
+#if defined(TARGET_SPARC)
+const float32 float32_default_nan = const_float32(0x7FFF);
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+const float32 float32_default_nan = const_float32(0x7FC0);
+#elif SNAN_BIT_IS_ONE
+const float32 float32_default_nan = const_float32(0x7FBF);
+#else
+const float32 float32_default_nan = const_float32(0xFFC0);
+#endif
+
+/*
+| The pattern for a default generated double-precision NaN.
+**/
+#if defined(TARGET_SPARC)
+const float64 float64_default_nan = const_float64(LIT64( 0x7FFF ));
+#elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA)
+const float64 float64_default_nan = const_float64(LIT64( 0x7FF8 ));
+#elif SNAN_BIT_IS_ONE
+const float64 float64_default_nan = const_float64(LIT64( 0x7FF7 ));
+#else
+const float64 float64_default_nan = const_float64(LIT64( 0xFFF8 ));
+#endif
+
+/*
+| The pattern for a default generated extended double-precision NaN.
+**/
+#if SNAN_BIT_IS_ONE
+#define floatx80_default_nan_high 0x7FFF
+#define floatx80_default_nan_low  LIT64( 0xBFFF )
+#else
+#define floatx80_default_nan_high 0x
+#define floatx80_default_nan_low  LIT64( 0xC000 )
+#endif
+
+const floatx80 floatx80_default_nan = make_floatx80(floatx80_default_nan_high,
+floatx80_default_nan_low);
+
+/*
+| The pattern for a default generated quadruple-precision NaN.  The `high' and
+| `low' values hold the most- and least-significant bits, respectively.
+**/
+#if SNAN_BIT_IS_ONE
+#define float128_default_nan_high LIT64( 0x7FFF7FFF )
+#define float128_default_nan_low  LIT64( 0x )
+#else
+#define float128_default_nan_high LIT64( 0x8000 )
+#define float128_default_nan_low  LIT64( 0x )
+#endif
+
+const float128 float128_default_nan = make_float128(float128_default_nan_high,
+float128_default_nan_low);
+
 /*
 | Raises the exceptions specified by `flags'.  Floating-point traps can be
 | defined here if desired.  It is currently not possible for such a trap
diff --git a/fpu/softfloat.h b/fpu/softfloat.h
index bde2500..3bb7d8f 100644
--- a/fpu/softfloat.h
+++ b/fpu/softfloat.h
@@ -43,7 +43,7 @@ these four paragraphs for those parts of this code that are 
retained.
 #endif
 
 #include 
-#include "config.h"
+#include "config-host.h"
 
 /*
 | Each of the following `typedef's defines the most convenient type that holds
@@ -68,12 +68,6 @@ typedef int64_t int64;

[Qemu-devel] [PATCH 0.16 v3 3/3] move unaligned memory access functions to bswap.h

2011-07-28 Thread Paolo Bonzini
This is just code movement, and moving the fpu/ include path from
target-dependent to target-independent Make variables.

Signed-off-by: Paolo Bonzini 
---
 Makefile.hw |2 +-
 bswap.h |  474 +++
 configure   |3 +-
 cpu-all.h   |  446 +---
 4 files changed, 477 insertions(+), 448 deletions(-)

diff --git a/Makefile.hw b/Makefile.hw
index b9181ab..659e441 100644
--- a/Makefile.hw
+++ b/Makefile.hw
@@ -9,7 +9,7 @@ include $(SRC_PATH)/rules.mak
 
 $(call set-vpath, $(SRC_PATH):$(SRC_PATH)/hw)
 
-QEMU_CFLAGS+=-I.. -I$(SRC_PATH)/fpu
+QEMU_CFLAGS+=-I..
 
 include $(SRC_PATH)/Makefile.objs
 
diff --git a/bswap.h b/bswap.h
index 82a7951..f41bebe 100644
--- a/bswap.h
+++ b/bswap.h
@@ -11,6 +11,8 @@
 #include 
 #else
 
+#include "softfloat.h"
+
 #ifdef CONFIG_BYTESWAP_H
 #include 
 #else
@@ -237,4 +239,476 @@ static inline uint32_t qemu_bswap_len(uint32_t value, int 
len)
 return bswap32(value) >> (32 - 8 * len);
 }
 
+typedef union {
+float32 f;
+uint32_t l;
+} CPU_FloatU;
+
+typedef union {
+float64 d;
+#if defined(HOST_WORDS_BIGENDIAN)
+struct {
+uint32_t upper;
+uint32_t lower;
+} l;
+#else
+struct {
+uint32_t lower;
+uint32_t upper;
+} l;
+#endif
+uint64_t ll;
+} CPU_DoubleU;
+
+typedef union {
+ floatx80 d;
+ struct {
+ uint64_t lower;
+ uint16_t upper;
+ } l;
+} CPU_LDoubleU;
+
+typedef union {
+float128 q;
+#if defined(HOST_WORDS_BIGENDIAN)
+struct {
+uint32_t upmost;
+uint32_t upper;
+uint32_t lower;
+uint32_t lowest;
+} l;
+struct {
+uint64_t upper;
+uint64_t lower;
+} ll;
+#else
+struct {
+uint32_t lowest;
+uint32_t lower;
+uint32_t upper;
+uint32_t upmost;
+} l;
+struct {
+uint64_t lower;
+uint64_t upper;
+} ll;
+#endif
+} CPU_QuadU;
+
+/* unaligned/endian-independent pointer access */
+
+/*
+ * the generic syntax is:
+ *
+ * load: ld{type}{sign}{size}{endian}_p(ptr)
+ *
+ * store: st{type}{size}{endian}_p(ptr, val)
+ *
+ * Note there are small differences with the softmmu access API!
+ *
+ * type is:
+ * (empty): integer access
+ *   f: float access
+ *
+ * sign is:
+ * (empty): for floats or 32 bit size
+ *   u: unsigned
+ *   s: signed
+ *
+ * size is:
+ *   b: 8 bits
+ *   w: 16 bits
+ *   l: 32 bits
+ *   q: 64 bits
+ *
+ * endian is:
+ * (empty): 8 bit access
+ *   be   : big endian
+ *   le   : little endian
+ */
+static inline int ldub_p(const void *ptr)
+{
+return *(uint8_t *)ptr;
+}
+
+static inline int ldsb_p(const void *ptr)
+{
+return *(int8_t *)ptr;
+}
+
+static inline void stb_p(void *ptr, int v)
+{
+*(uint8_t *)ptr = v;
+}
+
+/* NOTE: on arm, putting 2 in /proc/sys/debug/alignment so that the
+   kernel handles unaligned load/stores may give better results, but
+   it is a system wide setting : bad */
+#if defined(HOST_WORDS_BIGENDIAN) || defined(WORDS_ALIGNED)
+
+/* conservative code for little endian unaligned accesses */
+static inline int lduw_le_p(const void *ptr)
+{
+#ifdef _ARCH_PPC
+int val;
+__asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
+return val;
+#else
+const uint8_t *p = ptr;
+return p[0] | (p[1] << 8);
+#endif
+}
+
+static inline int ldsw_le_p(const void *ptr)
+{
+#ifdef _ARCH_PPC
+int val;
+__asm__ __volatile__ ("lhbrx %0,0,%1" : "=r" (val) : "r" (ptr));
+return (int16_t)val;
+#else
+const uint8_t *p = ptr;
+return (int16_t)(p[0] | (p[1] << 8));
+#endif
+}
+
+static inline int ldl_le_p(const void *ptr)
+{
+#ifdef _ARCH_PPC
+int val;
+__asm__ __volatile__ ("lwbrx %0,0,%1" : "=r" (val) : "r" (ptr));
+return val;
+#else
+const uint8_t *p = ptr;
+return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
+#endif
+}
+
+static inline uint64_t ldq_le_p(const void *ptr)
+{
+const uint8_t *p = ptr;
+uint32_t v1, v2;
+v1 = ldl_le_p(p);
+v2 = ldl_le_p(p + 4);
+return v1 | ((uint64_t)v2 << 32);
+}
+
+static inline void stw_le_p(void *ptr, int v)
+{
+#ifdef _ARCH_PPC
+__asm__ __volatile__ ("sthbrx %1,0,%2" : "=m" (*(uint16_t *)ptr) : "r" 
(v), "r" (ptr));
+#else
+uint8_t *p = ptr;
+p[0] = v;
+p[1] = v >> 8;
+#endif
+}
+
+static inline void stl_le_p(void *ptr, int v)
+{
+#ifdef _ARCH_PPC
+__asm__ __volatile__ ("stwbrx %1,0,%2" : "=m" (*(uint32_t *)ptr) : "r" 
(v), "r" (ptr));
+#else
+uint8_t *p = ptr;
+p[0] = v;
+p[1] = v >> 8;
+p[2] = v >> 16;
+p[3] = v >> 24;
+#endif
+}
+
+static inline void stq_le_p(void *ptr, uint64_t v)
+{
+uint8_t *p = ptr;
+stl_le_p(p, (uint32_t)v);
+stl_le_p(p + 4, v >> 32);
+}
+
+/* float access */
+
+static inline float32 ldfl_le_p(const void *ptr)
+{
+union {
+float32 f;
+uint32_t i;
+} u;
+u.i = ldl_le_p(ptr);
+retur

Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Kevin Wolf
Am 27.07.2011 16:51, schrieb Stefan Hajnoczi:
> 2011/7/27 Michael Tokarev :
>> 27.07.2011 15:30, Supriya Kannery wrote:
>>> New command "block_set" added for dynamically changing any of the block
>>> device parameters. For now, dynamic setting of hostcache params using this
>>> command is implemented. Other block device parameter changes, can be
>>> integrated in similar lines.
>>>
>>> Signed-off-by: Supriya Kannery 
>>>
>>> ---
>>>  block.c |   54 +
>>>  block.h |2 +
>>>  blockdev.c  |   61 
>>> 
>>>  blockdev.h  |1
>>>  hmp-commands.hx |   14 
>>>  qemu-config.c   |   13 +++
>>>  qemu-option.c   |   25 ++
>>>  qemu-option.h   |2 +
>>>  qmp-commands.hx |   28 +
>>>  9 files changed, 200 insertions(+)
>>>
>>> Index: qemu/block.c
>>> ===
>>> --- qemu.orig/block.c
>>> +++ qemu/block.c
>>> @@ -651,6 +651,34 @@ unlink_and_fail:
>>>  return ret;
>>>  }
>>>
>>> +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
>>> +{
>>> +BlockDriver *drv = bs->drv;
>>> +int ret = 0, open_flags;
>>> +
>>> +/* Quiesce IO for the given block device */
>>> +qemu_aio_flush();
>>> +if (bdrv_flush(bs)) {
>>> +qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name);
>>> +return ret;
>>> +}
>>> +open_flags = bs->open_flags;
>>> +bdrv_close(bs);
>>> +
>>> +ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
>>> +if (ret < 0) {
>>> +/* Reopen failed. Try to open with original flags */
>>> +qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
>>> +ret = bdrv_open(bs, bs->filename, open_flags, drv);
>>> +if (ret < 0) {
>>> +/* Reopen failed with orig and modified flags */
>>> +abort();
>>> +}
>>
>> Can we please avoid this stuff completely?  Just keep the
>> old device open still, until you're sure new one is ok.
>>
>> Or else it will be quite dangerous command in many cases.
>> For example, after -runas/-chroot, or additional selinux
>> settings or whatnot.  And in this case, instead of merely
>> returning an error, we'll see abort().  Boom.
> 
> Slight complication for image formats that use a dirty bit here.  QED
> has a dirty bit.  VMDK also specifies one but we don't implement it
> today.
> 
> If the image file is dirty then all its metadata will be scanned for
> consistency when it is opened.  This increases the bdrv_open() time
> and hence the downtime of the VM.  So it is not ideal to open the
> image file twice, even though there is no consistency problem.

In general I really like understatements, but opening an image file
twice isn't only "not ideal", but should be considered verboten.

We're still doing it during migration and it means that in upstream qemu
any non-raw images will be corrupted.

> I'll think about this some more, there are a couple of solutions like
> keeping only the file descriptor around, introducing a flush command
> that makes sure the file is in a clean state, or changing QED to not
> do this.

Changing the format drivers doesn't really look like the right solution.

Keeping the fd around looks okay, we can probably achieve this by
introducing a bdrv_reopen function. It means that we may need to reopen
the format layer, but it can't really fail.

Kevin



Re: [Qemu-devel] [V5 Patch 1/4]Qemu: Enhance "info block" to display host cache setting

2011-07-28 Thread Supriya Kannery

On 07/27/2011 07:49 PM, Stefan Hajnoczi wrote:

On Wed, Jul 27, 2011 at 12:30 PM, Supriya Kannery
  wrote:

Enhance "info block" to display hostcache setting for each
block device.

Example:
(qemu) info block
ide0-hd0: type=hd removable=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2
encrypted=0

Enhanced to display "hostcache" setting:
(qemu) info block
ide0-hd0: type=hd removable=0 hostcache=true file=../rhel6-32.qcow2
ro=0 drv=qcow2 encrypted=0


This description is outdated, should be hostcache=1 instead of hostcache=true.


@@ -1749,13 +1757,18 @@ void bdrv_info(Monitor *mon, QObject **r
 QObject *bs_obj;

 bs_obj = qobject_from_jsonf("{ 'device': %s, 'type': 'unknown', "
-"'removable': %i, 'locked': %i }",
-bs->device_name, bs->removable,
-bs->locked);
+ "'removable': %i, 'locked': %i, "
+ "'hostcache': %s }",
+ bs->device_name, bs->removable,
+ bs->locked,
+ (bs->open_flags&  BDRV_O_NOCACHE) ?
+ "false" : "true");


Please use the same bool-from-int format specifier ('%i') as the other
fields for consistency.  The value can be !(bs->open_flags&
BDRV_O_NOCACHE).



ok


+
+QDict *bs_dict = qobject_to_qdict(bs_obj);
+qdict_put(bs_dict, "open_flags", qint_from_int(bs->open_flags));

 if (bs->drv) {
 QObject *obj;
-QDict *bs_dict = qobject_to_qdict(bs_obj);

 obj = qobject_from_jsonf("{ 'file': %s, 'ro': %i, 'drv': %s, "
  "'encrypted': %i }",


My copy of the code has the following a few lines further down:
qdict_put_obj(bs_dict, "inserted", obj);

Removing bs_dict would break the build.  Am I missing something?



qdict_put_obj() for "inserted" is still there further down in the code.
Here,
  QDict *bs_dict = qobject_to_qdict(bs_obj);
is moved up to enable placing of "open_flags" into the dict. We need 
"open_flags" in dict to calculate hostcache value for printing in

monitor. Code for placing "inserted" is not touched.


Index: qemu/qmp-commands.hx
===
--- qemu.orig/qmp-commands.hx
+++ qemu/qmp-commands.hx
@@ -1131,6 +1131,7 @@ Each json-object contain the following:
  - Possible values: "unknown"
  - "removable": true if the device is removable, false otherwise (json-bool)
  - "locked": true if the device is locked, false otherwise (json-bool)
+- "hostcache": true if hostcache enabled, false otherwise (json-bool)


Let's explain what "hostcache" means:

- "hostcache": true if host page cache is enabled, false otherwise (json-bool)



ok

-Thanks, Supriya


Stefan





Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread tsnsaito
Hi,

At Mon, 25 Jul 2011 19:22:38 +0200,
Artyom Tarasenko wrote:

> clear interrupt request if the interrupt priority < CPU pil
> clear hardware interrupt request if interrupts are disabled

Not directly related to the fix, but I'd like to note a problem
of hw/sun4u.c interrupt code:

The interrupt code probably mixes hardware interrupts and
software interrupts.
%pil is for software interrupts (interrupt_level_n traps).
%pil can not mask hardware interrupts (interrupt_vector traps);
the CPU raises interrupt_vector traps even on %pil=15.
But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
seem to be masked by %pil.

> Signed-off-by: Artyom Tarasenko 
> ---
>  hw/sun4u.c |6 --
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/sun4u.c b/hw/sun4u.c
> index d7dcaf0..7f95aeb 100644
> --- a/hw/sun4u.c
> +++ b/hw/sun4u.c
> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
>  pil |= 1 << 14;
>  }
>  
> -if (!pil) {
> +if (pil < (2 << env->psrpil)){
>  if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>  CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
> env->interrupt_index);
> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
>  break;
>  }
>  }
> -} else {
> +} else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>  CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
> softint=%08x "
> "current interrupt %x\n",
> pil, env->pil_in, env->softint, env->interrupt_index);
> +env->interrupt_index = 0;
> +cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
>  }
>  }
>  
> -- 
> 1.7.3.4



Tsuneo Saito 



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Wed, Jul 27, 2011 at 1:37 PM, Stefan Hajnoczi  wrote:
> =Changes where I am not aware of firm plans=

qcow2 online resize
 * Handle snapshots
 * Support shrinking

qed online resize
 * Support shrinking



[Qemu-devel] [PATCH v3 1/2] The command line support for QEMU disk I/O limits

2011-07-28 Thread Zhi Yong Wu
Signed-off-by: Zhi Yong Wu 
---
 Makefile.objs   |2 +-
 blockdev.c  |   22 ++
 qemu-config.c   |   24 
 qemu-option.c   |   17 +
 qemu-option.h   |1 +
 qemu-options.hx |1 +
 6 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 9f99ed4..06f2033 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,7 +23,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
blk-queue.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/blockdev.c b/blockdev.c
index c263663..aff6bb2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -238,6 +238,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 int on_read_error, on_write_error;
 const char *devaddr;
 DriveInfo *dinfo;
+BlockIOLimit io_limits;
+bool iol_flag = false;
+const char *iol_opts[7] = {"bps", "bps_rd", "bps_wr",
+"iops", "iops_rd", "iops_wr"};
 int is_extboot = 0;
 int snapshot = 0;
 int ret;
@@ -372,6 +376,19 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 return NULL;
 }
 
+/* disk io limits */
+iol_flag = qemu_opt_io_limits_enable_flag(opts, iol_opts);
+if (iol_flag) {
+memset(&io_limits, 0, sizeof(BlockIOLimit));
+
+io_limits.bps[2]  = qemu_opt_get_number(opts, "bps", 0);
+io_limits.bps[0]  = qemu_opt_get_number(opts, "bps_rd", 0);
+io_limits.bps[1]  = qemu_opt_get_number(opts, "bps_wr", 0);
+io_limits.iops[2] = qemu_opt_get_number(opts, "iops", 0);
+io_limits.iops[0] = qemu_opt_get_number(opts, "iops_rd", 0);
+io_limits.iops[1] = qemu_opt_get_number(opts, "iops_wr", 0);
+}
+
 on_write_error = BLOCK_ERR_STOP_ENOSPC;
 if ((buf = qemu_opt_get(opts, "werror")) != NULL) {
 if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != 
IF_NONE) {
@@ -483,6 +500,11 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
 bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error);
 
+/* throttling disk io limits */
+if (iol_flag) {
+bdrv_set_io_limits(dinfo->bdrv, &io_limits);
+}
+
 switch(type) {
 case IF_IDE:
 case IF_SCSI:
diff --git a/qemu-config.c b/qemu-config.c
index efa892c..9232bbb 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -82,6 +82,30 @@ static QemuOptsList qemu_drive_opts = {
 .name = "boot",
 .type = QEMU_OPT_BOOL,
 .help = "make this a boot drive",
+},{
+.name = "iops",
+.type = QEMU_OPT_NUMBER,
+.help = "limit total I/O operations per second",
+},{
+.name = "iops_rd",
+.type = QEMU_OPT_NUMBER,
+.help = "limit read operations per second",
+},{
+.name = "iops_wr",
+.type = QEMU_OPT_NUMBER,
+.help = "limit write operations per second",
+},{
+.name = "bps",
+.type = QEMU_OPT_NUMBER,
+.help = "limit total bytes per second",
+},{
+.name = "bps_rd",
+.type = QEMU_OPT_NUMBER,
+.help = "limit read bytes per second",
+},{
+.name = "bps_wr",
+.type = QEMU_OPT_NUMBER,
+.help = "limit write bytes per second",
 },
 { /* end of list */ }
 },
diff --git a/qemu-option.c b/qemu-option.c
index 65db542..c5aa96a 100644
--- a/qemu-option.c
+++ b/qemu-option.c
@@ -562,6 +562,23 @@ uint64_t qemu_opt_get_number(QemuOpts *opts, const char 
*name, uint64_t defval)
 return opt->value.uint;
 }
 
+bool qemu_opt_io_limits_enable_flag(QemuOpts *opts, const char **iol_opts)
+{
+int i;
+uint64_t opt_val = 0;
+bool iol_flag= false;
+
+for (i = 0; iol_opts[i]; i++) {
+opt_val = qemu_opt_get_number(opts, iol_opts[i], 0);
+if (opt_val != 0) {
+iol_flag = true;
+break;
+}
+}
+
+return iol_flag;
+}
+
 uint64_t qemu_opt_get_size(QemuOpts *opts, const char *name, uint64_t defval)
 {
 QemuOpt *opt = qemu_opt_find(opts, name);
diff --git a/qemu-option.h b/qemu-option.h
index b515813..fc909f9 100644
--- a/qemu-option.h
+++ b/qemu-option.h
@@ -107,6 +107,7 @@ struct QemuOptsList {
 const char *qemu_opt_get(QemuOpts *opts, const char *name);
 int qemu_opt_get_bool(QemuOpts *opts, const char *name, int defval);
 uint64_t qemu_opt_get_number(QemuOpts *opts, const char *name, uint

[Qemu-devel] [PATCH v3 2/2] The implement of block queue, block timer, and I/O throttling algorithm

2011-07-28 Thread Zhi Yong Wu
Note: 1.) When bps/iops limits are specified to a small value such as 511 
bytes/s, this VM will hang up. We are considering how to handle this senario.
  2.) When "dd" command is issued in guest, if its option bs is set to a 
large value such as "bs=1024K", the result speed will slightly bigger than the 
limits.

For these problems, if you have nice thought, pls let us know.:)

Signed-off-by: Zhi Yong Wu 
---
 block.c |  302 +--
 block.h |1 -
 block_int.h |   29 ++
 3 files changed, 323 insertions(+), 9 deletions(-)

diff --git a/block.c b/block.c
index 24a25d5..c4c631b 100644
--- a/block.c
+++ b/block.c
@@ -29,6 +29,9 @@
 #include "module.h"
 #include "qemu-objects.h"
 
+#include "qemu-timer.h"
+#include "block/blk-queue.h"
+
 #ifdef CONFIG_BSD
 #include 
 #include 
@@ -58,6 +61,13 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+bool is_write, double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write,
+double elapsed_time, uint64_t *wait);
+static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors,
+bool is_write, uint64_t *wait);
+
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
@@ -90,6 +100,20 @@ int is_windows_drive(const char *filename)
 }
 #endif
 
+static int bdrv_io_limits_enable(BlockIOLimit *io_limits)
+{
+if ((io_limits->bps[0] == 0)
+ && (io_limits->bps[1] == 0)
+ && (io_limits->bps[2] == 0)
+ && (io_limits->iops[0] == 0)
+ && (io_limits->iops[1] == 0)
+ && (io_limits->iops[2] == 0)) {
+return 0;
+}
+
+return 1;
+}
+
 /* check if the path starts with ":" */
 static int path_has_protocol(const char *path)
 {
@@ -167,6 +191,28 @@ void path_combine(char *dest, int dest_size,
 }
 }
 
+static void bdrv_block_timer(void *opaque)
+{
+BlockDriverState *bs = opaque;
+BlockQueue *queue = bs->block_queue;
+
+while (!QTAILQ_EMPTY(&queue->requests)) {
+BlockIORequest *request;
+int ret;
+
+request = QTAILQ_FIRST(&queue->requests);
+QTAILQ_REMOVE(&queue->requests, request, entry);
+
+ret = qemu_block_queue_handler(request);
+if (ret == 0) {
+QTAILQ_INSERT_HEAD(&queue->requests, request, entry);
+break;
+}
+
+qemu_free(request);
+}
+}
+
 void bdrv_register(BlockDriver *bdrv)
 {
 if (!bdrv->bdrv_aio_readv) {
@@ -642,6 +688,19 @@ int bdrv_open(BlockDriverState *bs, const char *filename, 
int flags,
 bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
 }
 
+/* throttling disk I/O limits */
+if (bdrv_io_limits_enable(&bs->io_limits)) {
+bs->req_from_queue = false;
+bs->block_queue= qemu_new_block_queue();
+bs->block_timer= qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs);
+
+bs->slice_start[0] = qemu_get_clock_ns(vm_clock);
+bs->slice_start[1] = qemu_get_clock_ns(vm_clock);
+
+bs->slice_end[0]   = qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
+bs->slice_end[1]   = qemu_get_clock_ns(vm_clock) + BLOCK_IO_SLICE_TIME;
+}
+
 return 0;
 
 unlink_and_fail:
@@ -680,6 +739,16 @@ void bdrv_close(BlockDriverState *bs)
 if (bs->change_cb)
 bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
 }
+
+/* throttling disk I/O limits */
+if (bs->block_queue) {
+qemu_del_block_queue(bs->block_queue);
+}
+
+if (bs->block_timer) {
+qemu_del_timer(bs->block_timer);
+qemu_free_timer(bs->block_timer);
+}
 }
 
 void bdrv_close_all(void)
@@ -1312,6 +1381,14 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
 *psecs = bs->secs;
 }
 
+/* throttling disk io limits */
+void bdrv_set_io_limits(BlockDriverState *bs,
+BlockIOLimit *io_limits)
+{
+memset(&bs->io_limits, 0, sizeof(BlockIOLimit));
+bs->io_limits = *io_limits;
+}
+
 /* Recognize floppy formats */
 typedef struct FDFormat {
 FDriveType drive;
@@ -2111,6 +2188,165 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
QEMUSnapshotInfo *sn)
 return buf;
 }
 
+static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors,
+ bool is_write, double elapsed_time, uint64_t *wait) {
+uint64_t bps_limit = 0;
+double   bytes_limit, bytes_disp, bytes_res;
+double   slice_time, wait_time;
+
+if (bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL]) {
+bps_limit = bs->io_limits.bps[BLOCK_IO_LIMIT_TOTAL];
+} else if (bs->io_limits.bps[is_write]) {
+bps_limit = bs->io_limits.bps[is_write];
+} else {
+if (wait) {
+

[Qemu-devel] [PATCH v3 0/2] The intro for QEMU disk I/O limits

2011-07-28 Thread Zhi Yong Wu
The main goal of the patch is to effectively cap the disk I/O speed or counts 
of one single VM.It is only one draft, so it unavoidably has some drawbacks, if 
you catch them, please let me know.

The patch will mainly introduce one block I/O throttling algorithm, one timer 
and one block queue for each I/O limits enabled drive.

When a block request is coming in, the throttling algorithm will check if its 
I/O rate or counts exceed the limits; if yes, then it will enqueue to the block 
queue; The timer will periodically handle the I/O requests in it.

Some available features follow as below:
(1) global bps limit.
-drive bps=xxxin bytes/s
(2) only read bps limit
-drive bps_rd=xxx in bytes/s
(3) only write bps limit
-drive bps_wr=xxx in bytes/s
(4) global iops limit
-drive iops=xxx   in ios/s
(5) only read iops limit
-drive iops_rd=xxxin ios/s
(6) only write iops limit
-drive iops_wr=xxxin ios/s
(7) the combination of some limits.
-drive bps=xxx,iops=xxx

Known Limitations:
(1) #1 can not coexist with #2, #3
(2) #4 can not coexist with #5, #6
(3) When bps/iops limits are specified to a small value such as 511 bytes/s, 
this VM will hang up. We are considering how to handle this senario.

Zhi Yong Wu (2):
  v3: Added the code for extending slice time, and modified the method to 
compute wait time for the timer.
  The command line support for QEMU disk I/O limits
  The implement of block queue, block timer, and I/O throttling
algorithm
  v2: The codes V2 for QEMU disk I/O limits.
  Modified the codes mainly based on stefan's comments.

  v1: Submit the codes for QEMU disk I/O limits.
  Only a code draft.

 Makefile.objs   |2 +-
 block.c |  302 +--
 block.h |1 -
 block_int.h |   29 ++
 blockdev.c  |   22 
 qemu-config.c   |   24 +
 qemu-option.c   |   17 +++
 qemu-option.h   |1 +
 qemu-options.hx |1 +
 9 files changed, 389 insertions(+), 10 deletions(-)

-- 
1.7.2.3




Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug

2011-07-28 Thread Michael S. Tsirkin
On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote:
> I hate the virtio pointer thicket.

The problem is that each device is both a virtio pci
device and a virtio net device.

-- 
MST



Re: [Qemu-devel] [PATCH] add QEMU_LD_PREFIX environment variable

2011-07-28 Thread Johannes Schauer
Hi,

On Thu, Jul 28, 2011 at 11:41:09AM +0300, Riku Voipio wrote:
> On Sat, Jul 23, 2011 at 07:47:49AM +0200, josch wrote:
> > This could be avoided by setting the proposed environment variable 
> > QEMU_LD_PREFIX to the just
> > created debian rootfs. As mentioned earlier, the usage of the -L option
> > is not possible in this scenario because qemu-user is only implicitly
> > called by the binfmt mechanism.
> 
> What worries me here is that we are beginning to add a enviroment
> variable for each and every command line option of qemu linux-user.

Are there other environment variables? I didnt see any? (well besides
QEMU_STRACE that is)

> I think it would be better to have a wrapper binary to be registered
> as the binfmt runner.

If you need help with writing something - dont hesitate to ask for help.
I'm very interested in having this functionality working because of the
reasons I gave in my initial mail. [1]

> Alternatively we should have a generic setup for mapping enviroment
> variables to command line options. Now we get special per-option code
> every time someone needs to setup a command line option from binfmt.

What other options are there that would be interesting for binfmt?

I was also sending this patch to qemu-devel list a month ago but got no
reply. [2]

As I said - if you would like a wrapper or a generic setup for mapping
env variables to commandline parameters and dont have time to do it
yourself, dont hesitate to delegate some work to me :)


@Geert Stappers:

you are patching bsd-user/main.c and darwin-user/main.c as well. I take
it that you did test your changes on those platforms? does it work there
as well? I have no clue of darwin but is it really useful there?

cheers, josch

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=632192#5
[2] http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg00459.html



Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread Artyom Tarasenko
Hi,
On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
> Hi,
>
> At Mon, 25 Jul 2011 19:22:38 +0200,
> Artyom Tarasenko wrote:
>
>> clear interrupt request if the interrupt priority < CPU pil
>> clear hardware interrupt request if interrupts are disabled
>
> Not directly related to the fix, but I'd like to note a problem
> of hw/sun4u.c interrupt code:
>
> The interrupt code probably mixes hardware interrupts and
> software interrupts.
> %pil is for software interrupts (interrupt_level_n traps).
> %pil can not mask hardware interrupts (interrupt_vector traps);
> the CPU raises interrupt_vector traps even on %pil=15.
> But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
> seem to be masked by %pil.

The interrupt_vector traps are currently not implemented, are they?
So it's hard to tell whether they are masked.

>> Signed-off-by: Artyom Tarasenko 
>> ---
>>  hw/sun4u.c |    6 --
>>  1 files changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/sun4u.c b/hw/sun4u.c
>> index d7dcaf0..7f95aeb 100644
>> --- a/hw/sun4u.c
>> +++ b/hw/sun4u.c
>> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
>>          pil |= 1 << 14;
>>      }
>>
>> -    if (!pil) {
>> +    if (pil < (2 << env->psrpil)){
>>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
>>                             env->interrupt_index);
>> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
>>                  break;
>>              }
>>          }
>> -    } else {
>> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
>> softint=%08x "
>>                         "current interrupt %x\n",
>>                         pil, env->pil_in, env->softint, 
>> env->interrupt_index);
>> +        env->interrupt_index = 0;
>> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
>>      }
>>  }
>>
>> --
>> 1.7.3.4
>
>
> 
> Tsuneo Saito 
>



-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/



[Qemu-devel] [PATCH] libcacard: use INSTALL_DATA for data

2011-07-28 Thread Alon Levy
Signed-off-by: Alon Levy 
---
 libcacard/Makefile |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/libcacard/Makefile b/libcacard/Makefile
index 15205b5..33c1302 100644
--- a/libcacard/Makefile
+++ b/libcacard/Makefile
@@ -56,9 +56,8 @@ install-libcacard: libcacard.pc libcacard.la vscclient
$(INSTALL_DIR) "$(DESTDIR)$(bindir)"
libtool --mode=install $(INSTALL_PROG) vscclient "$(DESTDIR)$(bindir)"
libtool --mode=install $(INSTALL_PROG) libcacard.la 
"$(DESTDIR)$(libdir)"
-   libtool --mode=install $(INSTALL_PROG) libcacard.pc 
"$(DESTDIR)$(libdir)/pkgconfig"
+   libtool --mode=install $(INSTALL_DATA) libcacard.pc 
"$(DESTDIR)$(libdir)/pkgconfig"
for inc in *.h; do \
-   libtool --mode=install $(INSTALL_PROG) 
$(libcacard_srcpath)/$$inc "$(DESTDIR)$(libcacard_includedir)"; \
+   libtool --mode=install $(INSTALL_DATA) 
$(libcacard_srcpath)/$$inc "$(DESTDIR)$(libcacard_includedir)"; \
done
-
 endif
-- 
1.7.6




Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread tsnsaito
At Thu, 28 Jul 2011 13:51:08 +0200,
Artyom Tarasenko wrote:
> On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
> > Hi,
> >
> > At Mon, 25 Jul 2011 19:22:38 +0200,
> > Artyom Tarasenko wrote:
> >
> >> clear interrupt request if the interrupt priority < CPU pil
> >> clear hardware interrupt request if interrupts are disabled
> >
> > Not directly related to the fix, but I'd like to note a problem
> > of hw/sun4u.c interrupt code:
> >
> > The interrupt code probably mixes hardware interrupts and
> > software interrupts.
> > %pil is for software interrupts (interrupt_level_n traps).
> > %pil can not mask hardware interrupts (interrupt_vector traps);
> > the CPU raises interrupt_vector traps even on %pil=15.
> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
> > seem to be masked by %pil.
> 
> The interrupt_vector traps are currently not implemented, are they?
> So it's hard to tell whether they are masked.

Yes, interrupt_vector is not implemented yet.
I failed to explain the problem.
The problem is that cpu_set_irqs() should raise interrupt_vector
traps but it raises interrupt_level_n traps actually.

sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
the 1st argument.  The allocated irqs (the irq variable) are
passed to pci_apb_init().  APB should generate interrupt_vector
traps (hardware interrupts), not the interrupt_vector_n traps.
The interrupts from APB would be reported by cpu_set_irq(),
but cpu_set_irq() seems to generate interrupt_vector_n traps.

> >> Signed-off-by: Artyom Tarasenko 
> >> ---
> >>  hw/sun4u.c |    6 --
> >>  1 files changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/sun4u.c b/hw/sun4u.c
> >> index d7dcaf0..7f95aeb 100644
> >> --- a/hw/sun4u.c
> >> +++ b/hw/sun4u.c
> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
> >>          pil |= 1 << 14;
> >>      }
> >>
> >> -    if (!pil) {
> >> +    if (pil < (2 << env->psrpil)){
> >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
> >>                             env->interrupt_index);
> >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
> >>                  break;
> >>              }
> >>          }
> >> -    } else {
> >> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> >>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
> >> softint=%08x "
> >>                         "current interrupt %x\n",
> >>                         pil, env->pil_in, env->softint, 
> >> env->interrupt_index);
> >> +        env->interrupt_index = 0;
> >> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
> >>      }
> >>  }
> >>
> >> --
> >> 1.7.3.4
> >
> >
> > 
> > Tsuneo Saito 
> >
> 
> 
> 
> -- 
> Regards,
> Artyom Tarasenko
> 
> solaris/sparc under qemu blog: http://tyom.blogspot.com/

 
Tsuneo Saito 



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
> Another item that just came up on IRC again: Allow guests to toggle WCE,
> so that we finally can get rid of the cache=writethrough default. We
> should definitely get this done before 1.0.

Guest toggling is nice and cool, but the real feature is to allow it on
the command line.  As mentioned about 10 times before guests generally don't
fiddle with it without the administrator triggering it manually.  So
specifying it in the qemu config is the much better interface for 95% of
the use cases.




Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
> Coroutines in the block layer [Kevin]
>  * Programming model to simplify block drivers without blocking QEMU threads

Can anyone explain what the whole point of this is?  It really just is
a bit of syntactic sugar for the current async state machines.  What does
it buy us over going for real threading?




Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread Tsuneo Saito
2011/7/28 :
>
> At Thu, 28 Jul 2011 13:51:08 +0200,
> Artyom Tarasenko wrote:
> > On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
> > > Hi,
> > >
> > > At Mon, 25 Jul 2011 19:22:38 +0200,
> > > Artyom Tarasenko wrote:
> > >
> > >> clear interrupt request if the interrupt priority < CPU pil
> > >> clear hardware interrupt request if interrupts are disabled
> > >
> > > Not directly related to the fix, but I'd like to note a problem
> > > of hw/sun4u.c interrupt code:
> > >
> > > The interrupt code probably mixes hardware interrupts and
> > > software interrupts.
> > > %pil is for software interrupts (interrupt_level_n traps).
> > > %pil can not mask hardware interrupts (interrupt_vector traps);
> > > the CPU raises interrupt_vector traps even on %pil=15.
> > > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
> > > seem to be masked by %pil.
> >
> > The interrupt_vector traps are currently not implemented, are they?
> > So it's hard to tell whether they are masked.
>
> Yes, interrupt_vector is not implemented yet.

Ah, I intended to write:
  No, interrupt_vector is not implemented yet.
I'm very sorry for my engrish :-(

> I failed to explain the problem.
> The problem is that cpu_set_irqs() should raise interrupt_vector
> traps but it raises interrupt_level_n traps actually.
>
> sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
> the 1st argument.  The allocated irqs (the irq variable) are
> passed to pci_apb_init().  APB should generate interrupt_vector
> traps (hardware interrupts), not the interrupt_vector_n traps.
> The interrupts from APB would be reported by cpu_set_irq(),
> but cpu_set_irq() seems to generate interrupt_vector_n traps.
>
> > >> Signed-off-by: Artyom Tarasenko 
> > >> ---
> > >>  hw/sun4u.c |    6 --
> > >>  1 files changed, 4 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/hw/sun4u.c b/hw/sun4u.c
> > >> index d7dcaf0..7f95aeb 100644
> > >> --- a/hw/sun4u.c
> > >> +++ b/hw/sun4u.c
> > >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
> > >>          pil |= 1 << 14;
> > >>      }
> > >>
> > >> -    if (!pil) {
> > >> +    if (pil < (2 << env->psrpil)){
> > >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> > >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
> > >>                             env->interrupt_index);
> > >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
> > >>                  break;
> > >>              }
> > >>          }
> > >> -    } else {
> > >> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> > >>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
> > >> softint=%08x "
> > >>                         "current interrupt %x\n",
> > >>                         pil, env->pil_in, env->softint, 
> > >> env->interrupt_index);
> > >> +        env->interrupt_index = 0;
> > >> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
> > >>      }
> > >>  }
> > >>
> > >> --
> > >> 1.7.3.4
> > >
> > >
> > > 
> > > Tsuneo Saito 
> > >
> >
> >
> >
> > --
> > Regards,
> > Artyom Tarasenko
> >
> > solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
> 
> Tsuneo Saito 



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Frediano Ziglio
2011/7/28 Christoph Hellwig :
> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>> Coroutines in the block layer [Kevin]
>>  * Programming model to simplify block drivers without blocking QEMU threads
>
> Can anyone explain what the whole point of this is?  It really just is
> a bit of syntactic sugar for the current async state machines.  What does
> it buy us over going for real threading?
>

This has nothing (or few) to do with threads. Instead of splitting
functions in callbacks at every synchronous function it allow to write
more readable code. For instance after changing qcow code from current
to coroutine you remove about 400 lines (about 30%). This will help
maintaining code and develop more complicated optimizations.

About threading you can do threading using AIO and using coroutines.

Frediano



Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 09:47:05AM +0200, Kevin Wolf wrote:
> > Indeed.  This has come up a few times, and actually is a mostly trivial
> > task.  Maybe we should give up waiting for -blockdev and separate cache
> > mode settings and allow a nocache-writethrough or similar mode now?  It's
> > going to be around 10 lines of code + documentation.
> 
> I understand that there may be reasons for using O_DIRECT | O_DSYNC, but
> what is the explanation for O_DSYNC improving performance?

There isn't any, at least for modern Linux.  O_DSYNC at this point is
equivalent to a range fdatasync for each write call, and given that we're
doing O_DIRECT the ranges flush doesn't matter.  If you do have a modern
host and an old guest it might end up beeing faster because the barrier
implementation in Linux used to suck so badly, but that's not inhrent
to the I/O model.  If you guest however doesn't support cache flushes
at all O_DIRECT | O_DSYNC is the only sane model to use for local filesystems
and block devices.
 

> Christoph, on another note: Can we rely on Linux AIO never returning
> short writes except on EOF? Currently we return -EINVAL in this case, so
> I hope it's true or we wouldn't return the correct error code.

More or less.  There's one corner case for all Linux I/O, and that is
only writes up to INT_MAX are supported, and larger writes (and reads)
get truncated to it.  It's pretty nasty, but Linux has been vocally
opposed to fixing this issue.




Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Christoph Hellwig
On Thu, Jul 28, 2011 at 02:15:38PM +0200, Frediano Ziglio wrote:
> 
> This has nothing (or few) to do with threads. Instead of splitting
> functions in callbacks at every synchronous function it allow to write
> more readable code.

Thanks for repeating my statement that it doesn't fix the real thing
but just adds syntactic sugar.  




Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug

2011-07-28 Thread Markus Armbruster
"Michael S. Tsirkin"  writes:

> On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote:
>> I hate the virtio pointer thicket.
>
> The problem is that each device is both a virtio pci
> device and a virtio net device.

In my possibly naive opinion, virtio-FOO-pci is a PCI device, and has a
common virtio part.



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:08, schrieb Christoph Hellwig:
> On Thu, Jul 28, 2011 at 12:05:43PM +0200, Kevin Wolf wrote:
>> Another item that just came up on IRC again: Allow guests to toggle WCE,
>> so that we finally can get rid of the cache=writethrough default. We
>> should definitely get this done before 1.0.
> 
> Guest toggling is nice and cool, but the real feature is to allow it on
> the command line.  As mentioned about 10 times before guests generally don't
> fiddle with it without the administrator triggering it manually.  So
> specifying it in the qemu config is the much better interface for 95% of
> the use cases.

I don't really care about guest toggling. I care about the default cache
mode, and Anthony said that he doesn't agree with changing it unless
guests can toggle it.

Kevin



Re: [Qemu-devel] [PATCH v2 5/5] virtio-balloon: Unregister savevm section on device unplug

2011-07-28 Thread Michael S. Tsirkin
On Thu, Jul 28, 2011 at 02:21:32PM +0200, Markus Armbruster wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Thu, Jul 28, 2011 at 09:45:44AM +0200, Markus Armbruster wrote:
> >> I hate the virtio pointer thicket.
> >
> > The problem is that each device is both a virtio pci
> > device and a virtio net device.
> 
> In my possibly naive opinion, virtio-FOO-pci is a PCI device, and has a
> common virtio part.

Yes, but it also has a comon virtio-pci part with other
virtio-FOO-pci devices, and a common virtio-net
part with virtio-net-.

-- 
MST



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:09, schrieb Christoph Hellwig:
> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>> Coroutines in the block layer [Kevin]
>>  * Programming model to simplify block drivers without blocking QEMU threads
> 
> Can anyone explain what the whole point of this is?  It really just is
> a bit of syntactic sugar for the current async state machines.  What does
> it buy us over going for real threading?

The only current block driver that really does everything in an async
state machine is qed. It's definitely not nice code, and having to
convert all of the other block drivers to this would be a lot of work.

So if it only means that we're making things async that would block the
VCPU until now, isn't that a great improvement already?

The advantage compared to threading is that it allows an easy and
incremental transition.

Kevin



Re: [Qemu-devel] [PATCH 1/2] linux aio: support flush operation

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:15, schrieb Christoph Hellwig:
>> Christoph, on another note: Can we rely on Linux AIO never returning
>> short writes except on EOF? Currently we return -EINVAL in this case, so

"short reads" I meant, of course.

>> I hope it's true or we wouldn't return the correct error code.
> 
> More or less.  There's one corner case for all Linux I/O, and that is
> only writes up to INT_MAX are supported, and larger writes (and reads)
> get truncated to it.  It's pretty nasty, but Linux has been vocally
> opposed to fixing this issue.

I think we can safely ignore this. So just replacing the current
ret = -EINVAL; by a memset(buf + ret, 0, len - ret); ret = 0; should be
okay, right? (Of course using the qiov versions, but you get the idea)

Kevin



Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model

2011-07-28 Thread Anthony Liguori

On 07/28/2011 02:36 AM, Paolo Bonzini wrote:

On 07/27/2011 10:01 PM, Anthony Liguori wrote:




That's milkymist, not GoldFish.


Oh, Goldfish is fake. It's not real hardware.

The enumerator device is not a real device. It's weird because it's
imaginary and was designed specifically within QEMU.

It's not a good example for discussing modelling.


There are plenty of PV interfaces implemented by QEMU. Would you say the
same of virtio?


Virtio was designed to look like real hardware.

I would say that trying to fit XenStore into QOM would not be a good 
exercise.


Regards,

Anthony Liguori



Paolo






Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Anthony Liguori

On 07/28/2011 05:13 AM, Supriya Kannery wrote:

On 07/27/2011 09:32 PM, Anthony Liguori wrote:

On 07/27/2011 09:31 AM, Stefan Hajnoczi wrote:

On Wed, Jul 27, 2011 at 1:58 PM, Anthony
Liguori wrote:

Index: qemu/hmp-commands.hx
===
--- qemu.orig/hmp-commands.hx
+++ qemu/hmp-commands.hx
@@ -70,6 +70,20 @@ but should be used with extreme caution.
resizes image files, it can not resize block devices like LVM volumes.
ETEXI

+ {
+ .name = "block_set",
+ .args_type = "device:B,device:O",
+ .params = "device [prop=value][,...]",
+ .help = "Change block device parameters
[hostcache=on/off]",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd_new = do_block_set,
+ },
+STEXI
+@item block_set @var{config}
+@findex block_set
+Change block device parameters (eg: hostcache=on/off) while guest is
running.
+ETEXI
+


block_set_hostcache() please.

Multiplexing commands is generally a bad idea. It weakens typing. In
the
absence of a generic way to set block device properties, implementing
properties as generic in the QMP layer seems like a bad idea to me.


The idea behind block_set was to have a unified interface for changing
block device parameters at runtime. This prevents us from reinventing
new commands from scratch. For example, block I/O throttling is
already queued up to add run-time parameters.

Without a unified command we have a bulkier QMP/HMP interface,
duplicated code, and possibly inconsistencies in syntax between the
commands. Isn't the best way to avoid these problems a unified
interface?

I understand the lack of type safety concern but in this case we
already have to manually pull parsed arguments (i.e. cast to specific
types and deal with invalid input). To me this is a reason *for*
using a unified interface like block_set.


Think about it from a client perspective. How do I determine which
properties are supported by this version of QEMU? I have no way to
identify programmatically what arguments are valid for block_set.



User can programmatically find out valid parameters for
block_set. Internally, validation of prop=value is done against -drive
parameter list and then, only the valid/implemented commands (for now,
hostcache) are accepted from that list. Please find execution output
attached:

(qemu) info block
ide0-hd0: removable=0 hostcache=1 file=../rhel6-32.qcow2 ro=0 drv=qcow2
encrypted=0
floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw
encrypted=0
ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted]
sd0: removable=1 locked=0 hostcache=1 [not inserted]
(qemu) block
block_resize block_set block_passwd
(qemu) block_set


That's HMP btw, not QMP.


block_set: block device name expected
(qemu) block
block_resize block_set block_passwd
(qemu) help block_set
block_set device [prop=value][,...] -- Change block device parameters
[hostcache=on/off]


Parsing help text is not introspection!

Regards,

Anthony Liguori


(qemu) block_set ide
Device 'ide' not found
(qemu) block_set ide0-hd0
Usage: block_set device [prop=value][,...]
(qemu) block_set ide0-hd0 cache
Invalid parameter 'cache'
(qemu) block_set ide0-hd0 cache=on
Parameter 'hostcache' expects on/off
(qemu) block_set ide0-hd0 hostcache=on
(qemu) block_set ide0-hd0 hostcache=off
(qemu) info block
ide0-hd0: removable=0 hostcache=0 file=../rhel6-32.qcow2 ro=0 drv=qcow2
encrypted=0
floppy0: removable=1 locked=0 hostcache=0 file=test.img ro=0 drv=raw
encrypted=0
ide1-cd0: removable=1 locked=0 hostcache=1 [not inserted]
sd0: removable=1 locked=0 hostcache=1 [not inserted]


When we add further parameters, "usage" string can be enhanced to
include those parameters for informing user.

- Thanks, Supriya


OTOH, if you have strong types like block_set_hostcache, query-commands
tells me exactly what's supported.

Regards,

Anthony Liguori



Stefan












Re: [Qemu-devel] [PATCH v2] pci: Common overflow prevention

2011-07-28 Thread Isaku Yamahata
On Thu, Jul 28, 2011 at 11:40:21AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jul 28, 2011 at 04:23:24PM +0900, Isaku Yamahata wrote:
> > This might be a bit late comment...
> > 
> > On Fri, Jul 22, 2011 at 11:05:01AM +0200, Jan Kiszka wrote:
> > > diff --git a/hw/pci_host.c b/hw/pci_host.c
> > > index 728e2d4..bfdc321 100644
> > > --- a/hw/pci_host.c
> > > +++ b/hw/pci_host.c
> > > @@ -47,17 +47,33 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus 
> > > *bus, uint32_t addr)
> > >  return pci_find_device(bus, bus_num, devfn);
> > >  }
> > >  
> > > +void pci_config_write_common(PCIDevice *pci_dev, uint32_t addr,
> > > + uint32_t limit, uint32_t val, uint32_t len)
> > > +{
> > > +assert(len <= 4);
> > > +pci_dev->config_write(pci_dev, addr, val, MIN(len, limit - addr));
> > > +}
> > > +
> > > +uint32_t pci_config_read_common(PCIDevice *pci_dev, uint32_t addr,
> > > +uint32_t limit, uint32_t len)
> > > +{
> > > +assert(len <= 4);
> > > +return pci_dev->config_read(pci_dev, addr, MIN(len, limit - addr));
> > > +}
> > > +
> > 
> > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < 
> > addr.
> > So we need explicit "if (limit < addr) return;".
> > Here's the patch for pci branch.
> > 
> > >From 75c1a2b47c93ad987cd7a37fb62bda9a59f27948 Mon Sep 17 00:00:00 2001
> > Message-Id: 
> > <75c1a2b47c93ad987cd7a37fb62bda9a59f27948.1311837763.git.yamah...@valinux.co.jp>
> > From: Isaku Yamahata 
> > Date: Thu, 28 Jul 2011 16:20:28 +0900
> > Subject: [PATCH] pci/host: limit check of pci_host_config_read/write_common
> > 
> > This patch adds boundary check in pci_host_config_read/write_common()
> > Since limit and addr is unsigned, MIN(len, limit - addr) = len if limit < 
> > addr.
> > So we need explicit "if (limit <= addr) return;"
> > 
> > Signed-off-by: Isaku Yamahata 
> 
> I don't see a problem with this, but could you please clarify when does
> this happen? I think this is only possible for a pci device
> behind an express root. If so, this belongs in pcie_host.c

Right. I'll move the check into pcie_host.c

-- 
yamahata



Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread Artyom Tarasenko
On Thu, Jul 28, 2011 at 2:03 PM,   wrote:
> At Thu, 28 Jul 2011 13:51:08 +0200,
> Artyom Tarasenko wrote:
>> On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
>> > Hi,
>> >
>> > At Mon, 25 Jul 2011 19:22:38 +0200,
>> > Artyom Tarasenko wrote:
>> >
>> >> clear interrupt request if the interrupt priority < CPU pil
>> >> clear hardware interrupt request if interrupts are disabled
>> >
>> > Not directly related to the fix, but I'd like to note a problem
>> > of hw/sun4u.c interrupt code:
>> >
>> > The interrupt code probably mixes hardware interrupts and
>> > software interrupts.
>> > %pil is for software interrupts (interrupt_level_n traps).
>> > %pil can not mask hardware interrupts (interrupt_vector traps);
>> > the CPU raises interrupt_vector traps even on %pil=15.
>> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
>> > seem to be masked by %pil.
>>
>> The interrupt_vector traps are currently not implemented, are they?
>> So it's hard to tell whether they are masked.
>
> Yes, interrupt_vector is not implemented yet.
> I failed to explain the problem.
> The problem is that cpu_set_irqs() should raise interrupt_vector
> traps but it raises interrupt_level_n traps actually.
> sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
> the 1st argument.  The allocated irqs (the irq variable) are
> passed to pci_apb_init().  APB should generate interrupt_vector
> traps (hardware interrupts), not the interrupt_vector_n traps.

Yes, this is true. But it's more complicated than this: cpu_check_irqs
also checks tick/stick/hstick interrupts. They should produce the
interrupt_level_n traps as they currently do.

The patch merely fixes the problem of hanging on a interrupt_vector_n
trap if the trap handler uses pil for interrupt masking. The problem
exists independently from interrupt_vector trap generation (as you
pointed out).

Do you have objections to this patch in its current form?

> The interrupts from APB would be reported by cpu_set_irq(),
> but cpu_set_irq() seems to generate interrupt_vector_n traps.

For me it's not obvious. The interrupt vector not just one line, but
the vector, which is written in the corresponding CPU register (also
missing in the current qemu implementation). On the real hardware the
vector is created by the IOMMU (PBM/APB/...). If qemu intends to
support multiple chipsets, we should keep it the way it's done on the
real hardware (for instance the interrupt vectors for on-board devices
on Ultra-1 and E6500 are not the same).

I'd suggest APB shall use some other interface for communicating
interrupts to the CPU. Something like
cpu_receive_ivec(interrupt_vector).

>> >> Signed-off-by: Artyom Tarasenko 
>> >> ---
>> >>  hw/sun4u.c |    6 --
>> >>  1 files changed, 4 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/hw/sun4u.c b/hw/sun4u.c
>> >> index d7dcaf0..7f95aeb 100644
>> >> --- a/hw/sun4u.c
>> >> +++ b/hw/sun4u.c
>> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
>> >>          pil |= 1 << 14;
>> >>      }
>> >>
>> >> -    if (!pil) {
>> >> +    if (pil < (2 << env->psrpil)){
>> >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>> >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
>> >>                             env->interrupt_index);
>> >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
>> >>                  break;
>> >>              }
>> >>          }
>> >> -    } else {
>> >> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>> >>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
>> >> softint=%08x "
>> >>                         "current interrupt %x\n",
>> >>                         pil, env->pil_in, env->softint, 
>> >> env->interrupt_index);
>> >> +        env->interrupt_index = 0;
>> >> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
>> >>      }
>> >>  }
>> >>
>> >> --
>> >> 1.7.3.4
>> >
>> >
>> > 
>> > Tsuneo Saito 
>> >
>>
>>
>>
>> --
>> Regards,
>> Artyom Tarasenko
>>
>> solaris/sparc under qemu blog: http://tyom.blogspot.com/
>
> 
> Tsuneo Saito 
>

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Anthony Liguori

On 07/28/2011 07:09 AM, Christoph Hellwig wrote:

On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:

Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU threads


Can anyone explain what the whole point of this is?  It really just is
a bit of syntactic sugar for the current async state machines.  What does
it buy us over going for real threading?


It is threading--just with a common locking model where a single big 
lock is held to make up for the fact that most of QEMU isn't reentrant.


By restructuring the code to be threaded, we can incrementally remove 
the big lock if we audit for use of non-reentrant functions and 
introduce more granular locking.


The whole ucontext/setjmp thing is just an optimization.  I would hope 
it entirely disappears long term as we promote coroutines to full threads.


Regards,

Anthony Liguori









Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf  wrote:
> Am 28.07.2011 14:09, schrieb Christoph Hellwig:
>> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
>>> Coroutines in the block layer [Kevin]
>>>  * Programming model to simplify block drivers without blocking QEMU threads
>>
>> Can anyone explain what the whole point of this is?  It really just is
>> a bit of syntactic sugar for the current async state machines.  What does
>> it buy us over going for real threading?
>
> The only current block driver that really does everything in an async
> state machine is qed. It's definitely not nice code, and having to
> convert all of the other block drivers to this would be a lot of work.

Thanks Kevin :).  I do agree with the clumsiness of async callback
programming - a lot of code is spent bundling and unbundling variables
to pass between functions, not to mention that the control flow is
much harder to follow.

> So if it only means that we're making things async that would block the
> VCPU until now, isn't that a great improvement already?
>
> The advantage compared to threading is that it allows an easy and
> incremental transition.

Also remember that we already have a threads-based implementation of
coroutines.  Later on we can do fine-grained locking and switch to
threads directly instead of coroutines, if need be.

Stefan



Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread Artyom Tarasenko
On Thu, Jul 28, 2011 at 2:10 PM, Tsuneo Saito  wrote:
> 2011/7/28 :
>>
>> At Thu, 28 Jul 2011 13:51:08 +0200,
>> Artyom Tarasenko wrote:
>> > On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
>> > > Hi,
>> > >
>> > > At Mon, 25 Jul 2011 19:22:38 +0200,
>> > > Artyom Tarasenko wrote:
>> > >
>> > >> clear interrupt request if the interrupt priority < CPU pil
>> > >> clear hardware interrupt request if interrupts are disabled
>> > >
>> > > Not directly related to the fix, but I'd like to note a problem
>> > > of hw/sun4u.c interrupt code:
>> > >
>> > > The interrupt code probably mixes hardware interrupts and
>> > > software interrupts.
>> > > %pil is for software interrupts (interrupt_level_n traps).
>> > > %pil can not mask hardware interrupts (interrupt_vector traps);
>> > > the CPU raises interrupt_vector traps even on %pil=15.
>> > > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
>> > > seem to be masked by %pil.
>> >
>> > The interrupt_vector traps are currently not implemented, are they?
>> > So it's hard to tell whether they are masked.
>>
>> Yes, interrupt_vector is not implemented yet.
>
> Ah, I intended to write:
>  No, interrupt_vector is not implemented yet.
> I'm very sorry for my engrish :-(

Not having English as my native language, I even haven't noticed
something wrong here.

Despite what they say the International language is not English, but
broken English.  :)

>> I failed to explain the problem.
>> The problem is that cpu_set_irqs() should raise interrupt_vector
>> traps but it raises interrupt_level_n traps actually.
>>
>> sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
>> the 1st argument.  The allocated irqs (the irq variable) are
>> passed to pci_apb_init().  APB should generate interrupt_vector
>> traps (hardware interrupts), not the interrupt_vector_n traps.
>> The interrupts from APB would be reported by cpu_set_irq(),
>> but cpu_set_irq() seems to generate interrupt_vector_n traps.
>>
>> > >> Signed-off-by: Artyom Tarasenko 
>> > >> ---
>> > >>  hw/sun4u.c |    6 --
>> > >>  1 files changed, 4 insertions(+), 2 deletions(-)
>> > >>
>> > >> diff --git a/hw/sun4u.c b/hw/sun4u.c
>> > >> index d7dcaf0..7f95aeb 100644
>> > >> --- a/hw/sun4u.c
>> > >> +++ b/hw/sun4u.c
>> > >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
>> > >>          pil |= 1 << 14;
>> > >>      }
>> > >>
>> > >> -    if (!pil) {
>> > >> +    if (pil < (2 << env->psrpil)){
>> > >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>> > >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
>> > >>                             env->interrupt_index);
>> > >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
>> > >>                  break;
>> > >>              }
>> > >>          }
>> > >> -    } else {
>> > >> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>> > >>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
>> > >> softint=%08x "
>> > >>                         "current interrupt %x\n",
>> > >>                         pil, env->pil_in, env->softint, 
>> > >> env->interrupt_index);
>> > >> +        env->interrupt_index = 0;
>> > >> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
>> > >>      }
>> > >>  }
>> > >>
>> > >> --
>> > >> 1.7.3.4
>> > >
>> > >
>> > > 
>> > > Tsuneo Saito 
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Artyom Tarasenko
>> >
>> > solaris/sparc under qemu blog: http://tyom.blogspot.com/
>>
>> 
>> Tsuneo Saito 
>



-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 14:54, schrieb Stefan Hajnoczi:
> On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf  wrote:
>> Am 28.07.2011 14:09, schrieb Christoph Hellwig:
>>> On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
 Coroutines in the block layer [Kevin]
  * Programming model to simplify block drivers without blocking QEMU 
 threads
>>>
>>> Can anyone explain what the whole point of this is?  It really just is
>>> a bit of syntactic sugar for the current async state machines.  What does
>>> it buy us over going for real threading?
>>
>> The only current block driver that really does everything in an async
>> state machine is qed. It's definitely not nice code, and having to
>> convert all of the other block drivers to this would be a lot of work.
> 
> Thanks Kevin :). 

I certainly didn't mean to attack your code or even yourself. It's not
that qed is done particularly bad or anything. That the code isn't
really nice is just the natural result of the callback-based programming
model.

>> So if it only means that we're making things async that would block the
>> VCPU until now, isn't that a great improvement already?
>>
>> The advantage compared to threading is that it allows an easy and
>> incremental transition.
> 
> Also remember that we already have a threads-based implementation of
> coroutines.  Later on we can do fine-grained locking and switch to
> threads directly instead of coroutines, if need be.

Might be an option for the future, but for now there's enough left to do
to take real advantage of coroutines.

Kevin



Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 11:23 AM, Kevin Wolf  wrote:
> Am 27.07.2011 16:51, schrieb Stefan Hajnoczi:
>> 2011/7/27 Michael Tokarev :
>>> 27.07.2011 15:30, Supriya Kannery wrote:
 New command "block_set" added for dynamically changing any of the block
 device parameters. For now, dynamic setting of hostcache params using this
 command is implemented. Other block device parameter changes, can be
 integrated in similar lines.

 Signed-off-by: Supriya Kannery 

 ---
  block.c         |   54 +
  block.h         |    2 +
  blockdev.c      |   61 
 
  blockdev.h      |    1
  hmp-commands.hx |   14 
  qemu-config.c   |   13 +++
  qemu-option.c   |   25 ++
  qemu-option.h   |    2 +
  qmp-commands.hx |   28 +
  9 files changed, 200 insertions(+)

 Index: qemu/block.c
 ===
 --- qemu.orig/block.c
 +++ qemu/block.c
 @@ -651,6 +651,34 @@ unlink_and_fail:
      return ret;
  }

 +int bdrv_reopen(BlockDriverState *bs, int bdrv_flags)
 +{
 +    BlockDriver *drv = bs->drv;
 +    int ret = 0, open_flags;
 +
 +    /* Quiesce IO for the given block device */
 +    qemu_aio_flush();
 +    if (bdrv_flush(bs)) {
 +        qerror_report(QERR_DATA_SYNC_FAILED, bs->device_name);
 +        return ret;
 +    }
 +    open_flags = bs->open_flags;
 +    bdrv_close(bs);
 +
 +    ret = bdrv_open(bs, bs->filename, bdrv_flags, drv);
 +    if (ret < 0) {
 +        /* Reopen failed. Try to open with original flags */
 +        qerror_report(QERR_REOPEN_FILE_FAILED, bs->filename);
 +        ret = bdrv_open(bs, bs->filename, open_flags, drv);
 +        if (ret < 0) {
 +            /* Reopen failed with orig and modified flags */
 +            abort();
 +        }
>>>
>>> Can we please avoid this stuff completely?  Just keep the
>>> old device open still, until you're sure new one is ok.
>>>
>>> Or else it will be quite dangerous command in many cases.
>>> For example, after -runas/-chroot, or additional selinux
>>> settings or whatnot.  And in this case, instead of merely
>>> returning an error, we'll see abort().  Boom.
>>
>> Slight complication for image formats that use a dirty bit here.  QED
>> has a dirty bit.  VMDK also specifies one but we don't implement it
>> today.
>>
>> If the image file is dirty then all its metadata will be scanned for
>> consistency when it is opened.  This increases the bdrv_open() time
>> and hence the downtime of the VM.  So it is not ideal to open the
>> image file twice, even though there is no consistency problem.
>
> In general I really like understatements, but opening an image file
> twice isn't only "not ideal", but should be considered verboten.

Point taken.

>> I'll think about this some more, there are a couple of solutions like
>> keeping only the file descriptor around, introducing a flush command
>> that makes sure the file is in a clean state, or changing QED to not
>> do this.
>
> Changing the format drivers doesn't really look like the right solution.
>
> Keeping the fd around looks okay, we can probably achieve this by
> introducing a bdrv_reopen function. It means that we may need to reopen
> the format layer, but it can't really fail.

My concern is that this assumes a single file descriptor.  For vmdk
there may be multiple split files.

I'm almost starting to think bdrv_reopen() should be recursive down
the BlockDriverState tree.

Stefan



Re: [Qemu-devel] Block layer roadmap

2011-07-28 Thread Stefan Hajnoczi
On Thu, Jul 28, 2011 at 2:10 PM, Kevin Wolf  wrote:
> Am 28.07.2011 14:54, schrieb Stefan Hajnoczi:
>> On Thu, Jul 28, 2011 at 1:35 PM, Kevin Wolf  wrote:
>>> Am 28.07.2011 14:09, schrieb Christoph Hellwig:
 On Wed, Jul 27, 2011 at 01:37:31PM +0100, Stefan Hajnoczi wrote:
> Coroutines in the block layer [Kevin]
>  * Programming model to simplify block drivers without blocking QEMU 
> threads

 Can anyone explain what the whole point of this is?  It really just is
 a bit of syntactic sugar for the current async state machines.  What does
 it buy us over going for real threading?
>>>
>>> The only current block driver that really does everything in an async
>>> state machine is qed. It's definitely not nice code, and having to
>>> convert all of the other block drivers to this would be a lot of work.
>>
>> Thanks Kevin :).
>
> I certainly didn't mean to attack your code or even yourself. It's not
> that qed is done particularly bad or anything. That the code isn't
> really nice is just the natural result of the callback-based programming
> model.

No worries, no offence taken :)

Stefan



Re: [Qemu-devel] [V5 Patch 3/4]Qemu: Command "block_set" for dynamic block params change

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 15:10, schrieb Stefan Hajnoczi:
> On Thu, Jul 28, 2011 at 11:23 AM, Kevin Wolf  wrote:
>> Am 27.07.2011 16:51, schrieb Stefan Hajnoczi:
>>> I'll think about this some more, there are a couple of solutions like
>>> keeping only the file descriptor around, introducing a flush command
>>> that makes sure the file is in a clean state, or changing QED to not
>>> do this.
>>
>> Changing the format drivers doesn't really look like the right solution.
>>
>> Keeping the fd around looks okay, we can probably achieve this by
>> introducing a bdrv_reopen function. It means that we may need to reopen
>> the format layer, but it can't really fail.
> 
> My concern is that this assumes a single file descriptor.  For vmdk
> there may be multiple split files.
> 
> I'm almost starting to think bdrv_reopen() should be recursive down
> the BlockDriverState tree.

Yes, VMDK would have to call bdrv_reopen() for each file that it uses.

Kevin



[Qemu-devel] [PATCH RESEND 2/3] Move the xenfb pointer handler to the connected method

2011-07-28 Thread stefano.stabellini
From: John Haxby 

Ensure that we read "request-abs-pointer" after the frontend has written
it.  This means that we will correctly set up an ansolute or relative
pointer handler correctly.

Signed-off-by: John Haxby 
Acked-by: Stefano Stabellini 
---
 hw/xenfb.c |   19 ++-
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/xenfb.c b/hw/xenfb.c
index c001ab9..d964318 100644
--- a/hw/xenfb.c
+++ b/hw/xenfb.c
@@ -356,10 +356,6 @@ static int input_initialise(struct XenDevice *xendev)
 struct XenInput *in = container_of(xendev, struct XenInput, c.xendev);
 int rc;
 
-if (xenstore_read_fe_int(xendev, "request-abs-pointer",
- &in->abs_pointer_wanted) == -1)
-   in->abs_pointer_wanted = 0;
-
 if (!in->c.ds) {
 char *vfb = xenstore_read_str(NULL, "device/vfb");
 if (vfb == NULL) {
@@ -377,10 +373,22 @@ static int input_initialise(struct XenDevice *xendev)
return rc;
 
 qemu_add_kbd_event_handler(xenfb_key_event, in);
+return 0;
+}
+
+static void input_connected(struct XenDevice *xendev)
+{
+struct XenInput *in = container_of(xendev, struct XenInput, c.xendev);
+
+if (xenstore_read_fe_int(xendev, "request-abs-pointer",
+ &in->abs_pointer_wanted) == -1)
+   in->abs_pointer_wanted = 0;
+
+if (in->qmouse)
+   qemu_remove_mouse_event_handler(in->qmouse);
 in->qmouse = qemu_add_mouse_event_handler(xenfb_mouse_event, in,
  in->abs_pointer_wanted,
  "Xen PVFB Mouse");
-return 0;
 }
 
 static void input_disconnect(struct XenDevice *xendev)
@@ -960,6 +968,7 @@ struct XenDevOps xen_kbdmouse_ops = {
 .size   = sizeof(struct XenInput),
 .init   = input_init,
 .initialise = input_initialise,
+.connected  = input_connected,
 .disconnect = input_disconnect,
 .event  = input_event,
 };
-- 
1.7.2.3




[Qemu-devel] [PATCH RESEND 1/3] Introduce a new 'connected' xendev op called when Connected.

2011-07-28 Thread stefano.stabellini
From: John Haxby 

Rename the existing xendev 'connect' op to 'initialised' and introduce
a new 'connected' op.  This new op, if defined, is called when the
backend is connected.  Note that since there is no state transition this
may be called more than once.

Signed-off-by: John Haxby 
Acked-by: Stefano Stabellini 
---
 hw/xen_backend.c |   39 +--
 hw/xen_backend.h |3 ++-
 hw/xen_console.c |4 ++--
 hw/xen_disk.c|2 +-
 hw/xen_nic.c |2 +-
 hw/xenfb.c   |8 
 6 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/hw/xen_backend.c b/hw/xen_backend.c
index d881fa2..c506dfe 100644
--- a/hw/xen_backend.c
+++ b/hw/xen_backend.c
@@ -421,13 +421,13 @@ static int xen_be_try_init(struct XenDevice *xendev)
 }
 
 /*
- * Try to connect xendev.  Depends on the frontend being ready
+ * Try to initialise xendev.  Depends on the frontend being ready
  * for it (shared ring and evtchn info in xenstore, state being
  * Initialised or Connected).
  *
  * Goes to Connected on success.
  */
-static int xen_be_try_connect(struct XenDevice *xendev)
+static int xen_be_try_initialise(struct XenDevice *xendev)
 {
 int rc = 0;
 
@@ -441,11 +441,11 @@ static int xen_be_try_connect(struct XenDevice *xendev)
 }
 }
 
-if (xendev->ops->connect) {
-rc = xendev->ops->connect(xendev);
+if (xendev->ops->initialise) {
+rc = xendev->ops->initialise(xendev);
 }
 if (rc != 0) {
-xen_be_printf(xendev, 0, "connect() failed\n");
+xen_be_printf(xendev, 0, "initialise() failed\n");
 return rc;
 }
 
@@ -454,6 +454,28 @@ static int xen_be_try_connect(struct XenDevice *xendev)
 }
 
 /*
+ * Try to let xendev know that it is connected.  Depends on the
+ * frontend being Connected.  Note that this may be called more
+ * than once since the backend state is not modified.
+ */
+static void xen_be_try_connected(struct XenDevice *xendev)
+{
+if (!xendev->ops->connected)
+   return;
+
+if (xendev->fe_state != XenbusStateConnected) {
+   if (xendev->ops->flags & DEVOPS_FLAG_IGNORE_STATE) {
+   xen_be_printf(xendev, 2, "frontend not ready, ignoring\n");
+   } else {
+   xen_be_printf(xendev, 2, "frontend not ready (yet)\n");
+   return;
+   }
+}
+
+xendev->ops->connected(xendev);
+}
+
+/*
  * Teardown connection.
  *
  * Goes to Closed when done.
@@ -508,7 +530,12 @@ void xen_be_check_state(struct XenDevice *xendev)
 rc = xen_be_try_init(xendev);
 break;
 case XenbusStateInitWait:
-rc = xen_be_try_connect(xendev);
+rc = xen_be_try_initialise(xendev);
+break;
+case XenbusStateConnected:
+/* xendev->be_state doesn't change */
+xen_be_try_connected(xendev);
+rc = -1;
 break;
 case XenbusStateClosed:
 rc = xen_be_try_reset(xendev);
diff --git a/hw/xen_backend.h b/hw/xen_backend.h
index 6401c85..3305630 100644
--- a/hw/xen_backend.h
+++ b/hw/xen_backend.h
@@ -21,7 +21,8 @@ struct XenDevOps {
 uint32_t  flags;
 void  (*alloc)(struct XenDevice *xendev);
 int   (*init)(struct XenDevice *xendev);
-int   (*connect)(struct XenDevice *xendev);
+int   (*initialise)(struct XenDevice *xendev);
+void  (*connected)(struct XenDevice *xendev);
 void  (*event)(struct XenDevice *xendev);
 void  (*disconnect)(struct XenDevice *xendev);
 int   (*free)(struct XenDevice *xendev);
diff --git a/hw/xen_console.c b/hw/xen_console.c
index 8ef104c..c6bea81 100644
--- a/hw/xen_console.c
+++ b/hw/xen_console.c
@@ -212,7 +212,7 @@ out:
 return ret;
 }
 
-static int con_connect(struct XenDevice *xendev)
+static int con_initialise(struct XenDevice *xendev)
 {
 struct XenConsole *con = container_of(xendev, struct XenConsole, xendev);
 int limit;
@@ -273,7 +273,7 @@ struct XenDevOps xen_console_ops = {
 .size   = sizeof(struct XenConsole),
 .flags  = DEVOPS_FLAG_IGNORE_STATE,
 .init   = con_init,
-.connect= con_connect,
+.initialise = con_initialise,
 .event  = con_event,
 .disconnect = con_disconnect,
 };
diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index add815f..4abdaf9 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -846,7 +846,7 @@ struct XenDevOps xen_blkdev_ops = {
 .flags  = DEVOPS_FLAG_NEED_GNTDEV,
 .alloc  = blk_alloc,
 .init   = blk_init,
-.connect= blk_connect,
+.initialise= blk_connect,
 .disconnect = blk_disconnect,
 .event  = blk_event,
 .free   = blk_free,
diff --git a/hw/xen_nic.c b/hw/xen_nic.c
index ff86491..9a97d3e 100644
--- a/hw/xen_nic.c
+++ b/hw/xen_nic.c
@@ -433,7 +433,7 @@ struct XenDevOps xen_netdev_ops = {
 .size   = sizeof(struct XenNetDev),
 .flags  = DEVOPS_FLAG_NEED_GNTDEV,
 .init   = net_init,
-.connect 

[Qemu-devel] [PATCH RESEND 3/3] xen: use uint64_t instead of target_ulong in cpu_ioreq_move

2011-07-28 Thread stefano.stabellini
From: Stefano Stabellini 

cpu_ioreq_move might move 8 bytes at a time so we must make sure that
the temporary variable can hold 8 bytes.

Signed-off-by: Stefano Stabellini 
---
 xen-all.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xen-all.c b/xen-all.c
index 9eaeac1..3611e19 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -620,7 +620,7 @@ static void cpu_ioreq_move(ioreq_t *req)
 }
 }
 } else {
-target_ulong tmp;
+uint64_t tmp;
 
 if (req->dir == IOREQ_READ) {
 for (i = 0; i < req->count; i++) {
-- 
1.7.2.3




Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()

2011-07-28 Thread Luiz Capitulino
On Thu, 28 Jul 2011 10:23:22 +0300
Avi Kivity  wrote:

> On 07/28/2011 12:44 AM, Blue Swirl wrote:
> > On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino  
> > wrote:
> > >  This function should be used when the VM is not supposed to resume
> > >  execution (eg. by issuing 'cont' monitor command).
> > >
> > >  Today, we allow the user to resume execution even when:
> > >
> > >o the guest shuts down and -no-shutdown is used
> > >o there's a kvm internal error
> > >o loading the VM state with -loadvm or "loadvm" in the monitor fails
> > >
> > >  I think only badness can happen from the cases above.
> >
> > I'd suppose a system_reset should bring the system back to sanity and
> > then clear vm_permanent_stopped (where's -ly?)

What's -ly?

> > except maybe for KVM
> > internal error if that can't be recovered. Then it would not very
> > permanent anymore, so the name would need adjusting.
> 
> Currently, all kvm internal errors are recoverable by reset (and 
> possibly by fiddling with memory/registers).

Ok, but a poweroff in the guest isn't recoverable with system_reset
right? Or does it depend on the guest?

I get funny results if qemu is started with -no-shutdown and I run cont after
a poweroff in a F15 guest. Sometimes qemu will exit after a few seconds,
sometimes 'info status' will say 'running'.



Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()

2011-07-28 Thread Avi Kivity

On 07/28/2011 04:31 PM, Luiz Capitulino wrote:

On Thu, 28 Jul 2011 10:23:22 +0300
Avi Kivity  wrote:

>  On 07/28/2011 12:44 AM, Blue Swirl wrote:
>  >  On Wed, Jul 27, 2011 at 9:42 PM, Luiz Capitulino  
 wrote:
>  >  >   This function should be used when the VM is not supposed to resume
>  >  >   execution (eg. by issuing 'cont' monitor command).
>  >  >
>  >  >   Today, we allow the user to resume execution even when:
>  >  >
>  >  > o the guest shuts down and -no-shutdown is used
>  >  > o there's a kvm internal error
>  >  > o loading the VM state with -loadvm or "loadvm" in the monitor fails
>  >  >
>  >  >   I think only badness can happen from the cases above.
>  >
>  >  I'd suppose a system_reset should bring the system back to sanity and
>  >  then clear vm_permanent_stopped (where's -ly?)

What's -ly?



permanent-ly.


>  >  except maybe for KVM
>  >  internal error if that can't be recovered. Then it would not very
>  >  permanent anymore, so the name would need adjusting.
>
>  Currently, all kvm internal errors are recoverable by reset (and
>  possibly by fiddling with memory/registers).

Ok, but a poweroff in the guest isn't recoverable with system_reset
right? Or does it depend on the guest?


Right, it's not recoverable if you shut the power down where the tractor 
beam is coupled to the main reactor.



I get funny results if qemu is started with -no-shutdown and I run cont after
a poweroff in a F15 guest. Sometimes qemu will exit after a few seconds,
sometimes 'info status' will say 'running'.


Yes, best fixed.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 0/4] Set of patches for chrooted environment

2011-07-28 Thread Laurent Vivier
This set of patches helps to use qemu-linux-user in a chrooted environment.

It mostly allows to define the default cpu model as we can't use '-cpu' 
argument.
The last one defines enviromnent variables to be able to use log file and 
gdb server  ('-d' and '-g' arguments).

NOTE: I saw some comments in the mailing list about environment variables,
if patch #4 dislikes, I've also a patch providing a "qemu-wrapper" with the 
same behavior.

[PATCH 1/4] linux-user: define default cpu model in configure instead of 
linux-user/main.c
[PATCH 2/4] linux-user: specify the cpu model during configure
[PATCH 3/4] linux-user,m68k: display default cpu
[PATCH 4/4] linux-user: define new environment variables



[Qemu-devel] [PATCH 2/4] linux-user: specify the cpu model during configure

2011-07-28 Thread Laurent Vivier
This patch allows to set the default cpu model for a given architecture,
for instance:

 configure --target-list=m68k-linux-user --m68k-default-cpu=m68040

Signed-off-by: Laurent Vivier 
---
 configure |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index c74a5f9..d54f0ed 100755
--- a/configure
+++ b/configure
@@ -527,6 +527,10 @@ for opt do
   ;;
   --target-list=*) target_list="$optarg"
   ;;
+  --*-default-cpu=*)
+tmp=`expr "x$opt" : 'x--\(.*\)-default-cpu=.*'`
+eval ${tmp}_default_cpu="$optarg"
+  ;;
   --enable-trace-backend=*) trace_backend="$optarg"
   ;;
   --with-trace-file=*) trace_file="$optarg"
@@ -916,6 +920,7 @@ echo "   use %M for cpu name 
[$interp_prefix]"
 echo "  --target-list=LIST   set target list (default: build everything)"
 echo "Available targets: $default_target_list" | \
 fold -s -w 53 | sed -e 's/^/   /'
+echo "  --ARCH-default-cpu=CPU   set the default cpu for a given architecture"
 echo ""
 echo "Advanced options (experts only):"
 echo "  --source-path=PATH   path of source code [$source_path]"
@@ -3291,6 +3296,10 @@ case "$target_arch2" in
 exit 1
   ;;
 esac
+tmp_target_default_cpu=`eval echo \\$${target_arch2}_default_cpu`
+if [ "x$tmp_target_default_cpu" != "x" ] ; then
+  target_default_cpu="$tmp_target_default_cpu"
+fi
 echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
 echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak
 echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak
-- 
1.7.4.1




[Qemu-devel] [PATCH 3/4] linux-user,m68k: display default cpu

2011-07-28 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
---
 target-m68k/helper.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-m68k/helper.c b/target-m68k/helper.c
index a936fe7..f5d33cd 100644
--- a/target-m68k/helper.c
+++ b/target-m68k/helper.c
@@ -57,6 +57,11 @@ void m68k_cpu_list(FILE *f, fprintf_function cpu_fprintf)
 unsigned int i;
 
 for (i = 0; m68k_cpu_defs[i].name; i++) {
+if (strcmp(m68k_cpu_defs[i].name, TARGET_DEFAULT_CPU) == 0) {
+(*cpu_fprintf)(f, " >");
+} else {
+(*cpu_fprintf)(f, "  ");
+}
 (*cpu_fprintf)(f, "%s\n", m68k_cpu_defs[i].name);
 }
 }
-- 
1.7.4.1




[Qemu-devel] [PATCH 4/4] linux-user: define new environment variables

2011-07-28 Thread Laurent Vivier
QEMU_GDB=port allows to define gdb server port to wait on.
QEMU_DEBUG=options allows to activate log file (like -d options)

Signed-off-by: Laurent Vivier 
---
 linux-user/main.c |   11 ---
 qemu-doc.texi |4 
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 7180cee..ff1720b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2815,8 +2815,10 @@ static void usage(void)
"-strace  log system calls\n"
"\n"
"Environment variables:\n"
-   "QEMU_STRACE   Print system calls and arguments similar to 
the\n"
-   "  'strace' program.  Enable by setting to any 
value.\n"
+   "QEMU_STRACEPrint system calls and arguments similar to 
the\n"
+   "   'strace' program.  Enable by setting to any 
value.\n"
+   "QEMU_DEBUG=options Activate log. Use same options as '-d' 
options\n"
+   "QEMU_GDB=port  Wait gdb connection to port\n"
"You can use -E and -U options to set/unset environment variables\n"
"for target process.  It is possible to provide several variables\n"
"by repeating the option.  For example:\n"
@@ -2872,7 +2874,7 @@ int main(int argc, char **argv, char **envp)
 const char *filename;
 const char *cpu_model;
 const char *log_file = DEBUG_LOGFILE;
-const char *log_mask = NULL;
+const char *log_mask = getenv("QEMU_DEBUG");
 struct target_pt_regs regs1, *regs = ®s1;
 struct image_info info1, *info = &info1;
 struct linux_binprm bprm;
@@ -2919,6 +2921,9 @@ int main(int argc, char **argv, char **envp)
 #if defined(cpudef_setup)
 cpudef_setup(); /* parse cpu definitions in target config file (TBD) */
 #endif
+if (getenv("QEMU_GDB")) {
+  gdbstub_port = atoi(getenv("QEMU_GDB"));
+}
 
 optind = 1;
 for(;;) {
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 47e1991..330f9d0 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -2285,6 +2285,10 @@ space emulator hasn't implemented ptrace).  At the 
moment this is
 incomplete.  All system calls that don't have a specific argument
 format are printed with information for six arguments.  Many
 flag-style arguments don't have decoders and will show up as numbers.
+@item QEMU_DEBUG=options
+Activate log. Use same options as '-d' options.
+@item QEMU_GDB=port
+Wait gdb connection to port.
 @end table
 
 @node Other binaries
-- 
1.7.4.1




[Qemu-devel] [PATCH 1/4] linux-user: define default cpu model in configure instead of linux-user/main.c

2011-07-28 Thread Laurent Vivier
Signed-off-by: Laurent Vivier 
---
 configure |   15 +++
 linux-user/main.c |   34 +-
 2 files changed, 16 insertions(+), 33 deletions(-)

diff --git a/configure b/configure
index fb8819b..c74a5f9 100755
--- a/configure
+++ b/configure
@@ -3075,6 +3075,7 @@ target_dir="$target"
 config_target_mak=$target_dir/config-target.mak
 target_arch2=`echo $target | cut -d '-' -f 1`
 target_bigendian="no"
+target_default_cpu="any"
 
 case "$target_arch2" in
   
armeb|lm32|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|s390x|sh4eb|sparc|sparc64|sparc32plus)
@@ -3151,11 +3152,13 @@ TARGET_ABI_DIR=""
 case "$target_arch2" in
   i386)
 target_phys_bits=64
+target_default_cpu="qemu32"
   ;;
   x86_64)
 TARGET_BASE_ARCH=i386
 target_phys_bits=64
 target_long_alignment=8
+target_default_cpu="qemu64"
   ;;
   alpha)
 target_phys_bits=64
@@ -3173,6 +3176,7 @@ case "$target_arch2" in
   cris)
 target_nptl="yes"
 target_phys_bits=32
+target_default_cpu=""
   ;;
   lm32)
 target_phys_bits=32
@@ -3198,12 +3202,14 @@ case "$target_arch2" in
 echo "TARGET_ABI_MIPSO32=y" >> $config_target_mak
 target_nptl="yes"
 target_phys_bits=64
+target_default_cpu="24Kf"
   ;;
   mipsn32|mipsn32el)
 TARGET_ARCH=mipsn32
 TARGET_BASE_ARCH=mips
 echo "TARGET_ABI_MIPSN32=y" >> $config_target_mak
 target_phys_bits=64
+target_default_cpu="20Kc"
   ;;
   mips64|mips64el)
 TARGET_ARCH=mips64
@@ -3211,12 +3217,14 @@ case "$target_arch2" in
 echo "TARGET_ABI_MIPSN64=y" >> $config_target_mak
 target_phys_bits=64
 target_long_alignment=8
+target_default_cpu="20Kc"
   ;;
   ppc)
 gdb_xml_files="power-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
 target_phys_bits=32
 target_nptl="yes"
 target_libs_softmmu="$fdt_libs"
+target_default_cpu="750"
   ;;
   ppcemb)
 TARGET_BASE_ARCH=ppc
@@ -3225,6 +3233,7 @@ case "$target_arch2" in
 target_phys_bits=64
 target_nptl="yes"
 target_libs_softmmu="$fdt_libs"
+target_default_cpu="750"
   ;;
   ppc64)
 TARGET_BASE_ARCH=ppc
@@ -3233,6 +3242,7 @@ case "$target_arch2" in
 target_phys_bits=64
 target_long_alignment=8
 target_libs_softmmu="$fdt_libs"
+target_default_cpu="970fx"
   ;;
   ppc64abi32)
 TARGET_ARCH=ppc64
@@ -3242,6 +3252,7 @@ case "$target_arch2" in
 gdb_xml_files="power64-core.xml power-fpu.xml power-altivec.xml 
power-spe.xml"
 target_phys_bits=64
 target_libs_softmmu="$fdt_libs"
+target_default_cpu="750"
   ;;
   sh4|sh4eb)
 TARGET_ARCH=sh4
@@ -3251,11 +3262,13 @@ case "$target_arch2" in
   ;;
   sparc)
 target_phys_bits=64
+target_default_cpu="Fujitsu MB86904"
   ;;
   sparc64)
 TARGET_BASE_ARCH=sparc
 target_phys_bits=64
 target_long_alignment=8
+target_default_cpu="TI UltraSparc II"
   ;;
   sparc32plus)
 TARGET_ARCH=sparc64
@@ -3263,6 +3276,7 @@ case "$target_arch2" in
 TARGET_ABI_DIR=sparc
 echo "TARGET_ABI32=y" >> $config_target_mak
 target_phys_bits=64
+target_default_cpu="Fujitsu MB86904"
   ;;
   s390x)
 target_nptl="yes"
@@ -3281,6 +3295,7 @@ echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> 
$config_target_mak
 echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak
 echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak
 echo "TARGET_LLONG_ALIGNMENT=$target_llong_alignment" >> $config_target_mak
+echo "TARGET_DEFAULT_CPU=\"$target_default_cpu\"" >> $config_target_mak
 echo "TARGET_ARCH=$TARGET_ARCH" >> $config_target_mak
 target_arch_name="`echo $TARGET_ARCH | tr '[:lower:]' '[:upper:]'`"
 echo "TARGET_$target_arch_name=y" >> $config_target_mak
diff --git a/linux-user/main.c b/linux-user/main.c
index 2135b9c..7180cee 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3082,39 +3082,7 @@ int main(int argc, char **argv, char **envp)
 init_paths(interp_prefix);
 
 if (cpu_model == NULL) {
-#if defined(TARGET_I386)
-#ifdef TARGET_X86_64
-cpu_model = "qemu64";
-#else
-cpu_model = "qemu32";
-#endif
-#elif defined(TARGET_ARM)
-cpu_model = "any";
-#elif defined(TARGET_UNICORE32)
-cpu_model = "any";
-#elif defined(TARGET_M68K)
-cpu_model = "any";
-#elif defined(TARGET_SPARC)
-#ifdef TARGET_SPARC64
-cpu_model = "TI UltraSparc II";
-#else
-cpu_model = "Fujitsu MB86904";
-#endif
-#elif defined(TARGET_MIPS)
-#if defined(TARGET_ABI_MIPSN32) || defined(TARGET_ABI_MIPSN64)
-cpu_model = "20Kc";
-#else
-cpu_model = "24Kf";
-#endif
-#elif defined(TARGET_PPC)
-#ifdef TARGET_PPC64
-cpu_model = "970fx";
-#else
-cpu_model = "750";
-#endif
-#else
-cpu_model = "any";
-#endif
+cpu_model = TARGET_DEFAULT_CPU;
 }
 cpu_exec_init_all(0);
 /* NOTE: we need to init the CPU at this stage to get
-- 
1.7.4.1




Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread tsnsaito
At Thu, 28 Jul 2011 14:50:57 +0200,
Artyom Tarasenko wrote:
> On Thu, Jul 28, 2011 at 2:03 PM,   wrote:
> > At Thu, 28 Jul 2011 13:51:08 +0200,
> > Artyom Tarasenko wrote:
> >> On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
> >> > Hi,
> >> >
> >> > At Mon, 25 Jul 2011 19:22:38 +0200,
> >> > Artyom Tarasenko wrote:
> >> >
> >> >> clear interrupt request if the interrupt priority < CPU pil
> >> >> clear hardware interrupt request if interrupts are disabled
> >> >
> >> > Not directly related to the fix, but I'd like to note a problem
> >> > of hw/sun4u.c interrupt code:
> >> >
> >> > The interrupt code probably mixes hardware interrupts and
> >> > software interrupts.
> >> > %pil is for software interrupts (interrupt_level_n traps).
> >> > %pil can not mask hardware interrupts (interrupt_vector traps);
> >> > the CPU raises interrupt_vector traps even on %pil=15.
> >> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
> >> > seem to be masked by %pil.
> >>
> >> The interrupt_vector traps are currently not implemented, are they?
> >> So it's hard to tell whether they are masked.
> >
> > Yes, interrupt_vector is not implemented yet.
> > I failed to explain the problem.
> > The problem is that cpu_set_irqs() should raise interrupt_vector
> > traps but it raises interrupt_level_n traps actually.
> > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
> > the 1st argument.  The allocated irqs (the irq variable) are
> > passed to pci_apb_init().  APB should generate interrupt_vector
> > traps (hardware interrupts), not the interrupt_vector_n traps.
> 
> Yes, this is true. But it's more complicated than this: cpu_check_irqs
> also checks tick/stick/hstick interrupts. They should produce the
> interrupt_level_n traps as they currently do.

That's right.
tick/stick/hstick must raise interrupt_level_n traps.

> The patch merely fixes the problem of hanging on a interrupt_vector_n
> trap if the trap handler uses pil for interrupt masking. The problem
> exists independently from interrupt_vector trap generation (as you
> pointed out).

I understand what is the problem that your patch is going to fix.
Thanks for the explanation.

> Do you have objections to this patch in its current form?

No, I don't have any objections.

> > The interrupts from APB would be reported by cpu_set_irq(),
> > but cpu_set_irq() seems to generate interrupt_vector_n traps.
> 
> For me it's not obvious. The interrupt vector not just one line, but
> the vector, which is written in the corresponding CPU register (also
> missing in the current qemu implementation). On the real hardware the
> vector is created by the IOMMU (PBM/APB/...). If qemu intends to
> support multiple chipsets, we should keep it the way it's done on the
> real hardware (for instance the interrupt vectors for on-board devices
> on Ultra-1 and E6500 are not the same).

Sorry, I can't keep up with this vector thing...
Does the CPU receive hardware interrupts as interrupt_vector traps
(trap type=0x60) regardless of the kind of the interrupt controller,
doesn't it?

> I'd suggest APB shall use some other interface for communicating
> interrupts to the CPU. Something like
> cpu_receive_ivec(interrupt_vector).
> 
> >> >> Signed-off-by: Artyom Tarasenko 
> >> >> ---
> >> >>  hw/sun4u.c |    6 --
> >> >>  1 files changed, 4 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c
> >> >> index d7dcaf0..7f95aeb 100644
> >> >> --- a/hw/sun4u.c
> >> >> +++ b/hw/sun4u.c
> >> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
> >> >>          pil |= 1 << 14;
> >> >>      }
> >> >>
> >> >> -    if (!pil) {
> >> >> +    if (pil < (2 << env->psrpil)){
> >> >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> >> >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
> >> >>                             env->interrupt_index);
> >> >> @@ -287,10 +287,12 @@ void cpu_check_irqs(CPUState *env)
> >> >>                  break;
> >> >>              }
> >> >>          }
> >> >> -    } else {
> >> >> +    } else if (env->interrupt_request & CPU_INTERRUPT_HARD) {
> >> >>          CPUIRQ_DPRINTF("Interrupts disabled, pil=%08x pil_in=%08x 
> >> >> softint=%08x "
> >> >>                         "current interrupt %x\n",
> >> >>                         pil, env->pil_in, env->softint, 
> >> >> env->interrupt_index);
> >> >> +        env->interrupt_index = 0;
> >> >> +        cpu_reset_interrupt(env, CPU_INTERRUPT_HARD);
> >> >>      }
> >> >>  }
> >> >>
> >> >> --
> >> >> 1.7.3.4
> >> >
> >> >
> >> > 
> >> > Tsuneo Saito 
> >> >
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Artyom Tarasenko
> >>
> >> solaris/sparc under qemu blog: http://tyom.blogspot.com/
> >
> > 
> > Tsuneo Saito 
> >
> 
> -- 
> Regards,
> Artyom Tarasenko
> 
> solaris/sparc under qemu blog: http://tyom.blogspot.com/


Tsuneo Saito 



[Qemu-devel] [PATCH] [RFC] qcow2: group refcount updates during cow

2011-07-28 Thread Frediano Ziglio
Well, I think this is the first real improve patch.
Is more a RFC than a patch. Yes, some lines are terrible!
It collapses refcount decrement during cow.
>From a first check time executing 015 test passed from about 600 seconds
to 70.
This at least prove that refcount updates counts!
Some doubt:
1- place the code in qcow2-refcount.c as it update only refcount and not
  cluster?
2- allow some sort of "begin transaction" / "commit" / "rollback" like 
  databases instead?
3- allow changing tables from different coroutines?

1) If you have a sequence like (1, 2, 4) probably these clusters are all in
the same l2 table but with this code you get two write instead of one.
I'm thinking about a function in qcow2-refcount.c that accept an array of 
cluster
instead of a start + len.

Signed-off-by: Frediano Ziglio 
---
 block/qcow2-cluster.c |   36 ++--
 1 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 81cf77d..da17365 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -675,10 +675,42 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
  * Also flush bs->file to get the right order for L2 and refcount update.
  */
 if (j != 0) {
+int64_t old_start = 0, old_end = -2;
+int count = 0;
 for (i = 0; i < j; i++) {
-qcow2_free_any_clusters(bs,
-be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED, 1);
+old_cluster[i] = be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED;
 }
+// XXX sort old_cluster
+for (i = 0; i < j; i++) {
+int64_t cluster = old_cluster[i];
+
+/* group if contiguos */
+if (old_end + 1 == (cluster >> s->cluster_bits)) {
+++old_end;
+continue;
+}
+
+/* handle */
+if (old_end > 0) {
+qcow2_free_any_clusters(bs, old_start << s->cluster_bits, 
old_end - old_start + 1);
+count += old_end - old_start + 1;
+}
+old_end = -2;
+
+/* handle compressed separately */
+if ((cluster & QCOW_OFLAG_COMPRESSED)) {
+qcow2_free_any_clusters(bs, cluster, 1);
+continue;
+}
+
+/* start a new group */
+old_start = old_end = cluster >> s->cluster_bits;
+}
+if (old_end > 0) {
+qcow2_free_any_clusters(bs, old_start << s->cluster_bits, old_end 
- old_start + 1);
+count += old_end - old_start + 1;
+}
+assert(count == j);
 }
 
 ret = 0;
-- 
1.7.1




Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model

2011-07-28 Thread Paolo Bonzini

On 07/28/2011 02:46 PM, Anthony Liguori wrote:


There are plenty of PV interfaces implemented by QEMU. Would you say the
same of virtio?


Virtio was designed to look like real hardware.

I would say that trying to fit XenStore into QOM would not be a good
exercise.


No doubt about that. :)  I'd put a lot more hope into Goldfish though.

Paolo



Re: [Qemu-devel] [PATCH 00/12] bugfix and qdevify NAND and ONENAND devices

2011-07-28 Thread Peter Maydell
Ping on this one too, since I'm nearly ready with the next chunk of
patches (18 patches fixing bugs and adding NAND support to omap_gpmc)
and they obviously depend on this set.

-- PMM

On 15 July 2011 15:58, Peter Maydell  wrote:
> This patchseries is more goodies from the Meego tree. Specifically, various
> bug fixes to hw/nand and hw/onenand, plus conversion of both to sysbus.
>
> [An advance note on the next thing due to come out of the pipe: qdeving
> nand and onenand allows us to move to qdevification of omap_gpmc, which
> takes both of those (and other things) as 'downstream' devices. That
> change will make it accept arbitrary SysBus devices as its 'downstream'
> devices rather than the current setup of taking an opaque pointer and a
> pair of resize/unmap function pointers. (There are also patches which add
> NAND support so you can tell omap_gpmc "this is a NAND device" and it then
> uses the nand_setpins/nand_getpins/nand_setio interface.) The rationale for
> using SysBus there rather than defining some new kind of bus is that you
> want to be able to plug any random thing into it; for instance the Overo
> board has a lan9118 ethernet controller hanging off the GPMC. ]
>
> A note on dependencies:
>  * This patch depends on the omap gpio patchset (although more textually
>   than seriously semantically)
>  * the six nand patches and the six onenand patches are broadly independent
>   of each other
>
> Juha Riihimäki (9):
>  hw/nand: Support large NAND devices
>  hw/nand: Support devices wider than 8 bits
>  hw/nand: Support multiple reads following READ STATUS
>  hw/nand: qdevify
>  onenand: Handle various ID fields separately
>  onenand: Ignore zero writes to boot command space
>  hw/onenand: program actions can only clear bits
>  hw/sysbus: Add sysbus_mmio_unmap() for unmapping a region
>  hw/onenand: qdevify
>
> Peter Maydell (3):
>  hw/nand: Pass block device state to init function
>  hw/nand: Writing to NAND can only clear bits
>  onenand: Pass BlockDriverState to init function
>
>  hw/axis_dev88.c |    8 +-
>  hw/flash.h      |   19 ++--
>  hw/nand.c       |  345 ---
>  hw/nseries.c    |   13 ++-
>  hw/onenand.c    |  373 
> +++
>  hw/spitz.c      |    6 +-
>  hw/sysbus.c     |   17 +++
>  hw/sysbus.h     |    1 +
>  hw/tc6393xb.c   |    7 +-
>  9 files changed, 561 insertions(+), 228 deletions(-)



Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model

2011-07-28 Thread Anthony Liguori

On 07/28/2011 08:50 AM, Paolo Bonzini wrote:

On 07/28/2011 02:46 PM, Anthony Liguori wrote:


There are plenty of PV interfaces implemented by QEMU. Would you say the
same of virtio?


Virtio was designed to look like real hardware.

I would say that trying to fit XenStore into QOM would not be a good
exercise.


No doubt about that. :) I'd put a lot more hope into Goldfish though.


What's unclear to me about the Goldfish enumerator is whether it should 
be filled out through interaction with hardware devices or via some 
other mechanism.


In many ways, it's similar to ACPI and a Device Tree.  In both of those 
cases, firmware actually is responsible for constructing those tables.


Regards,

Anthony Liguori



Paolo






[Qemu-devel] [Bug 524447] Re: virsh save is very slow

2011-07-28 Thread Jeff Snider
Michael: Yes, that is the correct patch.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/524447

Title:
  virsh save is very slow

Status in libvirt virtualization API:
  Unknown
Status in QEMU:
  Fix Released
Status in “libvirt” package in Ubuntu:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Fix Released
Status in “libvirt” source package in Lucid:
  Won't Fix
Status in “qemu-kvm” source package in Lucid:
  Won't Fix
Status in “libvirt” source package in Maverick:
  Won't Fix
Status in “qemu-kvm” source package in Maverick:
  Won't Fix

Bug description:
  ==
  SRU Justification:
  1. impact: 'qemu save' is slow
  2. how addressed: a patch upstream fixes the case when a file does not 
announce when it is ready.
  3. patch: see the patch in linked bzr trees
  4. TEST CASE: see comment #4 for a specific recipe
  5. regression potential:  this patch only touches the vm save path.
  ==

  As reported here: http://www.redhat.com/archives/libvir-
  list/2009-December/msg00203.html

  "virsh save" is very slow - it writes the image at around 1MB/sec on
  my test system.

  (I think I saw a bug report for this issue on Fedora's bugzilla, but I
  can't find it now...)

  Confirmed under Karmic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt/+bug/524447/+subscriptions



Re: [Qemu-devel] [PATCH] [RFC] qcow2: group refcount updates during cow

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 15:50, schrieb Frediano Ziglio:
> Well, I think this is the first real improve patch.
> Is more a RFC than a patch. Yes, some lines are terrible!
> It collapses refcount decrement during cow.
> From a first check time executing 015 test passed from about 600 seconds
> to 70.
> This at least prove that refcount updates counts!
> Some doubt:
> 1- place the code in qcow2-refcount.c as it update only refcount and not
>   cluster?
> 2- allow some sort of "begin transaction" / "commit" / "rollback" like 
>   databases instead?
> 3- allow changing tables from different coroutines?
> 
> 1) If you have a sequence like (1, 2, 4) probably these clusters are all in
> the same l2 table but with this code you get two write instead of one.
> I'm thinking about a function in qcow2-refcount.c that accept an array of 
> cluster
> instead of a start + len.
> 
> Signed-off-by: Frediano Ziglio 

I think what you're seeing is actually just one special case of a more
general problem. The problem is that we're interpreting writethrough
stricter than required.

The semantics that we really need is that on completion of a request,
all of its data and metadata must be flushed to disk. There is no
requirement that we flush all intermediate states.

My recent update to qcow2_update_snapshot_refcount() is just another
case of the same problem. I think the solution should be similar to what
I did there, i.e. switch the cache to writeback mode while we're
operating on it and switch back when we're done. We should probably have
functions that make both of this a one-liner (I think here we have some
similarity to your begin/commit idea).

With the right functions, this could become as easy as this (might need
better function names, but you get the idea):

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 882f50a..45b67b1 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -612,6 +612,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState
*bs, QCowL2Meta *m)
 if (m->nb_clusters == 0)
 return 0;

+qcow2_cache_disable_writethrough(bs);
+
 old_cluster = qemu_malloc(m->nb_clusters * sizeof(uint64_t));

 /* copy content of unmodified sectors */
@@ -683,6 +685,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState
*bs, QCowL2Meta *m)

 ret = 0;
 err:
+qcow2_cache_restore_writethrough(bs);
 qemu_free(old_cluster);
 return ret;
  }

Kevin



Re: [Qemu-devel] Volume key in qcow3?

2011-07-28 Thread Kevin Wolf
Am 28.07.2011 10:05, schrieb Frediano Ziglio:
> Hi,
>   I noted that AES encryption using qcow2 just use the password given
> as as key (and also truncating it to 16 bytes == 128 bits).
> This is prone to brute force attacks and is not also easy to change
> password (you have to decrypt and encrypt again the entire image).
> LUKS and EncFS use another way. They generate a random key (the
> "volume key") then use the password you give to encrypt N times (where
> N is decided by security level or automatically based on time to
> decrypt the volume key. To change the password just give the old one,
> get the volume key and encrypt again using the new one. LUKS support
> also multiple "slots" to allow multiple password and even using an
> external key file.
> Obviously this require an additional extension to qcow2 so I think it
> require a new qcow3 format.

Yes, once we have qcow3, adding things like this should be easy enough.
I think the idea makes sense.

Another thing to consider with encryption is that we don't encrypt
metadata currently. I'm not entirely sure if this is a good or a bad
thing. Metadata is relatively predictable and I think that might hurt
the encryption? Though I'm really not an expert in this area.

Kevin



Re: [Qemu-devel] [RFC] Introduce vm_stop_permanent()

2011-07-28 Thread Jan Kiszka
On 2011-07-28 15:37, Avi Kivity wrote:
> On 07/28/2011 04:31 PM, Luiz Capitulino wrote:
>> On Thu, 28 Jul 2011 10:23:22 +0300
>> Avi Kivity  wrote:
>>
>> >  On 07/28/2011 12:44 AM, Blue Swirl wrote:
>> >  >  On Wed, Jul 27, 2011 at 9:42 PM, Luiz
>> Capitulino   wrote:
>> >  >  >   This function should be used when the VM is not supposed to
>> resume
>> >  >  >   execution (eg. by issuing 'cont' monitor command).
>> >  >  >
>> >  >  >   Today, we allow the user to resume execution even when:
>> >  >  >
>> >  >  > o the guest shuts down and -no-shutdown is used
>> >  >  > o there's a kvm internal error
>> >  >  > o loading the VM state with -loadvm or "loadvm" in the
>> monitor fails
>> >  >  >
>> >  >  >   I think only badness can happen from the cases above.
>> >  >
>> >  >  I'd suppose a system_reset should bring the system back to
>> sanity and
>> >  >  then clear vm_permanent_stopped (where's -ly?)
>>
>> What's -ly?
>>
> 
> permanent-ly.
> 
>> >  >  except maybe for KVM
>> >  >  internal error if that can't be recovered. Then it would not very
>> >  >  permanent anymore, so the name would need adjusting.
>> >
>> >  Currently, all kvm internal errors are recoverable by reset (and
>> >  possibly by fiddling with memory/registers).
>>
>> Ok, but a poweroff in the guest isn't recoverable with system_reset
>> right? Or does it depend on the guest?
> 
> Right, it's not recoverable if you shut the power down where the tractor
> beam is coupled to the main reactor.

system_reset will bring all emulated devices back into their power-on
state - unless we have remaining bugs to fix. Actually, one may consider
issuing this reset automatically on vm_start after "permant" vm_stop.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [Bug 524447] Re: virsh save is very slow

2011-07-28 Thread Michael Tokarev
I just wanted to point out that we've this patch in Debian since ages,
and it's been included in upstream for a long time too.  Added a debbug
reference for this as well.

** Bug watch added: Debian Bug tracker #597517
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597517

** Also affects: qemu-kvm (Debian) via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=597517
   Importance: Unknown
   Status: Unknown

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/524447

Title:
  virsh save is very slow

Status in libvirt virtualization API:
  Unknown
Status in QEMU:
  Fix Released
Status in “libvirt” package in Ubuntu:
  Invalid
Status in “qemu-kvm” package in Ubuntu:
  Fix Released
Status in “libvirt” source package in Lucid:
  Won't Fix
Status in “qemu-kvm” source package in Lucid:
  Won't Fix
Status in “libvirt” source package in Maverick:
  Won't Fix
Status in “qemu-kvm” source package in Maverick:
  Won't Fix
Status in “qemu-kvm” package in Debian:
  Unknown

Bug description:
  ==
  SRU Justification:
  1. impact: 'qemu save' is slow
  2. how addressed: a patch upstream fixes the case when a file does not 
announce when it is ready.
  3. patch: see the patch in linked bzr trees
  4. TEST CASE: see comment #4 for a specific recipe
  5. regression potential:  this patch only touches the vm save path.
  ==

  As reported here: http://www.redhat.com/archives/libvir-
  list/2009-December/msg00203.html

  "virsh save" is very slow - it writes the image at around 1MB/sec on
  my test system.

  (I think I saw a bug report for this issue on Fedora's bugzilla, but I
  can't find it now...)

  Confirmed under Karmic.

To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt/+bug/524447/+subscriptions



Re: [Qemu-devel] [RFC][PATCH 0/21] QEMU Object Model

2011-07-28 Thread Paolo Bonzini

On 07/28/2011 04:03 PM, Anthony Liguori wrote:

No doubt about that. :) I'd put a lot more hope into Goldfish though.


What's unclear to me about the Goldfish enumerator is whether it should
be filled out through interaction with hardware devices or via some
other mechanism.

In many ways, it's similar to ACPI and a Device Tree.  In both of those
cases, firmware actually is responsible for constructing those tables.


Yes, it is a flat device tree.

Since it supports hotplug (at least in theory, the Android emulator 
predates qdev so it doesn't support it), I would say it is more similar 
to PCI configuration space.  The difference is that IRQ numbers and MMIO 
base addresses are handed out by hardware (by a piece of the SoC) rather 
than by the firmware.


So yes, the hardware would have some kind of bus to talk to the devices 
and arbitrate hotplug/hotunplug.  The only peculiarity being that the 
bus enumerator hardcodes itself in the list it exposes, in addition to 
the devices on the bus.


But that still means that the devices have two views:

1) the enumerator's view is either "this is my name, my MMIO base, my 
IRQ base" or "I need 4k of MMIO and 1 IRQ line, please tell me 
where/which are those", depending on the device;


2) the PIC's view is "please bring this IRQ line up/down" (the device 
says which line, since the enumerator can assign those dynamically).


The PIC's view is more complicated than a Pin, and more similar to ISA.

Paolo



Re: [Qemu-devel] [PATCH] fix disabling interrupts in sun4u

2011-07-28 Thread Artyom Tarasenko
On Thu, Jul 28, 2011 at 3:40 PM,   wrote:
> At Thu, 28 Jul 2011 14:50:57 +0200,
> Artyom Tarasenko wrote:
>> On Thu, Jul 28, 2011 at 2:03 PM,   wrote:
>> > At Thu, 28 Jul 2011 13:51:08 +0200,
>> > Artyom Tarasenko wrote:
>> >> On Thu, Jul 28, 2011 at 12:31 PM,   wrote:
>> >> > Hi,
>> >> >
>> >> > At Mon, 25 Jul 2011 19:22:38 +0200,
>> >> > Artyom Tarasenko wrote:
>> >> >
>> >> >> clear interrupt request if the interrupt priority < CPU pil
>> >> >> clear hardware interrupt request if interrupts are disabled
>> >> >
>> >> > Not directly related to the fix, but I'd like to note a problem
>> >> > of hw/sun4u.c interrupt code:
>> >> >
>> >> > The interrupt code probably mixes hardware interrupts and
>> >> > software interrupts.
>> >> > %pil is for software interrupts (interrupt_level_n traps).
>> >> > %pil can not mask hardware interrupts (interrupt_vector traps);
>> >> > the CPU raises interrupt_vector traps even on %pil=15.
>> >> > But in cpu_check_irqs() and cpu_set_irq(), hardware interrupts
>> >> > seem to be masked by %pil.
>> >>
>> >> The interrupt_vector traps are currently not implemented, are they?
>> >> So it's hard to tell whether they are masked.
>> >
>> > Yes, interrupt_vector is not implemented yet.
>> > I failed to explain the problem.
>> > The problem is that cpu_set_irqs() should raise interrupt_vector
>> > traps but it raises interrupt_level_n traps actually.
>> > sun4uv_init() calls qemu_allocate_irqs() with cpu_set_irq as
>> > the 1st argument.  The allocated irqs (the irq variable) are
>> > passed to pci_apb_init().  APB should generate interrupt_vector
>> > traps (hardware interrupts), not the interrupt_vector_n traps.
>>
>> Yes, this is true. But it's more complicated than this: cpu_check_irqs
>> also checks tick/stick/hstick interrupts. They should produce the
>> interrupt_level_n traps as they currently do.
>
> That's right.
> tick/stick/hstick must raise interrupt_level_n traps.
>
>> The patch merely fixes the problem of hanging on a interrupt_vector_n
>> trap if the trap handler uses pil for interrupt masking. The problem
>> exists independently from interrupt_vector trap generation (as you
>> pointed out).
>
> I understand what is the problem that your patch is going to fix.
> Thanks for the explanation.
>

Sorry, I haven't implied that you don't. Meant to say "As you pointed
out, the problem exists independently...".

>> Do you have objections to this patch in its current form?
>
> No, I don't have any objections.
>
>> > The interrupts from APB would be reported by cpu_set_irq(),
>> > but cpu_set_irq() seems to generate interrupt_vector_n traps.
>>
>> For me it's not obvious. The interrupt vector not just one line, but
>> the vector, which is written in the corresponding CPU register (also
>> missing in the current qemu implementation). On the real hardware the
>> vector is created by the IOMMU (PBM/APB/...). If qemu intends to
>> support multiple chipsets, we should keep it the way it's done on the
>> real hardware (for instance the interrupt vectors for on-board devices
>> on Ultra-1 and E6500 are not the same).
>
> Sorry, I can't keep up with this vector thing...
> Does the CPU receive hardware interrupts as interrupt_vector traps
> (trap type=0x60) regardless of the kind of the interrupt controller,
> doesn't it?

It does indeed, but it also stores the interrupt vector identifying
the initiator device, in a CPU register readable with asi 0x7f .
What would APB pass to the cpu_set_irq? I see the three following variants:

a) it passes the PCI interrupt id, which is translated to the
interrupt vector in cpu_set_irq()
b) it passes the vector. This implies that 2048 (0-0x7ff) CPU
interrupts have to be allocated.
c) hack combining a+b: allocate only the interrupts known to be used
and translate an internal interrupt id to a vector.

The variant "a" is bad because it doesn't allow support for different
chipsets. The variant "b" is bad because qemu has to allocate way too
many interrupts. Only few of them will be used actually. The variant
"c" is bad, well, because it's a hack.

That's why I suggest using another interface between APB and CPU.

Have I overlooked something?

>> I'd suggest APB shall use some other interface for communicating
>> interrupts to the CPU. Something like
>> cpu_receive_ivec(interrupt_vector).
>>
>> >> >> Signed-off-by: Artyom Tarasenko 
>> >> >> ---
>> >> >>  hw/sun4u.c |    6 --
>> >> >>  1 files changed, 4 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/hw/sun4u.c b/hw/sun4u.c
>> >> >> index d7dcaf0..7f95aeb 100644
>> >> >> --- a/hw/sun4u.c
>> >> >> +++ b/hw/sun4u.c
>> >> >> @@ -255,7 +255,7 @@ void cpu_check_irqs(CPUState *env)
>> >> >>          pil |= 1 << 14;
>> >> >>      }
>> >> >>
>> >> >> -    if (!pil) {
>> >> >> +    if (pil < (2 << env->psrpil)){
>> >> >>          if (env->interrupt_request & CPU_INTERRUPT_HARD) {
>> >> >>              CPUIRQ_DPRINTF("Reset CPU IRQ (current interrupt %x)\n",
>> >> >>                             en

[Qemu-devel] [PATCH] lm832x: Take DeviceState pointer in lm832x_key_event()

2011-07-28 Thread Peter Maydell
Since lm832x has been qdev'ified, its users will generally
have a DeviceState pointer rather than an i2c_slave pointer,
so adjust lm832x_key_event's prototype to suit.

This allows the n810 (its only user) to actually pass a correct
pointer to it rather than NULL. The effect is that we no longer
segfault when a key is pressed.

Signed-off-by: Peter Maydell 
---
NB: this patch depends on the OMAP GPIO v2 patchset.

 hw/i2c.h |2 +-
 hw/lm832x.c  |4 ++--
 hw/nseries.c |7 +++
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/i2c.h b/hw/i2c.h
index 5514402..9381d01 100644
--- a/hw/i2c.h
+++ b/hw/i2c.h
@@ -72,6 +72,6 @@ void wm8750_set_bclk_in(void *opaque, int new_hz);
 void tmp105_set(i2c_slave *i2c, int temp);
 
 /* lm832x.c */
-void lm832x_key_event(i2c_slave *i2c, int key, int state);
+void lm832x_key_event(DeviceState *dev, int key, int state);
 
 #endif
diff --git a/hw/lm832x.c b/hw/lm832x.c
index 590a4cc..992ce49 100644
--- a/hw/lm832x.c
+++ b/hw/lm832x.c
@@ -474,9 +474,9 @@ static int lm8323_init(i2c_slave *i2c)
 return 0;
 }
 
-void lm832x_key_event(struct i2c_slave *i2c, int key, int state)
+void lm832x_key_event(DeviceState *dev, int key, int state)
 {
-LM823KbdState *s = (LM823KbdState *) i2c;
+LM823KbdState *s = FROM_I2C_SLAVE(LM823KbdState, I2C_SLAVE_FROM_QDEV(dev));
 
 if ((s->status & INT_ERROR) && (s->error & ERR_FIFOOVR))
 return;
diff --git a/hw/nseries.c b/hw/nseries.c
index 32f2f53..45b52bb 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -45,7 +45,7 @@ struct n800_s {
 i2c_bus *i2c;
 
 int keymap[0x80];
-i2c_slave *kbd;
+DeviceState *kbd;
 
 TUSBState *usb;
 void *retu;
@@ -362,7 +362,6 @@ static int n810_keys[0x80] = {
 static void n810_kbd_setup(struct n800_s *s)
 {
 qemu_irq kbd_irq = qdev_get_gpio_in(s->cpu->gpio, N810_KEYBOARD_GPIO);
-DeviceState *dev;
 int i;
 
 for (i = 0; i < 0x80; i ++)
@@ -375,8 +374,8 @@ static void n810_kbd_setup(struct n800_s *s)
 
 /* Attach the LM8322 keyboard to the I2C bus,
  * should happen in n8x0_i2c_setup and s->kbd be initialised here.  */
-dev = i2c_create_slave(s->i2c, "lm8323", N810_LM8323_ADDR);
-qdev_connect_gpio_out(dev, 0, kbd_irq);
+s->kbd = i2c_create_slave(s->i2c, "lm8323", N810_LM8323_ADDR);
+qdev_connect_gpio_out(s->kbd, 0, kbd_irq);
 }
 
 /* LCD MIPI DBI-C controller (URAL) */
-- 
1.7.1




Re: [Qemu-devel] [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.

2011-07-28 Thread Marcelo Tosatti
On Thu, Jul 28, 2011 at 12:24:48PM +0800, Zhi Yong Wu wrote:
> On Wed, Jul 27, 2011 at 11:49 PM, Marcelo Tosatti  wrote:
> > On Wed, Jul 27, 2011 at 06:17:15PM +0800, Zhi Yong Wu wrote:
> >> >> +        wait_time = 1;
> >> >> +    }
> >> >> +
> >> >> +    wait_time = wait_time + (slice_time - elapsed_time);
> >> >> +    if (wait) {
> >> >> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10 + 1;
> >> >> +    }
> >> >
> >> > The guest can keep submitting requests where "wait_time = 1" above,
> >> > and the timer will be rearmed continuously in the future.
> >
> > This is wrong.
> >
> >>  Can't you
> >> > simply arm the timer to the next slice start? _Some_ data must be
> >> > transfered by then, anyway (and nothing can be transfered earlier than
> >> > that).
> >
> > This is valid.
> >
> >> Sorry, i have got what you mean. Can you elaborate in more detail?
> >
> > Sorry, the bug i mentioned about timer being rearmed does not exist.
> >
> > But arming the timer for the last request as its done is 
> > confusing/unnecessary.
> >
> > The timer processes queued requests, but the timer is armed accordingly
> > to the last queued request in the slice. For example, if a request is
> > submitted 1ms before the slice ends, the timer is armed 100ms +
> > (slice_time - elapsed_time) in the future.
> 
> If the timer is simply amred to the next slice start, this timer will
> be a periodic timer, either the I/O rate can not be throttled under
> the limits, or the enqueued request can be delayed to handled, this
> will lower I/O rate seriously than the limits.

Yes, periodic but disarmed when there is no queueing. I don't understand
your point about low I/O rate.

> Maybe the slice time should be variable with current I/O rate. What do
> you think of it?

Not sure if its necessary. The slice should be small to avoid excessive
work on timer context, but the advantage of increasing the slice is
very small if any. BTW, 10ms seems a better default than 100ms.




  1   2   >