Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Stefan Hajnoczi
On Wed, Jan 19, 2011 at 1:12 AM, Jamie Lokier  wrote:
> Chunqiang Tang wrote:
>> Moreover, using a host file system not only adds overhead, but
>> also introduces data integrity issues. Specifically, if I/Os uses O_DSYNC,
>> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data
>> integrity in the event of a host crash. See
>> http://lwn.net/Articles/348739/ .
>
> You have the same issue with O_DIRECT when using a raw disk device
> too.  That is, O_DIRECT on a raw device does not guarantee integrity
> in the event of a host crash either, for mostly the same reasons.

QEMU has semantics that use O_DIRECT safely; there is no issue here.
When a drive is added with cache=none, QEMU not only uses O_DIRECT but
also advertises an enabled write cache to the guest.

The guest *must* flush the cache when it wants to ensure data is
stable.  In the event of a host crash, all, some, or none of the I/O
since the last flush may have made it to disk.  Each of these
possibilities is fair game since the guest may only depend on writes
being on disk if they completed and a successful flush was issued
afterwards.

Stefan



[Qemu-devel] [PATCH 0/1] add spicevmc chardev

2011-01-19 Thread Alon Levy
In additional to the description in the patch, see following email for uses:
http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg01208.html

In particular it would/could be used for:
 current vdagent of spice uses this for mouse, copy paste
 proposed smartcard device for qemu uses this with spice
 usb forwarding
 terminal forwarding
 qemu monitor forwarding

Alon Levy (1):
  spice: add chardev (v5)

 Makefile.objs |2 +-
 qemu-char.c   |4 +
 qemu-config.c |6 ++
 qemu-options.hx   |   16 -
 spice-qemu-char.c |  190 +
 trace-events  |6 ++
 ui/qemu-spice.h   |3 +
 7 files changed, 225 insertions(+), 2 deletions(-)
 create mode 100644 spice-qemu-char.c

-- 
1.7.3.4




[Qemu-devel] [PATCH 1/1] spice: add chardev (v5)

2011-01-19 Thread Alon Levy
Adding a chardev backend for spice, where spice determines what
to do with it based on the name attribute given during chardev creation.
For usage by spice vdagent in conjunction with a properly named
virtio-serial device, and future smartcard channel usage.

Example usage:
 qemu -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent \
 -device virtserialport,chardev=vdagent,name=com.redhat.spice.0

v4->v5:
 * add tracing events
 * fix missing comma
 * fix help string to show debug is optional

v3->v4:
 * updated commit message

v1->v3 changes: (v2 had a wrong commit message)
 * removed spice-qemu-char.h, folded into ui/qemu-spice.h
 * removed dead IOCTL code
 * removed comment
 * removed ifdef CONFIG_SPICE from qemu-config.c and qemu-options.hx help.
---
 Makefile.objs |2 +-
 qemu-char.c   |4 +
 qemu-config.c |6 ++
 qemu-options.hx   |   16 -
 spice-qemu-char.c |  190 +
 trace-events  |6 ++
 ui/qemu-spice.h   |3 +
 7 files changed, 225 insertions(+), 2 deletions(-)
 create mode 100644 spice-qemu-char.c

diff --git a/Makefile.objs b/Makefile.objs
index c3e52c5..b9e9ef6 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -105,7 +105,7 @@ common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 common-obj-$(CONFIG_WIN32) += version.o
 
-common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o 
ui/spice-display.o
+common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o 
ui/spice-display.o spice-qemu-char.o
 
 audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o
 audio-obj-$(CONFIG_SDL) += sdlaudio.o
diff --git a/qemu-char.c b/qemu-char.c
index edc9ad6..acc7130 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -97,6 +97,7 @@
 #endif
 
 #include "qemu_socket.h"
+#include "ui/qemu-spice.h"
 
 #define READ_BUF_LEN 4096
 
@@ -2495,6 +2496,9 @@ static const struct {
 || defined(__FreeBSD_kernel__)
 { .name = "parport",   .open = qemu_chr_open_pp },
 #endif
+#ifdef CONFIG_SPICE
+{ .name = "spicevmc", .open = qemu_chr_open_spice },
+#endif
 };
 
 CharDriverState *qemu_chr_open_opts(QemuOpts *opts,
diff --git a/qemu-config.c b/qemu-config.c
index 965fa46..323d3c2 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -146,6 +146,12 @@ static QemuOptsList qemu_chardev_opts = {
 },{
 .name = "signal",
 .type = QEMU_OPT_BOOL,
+},{
+.name = "name",
+.type = QEMU_OPT_STRING,
+},{
+.name = "debug",
+.type = QEMU_OPT_NUMBER,
 },
 { /* end of list */ }
 },
diff --git a/qemu-options.hx b/qemu-options.hx
index 898561d..939297a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1368,6 +1368,9 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
 #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__)
 "-chardev parport,id=id,path=path[,mux=on|off]\n"
 #endif
+#if defined(CONFIG_SPICE)
+"-chardev spicevmc,id=id,name=name[,debug=debug]\n"
+#endif
 , QEMU_ARCH_ALL
 )
 
@@ -1392,7 +1395,8 @@ Backend is one of:
 @option{stdio},
 @option{braille},
 @option{tty},
-@option{parport}.
+@option{parport},
+@option{spicevmc}.
 The specific backend will determine the applicable options.
 
 All devices must have an id, which can be any string up to 127 characters long.
@@ -1568,6 +1572,16 @@ Connect to a local parallel port.
 @option{path} specifies the path to the parallel port device. @option{path} is
 required.
 
+#if defined(CONFIG_SPICE)
+@item -chardev spicevmc ,id=@var{id} ,debug=@var{debug}, name=@var{name}
+
+@option{debug} debug level for spicevmc
+
+@option{name} name of spice channel to connect to
+
+Connect to a spice virtual machine channel, such as vdiport.
+#endif
+
 @end table
 ETEXI
 
diff --git a/spice-qemu-char.c b/spice-qemu-char.c
new file mode 100644
index 000..517f337
--- /dev/null
+++ b/spice-qemu-char.c
@@ -0,0 +1,190 @@
+#include "config-host.h"
+#include "trace.h"
+#include "ui/qemu-spice.h"
+#include 
+#include 
+
+#include "osdep.h"
+
+#define dprintf(_scd, _level, _fmt, ...)\
+do {\
+static unsigned __dprintf_counter = 0;  \
+if (_scd->debug >= _level) {\
+fprintf(stderr, "scd: %3d: " _fmt, ++__dprintf_counter, ## 
__VA_ARGS__);\
+}   \
+} while (0)
+
+#define VMC_MAX_HOST_WRITE2048
+
+typedef struct SpiceCharDriver {
+CharDriverState*  chr;
+SpiceCharDeviceInstance sin;
+char  *subtype;
+bool  active;
+uint8_t   *buffer;
+uint8_t   *datapos;
+ssize_t   bufsize, datalen;
+uint32_t  debug;
+} SpiceCharDriver;
+
+sta

Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap function is called only when it is on, and requests sent
> from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura 
> ---
>  block.c |   11 +++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include 
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
> *bs, int64_t sector_num,
>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>  return NULL;
>  
> +if (bs->device_name && event_tap_is_on()) {

bs->device_name is a pointer to a char array contained in bs, so it's
never NULL. You probably mean *bs->device_name?

Kevin



Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap controls when to start FT transaction, and provides proxy
> functions to called from net/block devices.  While FT transaction, it
> queues up net/block requests, and flush them when the transaction gets
> completed.
> 
> Signed-off-by: Yoshiaki Tamura 
> Signed-off-by: OHMURA Kei 

One general comment: On the first glance this seems to mix block and net
(and some other things) arbitrarily instead of having a section for
handling all block stuff, then network, etc.

Is there a specific reason for the order in which you put the functions?
If not, maybe reordering them might improve readability.

> ---
>  Makefile.target |1 +
>  event-tap.c |  847 
> +++
>  event-tap.h |   42 +++
>  qemu-tool.c |   24 ++
>  trace-events|9 +
>  5 files changed, 923 insertions(+), 0 deletions(-)
>  create mode 100644 event-tap.c
>  create mode 100644 event-tap.h
> 
> diff --git a/Makefile.target b/Makefile.target
> index e15b1c4..f36cd75 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>  LIBS+=-lz
> +obj-y += event-tap.o
>  
>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
> diff --git a/event-tap.c b/event-tap.c
> new file mode 100644
> index 000..f492708
> --- /dev/null
> +++ b/event-tap.c

> @@ -0,0 +1,847 @@
> +/*
> + * Event Tap functions for QEMU
> + *
> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu-common.h"
> +#include "qemu-error.h"
> +#include "block.h"
> +#include "block_int.h"
> +#include "ioport.h"
> +#include "osdep.h"
> +#include "sysemu.h"
> +#include "hw/hw.h"
> +#include "net.h"
> +#include "event-tap.h"
> +#include "trace.h"
> +
> +enum EVENT_TAP_STATE {
> +EVENT_TAP_OFF,
> +EVENT_TAP_ON,
> +EVENT_TAP_FLUSH,
> +EVENT_TAP_LOAD,
> +EVENT_TAP_REPLAY,
> +};
> +
> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */

Indeed, bdrv_aio_cancel will segfault this way.

If you use dummies instead of real ACBs the only way to correctly
implement bdrv_aio_cancel is waiting for all in-flight AIOs
(qemu_aio_flush).

> +typedef struct EventTapIOport {
> +uint32_t address;
> +uint32_t data;
> +int  index;
> +} EventTapIOport;
> +
> +#define MMIO_BUF_SIZE 8
> +
> +typedef struct EventTapMMIO {
> +uint64_t address;
> +uint8_t  buf[MMIO_BUF_SIZE];
> +int  len;
> +} EventTapMMIO;
> +
> +typedef struct EventTapNetReq {
> +char *device_name;
> +int iovcnt;
> +struct iovec *iov;
> +int vlan_id;
> +bool vlan_needed;
> +bool async;
> +NetPacketSent *sent_cb;
> +} EventTapNetReq;
> +
> +#define MAX_BLOCK_REQUEST 32
> +
> +typedef struct EventTapBlkReq {
> +char *device_name;
> +int num_reqs;
> +int num_cbs;
> +bool is_flush;
> +BlockRequest reqs[MAX_BLOCK_REQUEST];
> +BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
> +void *opaque[MAX_BLOCK_REQUEST];
> +} EventTapBlkReq;
> +
> +#define EVENT_TAP_IOPORT (1 << 0)
> +#define EVENT_TAP_MMIO   (1 << 1)
> +#define EVENT_TAP_NET(1 << 2)
> +#define EVENT_TAP_BLK(1 << 3)
> +
> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
> +
> +typedef struct EventTapLog {
> +int mode;
> +union {
> +EventTapIOport ioport;
> +EventTapMMIO mmio;
> +};
> +union {
> +EventTapNetReq net_req;
> +EventTapBlkReq blk_req;
> +};
> +QTAILQ_ENTRY(EventTapLog) node;
> +} EventTapLog;
> +
> +static EventTapLog *last_event_tap;
> +
> +static QTAILQ_HEAD(, EventTapLog) event_list;
> +static QTAILQ_HEAD(, EventTapLog) event_pool;
> +
> +static int (*event_tap_cb)(void);
> +static QEMUBH *event_tap_bh;
> +static VMChangeStateEntry *vmstate;
> +
> +static void event_tap_bh_cb(void *p)
> +{
> +if (event_tap_cb) {
> +event_tap_cb();
> +}
> +
> +qemu_bh_delete(event_tap_bh);
> +event_tap_bh = NULL;
> +}
> +
> +static void event_tap_schedule_bh(void)
> +{
> +trace_event_tap_ignore_bh(!!event_tap_bh);
> +
> +/* if bh is already set, we ignore it for now */
> +if (event_tap_bh) {
> +return;
> +}
> +
> +event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
> +qemu_bh_schedule(event_tap_bh);
> +
> +return ;
> +}
> +
> +static void event_tap_alloc_net_req(EventTapNetReq *net_req,
> +   VLANClientState *vc,
> +   const struct iovec *iov, int iovcnt,
> +   NetPacketSent *sent_cb, bool async)
> +{
> +int i;
> +
> +net_req->iovcnt = iovcnt;
> +net_req->async 

Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
> event-tap function is called only when it is on, and requests sent
> from device emulators.
> 
> Signed-off-by: Yoshiaki Tamura 
> ---
>  block.c |   11 +++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index ff2795b..85bd8b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "qemu-objects.h"
> +#include "event-tap.h"
>  
>  #ifdef CONFIG_BSD
>  #include 
> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
> *bs, int64_t sector_num,
>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>  return NULL;
>  
> +if (bs->device_name && event_tap_is_on()) {
> +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> + cb, opaque);
> +}
> +
>  if (bs->dirty_bitmap) {
>  blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>   opaque);

Just noticed the context here... Does this patch break block migration
when event-tap is on?

Another question that came to my mind is if we really hook everything we
need. I think we'll need to have a hook in bdrv_flush as well. I don't
know if you do hook qemu_aio_flush and friends -  does a call cause
event-tap to flush its queue? If not, a call to qemu_aio_flush might
hang qemu because it's waiting for requests to complete which are
actually stuck in the event-tap queue.

Kevin



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Gerd Hoffmann

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't work 
with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want attach the device to via 
bus=.  This is true for *every* device, including all pci devices. 
 If unspecified qdev uses the first bus it finds.


As long as there is a single pci bus only there is simply no need to 
specify it, thats why nobody does that today.  Once q35 finally arrives 
this will change of course.


cheers,
  Gerd



Re: [Qemu-devel] [PATCH 2/2] strtosz(): Use suffix macros in switch() statement

2011-01-19 Thread Jes Sorensen
On 01/18/11 21:39, Anthony Liguori wrote:
> On 01/18/2011 02:36 PM, Jes Sorensen wrote:
>> On 01/18/11 21:30, Anthony Liguori wrote:  
>>> On 01/18/2011 10:53 AM, Eric Blake wrote:   
 And it does, via the toupper() added earlier in the series (and which
 has separately been pointed out that using qemu_toupper() might be
 nicer).

>>> Ok.  Just taking the different case labels would be nicer IMHO.
>>>  
>> The old code did that, but I was suggested to do it this way, which I
>> think is cleaner too. Fewer lines of code are easier to read.
>>
> toupper() is based on locale so it's not consistent.

If you can show me an actually used locale where the toupper() on
k/m/g/t doesn't result in K/M/G/T then I guess there's a case. Otherwise
I don't really see this being a real point.

I think we are hitting the point where it's about who's taste is better,
and not about the actual code in this discussion.

One point in favor of the patch is this:

Without the patch:
jes@red-feather qemu]$ size cutils.o
   textdata bss dec hex filename
   4212   0   042121074 cutils.o
With patch:
[jes@red-feather qemu]$ size cutils.o
   textdata bss dec hex filename
   4196   0   041961064 cutils.o

IMHO it makes the code easier to read, but beyond that there isn't much.
If people are strongly against it, I'll just drop the patch. It's not
worth our time arguing over this level of detail. Otherwise I'd like to
see it applied.

Cheers,
Jes



Re: [Qemu-devel] [V3 PATCH 5/8] virtio-9p: Create support in chroot environment

2011-01-19 Thread M. Mohan Kumar
Hi Blue Swirl,

Thanks for your review comments. I will address these in my next version of 
patchset.


M. Mohan Kumar

On Tuesday 18 January 2011 10:38:21 pm Blue Swirl wrote:
> On Tue, Jan 18, 2011 at 6:25 AM, M. Mohan Kumar  wrote:
> > Add both server & client side interfaces to create regular files in
> > chroot environment
> > 
> > Signed-off-by: M. Mohan Kumar 
> > ---
> >  hw/9pfs/virtio-9p-chroot.c |   42
> > ++ hw/9pfs/virtio-9p-local.c  |
> >   22 --
> >  2 files changed, 62 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/9pfs/virtio-9p-chroot.c b/hw/9pfs/virtio-9p-chroot.c
> > index b599e23..e7f85e2 100644
> > --- a/hw/9pfs/virtio-9p-chroot.c
> > +++ b/hw/9pfs/virtio-9p-chroot.c
> > @@ -193,6 +193,42 @@ static void chroot_do_open(V9fsFileObjectRequest
> > *request, FdInfo *fd_info) }
> >  }
> > 
> > +/*
> > + * Helper routine to create a file and return the file descriptor and
> > + * error status in FdInfo structure.
> > + */
> > +static void chroot_do_create(V9fsFileObjectRequest *request, FdInfo
> > *fd_info) +{
> > +int cur_uid, cur_gid;
> 
> uid_t cur_uid;
> gid_t cur_gid;
> 
> > +
> > +cur_uid = geteuid();
> > +cur_gid = getegid();
> > +
> > +fd_info->fi_fd = -1;
> > +
> > +if (setfsuid(request->data.uid) < 0) {
> > +fd_info->fi_error = errno;
> > +return;
> > +}
> > +if (setfsgid(request->data.gid) < 0) {
> > +fd_info->fi_error = errno;
> > +goto unset_uid;
> > +}
> > +
> > +fd_info->fi_fd = open(request->path.path, request->data.flags,
> > +request->data.mode);
> > +
> > +if (fd_info->fi_fd < 0) {
> > +fd_info->fi_error = errno;
> > +} else {
> > +fd_info->fi_error = 0;
> > +}
> > +
> > +setfsgid(cur_gid);
> > +unset_uid:
> > +setfsuid(cur_uid);
> > +}
> > +
> >  static int chroot_daemonize(int chroot_sock)
> >  {
> > sigset_t sigset;
> > @@ -276,6 +312,12 @@ int v9fs_chroot(FsContext *fs_ctx)
> > error = -2;
> > }
> > break;
> > +case T_CREATE:
> > +chroot_do_create(&request, &fd_info);
> > +if (chroot_sendfd(chroot_sock, &fd_info) <= 0) {
> > +error = -2;
> > +}
> > +break;
> > default:
> > break;
> > }
> > diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c
> > index 2376ec2..7f39b40 100644
> > --- a/hw/9pfs/virtio-9p-local.c
> > +++ b/hw/9pfs/virtio-9p-local.c
> > @@ -52,6 +52,23 @@ static int __open(FsContext *fs_ctx, const char *path,
> > int flags) return fd;
> >  }
> > 
> > +static int __create(FsContext *fs_ctx, const char *path, int flags,
> 
> Please don't use identifiers starting with underscores.




Re: [Qemu-devel] [sparc] Floating point exception issue

2011-01-19 Thread Mateusz Loskot

On 18/01/11 21:51, Blue Swirl wrote:

On Tue, Jan 18, 2011 at 6:00 PM, Mateusz Loskot  wrote:

On 18/01/11 17:36, Blue Swirl wrote:


On Tue, Jan 18, 2011 at 3:27 PM, Mateusz Loskot
  wrote:


Hi,

Recently, I have reported mysterious issues on NetBSD 5.1
emulated on SPARC. The whole first thread is here:

http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01509.html

I decided to investigate the problem deeper and with great help
from NetBSD folks, I managed to find reproducible test case.
Initially, it was AWK command:

# echo NaN | awk '{print "test"}'
awk: floating point exception 8
  source line number 1

and next it boiled down to simple C program (see below).
Details of the investigation are archived in the NetBSD Problem
Report #44389 here:

http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44389


Here is final version of the test program which reproduces the problem:

#include
#include
#include
#include

int is_number(const char *s)
{
double r;
char *ep;
errno = 0;
r = strtod(s,&ep);
if (r == HUGE_VAL)
printf("X:%g\n", r);

if (ep == s || r == HUGE_VAL || errno == ERANGE)
return 0;
while (*ep == ' ' || *ep == '\t' || *ep == '\n')
ep++;
if (*ep == '\0')
return 1;
else
return 0;
}

int main(int argc, char **argv)
{
double v;

if (is_number("NaN")) {
printf("is a number\n");
v = atof("NaN");
} else {
printf("not a number\n");
v = 0.0;
}
printf("%.4f\n", v);

return 0;
}


On NetBSD/SPARC, the program receives SIGFPE:

$ gcc ./nan_test_2.c
  $ ./a.out
  [1]   Floating point exception (core dumped) ./a.out

Specifically, it's caused by r == HUGE_VAL condition in
  if (ep == s || r == HUGE_VAL || errno == ERANGE)
where r is NaN.

All the signs indicate there is a bug in QEMU.


I'll install 5.1, but on 4.0 which I had installed, the program works
fine:
$ ./sigfpe
is a number
nan


I just tested on NetBSD 5.0/SPARC under QEMU 0.13 (same version I use with
NetBSD 5.1/SPARC) and it works well indeed:

mloskot@qemu-netbsd-50-sparc:~/tmp# ./a.out
is a number
nan
mloskot@qemu-netbsd-50-sparc:~/tmp#

Hmm, this is becoming interesting.

I run QEMU 0.13 on Windows Vista (64-bit).
Perhaps host system and QEMU binaries are relevant here.
I will try on Linux host system later tonight.

BTW, here are my images:

http://mateusz.loskot.net/tmp/qemu/


The problem was with NaN handling in fcmped instruction. I've
committed a patch that fixes the problem, please test.


I'm having problems with building qemu under MinGW, so I will need to 
wait for someone to update binaries.


Stefan, are you planning to update your binaries?
I would be grateful.


Thanks for reporting and the test case.


Thanks for fixing!

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org



Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap function is called only when it is on, and requests sent
>> from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> ---
>>  block.c |   11 +++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include 
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
>> *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>
> bs->device_name is a pointer to a char array contained in bs, so it's
> never NULL. You probably mean *bs->device_name?

Yes, thanks for pointing out.  It didn't expose because
event_tap_is_on() was false upon flushing after synchronization.

Yoshi

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



[Qemu-devel] [PATCH 0/7] virtio-serial: Don't copy guest buf to host, flow control

2011-01-19 Thread Amit Shah
Hello,

This series is now separated from the chardev flow control series.
The virtio-serial code now does not copy over data from the guest to
the host.  It instead keeps track of how far we are in consuming the
data and maintains this state.

For flow control, when a user of the virtio-serial port signals it has
consumed less data than given, port throttling is enabled.  The
consumer can then later disable throttling and we can re-start sending
the data from where we left off.

Finally, new fields introduced are added to the save/restore section
to preserve state across live migrations.

Please apply.

Amit Shah (7):
  virtio-console: Factor out common init between console and generic
ports
  virtio-console: Remove unnecessary braces
  virtio-serial-bus: separate out discard logic in a separate function
  virtio-serial: Don't copy over guest buffer to host
  virtio-serial: Let virtio-serial-bus know if all data was consumed
  virtio-serial: Add support for flow control
  virtio-serial: save/restore new fields in port struct

 hw/virtio-console.c|   38 +++
 hw/virtio-serial-bus.c |  123 +++-
 hw/virtio-serial.h |   24 -
 3 files changed, 139 insertions(+), 46 deletions(-)

-- 
1.7.3.4




[Qemu-devel] [PATCH 2/7] virtio-console: Remove unnecessary braces

2011-01-19 Thread Amit Shah
Remove unnecessary braces around a case statement.

Signed-off-by: Amit Shah 
---
 hw/virtio-console.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index d7fe68b..d0b9354 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -48,10 +48,9 @@ static void chr_event(void *opaque, int event)
 VirtConsole *vcon = opaque;
 
 switch (event) {
-case CHR_EVENT_OPENED: {
+case CHR_EVENT_OPENED:
 virtio_serial_open(&vcon->port);
 break;
-}
 case CHR_EVENT_CLOSED:
 virtio_serial_close(&vcon->port);
 break;
-- 
1.7.3.4




[Qemu-devel] [PATCH 3/7] virtio-serial-bus: separate out discard logic in a separate function

2011-01-19 Thread Amit Shah
Instead of combining flush logic into the discard case and not discard
case, have one function doing discard case.  This will help later when
adding flow control logic to the do_flush_queued_data() function.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   47 ++-
 1 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 74ba5ec..e8c2a16 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -113,39 +113,48 @@ static size_t write_to_port(VirtIOSerialPort *port,
 return offset;
 }
 
+static void discard_vq_data(VirtQueue *vq, VirtIODevice *vdev)
+{
+VirtQueueElement elem;
+
+while (virtqueue_pop(vq, &elem)) {
+virtqueue_push(vq, &elem, 0);
+}
+virtio_notify(vdev, vq);
+}
+
 static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq,
- VirtIODevice *vdev, bool discard)
+ VirtIODevice *vdev)
 {
 VirtQueueElement elem;
 
-assert(port || discard);
+assert(port);
 assert(virtio_queue_ready(vq));
 
-while ((discard || !port->throttled) && virtqueue_pop(vq, &elem)) {
+while (!port->throttled && virtqueue_pop(vq, &elem)) {
 uint8_t *buf;
 size_t ret, buf_size;
 
-if (!discard) {
-buf_size = iov_size(elem.out_sg, elem.out_num);
-buf = qemu_malloc(buf_size);
-ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+buf_size = iov_size(elem.out_sg, elem.out_num);
+buf = qemu_malloc(buf_size);
+ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+
+port->info->have_data(port, buf, ret);
+qemu_free(buf);
 
-port->info->have_data(port, buf, ret);
-qemu_free(buf);
-}
 virtqueue_push(vq, &elem, 0);
 }
 virtio_notify(vdev, vq);
 }
 
-static void flush_queued_data(VirtIOSerialPort *port, bool discard)
+static void flush_queued_data(VirtIOSerialPort *port)
 {
 assert(port);
 
 if (!virtio_queue_ready(port->ovq)) {
 return;
 }
-do_flush_queued_data(port, port->ovq, &port->vser->vdev, discard);
+do_flush_queued_data(port, port->ovq, &port->vser->vdev);
 }
 
 static size_t send_control_msg(VirtIOSerialPort *port, void *buf, size_t len)
@@ -204,7 +213,7 @@ int virtio_serial_close(VirtIOSerialPort *port)
  * consume, reset the throttling flag and discard the data.
  */
 port->throttled = false;
-flush_queued_data(port, true);
+discard_vq_data(port->ovq, &port->vser->vdev);
 
 send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
 
@@ -258,7 +267,7 @@ void virtio_serial_throttle_port(VirtIOSerialPort *port, 
bool throttle)
 return;
 }
 
-flush_queued_data(port, false);
+flush_queued_data(port);
 }
 
 /* Guest wants to notify us of some event */
@@ -414,11 +423,15 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 discard = true;
 }
 
-if (!discard && port->throttled) {
+if (discard) {
+discard_vq_data(vq, vdev);
+return;
+}
+if (port->throttled) {
 return;
 }
 
-do_flush_queued_data(port, vq, vdev, discard);
+do_flush_queued_data(port, vq, vdev);
 }
 
 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
@@ -634,7 +647,7 @@ static void remove_port(VirtIOSerial *vser, uint32_t 
port_id)
 
 port = find_port_by_id(vser, port_id);
 /* Flush out any unconsumed buffers first */
-flush_queued_data(port, true);
+discard_vq_data(port->ovq, &port->vser->vdev);
 
 send_control_event(port, VIRTIO_CONSOLE_PORT_REMOVE, 1);
 }
-- 
1.7.3.4




[Qemu-devel] [PATCH 1/7] virtio-console: Factor out common init between console and generic ports

2011-01-19 Thread Amit Shah
The initialisation for generic ports and console ports is similar.
Factor out the parts that are the same in a different function that can
be called from each of the initfns.

Signed-off-by: Amit Shah 
---
 hw/virtio-console.c |   31 ++-
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index caea11f..d7fe68b 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -58,24 +58,28 @@ static void chr_event(void *opaque, int event)
 }
 }
 
-/* Virtio Console Ports */
-static int virtconsole_initfn(VirtIOSerialDevice *dev)
+static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
 {
-VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
-VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
-
-port->info = dev->info;
-
-port->is_console = true;
+vcon->port.info = dev->info;
 
 if (vcon->chr) {
 qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
   vcon);
-port->info->have_data = flush_buf;
+vcon->port.info->have_data = flush_buf;
 }
 return 0;
 }
 
+/* Virtio Console Ports */
+static int virtconsole_initfn(VirtIOSerialDevice *dev)
+{
+VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
+VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+
+port->is_console = true;
+return generic_port_init(vcon, dev);
+}
+
 static int virtconsole_exitfn(VirtIOSerialDevice *dev)
 {
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
@@ -115,14 +119,7 @@ static int virtserialport_initfn(VirtIOSerialDevice *dev)
 VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 
-port->info = dev->info;
-
-if (vcon->chr) {
-qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
-  vcon);
-port->info->have_data = flush_buf;
-}
-return 0;
+return generic_port_init(vcon, dev);
 }
 
 static VirtIOSerialPortInfo virtserialport_info = {
-- 
1.7.3.4




[Qemu-devel] [PATCH 4/7] virtio-serial: Don't copy over guest buffer to host

2011-01-19 Thread Amit Shah
When the guest writes something to a host, we copied over the entire
buffer first into the host and then processed it.  Do away with that, it
could result in a malicious guest causing a DoS on the host.

Reported-by: Paul Brook 
Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   15 ---
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index e8c2a16..6726f72 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -132,16 +132,17 @@ static void do_flush_queued_data(VirtIOSerialPort *port, 
VirtQueue *vq,
 assert(virtio_queue_ready(vq));
 
 while (!port->throttled && virtqueue_pop(vq, &elem)) {
-uint8_t *buf;
-size_t ret, buf_size;
+unsigned int i;
 
-buf_size = iov_size(elem.out_sg, elem.out_num);
-buf = qemu_malloc(buf_size);
-ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+for (i = 0; i < elem.out_num; i++) {
+size_t buf_size;
 
-port->info->have_data(port, buf, ret);
-qemu_free(buf);
+buf_size = elem.out_sg[i].iov_len;
 
+port->info->have_data(port,
+  elem.out_sg[i].iov_base,
+  buf_size);
+}
 virtqueue_push(vq, &elem, 0);
 }
 virtio_notify(vdev, vq);
-- 
1.7.3.4




[Qemu-devel] [PATCH 5/7] virtio-serial: Let virtio-serial-bus know if all data was consumed

2011-01-19 Thread Amit Shah
The have_data() API to hand off guest data to apps using virtio-serial
so far assumed all the data was consumed.  Relax this assumption.
Future commits will allow for incomplete writes.

Signed-off-by: Amit Shah 
---
 hw/virtio-console.c |4 ++--
 hw/virtio-serial.h  |7 ---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index d0b9354..62624ec 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -20,11 +20,11 @@ typedef struct VirtConsole {
 
 
 /* Callback function that's called when the guest sends us data */
-static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
+static ssize_t flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t 
len)
 {
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 
-qemu_chr_write(vcon->chr, buf, len);
+return qemu_chr_write(vcon->chr, buf, len);
 }
 
 /* Readiness of the guest to accept data on a port */
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index ff08c40..9cc0fb3 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -137,10 +137,11 @@ struct VirtIOSerialPortInfo {
 
 /*
  * Guest wrote some data to the port. This data is handed over to
- * the app via this callback.  The app is supposed to consume all
- * the data that is presented to it.
+ * the app via this callback.  The app can return a size less than
+ * 'len'.  In this case, throttling will be enabled for this port.
  */
-void (*have_data)(VirtIOSerialPort *port, const uint8_t *buf, size_t len);
+ssize_t (*have_data)(VirtIOSerialPort *port, const uint8_t *buf,
+ size_t len);
 };
 
 /* Interface to the virtio-serial bus */
-- 
1.7.3.4




[Qemu-devel] [PATCH 6/7] virtio-serial: Add support for flow control

2011-01-19 Thread Amit Shah
This commit lets apps signal an incomplete write.  When that happens,
stop sending out any more data to the app and wait for it to unthrottle
the port.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   49 +--
 hw/virtio-serial.h |   17 
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 6726f72..32938b5 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -126,24 +126,49 @@ static void discard_vq_data(VirtQueue *vq, VirtIODevice 
*vdev)
 static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq,
  VirtIODevice *vdev)
 {
-VirtQueueElement elem;
-
 assert(port);
 assert(virtio_queue_ready(vq));
 
-while (!port->throttled && virtqueue_pop(vq, &elem)) {
+while (!port->throttled) {
 unsigned int i;
 
-for (i = 0; i < elem.out_num; i++) {
-size_t buf_size;
-
-buf_size = elem.out_sg[i].iov_len;
+/* Pop an elem only if we haven't left off a previous one mid-way */
+if (!port->elem.out_num) {
+if (!virtqueue_pop(vq, &port->elem)) {
+break;
+}
+port->iov_idx = 0;
+port->iov_offset = 0;
+}
 
-port->info->have_data(port,
-  elem.out_sg[i].iov_base,
-  buf_size);
+for (i = port->iov_idx; i < port->elem.out_num; i++) {
+size_t buf_size;
+ssize_t ret;
+
+buf_size = port->elem.out_sg[i].iov_len - port->iov_offset;
+ret = port->info->have_data(port,
+port->elem.out_sg[i].iov_base
+  + port->iov_offset,
+buf_size);
+if (ret < 0 && ret != -EAGAIN) {
+/* We don't handle any other type of errors here */
+abort();
+}
+if (ret == -EAGAIN || (ret >= 0 && ret < buf_size)) {
+virtio_serial_throttle_port(port, true);
+port->iov_idx = i;
+if (ret > 0) {
+port->iov_offset += ret;
+}
+break;
+}
+port->iov_offset = 0;
 }
-virtqueue_push(vq, &elem, 0);
+if (port->throttled) {
+break;
+}
+virtqueue_push(vq, &port->elem, 0);
+port->elem.out_num = 0;
 }
 virtio_notify(vdev, vq);
 }
@@ -709,6 +734,8 @@ static int virtser_port_qdev_init(DeviceState *qdev, 
DeviceInfo *base)
 port->guest_connected = true;
 }
 
+port->elem.out_num = 0;
+
 QTAILQ_INSERT_TAIL(&port->vser->ports, port, next);
 port->ivq = port->vser->ivqs[port->id];
 port->ovq = port->vser->ovqs[port->id];
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index 9cc0fb3..a308196 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -102,6 +102,23 @@ struct VirtIOSerialPort {
  */
 uint32_t id;
 
+/*
+ * This is the elem that we pop from the virtqueue.  A slow
+ * backend that consumes guest data (e.g. the file backend for
+ * qemu chardevs) can cause the guest to block till all the output
+ * is flushed.  This isn't desired, so we keep a note of the last
+ * element popped and continue consuming it once the backend
+ * becomes writable again.
+ */
+VirtQueueElement elem;
+
+/*
+ * The index and the offset into the iov buffer that was popped in
+ * elem above.
+ */
+uint32_t iov_idx;
+uint64_t iov_offset;
+
 /* Identify if this is a port that binds with hvc in the guest */
 uint8_t is_console;
 
-- 
1.7.3.4




[Qemu-devel] [PATCH 7/7] virtio-serial: save/restore new fields in port struct

2011-01-19 Thread Amit Shah
The new fields that got added as part of not copying over the guest
buffer to the host need to be saved/restored across migration.  Do that
and bump up the version number.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   42 --
 1 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 32938b5..2ca97a9 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -527,9 +527,24 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
  * Items in struct VirtIOSerialPort.
  */
 QTAILQ_FOREACH(port, &s->ports, next) {
+uint32_t elem_popped;
+
 qemu_put_be32s(f, &port->id);
 qemu_put_byte(f, port->guest_connected);
 qemu_put_byte(f, port->host_connected);
+
+   elem_popped = 0;
+if (port->elem.out_num) {
+elem_popped = 1;
+}
+qemu_put_be32s(f, &elem_popped);
+if (elem_popped) {
+qemu_put_be32s(f, &port->iov_idx);
+qemu_put_be64s(f, &port->iov_offset);
+
+qemu_put_buffer(f, (unsigned char *)&port->elem,
+sizeof(port->elem));
+}
 }
 }
 
@@ -540,7 +555,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 uint32_t max_nr_ports, nr_active_ports, ports_map;
 unsigned int i;
 
-if (version_id > 2) {
+if (version_id > 3) {
 return -EINVAL;
 }
 
@@ -593,6 +608,29 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN,
port->host_connected);
 }
+
+if (version_id > 2) {
+uint32_t elem_popped;
+
+qemu_get_be32s(f, &elem_popped);
+if (elem_popped) {
+qemu_get_be32s(f, &port->iov_idx);
+qemu_get_be64s(f, &port->iov_offset);
+
+qemu_get_buffer(f, (unsigned char *)&port->elem,
+sizeof(port->elem));
+virtqueue_map_sg(port->elem.in_sg, port->elem.in_addr,
+ port->elem.in_num, 1);
+virtqueue_map_sg(port->elem.out_sg, port->elem.out_addr,
+ port->elem.out_num, 1);
+
+/*
+ *  Port was throttled on source machine.  Let's
+ *  unthrottle it here so data starts flowing again.
+ */
+virtio_serial_throttle_port(port, false);
+}
+}
 }
 return 0;
 }
@@ -841,7 +879,7 @@ VirtIODevice *virtio_serial_init(DeviceState *dev, uint32_t 
max_nr_ports)
  * Register for the savevm section with the virtio-console name
  * to preserve backward compat
  */
-register_savevm(dev, "virtio-console", -1, 2, virtio_serial_save,
+register_savevm(dev, "virtio-console", -1, 3, virtio_serial_save,
 virtio_serial_load, vser);
 
 return vdev;
-- 
1.7.3.4




[Qemu-devel] Missing roms/seabios/Makefile and roms/vgabios/Makefile

2011-01-19 Thread Mateusz Loskot

Hi,

Running ./configure (under MinGW/MSYS) and symlink gives up trying to
create links to non-existing Makefiles in
roms/seabios/Makefile
roms/vgabios/Makefile

Quick fix is to remove those Makefiles from the FILES list:

diff --git a/configure b/configure
index d68f862..92e2527 100755
--- a/configure
+++ b/configure
@@ -3235,7 +3235,6 @@ DIRS="$DIRS fsdev ui"
 FILES="Makefile tests/Makefile"
 FILES="$FILES tests/cris/Makefile tests/cris/.gdbinit"
 FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps"
-FILES="$FILES roms/seabios/Makefile roms/vgabios/Makefile"
 for bios_file in $source_path/pc-bios/*.bin $source_path/pc-bios/*.dtb 
$source_path/pc-bios/openbios-*; do

 FILES="$FILES pc-bios/`basename $bios_file`"
 done

Regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org



Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap controls when to start FT transaction, and provides proxy
>> functions to called from net/block devices.  While FT transaction, it
>> queues up net/block requests, and flush them when the transaction gets
>> completed.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> Signed-off-by: OHMURA Kei 
>
> One general comment: On the first glance this seems to mix block and net
> (and some other things) arbitrarily instead of having a section for
> handling all block stuff, then network, etc.
>
> Is there a specific reason for the order in which you put the functions?
> If not, maybe reordering them might improve readability.

Thanks.  I'll rework on that.

>
>> ---
>>  Makefile.target |    1 +
>>  event-tap.c     |  847 
>> +++
>>  event-tap.h     |   42 +++
>>  qemu-tool.c     |   24 ++
>>  trace-events    |    9 +
>>  5 files changed, 923 insertions(+), 0 deletions(-)
>>  create mode 100644 event-tap.c
>>  create mode 100644 event-tap.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index e15b1c4..f36cd75 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -199,6 +199,7 @@ obj-y += rwhandler.o
>>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
>>  LIBS+=-lz
>> +obj-y += event-tap.o
>>
>>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>>  QEMU_CFLAGS += $(VNC_SASL_CFLAGS)
>> diff --git a/event-tap.c b/event-tap.c
>> new file mode 100644
>> index 000..f492708
>> --- /dev/null
>> +++ b/event-tap.c
>
>> @@ -0,0 +1,847 @@
>> +/*
>> + * Event Tap functions for QEMU
>> + *
>> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "qemu-error.h"
>> +#include "block.h"
>> +#include "block_int.h"
>> +#include "ioport.h"
>> +#include "osdep.h"
>> +#include "sysemu.h"
>> +#include "hw/hw.h"
>> +#include "net.h"
>> +#include "event-tap.h"
>> +#include "trace.h"
>> +
>> +enum EVENT_TAP_STATE {
>> +    EVENT_TAP_OFF,
>> +    EVENT_TAP_ON,
>> +    EVENT_TAP_FLUSH,
>> +    EVENT_TAP_LOAD,
>> +    EVENT_TAP_REPLAY,
>> +};
>> +
>> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF;
>> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */
>
> Indeed, bdrv_aio_cancel will segfault this way.
>
> If you use dummies instead of real ACBs the only way to correctly
> implement bdrv_aio_cancel is waiting for all in-flight AIOs
> (qemu_aio_flush).

So I need to insert a new event_tap function to bdrv_aio_cancel
to do that.

>
>> +typedef struct EventTapIOport {
>> +    uint32_t address;
>> +    uint32_t data;
>> +    int      index;
>> +} EventTapIOport;
>> +
>> +#define MMIO_BUF_SIZE 8
>> +
>> +typedef struct EventTapMMIO {
>> +    uint64_t address;
>> +    uint8_t  buf[MMIO_BUF_SIZE];
>> +    int      len;
>> +} EventTapMMIO;
>> +
>> +typedef struct EventTapNetReq {
>> +    char *device_name;
>> +    int iovcnt;
>> +    struct iovec *iov;
>> +    int vlan_id;
>> +    bool vlan_needed;
>> +    bool async;
>> +    NetPacketSent *sent_cb;
>> +} EventTapNetReq;
>> +
>> +#define MAX_BLOCK_REQUEST 32
>> +
>> +typedef struct EventTapBlkReq {
>> +    char *device_name;
>> +    int num_reqs;
>> +    int num_cbs;
>> +    bool is_flush;
>> +    BlockRequest reqs[MAX_BLOCK_REQUEST];
>> +    BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST];
>> +    void *opaque[MAX_BLOCK_REQUEST];
>> +} EventTapBlkReq;
>> +
>> +#define EVENT_TAP_IOPORT (1 << 0)
>> +#define EVENT_TAP_MMIO   (1 << 1)
>> +#define EVENT_TAP_NET    (1 << 2)
>> +#define EVENT_TAP_BLK    (1 << 3)
>> +
>> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1)
>> +
>> +typedef struct EventTapLog {
>> +    int mode;
>> +    union {
>> +        EventTapIOport ioport;
>> +        EventTapMMIO mmio;
>> +    };
>> +    union {
>> +        EventTapNetReq net_req;
>> +        EventTapBlkReq blk_req;
>> +    };
>> +    QTAILQ_ENTRY(EventTapLog) node;
>> +} EventTapLog;
>> +
>> +static EventTapLog *last_event_tap;
>> +
>> +static QTAILQ_HEAD(, EventTapLog) event_list;
>> +static QTAILQ_HEAD(, EventTapLog) event_pool;
>> +
>> +static int (*event_tap_cb)(void);
>> +static QEMUBH *event_tap_bh;
>> +static VMChangeStateEntry *vmstate;
>> +
>> +static void event_tap_bh_cb(void *p)
>> +{
>> +    if (event_tap_cb) {
>> +        event_tap_cb();
>> +    }
>> +
>> +    qemu_bh_delete(event_tap_bh);
>> +    event_tap_bh = NULL;
>> +}
>> +
>> +static void event_tap_schedule_bh(void)
>> +{
>> +    trace_event_tap_ignore_bh(!!event_tap_bh);
>> +
>> +    /* if bh is already set, we ignore it for now */
>> +    if (event_tap_bh) {
>> +        return;
>> +    }
>> +
>> +    event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL);
>> +    qemu_bh_schedule(event_tap_bh);
>> +
>> +    return ;
>> +}
>> +
>> +static void event_tap_alloc_net_r

[Qemu-devel] [Bug 702885] Re: "Internal resource leak" error with ARM NEON vmull.s32 insn

2011-01-19 Thread Wolfgang Schildbach
This bug is fixed on HEAD in the qemu-meego tree (commit 
8493a687d54e542ac4eec8b2f8326415edf37ec4
A)
- Wolfgang

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/702885

Title:
  "Internal resource leak" error with ARM NEON vmull.s32 insn

Status in QEMU:
  New

Bug description:
  This bug occurs in qemu, commit
  78a59470e6bbc6e16dc4033767492649c1ae4cfd (most recent as of
  01/14/2011).

  Compile, assemble, and link the code below, with the ARM tools. (I use
  ARM C/C++ Compiler, 4.1 [Build 462]).

  armasm --cpu Cortex-A8 --licensing=flex foo.s
  armcc --cpu Cortex-A8 --licensing=flex -o main -L--sysv main.c foo.o

  Execute on qemu-arm and observe an "Internal resource leak" message.

  > qemu-arm main
  Internal resource leak before 818c

  - Wolfgang

  main.c:
  int main(void)
  {
void foofunc(void);
foofunc();

return 0 ;
  }

  
  foo.s:
  ARM
  REQUIRE8
  PRESERVE8
  AREA code, CODE, READONLY, ALIGN=2
 
  foofunc PROC
  VMULL.S32 q1, d2, d4
  MOV pc, lr

  ENDP

  EXPORT foofunc [CODE]
  END





Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori  writes:

> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>
>>> The device model topology is 100% a hidden architectural detail.
>>>  
>> This is true for the sysbus, it is obviously not the case for PCI and
>> similarly discoverable buses. There we have a guest-explorable topology
>> that is currently equivalent to the the qdev layout.
>>
>
> But we also don't do PCI passthrough so we really haven't even
> explored how that maps in qdev.  I don't know if qemu-kvm has
> attempted to qdev-ify it.
>
 Management and analysis tools must be able to traverse the system buses
 and find guest devices this way.

>>> We need to provide a compatible interface to the guest.  If you agree
>>> with my above statements, then you'll also agree that we can do this
>>> without keeping the device model topology stable.
>>>
>>> But we also need to provide a compatible interface to management tools.
>>> Exposing the device model topology as a compatible interface
>>> artificially limits us.  It's far better to provide higher level
>>> supported interfaces to give us the flexibility to change the device
>>> model as we need to.
>>>  
>> How do you want to change qdev to keep the guest and management tool
>> view stable while branching off kvm sub-buses?
>
> The qdev device model is not a stable interface.  I think that's been
> clear from the very beginning.
>
>>   Please propose such
>> extensions so that they can be discussed. IIUC, that would be second
>> relation between qdev and qbus instances besides the physical topology.
>> What further use cases (besides passing kvm_state around) do you have in
>> mind?
>>
>
> The -device interface is a stable interface.  Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device.  It's implied in the interface.

Now I'm confused.  Isn't "-device FOO,bus=pci.0" specifying the PCI bus?

[...]



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Gerd Hoffmann  writes:

> On 01/18/11 18:09, Anthony Liguori wrote:
>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>
 The device model topology is 100% a hidden architectural detail.
>>> This is true for the sysbus, it is obviously not the case for PCI and
>>> similarly discoverable buses. There we have a guest-explorable topology
>>> that is currently equivalent to the the qdev layout.
>>
>> But we also don't do PCI passthrough so we really haven't even explored
>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>> qdev-ify it.
>
> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>
> BTW: is there any reason why (vfio-based) pci passthrough couldn't
> work with tcg?
>
>> The -device interface is a stable interface. Right now, you don't
>> specify any type of identifier of the pci bus when you create a PCI
>> device. It's implied in the interface.
>
> Wrong.  You can specify the bus you want attach the device to via
> bus=.  This is true for *every* device, including all pci
> devices. If unspecified qdev uses the first bus it finds.
>
> As long as there is a single pci bus only there is simply no need to
> specify it, thats why nobody does that today.  Once q35 finally
> arrives this will change of course.

As far as I know, libvirt does it already.



Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>> event-tap function is called only when it is on, and requests sent
>> from device emulators.
>>
>> Signed-off-by: Yoshiaki Tamura 
>> ---
>>  block.c |   11 +++
>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index ff2795b..85bd8b8 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "module.h"
>>  #include "qemu-objects.h"
>> +#include "event-tap.h"
>>
>>  #ifdef CONFIG_BSD
>>  #include 
>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
>> *bs, int64_t sector_num,
>>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>>          return NULL;
>>
>> +    if (bs->device_name && event_tap_is_on()) {
>> +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>> +                                         cb, opaque);
>> +    }
>> +
>>      if (bs->dirty_bitmap) {
>>          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>                                           opaque);
>
> Just noticed the context here... Does this patch break block migration
> when event-tap is on?

I don't think so.  event-tap will call bdrv_aio_writev() upon
flushing requests and it shouldn't affect block-migration.  The
block written after the synchronization should be marked as dirty
and should be sent in the next round.  Am I missing the point?

> Another question that came to my mind is if we really hook everything we
> need. I think we'll need to have a hook in bdrv_flush as well. I don't
> know if you do hook qemu_aio_flush and friends -  does a call cause
> event-tap to flush its queue? If not, a call to qemu_aio_flush might
> hang qemu because it's waiting for requests to complete which are
> actually stuck in the event-tap queue.

No it doesn't queue at event-tap.  Marcelo pointed that we should
hook bdrv_aio_flush to avoid requests inversion, that made sense
to me.  Do we need to hook bdrv_flush for that same reason?  If
we hook bdrv_flush and qemu_aio_flush, we're going loop forever
because the synchronization code is calling vm_stop that call
bdrv_flush_all and qemu_aio_flush.

Yoshi

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [Qemu-devel] [Bug 702885] Re: "Internal resource leak" error with ARM NEON vmull.s32 insn

2011-01-19 Thread Peter Maydell
On 19 January 2011 12:42, Wolfgang Schildbach <702...@bugs.launchpad.net> wrote:
> This bug is fixed on HEAD in the qemu-meego tree (commit 
> 8493a687d54e542ac4eec8b2f8326415edf37ec4
> A)

Note that the qemu-meego tree disables these warnings using
an #ifdef, so even if the message is not printed we might still
not be handling the temporaries correctly (although there are a
lot of fixes for mishandled temps in that tree).

-- PMM



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Markus Armbruster
Anthony Liguori  writes:

> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>
>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>  
 On 2011-01-12 11:31, Jan Kiszka wrote:


> Am 12.01.2011 11:22, Avi Kivity wrote:
>
>  
>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>
>>
>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>> The devices can get at KVMState through the BusState.
>>>
>>>  
>> There is no kvm bus in a PC (I looked).  We're bending the device model
>> here because a device is implemented in the kernel and not in
>> userspace.  An implementation detail is magnified beyond all proportions.
>>
>> An ioapic that is implemented by kvm lives in exactly the same place
>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>> it elsewhere, not through creating imaginary buses that don't exist.
>>
>>
>>
> Exactly.
>
> So we can either "infect" the whole device tree with kvm (or maybe a
> more generic accelerator structure that also deals with Xen) or we need
> to pull the reference inside the device's init function from some global
> service (kvm_get_state).
>
>  
 Note that this topic is still waiting for good suggestions, specifically
 from those who believe in kvm_state references :). This is not only
 blocking kvmstate merge but will affect KVM irqchips as well.

 It boils down to how we reasonably pass a kvm_state reference from
 machine init code to a sysbus device. I'm probably biased, but I don't
 see any way that does not work against the idea of confining access to
 kvm_state or breaks device instantiation from the command line or a
 config file.


>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>> It can get to kvm_state through that bus.
>>>
>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>> argument should not be an issue.
>>>
>>>  
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>>
>
> The bus topology reflects how I/O flows in and out of a device.  We do
> not model a perfect PC bus architecture and I don't think we ever
> intend to.  Instead, we model a functional architecture.
>
> I/O from an assigned device does not flow through the emulated PCI
> bus.  Therefore, it does not belong on the emulated PCI bus.
>
> Assigned devices need to interact with the emulated PCI bus, but they
> shouldn't be children of it.

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?



[Qemu-devel] Re: [PATCH V5 2/4] nmi: make cpu-index argument optional

2011-01-19 Thread Luiz Capitulino

Sorry for the long delay on this one, in general looks good, I have just
a few small comments.

On Mon, 10 Jan 2011 17:27:51 +0800
Lai Jiangshan  wrote:

> When the argument "cpu-index" is not given,
> then "nmi" command will inject NMI on all CPUs.

Please, state that we're changing the human monitor behavior on this.

> This simulate the nmi button on physical machine.
> 
> Thanks to Markus Armbruster for correcting the logic
> detecting "cpu-index" is given or not.
> 
> Signed-off-by:  Lai Jiangshan 
> ---
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 99b96a8..a49fcd4 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -721,9 +721,9 @@ ETEXI
>  #if defined(TARGET_I386)
>  {
>  .name   = "nmi",
> -.args_type  = "cpu-index:i",
> -.params = "cpu",
> -.help   = "inject an NMI on the given CPU",
> +.args_type  = "cpu-index:i?",
> +.params = "[cpu]",
> +.help   = "inject an NMI on all CPUs or the given CPU",

IMO, it's better to be a bit more clear, something like: "Inject an NMI on all
CPUs if no argument is given, otherwise inject it on the specified CPU".

>  .mhandler.cmd = do_inject_nmi,
>  },
>  #endif
> diff --git a/monitor.c b/monitor.c
> index fd18887..952f67f 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2520,8 +2520,15 @@ static void do_wav_capture(Monitor *mon, const QDict 
> *qdict)
>  static void do_inject_nmi(Monitor *mon, const QDict *qdict)
>  {
>  CPUState *env;
> -int cpu_index = qdict_get_int(qdict, "cpu-index");
> +int cpu_index;
>  
> +if (!qdict_get(qdict, "cpu-index")) {

Please, use qdict_haskey().

> +for (env = first_cpu; env != NULL; env = env->next_cpu)
> +cpu_interrupt(env, CPU_INTERRUPT_NMI);
> +return;
> +}
> +
> +cpu_index = qdict_get_int(qdict, "cpu-index");
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  if (env->cpu_index == cpu_index) {
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> 




[Qemu-devel] Re: [PATCH V5 3/4] qmp, nmi: convert do_inject_nmi() to QObject

2011-01-19 Thread Luiz Capitulino
On Mon, 10 Jan 2011 17:28:14 +0800
Lai Jiangshan  wrote:

> Make we can inject NMI via qemu-monitor-protocol.
> We use "inject-nmi" for the qmp command name, the meaning is clearer.
> 
> Signed-off-by:  Lai Jiangshan 
> ---
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index a49fcd4..4db413d 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -724,7 +724,8 @@ ETEXI
>  .args_type  = "cpu-index:i?",
>  .params = "[cpu]",
>  .help   = "inject an NMI on all CPUs or the given CPU",
> -.mhandler.cmd = do_inject_nmi,
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_inject_nmi,
>  },
>  #endif
>  STEXI
> diff --git a/monitor.c b/monitor.c
> index 952f67f..1bee840 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2517,7 +2517,7 @@ static void do_wav_capture(Monitor *mon, const QDict 
> *qdict)
>  #endif
>  
>  #if defined(TARGET_I386)
> -static void do_inject_nmi(Monitor *mon, const QDict *qdict)
> +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject 
> **ret_data)
>  {
>  CPUState *env;
>  int cpu_index;
> @@ -2525,15 +2525,17 @@ static void do_inject_nmi(Monitor *mon, const QDict 
> *qdict)
>  if (!qdict_get(qdict, "cpu-index")) {
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> -return;
> +return 0;
>  }
>  
>  cpu_index = qdict_get_int(qdict, "cpu-index");
>  for (env = first_cpu; env != NULL; env = env->next_cpu)
>  if (env->cpu_index == cpu_index) {
>  cpu_interrupt(env, CPU_INTERRUPT_NMI);
> -break;
> +return 0;
>  }
> +
> +return -1;
>  }
>  #endif
>  
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 56c4d8b..c2d619c 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -429,6 +429,33 @@ Example:
>  
>  EQMP
>  
> +#if defined(TARGET_I386)
> +{
> +.name   = "inject_nmi",

Please use an hyphen.

> +.args_type  = "cpu-index:i?",
> +.params = "[cpu]",
> +.help   = "inject an NMI on all CPUs or the given CPU",
> +.user_print = monitor_user_noop,
> +.mhandler.cmd_new = do_inject_nmi,
> +},
> +#endif
> +SQMP
> +inject_nmi
> +--
> +
> +Inject an NMI on the given CPU (x86 only).

Please, explain that we can also inject on all CPUs.

> +
> +Arguments:
> +
> +- "cpu_index": the index of the CPU to be injected NMI (json-int)

It's actually "cpu-index", and you should write "(json-int, optional)".

> +
> +Example:
> +
> +-> { "execute": "inject_nmi", "arguments": { "cpu-index": 0 } }

Hyphen.

> +<- { "return": {} }
> +
> +EQMP
> +
>  {
>  .name   = "migrate",
>  .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
> 




Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 14:04, schrieb Yoshiaki Tamura:
>>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
>>> +{
>>> +BlockDriverState *bs;
>>> +
>>> +bs = bdrv_find(blk_req->device_name);
>>
>> Please store the BlockDriverState in blk_req. This code loops over all
>> block devices and does a string comparison - and that for each request.
>> You can also save the qemu_strdup() when creating the request.
>>
>> In the few places where you really need the device name (might be the
>> case for load/save, I'm not sure), you can still get it from the
>> BlockDriverState.
> 
> I would do so for the primary side.  Although we haven't
> implemented yet, we want to replay block requests from block
> layer on the secondary side, and need device name to restore
> BlockDriverState.

Hm, I see. I'm not happy about it, but I don't have a suggestion right
away how to avoid it.

>>
>>> +
>>> +if (blk_req->is_flush) {
>>> +bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>>
>> You need to handle errors. If bdrv_aio_flush returns NULL, call the
>> callback with -EIO.
> 
> I'll do so.
> 
>>
>>> +return;
>>> +}
>>> +
>>> +bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
>>> +blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
>>> +blk_req->reqs[0].opaque);
>>
>> Same here.
>>
>>> +bdrv_flush(bs);
>>
>> This looks really strange. What is this supposed to do?
>>
>> One point is that you write it immediately after bdrv_aio_write, so you
>> get an fsync for which you don't know if it includes the current write
>> request or if it doesn't. Which data do you want to get flushed to the disk?
> 
> I was expecting to flush the aio request that was just initiated.
> Am I misunderstanding the function?

Seems so. The function names don't use really clear terminology either,
so you're not the first one to fall in this trap. Basically we have:

* qemu_aio_flush() waits for all AIO requests to complete. I think you
wanted to have exactly this, but only for a single block device. Such a
function doesn't exist yet.

* bdrv_flush() makes sure that all successfully completed requests are
written to disk (by calling fsync)

* bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
the fsync in the thread pool

>> The other thing is that you introduce a bdrv_flush for each request,
>> basically forcing everyone to something very similar to writethrough
>> mode. I'm sure this will have a big impact on performance.
> 
> The reason is to avoid inversion of queued requests.  Although
> processing one-by-one is heavy, wouldn't having requests flushed
> to disk out of order break the disk image?

No, that's fine. If a guest issues two requests at the same time, they
may complete in any order. You just need to make sure that you don't
call the completion callback before the request really has completed.

I'm just starting to wonder if the guest won't timeout the requests if
they are queued for too long. Even more, with IDE, it can only handle
one request at a time, so not completing requests doesn't sound like a
good idea at all. In what intervals is the event-tap queue flushed?

On the other hand, if you complete before actually writing out, you
don't get timeouts, but you signal success to the guest when the request
could still fail. What would you do in this case? With a writeback cache
mode we're fine, we can just fail the next flush (until then nothing is
guaranteed to be on disk and order doesn't matter either), but with
cache=writethrough we're in serious trouble.

Have you thought about this problem? Maybe we end up having to flush the
event-tap queue for each single write in writethrough mode.

>> Additionally, error handling is missing.
> 
> I looked at the codes using bdrv_flush and realized some of them
> doesn't handle errors, but scsi-disk.c does.  Should everyone
> handle errors or depends on the usage?

I added the return code only recently, it was a void function
previously. Probably some error handling should be added to all of them.

Kevin



[Qemu-devel] [PATCH 2/5] qemu and qemu-xen: fix segfault on con_disconnect

2011-01-19 Thread Stefano Stabellini
The test on xendev->gnttabdev in con_disconnect is wrong: what
differentiates the consoles is the dev number.

Signed-off-by: Stefano Stabellini 

diff --git a/hw/xen_console.c b/hw/xen_console.c
index d7099c4..0a2374c 100644
--- a/hw/xen_console.c
+++ b/hw/xen_console.c
@@ -258,7 +258,7 @@ static void con_disconnect(struct XenDevice *xendev)
 xen_be_unbind_evtchn(&con->xendev);
 
 if (con->sring) {
-if (!xendev->gnttabdev)
+if (!xendev->dev)
munmap(con->sring, XC_PAGE_SIZE);
 else
 xc_gnttab_munmap(xendev->gnttabdev, con->sring, 1);



Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
> 2011/1/19 Kevin Wolf :
>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
>>> event-tap function is called only when it is on, and requests sent
>>> from device emulators.
>>>
>>> Signed-off-by: Yoshiaki Tamura 
>>> ---
>>>  block.c |   11 +++
>>>  1 files changed, 11 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index ff2795b..85bd8b8 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -28,6 +28,7 @@
>>>  #include "block_int.h"
>>>  #include "module.h"
>>>  #include "qemu-objects.h"
>>> +#include "event-tap.h"
>>>
>>>  #ifdef CONFIG_BSD
>>>  #include 
>>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
>>> *bs, int64_t sector_num,
>>>  if (bdrv_check_request(bs, sector_num, nb_sectors))
>>>  return NULL;
>>>
>>> +if (bs->device_name && event_tap_is_on()) {
>>> +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>>> + cb, opaque);
>>> +}
>>> +
>>>  if (bs->dirty_bitmap) {
>>>  blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
>>>   opaque);
>>
>> Just noticed the context here... Does this patch break block migration
>> when event-tap is on?
> 
> I don't think so.  event-tap will call bdrv_aio_writev() upon
> flushing requests and it shouldn't affect block-migration.  The
> block written after the synchronization should be marked as dirty
> and should be sent in the next round.  Am I missing the point?

No, that makes sense. I don't have a complete understanding of the whole
series yet, so there may be well more misunderstandings on my side.

>> Another question that came to my mind is if we really hook everything we
>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>> know if you do hook qemu_aio_flush and friends -  does a call cause
>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>> hang qemu because it's waiting for requests to complete which are
>> actually stuck in the event-tap queue.
> 
> No it doesn't queue at event-tap.  Marcelo pointed that we should
> hook bdrv_aio_flush to avoid requests inversion, that made sense
> to me.  Do we need to hook bdrv_flush for that same reason?  If

bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
general it seems likely that we need to do the same.

> we hook bdrv_flush and qemu_aio_flush, we're going loop forever
> because the synchronization code is calling vm_stop that call
> bdrv_flush_all and qemu_aio_flush.

qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
we would loop forever. It just waits for AIO requests to complete.

I just looked up the code and I think the situation is a bit different
than I thought originally: qemu_aio_flush waits only for completion of
requests which belong to a driver that has registered using
qemu_aio_set_fd_handler. So this means that AIO requests queued in
event-tap are not considered in-flight requests and we won't get stuck
in a loop. Maybe we have no problem in fact. :-)

On the other hand, e.g. migration relies on the fact that after a
qemu_aio_flush, all AIO requests that the device model has submitted are
completed. I think event-tap must take care that the requests which are
queued are not forgotten to be migrated. (Maybe the code already
considers this, I'm just writing down what comes to my mind...)

Kevin



[Qemu-devel] [Bug 704904] [NEW] No rule to make target ../libhw32/virtio.o

2011-01-19 Thread Mateusz Łoskot
Public bug reported:

Building qemu from current git using 32-bit MinGW (installer from
2010-11-07) on Windows Vista (64-bit) fails with the following error:

make[1]: *** No rule to make target `../libhw32/virtio.o', needed by 
`qemu.exe'.  Stop.
make: *** [subdir-i386-softmmu] Error 2

Here is my ./configure summary:

###
Mateuszl@dog /g/src/qemu/_git/master
$ ./configure
warning: proceeding without pkg-config
Install prefixc:/Program Files/Qemu
BIOS directoryc:/Program Files/Qemu
binary directory  c:/Program Files/Qemu
config directory  c:/Program Files/Qemu
Source path   /g/src/qemu/_git/master
C compilergcc
Host C compiler   gcc
CFLAGS-O2 -g
QEMU_CFLAGS   -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN 
-DWINVER=0x501 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 
-D_LARGEFILE_SOU
RCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wendif-labels 
-Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing  
-fstack-protector-all -Wmissin
g-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k 
-Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wty
pe-limits
LDFLAGS   -Wl,--nxcompat -Wl,--no-seh -Wl,--dynamicbase 
-Wl,--warn-common -m32 -g
make  make
install   install
host CPU  i386
host big endian   no
target list   i386-softmmu x86_64-softmmu arm-softmmu cris-softmmu 
m68k-softmmu microblaze-softmmu mips-softmmu mipsel-softmmu mips64-softmmu 
mips64el-softm
mu ppc-softmmu ppcemb-softmmu ppc64-softmmu sh4-softmmu sh4eb-softmmu 
sparc-softmmu sparc64-softmmu
tcg debug enabled no
Mon debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
-Werror enabled   no
SDL support   yes
curses supportno
curl support  no
check support no
mingw32 support   yes
Audio drivers winwave
Extra audio cards ac97 es1370 sb16 hda
Block whitelist
Mixer emulation   no
VNC TLS support   no
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
VNC threadno
xen support   no
brlapi supportno
bluez  supportno
Documentation no
NPTL support  no
GUEST_BASEyes
PIE user targets  no
vde support   no
IO thread no
Linux AIO support no
ATTR/XATTR support no
Install blobs yes
KVM support   no
fdt support   no
preadv supportno
fdatasync no
madvise   no
posix_madvise no
uuid support  no
vhost-net support no
Trace backend nop
Trace output file trace-
spice support no
rbd support   no
xfsctl supportno
###

and here is full make output until the error:

###
GEN   qemu-img-cmds.h
  GEN   config-host.h
  GEN   trace.h
  GEN   qemu-options.def
  CCqemu-img.o
qemu-img.c: In function 'img_convert':
qemu-img.c:826:27: warning: format '%I64d' expects type 'int', but argument 2 
has type 'int64_t'
  CCqemu-tool.o
  CCqemu-error.o
  CCosdep.o
  CCoslib-win32.o
  GEN   trace.c
  CCtrace.o
  CCcutils.o
  CCcache-utils.o
  CCqemu-malloc.o
  CCqemu-option.o
  CCmodule.o
  CCnbd.o
  CCblock.o
block.c: In function 'bdrv_stats_iter':
block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 3 has 
type 'int64_t'
block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 4 has 
type 'int64_t'
block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 5 has 
type 'int64_t'
block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 6 has 
type 'int64_t'
block.c: In function 'bdrv_info_stats_bs':
block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 2 has 
type 'uint64_t'
block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 3 has 
type 'uint64_t'
block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 4 has 
type 'uint64_t'
block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 5 has 
type 'uint64_t'
block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 6 has 
type 'long long unsigned int'
  CCaio.o
  CCaes.o
  CCqemu-config.o
  CCblock/raw.o
  CCblock/cow.o
  CCblock/qcow.o
  CCblock/vdi.o
block/vdi.c: In function 'uuid_unparse':
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: unknown conversion type character 'h' in format
block/vdi.c:135:13: warning: un

RE: [Qemu-devel] [Bug 702885] Re: "Internal resource leak" error withARM NEON vmull.s32 insn

2011-01-19 Thread Schildbach, Wolfgang
> From: qemu-devel-bounces+wschi=dolby@nongnu.org 
> [mailto:qemu-devel-bounces+wschi=dolby@nongnu.org] On 
> Behalf Of Peter Maydell
> Sent: Wednesday, January 19, 2011 5:16 AM
> To: Bug 702885
> Cc: qemu-devel@nongnu.org
> Subject: Re: [Qemu-devel] [Bug 702885] Re: "Internal resource 
> leak" error withARM NEON vmull.s32 insn
> 
> On 19 January 2011 12:42, Wolfgang Schildbach 
> <702...@bugs.launchpad.net> wrote:
> > This bug is fixed on HEAD in the qemu-meego tree (commit 
> > 8493a687d54e542ac4eec8b2f8326415edf37ec4
> > A)
> 
> Note that the qemu-meego tree disables these warnings using 
> an #ifdef, so even if the message is not printed we might 
> still not be handling the temporaries correctly (although 
> there are a lot of fixes for mishandled temps in that tree).
> 
> -- PMM

Just when I was getting my hopes up :-)

Indeed, I just checked with the meego tree that qemu still segfaults one
of my testcases. I'll try to isolate the case such that I can document
it on launchpad.

- Wolfgang



Re: [Qemu-devel] [PATCH] target-arm: Fix garbage collection of temporaries in Neon emulation.

2011-01-19 Thread Christophe Lyon
Here is an updated patch which will hopefully not be mangled by my mailer.

Fix garbage collection of temporaries in Neon emulation.


Signed-off-by: Christophe Lyon 
---
 target-arm/translate.c |   18 +-
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 57664bc..b3e3d70 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -4176,6 +4176,13 @@ static inline void gen_neon_mull(TCGv_i64 dest, TCGv a, 
TCGv b, int size, int u)
 break;
 default: abort();
 }
+
+/* gen_helper_neon_mull_[su]{8|16} do not free their parameters.
+   Don't forget to clean them now.  */
+if (size < 2) {
+  dead_tmp(a);
+  dead_tmp(b);
+}
 }
 
 /* Translate a NEON data processing instruction.  Return nonzero if the
@@ -4840,7 +4847,7 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)
 if (size == 3) {
 tcg_temp_free_i64(tmp64);
 } else {
-dead_tmp(tmp2);
+tcg_temp_free_i32(tmp2);
 }
 } else if (op == 10) {
 /* VSHLL */
@@ -5076,8 +5083,6 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)
 case 8: case 9: case 10: case 11: case 12: case 13:
 /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */
 gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-dead_tmp(tmp2);
-dead_tmp(tmp);
 break;
 case 14: /* Polynomial VMULL */
 cpu_abort(env, "Polynomial VMULL not implemented");
@@ -5228,6 +5233,10 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)
 return 1;
 
 tmp2 = neon_get_scalar(size, rm);
+/* We need a copy of tmp2 because gen_neon_mull
+ * deletes it during pass 0.  */
+tmp4 = new_tmp();
+tcg_gen_mov_i32(tmp4, tmp2);
 tmp3 = neon_load_reg(rn, 1);
 
 for (pass = 0; pass < 2; pass++) {
@@ -5235,9 +5244,9 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)
 tmp = neon_load_reg(rn, 0);
 } else {
 tmp = tmp3;
+tmp2 = tmp4;
 }
 gen_neon_mull(cpu_V0, tmp, tmp2, size, u);
-dead_tmp(tmp);
 if (op == 6 || op == 7) {
 gen_neon_negl(cpu_V0, size);
 }
@@ -5264,7 +5273,6 @@ static int disas_neon_data_insn(CPUState * env, 
DisasContext *s, uint32_t insn)
 neon_store_reg64(cpu_V0, rd + pass);
 }
 
-dead_tmp(tmp2);
 
 break;
 default: /* 14 and 15 are RESERVED */
-- 
1.7.2.3




Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Chunqiang Tang
> > Doing both fault injection and verification together introduces some 
> > subtlety. For example, even under the random failure mode, two disk 
writes 
> > triggered by one VM-issued write must either fail together or succeed 
> > together. Otherwise, the truth image and the test image will diverge 
and 
> > verification won't succeed. Currently, qemu-test carefully works with 
the 
> > 'sim' driver to guarantee those conditions. Those conditions need be 
> > retained after code restructure. 
> 
> If the real backend is a host system file or device, and AIO or
> multi-threaded writes are used, you can't depend on two parallel disk
> writes (triggered by one VM-issued write) failing together or
> succeeding together.  All you can do is look at the error code after
> each operation completes, and use it to prevent issuing later
> operations.  You can't stop the other parallel operations that are
> already in progress.
> 
> Is that an issue in your design assumptions?

Your description of the problem is accurate, i.e., "if  AIO or 
multi-threaded writes are used, you can't stop the other parallel 
operations that are already in progress." As a result, a naive extension 
of blkverify to test concurrent requests would not work. The simulated 
block driver (block/sim.c) in the FVD patch uses neither AIO or nor 
multi-threaded I/O as the backend. It instead uses a 'simulated' backend, 
which allows a full control of I/O order and callback order, and can 
enforce that two parallel disk writes (triggered by one VM-issued write) 
either failing together or succeeding together, and some other properties 
as well, which makes testing more comprehensive and debugging easier. 



[Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB

2011-01-19 Thread Pierre Riteau
b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
value of bdrv_write and aborts migration when it fails. However, if the
size of the block device to migrate is not a multiple of BLOCK_SIZE
(currently 1 MB), the last bdrv_write will fail with -EIO.

Fixed by calling bdrv_write with the correct size of the last block.
---
 block-migration.c |   16 +++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 1475325..eeb9c62 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 int64_t addr;
 BlockDriverState *bs;
 uint8_t *buf;
+int64_t total_sectors;
+int nr_sectors;
 
 do {
 addr = qemu_get_be64(f);
@@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 return -EINVAL;
 }
 
+total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
+if (total_sectors <= 0) {
+fprintf(stderr, "Error getting length of block device %s\n", 
device_name);
+return -EINVAL;
+}
+
+if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
+nr_sectors = total_sectors - addr;
+} else {
+nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
+}
+
 buf = qemu_malloc(BLOCK_SIZE);
 
 qemu_get_buffer(f, buf, BLOCK_SIZE);
-ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
+ret = bdrv_write(bs, addr, buf, nr_sectors);
 
 qemu_free(buf);
 if (ret < 0) {
-- 
1.7.3.5




Re: [Qemu-devel] [PULL] piix, pci, qdev

2011-01-19 Thread Christoph Hellwig
> >   pci: fix migration path for devices behind bridges

This patch breaks starting qemu for me on 32-bit x86 with the following assert:

qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion
`!se->compat || se->instance_id == 0' failed.

my qemu command line is:

/opt/qemu/bin/qemu-system-x86_64 \
-m 1500 \
-enable-kvm \
-drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \
-drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \
-drive if=none,file=/home/test1.img,cache=none,id=scratch \
-device virtio-blk-pci,drive=root \
-device virtio-blk-pci,drive=test \
-device virtio-blk-pci,drive=scratch \
-append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \
-monitor tcp:127.0.0.1:31337,server,nowait \
-kernel arch/x86/boot/bzImage \
-nographic

btw, I can't find a patch for that commit anywhere in the qemu-devel
archives.  Did it get reviewed anywhere at all?



Re: [Qemu-devel] [PULL] piix, pci, qdev

2011-01-19 Thread Michael S. Tsirkin
On Wed, Jan 19, 2011 at 04:14:48PM +0100, Christoph Hellwig wrote:
> > >   pci: fix migration path for devices behind bridges
> 
> This patch breaks starting qemu for me on 32-bit x86 with the following 
> assert:
> 
> qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion
> `!se->compat || se->instance_id == 0' failed.
> 
> my qemu command line is:
> 
> /opt/qemu/bin/qemu-system-x86_64 \
>   -m 1500 \
>   -enable-kvm \
>   -drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \
>   -drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \
>   -drive if=none,file=/home/test1.img,cache=none,id=scratch \
>   -device virtio-blk-pci,drive=root \
>   -device virtio-blk-pci,drive=test \
>   -device virtio-blk-pci,drive=scratch \
>   -append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \
>   -monitor tcp:127.0.0.1:31337,server,nowait \
>   -kernel arch/x86/boot/bzImage \
>   -nographic

I'll try that. generally qemu works on 32 bit for me but I only
used 2 virtio devices: block and net. will try with 3.

> 
> btw, I can't find a patch for that commit anywhere in the qemu-devel
> archives.  Did it get reviewed anywhere at all?

message ids:

20101113211054.ga23...@redhat.com
20101224084529.ga23...@redhat.com

-- 
MST



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Chunqiang Tang
> >> Moreover, using a host file system not only adds overhead, but
> >> also introduces data integrity issues. Specifically, if I/Os uses 
O_DSYNC,
> >> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data
> >> integrity in the event of a host crash. See
> >> http://lwn.net/Articles/348739/ .
> >
> > You have the same issue with O_DIRECT when using a raw disk device
> > too.  That is, O_DIRECT on a raw device does not guarantee integrity
> > in the event of a host crash either, for mostly the same reasons.
> 
> QEMU has semantics that use O_DIRECT safely; there is no issue here.
> When a drive is added with cache=none, QEMU not only uses O_DIRECT but
> also advertises an enabled write cache to the guest.
> 
> The guest *must* flush the cache when it wants to ensure data is
> stable.  In the event of a host crash, all, some, or none of the I/O
> since the last flush may have made it to disk.  Each of these
> possibilities is fair game since the guest may only depend on writes
> being on disk if they completed and a successful flush was issued
> afterwards.

Thank both of you for the explanation, which is very helpful to me. With 
FVD's capability of eliminating the host file system and storing the image 
on a logical volume, then perhaps we can always use O_DSYNC, because there 
is little (or no?) LVM metadata that needs a flush on every write and 
hence O_DSYNC  does not add overhead? I am not certain on this, and need 
help for confirmation. If this is true, the guest does not need to flush 
the cache. 



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Christoph Hellwig
On Wed, Jan 19, 2011 at 10:17:47AM -0500, Chunqiang Tang wrote:
> Thank both of you for the explanation, which is very helpful to me. With 
> FVD's capability of eliminating the host file system and storing the image 
> on a logical volume, then perhaps we can always use O_DSYNC, because there 
> is little (or no?) LVM metadata that needs a flush on every write and 
> hence O_DSYNC  does not add overhead? I am not certain on this, and need 
> help for confirmation. If this is true, the guest does not need to flush 
> the cache. 

O_DSYNC flushes the volatile write cache of the disk on every write,
which can be very ineffienct.  In addition to that image formats really
should obey the configurable caching settings qemu has, they exist for
a reason and should be handled uniformly over different image formats
and protocol drivers.



Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Christoph Hellwig
On Mon, Jan 17, 2011 at 06:36:15PM +0100, Kevin Wolf wrote:
> > +.args_type  = "id:s,size:l",
> 
> size should be 'o' instead of 'l'. The latter may be too small on 32 bit
> hosts and doesn't support convenient suffixes:

o actually fails for 2GB or more for me:

(qemu) resize scratch 2047 
resize scratch 2047 
(qemu) 
(qemu) resize scrarch 2048
resize scrarch 2048
invalid size

for l these worked fine.




Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 16:35, schrieb Christoph Hellwig:
> On Mon, Jan 17, 2011 at 06:36:15PM +0100, Kevin Wolf wrote:
>>> +.args_type  = "id:s,size:l",
>>
>> size should be 'o' instead of 'l'. The latter may be too small on 32 bit
>> hosts and doesn't support convenient suffixes:
> 
> o actually fails for 2GB or more for me:
> 
> (qemu) resize scratch 2047 
> resize scratch 2047 
> (qemu) 
> (qemu) resize scrarch 2048
> resize scrarch 2048
> invalid size
> 
> for l these worked fine.

Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32
bit host as well. Though I assume that you use a 64 bit host, and I
can't really see what's the problem there...

Kevin



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Christoph Hellwig
On Fri, Jan 14, 2011 at 03:56:00PM -0500, Chunqiang Tang wrote:
> P2) Overhead of storing an image on a host file system. Specifically, a 
> RAW image stored on ext3 is 50-63% slower than a RAW image stored on a raw 
> partition.

Sorry, benchmarking this against ext3 really doesn't matter.  Benchmark
it against xfs or ext4 with a preallocated image (fallocate or dd).

> For P1), I uses the term compact image instead of sparse image, because a 
> RAW image stored as a sparse file in ext3 is a sparse image, but is not a 
> compact image. A compact image stores data in such a way that the file 
> size of the image file is smaller than the size of the virtual disk 
> perceived by the VM. QCOW2 is a compact image. The disadvantage of a 
> compact image is that the data layout perceived by the guest OS differs 
> from the actual layout on the physical disk, which defeats many 
> optimizations in guest file systems.

It's something filesystems have to deal with.  Real storage is getting
increasingly virtualized.  While this didn't matter for the real high
end storage which has been doing this for a long time it's getting more
and more exposed to the filesystem.  That includes LVM layouts and
thinly provisioned disk arrays, which are getting increasingly popular.
That doesn't matter the 64k (or until recently 4k) cluster size in qcow2
is a good idea, we'd want at least a magnitude or two larger extents
to perform well, but it means filesystems really do have to cope with
it.

> For P2), using a host file system is inefficient, because 1) historically 
> file systems are optimized for small files rather than large images,

I'm not sure what hole you're pulling off this bullshit, but this is
absolutely not correct.  Since the damn of time you have filesystems
optimized for small files, for larger or really large files, or trying
to deal with a tradeoff inbetween.

> 2) certain functions of a host file system are simply redundant with 
> respect to the function of a compact image, e.g., performing storage 
> allocation. Moreover, using a host file system not only adds overhead, but 
> also introduces data integrity issues.

I/O into fully preallocated files uses exactly the same codepath as
doing I/O to the block device, except for a identify logical to physical
block mapping in the block device and a non-trivial one in the
filesysgem.  Note that the block mapping is cached and does not affect
the performance.  I've published the numbers for qemu in the various
caching modes and all major filesystems a while ago, so I'm not making
this up.


> Specifically, if I/Os uses O_DSYNC, 
> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data 
> integrity in the event of a host crash. See 
> http://lwn.net/Articles/348739/ . 

I/O to the block devices does not guarantee data integrity without
O_DSYNC either.

> Storage over-commit means that, e.g., a 100GB physical disk can be used to 
> host 10 VMs, each with a 20GB virtual disk.

The current storage industry buzz word for that is thin provisioning.




Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Christoph Hellwig
On Wed, Jan 19, 2011 at 04:49:04PM +0100, Kevin Wolf wrote:
> > (qemu) resize scratch 2047 
> > resize scratch 2047 
> > (qemu) 
> > (qemu) resize scrarch 2048
> > resize scrarch 2048
> > invalid size
> > 
> > for l these worked fine.
> 
> Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32
> bit host as well. Though I assume that you use a 64 bit host, and I
> can't really see what's the problem there...

This is a test on a 32-bit host.  The target_long of "l" worked fine
because a kvm enabled qemu always builds for an x86-64 target, even
with a 32-bit host.




Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Kevin Wolf
Am 19.01.2011 16:52, schrieb Christoph Hellwig:
> On Wed, Jan 19, 2011 at 04:49:04PM +0100, Kevin Wolf wrote:
>>> (qemu) resize scratch 2047 
>>> resize scratch 2047 
>>> (qemu) 
>>> (qemu) resize scrarch 2048
>>> resize scrarch 2048
>>> invalid size
>>>
>>> for l these worked fine.
>>
>> Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32
>> bit host as well. Though I assume that you use a 64 bit host, and I
>> can't really see what's the problem there...
> 
> This is a test on a 32-bit host.  The target_long of "l" worked fine
> because a kvm enabled qemu always builds for an x86-64 target, even
> with a 32-bit host.

Oh, okay, then it makes perfect sense. We should fix 'o' then to use
int64_t. The patch below should do it.

strtosz (which is used by 'o') has the same problem in git master.
There's a fix by Jes, so be sure to test this on the block branch.

Kevin

diff --git a/monitor.c b/monitor.c
index d291158..0cda3da 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4162,7 +4162,7 @@ static const mon_cmd_t
*monitor_parse_command(Monitor *mon,
 break;
 case 'o':
 {
-ssize_t val;
+int64_t val;
 char *end;

 while (qemu_isspace(*p)) {



[Qemu-devel] [PATCH] Support saturation with shift=0.

2011-01-19 Thread Christophe Lyon

This patch fixes corner-case saturations, when the target range is
zero. It merely removes the guard against (sh == 0), and makes:
__ssat(0x87654321, 1) return 0x and set the saturation flag
__usat(0x87654321, 0) return 0 and set the saturation flag

Signed-off-by: Christophe Lyon 
---
 target-arm/translate.c |   28 
 1 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index 721bada..41cbb96 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -6896,27 +6896,23 @@ static void disas_arm_insn(CPUState * env, DisasContext 
*s)
 tcg_gen_shli_i32(tmp, tmp, shift);
 }
 sh = (insn >> 16) & 0x1f;
-if (sh != 0) {
-tmp2 = tcg_const_i32(sh);
-if (insn & (1 << 22))
-gen_helper_usat(tmp, tmp, tmp2);
-else
-gen_helper_ssat(tmp, tmp, tmp2);
-tcg_temp_free_i32(tmp2);
-}
+tmp2 = tcg_const_i32(sh);
+if (insn & (1 << 22))
+  gen_helper_usat(tmp, tmp, tmp2);
+else
+  gen_helper_ssat(tmp, tmp, tmp2);
+tcg_temp_free_i32(tmp2);
 store_reg(s, rd, tmp);
 } else if ((insn & 0x00300fe0) == 0x00200f20) {
 /* [us]sat16 */
 tmp = load_reg(s, rm);
 sh = (insn >> 16) & 0x1f;
-if (sh != 0) {
-tmp2 = tcg_const_i32(sh);
-if (insn & (1 << 22))
-gen_helper_usat16(tmp, tmp, tmp2);
-else
-gen_helper_ssat16(tmp, tmp, tmp2);
-tcg_temp_free_i32(tmp2);
-}
+tmp2 = tcg_const_i32(sh);
+if (insn & (1 << 22))
+  gen_helper_usat16(tmp, tmp, tmp2);
+else
+  gen_helper_ssat16(tmp, tmp, tmp2);
+tcg_temp_free_i32(tmp2);
 store_reg(s, rd, tmp);
 } else if ((insn & 0x00700fe0) == 0x0fa0) {
 /* Select bytes.  */
-- 
1.7.2.3




Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Christoph Hellwig
On Wed, Jan 19, 2011 at 05:01:19PM +0100, Kevin Wolf wrote:
> Oh, okay, then it makes perfect sense. We should fix 'o' then to use
> int64_t. The patch below should do it.
> 
> strtosz (which is used by 'o') has the same problem in git master.
> There's a fix by Jes, so be sure to test this on the block branch.

The larger sizes work fine on the block branch.




Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Chunqiang Tang
> It's something filesystems have to deal with.  Real storage is getting
> increasingly virtualized.  While this didn't matter for the real high
> end storage which has been doing this for a long time it's getting more
> and more exposed to the filesystem.  That includes LVM layouts and
> thinly provisioned disk arrays, which are getting increasingly popular.
> That doesn't matter the 64k (or until recently 4k) cluster size in qcow2
> is a good idea, we'd want at least a magnitude or two larger extents
> to perform well, but it means filesystems really do have to cope with
> it.

Yes, a fundamental and optimal solution would be changing guest file 
systems, but it would take a much much longer route to introduce 
virtualization awareness into all guest file systems and its also requires 
changing the interface between the guest and the host.

> I/O into fully preallocated files uses exactly the same codepath as
> doing I/O to the block device, except for a identify logical to physical
> block mapping in the block device and a non-trivial one in the
> filesysgem.  Note that the block mapping is cached and does not affect
> the performance.  I've published the numbers for qemu in the various
> caching modes and all major filesystems a while ago, so I'm not making
> this up.

Preallocation is not a universal solution here, because it just defeats 
the other goal: thin provisioning. Moreover, if preallocation is used, it 
works best for RAW images and makes it unnecessary to use a compact image, 
which is exactly one goal of FVD, i.e., allowing optionally disabling a 
compact image data layout without giving up other features, e.g., 
copy-on-write.



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Christoph Hellwig
On Wed, Jan 19, 2011 at 11:21:07AM -0500, Chunqiang Tang wrote:
> Yes, a fundamental and optimal solution would be changing guest file 
> systems, but it would take a much much longer route to introduce 
> virtualization awareness into all guest file systems and its also requires 
> changing the interface between the guest and the host.

Actually current filesystems do pretty well on thinly provisioned
storage, as long as your extent size is not too small.  Starting from
extent size in the 64M to 256M range there's almost no difference to
non-virtualized storage.

> Preallocation is not a universal solution here, because it just defeats 
> the other goal: thin provisioning. Moreover, if preallocation is used, it 
> works best for RAW images and makes it unnecessary to use a compact image, 
> which is exactly one goal of FVD, i.e., allowing optionally disabling a 
> compact image data layout without giving up other features, e.g., 
> copy-on-write.

Again, sparse images with a large enough allocation size give you almost
the same numbers as preallocated images.  I've been doing quite a lot of
work on TP support in QEMU.  Using an XFS filesystem to back the image
with an extent size hint in the above mentioned range gives performance
withing 1% of fully preallocated images.  With the added benefit of
allowing to deallocate the space again through the SCSI WRITE_SAME
or ATA TRIM commands.



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:15 AM, Markus Armbruster wrote:

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?
   


It's almost arbitrary, but I would say it's the direction that I/Os flow.

But if the underlying observation is that the device tree is not really 
a tree, you're 100% correct.  This is part of why a factory interface 
that just takes a parent bus is too simplistic.


I think we ought to introduce a -pci-device option that is specifically 
for creating PCI devices that doesn't require a parent bus argument but 
provides a way to specify stable addressing (for instancing, using a 
linear index).


Regards,

Anthony Liguori





Re: [Qemu-devel] [PATCH] Support saturation with shift=0.

2011-01-19 Thread Peter Maydell
On 19 January 2011 16:10, Christophe Lyon  wrote:
>
> This patch fixes corner-case saturations, when the target range is
> zero. It merely removes the guard against (sh == 0), and makes:
> __ssat(0x87654321, 1) return 0x and set the saturation flag

did you mean __ssat(0x87654321, 0) here? (they give the same
result, of course, but it's the sh==0 case the patch is changing...)

> __usat(0x87654321, 0) return 0 and set the saturation flag
>
> Signed-off-by: Christophe Lyon 

Checked against the ARM ARM and tested by
random-instruction-sequence generation.

(We could have taken the opportunity of adding braces to
conform to the coding style while touching these lines, but
since the patch isn't changing them, just reindenting, I'm
happy not to worry about this.)

Reviewed-by: Peter Maydell 
(for the content of the patch if not the commit message, anyway).

-- PMM



[Qemu-devel] [PATCH 1/3] block: add resize monitor command

2011-01-19 Thread Christoph Hellwig
Add a monitor command that allows resizing of block devices while
qemu is running.  It uses the existing bdrv_truncate method already
used by qemu-img to do it's work.  Compared to qemu-img the size
parsing is very simplicistic, but I think having a properly numering
object is more useful for non-humand monitor users than having
the units and relative resize parsing.

For SCSI devices the new size can be updated in Linux guests by
doing the following shell command:

echo > /sys/class/scsi_device/0:0:0:0/device/rescan

For ATA devices I don't know of a way to update the block device
size in Linux system, and for virtio-blk the next two patches
will provide an automatic update of the size when this command
is issued on the host.

Signed-off-by: Christoph Hellwig 

Index: qemu/hmp-commands.hx
===
--- qemu.orig/hmp-commands.hx   2011-01-19 17:47:10.444004409 +0100
+++ qemu/hmp-commands.hx2011-01-19 17:49:51.673254095 +0100
@@ -53,6 +53,25 @@ Quit the emulator.
 ETEXI
 
 {
+.name   = "resize",
+.args_type  = "id:s,size:o",
+.params = "device size",
+.help   = "resize a block image",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_resize,
+},
+
+STEXI
+@item resize
+@findex resize
+Resize a block image while a guest is running.  Usually requires guest
+action to see the updated size.  Resize to a lower size is supported,
+but should be used with extreme caution.  Note that this command only
+resizes image files, it can not resize block devices like LVM volumes.
+ETEXI
+
+
+{
 .name   = "eject",
 .args_type  = "force:-f,device:B",
 .params = "[-f] device",
Index: qemu/blockdev.c
===
--- qemu.orig/blockdev.c2011-01-19 17:47:10.455004828 +0100
+++ qemu/blockdev.c 2011-01-19 17:49:51.674253955 +0100
@@ -706,3 +706,33 @@ int do_drive_del(Monitor *mon, const QDi
 
 return 0;
 }
+
+/*
+ * XXX: replace the QERR_UNDEFINED_ERROR errors with real values once the
+ * existing QERR_ macro mess is cleaned up.  A good example for better
+ * error reports can be found in the qemu-img resize code.
+ */
+int do_resize(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+const char *id = qdict_get_str(qdict, "id");
+int64_t size = qdict_get_int(qdict, "size");
+BlockDriverState *bs;
+
+bs = bdrv_find(id);
+if (!bs) {
+qerror_report(QERR_DEVICE_NOT_FOUND, id);
+return -1;
+}
+
+if (size < 0) {
+qerror_report(QERR_UNDEFINED_ERROR);
+return -1;
+}
+
+if (bdrv_truncate(bs, size)) {
+qerror_report(QERR_UNDEFINED_ERROR);
+return -1;
+}
+
+return 0;
+}
Index: qemu/blockdev.h
===
--- qemu.orig/blockdev.h2011-01-19 17:47:10.464012091 +0100
+++ qemu/blockdev.h 2011-01-19 17:49:51.677282590 +0100
@@ -53,5 +53,6 @@ int do_change_block(Monitor *mon, const
 const char *filename, const char *fmt);
 int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data);
 int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data);
+int do_resize(Monitor *mon, const QDict *qdict, QObject **ret_data);
 
 #endif
Index: qemu/qmp-commands.hx
===
--- qemu.orig/qmp-commands.hx   2011-01-19 17:47:10.478012371 +0100
+++ qemu/qmp-commands.hx2011-01-19 17:50:07.406016841 +0100
@@ -601,6 +601,34 @@ Example:
 -> { "execute": "netdev_del", "arguments": { "id": "netdev1" } }
 <- { "return": {} }
 
+
+EQMP
+
+{
+.name   = "block_resize",
+.args_type  = "id:s,size:o",
+.params = "id size",
+.help   = "resize a block image",
+.user_print = monitor_user_noop,
+.mhandler.cmd_new = do_resize,
+},
+
+SQMP
+block_resize
+
+
+Resize a block image while a guest is running.
+
+Arguments:
+
+- "id": the device's ID, must be unique (json-string)
+- "size": new size
+
+Example:
+
+-> { "execute": "block_resize", "arguments": { "id": "scratch", "size": 
1073741824 } }
+<- { "return": {} }
+
 EQMP
 
 {



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want attach the device to via
> >bus=.  This is true for *every* device, including all pci
> >devices.  If unspecified qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> >to specify it, thats why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> support to parse the hierarchy and figure out which bus to put the
> device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.

Regards,
Daniel



[Qemu-devel] [PATCH 3/3] virtio-blk: tell the guest about size changes

2011-01-19 Thread Christoph Hellwig
Raise a config change interrupt when the size changed.  This allows
virtio-blk guest drivers to read-read the information from the
config space once it got the config chaged interrupt.

Signed-off-by: Christoph Hellwig 

Index: qemu/hw/virtio-blk.c
===
--- qemu.orig/hw/virtio-blk.c   2011-01-19 17:47:10.335004828 +0100
+++ qemu/hw/virtio-blk.c2011-01-19 17:50:10.748272881 +0100
@@ -504,6 +504,15 @@ static int virtio_blk_load(QEMUFile *f,
 return 0;
 }
 
+static void virtio_blk_change_cb(void *opaque, int reason)
+{
+VirtIOBlock *s = opaque;
+
+if (reason & CHANGE_SIZE) {
+virtio_notify_config(&s->vdev);
+}
+}
+
 VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf)
 {
 VirtIOBlock *s;
@@ -546,6 +555,7 @@ VirtIODevice *virtio_blk_init(DeviceStat
 register_savevm(dev, "virtio-blk", virtio_blk_id++, 2,
 virtio_blk_save, virtio_blk_load, s);
 bdrv_set_removable(s->bs, 0);
+bdrv_set_change_cb(s->bs, virtio_blk_change_cb, s);
 s->bs->buffer_alignment = conf->logical_block_size;
 
 add_boot_device_path(conf->bootindex, dev, "/disk@0,0");



[Qemu-devel] [PATCH 0/3 v2] allow online resizing of block devices

2011-01-19 Thread Christoph Hellwig
This patchset adds support for online resizing of block devices.

The first patch adds a new resize monitor command which call into
the existing image resize code.  This is the meat of the series
and probably needs quite a bit of review and help as I'm not sure
about how to implement the error handling for monitor commands
correctly.  Am I really supposed to add a new QERR_ definition
for each possible error?  And if yes how am I supposed to define
them?  The macros for them aren't exactly self-explaining.

The second patch adds a way to tell drivers about a resize, and the
third one adds a guest notification for config changes to virtio-blk
which allows the guest to pick it up without a rescan.  I've just sent
the corresponding Linux guest driver patch to Rusty.

Changes from version 1 to version 2:
 - also add a QMP command (block_resize)
 - use the o format for the size in the monitor command
 - fix typos
 - use QERR_UNDEFINED_ERROR for errors instead of unstructured strings
 - remove the CDROM hint check
 - add a reason argument to the change callback



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 07:11 AM, Markus Armbruster wrote:

Gerd Hoffmann  writes:

   

On 01/18/11 18:09, Anthony Liguori wrote:
 

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
   
 

The device model topology is 100% a hidden architectural detail.
   

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
 

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
   

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

 

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
   

Wrong.  You can specify the bus you want attach the device to via
bus=.  This is true for *every* device, including all pci
devices. If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need to
specify it, thats why nobody does that today.  Once q35 finally
arrives this will change of course.
 

As far as I know, libvirt does it already.
   


I think that's a bad idea from a forward compatibility perspective.

Regards,

Anthony Liguori





[Qemu-devel] [PATCH 2/3] block: tell drivers about an image resize

2011-01-19 Thread Christoph Hellwig
Extend the change_cb callback with a reason argument, and use it
to tell drivers about size changes.

Signed-off-by: Christoph Hellwig 

Index: qemu/block.c
===
--- qemu.orig/block.c   2011-01-18 20:54:45.246021572 +0100
+++ qemu/block.c2011-01-18 20:56:54.117255612 +0100
@@ -645,7 +645,7 @@ int bdrv_open(BlockDriverState *bs, cons
 /* call the change callback */
 bs->media_changed = 1;
 if (bs->change_cb)
-bs->change_cb(bs->change_opaque);
+bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
 }
 
 return 0;
@@ -684,7 +684,7 @@ void bdrv_close(BlockDriverState *bs)
 /* call the change callback */
 bs->media_changed = 1;
 if (bs->change_cb)
-bs->change_cb(bs->change_opaque);
+bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
 }
 }
 
@@ -1135,6 +1135,8 @@ int bdrv_truncate(BlockDriverState *bs,
 ret = drv->bdrv_truncate(bs, offset);
 if (ret == 0) {
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
+if (bs->change_cb)
+bs->change_cb(bs->change_opaque, CHANGE_SIZE);
 }
 return ret;
 }
@@ -1366,7 +1368,8 @@ int bdrv_enable_write_cache(BlockDriverS
 
 /* XXX: no longer used */
 void bdrv_set_change_cb(BlockDriverState *bs,
-void (*change_cb)(void *opaque), void *opaque)
+void (*change_cb)(void *opaque, int reason),
+void *opaque)
 {
 bs->change_cb = change_cb;
 bs->change_opaque = opaque;
@@ -1411,7 +1414,7 @@ int bdrv_set_key(BlockDriverState *bs, c
 /* call the change callback now, we skipped it on open */
 bs->media_changed = 1;
 if (bs->change_cb)
-bs->change_cb(bs->change_opaque);
+bs->change_cb(bs->change_opaque, CHANGE_MEDIA);
 }
 return ret;
 }
Index: qemu/block.h
===
--- qemu.orig/block.h   2011-01-18 20:57:00.792253448 +0100
+++ qemu/block.h2011-01-18 20:57:09.771282850 +0100
@@ -182,7 +182,8 @@ int bdrv_is_locked(BlockDriverState *bs)
 void bdrv_set_locked(BlockDriverState *bs, int locked);
 int bdrv_eject(BlockDriverState *bs, int eject_flag);
 void bdrv_set_change_cb(BlockDriverState *bs,
-void (*change_cb)(void *opaque), void *opaque);
+void (*change_cb)(void *opaque, int reason),
+void *opaque);
 void bdrv_get_format(BlockDriverState *bs, char *buf, int buf_size);
 BlockDriverState *bdrv_find(const char *name);
 BlockDriverState *bdrv_next(BlockDriverState *bs);
Index: qemu/block_int.h
===
--- qemu.orig/block_int.h   2011-01-18 20:55:43.783009908 +0100
+++ qemu/block_int.h2011-01-18 20:56:58.912004806 +0100
@@ -153,7 +153,7 @@ struct BlockDriverState {
 int valid_key; /* if true, a valid encryption key has been set */
 int sg;/* if true, the device is a /dev/sg* */
 /* event callback when inserting/removing */
-void (*change_cb)(void *opaque);
+void (*change_cb)(void *opaque, int reason);
 void *change_opaque;
 
 BlockDriver *drv; /* NULL means no media */
@@ -203,6 +203,9 @@ struct BlockDriverState {
 void *private;
 };
 
+#define CHANGE_MEDIA   0x01
+#define CHANGE_SIZE0x02
+
 struct BlockDriverAIOCB {
 AIOPool *pool;
 BlockDriverState *bs;
Index: qemu/hw/ide/core.c
===
--- qemu.orig/hw/ide/core.c 2011-01-18 20:57:17.321260780 +0100
+++ qemu/hw/ide/core.c  2011-01-18 20:59:46.242256451 +0100
@@ -1625,11 +1625,15 @@ static void ide_cfata_metadata_write(IDE
 }
 
 /* called when the inserted state of the media has changed */
-static void cdrom_change_cb(void *opaque)
+static void cdrom_change_cb(void *opaque, int reason)
 {
 IDEState *s = opaque;
 uint64_t nb_sectors;
 
+if (!(reason & CHANGE_MEDIA)) {
+return;
+}
+
 bdrv_get_geometry(s->bs, &nb_sectors);
 s->nb_sectors = nb_sectors;
 
Index: qemu/hw/sd.c
===
--- qemu.orig/hw/sd.c   2011-01-18 20:57:48.978261549 +0100
+++ qemu/hw/sd.c2011-01-18 20:59:37.0 +0100
@@ -422,9 +422,14 @@ static void sd_reset(SDState *sd, BlockD
 sd->pwd_len = 0;
 }
 
-static void sd_cardchange(void *opaque)
+static void sd_cardchange(void *opaque, int reason)
 {
 SDState *sd = opaque;
+
+if (!(reason & CHANGE_MEDIA)) {
+return;
+}
+
 qemu_set_irq(sd->inserted_cb, bdrv_is_inserted(sd->bdrv));
 if (bdrv_is_inserted(sd->bdrv)) {
 sd_reset(sd, sd->bdrv);



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Chunqiang Tang
> Actually current filesystems do pretty well on thinly provisioned
> storage, as long as your extent size is not too small.  Starting from
> extent size in the 64M to 256M range there's almost no difference to
> non-virtualized storage.
> 
> Again, sparse images with a large enough allocation size give you almost
> the same numbers as preallocated images.  I've been doing quite a lot of
> work on TP support in QEMU.  Using an XFS filesystem to back the image
> with an extent size hint in the above mentioned range gives performance
> withing 1% of fully preallocated images.  With the added benefit of
> allowing to deallocate the space again through the SCSI WRITE_SAME
> or ATA TRIM commands.

These  numbers are very interesting and I would like to read more. Are 
your detailed results accessible on the Internet? Do you  have numbers on 
the impact of using a large extent size (64-256MB) on thin provisioning 
(since a guest file system's metadata are written first and are scattered 
in the virtual disk)? 



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
> On 01/19/2011 07:11 AM, Markus Armbruster wrote:
> >Gerd Hoffmann  writes:
> >
> >>On 01/18/11 18:09, Anthony Liguori wrote:
> >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >The device model topology is 100% a hidden architectural detail.
> This is true for the sysbus, it is obviously not the case for PCI and
> similarly discoverable buses. There we have a guest-explorable topology
> that is currently equivalent to the the qdev layout.
> >>>But we also don't do PCI passthrough so we really haven't even explored
> >>>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>>qdev-ify it.
> >>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>
> >>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>work with tcg?
> >>
> >>>The -device interface is a stable interface. Right now, you don't
> >>>specify any type of identifier of the pci bus when you create a PCI
> >>>device. It's implied in the interface.
> >>Wrong.  You can specify the bus you want attach the device to via
> >>bus=.  This is true for *every* device, including all pci
> >>devices. If unspecified qdev uses the first bus it finds.
> >>
> >>As long as there is a single pci bus only there is simply no need to
> >>specify it, thats why nobody does that today.  Once q35 finally
> >>arrives this will change of course.
> >As far as I know, libvirt does it already.
> 
> I think that's a bad idea from a forward compatibility perspective.

In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices & their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)

Daniel



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:

On 01/18/11 18:09, Anthony Liguori wrote:

On 01/18/2011 10:56 AM, Jan Kiszka wrote:



The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.


But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.


It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't 
work with tcg?



The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.


Wrong.  You can specify the bus you want attach the device to via 
bus=.  This is true for *every* device, including all pci 
devices.  If unspecified qdev uses the first bus it finds.


As long as there is a single pci bus only there is simply no need to 
specify it, thats why nobody does that today.


Right.  In terms of specifying bus=, what are we promising re: 
compatibility?  Will there always be a pci.0?  If we add some PCI-to-PCI 
bridges in order to support more devices, is libvirt support to parse 
the hierarchy and figure out which bus to put the device on?


Regards,

Anthony Liguori


  Once q35 finally arrives this will change of course.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Jan Kiszka
On 2011-01-19 17:57, Anthony Liguori wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
> 
> It's almost arbitrary, but I would say it's the direction that I/Os flow.

We need both if we want KVM buses. They are useless for enumerating the
device on that bus the guest sees it on.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Christoph Hellwig
On Wed, Jan 19, 2011 at 12:08:41PM -0500, Chunqiang Tang wrote:
> These  numbers are very interesting and I would like to read more. Are 
> your detailed results accessible on the Internet? Do you  have numbers on 
> the impact of using a large extent size (64-256MB) on thin provisioning 
> (since a guest file system's metadata are written first and are scattered 
> in the virtual disk)? 

It's still work in progress.  I will publish patches and numbers in a
few weeks.



[Qemu-devel] Re: [PATCH 0/3 v2] allow online resizing of block devices

2011-01-19 Thread Juan Quintela
Christoph Hellwig  wrote:
> This patchset adds support for online resizing of block devices.
>
> The first patch adds a new resize monitor command which call into
> the existing image resize code.  This is the meat of the series
> and probably needs quite a bit of review and help as I'm not sure
> about how to implement the error handling for monitor commands
> correctly.  Am I really supposed to add a new QERR_ definition
> for each possible error?  And if yes how am I supposed to define
> them?  The macros for them aren't exactly self-explaining.
>
> The second patch adds a way to tell drivers about a resize, and the
> third one adds a guest notification for config changes to virtio-blk
> which allows the guest to pick it up without a rescan.  I've just sent
> the corresponding Linux guest driver patch to Rusty.
>
> Changes from version 1 to version 2:
>  - also add a QMP command (block_resize)
>  - use the o format for the size in the monitor command
>  - fix typos
>  - use QERR_UNDEFINED_ERROR for errors instead of unstructured strings
>  - remove the CDROM hint check
>  - add a reason argument to the change callback

Does this handle the change of the rtc data?  I can't see how/where.
(Not that I know if it is important at all until reboot).

cmos_init_hb() uses the size to guess the right geometry of the drives,
if we change the size of the drive, should we change that one?



Later, Juan.

hw/pc.c
static void pc_cmos_init_late(void *opaque)
{
pc_cmos_init_late_arg *arg = opaque;
ISADevice *s = arg->rtc_state;
pint val;
BlockDriverState *hd_table[4];
int i;

ide_get_bs(hd_table, arg->idebus0);
ide_get_bs(hd_table + 2, arg->idebus1);

rtc_set_memory(s, 0x12, (hd_table[0] ? 0xf0 : 0) | (hd_table[1] ? 0x0f : 
0));
if (hd_table[0])
cmos_init_hd(0x19, 0x1b, hd_table[0], s);
if (hd_table[1])
cmos_init_hd(0x1a, 0x24, hd_table[1], s);

.
}

static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd,
 ISADevice *s)
{
int cylinders, heads, sectors;
bdrv_get_geometry_hint(hd, &cylinders, &heads, §ors);
rtc_set_memory(s, type_ofs, 47);
rtc_set_memory(s, info_ofs, cylinders);
rtc_set_memory(s, info_ofs + 1, cylinders >> 8);
rtc_set_memory(s, info_ofs + 2, heads);




Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want attach the device to via
> >bus=.  This is true for *every* device, including all pci
> >devices.  If unspecified qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> >to specify it, thats why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> support to parse the hierarchy and figure out which bus to put the
> device on?

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
i explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses by appear by magic

Daniel



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:

On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
   

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
 

On 01/18/11 18:09, Anthony Liguori wrote:
   

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
 
   

The device model topology is 100% a hidden architectural detail.
 

This is true for the sysbus, it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the the qdev layout.
   

But we also don't do PCI passthrough so we really haven't even explored
how that maps in qdev. I don't know if qemu-kvm has attempted to
qdev-ify it.
 

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't
work with tcg?

   

The -device interface is a stable interface. Right now, you don't
specify any type of identifier of the pci bus when you create a PCI
device. It's implied in the interface.
 

Wrong.  You can specify the bus you want attach the device to via
bus=.  This is true for *every* device, including all pci
devices.  If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need
to specify it, thats why nobody does that today.
   

Right.  In terms of specifying bus=, what are we promising re:
compatibility?  Will there always be a pci.0?  If we add some
PCI-to-PCI bridges in order to support more devices, is libvirt
support to parse the hierarchy and figure out which bus to put the
device on?
 

The answer to your questions probably differ depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
i explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses by appear by magic
   


Yeah, we can't promise that.  If you use -M pc, you aren't guaranteed a 
stable PCI bus topology even with -nodefconfig/-nodefaults.


Regards,

Anthony Liguori


Daniel
   





Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:19 AM, Daniel P. Berrange wrote:


In our past experiance though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective too. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices&  their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)
   


There are two distinct things here:

1) creating exactly the same virtual machine (like for migration) given 
a newer version of QEMU


2) creating a reasonably similar virtual machine given a newer version 
of QEMU


For (1), you cannot use -M pc.  You should use things like bus=X,addr=Y 
much better is for QEMU to dump a device file and to just reuse that 
instead of guessing what you need.


For (2), you cannot use bus=X,addr=Y because it makes assumptions about 
the PCI topology which may change in newer -M pc's.


I think libvirt needs to treat this two scenarios differently to support 
forwards compatibility.


Regards,

Anthony Liguori


Daniel
   





Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:


The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.
   


Yeah, but replacing the main chipset will certainly change the PCI 
topology such that if you're specifying bus=X and addr=X and then also 
using -M pc, unless you're parsing the default topology to come up with 
the addressing, it will break in the future.


That's why I think something simpler like a linear index that QEMU maps 
to a static location in the topology is probably the best future proof 
interface.


Regards,

Anthony Liguori


Regards,
Daniel
   





Re: [Qemu-devel] Missing roms/seabios/Makefile and roms/vgabios/Makefile

2011-01-19 Thread Blue Swirl
On Wed, Jan 19, 2011 at 12:44 PM, Mateusz Loskot  wrote:
> Hi,
>
> Running ./configure (under MinGW/MSYS) and symlink gives up trying to
> create links to non-existing Makefiles in
> roms/seabios/Makefile
> roms/vgabios/Makefile

Those directiories are actually git submodules, when you run 'git
submodule update', they get populated.

Maybe configure should check if the directories are OK and disables
building ROMs if not.



[Qemu-devel] libcacard as a git submodule

2011-01-19 Thread Alon Levy
Any objections?

Alon



[Qemu-devel] usb redirection status report

2011-01-19 Thread Hans de Goede

Hi All,

As most of you know I'm working on usb redirection (making client usb devices
accessible in guests over the network).

I thought it would be a good idea to write a short status report.

So fat the following has been done:

* written and posted a network protocol for usb redir. over the network,
and send this to the list.

* a 2nd revison is ready incorporating all comments from the mailinglist
discussion. I'll post this to the list soon.

* looked at using some pre-existing marshalling / demarshalling solution,
specifically looked at google's protocol buffers. Not an option as this
uses c++. There is a third party C version of protocol buffers, but this
cannot deal with streaming input, making it not usable for usb redirection.

* Designed an API for a transport independent, marshaller / demarshaller
for the protocol.

* Implemented a roll my own marshaller / demarshaller
for the protocol.

* Designed an API for a (transport indepenent) usb-host object/library
which can be incorporated into spice-client, or a vnc client, etc.
To easily add usb host capabilities to client-applications.

* Implemented a skeleton version of the usb-host (still need to implement
most usb redir commands).

* Wrote a standalone usb-host application using standard tcp/ip[v4|v6] as
transport, as proof of concept / for testing purposes: usbredirserver

To Do:
* Finish usb-host library
* Write a test client (usb-guest) for testing
* Implement a transport independent qemu usb-host talking the usb
redir protocol.
* Hook up a monitor command to hookup the qemu usb-redir-host connect to a
usbredirserver
* Test / debug / test
* Integrate with Spice (use a spice channel as transport)
* Integrate with vnc?

Regards,

Hans



[Qemu-devel] Re: libcacard as a git submodule

2011-01-19 Thread Anthony Liguori

On 01/19/2011 12:11 PM, Alon Levy wrote:

Any objections?
   


My primary concern is that it is easy to make changes to QEMU that 
impact the interface between QEMU and libcacard so I'd actually prefer 
it not be a submodule.


Regards,

Anthony Liguori


Alon
   





[Qemu-devel] Re: MIPS, io-thread, icount and wfi

2011-01-19 Thread Marcelo Tosatti
On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote:
> On 2011-01-18 01:19, Edgar E. Iglesias wrote:
> > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
> >> Hi,
> >>
> >> I'm running an io-thread enabled qemu-system-mipsel with icount.
> >> When the guest (linux) goes to sleep through the wait insn (waiting
> >> to be woken up by future timer interrupts), the thing deadlocks.
> >>
> >> IIUC, this is because vm timers are driven by icount, but the CPU is
> >> halted so icount makes no progress and time stands still.
> >>
> >> I've locally disabled vcpu halting when icount is enabled, that
> >> works around my problem but of course makes qemu consume 100% host cpu.
> >>
> >> I don't know why I only see this problem with io-thread builds?
> >> Could be related timing and luck.
> >>
> >> Would be interesting to know if someone has any info on how this was
> >> intended to work (if it was)? And if there are ideas for better
> >> workarounds or fixes that don't disable vcpu halting entirely.
> > 
> > Hi,
> > 
> > I've found the problem. For some reason io-thread builds use a
> > static timeout for wait loops. The entire chunk of code that
> > makes sure qemu_icount makes forward progress when the CPU's
> > are idle has been ifdef'ed away...
> > 
> > This fixes the problem for me, hopefully without affecting
> > io-thread runs without icount.
> > 
> > commit 0f4f3a919952500b487b438c5520f07a1c6be35b
> > Author: Edgar E. Iglesias 
> > Date:   Tue Jan 18 01:01:57 2011 +0100
> > 
> > qemu-timer: Fix timeout calc for io-thread with icount
> > 
> > Make sure we always make forward progress with qemu_icount to
> > avoid deadlocks. For io-thread, use the static 1000 timeout
> > only if icount is disabled.
> > 
> > Signed-off-by: Edgar E. Iglesias 
> > 
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 95814af..db1ec49 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
> >  }
> >  }
> >  
> > -#ifndef CONFIG_IOTHREAD
> >  static int64_t qemu_icount_delta(void)
> >  {
> >  if (!use_icount) {
> > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
> >  return cpu_get_icount() - cpu_get_clock();
> >  }
> >  }
> > -#endif
> >  
> >  /* enable cpu_get_ticks() */
> >  void cpu_enable_ticks(void)
> > @@ -1077,9 +1075,17 @@ void quit_timers(void)
> >  
> >  int qemu_calculate_timeout(void)
> >  {
> > -#ifndef CONFIG_IOTHREAD
> >  int timeout;
> >  
> > +#ifdef CONFIG_IOTHREAD
> > +/* When using icount, making forward progress with qemu_icount when the
> > +   guest CPU is idle is critical. We only use the static io-thread 
> > timeout
> > +   for non icount runs.  */
> > +if (!use_icount) {
> > +return 1000;
> > +}
> > +#endif
> > +
> >  if (!vm_running)
> >  timeout = 5000;
> >  else {
> > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
> >  }
> >  
> >  return timeout;
> > -#else /* CONFIG_IOTHREAD */
> > -return 1000;
> > -#endif
> >  }
> >  
> > 
> > 
> 
> This logic and timeout values were imported on iothread merge. And I bet
> at least the timeout value of 1s (vs. 5s) can still be found in
> qemu-kvm. Maybe someone over there can remember the rationales behind
> choosing this value.
> 
> Jan

This timeout is for the main select() call. So there is not a lot
of reasoning, how long to wait when there's no activity on the file
descriptors.




Re: [Qemu-devel] [Bug 704904] [NEW] No rule to make target ../libhw32/virtio.o

2011-01-19 Thread Blue Swirl
On Wed, Jan 19, 2011 at 1:57 PM, Mateusz Łoskot  wrote:
> Public bug reported:
>
> Building qemu from current git using 32-bit MinGW (installer from
> 2010-11-07) on Windows Vista (64-bit) fails with the following error:
>
> make[1]: *** No rule to make target `../libhw32/virtio.o', needed by 
> `qemu.exe'.  Stop.
> make: *** [subdir-i386-softmmu] Error 2

This should have been fixed by
0601740a5db12ea7ae0f2f7826f0cfb05854500a.

I don't see a line like:
  GEN   config-all-devices.mak

Please delete config-all-devices.mak and try again.



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
> >
> >The reason we specify 'bus' is that we wanted to be flexible wrt
> >upgrades of libvirt, without needing restarts of QEMU instances
> >it manages. That way we can introduce new functionality into
> >libvirt that relies on it having previously set 'bus' on all
> >active QEMUs.
> >
> >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
> >be adding the extra bridges. I'd expect that QEMU provided just
> >the first bridge and then libvirt would specify how many more
> >bridges to create at boot or hotplug them later. So it wouldn't
> >ever need to parse topology.
> 
> Yeah, but replacing the main chipset will certainly change the PCI
> topology such that if you're specifying bus=X and addr=X and then
> also using -M pc, unless you're parsing the default topology to come
> up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem

Regards,
Daniel



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Daniel P. Berrange
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
> >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >>>On 01/18/11 18:09, Anthony Liguori wrote:
> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>The device model topology is 100% a hidden architectural detail.
> >This is true for the sysbus, it is obviously not the case for PCI and
> >similarly discoverable buses. There we have a guest-explorable topology
> >that is currently equivalent to the the qdev layout.
> But we also don't do PCI passthrough so we really haven't even explored
> how that maps in qdev. I don't know if qemu-kvm has attempted to
> qdev-ify it.
> >>>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>>
> >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>>work with tcg?
> >>>
> The -device interface is a stable interface. Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device. It's implied in the interface.
> >>>Wrong.  You can specify the bus you want attach the device to via
> >>>bus=.  This is true for *every* device, including all pci
> >>>devices.  If unspecified qdev uses the first bus it finds.
> >>>
> >>>As long as there is a single pci bus only there is simply no need
> >>>to specify it, thats why nobody does that today.
> >>Right.  In terms of specifying bus=, what are we promising re:
> >>compatibility?  Will there always be a pci.0?  If we add some
> >>PCI-to-PCI bridges in order to support more devices, is libvirt
> >>support to parse the hierarchy and figure out which bus to put the
> >>device on?
> >The answer to your questions probably differ depending on
> >whether '-nodefconfig' and '-nodefaults' are set on the
> >command line.  If they are set, then I'd expect to only
> >ever see one PCI bus with name pci.0 forever more, unless
> >i explicitly ask for more. If they are not set, then you
> >might expect to see multiple PCI buses by appear by magic
> 
> Yeah, we can't promise that.  If you use -M pc, you aren't
> guaranteed a stable PCI bus topology even with
> -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.

Regards,
Daniel



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Anthony Liguori

On 01/19/2011 12:52 PM, Daniel P. Berrange wrote:

On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
   

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
 

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.
   

Yeah, but replacing the main chipset will certainly change the PCI
topology such that if you're specifying bus=X and addr=X and then
also using -M pc, unless you're parsing the default topology to come
up with the addressing, it will break in the future.
 

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have new PCI chipset, so bus=pci.0
might have different meaning that it did when used with
'-M pc-0.12', but I don't think that's an immediate problem
   


Right, but you expose bus addressing via the XML, no?  That means that 
if a user specifies something like '1.0', and you translate that to 
bus='pci.0',addr='1.0', then when pc-0.50 comes out and slot 1.0 is used 
for the integrated 3D VGA graphics adapter, the guest creation will fail.


If you expose topological configuration to the user, the guest will not 
continue working down the road unless you come up with a scheme where 
you map addresses to a different address range for newer pcs.


Regards,

Anthony Liguori


Regards,
Daniel
   





[Qemu-devel] IDE development

2011-01-19 Thread steven765
Hello, 
For a project we're slowly making our own operating system. My task is the 
file system.  We plan on using the FatFs file system.
  My question is how to link FatFs to the IDE Hard Drive QEMU emulates? I 
cannot find any documentation on how to access the disk, API, calls, ect.  Any 
search typically returns posts to folks trying to mound loopback devices or 
additions to QEMU.  I get a lot of hits for the addition of SATA and so on.  I 
just need to bind read and write to qemu's hard drive interface.  Any pointers 
in the right direction would be appreciated.
Thanks,
Steve


  



[Qemu-devel] Re: MIPS, io-thread, icount and wfi

2011-01-19 Thread Edgar E. Iglesias
On Wed, Jan 19, 2011 at 03:02:26PM -0200, Marcelo Tosatti wrote:
> On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote:
> > On 2011-01-18 01:19, Edgar E. Iglesias wrote:
> > > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote:
> > >> Hi,
> > >>
> > >> I'm running an io-thread enabled qemu-system-mipsel with icount.
> > >> When the guest (linux) goes to sleep through the wait insn (waiting
> > >> to be woken up by future timer interrupts), the thing deadlocks.
> > >>
> > >> IIUC, this is because vm timers are driven by icount, but the CPU is
> > >> halted so icount makes no progress and time stands still.
> > >>
> > >> I've locally disabled vcpu halting when icount is enabled, that
> > >> works around my problem but of course makes qemu consume 100% host cpu.
> > >>
> > >> I don't know why I only see this problem with io-thread builds?
> > >> Could be related timing and luck.
> > >>
> > >> Would be interesting to know if someone has any info on how this was
> > >> intended to work (if it was)? And if there are ideas for better
> > >> workarounds or fixes that don't disable vcpu halting entirely.
> > > 
> > > Hi,
> > > 
> > > I've found the problem. For some reason io-thread builds use a
> > > static timeout for wait loops. The entire chunk of code that
> > > makes sure qemu_icount makes forward progress when the CPU's
> > > are idle has been ifdef'ed away...
> > > 
> > > This fixes the problem for me, hopefully without affecting
> > > io-thread runs without icount.
> > > 
> > > commit 0f4f3a919952500b487b438c5520f07a1c6be35b
> > > Author: Edgar E. Iglesias 
> > > Date:   Tue Jan 18 01:01:57 2011 +0100
> > > 
> > > qemu-timer: Fix timeout calc for io-thread with icount
> > > 
> > > Make sure we always make forward progress with qemu_icount to
> > > avoid deadlocks. For io-thread, use the static 1000 timeout
> > > only if icount is disabled.
> > > 
> > > Signed-off-by: Edgar E. Iglesias 
> > > 
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 95814af..db1ec49 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void)
> > >  }
> > >  }
> > >  
> > > -#ifndef CONFIG_IOTHREAD
> > >  static int64_t qemu_icount_delta(void)
> > >  {
> > >  if (!use_icount) {
> > > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void)
> > >  return cpu_get_icount() - cpu_get_clock();
> > >  }
> > >  }
> > > -#endif
> > >  
> > >  /* enable cpu_get_ticks() */
> > >  void cpu_enable_ticks(void)
> > > @@ -1077,9 +1075,17 @@ void quit_timers(void)
> > >  
> > >  int qemu_calculate_timeout(void)
> > >  {
> > > -#ifndef CONFIG_IOTHREAD
> > >  int timeout;
> > >  
> > > +#ifdef CONFIG_IOTHREAD
> > > +/* When using icount, making forward progress with qemu_icount when 
> > > the
> > > +   guest CPU is idle is critical. We only use the static io-thread 
> > > timeout
> > > +   for non icount runs.  */
> > > +if (!use_icount) {
> > > +return 1000;
> > > +}
> > > +#endif
> > > +
> > >  if (!vm_running)
> > >  timeout = 5000;
> > >  else {
> > > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void)
> > >  }
> > >  
> > >  return timeout;
> > > -#else /* CONFIG_IOTHREAD */
> > > -return 1000;
> > > -#endif
> > >  }
> > >  
> > > 
> > > 
> > 
> > This logic and timeout values were imported on iothread merge. And I bet
> > at least the timeout value of 1s (vs. 5s) can still be found in
> > qemu-kvm. Maybe someone over there can remember the rationales behind
> > choosing this value.
> > 
> > Jan
> 
> This timeout is for the main select() call. So there is not a lot
> of reasoning, how long to wait when there's no activity on the file
> descriptors.

OK, I suspected something like that. Thanks both of you for the info.
I'll give people a couple of days to complain at the patch, if noone
does I'll apply it.

Cheers



Re: [Qemu-devel] [PATCH] target-arm: Fix garbage collection of temporaries in Neon emulation.

2011-01-19 Thread Peter Maydell
On 19 January 2011 14:37, Christophe Lyon  wrote:
> Here is an updated patch which will hopefully not be mangled by my mailer.
>
> Fix garbage collection of temporaries in Neon emulation.

I've tested this patch and it does indeed fix the problems with VMULL
and friends (I was seeing assertions/hangs). I've tested with random
instruction sequence generation and with this patch the non-scalar
forms of VMLAL, VMLSL, VQDMLAL, VQDMLSL, VMULL, VQDMULL
now all pass. The scalar forms now pass random-sequence testing
with the addition of a patch from the qemu-meego tree. Since I have
effectively just tested that meego patch I'll post it to the list in
a moment.

Reviewed-by: Peter Maydell 

I would personally prefer slightly less terse commit messages
(for instance it might be nice to list the affected instructions in
this case). The convention is also to preface the summary line
with the file or directory affected, ie "target-arm: Fix garbage
collection of temporaries in Neon emulation".

-- PMM



Re: [Qemu-devel] [PULL] piix, pci, qdev

2011-01-19 Thread Michael S. Tsirkin
On Wed, Jan 19, 2011 at 04:14:48PM +0100, Christoph Hellwig wrote:
> > >   pci: fix migration path for devices behind bridges
> 
> This patch breaks starting qemu for me on 32-bit x86 with the following 
> assert:

Yes, sorry about that.
I sent a fix, could you check out it please?

> qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion
> `!se->compat || se->instance_id == 0' failed.
> 
> my qemu command line is:
> 
> /opt/qemu/bin/qemu-system-x86_64 \
>   -m 1500 \
>   -enable-kvm \
>   -drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \
>   -drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \
>   -drive if=none,file=/home/test1.img,cache=none,id=scratch \
>   -device virtio-blk-pci,drive=root \
>   -device virtio-blk-pci,drive=test \
>   -device virtio-blk-pci,drive=scratch \
>   -append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \
>   -monitor tcp:127.0.0.1:31337,server,nowait \
>   -kernel arch/x86/boot/bzImage \
>   -nographic
> 
> btw, I can't find a patch for that commit anywhere in the qemu-devel
> archives.  Did it get reviewed anywhere at all?



[Qemu-devel] [PATCH] target-arm: Fix loading of scalar value for Neon multiply-by-scalar

2011-01-19 Thread Peter Maydell
Fix the register and part of register we get the scalar from in
the various "multiply vector by scalar" ops (VMUL by scalar
and friends).

Signed-off-by: Peter Maydell 
---
 target-arm/translate.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index c60cd18..0c2856a 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -3608,14 +3608,14 @@ static inline TCGv neon_get_scalar(int size, int reg)
 {
 TCGv tmp;
 if (size == 1) {
-tmp = neon_load_reg(reg >> 1, reg & 1);
-} else {
-tmp = neon_load_reg(reg >> 2, (reg >> 1) & 1);
-if (reg & 1) {
-gen_neon_dup_low16(tmp);
-} else {
+tmp = neon_load_reg(reg & 7, reg >> 4);
+if (reg & 8) {
 gen_neon_dup_high16(tmp);
+} else {
+gen_neon_dup_low16(tmp);
 }
+} else {
+tmp = neon_load_reg(reg & 15, reg >> 4);
 }
 return tmp;
 }
-- 
1.6.3.3




[Qemu-devel] [PATCH] pci: fix device paths

2011-01-19 Thread Michael S. Tsirkin
Patch a6a7005d14b3c32d4864a718fb1cb19c789f58a5 generated
broken device paths. We snprintf with a length shorter
than the output, so the last character is discarded and replaced
by the null byte. Fix it up by snprintf to a buffer
which is larger by 1 byte and then memcpy the data (without
the null byte) to where we need it.

Reported-by: Christoph Hellwig 
Signed-off-by: Michael S. Tsirkin 
---

This fixes the issue for me. Could you ack pls?

 hw/pci.c |   16 
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8d0e3df..c77f6e9 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2032,10 +2032,13 @@ static char *pcibus_get_dev_path(DeviceState *dev)
  * domain:Bus:Slot.Func for systems without nested PCI bridges.
  * Slot.Function list specifies the slot and function numbers for all
  * devices on the path from root to the specific device. */
-int domain_len = strlen(":00");
-int slot_len = strlen(":SS.F");
+char domain[] = ":00";
+char slot[] = ":SS.F";
+int domain_len = sizeof domain - 1 /* For '\0' */;
+int slot_len = sizeof slot - 1 /* For '\0' */;
 int path_len;
 char *path, *p;
+int s;
 
 /* Calculate # of slots on path between device and root. */;
 slot_depth = 0;
@@ -2050,14 +2053,19 @@ static char *pcibus_get_dev_path(DeviceState *dev)
 path[path_len] = '\0';
 
 /* First field is the domain. */
-snprintf(path, domain_len, "%04x:00", pci_find_domain(d->bus));
+s = snprintf(domain, sizeof domain, "%04x:00", pci_find_domain(d->bus));
+assert(s == domain_len);
+memcpy(path, domain, domain_len);
 
 /* Fill in slot numbers. We walk up from device to root, so need to print
  * them in the reverse order, last to first. */
 p = path + path_len;
 for (t = d; t; t = t->bus->parent_dev) {
 p -= slot_len;
-snprintf(p, slot_len, ":%02x.%x", PCI_SLOT(t->devfn), 
PCI_FUNC(d->devfn));
+s = snprintf(slot, sizeof slot, ":%02x.%x",
+ PCI_SLOT(t->devfn), PCI_FUNC(d->devfn));
+assert(s == slot_len);
+memcpy(p, slot, slot_len);
 }
 
 return path;
-- 
1.7.3.2.91.g446ac



Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state

2011-01-19 Thread Blue Swirl
On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
 wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
>
> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>
> But if the underlying observation is that the device tree is not really a
> tree, you're 100% correct.  This is part of why a factory interface that
> just takes a parent bus is too simplistic.
>
> I think we ought to introduce a -pci-device option that is specifically for
> creating PCI devices that doesn't require a parent bus argument but provides
> a way to specify stable addressing (for instancing, using a linear index).

I think kvm_state should not be a property of any device or bus. It
should be split to more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, for example irqfd could be considered to be similar object to
char or block devices provided by QEMU to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make coalesced MMIO stuff part of memory object. We are not
passing any state references when using cpu_physical_memory_rw(), but
that could be changed.



[Qemu-devel] Re: [PATCH 1/2] prep: Remove bogus BIOS size check

2011-01-19 Thread Andreas Färber

Am 18.01.2011 um 22:43 schrieb Hervé Poussineau:


From: Andreas Färber 

r3480 added this check to account for the entry vector 0xfff00100 to  
be

available for CPUs that need it. Today however, the NIP is not yet
initialized at this point (zero), so the check always triggers.

Moreover, BIOS size check is already done previously, so this part can
be removed too.

Cc: Alexander Graf 
Signed-off-by: Andreas Färber 
Signed-off-by: Hervé Poussineau 
---
hw/ppc_prep.c |3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c
index 1492266..6b22122 100644
--- a/hw/ppc_prep.c
+++ b/hw/ppc_prep.c
@@ -600,9 +600,6 @@ static void ppc_prep_init (ram_addr_t ram_size,
if (filename) {
qemu_free(filename);
}
-if (env->nip < 0xFFF8 && bios_size < 0x0010) {
-hw_error("PowerPC 601 / 620 / 970 need a 1MB BIOS\n");
-}


I've been thinking if we could replace this with a check for env- 
>excep_prefix + env->hreset_vector or similar but haven't found the  
time to try it out yet.


Andreas



if (linux_boot) {
kernel_base = KERNEL_LOAD_ADDR;
--
1.7.2.3






[Qemu-devel] [PATCH 1/5] Fast Virtual Disk (FVD) Proposal Part 1

2011-01-19 Thread Chunqiang Tang
Part 1 of the block device driver for the proposed FVD image format.
Multiple patches are used in order to manage the size of each patch.
This patch includes existing files that are modified by FVD.

See the related discussions at
http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg00426.html .

Signed-off-by: Chunqiang Tang 
---
 Makefile |   10 +---
 Makefile.objs|1 +
 block.c  |   12 +-
 block_int.h  |5 ++-
 configure|2 +-
 qemu-img-cmds.hx |6 +
 qemu-img.c   |   62 -
 qemu-io.c|3 ++
 qemu-option.c|4 +++
 qemu-tool.c  |   36 ---
 10 files changed, 81 insertions(+), 60 deletions(-)

diff --git a/Makefile b/Makefile
index 6d601ee..da4d777 100644
--- a/Makefile
+++ b/Makefile
@@ -151,13 +151,15 @@ version-obj-$(CONFIG_WIN32) += version.o
 ##
 
 qemu-img.o: qemu-img-cmds.h
-qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o cmd.o: $(GENERATED_HEADERS)
+qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o cmd.o qemu-test.o: 
$(GENERATED_HEADERS)
 
-qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
+qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-tool-time.o qemu-error.o 
$(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
 
-qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
+qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-tool-time.o qemu-error.o 
$(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
 
-qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
+qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-tool-time.o qemu-error.o 
$(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
+
+qemu-test$(EXESUF): qemu-test.o qemu-tool.o qemu-error.o $(oslib-obj-y) 
$(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) 
qemu-timer-common.o
 
 qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@,"  GEN   $@")
diff --git a/Makefile.objs b/Makefile.objs
index c3e52c5..c0c1155 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,6 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
qcow2-snapshot.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += blksim.o fvd.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block.c b/block.c
index ff2795b..856bb1a 100644
--- a/block.c
+++ b/block.c
@@ -58,7 +58,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t 
sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
 
-static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
+QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
@@ -768,7 +768,7 @@ int bdrv_commit(BlockDriverState *bs)
 
 if (!drv)
 return -ENOMEDIUM;
-
+
 if (!bs->backing_hd) {
 return -ENOTSUP;
 }
@@ -1538,7 +1538,7 @@ int bdrv_discard(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors)
  * 'nb_sectors' is the max value 'pnum' should be set to.
  */
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-   int *pnum)
+int *pnum)
 {
 int64_t n;
 if (!bs->drv->bdrv_is_allocated) {
@@ -2050,9 +2050,9 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, 
int64_t sector_num,
   cb, opaque);
 
 if (ret) {
-   /* Update stats even though technically transfer has not happened. */
-   bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
-   bs->rd_ops ++;
+/* Update stats even though technically transfer has not happened. */
+bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
+bs->rd_ops ++;
 }
 
 return ret;
diff --git a/block_int.h b/block_int.h
index 12663e8..2343d07 100644
--- a/block_int.h
+++ b/block_int.h
@@ -28,8 +28,8 @@
 #include "qemu-option.h"
 #include "qemu-queue.h"
 
-#define BLOCK_FLAG_ENCRYPT 1
-#define BLOCK_FLAG_COMPAT6 4
+#define BLOCK_FLAG_ENCRYPT1
+#define BLOCK_FLAG_COMPAT64
 
 #define BLOCK_OPT_SIZE  "size"
 #define BLOCK_OPT_ENCRYPT   "encryption"
@@ -98,6 +98,7 @@ struc

Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Jamie Lokier
Chunqiang Tang wrote:
> > >> Moreover, using a host file system not only adds overhead, but
> > >> also introduces data integrity issues. Specifically, if I/Os uses 
> O_DSYNC,
> > >> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data
> > >> integrity in the event of a host crash. See
> > >> http://lwn.net/Articles/348739/ .
> > >
> > > You have the same issue with O_DIRECT when using a raw disk device
> > > too.  That is, O_DIRECT on a raw device does not guarantee integrity
> > > in the event of a host crash either, for mostly the same reasons.
> > 
> > QEMU has semantics that use O_DIRECT safely; there is no issue here.
> > When a drive is added with cache=none, QEMU not only uses O_DIRECT but
> > also advertises an enabled write cache to the guest.
> > 
> > The guest *must* flush the cache when it wants to ensure data is
> > stable.  In the event of a host crash, all, some, or none of the I/O
> > since the last flush may have made it to disk.  Each of these
> > possibilities is fair game since the guest may only depend on writes
> > being on disk if they completed and a successful flush was issued
> > afterwards.
> 
> Thank both of you for the explanation, which is very helpful to me. With 
> FVD's capability of eliminating the host file system and storing the image 
> on a logical volume, then perhaps we can always use O_DSYNC, because there 
> is little (or no?) LVM metadata that needs a flush on every write and 
> hence O_DSYNC  does not add overhead? I am not certain on this, and need 
> help for confirmation. If this is true, the guest does not need to flush 
> the cache. 

I think O_DSYNC does not work as you might expect on raw disk devices
and logical volumes.

That doesn't mean you don't need something for crash durability!
Instead, you need to issue the disk cache flushes in whatever way works.

It actually has a very *high* overhead.

The overhead isn't from metadata - it is from needing to flush the
disk cache after every write, which prevents the disk from reordering
writes.

If you don't issue the flushes, and the physical device has a volatile
write cache, then you cannot guarantee integrity in the event of a
host crash.

This can make a filesystem faster than a raw disk or logical volume in
some configurations, if the filesystem journals data writes to limit
the seeking needed to commit durably.

-- Jamie



Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Pierre Riteau :
> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
> value of bdrv_write and aborts migration when it fails. However, if the
> size of the block device to migrate is not a multiple of BLOCK_SIZE
> (currently 1 MB), the last bdrv_write will fail with -EIO.
>
> Fixed by calling bdrv_write with the correct size of the last block.
> ---
>  block-migration.c |   16 +++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/block-migration.c b/block-migration.c
> index 1475325..eeb9c62 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>     int64_t addr;
>     BlockDriverState *bs;
>     uint8_t *buf;
> +    int64_t total_sectors;
> +    int nr_sectors;
>
>     do {
>         addr = qemu_get_be64(f);
> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int 
> version_id)
>                 return -EINVAL;
>             }
>
> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
> +            if (total_sectors <= 0) {
> +                fprintf(stderr, "Error getting length of block device %s\n", 
> device_name);
> +                return -EINVAL;
> +            }
> +
> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
> +                nr_sectors = total_sectors - addr;
> +            } else {
> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
> +            }
> +
>             buf = qemu_malloc(BLOCK_SIZE);
>
>             qemu_get_buffer(f, buf, BLOCK_SIZE);
> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>
>             qemu_free(buf);
>             if (ret < 0) {
> --
> 1.7.3.5
>
>
>

Hi Pierre,

I don't think the fix above is correct.  If you have a file which
isn't aliened with BLOCK_SIZE, you won't get an error with the
patch.  However, the receiver doesn't know how much sectors which
the sender wants to be written, so the guest may fail after
migration because some data may not be written.  IIUC, although
changing bytestream should be prevented as much as possible, we
should save/load total_sectors to check appropriate file is
allocated on the receiver side.

BTW, you should use error_report instead of fprintf(stderr, ...).

Thanks,

Yoshi



Re: [Qemu-devel] [PATCH] savevm: fix corruption in vmstate_subsection_load().

2011-01-19 Thread Yoshiaki Tamura
2010/12/14 Yoshiaki Tamura :
> Although it's rare to happen in live migration, when the head of a
> byte stream contains 0x05 which is the marker of subsection, the
> loader gets corrupted because vmstate_subsection_load() continues even
> the device doesn't require it.  This patch adds a checker whether
> subsection is needed, and skips following routines if not needed.
>
> Signed-off-by: Yoshiaki Tamura 
> ---
>  savevm.c |    8 +++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/savevm.c b/savevm.c
> index d38f79e..72f6249 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1633,6 +1633,12 @@ static const VMStateDescription 
> *vmstate_get_subsection(const VMStateSubsection
>  static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription 
> *vmsd,
>                                    void *opaque)
>  {
> +    const VMStateSubsection *sub = vmsd->subsections;
> +
> +    if (!sub || !sub->needed) {
> +        return 0;
> +    }
> +
>     while (qemu_peek_byte(f) == QEMU_VM_SUBSECTION) {
>         char idstr[256];
>         int ret;
> @@ -1645,7 +1651,7 @@ static int vmstate_subsection_load(QEMUFile *f, const 
> VMStateDescription *vmsd,
>         idstr[len] = 0;
>         version_id = qemu_get_be32(f);
>
> -        sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr);
> +        sub_vmsd = vmstate_get_subsection(sub, idstr);
>         if (sub_vmsd == NULL) {
>             return -ENOENT;
>         }
> --
> 1.7.1.2
>
>
>

Hi Juan,

This is an error that always happen with Kemari.  Could you tell
me if I'm fixing incorrectly?

Thanks,

Yoshi



Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%

2011-01-19 Thread Chunqiang Tang
> when I tried to use your patch, I found several problems:
> 
> * The patch does apply cleanly to latest QEMU.
>This is caused by recent changes in QEMU git master.
> 
> * The new code uses tabs instead of spaces (QEMU coding rules).
> 
> * Some lines of the new code end with blank characters.
> 
> * The patch adds empty lines at the end of some files.
> 
> The last two points are reported by newer versions of git
> (which refuse to take such patches with the default setting).
> 
> Could you please update your patch to fix those topics?
> I'd like to apply it to my QEMU code and try the new FVD.

Thank you for the detailed instructions. I fixed all the issues above and 
posted the latest patches to the mailing list (see below). Please use this 
latest version, which also includes some other fixes. BTW, I found out 
that my previous patches were rejected by the mailing list because a 
single patch for FVD was too big.

http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01948.html
http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01947.html
http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01950.html
http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01949.html
http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01951.html

Regards,
ChunQiang (CQ) Tang, Ph.D.
Homepage: http://www.research.ibm.com/people/c/ctang



[Qemu-devel] [PATCH] linux-user: Add support for -version option

2011-01-19 Thread Peter Maydell
Add support to the linux-user qemu for the -version command line
option, bringing it into line with the system emulation qemu.

Signed-off-by: Peter Maydell 
---
 linux-user/main.c |   17 +
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 0d627d6..e651bfd 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2624,14 +2624,21 @@ void cpu_loop (CPUState *env)
 }
 #endif /* TARGET_ALPHA */
 
+static void version(void)
+{
+printf("qemu-" TARGET_ARCH " version " QEMU_VERSION QEMU_PKGVERSION
+   ", Copyright (c) 2003-2008 Fabrice Bellard\n");
+}
+
 static void usage(void)
 {
-printf("qemu-" TARGET_ARCH " version " QEMU_VERSION QEMU_PKGVERSION ", 
Copyright (c) 2003-2008 Fabrice Bellard\n"
-   "usage: qemu-" TARGET_ARCH " [options] program [arguments...]\n"
+version();
+printf("usage: qemu-" TARGET_ARCH " [options] program [arguments...]\n"
"Linux CPU emulator (compiled for %s emulation)\n"
"\n"
"Standard options:\n"
"-hprint this help\n"
+   "-version  display version information and exit\n"
"-g port   wait gdb connection to port\n"
"-L path   set the elf interpreter prefix (default=%s)\n"
"-s size   set the stack size in bytes (default=%ld)\n"
@@ -2886,8 +2893,10 @@ int main(int argc, char **argv, char **envp)
 singlestep = 1;
 } else if (!strcmp(r, "strace")) {
 do_strace = 1;
-} else
-{
+} else if (!strcmp(r, "version")) {
+version();
+exit(0);
+} else {
 usage();
 }
 }
-- 
1.6.3.3




Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 14:16, schrieb Yoshiaki Tamura:
>> 2011/1/19 Kevin Wolf :
>>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura:
 event-tap function is called only when it is on, and requests sent
 from device emulators.

 Signed-off-by: Yoshiaki Tamura 
 ---
  block.c |   11 +++
  1 files changed, 11 insertions(+), 0 deletions(-)

 diff --git a/block.c b/block.c
 index ff2795b..85bd8b8 100644
 --- a/block.c
 +++ b/block.c
 @@ -28,6 +28,7 @@
  #include "block_int.h"
  #include "module.h"
  #include "qemu-objects.h"
 +#include "event-tap.h"

  #ifdef CONFIG_BSD
  #include 
 @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState 
 *bs, int64_t sector_num,
      if (bdrv_check_request(bs, sector_num, nb_sectors))
          return NULL;

 +    if (bs->device_name && event_tap_is_on()) {
 +        return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
 +                                         cb, opaque);
 +    }
 +
      if (bs->dirty_bitmap) {
          blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
                                           opaque);
>>>
>>> Just noticed the context here... Does this patch break block migration
>>> when event-tap is on?
>>
>> I don't think so.  event-tap will call bdrv_aio_writev() upon
>> flushing requests and it shouldn't affect block-migration.  The
>> block written after the synchronization should be marked as dirty
>> and should be sent in the next round.  Am I missing the point?
>
> No, that makes sense. I don't have a complete understanding of the whole
> series yet, so there may be well more misunderstandings on my side.

It's OK.  I'm glad that you're reviewing.

>>> Another question that came to my mind is if we really hook everything we
>>> need. I think we'll need to have a hook in bdrv_flush as well. I don't
>>> know if you do hook qemu_aio_flush and friends -  does a call cause
>>> event-tap to flush its queue? If not, a call to qemu_aio_flush might
>>> hang qemu because it's waiting for requests to complete which are
>>> actually stuck in the event-tap queue.
>>
>> No it doesn't queue at event-tap.  Marcelo pointed that we should
>> hook bdrv_aio_flush to avoid requests inversion, that made sense
>> to me.  Do we need to hook bdrv_flush for that same reason?  If
>
> bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in
> general it seems likely that we need to do the same.

Hmm.  Because it's synchronous, we need to start synchronization
right away, and once done, flush requests queued in event-tap
then return.

>> we hook bdrv_flush and qemu_aio_flush, we're going loop forever
>> because the synchronization code is calling vm_stop that call
>> bdrv_flush_all and qemu_aio_flush.
>
> qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why
> we would loop forever. It just waits for AIO requests to complete.
>
> I just looked up the code and I think the situation is a bit different
> than I thought originally: qemu_aio_flush waits only for completion of
> requests which belong to a driver that has registered using
> qemu_aio_set_fd_handler. So this means that AIO requests queued in
> event-tap are not considered in-flight requests and we won't get stuck
> in a loop. Maybe we have no problem in fact. :-)

I had the same thoughts.  We don't have to hook qemu_aio_flush.

> On the other hand, e.g. migration relies on the fact that after a
> qemu_aio_flush, all AIO requests that the device model has submitted are
> completed. I think event-tap must take care that the requests which are
> queued are not forgotten to be migrated. (Maybe the code already
> considers this, I'm just writing down what comes to my mind...)

That's where event-tap is calling qemu_aio_flush.  It should be
almost same as for live migration.  Requests queued in event-tap
are replayed on the secondary side, that is the core design of
Kemari.

Thanks,

Yoshi

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.

2011-01-19 Thread Yoshiaki Tamura
2011/1/19 Kevin Wolf :
> Am 19.01.2011 14:04, schrieb Yoshiaki Tamura:
 +static void event_tap_blk_flush(EventTapBlkReq *blk_req)
 +{
 +    BlockDriverState *bs;
 +
 +    bs = bdrv_find(blk_req->device_name);
>>>
>>> Please store the BlockDriverState in blk_req. This code loops over all
>>> block devices and does a string comparison - and that for each request.
>>> You can also save the qemu_strdup() when creating the request.
>>>
>>> In the few places where you really need the device name (might be the
>>> case for load/save, I'm not sure), you can still get it from the
>>> BlockDriverState.
>>
>> I would do so for the primary side.  Although we haven't
>> implemented yet, we want to replay block requests from block
>> layer on the secondary side, and need device name to restore
>> BlockDriverState.
>
> Hm, I see. I'm not happy about it, but I don't have a suggestion right
> away how to avoid it.
>
>>>
 +
 +    if (blk_req->is_flush) {
 +        bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque);
>>>
>>> You need to handle errors. If bdrv_aio_flush returns NULL, call the
>>> callback with -EIO.
>>
>> I'll do so.
>>
>>>
 +        return;
 +    }
 +
 +    bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov,
 +                    blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb,
 +                    blk_req->reqs[0].opaque);
>>>
>>> Same here.
>>>
 +    bdrv_flush(bs);
>>>
>>> This looks really strange. What is this supposed to do?
>>>
>>> One point is that you write it immediately after bdrv_aio_write, so you
>>> get an fsync for which you don't know if it includes the current write
>>> request or if it doesn't. Which data do you want to get flushed to the disk?
>>
>> I was expecting to flush the aio request that was just initiated.
>> Am I misunderstanding the function?
>
> Seems so. The function names don't use really clear terminology either,
> so you're not the first one to fall in this trap. Basically we have:
>
> * qemu_aio_flush() waits for all AIO requests to complete. I think you
> wanted to have exactly this, but only for a single block device. Such a
> function doesn't exist yet.
>
> * bdrv_flush() makes sure that all successfully completed requests are
> written to disk (by calling fsync)
>
> * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run
> the fsync in the thread pool

Then what I wanted to do is, call qemu_aio_flush first, then
bdrv_flush.  It should be like live migration.

>
>>> The other thing is that you introduce a bdrv_flush for each request,
>>> basically forcing everyone to something very similar to writethrough
>>> mode. I'm sure this will have a big impact on performance.
>>
>> The reason is to avoid inversion of queued requests.  Although
>> processing one-by-one is heavy, wouldn't having requests flushed
>> to disk out of order break the disk image?
>
> No, that's fine. If a guest issues two requests at the same time, they
> may complete in any order. You just need to make sure that you don't
> call the completion callback before the request really has completed.

We need to flush requests, meaning aio and fsync, before sending
the final state of the guests, to make sure we can switch to the
secondary safely.

> I'm just starting to wonder if the guest won't timeout the requests if
> they are queued for too long. Even more, with IDE, it can only handle
> one request at a time, so not completing requests doesn't sound like a
> good idea at all. In what intervals is the event-tap queue flushed?

The requests are flushed once each transaction completes.  So
it's not with specific intervals.

> On the other hand, if you complete before actually writing out, you
> don't get timeouts, but you signal success to the guest when the request
> could still fail. What would you do in this case? With a writeback cache
> mode we're fine, we can just fail the next flush (until then nothing is
> guaranteed to be on disk and order doesn't matter either), but with
> cache=writethrough we're in serious trouble.
>
> Have you thought about this problem? Maybe we end up having to flush the
> event-tap queue for each single write in writethrough mode.

Yes, and that's what I'm trying to do at this point.  I know that
performance matters a lot, but sacrificing reliability over
performance now isn't a good idea.  I first want to lay the
ground, and then focus on optimization.  Note that without dirty
bitmap optimization, Kemari suffers a lot in sending rams.
Anthony and I discussed to take this approach at KVM Forum.

>>> Additionally, error handling is missing.
>>
>> I looked at the codes using bdrv_flush and realized some of them
>> doesn't handle errors, but scsi-disk.c does.  Should everyone
>> handle errors or depends on the usage?
>
> I added the return code only recently, it was a void function
> previously. Probably some error handling should be added to all of them.

Ah:)  Glad to hear 

Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB

2011-01-19 Thread Pierre Riteau
On 20 janv. 2011, at 03:06, Yoshiaki Tamura wrote:

> 2011/1/19 Pierre Riteau :
>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>> value of bdrv_write and aborts migration when it fails. However, if the
>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>> 
>> Fixed by calling bdrv_write with the correct size of the last block.
>> ---
>>  block-migration.c |   16 +++-
>>  1 files changed, 15 insertions(+), 1 deletions(-)
>> 
>> diff --git a/block-migration.c b/block-migration.c
>> index 1475325..eeb9c62 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>> int64_t addr;
>> BlockDriverState *bs;
>> uint8_t *buf;
>> +int64_t total_sectors;
>> +int nr_sectors;
>> 
>> do {
>> addr = qemu_get_be64(f);
>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int 
>> version_id)
>> return -EINVAL;
>> }
>> 
>> +total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>> +if (total_sectors <= 0) {
>> +fprintf(stderr, "Error getting length of block device 
>> %s\n", device_name);
>> +return -EINVAL;
>> +}
>> +
>> +if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>> +nr_sectors = total_sectors - addr;
>> +} else {
>> +nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>> +}
>> +
>> buf = qemu_malloc(BLOCK_SIZE);
>> 
>> qemu_get_buffer(f, buf, BLOCK_SIZE);
>> -ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>> +ret = bdrv_write(bs, addr, buf, nr_sectors);
>> 
>> qemu_free(buf);
>> if (ret < 0) {
>> --
>> 1.7.3.5
>> 
>> 
>> 
> 
> Hi Pierre,
> 
> I don't think the fix above is correct.  If you have a file which
> isn't aliened with BLOCK_SIZE, you won't get an error with the
> patch.  However, the receiver doesn't know how much sectors which
> the sender wants to be written, so the guest may fail after
> migration because some data may not be written.  IIUC, although
> changing bytestream should be prevented as much as possible, we
> should save/load total_sectors to check appropriate file is
> allocated on the receiver side.

Isn't the guest supposed to be started using a file with the correct size?
But I guess changing the protocol would be best as it would avoid headaches to 
people who mistakenly created a file that is too small.

> BTW, you should use error_report instead of fprintf(stderr, ...).

I didn't know that, I followed what was used in this file. Thank you.

-- 
Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
http://perso.univ-rennes1.fr/pierre.riteau/




[Qemu-devel] [PATCH] pci/pcie: make pci_find_device() ARI aware.

2011-01-19 Thread Isaku Yamahata
make pci_find_device() ARI aware.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8d0e3df..851f350 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1596,11 +1596,18 @@ PCIBus *pci_find_bus(PCIBus *bus, int bus_num)
 
 PCIDevice *pci_find_device(PCIBus *bus, int bus_num, int slot, int function)
 {
+PCIDevice *d;
 bus = pci_find_bus(bus, bus_num);
 
 if (!bus)
 return NULL;
 
+d = bus->parent_dev;
+if (d && pci_is_express(d) &&
+pcie_cap_get_type(d) == PCI_EXP_TYPE_DOWNSTREAM &&
+!pcie_cap_is_ari_enabled(d) && slot > 0) {
+return NULL;
+}
 return bus->devices[PCI_DEVFN(slot, function)];
 }
 
-- 
1.7.1.1




[Qemu-devel] [PATCH] pci: use qemu_malloc() in pcibus_get_dev_path()

2011-01-19 Thread Isaku Yamahata
use qemu_malloc() instead of direct use of malloc().

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 851f350..86af0ee 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -2053,7 +2053,7 @@ static char *pcibus_get_dev_path(DeviceState *dev)
 path_len = domain_len + slot_len * slot_depth;
 
 /* Allocate memory, fill in the terminating null byte. */
-path = malloc(path_len + 1 /* For '\0' */);
+path = qemu_malloc(path_len + 1 /* For '\0' */);
 path[path_len] = '\0';
 
 /* First field is the domain. */
-- 
1.7.1.1




[Qemu-devel] [PATCH] pci: make pci_bus_new() aware of pci domain

2011-01-19 Thread Isaku Yamahata
This patch makes pci bus creation aware of pci domain.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c  |   19 ++-
 hw/pci.h  |7 ---
 hw/piix_pci.c |2 +-
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 86af0ee..e1e7b25 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -246,8 +246,11 @@ int pci_find_domain(const PCIBus *bus)
 return -1;
 }
 
+/* create root pci bus.
+ * If secondary pci bus is wanted, use pci_bridge_initfn()
+ */
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
- const char *name, int devfn_min)
+ const char *name, int domain, int devfn_min)
 {
 qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
 assert(PCI_FUNC(devfn_min) == 0);
@@ -255,18 +258,19 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
 
 /* host bridge */
 QLIST_INIT(&bus->child);
-pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */
+pci_host_bus_register(domain, bus);
 
 vmstate_register(NULL, -1, &vmstate_pcibus, bus);
 }
 
-PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min)
+PCIBus *pci_bus_new(DeviceState *parent, const char *name,
+int domain, int devfn_min)
 {
 PCIBus *bus;
 
 bus = qemu_mallocz(sizeof(*bus));
 bus->qbus.qdev_allocated = 1;
-pci_bus_new_inplace(bus, parent, name, devfn_min);
+pci_bus_new_inplace(bus, parent, name, domain, devfn_min);
 return bus;
 }
 
@@ -292,13 +296,18 @@ void pci_bus_set_mem_base(PCIBus *bus, target_phys_addr_t 
base)
 bus->mem_base = base;
 }
 
+/* deprecated: kept for compatility of existing codes.
+ * pci_bus_new() and pci_bus_irqs() should be used for root pci bus
+ * like i440fx_init().
+ * pci_bridge_initfn() should be used for secondary pci bus
+ */
 PCIBus *pci_register_bus(DeviceState *parent, const char *name,
  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
  void *irq_opaque, int devfn_min, int nirq)
 {
 PCIBus *bus;
 
-bus = pci_bus_new(parent, name, devfn_min);
+bus = pci_bus_new(parent, name, 0 /* domain = 0 for compat */, devfn_min);
 pci_bus_irqs(bus, set_irq, map_irq, irq_opaque, nirq);
 return bus;
 }
diff --git a/hw/pci.h b/hw/pci.h
index bc8d5bb..a0fd953 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -229,14 +229,15 @@ typedef enum {
 typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev,
   PCIHotplugState state);
 void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
- const char *name, int devfn_min);
-PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min);
+ const char *name, int domain, int devfn_min);
+PCIBus *pci_bus_new(DeviceState *parent, const char *name,
+int domain, int devfn_min);
 void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
   void *irq_opaque, int nirq);
 void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *dev);
 PCIBus *pci_register_bus(DeviceState *parent, const char *name,
  pci_set_irq_fn set_irq, pci_map_irq_fn map_irq,
- void *irq_opaque, int devfn_min, int nirq);
+ void *irq_opaque, int devfn_min, int nirq); /* 
deprecated */
 void pci_device_reset(PCIDevice *dev);
 void pci_bus_reset(PCIBus *bus);
 
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index 358da58..718983d 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -226,7 +226,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
*piix3_devfn, qemu_irq *
 
 dev = qdev_create(NULL, "i440FX-pcihost");
 s = FROM_SYSBUS(I440FXState, sysbus_from_qdev(dev));
-b = pci_bus_new(&s->busdev.qdev, NULL, 0);
+b = pci_bus_new(&s->busdev.qdev, NULL, 0, 0);
 s->bus = b;
 qdev_init_nofail(dev);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 1/3] pci: deassert intx on reset.

2011-01-19 Thread Isaku Yamahata
deassert intx on device reset.
So far pci_device_reset() is used for system reset.
In that case, interrupt controller is reset at the same time so that
all irq is are deasserted.
But now pci bus reset/flr is supported, and in that case irq needs to be
disabled explicitly.

Signed-off-by: Isaku Yamahata 
---
 hw/pci.c |9 +
 hw/pci.h |2 ++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index e1e7b25..de6370f 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -137,6 +137,14 @@ static void pci_update_irq_status(PCIDevice *dev)
 }
 }
 
+void pci_device_deassert_intx(PCIDevice *dev)
+{
+int i;
+for (i = 0; i < PCI_NUM_PINS; ++i) {
+qemu_set_irq(dev->irq[i], 0);
+}
+}
+
 /*
  * This function is called on #RST and FLR.
  * FLR if PCI_EXP_DEVCTL_BCR_FLR is set
@@ -152,6 +160,7 @@ void pci_device_reset(PCIDevice *dev)
 
 dev->irq_state = 0;
 pci_update_irq_status(dev);
+pci_device_deassert_intx(dev);
 /* Clear all writeable bits */
 pci_word_test_and_clear_mask(dev->config + PCI_COMMAND,
  pci_get_word(dev->wmask + PCI_COMMAND) |
diff --git a/hw/pci.h b/hw/pci.h
index a0fd953..01c3285 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -265,6 +265,8 @@ void do_pci_info_print(Monitor *mon, const QObject *data);
 void do_pci_info(Monitor *mon, QObject **ret_data);
 void pci_bridge_update_mappings(PCIBus *b);
 
+void pci_device_deassert_intx(PCIDevice *dev);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
-- 
1.7.1.1




[Qemu-devel] [PATCH 2/3] msi: simply write config a bit.

2011-01-19 Thread Isaku Yamahata
use pci_device_deassert_intx().

Signed-off-by: Isaku Yamahata 
---
 hw/msi.c |5 +
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index f03f519..3dc3a24 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -255,7 +255,6 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, 
uint32_t val, int len)
 uint8_t log_max_vecs;
 unsigned int vector;
 uint32_t pending;
-int i;
 
 if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) {
 return;
@@ -296,9 +295,7 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, 
uint32_t val, int len)
  *   from using its INTx# pin (if implemented) to request
  *   service (MSI, MSI-X, and INTx# are mutually exclusive).
  */
-for (i = 0; i < PCI_NUM_PINS; ++i) {
-qemu_set_irq(dev->irq[i], 0);
-}
+pci_device_deassert_intx(dev);
 
 /*
  * nr_vectors might be set bigger than capable. So clamp it.
-- 
1.7.1.1




[Qemu-devel] [PATCH 0/3] pci: disable intx on flr/bus reset

2011-01-19 Thread Isaku Yamahata
So far pci_device_reset() is used for system reset.
In that case, interrupt controller is also reset so that
all irq is are deasserted.
But now pci bus reset/flr is supported, and in that case irq needs to be
disabled explicitly.

Isaku Yamahata (3):
  pci: deassert intx on reset.
  msi: simply write config a bit.
  msix: simply write config

 hw/msi.c  |5 +
 hw/msix.c |5 +
 hw/pci.c  |9 +
 hw/pci.h  |2 ++
 4 files changed, 13 insertions(+), 8 deletions(-)




  1   2   >