Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
On Wed, Jan 19, 2011 at 1:12 AM, Jamie Lokier wrote: > Chunqiang Tang wrote: >> Moreover, using a host file system not only adds overhead, but >> also introduces data integrity issues. Specifically, if I/Os uses O_DSYNC, >> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data >> integrity in the event of a host crash. See >> http://lwn.net/Articles/348739/ . > > You have the same issue with O_DIRECT when using a raw disk device > too. That is, O_DIRECT on a raw device does not guarantee integrity > in the event of a host crash either, for mostly the same reasons. QEMU has semantics that use O_DIRECT safely; there is no issue here. When a drive is added with cache=none, QEMU not only uses O_DIRECT but also advertises an enabled write cache to the guest. The guest *must* flush the cache when it wants to ensure data is stable. In the event of a host crash, all, some, or none of the I/O since the last flush may have made it to disk. Each of these possibilities is fair game since the guest may only depend on writes being on disk if they completed and a successful flush was issued afterwards. Stefan
[Qemu-devel] [PATCH 0/1] add spicevmc chardev
In additional to the description in the patch, see following email for uses: http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg01208.html In particular it would/could be used for: current vdagent of spice uses this for mouse, copy paste proposed smartcard device for qemu uses this with spice usb forwarding terminal forwarding qemu monitor forwarding Alon Levy (1): spice: add chardev (v5) Makefile.objs |2 +- qemu-char.c |4 + qemu-config.c |6 ++ qemu-options.hx | 16 - spice-qemu-char.c | 190 + trace-events |6 ++ ui/qemu-spice.h |3 + 7 files changed, 225 insertions(+), 2 deletions(-) create mode 100644 spice-qemu-char.c -- 1.7.3.4
[Qemu-devel] [PATCH 1/1] spice: add chardev (v5)
Adding a chardev backend for spice, where spice determines what to do with it based on the name attribute given during chardev creation. For usage by spice vdagent in conjunction with a properly named virtio-serial device, and future smartcard channel usage. Example usage: qemu -device virtio-serial -chardev spicevmc,name=vdagent,id=vdagent \ -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 v4->v5: * add tracing events * fix missing comma * fix help string to show debug is optional v3->v4: * updated commit message v1->v3 changes: (v2 had a wrong commit message) * removed spice-qemu-char.h, folded into ui/qemu-spice.h * removed dead IOCTL code * removed comment * removed ifdef CONFIG_SPICE from qemu-config.c and qemu-options.hx help. --- Makefile.objs |2 +- qemu-char.c |4 + qemu-config.c |6 ++ qemu-options.hx | 16 - spice-qemu-char.c | 190 + trace-events |6 ++ ui/qemu-spice.h |3 + 7 files changed, 225 insertions(+), 2 deletions(-) create mode 100644 spice-qemu-char.c diff --git a/Makefile.objs b/Makefile.objs index c3e52c5..b9e9ef6 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -105,7 +105,7 @@ common-obj-$(CONFIG_BRLAPI) += baum.o common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o common-obj-$(CONFIG_WIN32) += version.o -common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o ui/spice-display.o +common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o ui/spice-display.o spice-qemu-char.o audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o audio-obj-$(CONFIG_SDL) += sdlaudio.o diff --git a/qemu-char.c b/qemu-char.c index edc9ad6..acc7130 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -97,6 +97,7 @@ #endif #include "qemu_socket.h" +#include "ui/qemu-spice.h" #define READ_BUF_LEN 4096 @@ -2495,6 +2496,9 @@ static const struct { || defined(__FreeBSD_kernel__) { .name = "parport", .open = qemu_chr_open_pp }, #endif +#ifdef CONFIG_SPICE +{ .name = "spicevmc", .open = qemu_chr_open_spice }, +#endif }; CharDriverState *qemu_chr_open_opts(QemuOpts *opts, diff --git a/qemu-config.c b/qemu-config.c index 965fa46..323d3c2 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -146,6 +146,12 @@ static QemuOptsList qemu_chardev_opts = { },{ .name = "signal", .type = QEMU_OPT_BOOL, +},{ +.name = "name", +.type = QEMU_OPT_STRING, +},{ +.name = "debug", +.type = QEMU_OPT_NUMBER, }, { /* end of list */ } }, diff --git a/qemu-options.hx b/qemu-options.hx index 898561d..939297a 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1368,6 +1368,9 @@ DEF("chardev", HAS_ARG, QEMU_OPTION_chardev, #if defined(__linux__) || defined(__FreeBSD__) || defined(__DragonFly__) "-chardev parport,id=id,path=path[,mux=on|off]\n" #endif +#if defined(CONFIG_SPICE) +"-chardev spicevmc,id=id,name=name[,debug=debug]\n" +#endif , QEMU_ARCH_ALL ) @@ -1392,7 +1395,8 @@ Backend is one of: @option{stdio}, @option{braille}, @option{tty}, -@option{parport}. +@option{parport}, +@option{spicevmc}. The specific backend will determine the applicable options. All devices must have an id, which can be any string up to 127 characters long. @@ -1568,6 +1572,16 @@ Connect to a local parallel port. @option{path} specifies the path to the parallel port device. @option{path} is required. +#if defined(CONFIG_SPICE) +@item -chardev spicevmc ,id=@var{id} ,debug=@var{debug}, name=@var{name} + +@option{debug} debug level for spicevmc + +@option{name} name of spice channel to connect to + +Connect to a spice virtual machine channel, such as vdiport. +#endif + @end table ETEXI diff --git a/spice-qemu-char.c b/spice-qemu-char.c new file mode 100644 index 000..517f337 --- /dev/null +++ b/spice-qemu-char.c @@ -0,0 +1,190 @@ +#include "config-host.h" +#include "trace.h" +#include "ui/qemu-spice.h" +#include +#include + +#include "osdep.h" + +#define dprintf(_scd, _level, _fmt, ...)\ +do {\ +static unsigned __dprintf_counter = 0; \ +if (_scd->debug >= _level) {\ +fprintf(stderr, "scd: %3d: " _fmt, ++__dprintf_counter, ## __VA_ARGS__);\ +} \ +} while (0) + +#define VMC_MAX_HOST_WRITE2048 + +typedef struct SpiceCharDriver { +CharDriverState* chr; +SpiceCharDeviceInstance sin; +char *subtype; +bool active; +uint8_t *buffer; +uint8_t *datapos; +ssize_t bufsize, datalen; +uint32_t debug; +} SpiceCharDriver; + +sta
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: > event-tap function is called only when it is on, and requests sent > from device emulators. > > Signed-off-by: Yoshiaki Tamura > --- > block.c | 11 +++ > 1 files changed, 11 insertions(+), 0 deletions(-) > > diff --git a/block.c b/block.c > index ff2795b..85bd8b8 100644 > --- a/block.c > +++ b/block.c > @@ -28,6 +28,7 @@ > #include "block_int.h" > #include "module.h" > #include "qemu-objects.h" > +#include "event-tap.h" > > #ifdef CONFIG_BSD > #include > @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState > *bs, int64_t sector_num, > if (bdrv_check_request(bs, sector_num, nb_sectors)) > return NULL; > > +if (bs->device_name && event_tap_is_on()) { bs->device_name is a pointer to a char array contained in bs, so it's never NULL. You probably mean *bs->device_name? Kevin
Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: > event-tap controls when to start FT transaction, and provides proxy > functions to called from net/block devices. While FT transaction, it > queues up net/block requests, and flush them when the transaction gets > completed. > > Signed-off-by: Yoshiaki Tamura > Signed-off-by: OHMURA Kei One general comment: On the first glance this seems to mix block and net (and some other things) arbitrarily instead of having a section for handling all block stuff, then network, etc. Is there a specific reason for the order in which you put the functions? If not, maybe reordering them might improve readability. > --- > Makefile.target |1 + > event-tap.c | 847 > +++ > event-tap.h | 42 +++ > qemu-tool.c | 24 ++ > trace-events|9 + > 5 files changed, 923 insertions(+), 0 deletions(-) > create mode 100644 event-tap.c > create mode 100644 event-tap.h > > diff --git a/Makefile.target b/Makefile.target > index e15b1c4..f36cd75 100644 > --- a/Makefile.target > +++ b/Makefile.target > @@ -199,6 +199,7 @@ obj-y += rwhandler.o > obj-$(CONFIG_KVM) += kvm.o kvm-all.o > obj-$(CONFIG_NO_KVM) += kvm-stub.o > LIBS+=-lz > +obj-y += event-tap.o > > QEMU_CFLAGS += $(VNC_TLS_CFLAGS) > QEMU_CFLAGS += $(VNC_SASL_CFLAGS) > diff --git a/event-tap.c b/event-tap.c > new file mode 100644 > index 000..f492708 > --- /dev/null > +++ b/event-tap.c > @@ -0,0 +1,847 @@ > +/* > + * Event Tap functions for QEMU > + * > + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See > + * the COPYING file in the top-level directory. > + */ > + > +#include "qemu-common.h" > +#include "qemu-error.h" > +#include "block.h" > +#include "block_int.h" > +#include "ioport.h" > +#include "osdep.h" > +#include "sysemu.h" > +#include "hw/hw.h" > +#include "net.h" > +#include "event-tap.h" > +#include "trace.h" > + > +enum EVENT_TAP_STATE { > +EVENT_TAP_OFF, > +EVENT_TAP_ON, > +EVENT_TAP_FLUSH, > +EVENT_TAP_LOAD, > +EVENT_TAP_REPLAY, > +}; > + > +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF; > +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */ Indeed, bdrv_aio_cancel will segfault this way. If you use dummies instead of real ACBs the only way to correctly implement bdrv_aio_cancel is waiting for all in-flight AIOs (qemu_aio_flush). > +typedef struct EventTapIOport { > +uint32_t address; > +uint32_t data; > +int index; > +} EventTapIOport; > + > +#define MMIO_BUF_SIZE 8 > + > +typedef struct EventTapMMIO { > +uint64_t address; > +uint8_t buf[MMIO_BUF_SIZE]; > +int len; > +} EventTapMMIO; > + > +typedef struct EventTapNetReq { > +char *device_name; > +int iovcnt; > +struct iovec *iov; > +int vlan_id; > +bool vlan_needed; > +bool async; > +NetPacketSent *sent_cb; > +} EventTapNetReq; > + > +#define MAX_BLOCK_REQUEST 32 > + > +typedef struct EventTapBlkReq { > +char *device_name; > +int num_reqs; > +int num_cbs; > +bool is_flush; > +BlockRequest reqs[MAX_BLOCK_REQUEST]; > +BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST]; > +void *opaque[MAX_BLOCK_REQUEST]; > +} EventTapBlkReq; > + > +#define EVENT_TAP_IOPORT (1 << 0) > +#define EVENT_TAP_MMIO (1 << 1) > +#define EVENT_TAP_NET(1 << 2) > +#define EVENT_TAP_BLK(1 << 3) > + > +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1) > + > +typedef struct EventTapLog { > +int mode; > +union { > +EventTapIOport ioport; > +EventTapMMIO mmio; > +}; > +union { > +EventTapNetReq net_req; > +EventTapBlkReq blk_req; > +}; > +QTAILQ_ENTRY(EventTapLog) node; > +} EventTapLog; > + > +static EventTapLog *last_event_tap; > + > +static QTAILQ_HEAD(, EventTapLog) event_list; > +static QTAILQ_HEAD(, EventTapLog) event_pool; > + > +static int (*event_tap_cb)(void); > +static QEMUBH *event_tap_bh; > +static VMChangeStateEntry *vmstate; > + > +static void event_tap_bh_cb(void *p) > +{ > +if (event_tap_cb) { > +event_tap_cb(); > +} > + > +qemu_bh_delete(event_tap_bh); > +event_tap_bh = NULL; > +} > + > +static void event_tap_schedule_bh(void) > +{ > +trace_event_tap_ignore_bh(!!event_tap_bh); > + > +/* if bh is already set, we ignore it for now */ > +if (event_tap_bh) { > +return; > +} > + > +event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL); > +qemu_bh_schedule(event_tap_bh); > + > +return ; > +} > + > +static void event_tap_alloc_net_req(EventTapNetReq *net_req, > + VLANClientState *vc, > + const struct iovec *iov, int iovcnt, > + NetPacketSent *sent_cb, bool async) > +{ > +int i; > + > +net_req->iovcnt = iovcnt; > +net_req->async
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: > event-tap function is called only when it is on, and requests sent > from device emulators. > > Signed-off-by: Yoshiaki Tamura > --- > block.c | 11 +++ > 1 files changed, 11 insertions(+), 0 deletions(-) > > diff --git a/block.c b/block.c > index ff2795b..85bd8b8 100644 > --- a/block.c > +++ b/block.c > @@ -28,6 +28,7 @@ > #include "block_int.h" > #include "module.h" > #include "qemu-objects.h" > +#include "event-tap.h" > > #ifdef CONFIG_BSD > #include > @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState > *bs, int64_t sector_num, > if (bdrv_check_request(bs, sector_num, nb_sectors)) > return NULL; > > +if (bs->device_name && event_tap_is_on()) { > +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, > + cb, opaque); > +} > + > if (bs->dirty_bitmap) { > blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb, > opaque); Just noticed the context here... Does this patch break block migration when event-tap is on? Another question that came to my mind is if we really hook everything we need. I think we'll need to have a hook in bdrv_flush as well. I don't know if you do hook qemu_aio_flush and friends - does a call cause event-tap to flush its queue? If not, a call to qemu_aio_flush might hang qemu because it's waiting for requests to complete which are actually stuck in the event-tap queue. Kevin
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/18/11 18:09, Anthony Liguori wrote: On 01/18/2011 10:56 AM, Jan Kiszka wrote: The device model topology is 100% a hidden architectural detail. This is true for the sysbus, it is obviously not the case for PCI and similarly discoverable buses. There we have a guest-explorable topology that is currently equivalent to the the qdev layout. But we also don't do PCI passthrough so we really haven't even explored how that maps in qdev. I don't know if qemu-kvm has attempted to qdev-ify it. It is qdev-ified. It is a normal pci device from qdev's point of view. BTW: is there any reason why (vfio-based) pci passthrough couldn't work with tcg? The -device interface is a stable interface. Right now, you don't specify any type of identifier of the pci bus when you create a PCI device. It's implied in the interface. Wrong. You can specify the bus you want attach the device to via bus=. This is true for *every* device, including all pci devices. If unspecified qdev uses the first bus it finds. As long as there is a single pci bus only there is simply no need to specify it, thats why nobody does that today. Once q35 finally arrives this will change of course. cheers, Gerd
Re: [Qemu-devel] [PATCH 2/2] strtosz(): Use suffix macros in switch() statement
On 01/18/11 21:39, Anthony Liguori wrote: > On 01/18/2011 02:36 PM, Jes Sorensen wrote: >> On 01/18/11 21:30, Anthony Liguori wrote: >>> On 01/18/2011 10:53 AM, Eric Blake wrote: And it does, via the toupper() added earlier in the series (and which has separately been pointed out that using qemu_toupper() might be nicer). >>> Ok. Just taking the different case labels would be nicer IMHO. >>> >> The old code did that, but I was suggested to do it this way, which I >> think is cleaner too. Fewer lines of code are easier to read. >> > toupper() is based on locale so it's not consistent. If you can show me an actually used locale where the toupper() on k/m/g/t doesn't result in K/M/G/T then I guess there's a case. Otherwise I don't really see this being a real point. I think we are hitting the point where it's about who's taste is better, and not about the actual code in this discussion. One point in favor of the patch is this: Without the patch: jes@red-feather qemu]$ size cutils.o textdata bss dec hex filename 4212 0 042121074 cutils.o With patch: [jes@red-feather qemu]$ size cutils.o textdata bss dec hex filename 4196 0 041961064 cutils.o IMHO it makes the code easier to read, but beyond that there isn't much. If people are strongly against it, I'll just drop the patch. It's not worth our time arguing over this level of detail. Otherwise I'd like to see it applied. Cheers, Jes
Re: [Qemu-devel] [V3 PATCH 5/8] virtio-9p: Create support in chroot environment
Hi Blue Swirl, Thanks for your review comments. I will address these in my next version of patchset. M. Mohan Kumar On Tuesday 18 January 2011 10:38:21 pm Blue Swirl wrote: > On Tue, Jan 18, 2011 at 6:25 AM, M. Mohan Kumar wrote: > > Add both server & client side interfaces to create regular files in > > chroot environment > > > > Signed-off-by: M. Mohan Kumar > > --- > > hw/9pfs/virtio-9p-chroot.c | 42 > > ++ hw/9pfs/virtio-9p-local.c | > > 22 -- > > 2 files changed, 62 insertions(+), 2 deletions(-) > > > > diff --git a/hw/9pfs/virtio-9p-chroot.c b/hw/9pfs/virtio-9p-chroot.c > > index b599e23..e7f85e2 100644 > > --- a/hw/9pfs/virtio-9p-chroot.c > > +++ b/hw/9pfs/virtio-9p-chroot.c > > @@ -193,6 +193,42 @@ static void chroot_do_open(V9fsFileObjectRequest > > *request, FdInfo *fd_info) } > > } > > > > +/* > > + * Helper routine to create a file and return the file descriptor and > > + * error status in FdInfo structure. > > + */ > > +static void chroot_do_create(V9fsFileObjectRequest *request, FdInfo > > *fd_info) +{ > > +int cur_uid, cur_gid; > > uid_t cur_uid; > gid_t cur_gid; > > > + > > +cur_uid = geteuid(); > > +cur_gid = getegid(); > > + > > +fd_info->fi_fd = -1; > > + > > +if (setfsuid(request->data.uid) < 0) { > > +fd_info->fi_error = errno; > > +return; > > +} > > +if (setfsgid(request->data.gid) < 0) { > > +fd_info->fi_error = errno; > > +goto unset_uid; > > +} > > + > > +fd_info->fi_fd = open(request->path.path, request->data.flags, > > +request->data.mode); > > + > > +if (fd_info->fi_fd < 0) { > > +fd_info->fi_error = errno; > > +} else { > > +fd_info->fi_error = 0; > > +} > > + > > +setfsgid(cur_gid); > > +unset_uid: > > +setfsuid(cur_uid); > > +} > > + > > static int chroot_daemonize(int chroot_sock) > > { > > sigset_t sigset; > > @@ -276,6 +312,12 @@ int v9fs_chroot(FsContext *fs_ctx) > > error = -2; > > } > > break; > > +case T_CREATE: > > +chroot_do_create(&request, &fd_info); > > +if (chroot_sendfd(chroot_sock, &fd_info) <= 0) { > > +error = -2; > > +} > > +break; > > default: > > break; > > } > > diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c > > index 2376ec2..7f39b40 100644 > > --- a/hw/9pfs/virtio-9p-local.c > > +++ b/hw/9pfs/virtio-9p-local.c > > @@ -52,6 +52,23 @@ static int __open(FsContext *fs_ctx, const char *path, > > int flags) return fd; > > } > > > > +static int __create(FsContext *fs_ctx, const char *path, int flags, > > Please don't use identifiers starting with underscores.
Re: [Qemu-devel] [sparc] Floating point exception issue
On 18/01/11 21:51, Blue Swirl wrote: On Tue, Jan 18, 2011 at 6:00 PM, Mateusz Loskot wrote: On 18/01/11 17:36, Blue Swirl wrote: On Tue, Jan 18, 2011 at 3:27 PM, Mateusz Loskot wrote: Hi, Recently, I have reported mysterious issues on NetBSD 5.1 emulated on SPARC. The whole first thread is here: http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01509.html I decided to investigate the problem deeper and with great help from NetBSD folks, I managed to find reproducible test case. Initially, it was AWK command: # echo NaN | awk '{print "test"}' awk: floating point exception 8 source line number 1 and next it boiled down to simple C program (see below). Details of the investigation are archived in the NetBSD Problem Report #44389 here: http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=44389 Here is final version of the test program which reproduces the problem: #include #include #include #include int is_number(const char *s) { double r; char *ep; errno = 0; r = strtod(s,&ep); if (r == HUGE_VAL) printf("X:%g\n", r); if (ep == s || r == HUGE_VAL || errno == ERANGE) return 0; while (*ep == ' ' || *ep == '\t' || *ep == '\n') ep++; if (*ep == '\0') return 1; else return 0; } int main(int argc, char **argv) { double v; if (is_number("NaN")) { printf("is a number\n"); v = atof("NaN"); } else { printf("not a number\n"); v = 0.0; } printf("%.4f\n", v); return 0; } On NetBSD/SPARC, the program receives SIGFPE: $ gcc ./nan_test_2.c $ ./a.out [1] Floating point exception (core dumped) ./a.out Specifically, it's caused by r == HUGE_VAL condition in if (ep == s || r == HUGE_VAL || errno == ERANGE) where r is NaN. All the signs indicate there is a bug in QEMU. I'll install 5.1, but on 4.0 which I had installed, the program works fine: $ ./sigfpe is a number nan I just tested on NetBSD 5.0/SPARC under QEMU 0.13 (same version I use with NetBSD 5.1/SPARC) and it works well indeed: mloskot@qemu-netbsd-50-sparc:~/tmp# ./a.out is a number nan mloskot@qemu-netbsd-50-sparc:~/tmp# Hmm, this is becoming interesting. I run QEMU 0.13 on Windows Vista (64-bit). Perhaps host system and QEMU binaries are relevant here. I will try on Linux host system later tonight. BTW, here are my images: http://mateusz.loskot.net/tmp/qemu/ The problem was with NaN handling in fcmped instruction. I've committed a patch that fixes the problem, please test. I'm having problems with building qemu under MinGW, so I will need to wait for someone to update binaries. Stefan, are you planning to update your binaries? I would be grateful. Thanks for reporting and the test case. Thanks for fixing! Best regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
2011/1/19 Kevin Wolf : > Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: >> event-tap function is called only when it is on, and requests sent >> from device emulators. >> >> Signed-off-by: Yoshiaki Tamura >> --- >> block.c | 11 +++ >> 1 files changed, 11 insertions(+), 0 deletions(-) >> >> diff --git a/block.c b/block.c >> index ff2795b..85bd8b8 100644 >> --- a/block.c >> +++ b/block.c >> @@ -28,6 +28,7 @@ >> #include "block_int.h" >> #include "module.h" >> #include "qemu-objects.h" >> +#include "event-tap.h" >> >> #ifdef CONFIG_BSD >> #include >> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState >> *bs, int64_t sector_num, >> if (bdrv_check_request(bs, sector_num, nb_sectors)) >> return NULL; >> >> + if (bs->device_name && event_tap_is_on()) { > > bs->device_name is a pointer to a char array contained in bs, so it's > never NULL. You probably mean *bs->device_name? Yes, thanks for pointing out. It didn't expose because event_tap_is_on() was false upon flushing after synchronization. Yoshi > > Kevin > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
[Qemu-devel] [PATCH 0/7] virtio-serial: Don't copy guest buf to host, flow control
Hello, This series is now separated from the chardev flow control series. The virtio-serial code now does not copy over data from the guest to the host. It instead keeps track of how far we are in consuming the data and maintains this state. For flow control, when a user of the virtio-serial port signals it has consumed less data than given, port throttling is enabled. The consumer can then later disable throttling and we can re-start sending the data from where we left off. Finally, new fields introduced are added to the save/restore section to preserve state across live migrations. Please apply. Amit Shah (7): virtio-console: Factor out common init between console and generic ports virtio-console: Remove unnecessary braces virtio-serial-bus: separate out discard logic in a separate function virtio-serial: Don't copy over guest buffer to host virtio-serial: Let virtio-serial-bus know if all data was consumed virtio-serial: Add support for flow control virtio-serial: save/restore new fields in port struct hw/virtio-console.c| 38 +++ hw/virtio-serial-bus.c | 123 +++- hw/virtio-serial.h | 24 - 3 files changed, 139 insertions(+), 46 deletions(-) -- 1.7.3.4
[Qemu-devel] [PATCH 2/7] virtio-console: Remove unnecessary braces
Remove unnecessary braces around a case statement. Signed-off-by: Amit Shah --- hw/virtio-console.c |3 +-- 1 files changed, 1 insertions(+), 2 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index d7fe68b..d0b9354 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -48,10 +48,9 @@ static void chr_event(void *opaque, int event) VirtConsole *vcon = opaque; switch (event) { -case CHR_EVENT_OPENED: { +case CHR_EVENT_OPENED: virtio_serial_open(&vcon->port); break; -} case CHR_EVENT_CLOSED: virtio_serial_close(&vcon->port); break; -- 1.7.3.4
[Qemu-devel] [PATCH 3/7] virtio-serial-bus: separate out discard logic in a separate function
Instead of combining flush logic into the discard case and not discard case, have one function doing discard case. This will help later when adding flow control logic to the do_flush_queued_data() function. Signed-off-by: Amit Shah --- hw/virtio-serial-bus.c | 47 ++- 1 files changed, 30 insertions(+), 17 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 74ba5ec..e8c2a16 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -113,39 +113,48 @@ static size_t write_to_port(VirtIOSerialPort *port, return offset; } +static void discard_vq_data(VirtQueue *vq, VirtIODevice *vdev) +{ +VirtQueueElement elem; + +while (virtqueue_pop(vq, &elem)) { +virtqueue_push(vq, &elem, 0); +} +virtio_notify(vdev, vq); +} + static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, - VirtIODevice *vdev, bool discard) + VirtIODevice *vdev) { VirtQueueElement elem; -assert(port || discard); +assert(port); assert(virtio_queue_ready(vq)); -while ((discard || !port->throttled) && virtqueue_pop(vq, &elem)) { +while (!port->throttled && virtqueue_pop(vq, &elem)) { uint8_t *buf; size_t ret, buf_size; -if (!discard) { -buf_size = iov_size(elem.out_sg, elem.out_num); -buf = qemu_malloc(buf_size); -ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size); +buf_size = iov_size(elem.out_sg, elem.out_num); +buf = qemu_malloc(buf_size); +ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size); + +port->info->have_data(port, buf, ret); +qemu_free(buf); -port->info->have_data(port, buf, ret); -qemu_free(buf); -} virtqueue_push(vq, &elem, 0); } virtio_notify(vdev, vq); } -static void flush_queued_data(VirtIOSerialPort *port, bool discard) +static void flush_queued_data(VirtIOSerialPort *port) { assert(port); if (!virtio_queue_ready(port->ovq)) { return; } -do_flush_queued_data(port, port->ovq, &port->vser->vdev, discard); +do_flush_queued_data(port, port->ovq, &port->vser->vdev); } static size_t send_control_msg(VirtIOSerialPort *port, void *buf, size_t len) @@ -204,7 +213,7 @@ int virtio_serial_close(VirtIOSerialPort *port) * consume, reset the throttling flag and discard the data. */ port->throttled = false; -flush_queued_data(port, true); +discard_vq_data(port->ovq, &port->vser->vdev); send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 0); @@ -258,7 +267,7 @@ void virtio_serial_throttle_port(VirtIOSerialPort *port, bool throttle) return; } -flush_queued_data(port, false); +flush_queued_data(port); } /* Guest wants to notify us of some event */ @@ -414,11 +423,15 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq) discard = true; } -if (!discard && port->throttled) { +if (discard) { +discard_vq_data(vq, vdev); +return; +} +if (port->throttled) { return; } -do_flush_queued_data(port, vq, vdev, discard); +do_flush_queued_data(port, vq, vdev); } static void handle_input(VirtIODevice *vdev, VirtQueue *vq) @@ -634,7 +647,7 @@ static void remove_port(VirtIOSerial *vser, uint32_t port_id) port = find_port_by_id(vser, port_id); /* Flush out any unconsumed buffers first */ -flush_queued_data(port, true); +discard_vq_data(port->ovq, &port->vser->vdev); send_control_event(port, VIRTIO_CONSOLE_PORT_REMOVE, 1); } -- 1.7.3.4
[Qemu-devel] [PATCH 1/7] virtio-console: Factor out common init between console and generic ports
The initialisation for generic ports and console ports is similar. Factor out the parts that are the same in a different function that can be called from each of the initfns. Signed-off-by: Amit Shah --- hw/virtio-console.c | 31 ++- 1 files changed, 14 insertions(+), 17 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index caea11f..d7fe68b 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -58,24 +58,28 @@ static void chr_event(void *opaque, int event) } } -/* Virtio Console Ports */ -static int virtconsole_initfn(VirtIOSerialDevice *dev) +static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev) { -VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev); -VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port); - -port->info = dev->info; - -port->is_console = true; +vcon->port.info = dev->info; if (vcon->chr) { qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event, vcon); -port->info->have_data = flush_buf; +vcon->port.info->have_data = flush_buf; } return 0; } +/* Virtio Console Ports */ +static int virtconsole_initfn(VirtIOSerialDevice *dev) +{ +VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev); +VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port); + +port->is_console = true; +return generic_port_init(vcon, dev); +} + static int virtconsole_exitfn(VirtIOSerialDevice *dev) { VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev); @@ -115,14 +119,7 @@ static int virtserialport_initfn(VirtIOSerialDevice *dev) VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev); VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port); -port->info = dev->info; - -if (vcon->chr) { -qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event, - vcon); -port->info->have_data = flush_buf; -} -return 0; +return generic_port_init(vcon, dev); } static VirtIOSerialPortInfo virtserialport_info = { -- 1.7.3.4
[Qemu-devel] [PATCH 4/7] virtio-serial: Don't copy over guest buffer to host
When the guest writes something to a host, we copied over the entire buffer first into the host and then processed it. Do away with that, it could result in a malicious guest causing a DoS on the host. Reported-by: Paul Brook Signed-off-by: Amit Shah --- hw/virtio-serial-bus.c | 15 --- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index e8c2a16..6726f72 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -132,16 +132,17 @@ static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, assert(virtio_queue_ready(vq)); while (!port->throttled && virtqueue_pop(vq, &elem)) { -uint8_t *buf; -size_t ret, buf_size; +unsigned int i; -buf_size = iov_size(elem.out_sg, elem.out_num); -buf = qemu_malloc(buf_size); -ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size); +for (i = 0; i < elem.out_num; i++) { +size_t buf_size; -port->info->have_data(port, buf, ret); -qemu_free(buf); +buf_size = elem.out_sg[i].iov_len; +port->info->have_data(port, + elem.out_sg[i].iov_base, + buf_size); +} virtqueue_push(vq, &elem, 0); } virtio_notify(vdev, vq); -- 1.7.3.4
[Qemu-devel] [PATCH 5/7] virtio-serial: Let virtio-serial-bus know if all data was consumed
The have_data() API to hand off guest data to apps using virtio-serial so far assumed all the data was consumed. Relax this assumption. Future commits will allow for incomplete writes. Signed-off-by: Amit Shah --- hw/virtio-console.c |4 ++-- hw/virtio-serial.h |7 --- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/virtio-console.c b/hw/virtio-console.c index d0b9354..62624ec 100644 --- a/hw/virtio-console.c +++ b/hw/virtio-console.c @@ -20,11 +20,11 @@ typedef struct VirtConsole { /* Callback function that's called when the guest sends us data */ -static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len) +static ssize_t flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len) { VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port); -qemu_chr_write(vcon->chr, buf, len); +return qemu_chr_write(vcon->chr, buf, len); } /* Readiness of the guest to accept data on a port */ diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h index ff08c40..9cc0fb3 100644 --- a/hw/virtio-serial.h +++ b/hw/virtio-serial.h @@ -137,10 +137,11 @@ struct VirtIOSerialPortInfo { /* * Guest wrote some data to the port. This data is handed over to - * the app via this callback. The app is supposed to consume all - * the data that is presented to it. + * the app via this callback. The app can return a size less than + * 'len'. In this case, throttling will be enabled for this port. */ -void (*have_data)(VirtIOSerialPort *port, const uint8_t *buf, size_t len); +ssize_t (*have_data)(VirtIOSerialPort *port, const uint8_t *buf, + size_t len); }; /* Interface to the virtio-serial bus */ -- 1.7.3.4
[Qemu-devel] [PATCH 6/7] virtio-serial: Add support for flow control
This commit lets apps signal an incomplete write. When that happens, stop sending out any more data to the app and wait for it to unthrottle the port. Signed-off-by: Amit Shah --- hw/virtio-serial-bus.c | 49 +-- hw/virtio-serial.h | 17 2 files changed, 55 insertions(+), 11 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 6726f72..32938b5 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -126,24 +126,49 @@ static void discard_vq_data(VirtQueue *vq, VirtIODevice *vdev) static void do_flush_queued_data(VirtIOSerialPort *port, VirtQueue *vq, VirtIODevice *vdev) { -VirtQueueElement elem; - assert(port); assert(virtio_queue_ready(vq)); -while (!port->throttled && virtqueue_pop(vq, &elem)) { +while (!port->throttled) { unsigned int i; -for (i = 0; i < elem.out_num; i++) { -size_t buf_size; - -buf_size = elem.out_sg[i].iov_len; +/* Pop an elem only if we haven't left off a previous one mid-way */ +if (!port->elem.out_num) { +if (!virtqueue_pop(vq, &port->elem)) { +break; +} +port->iov_idx = 0; +port->iov_offset = 0; +} -port->info->have_data(port, - elem.out_sg[i].iov_base, - buf_size); +for (i = port->iov_idx; i < port->elem.out_num; i++) { +size_t buf_size; +ssize_t ret; + +buf_size = port->elem.out_sg[i].iov_len - port->iov_offset; +ret = port->info->have_data(port, +port->elem.out_sg[i].iov_base + + port->iov_offset, +buf_size); +if (ret < 0 && ret != -EAGAIN) { +/* We don't handle any other type of errors here */ +abort(); +} +if (ret == -EAGAIN || (ret >= 0 && ret < buf_size)) { +virtio_serial_throttle_port(port, true); +port->iov_idx = i; +if (ret > 0) { +port->iov_offset += ret; +} +break; +} +port->iov_offset = 0; } -virtqueue_push(vq, &elem, 0); +if (port->throttled) { +break; +} +virtqueue_push(vq, &port->elem, 0); +port->elem.out_num = 0; } virtio_notify(vdev, vq); } @@ -709,6 +734,8 @@ static int virtser_port_qdev_init(DeviceState *qdev, DeviceInfo *base) port->guest_connected = true; } +port->elem.out_num = 0; + QTAILQ_INSERT_TAIL(&port->vser->ports, port, next); port->ivq = port->vser->ivqs[port->id]; port->ovq = port->vser->ovqs[port->id]; diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h index 9cc0fb3..a308196 100644 --- a/hw/virtio-serial.h +++ b/hw/virtio-serial.h @@ -102,6 +102,23 @@ struct VirtIOSerialPort { */ uint32_t id; +/* + * This is the elem that we pop from the virtqueue. A slow + * backend that consumes guest data (e.g. the file backend for + * qemu chardevs) can cause the guest to block till all the output + * is flushed. This isn't desired, so we keep a note of the last + * element popped and continue consuming it once the backend + * becomes writable again. + */ +VirtQueueElement elem; + +/* + * The index and the offset into the iov buffer that was popped in + * elem above. + */ +uint32_t iov_idx; +uint64_t iov_offset; + /* Identify if this is a port that binds with hvc in the guest */ uint8_t is_console; -- 1.7.3.4
[Qemu-devel] [PATCH 7/7] virtio-serial: save/restore new fields in port struct
The new fields that got added as part of not copying over the guest buffer to the host need to be saved/restored across migration. Do that and bump up the version number. Signed-off-by: Amit Shah --- hw/virtio-serial-bus.c | 42 -- 1 files changed, 40 insertions(+), 2 deletions(-) diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c index 32938b5..2ca97a9 100644 --- a/hw/virtio-serial-bus.c +++ b/hw/virtio-serial-bus.c @@ -527,9 +527,24 @@ static void virtio_serial_save(QEMUFile *f, void *opaque) * Items in struct VirtIOSerialPort. */ QTAILQ_FOREACH(port, &s->ports, next) { +uint32_t elem_popped; + qemu_put_be32s(f, &port->id); qemu_put_byte(f, port->guest_connected); qemu_put_byte(f, port->host_connected); + + elem_popped = 0; +if (port->elem.out_num) { +elem_popped = 1; +} +qemu_put_be32s(f, &elem_popped); +if (elem_popped) { +qemu_put_be32s(f, &port->iov_idx); +qemu_put_be64s(f, &port->iov_offset); + +qemu_put_buffer(f, (unsigned char *)&port->elem, +sizeof(port->elem)); +} } } @@ -540,7 +555,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) uint32_t max_nr_ports, nr_active_ports, ports_map; unsigned int i; -if (version_id > 2) { +if (version_id > 3) { return -EINVAL; } @@ -593,6 +608,29 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id) send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, port->host_connected); } + +if (version_id > 2) { +uint32_t elem_popped; + +qemu_get_be32s(f, &elem_popped); +if (elem_popped) { +qemu_get_be32s(f, &port->iov_idx); +qemu_get_be64s(f, &port->iov_offset); + +qemu_get_buffer(f, (unsigned char *)&port->elem, +sizeof(port->elem)); +virtqueue_map_sg(port->elem.in_sg, port->elem.in_addr, + port->elem.in_num, 1); +virtqueue_map_sg(port->elem.out_sg, port->elem.out_addr, + port->elem.out_num, 1); + +/* + * Port was throttled on source machine. Let's + * unthrottle it here so data starts flowing again. + */ +virtio_serial_throttle_port(port, false); +} +} } return 0; } @@ -841,7 +879,7 @@ VirtIODevice *virtio_serial_init(DeviceState *dev, uint32_t max_nr_ports) * Register for the savevm section with the virtio-console name * to preserve backward compat */ -register_savevm(dev, "virtio-console", -1, 2, virtio_serial_save, +register_savevm(dev, "virtio-console", -1, 3, virtio_serial_save, virtio_serial_load, vser); return vdev; -- 1.7.3.4
[Qemu-devel] Missing roms/seabios/Makefile and roms/vgabios/Makefile
Hi, Running ./configure (under MinGW/MSYS) and symlink gives up trying to create links to non-existing Makefiles in roms/seabios/Makefile roms/vgabios/Makefile Quick fix is to remove those Makefiles from the FILES list: diff --git a/configure b/configure index d68f862..92e2527 100755 --- a/configure +++ b/configure @@ -3235,7 +3235,6 @@ DIRS="$DIRS fsdev ui" FILES="Makefile tests/Makefile" FILES="$FILES tests/cris/Makefile tests/cris/.gdbinit" FILES="$FILES pc-bios/optionrom/Makefile pc-bios/keymaps" -FILES="$FILES roms/seabios/Makefile roms/vgabios/Makefile" for bios_file in $source_path/pc-bios/*.bin $source_path/pc-bios/*.dtb $source_path/pc-bios/openbios-*; do FILES="$FILES pc-bios/`basename $bios_file`" done Regards, -- Mateusz Loskot, http://mateusz.loskot.net Charter Member of OSGeo, http://osgeo.org Member of ACCU, http://accu.org
Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
2011/1/19 Kevin Wolf : > Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: >> event-tap controls when to start FT transaction, and provides proxy >> functions to called from net/block devices. While FT transaction, it >> queues up net/block requests, and flush them when the transaction gets >> completed. >> >> Signed-off-by: Yoshiaki Tamura >> Signed-off-by: OHMURA Kei > > One general comment: On the first glance this seems to mix block and net > (and some other things) arbitrarily instead of having a section for > handling all block stuff, then network, etc. > > Is there a specific reason for the order in which you put the functions? > If not, maybe reordering them might improve readability. Thanks. I'll rework on that. > >> --- >> Makefile.target | 1 + >> event-tap.c | 847 >> +++ >> event-tap.h | 42 +++ >> qemu-tool.c | 24 ++ >> trace-events | 9 + >> 5 files changed, 923 insertions(+), 0 deletions(-) >> create mode 100644 event-tap.c >> create mode 100644 event-tap.h >> >> diff --git a/Makefile.target b/Makefile.target >> index e15b1c4..f36cd75 100644 >> --- a/Makefile.target >> +++ b/Makefile.target >> @@ -199,6 +199,7 @@ obj-y += rwhandler.o >> obj-$(CONFIG_KVM) += kvm.o kvm-all.o >> obj-$(CONFIG_NO_KVM) += kvm-stub.o >> LIBS+=-lz >> +obj-y += event-tap.o >> >> QEMU_CFLAGS += $(VNC_TLS_CFLAGS) >> QEMU_CFLAGS += $(VNC_SASL_CFLAGS) >> diff --git a/event-tap.c b/event-tap.c >> new file mode 100644 >> index 000..f492708 >> --- /dev/null >> +++ b/event-tap.c > >> @@ -0,0 +1,847 @@ >> +/* >> + * Event Tap functions for QEMU >> + * >> + * Copyright (c) 2010 Nippon Telegraph and Telephone Corporation. >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2. See >> + * the COPYING file in the top-level directory. >> + */ >> + >> +#include "qemu-common.h" >> +#include "qemu-error.h" >> +#include "block.h" >> +#include "block_int.h" >> +#include "ioport.h" >> +#include "osdep.h" >> +#include "sysemu.h" >> +#include "hw/hw.h" >> +#include "net.h" >> +#include "event-tap.h" >> +#include "trace.h" >> + >> +enum EVENT_TAP_STATE { >> + EVENT_TAP_OFF, >> + EVENT_TAP_ON, >> + EVENT_TAP_FLUSH, >> + EVENT_TAP_LOAD, >> + EVENT_TAP_REPLAY, >> +}; >> + >> +static enum EVENT_TAP_STATE event_tap_state = EVENT_TAP_OFF; >> +static BlockDriverAIOCB dummy_acb; /* we may need a pool for dummies */ > > Indeed, bdrv_aio_cancel will segfault this way. > > If you use dummies instead of real ACBs the only way to correctly > implement bdrv_aio_cancel is waiting for all in-flight AIOs > (qemu_aio_flush). So I need to insert a new event_tap function to bdrv_aio_cancel to do that. > >> +typedef struct EventTapIOport { >> + uint32_t address; >> + uint32_t data; >> + int index; >> +} EventTapIOport; >> + >> +#define MMIO_BUF_SIZE 8 >> + >> +typedef struct EventTapMMIO { >> + uint64_t address; >> + uint8_t buf[MMIO_BUF_SIZE]; >> + int len; >> +} EventTapMMIO; >> + >> +typedef struct EventTapNetReq { >> + char *device_name; >> + int iovcnt; >> + struct iovec *iov; >> + int vlan_id; >> + bool vlan_needed; >> + bool async; >> + NetPacketSent *sent_cb; >> +} EventTapNetReq; >> + >> +#define MAX_BLOCK_REQUEST 32 >> + >> +typedef struct EventTapBlkReq { >> + char *device_name; >> + int num_reqs; >> + int num_cbs; >> + bool is_flush; >> + BlockRequest reqs[MAX_BLOCK_REQUEST]; >> + BlockDriverCompletionFunc *cb[MAX_BLOCK_REQUEST]; >> + void *opaque[MAX_BLOCK_REQUEST]; >> +} EventTapBlkReq; >> + >> +#define EVENT_TAP_IOPORT (1 << 0) >> +#define EVENT_TAP_MMIO (1 << 1) >> +#define EVENT_TAP_NET (1 << 2) >> +#define EVENT_TAP_BLK (1 << 3) >> + >> +#define EVENT_TAP_TYPE_MASK (EVENT_TAP_NET - 1) >> + >> +typedef struct EventTapLog { >> + int mode; >> + union { >> + EventTapIOport ioport; >> + EventTapMMIO mmio; >> + }; >> + union { >> + EventTapNetReq net_req; >> + EventTapBlkReq blk_req; >> + }; >> + QTAILQ_ENTRY(EventTapLog) node; >> +} EventTapLog; >> + >> +static EventTapLog *last_event_tap; >> + >> +static QTAILQ_HEAD(, EventTapLog) event_list; >> +static QTAILQ_HEAD(, EventTapLog) event_pool; >> + >> +static int (*event_tap_cb)(void); >> +static QEMUBH *event_tap_bh; >> +static VMChangeStateEntry *vmstate; >> + >> +static void event_tap_bh_cb(void *p) >> +{ >> + if (event_tap_cb) { >> + event_tap_cb(); >> + } >> + >> + qemu_bh_delete(event_tap_bh); >> + event_tap_bh = NULL; >> +} >> + >> +static void event_tap_schedule_bh(void) >> +{ >> + trace_event_tap_ignore_bh(!!event_tap_bh); >> + >> + /* if bh is already set, we ignore it for now */ >> + if (event_tap_bh) { >> + return; >> + } >> + >> + event_tap_bh = qemu_bh_new(event_tap_bh_cb, NULL); >> + qemu_bh_schedule(event_tap_bh); >> + >> + return ; >> +} >> + >> +static void event_tap_alloc_net_r
[Qemu-devel] [Bug 702885] Re: "Internal resource leak" error with ARM NEON vmull.s32 insn
This bug is fixed on HEAD in the qemu-meego tree (commit 8493a687d54e542ac4eec8b2f8326415edf37ec4 A) - Wolfgang -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/702885 Title: "Internal resource leak" error with ARM NEON vmull.s32 insn Status in QEMU: New Bug description: This bug occurs in qemu, commit 78a59470e6bbc6e16dc4033767492649c1ae4cfd (most recent as of 01/14/2011). Compile, assemble, and link the code below, with the ARM tools. (I use ARM C/C++ Compiler, 4.1 [Build 462]). armasm --cpu Cortex-A8 --licensing=flex foo.s armcc --cpu Cortex-A8 --licensing=flex -o main -L--sysv main.c foo.o Execute on qemu-arm and observe an "Internal resource leak" message. > qemu-arm main Internal resource leak before 818c - Wolfgang main.c: int main(void) { void foofunc(void); foofunc(); return 0 ; } foo.s: ARM REQUIRE8 PRESERVE8 AREA code, CODE, READONLY, ALIGN=2 foofunc PROC VMULL.S32 q1, d2, d4 MOV pc, lr ENDP EXPORT foofunc [CODE] END
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
Anthony Liguori writes: > On 01/18/2011 10:56 AM, Jan Kiszka wrote: >> >>> The device model topology is 100% a hidden architectural detail. >>> >> This is true for the sysbus, it is obviously not the case for PCI and >> similarly discoverable buses. There we have a guest-explorable topology >> that is currently equivalent to the the qdev layout. >> > > But we also don't do PCI passthrough so we really haven't even > explored how that maps in qdev. I don't know if qemu-kvm has > attempted to qdev-ify it. > Management and analysis tools must be able to traverse the system buses and find guest devices this way. >>> We need to provide a compatible interface to the guest. If you agree >>> with my above statements, then you'll also agree that we can do this >>> without keeping the device model topology stable. >>> >>> But we also need to provide a compatible interface to management tools. >>> Exposing the device model topology as a compatible interface >>> artificially limits us. It's far better to provide higher level >>> supported interfaces to give us the flexibility to change the device >>> model as we need to. >>> >> How do you want to change qdev to keep the guest and management tool >> view stable while branching off kvm sub-buses? > > The qdev device model is not a stable interface. I think that's been > clear from the very beginning. > >> Please propose such >> extensions so that they can be discussed. IIUC, that would be second >> relation between qdev and qbus instances besides the physical topology. >> What further use cases (besides passing kvm_state around) do you have in >> mind? >> > > The -device interface is a stable interface. Right now, you don't > specify any type of identifier of the pci bus when you create a PCI > device. It's implied in the interface. Now I'm confused. Isn't "-device FOO,bus=pci.0" specifying the PCI bus? [...]
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
Gerd Hoffmann writes: > On 01/18/11 18:09, Anthony Liguori wrote: >> On 01/18/2011 10:56 AM, Jan Kiszka wrote: >>> The device model topology is 100% a hidden architectural detail. >>> This is true for the sysbus, it is obviously not the case for PCI and >>> similarly discoverable buses. There we have a guest-explorable topology >>> that is currently equivalent to the the qdev layout. >> >> But we also don't do PCI passthrough so we really haven't even explored >> how that maps in qdev. I don't know if qemu-kvm has attempted to >> qdev-ify it. > > It is qdev-ified. It is a normal pci device from qdev's point of view. > > BTW: is there any reason why (vfio-based) pci passthrough couldn't > work with tcg? > >> The -device interface is a stable interface. Right now, you don't >> specify any type of identifier of the pci bus when you create a PCI >> device. It's implied in the interface. > > Wrong. You can specify the bus you want attach the device to via > bus=. This is true for *every* device, including all pci > devices. If unspecified qdev uses the first bus it finds. > > As long as there is a single pci bus only there is simply no need to > specify it, thats why nobody does that today. Once q35 finally > arrives this will change of course. As far as I know, libvirt does it already.
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
2011/1/19 Kevin Wolf : > Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: >> event-tap function is called only when it is on, and requests sent >> from device emulators. >> >> Signed-off-by: Yoshiaki Tamura >> --- >> block.c | 11 +++ >> 1 files changed, 11 insertions(+), 0 deletions(-) >> >> diff --git a/block.c b/block.c >> index ff2795b..85bd8b8 100644 >> --- a/block.c >> +++ b/block.c >> @@ -28,6 +28,7 @@ >> #include "block_int.h" >> #include "module.h" >> #include "qemu-objects.h" >> +#include "event-tap.h" >> >> #ifdef CONFIG_BSD >> #include >> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState >> *bs, int64_t sector_num, >> if (bdrv_check_request(bs, sector_num, nb_sectors)) >> return NULL; >> >> + if (bs->device_name && event_tap_is_on()) { >> + return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, >> + cb, opaque); >> + } >> + >> if (bs->dirty_bitmap) { >> blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb, >> opaque); > > Just noticed the context here... Does this patch break block migration > when event-tap is on? I don't think so. event-tap will call bdrv_aio_writev() upon flushing requests and it shouldn't affect block-migration. The block written after the synchronization should be marked as dirty and should be sent in the next round. Am I missing the point? > Another question that came to my mind is if we really hook everything we > need. I think we'll need to have a hook in bdrv_flush as well. I don't > know if you do hook qemu_aio_flush and friends - does a call cause > event-tap to flush its queue? If not, a call to qemu_aio_flush might > hang qemu because it's waiting for requests to complete which are > actually stuck in the event-tap queue. No it doesn't queue at event-tap. Marcelo pointed that we should hook bdrv_aio_flush to avoid requests inversion, that made sense to me. Do we need to hook bdrv_flush for that same reason? If we hook bdrv_flush and qemu_aio_flush, we're going loop forever because the synchronization code is calling vm_stop that call bdrv_flush_all and qemu_aio_flush. Yoshi > > Kevin > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [Qemu-devel] [Bug 702885] Re: "Internal resource leak" error with ARM NEON vmull.s32 insn
On 19 January 2011 12:42, Wolfgang Schildbach <702...@bugs.launchpad.net> wrote: > This bug is fixed on HEAD in the qemu-meego tree (commit > 8493a687d54e542ac4eec8b2f8326415edf37ec4 > A) Note that the qemu-meego tree disables these warnings using an #ifdef, so even if the message is not printed we might still not be handling the temporaries correctly (although there are a lot of fixes for mishandled temps in that tree). -- PMM
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
Anthony Liguori writes: > On 01/18/2011 09:43 AM, Jan Kiszka wrote: >> On 2011-01-18 16:04, Anthony Liguori wrote: >> >>> On 01/18/2011 08:28 AM, Jan Kiszka wrote: >>> On 2011-01-12 11:31, Jan Kiszka wrote: > Am 12.01.2011 11:22, Avi Kivity wrote: > > >> On 01/11/2011 03:54 PM, Anthony Liguori wrote: >> >> >>> Right, we should introduce a KVMBus that KVM devices are created on. >>> The devices can get at KVMState through the BusState. >>> >>> >> There is no kvm bus in a PC (I looked). We're bending the device model >> here because a device is implemented in the kernel and not in >> userspace. An implementation detail is magnified beyond all proportions. >> >> An ioapic that is implemented by kvm lives in exactly the same place >> that the qemu ioapic lives in. An assigned pci device lives on the PCI >> bus, not a KVMBus. If we need a pointer to KVMState, then we must find >> it elsewhere, not through creating imaginary buses that don't exist. >> >> >> > Exactly. > > So we can either "infect" the whole device tree with kvm (or maybe a > more generic accelerator structure that also deals with Xen) or we need > to pull the reference inside the device's init function from some global > service (kvm_get_state). > > Note that this topic is still waiting for good suggestions, specifically from those who believe in kvm_state references :). This is not only blocking kvmstate merge but will affect KVM irqchips as well. It boils down to how we reasonably pass a kvm_state reference from machine init code to a sysbus device. I'm probably biased, but I don't see any way that does not work against the idea of confining access to kvm_state or breaks device instantiation from the command line or a config file. >>> A KVM device should sit on a KVM specific bus that hangs off of sysbus. >>> It can get to kvm_state through that bus. >>> >>> That bus doesn't get instantiated through qdev so requiring a pointer >>> argument should not be an issue. >>> >>> >> This design is in conflict with the requirement to attach KVM-assisted >> devices also to their home bus, e.g. an assigned PCI device to the PCI >> bus. We don't support multi-homed qdev devices. >> > > The bus topology reflects how I/O flows in and out of a device. We do > not model a perfect PC bus architecture and I don't think we ever > intend to. Instead, we model a functional architecture. > > I/O from an assigned device does not flow through the emulated PCI > bus. Therefore, it does not belong on the emulated PCI bus. > > Assigned devices need to interact with the emulated PCI bus, but they > shouldn't be children of it. So they interact with KVM (need kvm_state), and they interact with the emulated PCI bus. Could you elaborate on the fundamental difference between the two interactions that makes you choose the (hypothetical) KVM bus over the PCI bus as device parent?
[Qemu-devel] Re: [PATCH V5 2/4] nmi: make cpu-index argument optional
Sorry for the long delay on this one, in general looks good, I have just a few small comments. On Mon, 10 Jan 2011 17:27:51 +0800 Lai Jiangshan wrote: > When the argument "cpu-index" is not given, > then "nmi" command will inject NMI on all CPUs. Please, state that we're changing the human monitor behavior on this. > This simulate the nmi button on physical machine. > > Thanks to Markus Armbruster for correcting the logic > detecting "cpu-index" is given or not. > > Signed-off-by: Lai Jiangshan > --- > diff --git a/hmp-commands.hx b/hmp-commands.hx > index 99b96a8..a49fcd4 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -721,9 +721,9 @@ ETEXI > #if defined(TARGET_I386) > { > .name = "nmi", > -.args_type = "cpu-index:i", > -.params = "cpu", > -.help = "inject an NMI on the given CPU", > +.args_type = "cpu-index:i?", > +.params = "[cpu]", > +.help = "inject an NMI on all CPUs or the given CPU", IMO, it's better to be a bit more clear, something like: "Inject an NMI on all CPUs if no argument is given, otherwise inject it on the specified CPU". > .mhandler.cmd = do_inject_nmi, > }, > #endif > diff --git a/monitor.c b/monitor.c > index fd18887..952f67f 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -2520,8 +2520,15 @@ static void do_wav_capture(Monitor *mon, const QDict > *qdict) > static void do_inject_nmi(Monitor *mon, const QDict *qdict) > { > CPUState *env; > -int cpu_index = qdict_get_int(qdict, "cpu-index"); > +int cpu_index; > > +if (!qdict_get(qdict, "cpu-index")) { Please, use qdict_haskey(). > +for (env = first_cpu; env != NULL; env = env->next_cpu) > +cpu_interrupt(env, CPU_INTERRUPT_NMI); > +return; > +} > + > +cpu_index = qdict_get_int(qdict, "cpu-index"); > for (env = first_cpu; env != NULL; env = env->next_cpu) > if (env->cpu_index == cpu_index) { > cpu_interrupt(env, CPU_INTERRUPT_NMI); >
[Qemu-devel] Re: [PATCH V5 3/4] qmp, nmi: convert do_inject_nmi() to QObject
On Mon, 10 Jan 2011 17:28:14 +0800 Lai Jiangshan wrote: > Make we can inject NMI via qemu-monitor-protocol. > We use "inject-nmi" for the qmp command name, the meaning is clearer. > > Signed-off-by: Lai Jiangshan > --- > diff --git a/hmp-commands.hx b/hmp-commands.hx > index a49fcd4..4db413d 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -724,7 +724,8 @@ ETEXI > .args_type = "cpu-index:i?", > .params = "[cpu]", > .help = "inject an NMI on all CPUs or the given CPU", > -.mhandler.cmd = do_inject_nmi, > +.user_print = monitor_user_noop, > +.mhandler.cmd_new = do_inject_nmi, > }, > #endif > STEXI > diff --git a/monitor.c b/monitor.c > index 952f67f..1bee840 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -2517,7 +2517,7 @@ static void do_wav_capture(Monitor *mon, const QDict > *qdict) > #endif > > #if defined(TARGET_I386) > -static void do_inject_nmi(Monitor *mon, const QDict *qdict) > +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject > **ret_data) > { > CPUState *env; > int cpu_index; > @@ -2525,15 +2525,17 @@ static void do_inject_nmi(Monitor *mon, const QDict > *qdict) > if (!qdict_get(qdict, "cpu-index")) { > for (env = first_cpu; env != NULL; env = env->next_cpu) > cpu_interrupt(env, CPU_INTERRUPT_NMI); > -return; > +return 0; > } > > cpu_index = qdict_get_int(qdict, "cpu-index"); > for (env = first_cpu; env != NULL; env = env->next_cpu) > if (env->cpu_index == cpu_index) { > cpu_interrupt(env, CPU_INTERRUPT_NMI); > -break; > +return 0; > } > + > +return -1; > } > #endif > > diff --git a/qmp-commands.hx b/qmp-commands.hx > index 56c4d8b..c2d619c 100644 > --- a/qmp-commands.hx > +++ b/qmp-commands.hx > @@ -429,6 +429,33 @@ Example: > > EQMP > > +#if defined(TARGET_I386) > +{ > +.name = "inject_nmi", Please use an hyphen. > +.args_type = "cpu-index:i?", > +.params = "[cpu]", > +.help = "inject an NMI on all CPUs or the given CPU", > +.user_print = monitor_user_noop, > +.mhandler.cmd_new = do_inject_nmi, > +}, > +#endif > +SQMP > +inject_nmi > +-- > + > +Inject an NMI on the given CPU (x86 only). Please, explain that we can also inject on all CPUs. > + > +Arguments: > + > +- "cpu_index": the index of the CPU to be injected NMI (json-int) It's actually "cpu-index", and you should write "(json-int, optional)". > + > +Example: > + > +-> { "execute": "inject_nmi", "arguments": { "cpu-index": 0 } } Hyphen. > +<- { "return": {} } > + > +EQMP > + > { > .name = "migrate", > .args_type = "detach:-d,blk:-b,inc:-i,uri:s", >
Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
Am 19.01.2011 14:04, schrieb Yoshiaki Tamura: >>> +static void event_tap_blk_flush(EventTapBlkReq *blk_req) >>> +{ >>> +BlockDriverState *bs; >>> + >>> +bs = bdrv_find(blk_req->device_name); >> >> Please store the BlockDriverState in blk_req. This code loops over all >> block devices and does a string comparison - and that for each request. >> You can also save the qemu_strdup() when creating the request. >> >> In the few places where you really need the device name (might be the >> case for load/save, I'm not sure), you can still get it from the >> BlockDriverState. > > I would do so for the primary side. Although we haven't > implemented yet, we want to replay block requests from block > layer on the secondary side, and need device name to restore > BlockDriverState. Hm, I see. I'm not happy about it, but I don't have a suggestion right away how to avoid it. >> >>> + >>> +if (blk_req->is_flush) { >>> +bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque); >> >> You need to handle errors. If bdrv_aio_flush returns NULL, call the >> callback with -EIO. > > I'll do so. > >> >>> +return; >>> +} >>> + >>> +bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov, >>> +blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb, >>> +blk_req->reqs[0].opaque); >> >> Same here. >> >>> +bdrv_flush(bs); >> >> This looks really strange. What is this supposed to do? >> >> One point is that you write it immediately after bdrv_aio_write, so you >> get an fsync for which you don't know if it includes the current write >> request or if it doesn't. Which data do you want to get flushed to the disk? > > I was expecting to flush the aio request that was just initiated. > Am I misunderstanding the function? Seems so. The function names don't use really clear terminology either, so you're not the first one to fall in this trap. Basically we have: * qemu_aio_flush() waits for all AIO requests to complete. I think you wanted to have exactly this, but only for a single block device. Such a function doesn't exist yet. * bdrv_flush() makes sure that all successfully completed requests are written to disk (by calling fsync) * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run the fsync in the thread pool >> The other thing is that you introduce a bdrv_flush for each request, >> basically forcing everyone to something very similar to writethrough >> mode. I'm sure this will have a big impact on performance. > > The reason is to avoid inversion of queued requests. Although > processing one-by-one is heavy, wouldn't having requests flushed > to disk out of order break the disk image? No, that's fine. If a guest issues two requests at the same time, they may complete in any order. You just need to make sure that you don't call the completion callback before the request really has completed. I'm just starting to wonder if the guest won't timeout the requests if they are queued for too long. Even more, with IDE, it can only handle one request at a time, so not completing requests doesn't sound like a good idea at all. In what intervals is the event-tap queue flushed? On the other hand, if you complete before actually writing out, you don't get timeouts, but you signal success to the guest when the request could still fail. What would you do in this case? With a writeback cache mode we're fine, we can just fail the next flush (until then nothing is guaranteed to be on disk and order doesn't matter either), but with cache=writethrough we're in serious trouble. Have you thought about this problem? Maybe we end up having to flush the event-tap queue for each single write in writethrough mode. >> Additionally, error handling is missing. > > I looked at the codes using bdrv_flush and realized some of them > doesn't handle errors, but scsi-disk.c does. Should everyone > handle errors or depends on the usage? I added the return code only recently, it was a void function previously. Probably some error handling should be added to all of them. Kevin
[Qemu-devel] [PATCH 2/5] qemu and qemu-xen: fix segfault on con_disconnect
The test on xendev->gnttabdev in con_disconnect is wrong: what differentiates the consoles is the dev number. Signed-off-by: Stefano Stabellini diff --git a/hw/xen_console.c b/hw/xen_console.c index d7099c4..0a2374c 100644 --- a/hw/xen_console.c +++ b/hw/xen_console.c @@ -258,7 +258,7 @@ static void con_disconnect(struct XenDevice *xendev) xen_be_unbind_evtchn(&con->xendev); if (con->sring) { -if (!xendev->gnttabdev) +if (!xendev->dev) munmap(con->sring, XC_PAGE_SIZE); else xc_gnttab_munmap(xendev->gnttabdev, con->sring, 1);
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
Am 19.01.2011 14:16, schrieb Yoshiaki Tamura: > 2011/1/19 Kevin Wolf : >> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: >>> event-tap function is called only when it is on, and requests sent >>> from device emulators. >>> >>> Signed-off-by: Yoshiaki Tamura >>> --- >>> block.c | 11 +++ >>> 1 files changed, 11 insertions(+), 0 deletions(-) >>> >>> diff --git a/block.c b/block.c >>> index ff2795b..85bd8b8 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -28,6 +28,7 @@ >>> #include "block_int.h" >>> #include "module.h" >>> #include "qemu-objects.h" >>> +#include "event-tap.h" >>> >>> #ifdef CONFIG_BSD >>> #include >>> @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState >>> *bs, int64_t sector_num, >>> if (bdrv_check_request(bs, sector_num, nb_sectors)) >>> return NULL; >>> >>> +if (bs->device_name && event_tap_is_on()) { >>> +return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, >>> + cb, opaque); >>> +} >>> + >>> if (bs->dirty_bitmap) { >>> blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb, >>> opaque); >> >> Just noticed the context here... Does this patch break block migration >> when event-tap is on? > > I don't think so. event-tap will call bdrv_aio_writev() upon > flushing requests and it shouldn't affect block-migration. The > block written after the synchronization should be marked as dirty > and should be sent in the next round. Am I missing the point? No, that makes sense. I don't have a complete understanding of the whole series yet, so there may be well more misunderstandings on my side. >> Another question that came to my mind is if we really hook everything we >> need. I think we'll need to have a hook in bdrv_flush as well. I don't >> know if you do hook qemu_aio_flush and friends - does a call cause >> event-tap to flush its queue? If not, a call to qemu_aio_flush might >> hang qemu because it's waiting for requests to complete which are >> actually stuck in the event-tap queue. > > No it doesn't queue at event-tap. Marcelo pointed that we should > hook bdrv_aio_flush to avoid requests inversion, that made sense > to me. Do we need to hook bdrv_flush for that same reason? If bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in general it seems likely that we need to do the same. > we hook bdrv_flush and qemu_aio_flush, we're going loop forever > because the synchronization code is calling vm_stop that call > bdrv_flush_all and qemu_aio_flush. qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why we would loop forever. It just waits for AIO requests to complete. I just looked up the code and I think the situation is a bit different than I thought originally: qemu_aio_flush waits only for completion of requests which belong to a driver that has registered using qemu_aio_set_fd_handler. So this means that AIO requests queued in event-tap are not considered in-flight requests and we won't get stuck in a loop. Maybe we have no problem in fact. :-) On the other hand, e.g. migration relies on the fact that after a qemu_aio_flush, all AIO requests that the device model has submitted are completed. I think event-tap must take care that the requests which are queued are not forgotten to be migrated. (Maybe the code already considers this, I'm just writing down what comes to my mind...) Kevin
[Qemu-devel] [Bug 704904] [NEW] No rule to make target ../libhw32/virtio.o
Public bug reported: Building qemu from current git using 32-bit MinGW (installer from 2010-11-07) on Windows Vista (64-bit) fails with the following error: make[1]: *** No rule to make target `../libhw32/virtio.o', needed by `qemu.exe'. Stop. make: *** [subdir-i386-softmmu] Error 2 Here is my ./configure summary: ### Mateuszl@dog /g/src/qemu/_git/master $ ./configure warning: proceeding without pkg-config Install prefixc:/Program Files/Qemu BIOS directoryc:/Program Files/Qemu binary directory c:/Program Files/Qemu config directory c:/Program Files/Qemu Source path /g/src/qemu/_git/master C compilergcc Host C compiler gcc CFLAGS-O2 -g QEMU_CFLAGS -m32 -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOU RCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wendif-labels -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fstack-protector-all -Wmissin g-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wty pe-limits LDFLAGS -Wl,--nxcompat -Wl,--no-seh -Wl,--dynamicbase -Wl,--warn-common -m32 -g make make install install host CPU i386 host big endian no target list i386-softmmu x86_64-softmmu arm-softmmu cris-softmmu m68k-softmmu microblaze-softmmu mips-softmmu mipsel-softmmu mips64-softmmu mips64el-softm mu ppc-softmmu ppcemb-softmmu ppc64-softmmu sh4-softmmu sh4eb-softmmu sparc-softmmu sparc64-softmmu tcg debug enabled no Mon debug enabled no gprof enabled no sparse enabledno strip binariesyes profiler no static build no -Werror enabled no SDL support yes curses supportno curl support no check support no mingw32 support yes Audio drivers winwave Extra audio cards ac97 es1370 sb16 hda Block whitelist Mixer emulation no VNC TLS support no VNC SASL support no VNC JPEG support no VNC PNG support no VNC threadno xen support no brlapi supportno bluez supportno Documentation no NPTL support no GUEST_BASEyes PIE user targets no vde support no IO thread no Linux AIO support no ATTR/XATTR support no Install blobs yes KVM support no fdt support no preadv supportno fdatasync no madvise no posix_madvise no uuid support no vhost-net support no Trace backend nop Trace output file trace- spice support no rbd support no xfsctl supportno ### and here is full make output until the error: ### GEN qemu-img-cmds.h GEN config-host.h GEN trace.h GEN qemu-options.def CCqemu-img.o qemu-img.c: In function 'img_convert': qemu-img.c:826:27: warning: format '%I64d' expects type 'int', but argument 2 has type 'int64_t' CCqemu-tool.o CCqemu-error.o CCosdep.o CCoslib-win32.o GEN trace.c CCtrace.o CCcutils.o CCcache-utils.o CCqemu-malloc.o CCqemu-option.o CCmodule.o CCnbd.o CCblock.o block.c: In function 'bdrv_stats_iter': block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 3 has type 'int64_t' block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 4 has type 'int64_t' block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 5 has type 'int64_t' block.c:1694:25: warning: format '%I64d' expects type 'int', but argument 6 has type 'int64_t' block.c: In function 'bdrv_info_stats_bs': block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 2 has type 'uint64_t' block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 3 has type 'uint64_t' block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 4 has type 'uint64_t' block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 5 has type 'uint64_t' block.c:1717:30: warning: format '%I64d' expects type 'int', but argument 6 has type 'long long unsigned int' CCaio.o CCaes.o CCqemu-config.o CCblock/raw.o CCblock/cow.o CCblock/qcow.o CCblock/vdi.o block/vdi.c: In function 'uuid_unparse': block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: unknown conversion type character 'h' in format block/vdi.c:135:13: warning: un
RE: [Qemu-devel] [Bug 702885] Re: "Internal resource leak" error withARM NEON vmull.s32 insn
> From: qemu-devel-bounces+wschi=dolby@nongnu.org > [mailto:qemu-devel-bounces+wschi=dolby@nongnu.org] On > Behalf Of Peter Maydell > Sent: Wednesday, January 19, 2011 5:16 AM > To: Bug 702885 > Cc: qemu-devel@nongnu.org > Subject: Re: [Qemu-devel] [Bug 702885] Re: "Internal resource > leak" error withARM NEON vmull.s32 insn > > On 19 January 2011 12:42, Wolfgang Schildbach > <702...@bugs.launchpad.net> wrote: > > This bug is fixed on HEAD in the qemu-meego tree (commit > > 8493a687d54e542ac4eec8b2f8326415edf37ec4 > > A) > > Note that the qemu-meego tree disables these warnings using > an #ifdef, so even if the message is not printed we might > still not be handling the temporaries correctly (although > there are a lot of fixes for mishandled temps in that tree). > > -- PMM Just when I was getting my hopes up :-) Indeed, I just checked with the meego tree that qemu still segfaults one of my testcases. I'll try to isolate the case such that I can document it on launchpad. - Wolfgang
Re: [Qemu-devel] [PATCH] target-arm: Fix garbage collection of temporaries in Neon emulation.
Here is an updated patch which will hopefully not be mangled by my mailer. Fix garbage collection of temporaries in Neon emulation. Signed-off-by: Christophe Lyon --- target-arm/translate.c | 18 +- 1 files changed, 13 insertions(+), 5 deletions(-) diff --git a/target-arm/translate.c b/target-arm/translate.c index 57664bc..b3e3d70 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -4176,6 +4176,13 @@ static inline void gen_neon_mull(TCGv_i64 dest, TCGv a, TCGv b, int size, int u) break; default: abort(); } + +/* gen_helper_neon_mull_[su]{8|16} do not free their parameters. + Don't forget to clean them now. */ +if (size < 2) { + dead_tmp(a); + dead_tmp(b); +} } /* Translate a NEON data processing instruction. Return nonzero if the @@ -4840,7 +4847,7 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) if (size == 3) { tcg_temp_free_i64(tmp64); } else { -dead_tmp(tmp2); +tcg_temp_free_i32(tmp2); } } else if (op == 10) { /* VSHLL */ @@ -5076,8 +5083,6 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) case 8: case 9: case 10: case 11: case 12: case 13: /* VMLAL, VQDMLAL, VMLSL, VQDMLSL, VMULL, VQDMULL */ gen_neon_mull(cpu_V0, tmp, tmp2, size, u); -dead_tmp(tmp2); -dead_tmp(tmp); break; case 14: /* Polynomial VMULL */ cpu_abort(env, "Polynomial VMULL not implemented"); @@ -5228,6 +5233,10 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) return 1; tmp2 = neon_get_scalar(size, rm); +/* We need a copy of tmp2 because gen_neon_mull + * deletes it during pass 0. */ +tmp4 = new_tmp(); +tcg_gen_mov_i32(tmp4, tmp2); tmp3 = neon_load_reg(rn, 1); for (pass = 0; pass < 2; pass++) { @@ -5235,9 +5244,9 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) tmp = neon_load_reg(rn, 0); } else { tmp = tmp3; +tmp2 = tmp4; } gen_neon_mull(cpu_V0, tmp, tmp2, size, u); -dead_tmp(tmp); if (op == 6 || op == 7) { gen_neon_negl(cpu_V0, size); } @@ -5264,7 +5273,6 @@ static int disas_neon_data_insn(CPUState * env, DisasContext *s, uint32_t insn) neon_store_reg64(cpu_V0, rd + pass); } -dead_tmp(tmp2); break; default: /* 14 and 15 are RESERVED */ -- 1.7.2.3
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
> > Doing both fault injection and verification together introduces some > > subtlety. For example, even under the random failure mode, two disk writes > > triggered by one VM-issued write must either fail together or succeed > > together. Otherwise, the truth image and the test image will diverge and > > verification won't succeed. Currently, qemu-test carefully works with the > > 'sim' driver to guarantee those conditions. Those conditions need be > > retained after code restructure. > > If the real backend is a host system file or device, and AIO or > multi-threaded writes are used, you can't depend on two parallel disk > writes (triggered by one VM-issued write) failing together or > succeeding together. All you can do is look at the error code after > each operation completes, and use it to prevent issuing later > operations. You can't stop the other parallel operations that are > already in progress. > > Is that an issue in your design assumptions? Your description of the problem is accurate, i.e., "if AIO or multi-threaded writes are used, you can't stop the other parallel operations that are already in progress." As a result, a naive extension of blkverify to test concurrent requests would not work. The simulated block driver (block/sim.c) in the FVD patch uses neither AIO or nor multi-threaded I/O as the backend. It instead uses a 'simulated' backend, which allows a full control of I/O order and callback order, and can enforce that two parallel disk writes (triggered by one VM-issued write) either failing together or succeeding together, and some other properties as well, which makes testing more comprehensive and debugging easier.
[Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return value of bdrv_write and aborts migration when it fails. However, if the size of the block device to migrate is not a multiple of BLOCK_SIZE (currently 1 MB), the last bdrv_write will fail with -EIO. Fixed by calling bdrv_write with the correct size of the last block. --- block-migration.c | 16 +++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/block-migration.c b/block-migration.c index 1475325..eeb9c62 100644 --- a/block-migration.c +++ b/block-migration.c @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id) int64_t addr; BlockDriverState *bs; uint8_t *buf; +int64_t total_sectors; +int nr_sectors; do { addr = qemu_get_be64(f); @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id) return -EINVAL; } +total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; +if (total_sectors <= 0) { +fprintf(stderr, "Error getting length of block device %s\n", device_name); +return -EINVAL; +} + +if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) { +nr_sectors = total_sectors - addr; +} else { +nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; +} + buf = qemu_malloc(BLOCK_SIZE); qemu_get_buffer(f, buf, BLOCK_SIZE); -ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK); +ret = bdrv_write(bs, addr, buf, nr_sectors); qemu_free(buf); if (ret < 0) { -- 1.7.3.5
Re: [Qemu-devel] [PULL] piix, pci, qdev
> > pci: fix migration path for devices behind bridges This patch breaks starting qemu for me on 32-bit x86 with the following assert: qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion `!se->compat || se->instance_id == 0' failed. my qemu command line is: /opt/qemu/bin/qemu-system-x86_64 \ -m 1500 \ -enable-kvm \ -drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \ -drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \ -drive if=none,file=/home/test1.img,cache=none,id=scratch \ -device virtio-blk-pci,drive=root \ -device virtio-blk-pci,drive=test \ -device virtio-blk-pci,drive=scratch \ -append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \ -monitor tcp:127.0.0.1:31337,server,nowait \ -kernel arch/x86/boot/bzImage \ -nographic btw, I can't find a patch for that commit anywhere in the qemu-devel archives. Did it get reviewed anywhere at all?
Re: [Qemu-devel] [PULL] piix, pci, qdev
On Wed, Jan 19, 2011 at 04:14:48PM +0100, Christoph Hellwig wrote: > > > pci: fix migration path for devices behind bridges > > This patch breaks starting qemu for me on 32-bit x86 with the following > assert: > > qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion > `!se->compat || se->instance_id == 0' failed. > > my qemu command line is: > > /opt/qemu/bin/qemu-system-x86_64 \ > -m 1500 \ > -enable-kvm \ > -drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \ > -drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \ > -drive if=none,file=/home/test1.img,cache=none,id=scratch \ > -device virtio-blk-pci,drive=root \ > -device virtio-blk-pci,drive=test \ > -device virtio-blk-pci,drive=scratch \ > -append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \ > -monitor tcp:127.0.0.1:31337,server,nowait \ > -kernel arch/x86/boot/bzImage \ > -nographic I'll try that. generally qemu works on 32 bit for me but I only used 2 virtio devices: block and net. will try with 3. > > btw, I can't find a patch for that commit anywhere in the qemu-devel > archives. Did it get reviewed anywhere at all? message ids: 20101113211054.ga23...@redhat.com 20101224084529.ga23...@redhat.com -- MST
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
> >> Moreover, using a host file system not only adds overhead, but > >> also introduces data integrity issues. Specifically, if I/Os uses O_DSYNC, > >> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data > >> integrity in the event of a host crash. See > >> http://lwn.net/Articles/348739/ . > > > > You have the same issue with O_DIRECT when using a raw disk device > > too. That is, O_DIRECT on a raw device does not guarantee integrity > > in the event of a host crash either, for mostly the same reasons. > > QEMU has semantics that use O_DIRECT safely; there is no issue here. > When a drive is added with cache=none, QEMU not only uses O_DIRECT but > also advertises an enabled write cache to the guest. > > The guest *must* flush the cache when it wants to ensure data is > stable. In the event of a host crash, all, some, or none of the I/O > since the last flush may have made it to disk. Each of these > possibilities is fair game since the guest may only depend on writes > being on disk if they completed and a successful flush was issued > afterwards. Thank both of you for the explanation, which is very helpful to me. With FVD's capability of eliminating the host file system and storing the image on a logical volume, then perhaps we can always use O_DSYNC, because there is little (or no?) LVM metadata that needs a flush on every write and hence O_DSYNC does not add overhead? I am not certain on this, and need help for confirmation. If this is true, the guest does not need to flush the cache.
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
On Wed, Jan 19, 2011 at 10:17:47AM -0500, Chunqiang Tang wrote: > Thank both of you for the explanation, which is very helpful to me. With > FVD's capability of eliminating the host file system and storing the image > on a logical volume, then perhaps we can always use O_DSYNC, because there > is little (or no?) LVM metadata that needs a flush on every write and > hence O_DSYNC does not add overhead? I am not certain on this, and need > help for confirmation. If this is true, the guest does not need to flush > the cache. O_DSYNC flushes the volatile write cache of the disk on every write, which can be very ineffienct. In addition to that image formats really should obey the configurable caching settings qemu has, they exist for a reason and should be handled uniformly over different image formats and protocol drivers.
Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command
On Mon, Jan 17, 2011 at 06:36:15PM +0100, Kevin Wolf wrote: > > +.args_type = "id:s,size:l", > > size should be 'o' instead of 'l'. The latter may be too small on 32 bit > hosts and doesn't support convenient suffixes: o actually fails for 2GB or more for me: (qemu) resize scratch 2047 resize scratch 2047 (qemu) (qemu) resize scrarch 2048 resize scrarch 2048 invalid size for l these worked fine.
Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command
Am 19.01.2011 16:35, schrieb Christoph Hellwig: > On Mon, Jan 17, 2011 at 06:36:15PM +0100, Kevin Wolf wrote: >>> +.args_type = "id:s,size:l", >> >> size should be 'o' instead of 'l'. The latter may be too small on 32 bit >> hosts and doesn't support convenient suffixes: > > o actually fails for 2GB or more for me: > > (qemu) resize scratch 2047 > resize scratch 2047 > (qemu) > (qemu) resize scrarch 2048 > resize scrarch 2048 > invalid size > > for l these worked fine. Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32 bit host as well. Though I assume that you use a 64 bit host, and I can't really see what's the problem there... Kevin
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
On Fri, Jan 14, 2011 at 03:56:00PM -0500, Chunqiang Tang wrote: > P2) Overhead of storing an image on a host file system. Specifically, a > RAW image stored on ext3 is 50-63% slower than a RAW image stored on a raw > partition. Sorry, benchmarking this against ext3 really doesn't matter. Benchmark it against xfs or ext4 with a preallocated image (fallocate or dd). > For P1), I uses the term compact image instead of sparse image, because a > RAW image stored as a sparse file in ext3 is a sparse image, but is not a > compact image. A compact image stores data in such a way that the file > size of the image file is smaller than the size of the virtual disk > perceived by the VM. QCOW2 is a compact image. The disadvantage of a > compact image is that the data layout perceived by the guest OS differs > from the actual layout on the physical disk, which defeats many > optimizations in guest file systems. It's something filesystems have to deal with. Real storage is getting increasingly virtualized. While this didn't matter for the real high end storage which has been doing this for a long time it's getting more and more exposed to the filesystem. That includes LVM layouts and thinly provisioned disk arrays, which are getting increasingly popular. That doesn't matter the 64k (or until recently 4k) cluster size in qcow2 is a good idea, we'd want at least a magnitude or two larger extents to perform well, but it means filesystems really do have to cope with it. > For P2), using a host file system is inefficient, because 1) historically > file systems are optimized for small files rather than large images, I'm not sure what hole you're pulling off this bullshit, but this is absolutely not correct. Since the damn of time you have filesystems optimized for small files, for larger or really large files, or trying to deal with a tradeoff inbetween. > 2) certain functions of a host file system are simply redundant with > respect to the function of a compact image, e.g., performing storage > allocation. Moreover, using a host file system not only adds overhead, but > also introduces data integrity issues. I/O into fully preallocated files uses exactly the same codepath as doing I/O to the block device, except for a identify logical to physical block mapping in the block device and a non-trivial one in the filesysgem. Note that the block mapping is cached and does not affect the performance. I've published the numbers for qemu in the various caching modes and all major filesystems a while ago, so I'm not making this up. > Specifically, if I/Os uses O_DSYNC, > it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data > integrity in the event of a host crash. See > http://lwn.net/Articles/348739/ . I/O to the block devices does not guarantee data integrity without O_DSYNC either. > Storage over-commit means that, e.g., a 100GB physical disk can be used to > host 10 VMs, each with a 20GB virtual disk. The current storage industry buzz word for that is thin provisioning.
Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command
On Wed, Jan 19, 2011 at 04:49:04PM +0100, Kevin Wolf wrote: > > (qemu) resize scratch 2047 > > resize scratch 2047 > > (qemu) > > (qemu) resize scrarch 2048 > > resize scrarch 2048 > > invalid size > > > > for l these worked fine. > > Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32 > bit host as well. Though I assume that you use a 64 bit host, and I > can't really see what's the problem there... This is a test on a 32-bit host. The target_long of "l" worked fine because a kvm enabled qemu always builds for an x86-64 target, even with a 32-bit host.
Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command
Am 19.01.2011 16:52, schrieb Christoph Hellwig: > On Wed, Jan 19, 2011 at 04:49:04PM +0100, Kevin Wolf wrote: >>> (qemu) resize scratch 2047 >>> resize scratch 2047 >>> (qemu) >>> (qemu) resize scrarch 2048 >>> resize scrarch 2048 >>> invalid size >>> >>> for l these worked fine. >> >> Hm, yeah, 'o' uses ssize_t instead of int64_t, so it's broken on a 32 >> bit host as well. Though I assume that you use a 64 bit host, and I >> can't really see what's the problem there... > > This is a test on a 32-bit host. The target_long of "l" worked fine > because a kvm enabled qemu always builds for an x86-64 target, even > with a 32-bit host. Oh, okay, then it makes perfect sense. We should fix 'o' then to use int64_t. The patch below should do it. strtosz (which is used by 'o') has the same problem in git master. There's a fix by Jes, so be sure to test this on the block branch. Kevin diff --git a/monitor.c b/monitor.c index d291158..0cda3da 100644 --- a/monitor.c +++ b/monitor.c @@ -4162,7 +4162,7 @@ static const mon_cmd_t *monitor_parse_command(Monitor *mon, break; case 'o': { -ssize_t val; +int64_t val; char *end; while (qemu_isspace(*p)) {
[Qemu-devel] [PATCH] Support saturation with shift=0.
This patch fixes corner-case saturations, when the target range is zero. It merely removes the guard against (sh == 0), and makes: __ssat(0x87654321, 1) return 0x and set the saturation flag __usat(0x87654321, 0) return 0 and set the saturation flag Signed-off-by: Christophe Lyon --- target-arm/translate.c | 28 1 files changed, 12 insertions(+), 16 deletions(-) diff --git a/target-arm/translate.c b/target-arm/translate.c index 721bada..41cbb96 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -6896,27 +6896,23 @@ static void disas_arm_insn(CPUState * env, DisasContext *s) tcg_gen_shli_i32(tmp, tmp, shift); } sh = (insn >> 16) & 0x1f; -if (sh != 0) { -tmp2 = tcg_const_i32(sh); -if (insn & (1 << 22)) -gen_helper_usat(tmp, tmp, tmp2); -else -gen_helper_ssat(tmp, tmp, tmp2); -tcg_temp_free_i32(tmp2); -} +tmp2 = tcg_const_i32(sh); +if (insn & (1 << 22)) + gen_helper_usat(tmp, tmp, tmp2); +else + gen_helper_ssat(tmp, tmp, tmp2); +tcg_temp_free_i32(tmp2); store_reg(s, rd, tmp); } else if ((insn & 0x00300fe0) == 0x00200f20) { /* [us]sat16 */ tmp = load_reg(s, rm); sh = (insn >> 16) & 0x1f; -if (sh != 0) { -tmp2 = tcg_const_i32(sh); -if (insn & (1 << 22)) -gen_helper_usat16(tmp, tmp, tmp2); -else -gen_helper_ssat16(tmp, tmp, tmp2); -tcg_temp_free_i32(tmp2); -} +tmp2 = tcg_const_i32(sh); +if (insn & (1 << 22)) + gen_helper_usat16(tmp, tmp, tmp2); +else + gen_helper_ssat16(tmp, tmp, tmp2); +tcg_temp_free_i32(tmp2); store_reg(s, rd, tmp); } else if ((insn & 0x00700fe0) == 0x0fa0) { /* Select bytes. */ -- 1.7.2.3
Re: [Qemu-devel] [PATCH 1/3] block: add resize monitor command
On Wed, Jan 19, 2011 at 05:01:19PM +0100, Kevin Wolf wrote: > Oh, okay, then it makes perfect sense. We should fix 'o' then to use > int64_t. The patch below should do it. > > strtosz (which is used by 'o') has the same problem in git master. > There's a fix by Jes, so be sure to test this on the block branch. The larger sizes work fine on the block branch.
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
> It's something filesystems have to deal with. Real storage is getting > increasingly virtualized. While this didn't matter for the real high > end storage which has been doing this for a long time it's getting more > and more exposed to the filesystem. That includes LVM layouts and > thinly provisioned disk arrays, which are getting increasingly popular. > That doesn't matter the 64k (or until recently 4k) cluster size in qcow2 > is a good idea, we'd want at least a magnitude or two larger extents > to perform well, but it means filesystems really do have to cope with > it. Yes, a fundamental and optimal solution would be changing guest file systems, but it would take a much much longer route to introduce virtualization awareness into all guest file systems and its also requires changing the interface between the guest and the host. > I/O into fully preallocated files uses exactly the same codepath as > doing I/O to the block device, except for a identify logical to physical > block mapping in the block device and a non-trivial one in the > filesysgem. Note that the block mapping is cached and does not affect > the performance. I've published the numbers for qemu in the various > caching modes and all major filesystems a while ago, so I'm not making > this up. Preallocation is not a universal solution here, because it just defeats the other goal: thin provisioning. Moreover, if preallocation is used, it works best for RAW images and makes it unnecessary to use a compact image, which is exactly one goal of FVD, i.e., allowing optionally disabling a compact image data layout without giving up other features, e.g., copy-on-write.
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
On Wed, Jan 19, 2011 at 11:21:07AM -0500, Chunqiang Tang wrote: > Yes, a fundamental and optimal solution would be changing guest file > systems, but it would take a much much longer route to introduce > virtualization awareness into all guest file systems and its also requires > changing the interface between the guest and the host. Actually current filesystems do pretty well on thinly provisioned storage, as long as your extent size is not too small. Starting from extent size in the 64M to 256M range there's almost no difference to non-virtualized storage. > Preallocation is not a universal solution here, because it just defeats > the other goal: thin provisioning. Moreover, if preallocation is used, it > works best for RAW images and makes it unnecessary to use a compact image, > which is exactly one goal of FVD, i.e., allowing optionally disabling a > compact image data layout without giving up other features, e.g., > copy-on-write. Again, sparse images with a large enough allocation size give you almost the same numbers as preallocated images. I've been doing quite a lot of work on TP support in QEMU. Using an XFS filesystem to back the image with an extent size hint in the above mentioned range gives performance withing 1% of fully preallocated images. With the added benefit of allowing to deallocate the space again through the SCSI WRITE_SAME or ATA TRIM commands.
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 07:15 AM, Markus Armbruster wrote: So they interact with KVM (need kvm_state), and they interact with the emulated PCI bus. Could you elaborate on the fundamental difference between the two interactions that makes you choose the (hypothetical) KVM bus over the PCI bus as device parent? It's almost arbitrary, but I would say it's the direction that I/Os flow. But if the underlying observation is that the device tree is not really a tree, you're 100% correct. This is part of why a factory interface that just takes a parent bus is too simplistic. I think we ought to introduce a -pci-device option that is specifically for creating PCI devices that doesn't require a parent bus argument but provides a way to specify stable addressing (for instancing, using a linear index). Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH] Support saturation with shift=0.
On 19 January 2011 16:10, Christophe Lyon wrote: > > This patch fixes corner-case saturations, when the target range is > zero. It merely removes the guard against (sh == 0), and makes: > __ssat(0x87654321, 1) return 0x and set the saturation flag did you mean __ssat(0x87654321, 0) here? (they give the same result, of course, but it's the sh==0 case the patch is changing...) > __usat(0x87654321, 0) return 0 and set the saturation flag > > Signed-off-by: Christophe Lyon Checked against the ARM ARM and tested by random-instruction-sequence generation. (We could have taken the opportunity of adding braces to conform to the coding style while touching these lines, but since the patch isn't changing them, just reindenting, I'm happy not to worry about this.) Reviewed-by: Peter Maydell (for the content of the patch if not the commit message, anyway). -- PMM
[Qemu-devel] [PATCH 1/3] block: add resize monitor command
Add a monitor command that allows resizing of block devices while qemu is running. It uses the existing bdrv_truncate method already used by qemu-img to do it's work. Compared to qemu-img the size parsing is very simplicistic, but I think having a properly numering object is more useful for non-humand monitor users than having the units and relative resize parsing. For SCSI devices the new size can be updated in Linux guests by doing the following shell command: echo > /sys/class/scsi_device/0:0:0:0/device/rescan For ATA devices I don't know of a way to update the block device size in Linux system, and for virtio-blk the next two patches will provide an automatic update of the size when this command is issued on the host. Signed-off-by: Christoph Hellwig Index: qemu/hmp-commands.hx === --- qemu.orig/hmp-commands.hx 2011-01-19 17:47:10.444004409 +0100 +++ qemu/hmp-commands.hx2011-01-19 17:49:51.673254095 +0100 @@ -53,6 +53,25 @@ Quit the emulator. ETEXI { +.name = "resize", +.args_type = "id:s,size:o", +.params = "device size", +.help = "resize a block image", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_resize, +}, + +STEXI +@item resize +@findex resize +Resize a block image while a guest is running. Usually requires guest +action to see the updated size. Resize to a lower size is supported, +but should be used with extreme caution. Note that this command only +resizes image files, it can not resize block devices like LVM volumes. +ETEXI + + +{ .name = "eject", .args_type = "force:-f,device:B", .params = "[-f] device", Index: qemu/blockdev.c === --- qemu.orig/blockdev.c2011-01-19 17:47:10.455004828 +0100 +++ qemu/blockdev.c 2011-01-19 17:49:51.674253955 +0100 @@ -706,3 +706,33 @@ int do_drive_del(Monitor *mon, const QDi return 0; } + +/* + * XXX: replace the QERR_UNDEFINED_ERROR errors with real values once the + * existing QERR_ macro mess is cleaned up. A good example for better + * error reports can be found in the qemu-img resize code. + */ +int do_resize(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, "id"); +int64_t size = qdict_get_int(qdict, "size"); +BlockDriverState *bs; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +if (size < 0) { +qerror_report(QERR_UNDEFINED_ERROR); +return -1; +} + +if (bdrv_truncate(bs, size)) { +qerror_report(QERR_UNDEFINED_ERROR); +return -1; +} + +return 0; +} Index: qemu/blockdev.h === --- qemu.orig/blockdev.h2011-01-19 17:47:10.464012091 +0100 +++ qemu/blockdev.h 2011-01-19 17:49:51.677282590 +0100 @@ -53,5 +53,6 @@ int do_change_block(Monitor *mon, const const char *filename, const char *fmt); int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); +int do_resize(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif Index: qemu/qmp-commands.hx === --- qemu.orig/qmp-commands.hx 2011-01-19 17:47:10.478012371 +0100 +++ qemu/qmp-commands.hx2011-01-19 17:50:07.406016841 +0100 @@ -601,6 +601,34 @@ Example: -> { "execute": "netdev_del", "arguments": { "id": "netdev1" } } <- { "return": {} } + +EQMP + +{ +.name = "block_resize", +.args_type = "id:s,size:o", +.params = "id size", +.help = "resize a block image", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_resize, +}, + +SQMP +block_resize + + +Resize a block image while a guest is running. + +Arguments: + +- "id": the device's ID, must be unique (json-string) +- "size": new size + +Example: + +-> { "execute": "block_resize", "arguments": { "id": "scratch", "size": 1073741824 } } +<- { "return": {} } + EQMP {
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The reason we specify 'bus' is that we wanted to be flexible wrt upgrades of libvirt, without needing restarts of QEMU instances it manages. That way we can introduce new functionality into libvirt that relies on it having previously set 'bus' on all active QEMUs. If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to be adding the extra bridges. I'd expect that QEMU provided just the first bridge and then libvirt would specify how many more bridges to create at boot or hotplug them later. So it wouldn't ever need to parse topology. Regards, Daniel
[Qemu-devel] [PATCH 3/3] virtio-blk: tell the guest about size changes
Raise a config change interrupt when the size changed. This allows virtio-blk guest drivers to read-read the information from the config space once it got the config chaged interrupt. Signed-off-by: Christoph Hellwig Index: qemu/hw/virtio-blk.c === --- qemu.orig/hw/virtio-blk.c 2011-01-19 17:47:10.335004828 +0100 +++ qemu/hw/virtio-blk.c2011-01-19 17:50:10.748272881 +0100 @@ -504,6 +504,15 @@ static int virtio_blk_load(QEMUFile *f, return 0; } +static void virtio_blk_change_cb(void *opaque, int reason) +{ +VirtIOBlock *s = opaque; + +if (reason & CHANGE_SIZE) { +virtio_notify_config(&s->vdev); +} +} + VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf) { VirtIOBlock *s; @@ -546,6 +555,7 @@ VirtIODevice *virtio_blk_init(DeviceStat register_savevm(dev, "virtio-blk", virtio_blk_id++, 2, virtio_blk_save, virtio_blk_load, s); bdrv_set_removable(s->bs, 0); +bdrv_set_change_cb(s->bs, virtio_blk_change_cb, s); s->bs->buffer_alignment = conf->logical_block_size; add_boot_device_path(conf->bootindex, dev, "/disk@0,0");
[Qemu-devel] [PATCH 0/3 v2] allow online resizing of block devices
This patchset adds support for online resizing of block devices. The first patch adds a new resize monitor command which call into the existing image resize code. This is the meat of the series and probably needs quite a bit of review and help as I'm not sure about how to implement the error handling for monitor commands correctly. Am I really supposed to add a new QERR_ definition for each possible error? And if yes how am I supposed to define them? The macros for them aren't exactly self-explaining. The second patch adds a way to tell drivers about a resize, and the third one adds a guest notification for config changes to virtio-blk which allows the guest to pick it up without a rescan. I've just sent the corresponding Linux guest driver patch to Rusty. Changes from version 1 to version 2: - also add a QMP command (block_resize) - use the o format for the size in the monitor command - fix typos - use QERR_UNDEFINED_ERROR for errors instead of unstructured strings - remove the CDROM hint check - add a reason argument to the change callback
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 07:11 AM, Markus Armbruster wrote: Gerd Hoffmann writes: On 01/18/11 18:09, Anthony Liguori wrote: On 01/18/2011 10:56 AM, Jan Kiszka wrote: The device model topology is 100% a hidden architectural detail. This is true for the sysbus, it is obviously not the case for PCI and similarly discoverable buses. There we have a guest-explorable topology that is currently equivalent to the the qdev layout. But we also don't do PCI passthrough so we really haven't even explored how that maps in qdev. I don't know if qemu-kvm has attempted to qdev-ify it. It is qdev-ified. It is a normal pci device from qdev's point of view. BTW: is there any reason why (vfio-based) pci passthrough couldn't work with tcg? The -device interface is a stable interface. Right now, you don't specify any type of identifier of the pci bus when you create a PCI device. It's implied in the interface. Wrong. You can specify the bus you want attach the device to via bus=. This is true for *every* device, including all pci devices. If unspecified qdev uses the first bus it finds. As long as there is a single pci bus only there is simply no need to specify it, thats why nobody does that today. Once q35 finally arrives this will change of course. As far as I know, libvirt does it already. I think that's a bad idea from a forward compatibility perspective. Regards, Anthony Liguori
[Qemu-devel] [PATCH 2/3] block: tell drivers about an image resize
Extend the change_cb callback with a reason argument, and use it to tell drivers about size changes. Signed-off-by: Christoph Hellwig Index: qemu/block.c === --- qemu.orig/block.c 2011-01-18 20:54:45.246021572 +0100 +++ qemu/block.c2011-01-18 20:56:54.117255612 +0100 @@ -645,7 +645,7 @@ int bdrv_open(BlockDriverState *bs, cons /* call the change callback */ bs->media_changed = 1; if (bs->change_cb) -bs->change_cb(bs->change_opaque); +bs->change_cb(bs->change_opaque, CHANGE_MEDIA); } return 0; @@ -684,7 +684,7 @@ void bdrv_close(BlockDriverState *bs) /* call the change callback */ bs->media_changed = 1; if (bs->change_cb) -bs->change_cb(bs->change_opaque); +bs->change_cb(bs->change_opaque, CHANGE_MEDIA); } } @@ -1135,6 +1135,8 @@ int bdrv_truncate(BlockDriverState *bs, ret = drv->bdrv_truncate(bs, offset); if (ret == 0) { ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS); +if (bs->change_cb) +bs->change_cb(bs->change_opaque, CHANGE_SIZE); } return ret; } @@ -1366,7 +1368,8 @@ int bdrv_enable_write_cache(BlockDriverS /* XXX: no longer used */ void bdrv_set_change_cb(BlockDriverState *bs, -void (*change_cb)(void *opaque), void *opaque) +void (*change_cb)(void *opaque, int reason), +void *opaque) { bs->change_cb = change_cb; bs->change_opaque = opaque; @@ -1411,7 +1414,7 @@ int bdrv_set_key(BlockDriverState *bs, c /* call the change callback now, we skipped it on open */ bs->media_changed = 1; if (bs->change_cb) -bs->change_cb(bs->change_opaque); +bs->change_cb(bs->change_opaque, CHANGE_MEDIA); } return ret; } Index: qemu/block.h === --- qemu.orig/block.h 2011-01-18 20:57:00.792253448 +0100 +++ qemu/block.h2011-01-18 20:57:09.771282850 +0100 @@ -182,7 +182,8 @@ int bdrv_is_locked(BlockDriverState *bs) void bdrv_set_locked(BlockDriverState *bs, int locked); int bdrv_eject(BlockDriverState *bs, int eject_flag); void bdrv_set_change_cb(BlockDriverState *bs, -void (*change_cb)(void *opaque), void *opaque); +void (*change_cb)(void *opaque, int reason), +void *opaque); void bdrv_get_format(BlockDriverState *bs, char *buf, int buf_size); BlockDriverState *bdrv_find(const char *name); BlockDriverState *bdrv_next(BlockDriverState *bs); Index: qemu/block_int.h === --- qemu.orig/block_int.h 2011-01-18 20:55:43.783009908 +0100 +++ qemu/block_int.h2011-01-18 20:56:58.912004806 +0100 @@ -153,7 +153,7 @@ struct BlockDriverState { int valid_key; /* if true, a valid encryption key has been set */ int sg;/* if true, the device is a /dev/sg* */ /* event callback when inserting/removing */ -void (*change_cb)(void *opaque); +void (*change_cb)(void *opaque, int reason); void *change_opaque; BlockDriver *drv; /* NULL means no media */ @@ -203,6 +203,9 @@ struct BlockDriverState { void *private; }; +#define CHANGE_MEDIA 0x01 +#define CHANGE_SIZE0x02 + struct BlockDriverAIOCB { AIOPool *pool; BlockDriverState *bs; Index: qemu/hw/ide/core.c === --- qemu.orig/hw/ide/core.c 2011-01-18 20:57:17.321260780 +0100 +++ qemu/hw/ide/core.c 2011-01-18 20:59:46.242256451 +0100 @@ -1625,11 +1625,15 @@ static void ide_cfata_metadata_write(IDE } /* called when the inserted state of the media has changed */ -static void cdrom_change_cb(void *opaque) +static void cdrom_change_cb(void *opaque, int reason) { IDEState *s = opaque; uint64_t nb_sectors; +if (!(reason & CHANGE_MEDIA)) { +return; +} + bdrv_get_geometry(s->bs, &nb_sectors); s->nb_sectors = nb_sectors; Index: qemu/hw/sd.c === --- qemu.orig/hw/sd.c 2011-01-18 20:57:48.978261549 +0100 +++ qemu/hw/sd.c2011-01-18 20:59:37.0 +0100 @@ -422,9 +422,14 @@ static void sd_reset(SDState *sd, BlockD sd->pwd_len = 0; } -static void sd_cardchange(void *opaque) +static void sd_cardchange(void *opaque, int reason) { SDState *sd = opaque; + +if (!(reason & CHANGE_MEDIA)) { +return; +} + qemu_set_irq(sd->inserted_cb, bdrv_is_inserted(sd->bdrv)); if (bdrv_is_inserted(sd->bdrv)) { sd_reset(sd, sd->bdrv);
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
> Actually current filesystems do pretty well on thinly provisioned > storage, as long as your extent size is not too small. Starting from > extent size in the 64M to 256M range there's almost no difference to > non-virtualized storage. > > Again, sparse images with a large enough allocation size give you almost > the same numbers as preallocated images. I've been doing quite a lot of > work on TP support in QEMU. Using an XFS filesystem to back the image > with an extent size hint in the above mentioned range gives performance > withing 1% of fully preallocated images. With the added benefit of > allowing to deallocate the space again through the SCSI WRITE_SAME > or ATA TRIM commands. These numbers are very interesting and I would like to read more. Are your detailed results accessible on the Internet? Do you have numbers on the impact of using a large extent size (64-256MB) on thin provisioning (since a guest file system's metadata are written first and are scattered in the virtual disk)?
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote: > On 01/19/2011 07:11 AM, Markus Armbruster wrote: > >Gerd Hoffmann writes: > > > >>On 01/18/11 18:09, Anthony Liguori wrote: > >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >The device model topology is 100% a hidden architectural detail. > This is true for the sysbus, it is obviously not the case for PCI and > similarly discoverable buses. There we have a guest-explorable topology > that is currently equivalent to the the qdev layout. > >>>But we also don't do PCI passthrough so we really haven't even explored > >>>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>>qdev-ify it. > >>It is qdev-ified. It is a normal pci device from qdev's point of view. > >> > >>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>work with tcg? > >> > >>>The -device interface is a stable interface. Right now, you don't > >>>specify any type of identifier of the pci bus when you create a PCI > >>>device. It's implied in the interface. > >>Wrong. You can specify the bus you want attach the device to via > >>bus=. This is true for *every* device, including all pci > >>devices. If unspecified qdev uses the first bus it finds. > >> > >>As long as there is a single pci bus only there is simply no need to > >>specify it, thats why nobody does that today. Once q35 finally > >>arrives this will change of course. > >As far as I know, libvirt does it already. > > I think that's a bad idea from a forward compatibility perspective. In our past experiance though, *not* specifying attributes like these has also been pretty bad from a forward compatibility perspective too. We're kind of damned either way, so on balance we decided we'd specify every attribute in qdev that's related to unique identification of devices & their inter-relationships. By strictly locking down the topology we were defining, we ought to have a more stable ABI in face of future changes. I accept this might not always work out, so we may have to adjust things over time still. Predicting the future is hard :-) Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: On 01/18/11 18:09, Anthony Liguori wrote: On 01/18/2011 10:56 AM, Jan Kiszka wrote: The device model topology is 100% a hidden architectural detail. This is true for the sysbus, it is obviously not the case for PCI and similarly discoverable buses. There we have a guest-explorable topology that is currently equivalent to the the qdev layout. But we also don't do PCI passthrough so we really haven't even explored how that maps in qdev. I don't know if qemu-kvm has attempted to qdev-ify it. It is qdev-ified. It is a normal pci device from qdev's point of view. BTW: is there any reason why (vfio-based) pci passthrough couldn't work with tcg? The -device interface is a stable interface. Right now, you don't specify any type of identifier of the pci bus when you create a PCI device. It's implied in the interface. Wrong. You can specify the bus you want attach the device to via bus=. This is true for *every* device, including all pci devices. If unspecified qdev uses the first bus it finds. As long as there is a single pci bus only there is simply no need to specify it, thats why nobody does that today. Right. In terms of specifying bus=, what are we promising re: compatibility? Will there always be a pci.0? If we add some PCI-to-PCI bridges in order to support more devices, is libvirt support to parse the hierarchy and figure out which bus to put the device on? Regards, Anthony Liguori Once q35 finally arrives this will change of course. cheers, Gerd
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 2011-01-19 17:57, Anthony Liguori wrote: > On 01/19/2011 07:15 AM, Markus Armbruster wrote: >> So they interact with KVM (need kvm_state), and they interact with the >> emulated PCI bus. Could you elaborate on the fundamental difference >> between the two interactions that makes you choose the (hypothetical) >> KVM bus over the PCI bus as device parent? >> > > It's almost arbitrary, but I would say it's the direction that I/Os flow. We need both if we want KVM buses. They are useless for enumerating the device on that bus the guest sees it on. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
On Wed, Jan 19, 2011 at 12:08:41PM -0500, Chunqiang Tang wrote: > These numbers are very interesting and I would like to read more. Are > your detailed results accessible on the Internet? Do you have numbers on > the impact of using a large extent size (64-256MB) on thin provisioning > (since a guest file system's metadata are written first and are scattered > in the virtual disk)? It's still work in progress. I will publish patches and numbers in a few weeks.
[Qemu-devel] Re: [PATCH 0/3 v2] allow online resizing of block devices
Christoph Hellwig wrote: > This patchset adds support for online resizing of block devices. > > The first patch adds a new resize monitor command which call into > the existing image resize code. This is the meat of the series > and probably needs quite a bit of review and help as I'm not sure > about how to implement the error handling for monitor commands > correctly. Am I really supposed to add a new QERR_ definition > for each possible error? And if yes how am I supposed to define > them? The macros for them aren't exactly self-explaining. > > The second patch adds a way to tell drivers about a resize, and the > third one adds a guest notification for config changes to virtio-blk > which allows the guest to pick it up without a rescan. I've just sent > the corresponding Linux guest driver patch to Rusty. > > Changes from version 1 to version 2: > - also add a QMP command (block_resize) > - use the o format for the size in the monitor command > - fix typos > - use QERR_UNDEFINED_ERROR for errors instead of unstructured strings > - remove the CDROM hint check > - add a reason argument to the change callback Does this handle the change of the rtc data? I can't see how/where. (Not that I know if it is important at all until reboot). cmos_init_hb() uses the size to guess the right geometry of the drives, if we change the size of the drive, should we change that one? Later, Juan. hw/pc.c static void pc_cmos_init_late(void *opaque) { pc_cmos_init_late_arg *arg = opaque; ISADevice *s = arg->rtc_state; pint val; BlockDriverState *hd_table[4]; int i; ide_get_bs(hd_table, arg->idebus0); ide_get_bs(hd_table + 2, arg->idebus1); rtc_set_memory(s, 0x12, (hd_table[0] ? 0xf0 : 0) | (hd_table[1] ? 0x0f : 0)); if (hd_table[0]) cmos_init_hd(0x19, 0x1b, hd_table[0], s); if (hd_table[1]) cmos_init_hd(0x1a, 0x24, hd_table[1], s); . } static void cmos_init_hd(int type_ofs, int info_ofs, BlockDriverState *hd, ISADevice *s) { int cylinders, heads, sectors; bdrv_get_geometry_hint(hd, &cylinders, &heads, §ors); rtc_set_memory(s, type_ofs, 47); rtc_set_memory(s, info_ofs, cylinders); rtc_set_memory(s, info_ofs + 1, cylinders >> 8); rtc_set_memory(s, info_ofs + 2, heads);
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >On 01/18/11 18:09, Anthony Liguori wrote: > >>On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>> > The device model topology is 100% a hidden architectural detail. > >>>This is true for the sysbus, it is obviously not the case for PCI and > >>>similarly discoverable buses. There we have a guest-explorable topology > >>>that is currently equivalent to the the qdev layout. > >> > >>But we also don't do PCI passthrough so we really haven't even explored > >>how that maps in qdev. I don't know if qemu-kvm has attempted to > >>qdev-ify it. > > > >It is qdev-ified. It is a normal pci device from qdev's point of view. > > > >BTW: is there any reason why (vfio-based) pci passthrough couldn't > >work with tcg? > > > >>The -device interface is a stable interface. Right now, you don't > >>specify any type of identifier of the pci bus when you create a PCI > >>device. It's implied in the interface. > > > >Wrong. You can specify the bus you want attach the device to via > >bus=. This is true for *every* device, including all pci > >devices. If unspecified qdev uses the first bus it finds. > > > >As long as there is a single pci bus only there is simply no need > >to specify it, thats why nobody does that today. > > Right. In terms of specifying bus=, what are we promising re: > compatibility? Will there always be a pci.0? If we add some > PCI-to-PCI bridges in order to support more devices, is libvirt > support to parse the hierarchy and figure out which bus to put the > device on? The answer to your questions probably differ depending on whether '-nodefconfig' and '-nodefaults' are set on the command line. If they are set, then I'd expect to only ever see one PCI bus with name pci.0 forever more, unless i explicitly ask for more. If they are not set, then you might expect to see multiple PCI buses by appear by magic Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 11:35 AM, Daniel P. Berrange wrote: On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: On 01/18/11 18:09, Anthony Liguori wrote: On 01/18/2011 10:56 AM, Jan Kiszka wrote: The device model topology is 100% a hidden architectural detail. This is true for the sysbus, it is obviously not the case for PCI and similarly discoverable buses. There we have a guest-explorable topology that is currently equivalent to the the qdev layout. But we also don't do PCI passthrough so we really haven't even explored how that maps in qdev. I don't know if qemu-kvm has attempted to qdev-ify it. It is qdev-ified. It is a normal pci device from qdev's point of view. BTW: is there any reason why (vfio-based) pci passthrough couldn't work with tcg? The -device interface is a stable interface. Right now, you don't specify any type of identifier of the pci bus when you create a PCI device. It's implied in the interface. Wrong. You can specify the bus you want attach the device to via bus=. This is true for *every* device, including all pci devices. If unspecified qdev uses the first bus it finds. As long as there is a single pci bus only there is simply no need to specify it, thats why nobody does that today. Right. In terms of specifying bus=, what are we promising re: compatibility? Will there always be a pci.0? If we add some PCI-to-PCI bridges in order to support more devices, is libvirt support to parse the hierarchy and figure out which bus to put the device on? The answer to your questions probably differ depending on whether '-nodefconfig' and '-nodefaults' are set on the command line. If they are set, then I'd expect to only ever see one PCI bus with name pci.0 forever more, unless i explicitly ask for more. If they are not set, then you might expect to see multiple PCI buses by appear by magic Yeah, we can't promise that. If you use -M pc, you aren't guaranteed a stable PCI bus topology even with -nodefconfig/-nodefaults. Regards, Anthony Liguori Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 11:19 AM, Daniel P. Berrange wrote: In our past experiance though, *not* specifying attributes like these has also been pretty bad from a forward compatibility perspective too. We're kind of damned either way, so on balance we decided we'd specify every attribute in qdev that's related to unique identification of devices& their inter-relationships. By strictly locking down the topology we were defining, we ought to have a more stable ABI in face of future changes. I accept this might not always work out, so we may have to adjust things over time still. Predicting the future is hard :-) There are two distinct things here: 1) creating exactly the same virtual machine (like for migration) given a newer version of QEMU 2) creating a reasonably similar virtual machine given a newer version of QEMU For (1), you cannot use -M pc. You should use things like bus=X,addr=Y much better is for QEMU to dump a device file and to just reuse that instead of guessing what you need. For (2), you cannot use bus=X,addr=Y because it makes assumptions about the PCI topology which may change in newer -M pc's. I think libvirt needs to treat this two scenarios differently to support forwards compatibility. Regards, Anthony Liguori Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 11:01 AM, Daniel P. Berrange wrote: The reason we specify 'bus' is that we wanted to be flexible wrt upgrades of libvirt, without needing restarts of QEMU instances it manages. That way we can introduce new functionality into libvirt that relies on it having previously set 'bus' on all active QEMUs. If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to be adding the extra bridges. I'd expect that QEMU provided just the first bridge and then libvirt would specify how many more bridges to create at boot or hotplug them later. So it wouldn't ever need to parse topology. Yeah, but replacing the main chipset will certainly change the PCI topology such that if you're specifying bus=X and addr=X and then also using -M pc, unless you're parsing the default topology to come up with the addressing, it will break in the future. That's why I think something simpler like a linear index that QEMU maps to a static location in the topology is probably the best future proof interface. Regards, Anthony Liguori Regards, Daniel
Re: [Qemu-devel] Missing roms/seabios/Makefile and roms/vgabios/Makefile
On Wed, Jan 19, 2011 at 12:44 PM, Mateusz Loskot wrote: > Hi, > > Running ./configure (under MinGW/MSYS) and symlink gives up trying to > create links to non-existing Makefiles in > roms/seabios/Makefile > roms/vgabios/Makefile Those directiories are actually git submodules, when you run 'git submodule update', they get populated. Maybe configure should check if the directories are OK and disables building ROMs if not.
[Qemu-devel] libcacard as a git submodule
Any objections? Alon
[Qemu-devel] usb redirection status report
Hi All, As most of you know I'm working on usb redirection (making client usb devices accessible in guests over the network). I thought it would be a good idea to write a short status report. So fat the following has been done: * written and posted a network protocol for usb redir. over the network, and send this to the list. * a 2nd revison is ready incorporating all comments from the mailinglist discussion. I'll post this to the list soon. * looked at using some pre-existing marshalling / demarshalling solution, specifically looked at google's protocol buffers. Not an option as this uses c++. There is a third party C version of protocol buffers, but this cannot deal with streaming input, making it not usable for usb redirection. * Designed an API for a transport independent, marshaller / demarshaller for the protocol. * Implemented a roll my own marshaller / demarshaller for the protocol. * Designed an API for a (transport indepenent) usb-host object/library which can be incorporated into spice-client, or a vnc client, etc. To easily add usb host capabilities to client-applications. * Implemented a skeleton version of the usb-host (still need to implement most usb redir commands). * Wrote a standalone usb-host application using standard tcp/ip[v4|v6] as transport, as proof of concept / for testing purposes: usbredirserver To Do: * Finish usb-host library * Write a test client (usb-guest) for testing * Implement a transport independent qemu usb-host talking the usb redir protocol. * Hook up a monitor command to hookup the qemu usb-redir-host connect to a usbredirserver * Test / debug / test * Integrate with Spice (use a spice channel as transport) * Integrate with vnc? Regards, Hans
[Qemu-devel] Re: libcacard as a git submodule
On 01/19/2011 12:11 PM, Alon Levy wrote: Any objections? My primary concern is that it is easy to make changes to QEMU that impact the interface between QEMU and libcacard so I'd actually prefer it not be a submodule. Regards, Anthony Liguori Alon
[Qemu-devel] Re: MIPS, io-thread, icount and wfi
On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote: > On 2011-01-18 01:19, Edgar E. Iglesias wrote: > > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote: > >> Hi, > >> > >> I'm running an io-thread enabled qemu-system-mipsel with icount. > >> When the guest (linux) goes to sleep through the wait insn (waiting > >> to be woken up by future timer interrupts), the thing deadlocks. > >> > >> IIUC, this is because vm timers are driven by icount, but the CPU is > >> halted so icount makes no progress and time stands still. > >> > >> I've locally disabled vcpu halting when icount is enabled, that > >> works around my problem but of course makes qemu consume 100% host cpu. > >> > >> I don't know why I only see this problem with io-thread builds? > >> Could be related timing and luck. > >> > >> Would be interesting to know if someone has any info on how this was > >> intended to work (if it was)? And if there are ideas for better > >> workarounds or fixes that don't disable vcpu halting entirely. > > > > Hi, > > > > I've found the problem. For some reason io-thread builds use a > > static timeout for wait loops. The entire chunk of code that > > makes sure qemu_icount makes forward progress when the CPU's > > are idle has been ifdef'ed away... > > > > This fixes the problem for me, hopefully without affecting > > io-thread runs without icount. > > > > commit 0f4f3a919952500b487b438c5520f07a1c6be35b > > Author: Edgar E. Iglesias > > Date: Tue Jan 18 01:01:57 2011 +0100 > > > > qemu-timer: Fix timeout calc for io-thread with icount > > > > Make sure we always make forward progress with qemu_icount to > > avoid deadlocks. For io-thread, use the static 1000 timeout > > only if icount is disabled. > > > > Signed-off-by: Edgar E. Iglesias > > > > diff --git a/qemu-timer.c b/qemu-timer.c > > index 95814af..db1ec49 100644 > > --- a/qemu-timer.c > > +++ b/qemu-timer.c > > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void) > > } > > } > > > > -#ifndef CONFIG_IOTHREAD > > static int64_t qemu_icount_delta(void) > > { > > if (!use_icount) { > > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void) > > return cpu_get_icount() - cpu_get_clock(); > > } > > } > > -#endif > > > > /* enable cpu_get_ticks() */ > > void cpu_enable_ticks(void) > > @@ -1077,9 +1075,17 @@ void quit_timers(void) > > > > int qemu_calculate_timeout(void) > > { > > -#ifndef CONFIG_IOTHREAD > > int timeout; > > > > +#ifdef CONFIG_IOTHREAD > > +/* When using icount, making forward progress with qemu_icount when the > > + guest CPU is idle is critical. We only use the static io-thread > > timeout > > + for non icount runs. */ > > +if (!use_icount) { > > +return 1000; > > +} > > +#endif > > + > > if (!vm_running) > > timeout = 5000; > > else { > > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void) > > } > > > > return timeout; > > -#else /* CONFIG_IOTHREAD */ > > -return 1000; > > -#endif > > } > > > > > > > > This logic and timeout values were imported on iothread merge. And I bet > at least the timeout value of 1s (vs. 5s) can still be found in > qemu-kvm. Maybe someone over there can remember the rationales behind > choosing this value. > > Jan This timeout is for the main select() call. So there is not a lot of reasoning, how long to wait when there's no activity on the file descriptors.
Re: [Qemu-devel] [Bug 704904] [NEW] No rule to make target ../libhw32/virtio.o
On Wed, Jan 19, 2011 at 1:57 PM, Mateusz Łoskot wrote: > Public bug reported: > > Building qemu from current git using 32-bit MinGW (installer from > 2010-11-07) on Windows Vista (64-bit) fails with the following error: > > make[1]: *** No rule to make target `../libhw32/virtio.o', needed by > `qemu.exe'. Stop. > make: *** [subdir-i386-softmmu] Error 2 This should have been fixed by 0601740a5db12ea7ae0f2f7826f0cfb05854500a. I don't see a line like: GEN config-all-devices.mak Please delete config-all-devices.mak and try again.
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:01 AM, Daniel P. Berrange wrote: > > > >The reason we specify 'bus' is that we wanted to be flexible wrt > >upgrades of libvirt, without needing restarts of QEMU instances > >it manages. That way we can introduce new functionality into > >libvirt that relies on it having previously set 'bus' on all > >active QEMUs. > > > >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to > >be adding the extra bridges. I'd expect that QEMU provided just > >the first bridge and then libvirt would specify how many more > >bridges to create at boot or hotplug them later. So it wouldn't > >ever need to parse topology. > > Yeah, but replacing the main chipset will certainly change the PCI > topology such that if you're specifying bus=X and addr=X and then > also using -M pc, unless you're parsing the default topology to come > up with the addressing, it will break in the future. We never use a bare '-M pc' though, we always canonicalize to one of the versioned forms. So if we run '-M pc-0.12', then neither the main PCI chipset nor topology would have changed in newer QEMU. Of course if we deployed a new VM with '-M pc-0.20' that might have new PCI chipset, so bus=pci.0 might have different meaning that it did when used with '-M pc-0.12', but I don't think that's an immediate problem Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote: > On 01/19/2011 11:35 AM, Daniel P. Berrange wrote: > >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote: > >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote: > >>>On 01/18/11 18:09, Anthony Liguori wrote: > On 01/18/2011 10:56 AM, Jan Kiszka wrote: > >>The device model topology is 100% a hidden architectural detail. > >This is true for the sysbus, it is obviously not the case for PCI and > >similarly discoverable buses. There we have a guest-explorable topology > >that is currently equivalent to the the qdev layout. > But we also don't do PCI passthrough so we really haven't even explored > how that maps in qdev. I don't know if qemu-kvm has attempted to > qdev-ify it. > >>>It is qdev-ified. It is a normal pci device from qdev's point of view. > >>> > >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't > >>>work with tcg? > >>> > The -device interface is a stable interface. Right now, you don't > specify any type of identifier of the pci bus when you create a PCI > device. It's implied in the interface. > >>>Wrong. You can specify the bus you want attach the device to via > >>>bus=. This is true for *every* device, including all pci > >>>devices. If unspecified qdev uses the first bus it finds. > >>> > >>>As long as there is a single pci bus only there is simply no need > >>>to specify it, thats why nobody does that today. > >>Right. In terms of specifying bus=, what are we promising re: > >>compatibility? Will there always be a pci.0? If we add some > >>PCI-to-PCI bridges in order to support more devices, is libvirt > >>support to parse the hierarchy and figure out which bus to put the > >>device on? > >The answer to your questions probably differ depending on > >whether '-nodefconfig' and '-nodefaults' are set on the > >command line. If they are set, then I'd expect to only > >ever see one PCI bus with name pci.0 forever more, unless > >i explicitly ask for more. If they are not set, then you > >might expect to see multiple PCI buses by appear by magic > > Yeah, we can't promise that. If you use -M pc, you aren't > guaranteed a stable PCI bus topology even with > -nodefconfig/-nodefaults. That's why we never use '-M pc' when actually invoking QEMU. If the user specifies 'pc' in the XML, we canonicalize that to the versioned alternative like 'pc-0.12' before invoking QEMU. We also expose the list of versioned machines to apps so they can do canonicalization themselves if desired. Regards, Daniel
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On 01/19/2011 12:52 PM, Daniel P. Berrange wrote: On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote: On 01/19/2011 11:01 AM, Daniel P. Berrange wrote: The reason we specify 'bus' is that we wanted to be flexible wrt upgrades of libvirt, without needing restarts of QEMU instances it manages. That way we can introduce new functionality into libvirt that relies on it having previously set 'bus' on all active QEMUs. If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to be adding the extra bridges. I'd expect that QEMU provided just the first bridge and then libvirt would specify how many more bridges to create at boot or hotplug them later. So it wouldn't ever need to parse topology. Yeah, but replacing the main chipset will certainly change the PCI topology such that if you're specifying bus=X and addr=X and then also using -M pc, unless you're parsing the default topology to come up with the addressing, it will break in the future. We never use a bare '-M pc' though, we always canonicalize to one of the versioned forms. So if we run '-M pc-0.12', then neither the main PCI chipset nor topology would have changed in newer QEMU. Of course if we deployed a new VM with '-M pc-0.20' that might have new PCI chipset, so bus=pci.0 might have different meaning that it did when used with '-M pc-0.12', but I don't think that's an immediate problem Right, but you expose bus addressing via the XML, no? That means that if a user specifies something like '1.0', and you translate that to bus='pci.0',addr='1.0', then when pc-0.50 comes out and slot 1.0 is used for the integrated 3D VGA graphics adapter, the guest creation will fail. If you expose topological configuration to the user, the guest will not continue working down the road unless you come up with a scheme where you map addresses to a different address range for newer pcs. Regards, Anthony Liguori Regards, Daniel
[Qemu-devel] IDE development
Hello, For a project we're slowly making our own operating system. My task is the file system. We plan on using the FatFs file system. My question is how to link FatFs to the IDE Hard Drive QEMU emulates? I cannot find any documentation on how to access the disk, API, calls, ect. Any search typically returns posts to folks trying to mound loopback devices or additions to QEMU. I get a lot of hits for the addition of SATA and so on. I just need to bind read and write to qemu's hard drive interface. Any pointers in the right direction would be appreciated. Thanks, Steve
[Qemu-devel] Re: MIPS, io-thread, icount and wfi
On Wed, Jan 19, 2011 at 03:02:26PM -0200, Marcelo Tosatti wrote: > On Tue, Jan 18, 2011 at 11:00:57AM +0100, Jan Kiszka wrote: > > On 2011-01-18 01:19, Edgar E. Iglesias wrote: > > > On Mon, Jan 17, 2011 at 11:03:08AM +0100, Edgar E. Iglesias wrote: > > >> Hi, > > >> > > >> I'm running an io-thread enabled qemu-system-mipsel with icount. > > >> When the guest (linux) goes to sleep through the wait insn (waiting > > >> to be woken up by future timer interrupts), the thing deadlocks. > > >> > > >> IIUC, this is because vm timers are driven by icount, but the CPU is > > >> halted so icount makes no progress and time stands still. > > >> > > >> I've locally disabled vcpu halting when icount is enabled, that > > >> works around my problem but of course makes qemu consume 100% host cpu. > > >> > > >> I don't know why I only see this problem with io-thread builds? > > >> Could be related timing and luck. > > >> > > >> Would be interesting to know if someone has any info on how this was > > >> intended to work (if it was)? And if there are ideas for better > > >> workarounds or fixes that don't disable vcpu halting entirely. > > > > > > Hi, > > > > > > I've found the problem. For some reason io-thread builds use a > > > static timeout for wait loops. The entire chunk of code that > > > makes sure qemu_icount makes forward progress when the CPU's > > > are idle has been ifdef'ed away... > > > > > > This fixes the problem for me, hopefully without affecting > > > io-thread runs without icount. > > > > > > commit 0f4f3a919952500b487b438c5520f07a1c6be35b > > > Author: Edgar E. Iglesias > > > Date: Tue Jan 18 01:01:57 2011 +0100 > > > > > > qemu-timer: Fix timeout calc for io-thread with icount > > > > > > Make sure we always make forward progress with qemu_icount to > > > avoid deadlocks. For io-thread, use the static 1000 timeout > > > only if icount is disabled. > > > > > > Signed-off-by: Edgar E. Iglesias > > > > > > diff --git a/qemu-timer.c b/qemu-timer.c > > > index 95814af..db1ec49 100644 > > > --- a/qemu-timer.c > > > +++ b/qemu-timer.c > > > @@ -110,7 +110,6 @@ static int64_t cpu_get_clock(void) > > > } > > > } > > > > > > -#ifndef CONFIG_IOTHREAD > > > static int64_t qemu_icount_delta(void) > > > { > > > if (!use_icount) { > > > @@ -124,7 +123,6 @@ static int64_t qemu_icount_delta(void) > > > return cpu_get_icount() - cpu_get_clock(); > > > } > > > } > > > -#endif > > > > > > /* enable cpu_get_ticks() */ > > > void cpu_enable_ticks(void) > > > @@ -1077,9 +1075,17 @@ void quit_timers(void) > > > > > > int qemu_calculate_timeout(void) > > > { > > > -#ifndef CONFIG_IOTHREAD > > > int timeout; > > > > > > +#ifdef CONFIG_IOTHREAD > > > +/* When using icount, making forward progress with qemu_icount when > > > the > > > + guest CPU is idle is critical. We only use the static io-thread > > > timeout > > > + for non icount runs. */ > > > +if (!use_icount) { > > > +return 1000; > > > +} > > > +#endif > > > + > > > if (!vm_running) > > > timeout = 5000; > > > else { > > > @@ -1110,8 +1116,5 @@ int qemu_calculate_timeout(void) > > > } > > > > > > return timeout; > > > -#else /* CONFIG_IOTHREAD */ > > > -return 1000; > > > -#endif > > > } > > > > > > > > > > > > > This logic and timeout values were imported on iothread merge. And I bet > > at least the timeout value of 1s (vs. 5s) can still be found in > > qemu-kvm. Maybe someone over there can remember the rationales behind > > choosing this value. > > > > Jan > > This timeout is for the main select() call. So there is not a lot > of reasoning, how long to wait when there's no activity on the file > descriptors. OK, I suspected something like that. Thanks both of you for the info. I'll give people a couple of days to complain at the patch, if noone does I'll apply it. Cheers
Re: [Qemu-devel] [PATCH] target-arm: Fix garbage collection of temporaries in Neon emulation.
On 19 January 2011 14:37, Christophe Lyon wrote: > Here is an updated patch which will hopefully not be mangled by my mailer. > > Fix garbage collection of temporaries in Neon emulation. I've tested this patch and it does indeed fix the problems with VMULL and friends (I was seeing assertions/hangs). I've tested with random instruction sequence generation and with this patch the non-scalar forms of VMLAL, VMLSL, VQDMLAL, VQDMLSL, VMULL, VQDMULL now all pass. The scalar forms now pass random-sequence testing with the addition of a patch from the qemu-meego tree. Since I have effectively just tested that meego patch I'll post it to the list in a moment. Reviewed-by: Peter Maydell I would personally prefer slightly less terse commit messages (for instance it might be nice to list the affected instructions in this case). The convention is also to preface the summary line with the file or directory affected, ie "target-arm: Fix garbage collection of temporaries in Neon emulation". -- PMM
Re: [Qemu-devel] [PULL] piix, pci, qdev
On Wed, Jan 19, 2011 at 04:14:48PM +0100, Christoph Hellwig wrote: > > > pci: fix migration path for devices behind bridges > > This patch breaks starting qemu for me on 32-bit x86 with the following > assert: Yes, sorry about that. I sent a fix, could you check out it please? > qemu-system-x86_64: savevm.c:1129: register_savevm_live: Assertion > `!se->compat || se->instance_id == 0' failed. > > my qemu command line is: > > /opt/qemu/bin/qemu-system-x86_64 \ > -m 1500 \ > -enable-kvm \ > -drive if=none,file=/dev/vg01/qemu-root,cache=none,aio=threads,id=root \ > -drive if=none,file=/dev/vg01/qemu-data,cache=none,aio=threads,id=test \ > -drive if=none,file=/home/test1.img,cache=none,id=scratch \ > -device virtio-blk-pci,drive=root \ > -device virtio-blk-pci,drive=test \ > -device virtio-blk-pci,drive=scratch \ > -append "root=/dev/vda console=tty0 console=ttyS0,38400n8" \ > -monitor tcp:127.0.0.1:31337,server,nowait \ > -kernel arch/x86/boot/bzImage \ > -nographic > > btw, I can't find a patch for that commit anywhere in the qemu-devel > archives. Did it get reviewed anywhere at all?
[Qemu-devel] [PATCH] target-arm: Fix loading of scalar value for Neon multiply-by-scalar
Fix the register and part of register we get the scalar from in the various "multiply vector by scalar" ops (VMUL by scalar and friends). Signed-off-by: Peter Maydell --- target-arm/translate.c | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/target-arm/translate.c b/target-arm/translate.c index c60cd18..0c2856a 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -3608,14 +3608,14 @@ static inline TCGv neon_get_scalar(int size, int reg) { TCGv tmp; if (size == 1) { -tmp = neon_load_reg(reg >> 1, reg & 1); -} else { -tmp = neon_load_reg(reg >> 2, (reg >> 1) & 1); -if (reg & 1) { -gen_neon_dup_low16(tmp); -} else { +tmp = neon_load_reg(reg & 7, reg >> 4); +if (reg & 8) { gen_neon_dup_high16(tmp); +} else { +gen_neon_dup_low16(tmp); } +} else { +tmp = neon_load_reg(reg & 15, reg >> 4); } return tmp; } -- 1.6.3.3
[Qemu-devel] [PATCH] pci: fix device paths
Patch a6a7005d14b3c32d4864a718fb1cb19c789f58a5 generated broken device paths. We snprintf with a length shorter than the output, so the last character is discarded and replaced by the null byte. Fix it up by snprintf to a buffer which is larger by 1 byte and then memcpy the data (without the null byte) to where we need it. Reported-by: Christoph Hellwig Signed-off-by: Michael S. Tsirkin --- This fixes the issue for me. Could you ack pls? hw/pci.c | 16 1 files changed, 12 insertions(+), 4 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 8d0e3df..c77f6e9 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -2032,10 +2032,13 @@ static char *pcibus_get_dev_path(DeviceState *dev) * domain:Bus:Slot.Func for systems without nested PCI bridges. * Slot.Function list specifies the slot and function numbers for all * devices on the path from root to the specific device. */ -int domain_len = strlen(":00"); -int slot_len = strlen(":SS.F"); +char domain[] = ":00"; +char slot[] = ":SS.F"; +int domain_len = sizeof domain - 1 /* For '\0' */; +int slot_len = sizeof slot - 1 /* For '\0' */; int path_len; char *path, *p; +int s; /* Calculate # of slots on path between device and root. */; slot_depth = 0; @@ -2050,14 +2053,19 @@ static char *pcibus_get_dev_path(DeviceState *dev) path[path_len] = '\0'; /* First field is the domain. */ -snprintf(path, domain_len, "%04x:00", pci_find_domain(d->bus)); +s = snprintf(domain, sizeof domain, "%04x:00", pci_find_domain(d->bus)); +assert(s == domain_len); +memcpy(path, domain, domain_len); /* Fill in slot numbers. We walk up from device to root, so need to print * them in the reverse order, last to first. */ p = path + path_len; for (t = d; t; t = t->bus->parent_dev) { p -= slot_len; -snprintf(p, slot_len, ":%02x.%x", PCI_SLOT(t->devfn), PCI_FUNC(d->devfn)); +s = snprintf(slot, sizeof slot, ":%02x.%x", + PCI_SLOT(t->devfn), PCI_FUNC(d->devfn)); +assert(s == slot_len); +memcpy(p, slot, slot_len); } return path; -- 1.7.3.2.91.g446ac
Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori wrote: > On 01/19/2011 07:15 AM, Markus Armbruster wrote: >> >> So they interact with KVM (need kvm_state), and they interact with the >> emulated PCI bus. Could you elaborate on the fundamental difference >> between the two interactions that makes you choose the (hypothetical) >> KVM bus over the PCI bus as device parent? >> > > It's almost arbitrary, but I would say it's the direction that I/Os flow. > > But if the underlying observation is that the device tree is not really a > tree, you're 100% correct. This is part of why a factory interface that > just takes a parent bus is too simplistic. > > I think we ought to introduce a -pci-device option that is specifically for > creating PCI devices that doesn't require a parent bus argument but provides > a way to specify stable addressing (for instancing, using a linear index). I think kvm_state should not be a property of any device or bus. It should be split to more logical pieces. Some parts of it could remain in CPUState, because they are associated with a VCPU. Also, for example irqfd could be considered to be similar object to char or block devices provided by QEMU to devices. Would it make sense to introduce new host types for passing parts of kvm_state to devices? I'd also make coalesced MMIO stuff part of memory object. We are not passing any state references when using cpu_physical_memory_rw(), but that could be changed.
[Qemu-devel] Re: [PATCH 1/2] prep: Remove bogus BIOS size check
Am 18.01.2011 um 22:43 schrieb Hervé Poussineau: From: Andreas Färber r3480 added this check to account for the entry vector 0xfff00100 to be available for CPUs that need it. Today however, the NIP is not yet initialized at this point (zero), so the check always triggers. Moreover, BIOS size check is already done previously, so this part can be removed too. Cc: Alexander Graf Signed-off-by: Andreas Färber Signed-off-by: Hervé Poussineau --- hw/ppc_prep.c |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/hw/ppc_prep.c b/hw/ppc_prep.c index 1492266..6b22122 100644 --- a/hw/ppc_prep.c +++ b/hw/ppc_prep.c @@ -600,9 +600,6 @@ static void ppc_prep_init (ram_addr_t ram_size, if (filename) { qemu_free(filename); } -if (env->nip < 0xFFF8 && bios_size < 0x0010) { -hw_error("PowerPC 601 / 620 / 970 need a 1MB BIOS\n"); -} I've been thinking if we could replace this with a check for env- >excep_prefix + env->hreset_vector or similar but haven't found the time to try it out yet. Andreas if (linux_boot) { kernel_base = KERNEL_LOAD_ADDR; -- 1.7.2.3
[Qemu-devel] [PATCH 1/5] Fast Virtual Disk (FVD) Proposal Part 1
Part 1 of the block device driver for the proposed FVD image format. Multiple patches are used in order to manage the size of each patch. This patch includes existing files that are modified by FVD. See the related discussions at http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg00426.html . Signed-off-by: Chunqiang Tang --- Makefile | 10 +--- Makefile.objs|1 + block.c | 12 +- block_int.h |5 ++- configure|2 +- qemu-img-cmds.hx |6 + qemu-img.c | 62 - qemu-io.c|3 ++ qemu-option.c|4 +++ qemu-tool.c | 36 --- 10 files changed, 81 insertions(+), 60 deletions(-) diff --git a/Makefile b/Makefile index 6d601ee..da4d777 100644 --- a/Makefile +++ b/Makefile @@ -151,13 +151,15 @@ version-obj-$(CONFIG_WIN32) += version.o ## qemu-img.o: qemu-img-cmds.h -qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o cmd.o: $(GENERATED_HEADERS) +qemu-img.o qemu-tool.o qemu-nbd.o qemu-io.o cmd.o qemu-test.o: $(GENERATED_HEADERS) -qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-img$(EXESUF): qemu-img.o qemu-tool.o qemu-tool-time.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o -qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-nbd$(EXESUF): qemu-nbd.o qemu-tool.o qemu-tool-time.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o -qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o +qemu-io$(EXESUF): qemu-io.o cmd.o qemu-tool.o qemu-tool-time.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o + +qemu-test$(EXESUF): qemu-test.o qemu-tool.o qemu-error.o $(oslib-obj-y) $(trace-obj-y) $(block-obj-y) $(qobject-obj-y) $(version-obj-y) qemu-timer-common.o qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx $(call quiet-command,sh $(SRC_PATH)/hxtool -h < $< > $@," GEN $@") diff --git a/Makefile.objs b/Makefile.objs index c3e52c5..c0c1155 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -23,6 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o +block-nested-y += blksim.o fvd.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/block.c b/block.c index ff2795b..856bb1a 100644 --- a/block.c +++ b/block.c @@ -58,7 +58,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num, static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num, const uint8_t *buf, int nb_sectors); -static QTAILQ_HEAD(, BlockDriverState) bdrv_states = +QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); static QLIST_HEAD(, BlockDriver) bdrv_drivers = @@ -768,7 +768,7 @@ int bdrv_commit(BlockDriverState *bs) if (!drv) return -ENOMEDIUM; - + if (!bs->backing_hd) { return -ENOTSUP; } @@ -1538,7 +1538,7 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors) * 'nb_sectors' is the max value 'pnum' should be set to. */ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, - int *pnum) +int *pnum) { int64_t n; if (!bs->drv->bdrv_is_allocated) { @@ -2050,9 +2050,9 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, cb, opaque); if (ret) { - /* Update stats even though technically transfer has not happened. */ - bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE; - bs->rd_ops ++; +/* Update stats even though technically transfer has not happened. */ +bs->rd_bytes += (unsigned) nb_sectors * BDRV_SECTOR_SIZE; +bs->rd_ops ++; } return ret; diff --git a/block_int.h b/block_int.h index 12663e8..2343d07 100644 --- a/block_int.h +++ b/block_int.h @@ -28,8 +28,8 @@ #include "qemu-option.h" #include "qemu-queue.h" -#define BLOCK_FLAG_ENCRYPT 1 -#define BLOCK_FLAG_COMPAT6 4 +#define BLOCK_FLAG_ENCRYPT1 +#define BLOCK_FLAG_COMPAT64 #define BLOCK_OPT_SIZE "size" #define BLOCK_OPT_ENCRYPT "encryption" @@ -98,6 +98,7 @@ struc
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
Chunqiang Tang wrote: > > >> Moreover, using a host file system not only adds overhead, but > > >> also introduces data integrity issues. Specifically, if I/Os uses > O_DSYNC, > > >> it may be too slow. If I/Os use O_DIRECT, it cannot guarantee data > > >> integrity in the event of a host crash. See > > >> http://lwn.net/Articles/348739/ . > > > > > > You have the same issue with O_DIRECT when using a raw disk device > > > too. That is, O_DIRECT on a raw device does not guarantee integrity > > > in the event of a host crash either, for mostly the same reasons. > > > > QEMU has semantics that use O_DIRECT safely; there is no issue here. > > When a drive is added with cache=none, QEMU not only uses O_DIRECT but > > also advertises an enabled write cache to the guest. > > > > The guest *must* flush the cache when it wants to ensure data is > > stable. In the event of a host crash, all, some, or none of the I/O > > since the last flush may have made it to disk. Each of these > > possibilities is fair game since the guest may only depend on writes > > being on disk if they completed and a successful flush was issued > > afterwards. > > Thank both of you for the explanation, which is very helpful to me. With > FVD's capability of eliminating the host file system and storing the image > on a logical volume, then perhaps we can always use O_DSYNC, because there > is little (or no?) LVM metadata that needs a flush on every write and > hence O_DSYNC does not add overhead? I am not certain on this, and need > help for confirmation. If this is true, the guest does not need to flush > the cache. I think O_DSYNC does not work as you might expect on raw disk devices and logical volumes. That doesn't mean you don't need something for crash durability! Instead, you need to issue the disk cache flushes in whatever way works. It actually has a very *high* overhead. The overhead isn't from metadata - it is from needing to flush the disk cache after every write, which prevents the disk from reordering writes. If you don't issue the flushes, and the physical device has a volatile write cache, then you cannot guarantee integrity in the event of a host crash. This can make a filesystem faster than a raw disk or logical volume in some configurations, if the filesystem journals data writes to limit the seeking needed to commit durably. -- Jamie
Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
2011/1/19 Pierre Riteau : > b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return > value of bdrv_write and aborts migration when it fails. However, if the > size of the block device to migrate is not a multiple of BLOCK_SIZE > (currently 1 MB), the last bdrv_write will fail with -EIO. > > Fixed by calling bdrv_write with the correct size of the last block. > --- > block-migration.c | 16 +++- > 1 files changed, 15 insertions(+), 1 deletions(-) > > diff --git a/block-migration.c b/block-migration.c > index 1475325..eeb9c62 100644 > --- a/block-migration.c > +++ b/block-migration.c > @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int > version_id) > int64_t addr; > BlockDriverState *bs; > uint8_t *buf; > + int64_t total_sectors; > + int nr_sectors; > > do { > addr = qemu_get_be64(f); > @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int > version_id) > return -EINVAL; > } > > + total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; > + if (total_sectors <= 0) { > + fprintf(stderr, "Error getting length of block device %s\n", > device_name); > + return -EINVAL; > + } > + > + if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) { > + nr_sectors = total_sectors - addr; > + } else { > + nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; > + } > + > buf = qemu_malloc(BLOCK_SIZE); > > qemu_get_buffer(f, buf, BLOCK_SIZE); > - ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK); > + ret = bdrv_write(bs, addr, buf, nr_sectors); > > qemu_free(buf); > if (ret < 0) { > -- > 1.7.3.5 > > > Hi Pierre, I don't think the fix above is correct. If you have a file which isn't aliened with BLOCK_SIZE, you won't get an error with the patch. However, the receiver doesn't know how much sectors which the sender wants to be written, so the guest may fail after migration because some data may not be written. IIUC, although changing bytestream should be prevented as much as possible, we should save/load total_sectors to check appropriate file is allocated on the receiver side. BTW, you should use error_report instead of fprintf(stderr, ...). Thanks, Yoshi
Re: [Qemu-devel] [PATCH] savevm: fix corruption in vmstate_subsection_load().
2010/12/14 Yoshiaki Tamura : > Although it's rare to happen in live migration, when the head of a > byte stream contains 0x05 which is the marker of subsection, the > loader gets corrupted because vmstate_subsection_load() continues even > the device doesn't require it. This patch adds a checker whether > subsection is needed, and skips following routines if not needed. > > Signed-off-by: Yoshiaki Tamura > --- > savevm.c | 8 +++- > 1 files changed, 7 insertions(+), 1 deletions(-) > > diff --git a/savevm.c b/savevm.c > index d38f79e..72f6249 100644 > --- a/savevm.c > +++ b/savevm.c > @@ -1633,6 +1633,12 @@ static const VMStateDescription > *vmstate_get_subsection(const VMStateSubsection > static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription > *vmsd, > void *opaque) > { > + const VMStateSubsection *sub = vmsd->subsections; > + > + if (!sub || !sub->needed) { > + return 0; > + } > + > while (qemu_peek_byte(f) == QEMU_VM_SUBSECTION) { > char idstr[256]; > int ret; > @@ -1645,7 +1651,7 @@ static int vmstate_subsection_load(QEMUFile *f, const > VMStateDescription *vmsd, > idstr[len] = 0; > version_id = qemu_get_be32(f); > > - sub_vmsd = vmstate_get_subsection(vmsd->subsections, idstr); > + sub_vmsd = vmstate_get_subsection(sub, idstr); > if (sub_vmsd == NULL) { > return -ENOENT; > } > -- > 1.7.1.2 > > > Hi Juan, This is an error that always happen with Kemari. Could you tell me if I'm fixing incorrectly? Thanks, Yoshi
Re: [Qemu-devel] [RFC] Propose the Fast Virtual Disk (FVD) image format that outperforms QCOW2 by 249%
> when I tried to use your patch, I found several problems: > > * The patch does apply cleanly to latest QEMU. >This is caused by recent changes in QEMU git master. > > * The new code uses tabs instead of spaces (QEMU coding rules). > > * Some lines of the new code end with blank characters. > > * The patch adds empty lines at the end of some files. > > The last two points are reported by newer versions of git > (which refuse to take such patches with the default setting). > > Could you please update your patch to fix those topics? > I'd like to apply it to my QEMU code and try the new FVD. Thank you for the detailed instructions. I fixed all the issues above and posted the latest patches to the mailing list (see below). Please use this latest version, which also includes some other fixes. BTW, I found out that my previous patches were rejected by the mailing list because a single patch for FVD was too big. http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01948.html http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01947.html http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01950.html http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01949.html http://lists.gnu.org/archive/html/qemu-devel/2011-01/msg01951.html Regards, ChunQiang (CQ) Tang, Ph.D. Homepage: http://www.research.ibm.com/people/c/ctang
[Qemu-devel] [PATCH] linux-user: Add support for -version option
Add support to the linux-user qemu for the -version command line option, bringing it into line with the system emulation qemu. Signed-off-by: Peter Maydell --- linux-user/main.c | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff --git a/linux-user/main.c b/linux-user/main.c index 0d627d6..e651bfd 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2624,14 +2624,21 @@ void cpu_loop (CPUState *env) } #endif /* TARGET_ALPHA */ +static void version(void) +{ +printf("qemu-" TARGET_ARCH " version " QEMU_VERSION QEMU_PKGVERSION + ", Copyright (c) 2003-2008 Fabrice Bellard\n"); +} + static void usage(void) { -printf("qemu-" TARGET_ARCH " version " QEMU_VERSION QEMU_PKGVERSION ", Copyright (c) 2003-2008 Fabrice Bellard\n" - "usage: qemu-" TARGET_ARCH " [options] program [arguments...]\n" +version(); +printf("usage: qemu-" TARGET_ARCH " [options] program [arguments...]\n" "Linux CPU emulator (compiled for %s emulation)\n" "\n" "Standard options:\n" "-hprint this help\n" + "-version display version information and exit\n" "-g port wait gdb connection to port\n" "-L path set the elf interpreter prefix (default=%s)\n" "-s size set the stack size in bytes (default=%ld)\n" @@ -2886,8 +2893,10 @@ int main(int argc, char **argv, char **envp) singlestep = 1; } else if (!strcmp(r, "strace")) { do_strace = 1; -} else -{ +} else if (!strcmp(r, "version")) { +version(); +exit(0); +} else { usage(); } } -- 1.6.3.3
Re: [Qemu-devel] [PATCH 14/19] block: insert event-tap to bdrv_aio_writev() and bdrv_aio_flush().
2011/1/19 Kevin Wolf : > Am 19.01.2011 14:16, schrieb Yoshiaki Tamura: >> 2011/1/19 Kevin Wolf : >>> Am 19.01.2011 06:44, schrieb Yoshiaki Tamura: event-tap function is called only when it is on, and requests sent from device emulators. Signed-off-by: Yoshiaki Tamura --- block.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/block.c b/block.c index ff2795b..85bd8b8 100644 --- a/block.c +++ b/block.c @@ -28,6 +28,7 @@ #include "block_int.h" #include "module.h" #include "qemu-objects.h" +#include "event-tap.h" #ifdef CONFIG_BSD #include @@ -2111,6 +2112,11 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num, if (bdrv_check_request(bs, sector_num, nb_sectors)) return NULL; + if (bs->device_name && event_tap_is_on()) { + return event_tap_bdrv_aio_writev(bs, sector_num, qiov, nb_sectors, + cb, opaque); + } + if (bs->dirty_bitmap) { blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb, opaque); >>> >>> Just noticed the context here... Does this patch break block migration >>> when event-tap is on? >> >> I don't think so. event-tap will call bdrv_aio_writev() upon >> flushing requests and it shouldn't affect block-migration. The >> block written after the synchronization should be marked as dirty >> and should be sent in the next round. Am I missing the point? > > No, that makes sense. I don't have a complete understanding of the whole > series yet, so there may be well more misunderstandings on my side. It's OK. I'm glad that you're reviewing. >>> Another question that came to my mind is if we really hook everything we >>> need. I think we'll need to have a hook in bdrv_flush as well. I don't >>> know if you do hook qemu_aio_flush and friends - does a call cause >>> event-tap to flush its queue? If not, a call to qemu_aio_flush might >>> hang qemu because it's waiting for requests to complete which are >>> actually stuck in the event-tap queue. >> >> No it doesn't queue at event-tap. Marcelo pointed that we should >> hook bdrv_aio_flush to avoid requests inversion, that made sense >> to me. Do we need to hook bdrv_flush for that same reason? If > > bdrv_flush() is the synchronous version of bdrv_aio_flush(), so in > general it seems likely that we need to do the same. Hmm. Because it's synchronous, we need to start synchronization right away, and once done, flush requests queued in event-tap then return. >> we hook bdrv_flush and qemu_aio_flush, we're going loop forever >> because the synchronization code is calling vm_stop that call >> bdrv_flush_all and qemu_aio_flush. > > qemu_aio_flush doesn't invoke any bdrv_* functions, so I don't see why > we would loop forever. It just waits for AIO requests to complete. > > I just looked up the code and I think the situation is a bit different > than I thought originally: qemu_aio_flush waits only for completion of > requests which belong to a driver that has registered using > qemu_aio_set_fd_handler. So this means that AIO requests queued in > event-tap are not considered in-flight requests and we won't get stuck > in a loop. Maybe we have no problem in fact. :-) I had the same thoughts. We don't have to hook qemu_aio_flush. > On the other hand, e.g. migration relies on the fact that after a > qemu_aio_flush, all AIO requests that the device model has submitted are > completed. I think event-tap must take care that the requests which are > queued are not forgotten to be migrated. (Maybe the code already > considers this, I'm just writing down what comes to my mind...) That's where event-tap is calling qemu_aio_flush. It should be almost same as for live migration. Requests queued in event-tap are replayed on the secondary side, that is the core design of Kemari. Thanks, Yoshi > > Kevin > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [Qemu-devel] [PATCH 09/19] Introduce event-tap.
2011/1/19 Kevin Wolf : > Am 19.01.2011 14:04, schrieb Yoshiaki Tamura: +static void event_tap_blk_flush(EventTapBlkReq *blk_req) +{ + BlockDriverState *bs; + + bs = bdrv_find(blk_req->device_name); >>> >>> Please store the BlockDriverState in blk_req. This code loops over all >>> block devices and does a string comparison - and that for each request. >>> You can also save the qemu_strdup() when creating the request. >>> >>> In the few places where you really need the device name (might be the >>> case for load/save, I'm not sure), you can still get it from the >>> BlockDriverState. >> >> I would do so for the primary side. Although we haven't >> implemented yet, we want to replay block requests from block >> layer on the secondary side, and need device name to restore >> BlockDriverState. > > Hm, I see. I'm not happy about it, but I don't have a suggestion right > away how to avoid it. > >>> + + if (blk_req->is_flush) { + bdrv_aio_flush(bs, blk_req->reqs[0].cb, blk_req->reqs[0].opaque); >>> >>> You need to handle errors. If bdrv_aio_flush returns NULL, call the >>> callback with -EIO. >> >> I'll do so. >> >>> + return; + } + + bdrv_aio_writev(bs, blk_req->reqs[0].sector, blk_req->reqs[0].qiov, + blk_req->reqs[0].nb_sectors, blk_req->reqs[0].cb, + blk_req->reqs[0].opaque); >>> >>> Same here. >>> + bdrv_flush(bs); >>> >>> This looks really strange. What is this supposed to do? >>> >>> One point is that you write it immediately after bdrv_aio_write, so you >>> get an fsync for which you don't know if it includes the current write >>> request or if it doesn't. Which data do you want to get flushed to the disk? >> >> I was expecting to flush the aio request that was just initiated. >> Am I misunderstanding the function? > > Seems so. The function names don't use really clear terminology either, > so you're not the first one to fall in this trap. Basically we have: > > * qemu_aio_flush() waits for all AIO requests to complete. I think you > wanted to have exactly this, but only for a single block device. Such a > function doesn't exist yet. > > * bdrv_flush() makes sure that all successfully completed requests are > written to disk (by calling fsync) > > * bdrv_aio_flush() is the asynchronous version of bdrv_flush, i.e. run > the fsync in the thread pool Then what I wanted to do is, call qemu_aio_flush first, then bdrv_flush. It should be like live migration. > >>> The other thing is that you introduce a bdrv_flush for each request, >>> basically forcing everyone to something very similar to writethrough >>> mode. I'm sure this will have a big impact on performance. >> >> The reason is to avoid inversion of queued requests. Although >> processing one-by-one is heavy, wouldn't having requests flushed >> to disk out of order break the disk image? > > No, that's fine. If a guest issues two requests at the same time, they > may complete in any order. You just need to make sure that you don't > call the completion callback before the request really has completed. We need to flush requests, meaning aio and fsync, before sending the final state of the guests, to make sure we can switch to the secondary safely. > I'm just starting to wonder if the guest won't timeout the requests if > they are queued for too long. Even more, with IDE, it can only handle > one request at a time, so not completing requests doesn't sound like a > good idea at all. In what intervals is the event-tap queue flushed? The requests are flushed once each transaction completes. So it's not with specific intervals. > On the other hand, if you complete before actually writing out, you > don't get timeouts, but you signal success to the guest when the request > could still fail. What would you do in this case? With a writeback cache > mode we're fine, we can just fail the next flush (until then nothing is > guaranteed to be on disk and order doesn't matter either), but with > cache=writethrough we're in serious trouble. > > Have you thought about this problem? Maybe we end up having to flush the > event-tap queue for each single write in writethrough mode. Yes, and that's what I'm trying to do at this point. I know that performance matters a lot, but sacrificing reliability over performance now isn't a good idea. I first want to lay the ground, and then focus on optimization. Note that without dirty bitmap optimization, Kemari suffers a lot in sending rams. Anthony and I discussed to take this approach at KVM Forum. >>> Additionally, error handling is missing. >> >> I looked at the codes using bdrv_flush and realized some of them >> doesn't handle errors, but scsi-disk.c does. Should everyone >> handle errors or depends on the usage? > > I added the return code only recently, it was a void function > previously. Probably some error handling should be added to all of them. Ah:) Glad to hear
Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
On 20 janv. 2011, at 03:06, Yoshiaki Tamura wrote: > 2011/1/19 Pierre Riteau : >> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return >> value of bdrv_write and aborts migration when it fails. However, if the >> size of the block device to migrate is not a multiple of BLOCK_SIZE >> (currently 1 MB), the last bdrv_write will fail with -EIO. >> >> Fixed by calling bdrv_write with the correct size of the last block. >> --- >> block-migration.c | 16 +++- >> 1 files changed, 15 insertions(+), 1 deletions(-) >> >> diff --git a/block-migration.c b/block-migration.c >> index 1475325..eeb9c62 100644 >> --- a/block-migration.c >> +++ b/block-migration.c >> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int >> version_id) >> int64_t addr; >> BlockDriverState *bs; >> uint8_t *buf; >> +int64_t total_sectors; >> +int nr_sectors; >> >> do { >> addr = qemu_get_be64(f); >> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int >> version_id) >> return -EINVAL; >> } >> >> +total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; >> +if (total_sectors <= 0) { >> +fprintf(stderr, "Error getting length of block device >> %s\n", device_name); >> +return -EINVAL; >> +} >> + >> +if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) { >> +nr_sectors = total_sectors - addr; >> +} else { >> +nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; >> +} >> + >> buf = qemu_malloc(BLOCK_SIZE); >> >> qemu_get_buffer(f, buf, BLOCK_SIZE); >> -ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK); >> +ret = bdrv_write(bs, addr, buf, nr_sectors); >> >> qemu_free(buf); >> if (ret < 0) { >> -- >> 1.7.3.5 >> >> >> > > Hi Pierre, > > I don't think the fix above is correct. If you have a file which > isn't aliened with BLOCK_SIZE, you won't get an error with the > patch. However, the receiver doesn't know how much sectors which > the sender wants to be written, so the guest may fail after > migration because some data may not be written. IIUC, although > changing bytestream should be prevented as much as possible, we > should save/load total_sectors to check appropriate file is > allocated on the receiver side. Isn't the guest supposed to be started using a file with the correct size? But I guess changing the protocol would be best as it would avoid headaches to people who mistakenly created a file that is too small. > BTW, you should use error_report instead of fprintf(stderr, ...). I didn't know that, I followed what was used in this file. Thank you. -- Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France http://perso.univ-rennes1.fr/pierre.riteau/
[Qemu-devel] [PATCH] pci/pcie: make pci_find_device() ARI aware.
make pci_find_device() ARI aware. Signed-off-by: Isaku Yamahata --- hw/pci.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 8d0e3df..851f350 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1596,11 +1596,18 @@ PCIBus *pci_find_bus(PCIBus *bus, int bus_num) PCIDevice *pci_find_device(PCIBus *bus, int bus_num, int slot, int function) { +PCIDevice *d; bus = pci_find_bus(bus, bus_num); if (!bus) return NULL; +d = bus->parent_dev; +if (d && pci_is_express(d) && +pcie_cap_get_type(d) == PCI_EXP_TYPE_DOWNSTREAM && +!pcie_cap_is_ari_enabled(d) && slot > 0) { +return NULL; +} return bus->devices[PCI_DEVFN(slot, function)]; } -- 1.7.1.1
[Qemu-devel] [PATCH] pci: use qemu_malloc() in pcibus_get_dev_path()
use qemu_malloc() instead of direct use of malloc(). Signed-off-by: Isaku Yamahata --- hw/pci.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 851f350..86af0ee 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -2053,7 +2053,7 @@ static char *pcibus_get_dev_path(DeviceState *dev) path_len = domain_len + slot_len * slot_depth; /* Allocate memory, fill in the terminating null byte. */ -path = malloc(path_len + 1 /* For '\0' */); +path = qemu_malloc(path_len + 1 /* For '\0' */); path[path_len] = '\0'; /* First field is the domain. */ -- 1.7.1.1
[Qemu-devel] [PATCH] pci: make pci_bus_new() aware of pci domain
This patch makes pci bus creation aware of pci domain. Signed-off-by: Isaku Yamahata --- hw/pci.c | 19 ++- hw/pci.h |7 --- hw/piix_pci.c |2 +- 3 files changed, 19 insertions(+), 9 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 86af0ee..e1e7b25 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -246,8 +246,11 @@ int pci_find_domain(const PCIBus *bus) return -1; } +/* create root pci bus. + * If secondary pci bus is wanted, use pci_bridge_initfn() + */ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent, - const char *name, int devfn_min) + const char *name, int domain, int devfn_min) { qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name); assert(PCI_FUNC(devfn_min) == 0); @@ -255,18 +258,19 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent, /* host bridge */ QLIST_INIT(&bus->child); -pci_host_bus_register(0, bus); /* for now only pci domain 0 is supported */ +pci_host_bus_register(domain, bus); vmstate_register(NULL, -1, &vmstate_pcibus, bus); } -PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min) +PCIBus *pci_bus_new(DeviceState *parent, const char *name, +int domain, int devfn_min) { PCIBus *bus; bus = qemu_mallocz(sizeof(*bus)); bus->qbus.qdev_allocated = 1; -pci_bus_new_inplace(bus, parent, name, devfn_min); +pci_bus_new_inplace(bus, parent, name, domain, devfn_min); return bus; } @@ -292,13 +296,18 @@ void pci_bus_set_mem_base(PCIBus *bus, target_phys_addr_t base) bus->mem_base = base; } +/* deprecated: kept for compatility of existing codes. + * pci_bus_new() and pci_bus_irqs() should be used for root pci bus + * like i440fx_init(). + * pci_bridge_initfn() should be used for secondary pci bus + */ PCIBus *pci_register_bus(DeviceState *parent, const char *name, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, void *irq_opaque, int devfn_min, int nirq) { PCIBus *bus; -bus = pci_bus_new(parent, name, devfn_min); +bus = pci_bus_new(parent, name, 0 /* domain = 0 for compat */, devfn_min); pci_bus_irqs(bus, set_irq, map_irq, irq_opaque, nirq); return bus; } diff --git a/hw/pci.h b/hw/pci.h index bc8d5bb..a0fd953 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -229,14 +229,15 @@ typedef enum { typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, PCIHotplugState state); void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent, - const char *name, int devfn_min); -PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min); + const char *name, int domain, int devfn_min); +PCIBus *pci_bus_new(DeviceState *parent, const char *name, +int domain, int devfn_min); void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, void *irq_opaque, int nirq); void pci_bus_hotplug(PCIBus *bus, pci_hotplug_fn hotplug, DeviceState *dev); PCIBus *pci_register_bus(DeviceState *parent, const char *name, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, - void *irq_opaque, int devfn_min, int nirq); + void *irq_opaque, int devfn_min, int nirq); /* deprecated */ void pci_device_reset(PCIDevice *dev); void pci_bus_reset(PCIBus *bus); diff --git a/hw/piix_pci.c b/hw/piix_pci.c index 358da58..718983d 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -226,7 +226,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, qemu_irq * dev = qdev_create(NULL, "i440FX-pcihost"); s = FROM_SYSBUS(I440FXState, sysbus_from_qdev(dev)); -b = pci_bus_new(&s->busdev.qdev, NULL, 0); +b = pci_bus_new(&s->busdev.qdev, NULL, 0, 0); s->bus = b; qdev_init_nofail(dev); -- 1.7.1.1
[Qemu-devel] [PATCH 1/3] pci: deassert intx on reset.
deassert intx on device reset. So far pci_device_reset() is used for system reset. In that case, interrupt controller is reset at the same time so that all irq is are deasserted. But now pci bus reset/flr is supported, and in that case irq needs to be disabled explicitly. Signed-off-by: Isaku Yamahata --- hw/pci.c |9 + hw/pci.h |2 ++ 2 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index e1e7b25..de6370f 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -137,6 +137,14 @@ static void pci_update_irq_status(PCIDevice *dev) } } +void pci_device_deassert_intx(PCIDevice *dev) +{ +int i; +for (i = 0; i < PCI_NUM_PINS; ++i) { +qemu_set_irq(dev->irq[i], 0); +} +} + /* * This function is called on #RST and FLR. * FLR if PCI_EXP_DEVCTL_BCR_FLR is set @@ -152,6 +160,7 @@ void pci_device_reset(PCIDevice *dev) dev->irq_state = 0; pci_update_irq_status(dev); +pci_device_deassert_intx(dev); /* Clear all writeable bits */ pci_word_test_and_clear_mask(dev->config + PCI_COMMAND, pci_get_word(dev->wmask + PCI_COMMAND) | diff --git a/hw/pci.h b/hw/pci.h index a0fd953..01c3285 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -265,6 +265,8 @@ void do_pci_info_print(Monitor *mon, const QObject *data); void do_pci_info(Monitor *mon, QObject **ret_data); void pci_bridge_update_mappings(PCIBus *b); +void pci_device_deassert_intx(PCIDevice *dev); + static inline void pci_set_byte(uint8_t *config, uint8_t val) { -- 1.7.1.1
[Qemu-devel] [PATCH 2/3] msi: simply write config a bit.
use pci_device_deassert_intx(). Signed-off-by: Isaku Yamahata --- hw/msi.c |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/hw/msi.c b/hw/msi.c index f03f519..3dc3a24 100644 --- a/hw/msi.c +++ b/hw/msi.c @@ -255,7 +255,6 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) uint8_t log_max_vecs; unsigned int vector; uint32_t pending; -int i; if (!ranges_overlap(addr, len, dev->msi_cap, msi_cap_sizeof(flags))) { return; @@ -296,9 +295,7 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) * from using its INTx# pin (if implemented) to request * service (MSI, MSI-X, and INTx# are mutually exclusive). */ -for (i = 0; i < PCI_NUM_PINS; ++i) { -qemu_set_irq(dev->irq[i], 0); -} +pci_device_deassert_intx(dev); /* * nr_vectors might be set bigger than capable. So clamp it. -- 1.7.1.1
[Qemu-devel] [PATCH 0/3] pci: disable intx on flr/bus reset
So far pci_device_reset() is used for system reset. In that case, interrupt controller is also reset so that all irq is are deasserted. But now pci bus reset/flr is supported, and in that case irq needs to be disabled explicitly. Isaku Yamahata (3): pci: deassert intx on reset. msi: simply write config a bit. msix: simply write config hw/msi.c |5 + hw/msix.c |5 + hw/pci.c |9 + hw/pci.h |2 ++ 4 files changed, 13 insertions(+), 8 deletions(-)