[Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property
After commit 338d3f, we support 'status' property for filter object. The segfault can be triggered by starting qemu with 'status=off' property for filter, when the s->incoming_queue is NULL, we reference it directly in qemu_net_queue_flush(). Let's check the value of 's->incoming_queue' before calling qemu_net_queue_flush(). Signed-off-by: zhanghailiang --- net/filter-buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/filter-buffer.c b/net/filter-buffer.c index cc6bd94..79e2ce3 100644 --- a/net/filter-buffer.c +++ b/net/filter-buffer.c @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf) { FilterBufferState *s = FILTER_BUFFER(nf); -if (!qemu_net_queue_flush(s->incoming_queue)) { +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) { /* Unable to empty the queue, purge remaining packets */ qemu_net_queue_purge(s->incoming_queue, nf->netdev); } -- 1.8.3.1
Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property
On 04/01/2016 03:08 PM, zhanghailiang wrote: > After commit 338d3f, we support 'status' property for filter object. > The segfault can be triggered by starting qemu with 'status=off' property > for filter, when the s->incoming_queue is NULL, we reference it directly > in qemu_net_queue_flush(). > > Let's check the value of 's->incoming_queue' before calling > qemu_net_queue_flush(). > > Signed-off-by: zhanghailiang > --- > net/filter-buffer.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/filter-buffer.c b/net/filter-buffer.c > index cc6bd94..79e2ce3 100644 > --- a/net/filter-buffer.c > +++ b/net/filter-buffer.c > @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf) > { > FilterBufferState *s = FILTER_BUFFER(nf); > > -if (!qemu_net_queue_flush(s->incoming_queue)) { > +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) { > /* Unable to empty the queue, purge remaining packets */ > qemu_net_queue_purge(s->incoming_queue, nf->netdev); > } We'd better handle this at generic layer and don't let a specific net filter need to worry about this. Looks like the issue is we may trigger status_changed() too early (even before the the filter was initialized). How about not call status_changed() if the initialization is not done?
Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.5.1 Stable released
Am 30.03.2016 um 02:11 schrieb Michael Roth: > Hi everyone, > > I am pleased to announce that the QEMU v2.5.1 stable release is now > available at: > > http://wiki.qemu.org/download/qemu-2.5.1.tar.bz2 > > v2.5.1 is now tagged in the official qemu.git repository, > and the stable-2.5 branch has been updated accordingly: > > http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-2.5 > > In addition to the normal array of general bug fixes, this release > includes security fixes/hardening for USB, vmxnet3/e1000/ne2000 NICs, > NIC checksumming, and management consoles via HMP. Users of earlier > releases should upgrade accordingly. > > Note: For -M pseries-2.3 PPC guests, migration is now restored between > QEMU 2.3.x and 2.5.1, but migration between 2.5.1 and any versions other > than 2.3.x now require the -machine enforce-config-section=on option. > > Thank you to everyone involved! > > CHANGELOG: > > a58047f: Update version for 2.5.1 release (Michael Roth) > 5f409b1: hyperv: cpu hotplug fix with HyperV enabled (Denis V. Lunev) > 078de11: vmdk: Fix converting to streamOptimized (Fam Zheng) > acea76c: vmdk: Create streamOptimized as version 3 (Fam Zheng) > 80b6e57: usb: check USB configuration descriptor object (Prasad J Pandit) > 9bddb45: usb: check RNDIS message length (Prasad J Pandit) > e3a2cdf: usb: check RNDIS buffer offsets & length (Prasad J Pandit) > 4dcd2f1: usb: check page select value while processing iTD (Prasad J Pandit) > 38e0921: net: ne2000: fix bounds check in ioport operations (Prasad J Pandit) > d0ee85b: net: check packet payload length (Prasad J Pandit) > 4f046a6: ide: ahci: reset ncq object to unused on error (Prasad J Pandit) > b47809c: i386: avoid null pointer dereference (P J P) > 24fe899: hmp: fix sendkey out of bounds write (CVE-2015-8619) (Wolfgang > Bumiller) > aaf4fb6: ahci: Do not unmap NULL addresses (John Snow) > a2ae168: migration: allow machine to enforce configuration section migration > (Greg Kurz) > bad094d: vl.c: Fix regression in machine error message (Marcel Apfelbaum) > 4b0b1ec: quorum: Fix crash in quorum_aio_cb() (Alberto Garcia) > cab1cc7: target-arm: Make reserved ranges in ID_AA64* spaces RAZ, not UNDEF > (Peter Maydell) > 9ae0217: vhost-user: don't merge regions with different fds (Michael S. > Tsirkin) > 3092979: fw_cfg: unbreak migration compatibility for 2.4 and earlier machines > (Laszlo Ersek) > c5c9841: hw/virtio: group virtio flags into an enum (Marcel Apfelbaum) > 6b62303: hw/virtio: fix double use of a virtio flag (Marcel Apfelbaum) > c06f342: spapr: skip configuration section during migration of older machines > (Greg Kurz) > cb873ea: e1000: eliminate infinite loops on out-of-bounds transfer start > (Laszlo Ersek) > 4853a5a: block: qemu-iotests - add test for snapshot, commit, snapshot bug > (Jeff Cody) > a375e0b: block: set device_list.tqe_prev to NULL on BDS removal (Jeff Cody) > a38a283: qmp: Fix reference-counting of qnull on empty output visit (Eric > Blake) > 225d50f: cpus: use broadcast on qemu_pause_cond (Dr. David Alan Gilbert) > 020282d: fw_cfg: avoid calculating invalid current entry pointer (Gabriel L. > Somlo) > 091af18: s390x/css: fix control flags during csch (Halil Pasic) > d983923: s390x/ioinst: set type and len for SEI response (Pierre Morel) > 643c8d8: block/raw-posix: avoid bogus fixup for cylinders on DASD disks > (Christian Borntraeger) > 3ede27d: ehci: update irq on reset (Gerd Hoffmann) > 9849b19: net: set endianness on all backend devices (Laurent Vivier) > fe90bdc: net: ne2000: check ring buffer control registers (Prasad J Pandit) > aaa5271: net/filter: fix nf->netdev_id leak (Li Zhijian) > abda95c: net/dump: fix nfds->filename leak (Li Zhijian) > 6a49a71: blockdev: Fix 'change' for slot devices (Max Reitz) > e1a8a09: block: Add blk_dev_has_tray() (Max Reitz) > 7a2c1c8: net: rocker: fix an incorrect array bounds check (Prasad J Pandit) > 702a8d1: ivshmem: remove redundant assignment, fix crash with msi=off > (Marc-André Lureau) > 3e96d5d: ivshmem: no need for opaque argument (Marc-André Lureau) > 16a2875: scsi: initialise info object with appropriate size (P J P) > 4588b0d: virtio-9p: use accessor to get thread_pool (Greg Kurz) > ff083d3: xenfb: avoid reading twice the same fields from the shared page > (Stefano Stabellini) > 4d59e78: xen/blkif: Avoid double access to src->nr_segments (Stefano > Stabellini) > 52a7b27: configure: Fix shell syntax to placate OpenBSD's pdksh (Peter > Maydell) > d4aed70: target-ppc: kvm: fix floating point registers sync on little-endian > hosts (Greg Kurz) > 42ae4a3: net: vmxnet3: avoid memory leakage in activate_device (P J P) > 0d33580: ehci: make idt processing more robust (Gerd Hoffmann) > > Unfortunately, this release lacks the following patch: target-i386: do not read/write MSR_TSC_AUX from KVM if CPUID bit is not set without it any vServer with a Westmere or older vCPU will freeze with 100% CPU on vmload / migration. Peter
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On 31/03/2016 22:17, Alex Bligh wrote: >> > In qemu, read+FUA just triggers blk_co_flush() prior to reading; but >> > that's the same function it calls for write+FUA. > That's harmless, but unnecessary in the sense that current documented > behaviour doesn't require it. Perhaps it should? > > I suppose TRIM etc. should support FUA too? TRIM is an advisory operation, so it doesn't make sense to force access to the medium. The closest you could get would be to add FUA to WRITE_ZEROES. But since WRITE_ZEROES is not a particularly common operation there isn't much to gain compared to FLUSHing after the write has completed; in fact SCSI doesn't have a FUA bit on its WRITE SAME command. Paolo
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On 31/03/2016 22:34, Eric Blake wrote: > give me possibly stale data because I accessed > the underlying storage rather than paying attention to in-flight writes > that would change what I read". Overlapping I/O is always unspecified, so you should expect stale data if a read overlaps an in-flight write. Not only there's no guarantee of atomicity; the disk is a "safe register" rather than a "regular register" (http://stackoverflow.com/questions/8871633/whats-the-difference-between-safe-regular-and-atomic-registers) The way QEMU does "flush first" for FUA on reads provides the guarantee that "no one else can read something older than what I read", but that assumes that no one else is doing an overlapping write. Paolo
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Thu, Mar 31, 2016 at 06:44:32PM -0400, Kevin O'Connor wrote: > On Thu, Mar 31, 2016 at 11:17:30PM +0100, Richard W.M. Jones wrote: > > I'd dearly love to get rid of the sgabios option ROM. It looks like > > SeaBIOS nearly supports a full serial console now? > > Last I checked, one could disable the option rom by adding "-device > VGA,romfile=" to the qemu command line. Sure, I can easily remove sgabios. The problem is I still want my serial console to show BIOS messages! Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Re: [Qemu-devel] [PATCH] slirp: Fix migration from older versions of QEMU to the current one
On 01.04.2016 00:22, Samuel Thibault wrote: > Samuel Thibault, on Fri 01 Apr 2016 00:00:43 +0200, wrote: >> Just realizing... We'd need to add AF_INET6 cases here too, to be able >> to save/restore a VM using ipv6 connections. > > That seems quite involved however, maybe we should postpone that to > later? I think it's ok to do it later. The code there is only invoked when the user specified the "guestfwd=..." option, so that's likely rather a rare case, I think, which could be postponed to QEMU 2.7. Thomas
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Thu, Mar 31, 2016 at 06:44:32PM -0400, Kevin O'Connor wrote: > I think you'll find that if you compile out some features from > SeaBIOS, it will be of a similar speed to that "minimal BIOS". Try > this: > > cd /path/to/seabios/ > echo -e > 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0' > > .config > make olddefconfig > make > > What time do you get with the above stripped down seabios (the > generated bios is in out/bios.bin)? It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs during virtio-scsi sensing. See attached. Any idea what I need to enable? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top ./run test-tool/libguestfs-test-tool *IMPORTANT NOTICE * * When reporting bugs, include the COMPLETE, UNEDITED * output below in your bug report. * LIBGUESTFS_PATH=/home/rjones/d/libguestfs/appliance LIBGUESTFS_CACHEDIR=/home/rjones/d/libguestfs/tmp LD_LIBRARY_PATH=/home/rjones/d/libguestfs/ruby/ext/guestfs:/home/rjones/d/libguestfs/src/.libs:/home/rjones/d/libguestfs/java/.libs:/home/rjones/d/libguestfs/gobject/.libs LIBGUESTFS_TMPDIR=/home/rjones/d/libguestfs/tmp LIBGUESTFS_HV=/home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64 LIBGUESTFS_BACKEND=direct PATH=/home/rjones/d/libguestfs/v2v:/home/rjones/d/libguestfs/tools:/home/rjones/d/libguestfs/test-tool:/home/rjones/d/libguestfs/sysprep:/home/rjones/d/libguestfs/sparsify:/home/rjones/d/libguestfs/resize:/home/rjones/d/libguestfs/rescue:/home/rjones/d/libguestfs/p2v:/home/rjones/d/libguestfs/make-fs:/home/rjones/d/libguestfs/inspector:/home/rjones/d/libguestfs/get-kernel:/home/rjones/d/libguestfs/fuse:/home/rjones/d/libguestfs/format:/home/rjones/d/libguestfs/fish:/home/rjones/d/libguestfs/erlang:/home/rjones/d/libguestfs/edit:/home/rjones/d/libguestfs/diff:/home/rjones/d/libguestfs/dib:/home/rjones/d/libguestfs/df:/home/rjones/d/libguestfs/customize:/home/rjones/d/libguestfs/cat:/home/rjones/d/libguestfs/builder:/home/rjones/d/libguestfs/align:/usr/lib64/ccache:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/rjones/.local/bin:/home/rjones/bin XDG_RUNTIME_DIR=/run/user/1000 SELinux: Enforcing guestfs_get_append: (null) guestfs_get_autosync: 1 guestfs_get_backend: direct guestfs_get_backend_settings: [] guestfs_get_cachedir: /home/rjones/d/libguestfs/tmp guestfs_get_direct: 0 guestfs_get_hv: /home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64 guestfs_get_memsize: 500 guestfs_get_network: 0 guestfs_get_path: /home/rjones/d/libguestfs/appliance guestfs_get_pgroup: 0 guestfs_get_program: libguestfs-test-tool guestfs_get_recovery_proc: 1 guestfs_get_selinux: 0 guestfs_get_smp: 1 guestfs_get_sockdir: /run/user/1000 guestfs_get_tmpdir: /home/rjones/d/libguestfs/tmp guestfs_get_trace: 0 guestfs_get_verbose: 1 host_cpu: x86_64 Launching appliance, timeout set to 600 seconds. libguestfs: launch: program=libguestfs-test-tool libguestfs: launch: version=1.33.16 libguestfs: launch: backend registered: unix libguestfs: launch: backend registered: uml libguestfs: launch: backend registered: libvirt libguestfs: launch: backend registered: direct libguestfs: launch: backend=direct libguestfs: launch: tmpdir=/home/rjones/d/libguestfs/tmp/libguestfsT8B1IH libguestfs: launch: umask=0002 libguestfs: launch: euid=1000 libguestfs: begin building supermin appliance libguestfs: run supermin libguestfs: command: run: /usr/bin/supermin libguestfs: command: run: \ --build libguestfs: command: run: \ --verbose libguestfs: command: run: \ --if-newer libguestfs: command: run: \ --lock /home/rjones/d/libguestfs/tmp/.guestfs-1000/lock libguestfs: command: run: \ --copy-kernel libguestfs: command: run: \ -f ext2 libguestfs: command: run: \ --host-cpu x86_64 libguestfs: command: run: \ /home/rjones/d/libguestfs/appliance/supermin.d libguestfs: command: run: \ -o /home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d supermin: version: 5.1.15 supermin: rpm: detected RPM version 4.13 supermin: package handler: fedora/rpm supermin: acquiring lock on /home/rjones/d/libguestfs/tmp/.guestfs-1000/lock supermin: if-newer: output does not need rebuilding libguestfs: finished building supermin appliance libguestfs: begin testing qemu features libguestfs: command: run: /home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64 libguestfs: command: run: \ -display none libguestfs: command: run: \ -help libguestfs: qemu version 2.5 libguestfs: command: run: /home
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 09:55, Richard W.M. Jones wrote: >>> > > I'd dearly love to get rid of the sgabios option ROM. It looks like >>> > > SeaBIOS nearly supports a full serial console now? >> > >> > Last I checked, one could disable the option rom by adding "-device >> > VGA,romfile=" to the qemu command line. > Sure, I can easily remove sgabios. The problem is I still want my > serial console to show BIOS messages! What do you need it for? Paolo
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 10:02, Richard W.M. Jones wrote: >> > echo -e >> > 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0' >> > > .config >> > make olddefconfig >> > make >> > >> > What time do you get with the above stripped down seabios (the >> > generated bios is in out/bios.bin)? > It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs > during virtio-scsi sensing. See attached. Any idea what I need to > enable? At least ACPI (I would also add mptable and SMBIOS). Paolo
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 10:11:46AM +0200, Paolo Bonzini wrote: > > > On 01/04/2016 10:02, Richard W.M. Jones wrote: > >> > echo -e > >> > 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0' > >> > > .config > >> > make olddefconfig > >> > make > >> > > >> > What time do you get with the above stripped down seabios (the > >> > generated bios is in out/bios.bin)? > > It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs > > during virtio-scsi sensing. See attached. Any idea what I need to > > enable? > > At least ACPI (I would also add mptable and SMBIOS). Found it: only CONFIG_MPTABLE=y was necessary. It boots with: # CONFIG_PIRTABLE is not set CONFIG_MPTABLE=y # CONFIG_SMBIOS is not set # CONFIG_ACPI is not set Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
[Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!
Microsoft loves Linux, and Red Hat loves .NET. Since we can put whatever we want in the Hyper-V vendor signature, let's show some love too! Cc: Andreas Färber Cc: Alex Williamson Cc: Denis V. Lunev Cc: Eduardo Habkost Cc: Roman Kagan Cc: KY Srinivasan Signed-off-by: Paolo Bonzini --- target-i386/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 799fdfa..1968f04 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs) c = &cpuid_data.entries[cpuid_i++]; c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS; if (!cpu->hyperv_vendor_id) { -memcpy(signature, "Microsoft Hv", 12); +memcpy(signature, "KVM<3HyperV!", 12); } else { size_t len = strlen(cpu->hyperv_vendor_id); -- 2.5.5
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
Minimal SeaBIOS (+ CONFIG_MPTABLE): http://oirase.annexia.org/tmp/min-seabios.txt Stock SeaBIOS from qemu: http://oirase.annexia.org/tmp/stock-seabios.txt Both files best viewed with `less -r'. It does appear to considerably reduce SeaBIOS time. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 10:14, Richard W.M. Jones wrote: >> > At least ACPI (I would also add mptable and SMBIOS). > Found it: only CONFIG_MPTABLE=y was necessary. It boots with: > > # CONFIG_PIRTABLE is not set > CONFIG_MPTABLE=y > # CONFIG_SMBIOS is not set > # CONFIG_ACPI is not set If you add all three it should not give any slowdown and will provide full hardware features to the kernel. qboot does ACPI and PCI bus assignment (it doesn't do SMBIOS because I got bored debugging it. :)) Paolo
[Qemu-devel] [PATCH] ui/virtio-gpu: add and use qemu_create_displaysurface_pixman
Add a the new qemu_create_displaysurface_pixman function, to create a DisplaySurface backed by an existing pixman image. In that case there is no need to create a new pixman image pointing to the same backing storage. We can just use the existing image directly. This does not only simplify things a bit, but most importantly it gets the reference counting right, so the backing storage for the pixman image wouldn't be released underneath us. Use new function in virtio-gpu, where using it actually fixes use-after-free crashes. Cc: qemu-sta...@nongnu.org Signed-off-by: Gerd Hoffmann --- hw/display/virtio-gpu.c | 5 + include/ui/console.h| 1 + trace-events| 1 + ui/console.c| 11 +++ 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c index ddf3bfb..095d7ff 100644 --- a/hw/display/virtio-gpu.c +++ b/hw/display/virtio-gpu.c @@ -572,10 +572,7 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g, scanout->width != ss.r.width || scanout->height != ss.r.height) { /* realloc the surface ptr */ -scanout->ds = qemu_create_displaysurface_from -(ss.r.width, ss.r.height, format, - pixman_image_get_stride(res->image), - (uint8_t *)pixman_image_get_data(res->image) + offset); +scanout->ds = qemu_create_displaysurface_pixman(res->image); if (!scanout->ds) { cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC; return; diff --git a/include/ui/console.h b/include/ui/console.h index eb9419d..d5a88d9 100644 --- a/include/ui/console.h +++ b/include/ui/console.h @@ -234,6 +234,7 @@ DisplayState *init_displaystate(void); DisplaySurface *qemu_create_displaysurface_from(int width, int height, pixman_format_code_t format, int linesize, uint8_t *data); +DisplaySurface *qemu_create_displaysurface_pixman(pixman_image_t *image); DisplaySurface *qemu_create_displaysurface_guestmem(int width, int height, pixman_format_code_t format, int linesize, diff --git a/trace-events b/trace-events index 996a77f..8350743 100644 --- a/trace-events +++ b/trace-events @@ -1161,6 +1161,7 @@ console_select(int nr) "%d" console_refresh(int interval) "interval %d ms" displaysurface_create(void *display_surface, int w, int h) "surface=%p, %dx%d" displaysurface_create_from(void *display_surface, int w, int h, uint32_t format) "surface=%p, %dx%d, format 0x%x" +displaysurface_create_pixman(void *display_surface) "surface=%p" displaysurface_free(void *display_surface) "surface=%p" displaychangelistener_register(void *dcl, const char *name) "%p [ %s ]" displaychangelistener_unregister(void *dcl, const char *name) "%p [ %s ]" diff --git a/ui/console.c b/ui/console.c index 1fd4ea4..bf38579 100644 --- a/ui/console.c +++ b/ui/console.c @@ -1292,6 +1292,17 @@ DisplaySurface *qemu_create_displaysurface_from(int width, int height, return surface; } +DisplaySurface *qemu_create_displaysurface_pixman(pixman_image_t *image) +{ +DisplaySurface *surface = g_new0(DisplaySurface, 1); + +trace_displaysurface_create_pixman(surface); +surface->format = pixman_image_get_format(image); +surface->image = pixman_image_ref(image); + +return surface; +} + static void qemu_unmap_displaysurface_guestmem(pixman_image_t *image, void *unused) { -- 1.8.3.1
Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property
On 2016/4/1 15:39, Jason Wang wrote: On 04/01/2016 03:08 PM, zhanghailiang wrote: After commit 338d3f, we support 'status' property for filter object. The segfault can be triggered by starting qemu with 'status=off' property for filter, when the s->incoming_queue is NULL, we reference it directly in qemu_net_queue_flush(). Let's check the value of 's->incoming_queue' before calling qemu_net_queue_flush(). Signed-off-by: zhanghailiang --- net/filter-buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/filter-buffer.c b/net/filter-buffer.c index cc6bd94..79e2ce3 100644 --- a/net/filter-buffer.c +++ b/net/filter-buffer.c @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf) { FilterBufferState *s = FILTER_BUFFER(nf); -if (!qemu_net_queue_flush(s->incoming_queue)) { +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) { /* Unable to empty the queue, purge remaining packets */ qemu_net_queue_purge(s->incoming_queue, nf->netdev); } We'd better handle this at generic layer and don't let a specific net filter need to worry about this. Looks like the issue is we may trigger status_changed() too early (even before the the filter was initialized). Yes ~ How about not call status_changed() if the initialization is not done? But seems that it is difficult to confirm if the filter is initialized or not ... .
[Qemu-devel] [Bug 1006655] Re: Can't convert to vmdk with the streamOptimized subformat
I have tried, many times, to use qemu-img to create StreamOptimized for VMWare ESXi 6.0 OVAs. It does NOT work. After days of research, I replaced qemu-img, by vboxmanage, then, it worked! Now, I'm using something this: -- vboxmanage convertfromraw packer/output-vm.raw packer/output-vm-disk1.vmdk --format vmdk --variant Stream -- I also had to edit the headers, with dd: --- dd if=output-vm-disk1.vmdk of=output-vm-disk1.descriptor bs=1 skip=512 count=1024 sed -i -e 's/ide/lsilogic/g' output-vm-disk1.descriptor dd conv=notrunc,nocreat if=output-vm-disk1.descriptor of=output-disk1.vmdk bs=1 seek=512 count=1024 --- This is the only way to build StreamOptimized VMDKs, for inclusion inside of OVA, that works on VMWare 6.0. The qemu-img can not be used. Here is my full solution to this problem (if the same problem): https://github.com/tmartinx/svauto/blob/dev/image-factory.sh#L510 I'm just not entirely sure if we're talking about the same problem here... Cheers! Thiago -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1006655 Title: Can't convert to vmdk with the streamOptimized subformat Status in QEMU: Fix Committed Status in qemu package in Ubuntu: Fix Released Status in qemu-kvm package in Ubuntu: Invalid Bug description: Hi all, I'm trying to convert a qcow2 image file to the vmdk (on Ubuntu Server 12.04): # qemu-img convert -f qcow2 -O vmdk -o Stream spv-2912.qcow2 spv-2912-3.vmdk -o subformat=streamOptimized VMDK: can't write to allocated cluster for streamOptimized qemu-img: error while writing sector 65: Input/output error Same result with any input format. Others subformats work but the StreamOptimized is by far the most important one. The only workaround I found is to migrate to raw and then to use VBoxManage (VirtualBox). To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1006655/+subscriptions
[Qemu-devel] [Bug 1006655] Re: Can't convert to vmdk with the streamOptimized subformat
It looks like my problem is different, since I'm not seeing this: qemu-img: error while writing sector 65: Input/output error But maybe, it is a light in the end of the tunnel... -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1006655 Title: Can't convert to vmdk with the streamOptimized subformat Status in QEMU: Fix Committed Status in qemu package in Ubuntu: Fix Released Status in qemu-kvm package in Ubuntu: Invalid Bug description: Hi all, I'm trying to convert a qcow2 image file to the vmdk (on Ubuntu Server 12.04): # qemu-img convert -f qcow2 -O vmdk -o Stream spv-2912.qcow2 spv-2912-3.vmdk -o subformat=streamOptimized VMDK: can't write to allocated cluster for streamOptimized qemu-img: error while writing sector 65: Input/output error Same result with any input format. Others subformats work but the StreamOptimized is by far the most important one. The only workaround I found is to migrate to raw and then to use VBoxManage (VirtualBox). To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1006655/+subscriptions
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On Thu, Mar 31, 2016 at 02:34:24PM -0600, Eric Blake wrote: > On 03/31/2016 02:17 PM, Alex Bligh wrote: > >> even if we aren't quite sure > >> what to document of those flags. And that means qemu is correct, and > >> the NBD protocol has a bug. Since you contributed the FUA flag, is that > >> something you can try to improve? > > > > Yeah. My mess so I should clean it up. I think FUA should be valid > > on essentially everything. > > > > I think I might wait until structured replies is in though! > > It's also tricky because we just barely documented that servers SHOULD > reject invalid flags with EINVAL; and that clients MUST NOT send FUA on > commands where it is not documented; I don't know if we have an adequate > discovery system in place to learn _which_ commands support FUA, That's what I'm mostly worried about. Yes, we have FUA, and yes, some clients may send it on commands that aren't WRITE, but it is not very well defined what happens then: - Currently-released versions of nbd-server will accept the flag on non-WRITE commands, but ignore it. This is generally not desirable, I would say. - The change that I made a few days ago means future versions will *not* accept it. I feel this is better than accepting-and-ignoring, but not ideal either in case a client sends out a request that includes FUA. We can perhaps ignore the former of the above as "buggy behaviour" and not negotiate anything for a server that does the "proper" thing of accepting-and-acting-on for non-WRITE FUA, but then it is probably a good idea to document in the spec that "there are older, buggy, servers which accept but ignore FUA on anything but WRITE", at least for a few years until such older servers aren't available anymore. Alternatively, we could add yet another flag bit to signal "FUA valid everywhere". Not sure if that is worth it. [...] -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA
On Thu, Mar 31, 2016 at 07:15:32PM +0100, Alex Bligh wrote: > NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but > 1<<16 in nbd.h. It is not used anywhere within the code. Yes it is: wouter@gangtai:~/code/c/nbd$ grep -rl CMD_FLAG_FUA * doc/proto.md make-integrityhuge.c nbd.h nbd-server.c nbd-trdump.c tests/run/nbd-tester-client.c wouter@gangtai:~/code/c/nbd$ I don't mind bringing the code in sync with what the documentation says, but it should not change behaviour ;-) -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH
On Fri, Apr 01, 2016 at 12:03:19AM +0100, Alex Bligh wrote: > Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically > the latter may be set on any command, and its semantics on commands other > than NBD_CMD_WRITE need explaining. Further, explain how these relate to > reordering of commands. > > Signed-off-by: Alex Bligh > --- > doc/proto.md | 52 ++-- > 1 file changed, 42 insertions(+), 10 deletions(-) > > diff --git a/doc/proto.md b/doc/proto.md > index c1e05c5..bc4483d 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -197,6 +197,37 @@ handle as was sent by the client in the corresponding > request. In > this way, the client can correlate which request is receiving a > response. > > + Ordering of messages and writes > + > +The server MAY process commands out of order, and MAY reply out of > +order, save that: > + > +* All write commands (that includes both `NBD_CMD_WRITE` and > + `NBD_CMD_TRIM`) that the server completes (i.e. replies to) > + prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile > + storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure > + that all write command received prior to processing the `NBD_CMD_FLUSH` > + (whether they are replied to or not) are written to non-volatile > + storage prior to processing an `NBD_CMD_FLUSH`; note this is a > + stronger condition than the previous 'MUST' condition. This This seems to make little sense. Are you saying that suddenly now sending a reply for FLUSH with outstanding writes is wrong? If not, the above should be clarified. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [PATCH] Docs: proto.md: Clarify NUL in export name
Thanks, applied. On Fri, Apr 01, 2016 at 12:16:48AM +0100, Alex Bligh wrote: > Clarify that > > * The name is not NUL terminated (not just that the length > 'does not include NUL termination' which might be taken to mean > there is NUL termination but the length doesn't include it. > > * The name cannot itself include embedded NUL characters (despite > it not being NUL terminated). > > Signed-off-by: Alex Bligh > --- > doc/proto.md | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/doc/proto.md b/doc/proto.md > index c1e05c5..7729051 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -292,7 +292,8 @@ of the newstyle negotiation. > haggling, and proceed to the transmission phase. Data: name of the > export, free-form UTF-8 text (subject to limitations by server > implementation). The length of the name is determined from the > -option header, and does NOT include a NUL terminator. If the > +option header. The name is not NUL terminated, and may not > +contain embedded NUL characters. If the > chosen export does not exist or requirements for the chosen export > are not met (e.g., the client did not negotiate TLS for an export > where the server requires it), the server should close the > -- > 1.9.1 > > > -- > Transform Data into Opportunity. > Accelerate data analysis in your applications with > Intel Data Analytics Acceleration Library. > Click to learn more. > http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140 > ___ > Nbd-general mailing list > nbd-gene...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nbd-general > -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension
Hi, Thanks, applied. On Thu, Mar 31, 2016 at 04:02:05PM +0300, Denis V. Lunev wrote: > From: Pavel Borzenkov > > There exist some cases when a client knows that the data it is going to > write is all zeroes. Such cases include mirroring or backing up a device > implemented by a sparse file. > > With current NBD command set, the client has to issue NBD_CMD_WRITE > command with zeroed payload and transfer these zero bytes through the > wire. The server has to write the data onto disk, effectively denying > the sparseness. > > To remedy this, the patch adds WRITE_ZEROES extension with one new > NBD_CMD_WRITE_ZEROES command. > > Signed-off-by: Pavel Borzenkov > Signed-off-by: Denis V. Lunev > CC: Wouter Verhelst > CC: Paolo Bonzini > CC: Kevin Wolf > CC: Stefan Hajnoczi > CC: Wouter Verhelst > CC: Alex Bligh > CC: Eric Blake > --- > v2: > - rebased on master > - explicitly state that the client must not set NBD_CMD_WRITE_ZEROES if > support for it wasn't negotiated with the server; > - add new command flag's description in format suitable for moving to > "Command flags" section. > > > doc/proto.md | 64 > > 1 file changed, 60 insertions(+), 4 deletions(-) > > diff --git a/doc/proto.md b/doc/proto.md > index c1e05c5..a574563 100644 > --- a/doc/proto.md > +++ b/doc/proto.md > @@ -261,6 +261,8 @@ immediately after the handshake flags field in oldstyle > negotiation: >schedule I/O accesses as for a rotational medium > - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports >`NBD_CMD_TRIM` commands > +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server > + supports `NBD_CMD_WRITE_ZEROES` commands > > Clients SHOULD ignore unknown flags. > > @@ -444,10 +446,13 @@ affects a particular command. Clients MUST NOT set a > command flag bit > that is not documented for the particular command; and whether a flag is > valid may depend on negotiation during the handshake phase. > > -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`. SHOULD be > - set to 1 if the client requires "Force Unit Access" mode of > - operation. MUST NOT be set unless transmission flags included > - `NBD_FLAG_SEND_FUA`. > +- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and > + `NBD_CMD_WRITE_ZEROES` commands. SHOULD be set to 1 if the client requires > + "Force Unit Access" mode of operation. MUST NOT be set unless transmission > + flags included `NBD_FLAG_SEND_FUA`. > + > +- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES` > + extension; see below. > > Request types > > @@ -523,6 +528,10 @@ The following request types exist: > A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM` > was set in the transmission flags field. > > +* `NBD_CMD_WRITE_ZEROES` (6) > + > +Defined by the experimental `WRITE_ZEROES` extension; see below. > + > * Other requests > > Some third-party implementations may require additional protocol > @@ -654,6 +663,53 @@ option reply type. >message if they do not also send it as a reply to the >`NBD_OPT_SELECT` message. > > +### `WRITE_ZEROES` extension > + > +There exist some cases when a client knows that the data it is going to write > +is all zeroes. Such cases include mirroring or backing up a device > implemented > +by a sparse file. With current NBD command set, the client has to issue > +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes > +through the wire. The server has to write the data onto disk, effectively > +losing the sparseness. > + > +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds > +one new command and one new command flag. > + > +* `NBD_CMD_WRITE_ZEROES` (6) > + > +A write request with no payload. Length and offset define the location > +and amount of data to be zeroed. > + > +The server MUST zero out the data on disk, and then send the reply > +message. The server MAY send the reply message before the data has > +reached permanent storage. > + > +A client MUST NOT send a write zeroes request unless > +`NBD_FLAG_SEND_WRITE_ZEROES` was set in the transmission flags field. > + > +If the `NBD_FLAG_SEND_FUA` flag was set in the transmission flags field, > +the client MAY set the flag `NBD_CMD_FLAG_FUA` in the command flags > field. > +If this flag was set, the server MUST NOT send the reply until it has > +ensured that the newly-zeroed data has reached permanent storage. > + > +If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command > +flags field, the server MAY use trimming to zero out the area, but it > +MUST ensure that the data reads back as zero. > + > +If an error occurs, the server SHOULD set the appropriate error code > +in the error field. The server MAY then close the connection. > + > +The server SHOULD r
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote: > On 01/04/2016 10:14, Richard W.M. Jones wrote: > > Found it: only CONFIG_MPTABLE=y was necessary. It boots with: > > > > # CONFIG_PIRTABLE is not set > > CONFIG_MPTABLE=y > > # CONFIG_SMBIOS is not set > > # CONFIG_ACPI is not set > > If you add all three it should not give any slowdown and will provide > full hardware features to the kernel. qboot does ACPI and PCI bus > assignment (it doesn't do SMBIOS because I got bored debugging it. :)) Enabling all 4 adds about 2ms. However the overhead of SeaBIOS is still down from 68ms to 18ms (4.0% of total boot time down to 1.1%) so it's still a big gain. I wonder how we can make use of this in qemu and downstream distros? Can we have a bios-min.bin which is used with -kernel boots? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 10:03:10AM +0200, Paolo Bonzini wrote: > > > On 01/04/2016 09:55, Richard W.M. Jones wrote: > >>> > > I'd dearly love to get rid of the sgabios option ROM. It looks like > >>> > > SeaBIOS nearly supports a full serial console now? > >> > > >> > Last I checked, one could disable the option rom by adding "-device > >> > VGA,romfile=" to the qemu command line. > > Sure, I can easily remove sgabios. The problem is I still want my > > serial console to show BIOS messages! > > What do you need it for? It's so we can handle error reports. When someone reports that libguestfs "hangs", it's sometimes useful to know if the BIOS was entered or not, since it points the finger at either qemu, BIOS or kernel. (Remember we have to be able to run on a whole variety of Linux distros which often have old/broken/random qemu/bios/kernel). Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests. http://libguestfs.org/virt-v2v
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 10:44, Richard W.M. Jones wrote: > On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote: >> On 01/04/2016 10:14, Richard W.M. Jones wrote: >>> Found it: only CONFIG_MPTABLE=y was necessary. It boots with: >>> >>> # CONFIG_PIRTABLE is not set >>> CONFIG_MPTABLE=y >>> # CONFIG_SMBIOS is not set >>> # CONFIG_ACPI is not set >> >> If you add all three it should not give any slowdown and will provide >> full hardware features to the kernel. qboot does ACPI and PCI bus >> assignment (it doesn't do SMBIOS because I got bored debugging it. :)) > > Enabling all 4 adds about 2ms. > > However the overhead of SeaBIOS is still down from 68ms to 18ms > (4.0% of total boot time down to 1.1%) so it's still a big gain. > > I wonder how we can make use of this in qemu and downstream distros? > Can we have a bios-min.bin which is used with -kernel boots? That's an interesting idea. We can look at it for 2.7. Paolo
Re: [Qemu-devel] [PATCH] Whitelist sysinfo call
On Mon, Mar 21, 2016 at 08=17=45AM -0400, Miroslav Rezanina wrote: > > > - 元のメッセージ - > > 差出人: "Eduardo Otubo" > > 宛先: mreza...@redhat.com > > Cc: qemu-devel@nongnu.org, arm...@redhat.com > > 送信済み: 2016年3月11日, 金曜日 午前 9:51:50 > > 件名: Re: [Qemu-devel] [PATCH] Whitelist sysinfo call > > > > On Mon, Mar 07, 2016 at 10=34=46AM +0100, mreza...@redhat.com wrote: > > > From: Miroslav Rezanina > > > > > > Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call > > > so qemu using rbd image hang after start when run in sandbox mode. > > > > > > To allow using rbd images in sandbox mode we have to whitelist it. > > > > > > Signed-off-by: Miroslav Rezanina > > > --- > > > qemu-seccomp.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/qemu-seccomp.c b/qemu-seccomp.c > > > index 2866e3c..e29fca1 100644 > > > --- a/qemu-seccomp.c > > > +++ b/qemu-seccomp.c > > > @@ -250,6 +250,7 @@ static const struct QemuSeccompSyscall > > > seccomp_whitelist[] = { > > > #ifdef HAVE_CACHEFLUSH > > > { SCMP_SYS(cacheflush), 240 }, > > > #endif > > > +{ SCMP_SYS(sysinfo), 240 }, > > > > Are you sure you want to add this syscall to the bottom of the list? Did > > you estimate the frequency it is called by running strace? > > > > Thanks for the patch. > > > Hi, > > Yes, it wasn't used before nss update and now is used only for rbd based > images > where it is called just few times upon start so drawback should be minimal. > With > this we do not change cost of other calls. > > Thanks for review and question, > Mirek Ok. So, ACK on this patch. I'll roll out a pull request by the end of the day. Thanks for the contribution. -- Eduardo Otubo ProfitBricks GmbH signature.asc Description: Digital signature
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
2016-04-01 11:47 GMT+03:00 Paolo Bonzini : > That's an interesting idea. We can look at it for 2.7. That's fine =) I'm waiting too. -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 10:47, Richard W.M. Jones wrote: > It's so we can handle error reports. When someone reports that > libguestfs "hangs", it's sometimes useful to know if the BIOS was > entered or not, since it points the finger at either qemu, BIOS or > kernel. (Remember we have to be able to run on a whole variety of > Linux distros which often have old/broken/random qemu/bios/kernel). Sure, but that can be an extra option if sgabios adds overhead. Anyhow, you probably want "-vga none" too. Paolo
Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!
On 04/01/2016 11:18 AM, Paolo Bonzini wrote: Microsoft loves Linux, and Red Hat loves .NET. Since we can put whatever we want in the Hyper-V vendor signature, let's show some love too! Cc: Andreas Färber Cc: Alex Williamson Cc: Denis V. Lunev Cc: Eduardo Habkost Cc: Roman Kagan Cc: KY Srinivasan Signed-off-by: Paolo Bonzini --- target-i386/kvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 799fdfa..1968f04 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs) c = &cpuid_data.entries[cpuid_i++]; c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS; if (!cpu->hyperv_vendor_id) { -memcpy(signature, "Microsoft Hv", 12); +memcpy(signature, "KVM<3HyperV!", 12); } else { size_t len = strlen(cpu->hyperv_vendor_id); potentially this could be dangerous. This could break hypervisor detection code. I'll check this today and reply. But you have made my day. Thank you for April 1.
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 10:51:31AM +0200, Paolo Bonzini wrote: > > > On 01/04/2016 10:47, Richard W.M. Jones wrote: > > It's so we can handle error reports. When someone reports that > > libguestfs "hangs", it's sometimes useful to know if the BIOS was > > entered or not, since it points the finger at either qemu, BIOS or > > kernel. (Remember we have to be able to run on a whole variety of > > Linux distros which often have old/broken/random qemu/bios/kernel). > > Sure, but that can be an extra option if sgabios adds overhead. > > Anyhow, you probably want "-vga none" too. That's the qemu option? Currently we use -nodefconfig -nodefaults -display none which I assumed would do the same thing (ie. not add any default devices). Does -vga none do more than this? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html
Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!
On Fri, Apr 01, 2016 at 10:18:02AM +0200, Paolo Bonzini wrote: > Microsoft loves Linux, and Red Hat loves .NET. Since we can put whatever > we want in the Hyper-V vendor signature, let's show some love too! > > Cc: Andreas Färber > Cc: Alex Williamson > Cc: Denis V. Lunev > Cc: Eduardo Habkost > Cc: Roman Kagan > Cc: KY Srinivasan > Signed-off-by: Paolo Bonzini > --- > target-i386/kvm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c > index 799fdfa..1968f04 100644 > --- a/target-i386/kvm.c > +++ b/target-i386/kvm.c > @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs) > c = &cpuid_data.entries[cpuid_i++]; > c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS; > if (!cpu->hyperv_vendor_id) { > -memcpy(signature, "Microsoft Hv", 12); > +memcpy(signature, "KVM<3HyperV!", 12); I think it'd be nicer to use the unicode heart symbol ♡ encoded as utf-8 eg memcpy(signature, "KVM\xe2\x99\xa1HyperV", 12); Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 10:57, Richard W.M. Jones wrote: > That's the qemu option? Currently we use > -nodefconfig -nodefaults -display none > which I assumed would do the same thing (ie. not add any default > devices). Does -vga none do more than this? No, I missed the -nodefaults. Paolo
Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!
On 01/04/2016 10:58, Daniel P. Berrange wrote: > On Fri, Apr 01, 2016 at 10:18:02AM +0200, Paolo Bonzini wrote: >> Microsoft loves Linux, and Red Hat loves .NET. Since we can put whatever >> we want in the Hyper-V vendor signature, let's show some love too! >> >> Cc: Andreas Färber >> Cc: Alex Williamson >> Cc: Denis V. Lunev >> Cc: Eduardo Habkost >> Cc: Roman Kagan >> Cc: KY Srinivasan >> Signed-off-by: Paolo Bonzini >> --- >> target-i386/kvm.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/target-i386/kvm.c b/target-i386/kvm.c >> index 799fdfa..1968f04 100644 >> --- a/target-i386/kvm.c >> +++ b/target-i386/kvm.c >> @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs) >> c = &cpuid_data.entries[cpuid_i++]; >> c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS; >> if (!cpu->hyperv_vendor_id) { >> -memcpy(signature, "Microsoft Hv", 12); >> +memcpy(signature, "KVM<3HyperV!", 12); > > I think it'd be nicer to use the unicode heart symbol ♡ encoded as utf-8 > eg > > memcpy(signature, "KVM\xe2\x99\xa1HyperV", 12); I'll try to get v2 ready in the next 13 hours and 53 minutes. Paolo
Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property
On 04/01/2016 04:24 PM, Hailiang Zhang wrote: > On 2016/4/1 15:39, Jason Wang wrote: >> >> >> On 04/01/2016 03:08 PM, zhanghailiang wrote: >>> After commit 338d3f, we support 'status' property for filter object. >>> The segfault can be triggered by starting qemu with 'status=off' property >>> for filter, when the s->incoming_queue is NULL, we reference it directly >>> in qemu_net_queue_flush(). >>> >>> Let's check the value of 's->incoming_queue' before calling >>> qemu_net_queue_flush(). >>> >>> Signed-off-by: zhanghailiang >>> --- >>> net/filter-buffer.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c >>> index cc6bd94..79e2ce3 100644 >>> --- a/net/filter-buffer.c >>> +++ b/net/filter-buffer.c >>> @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf) >>> { >>> FilterBufferState *s = FILTER_BUFFER(nf); >>> >>> -if (!qemu_net_queue_flush(s->incoming_queue)) { >>> +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) { >>> /* Unable to empty the queue, purge remaining packets */ >>> qemu_net_queue_purge(s->incoming_queue, nf->netdev); >>> } >> >> We'd better handle this at generic layer and don't let a specific net >> filter need to worry about this. >> >> Looks like the issue is we may trigger status_changed() too early (even >> before the the filter was initialized). >> > > Yes ~ > >> How about not call status_changed() if the initialization is not done? >> > > But seems that it is difficult to confirm if the filter is initialized > or not ... If nfc->setup() is not called, nf->netdev is NULL. Thanks Wen Congyang > >> . >> > > > > >
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
* Richard W.M. Jones (rjo...@redhat.com) wrote: > On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote: > > On 01/04/2016 10:14, Richard W.M. Jones wrote: > > > Found it: only CONFIG_MPTABLE=y was necessary. It boots with: > > > > > > # CONFIG_PIRTABLE is not set > > > CONFIG_MPTABLE=y > > > # CONFIG_SMBIOS is not set > > > # CONFIG_ACPI is not set > > > > If you add all three it should not give any slowdown and will provide > > full hardware features to the kernel. qboot does ACPI and PCI bus > > assignment (it doesn't do SMBIOS because I got bored debugging it. :)) > > Enabling all 4 adds about 2ms. > > However the overhead of SeaBIOS is still down from 68ms to 18ms > (4.0% of total boot time down to 1.1%) so it's still a big gain. > > I wonder how we can make use of this in qemu and downstream distros? > Can we have a bios-min.bin which is used with -kernel boots? Could we pass a flag at runtime that gets the standard image to skip a lot of the stuff you don't want? Or is there anything else qemu could provide (e.g. a fast way to get to PCI config space?) Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
Hi, > I wonder how we can make use of this in qemu and downstream distros? > Can we have a bios-min.bin which is used with -kernel boots? We already build two seabios roms: one full featued and one slightly stripped down to keep it below 128k, for backward compatibility with old machine types. Adding a third config for -kernel boot should be easy. For that use case we can probably also turn on seabios logging to the serial console and drop sgabios. We don't need input (no boot menu) and we also don't need to hook into int10 (no grub/ipxe using that for output). cheers, Gerd
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On 1 Apr 2016, at 08:43, Paolo Bonzini wrote: > > On 31/03/2016 22:17, Alex Bligh wrote: In qemu, read+FUA just triggers blk_co_flush() prior to reading; but that's the same function it calls for write+FUA. >> That's harmless, but unnecessary in the sense that current documented >> behaviour doesn't require it. Perhaps it should? >> >> I suppose TRIM etc. should support FUA too? > > TRIM is an advisory operation, so it doesn't make sense to force access > to the medium. Reflecting, that's correct. FUA can't mean anything sensible for TRIM. -- Alex Bligh
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On 1 Apr 2016, at 08:49, Paolo Bonzini wrote: > > On 31/03/2016 22:34, Eric Blake wrote: >> give me possibly stale data because I accessed >> the underlying storage rather than paying attention to in-flight writes >> that would change what I read". > > Overlapping I/O is always unspecified, so you should expect stale data > if a read overlaps an in-flight write. Sure, that's a consideration for overlapping I/O in general rather than FUA. You need to wait until a write operation has completed to be able to guarantee that a read operation will read back the written data (whether or not it has been persisted to disk rather than e.g. sitting in a cache somewhere). A read following a FUA write that has not completed could be processed prior to the write. Similarly a read following a non-FUA write that has completed must read the written data even if it's not persisted. I'm interested with either Qemu or the kernel have a sense of FUA on reads. Do you have a view on that? This (which was the inspiration for nbd FUA and now appears slightly clearer than I remember at the time): https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt says > The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the > filesystem and will make sure that I/O completion for this request is only > signaled after the data has been committed to non-volatile storage. and > For devices that also support the FUA bit the block layer needs > to be told to pass through the REQ_FUA bit using: > > blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA); > > and the driver must handle write requests that have the REQ_FUA bit set > in prep_fn/request_fn. If the FUA bit is not natively supported the block > layer turns it into an empty REQ_FLUSH request after the actual write. That's pretty ambiguous as to whether you need to handle it for reads as well. If it does read+FUA means that the read needs an fdatasync() type operation to happen. -- Alex Bligh
Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH
On 1 Apr 2016, at 09:35, Wouter Verhelst wrote: >> +* All write commands (that includes both `NBD_CMD_WRITE` and >> + `NBD_CMD_TRIM`) that the server completes (i.e. replies to) >> + prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile >> + storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD >> ensure >> + that all write command received prior to processing the `NBD_CMD_FLUSH` >> + (whether they are replied to or not) are written to non-volatile >> + storage prior to processing an `NBD_CMD_FLUSH`; note this is a >> + stronger condition than the previous 'MUST' condition. This > > This seems to make little sense. Are you saying that suddenly now > sending a reply for FLUSH with outstanding writes is wrong? If not, the > above should be clarified. The MUST sentence does not cover that situation as it only refers to completed writes. The SHOULD sentence says that's a 'SHOULD NOT' situation in respect of writes that have PROCESSED (i.e actioned) whether or not they have been replied to. Of course the client has no way of knowing whether they have been PROCESSED without a reply. Personally I think the SHOULD clause is pretty pointless and is unnecessary, but that's where the conversation got to n years ago I believe. Happy to delete the last sentence if that's wrong. -- Alex Bligh
[Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes
The write path for pass-through devices, commonly used for controlling keyboard LEDs via EV_LED, was not implemented. This commit adds the necessary plumbing to connect the status virtio queue to the host evdev file descriptor. Signed-off-by: Ladi Prosek --- Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't believe that this will be happening in practise for local evdev devices. Of course, all bets are off if fd is an arbitrary file, but we would be running into issues anyway (right now, if virtio_input_host_event doesn't read all of sizeof(struct virtio_input_event) bytes, we get out of sync with the event stream). So part of me would just do one best-effort write and ignore all errors. I'm curious what you think. hw/input/virtio-input-host.c | 85 include/hw/virtio/virtio-input.h | 8 2 files changed, 93 insertions(+) diff --git a/hw/input/virtio-input-host.c b/hw/input/virtio-input-host.c index 36856d6..bc02d3b 100644 --- a/hw/input/virtio-input-host.c +++ b/hw/input/virtio-input-host.c @@ -42,6 +42,55 @@ static void virtio_input_host_event(void *opaque) } } +static int virtio_input_host_write_event(VirtIOInputHost *vih, + struct timeval *time, + struct virtio_input_event *virtio) +{ +struct input_event evdev; +int rc; + +evdev.time = *time; +evdev.type = le16_to_cpu(virtio->type); +evdev.code = le16_to_cpu(virtio->code); +evdev.value = le32_to_cpu(virtio->value); + +rc = write(vih->fd, &evdev, sizeof(evdev)); +if (rc == -1) { +rc = errno; +if (rc == EAGAIN || rc == EWOULDBLOCK) { +return EAGAIN; +} +perror("virtio_input_host_write_event: write"); +return rc; +} +if (rc != sizeof(evdev)) { +return EINVAL; +} +return 0; +} + +static void virtio_input_writable(void *opaque) +{ +VirtIOInputHost *vih = opaque; +VirtIOInputEvent *vie; +int rc; + +while (!QSIMPLEQ_EMPTY(&vih->pending_writes)) { +vie = QSIMPLEQ_FIRST(&vih->pending_writes); + +rc = virtio_input_host_write_event(vih, &vie->time, &vie->virtio); +if (rc == EAGAIN) { +break; +} +QSIMPLEQ_REMOVE_HEAD(&vih->pending_writes, next); +g_free(vie); +} + +if (QSIMPLEQ_EMPTY(&vih->pending_writes)) { +qemu_set_fd_handler(vih->fd, virtio_input_host_event, NULL, vih); +} +} + static void virtio_input_bits_config(VirtIOInputHost *vih, int type, int count) { @@ -127,6 +176,8 @@ static void virtio_input_host_realize(DeviceState *dev, Error **errp) virtio_input_bits_config(vih, EV_LED, LED_CNT); qemu_set_fd_handler(vih->fd, virtio_input_host_event, NULL, vih); + +QSIMPLEQ_INIT(&vih->pending_writes); return; err_close: @@ -138,11 +189,44 @@ err_close: static void virtio_input_host_unrealize(DeviceState *dev, Error **errp) { VirtIOInputHost *vih = VIRTIO_INPUT_HOST(dev); +VirtIOInputEvent *vie; if (vih->fd > 0) { qemu_set_fd_handler(vih->fd, NULL, NULL, NULL); close(vih->fd); } + +while (!QSIMPLEQ_EMPTY(&vih->pending_writes)) { +vie = QSIMPLEQ_FIRST(&vih->pending_writes); +QSIMPLEQ_REMOVE_HEAD(&vih->pending_writes, next); +g_free(vie); +} +} + +static void virtio_input_host_handle_status(VirtIOInput *vinput, +virtio_input_event *event) +{ +VirtIOInputHost *vih = VIRTIO_INPUT_HOST(vinput); +struct timeval time; +int rc; + +if (gettimeofday(&time, NULL)) { +perror("virtio_input_host_handle_status: gettimeofday"); +return; +} + +rc = virtio_input_host_write_event(vih, &time, event); +if (rc == EAGAIN) { +VirtIOInputEvent *vie = g_malloc(sizeof(*vie)); +vie->time = time; +vie->virtio = *event; + +if (QSIMPLEQ_EMPTY(&vih->pending_writes)) { +qemu_set_fd_handler(vih->fd, virtio_input_host_event, +virtio_input_writable, vih); +} +QSIMPLEQ_INSERT_TAIL(&vih->pending_writes, vie, next); +} } static const VMStateDescription vmstate_virtio_input_host = { @@ -164,6 +248,7 @@ static void virtio_input_host_class_init(ObjectClass *klass, void *data) dc->props = virtio_input_host_properties; vic->realize = virtio_input_host_realize; vic->unrealize = virtio_input_host_unrealize; +vic->handle_status = virtio_input_host_handle_status; } static void virtio_input_host_init(Object *obj) diff --git a/include/hw/virtio/virtio-input.h b/include/hw/virtio/virtio-input.h index af1c207..d41d650 100644 --- a/include/hw/virtio/virtio-input.h +++ b/include/hw/virtio/virtio-input.h @@ -61,6 +61,7 @@ typedef struct VirtIOInputClass VirtIOInputClass; typedef struct VirtIOI
Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA
On 1 Apr 2016, at 08:59, Wouter Verhelst wrote: > On Thu, Mar 31, 2016 at 07:15:32PM +0100, Alex Bligh wrote: >> NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but >> 1<<16 in nbd.h. It is not used anywhere within the code. > > Yes it is: > > wouter@gangtai:~/code/c/nbd$ grep -rl CMD_FLAG_FUA * > doc/proto.md > make-integrityhuge.c > nbd.h > nbd-server.c > nbd-trdump.c > tests/run/nbd-tester-client.c > wouter@gangtai:~/code/c/nbd$ > > I don't mind bringing the code in sync with what the documentation says, > but it should not change behaviour ;-) Sorry - I think I must have done 'git grep' in the wrong directory or something. Ignore this patch and I'll redo it. I think probably the easiest fix is (beware, manually generated untested patch) as follows, which at least shows the reason for the discrepancy. diff --git a/nbd.h b/nbd.h index f2a32dd..53b6ca1 100644 --- a/nbd.h +++ b/nbd.h @@ -38,7 +38,9 @@ enum { }; #define NBD_CMD_MASK_COMMAND 0x -#define NBD_CMD_FLAG_FUA (1<<16) +#define NBD_CMD_SHIFT (16) +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT) /* values for flags field */ #define NBD_FLAG_HAS_FLAGS (1 << 0)/* Flags are there */ -- Alex Bligh
Re: [Qemu-devel] Virtio-9p
On Fri, 1 Apr 2016 09:09:32 +0200 Pradeep Kiruvale wrote: > Hi Greg, > Hi Pradeep, I'm Cc'ing qemu-devel like in your previous posts, so QEMU experts may jump in. > What I understand from the requirement for our project is if we use > virtio-blk it caches the pages in the guest. We would like to avoid that AFAIK this is true when you pass cache=none to -drive on the QEMU command line. But there are other options such as writeback or writethrough, which rely on the host page cache. > and also we want to share those pages across multiple guests and when they > need some data they should get it from the host instead of using the cached > data at each guest. > So you want all the guests to attach to the same block device or backing file in the host, correct ? AFAIK we cannot do that with virtio-blk indeed... and virtio-9p is only about sharing files, not block devices. Maybe you could share a big file between the host and all guests with 9p, and each guest can use the loop device to access the file as a block device... but even then, you'd have to deal with concurrent accesses... > Basically we are trying to cut down the memory foot print of the guests. > If you're using KVM and your guests run the same distro or application, you may try to use KSM (Kernel Same-page Merging) in the host. > Regards, > Pradeep > > On 31 March 2016 at 18:12, Greg Kurz wrote: > > > On Wed, 30 Mar 2016 16:27:48 +0200 > > Pradeep Kiruvale wrote: > > > > > Hi Greg, > > > > > > > Hi Pradeep, > > > > > Thanks for the reply. > > > > > > Let me put it this way, virtio-blk-pci is used for block IO on the > > devices > > > shared between the guest and the host. > > > > I don't really understand the "devices shared between the guest and the > > host" wording... virtio-blk-pci exposes a virtio-blk device through PCI > > to the guest. The virtio-blk device can be backed by a file or a block > > device from the host. > > > > > Here I want to share the file and have QoS between the guests. So I am > > > using the Virtio-9p-pci. > > > > > > > What file ? > > > > > Basically I want to have QoS for virtio-9p-pci. > > > > > > > Can you provide a more detailed scenario on the result you want to reach ? > > > > > Regards, > > > Pradeep > > > > > > > Cheers. > > > > -- > > Greg > > > > > On 30 March 2016 at 16:13, Greg Kurz wrote: > > > > > > > On Wed, 30 Mar 2016 14:10:38 +0200 > > > > Pradeep Kiruvale wrote: > > > > > > > > > Hi All, > > > > > > > > > > Is virtio-9p-pci device only supports the fsdev deices? I am trying > > to > > > > use > > > > > -drive option for applying QoS for block device using Virtio-9p-pci > > > > device, > > > > > but failing to create/add a device other than fsdev. Can you please > > help > > > > me > > > > > on this? > > > > > > > > > > Regards, > > > > > Pradeep > > > > > > > > Hi Pradeep, > > > > > > > > Not sure to catch what you want to do but I confirm that virti-9p-pci > > only > > > > supports > > > > fsdev... if you want a block device, why don't you use virtio-blk-pci ? > > > > > > > > Cheers. > > > > > > > > -- > > > > Greg > > > > > > > > > > > >
Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
On 1 Apr 2016, at 09:27, Wouter Verhelst wrote: >> It's also tricky because we just barely documented that servers SHOULD >> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on >> commands where it is not documented; I don't know if we have an adequate >> discovery system in place to learn _which_ commands support FUA, > > That's what I'm mostly worried about. Yes, we have FUA, and yes, some > clients may send it on commands that aren't WRITE, but it is not very > well defined what happens then: Actually the text says: > Clients MUST NOT set a command flag bit that is not documented for the > particular command; and whether a flag is valid may depend on negotiation > during the handshake phase. So this gives us an easy out. We document NBD_CMD_FLAG_FUA for every command, but say it is meaningless for commands outside some subset (see below), and therefore 'SHOULD NOT' be used by the client. For commands outside that subset, the server SHOULD ignore it. That means that we don't need to change the above text, as it is documented. However, I'm not clear what the subset of commands NBD_CMD_FLAG_FUA should work on is. I originally thought it was anything that writes. Paolo has pointed out it's pointless for TRIM given it's advisory. It makes sense for anything that writes holes. I am not clear what it should do for reads - the original semantic was meant to be the kernel bio layer semantic, and (as per message to Paolo) the kernel documentation is less than helpful in whether it can be set on a read bio. -- Alex Bligh
[Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist
In some occasions, a patch [1] can start with a hunk containing a simple type cast. At the time annotate_values() is run, the type is unknown and the cast type is misinterpreted as a identifier, resulting in an error if it is followed with a negative value: ERROR: spaces required around that '-' (ctx:WxV) It seems complex to catch all possible types in a cast expression. So, as a fallback solution, let's add some common qemu types to the typeList array. [1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06741.html Signed-off-by: Cédric Le Goater --- scripts/checkpatch.pl |1 + 1 file changed, 1 insertion(+) Index: qemu-dgibson-for-2.6.git/scripts/checkpatch.pl === --- qemu-dgibson-for-2.6.git.orig/scripts/checkpatch.pl +++ qemu-dgibson-for-2.6.git/scripts/checkpatch.pl @@ -212,6 +212,7 @@ our @typeList = ( qr{${Ident}_t}, qr{${Ident}_handler}, qr{${Ident}_handler_fn}, + qr{target_(?:u)?long}, ); # This can be modified by sub possible. Since it can be empty, be careful
[Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine
Using the nested aio_poll() in coroutine is a bad idea. This patch replaces the aio_poll loop in bdrv_drain with a BH, if called in coroutine. For example, the bdrv_drain() in mirror.c can hang when a guest issued request is pending on it in qemu_co_mutex_lock(). Mirror coroutine in this case has just finished a request, and the block job is about to complete. It calls bdrv_drain() which waits for the other coroutine to complete. The other coroutine is a scsi-disk request. The deadlock happens when the latter is in turn pending on the former to yield/terminate, qemu_co_mutex_lock(). The state flow is as below (assuming a qcow2 image): mirror coroutine scsi-disk coroutine - do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain while (has tracked request) aio_poll() In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return because the mirror coroutine is blocked in the aio_poll(blocking=true). Reported-by: Laurent Vivier Suggested-by: Paolo Bonzini Signed-off-by: Fam Zheng --- block/io.c | 40 1 file changed, 40 insertions(+) diff --git a/block/io.c b/block/io.c index c4869b9..d0a4551 100644 --- a/block/io.c +++ b/block/io.c @@ -253,6 +253,42 @@ static void bdrv_drain_recurse(BlockDriverState *bs) } } +typedef struct { +Coroutine *co; +BlockDriverState *bs; +bool done; +} BdrvCoDrainData; + +static void bdrv_co_drain_bh_cb(void *opaque) +{ +BdrvCoDrainData *data = opaque; +Coroutine *co = data->co; + +bdrv_drain(data->bs); +data->done = true; +qemu_coroutine_enter(co, NULL); +} + +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs) +{ +QEMUBH *bh; +BdrvCoDrainData data; + +assert(qemu_in_coroutine()); +data = (BdrvCoDrainData) { +.co = qemu_coroutine_self(), +.bs = bs, +.done = false, +}; +bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data), +qemu_bh_schedule(bh); + +do { +qemu_coroutine_yield(); +} while (!data.done); +qemu_bh_delete(bh); +} + /* * Wait for pending requests to complete on a single BlockDriverState subtree, * and suspend block driver's internal I/O until next request arrives. @@ -269,6 +305,10 @@ void bdrv_drain(BlockDriverState *bs) bool busy = true; bdrv_drain_recurse(bs); +if (qemu_in_coroutine()) { +bdrv_co_drain(bs); +return; +} while (busy) { /* Keep iterating */ bdrv_flush_io_queue(bs); -- 2.7.4
Re: [Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine
On 01/04/2016 11:46, Fam Zheng wrote: > + > +static void bdrv_co_drain_bh_cb(void *opaque) > +{ > +BdrvCoDrainData *data = opaque; > +Coroutine *co = data->co; > + > +bdrv_drain(data->bs); > +data->done = true; > +qemu_coroutine_enter(co, NULL); > +} > + > +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs) > +{ > +QEMUBH *bh; > +BdrvCoDrainData data; > + > +assert(qemu_in_coroutine()); > +data = (BdrvCoDrainData) { > +.co = qemu_coroutine_self(), > +.bs = bs, > +.done = false, > +}; > +bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data), > +qemu_bh_schedule(bh); > + > +do { > +qemu_coroutine_yield(); > +} while (!data.done); The loop and "done" is not necessary. Also, > +qemu_bh_delete(bh); this can be moved to bdrv_co_drain_bh_cb before bdrv_drain, so that the bottom half doesn't slow down the event loop until bdrv_drain completes. Paolo > +}
Re: [Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist
On Fri, 1 Apr 2016 11:40:06 +0200 Cédric Le Goater wrote: > In some occasions, a patch [1] can start with a hunk containing a > simple type cast. At the time annotate_values() is run, the type is > unknown and the cast type is misinterpreted as a identifier, resulting > in an error if it is followed with a negative value: > > ERROR: spaces required around that '-' (ctx:WxV) > > It seems complex to catch all possible types in a cast expression. So, > as a fallback solution, let's add some common qemu types to the > typeList array. > > [1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06741.html > > Signed-off-by: Cédric Le Goater > --- Indeed, with this patch, the checkpatch script stops complaining when it sees: +if (prefix == (target_ulong) -1ULL) { And I tend to agree with Cedric... unless Paolo has a bright idea of course. :) Acked-by: Greg Kurz > scripts/checkpatch.pl |1 + > 1 file changed, 1 insertion(+) > > Index: qemu-dgibson-for-2.6.git/scripts/checkpatch.pl > === > --- qemu-dgibson-for-2.6.git.orig/scripts/checkpatch.pl > +++ qemu-dgibson-for-2.6.git/scripts/checkpatch.pl > @@ -212,6 +212,7 @@ our @typeList = ( > qr{${Ident}_t}, > qr{${Ident}_handler}, > qr{${Ident}_handler_fn}, > + qr{target_(?:u)?long}, > ); > > # This can be modified by sub possible. Since it can be empty, be careful
[Qemu-devel] [PATCHv2] Improve documentation of FUA and FLUSH
Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically the latter may be set on any command, and its semantics on commands other than NBD_CMD_WRITE need explaining. Further, explain how these relate to reordering of commands. Signed-off-by: Alex Bligh --- doc/proto.md | 83 +--- 1 file changed, 68 insertions(+), 15 deletions(-) Changes since v1: * Rebased on master * Eric's modifications * Included 2 options as to how we might handle FUA on non-write commands. Let's establish whether either the kernel or Qemu have meaningful uses for read+FUA. diff --git a/doc/proto.md b/doc/proto.md index 8c83382..389b186 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -213,6 +213,47 @@ handle as was sent by the client in the corresponding request. In this way, the client can correlate which request is receiving a response. + Ordering of messages and writes + +The server MAY process commands out of order, and MAY reply out of +order, save that: + +* All write commands (that includes both `NBD_CMD_WRITE` and + `NBD_CMD_TRIM`) that the server completes (i.e. replies to) + prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile + storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure + that all write command received prior to processing the `NBD_CMD_FLUSH` + (whether they are replied to or not) are written to non-volatile + storage prior to processing an `NBD_CMD_FLUSH`; note this is a + stronger condition than the previous 'MUST' condition. This + paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within + the transmission flags, as otherwise `NBD_CMD_FLUSH` will never + be sent by the client to the server. + +* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set + +[Option #C1] + + in its command flags until the data area referred to by that command + +[Option #C1] + + in its command flags until the data (if any) written by that command + +[All options] + + is persisted to non-volatile storage. This only applies if + `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise + `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server + by the client. + +`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with +`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux +kernel bio with `REQ_FUA` set. In case of ambiguity in this +specification, the +[kernel documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt) +may be useful. + Request message The request message, sent by the client, looks as follows: @@ -480,10 +521,33 @@ affects a particular command. Clients MUST NOT set a command flag bit that is not documented for the particular command; and whether a flag is valid may depend on negotiation during the handshake phase. -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and - `NBD_CMD_WRITE_ZEROES` commands. SHOULD be set to 1 if the client requires - "Force Unit Access" mode of operation. MUST NOT be set unless transmission - flags included `NBD_FLAG_SEND_FUA`. +- bit 0, `NBD_CMD_FLAG_FUA`. This flag is valid for all commands. This bit + MUST be set to 0 unless the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") + was set in the transmission flags field. If the `NBD_FLAG_SEND_FUA` + is set in the transmission flags field, the client MAY set + `NBD_CMD_FLAG_FUA` in any request. If this bit is set, the server + +[Option #C1] + + MUST NOT send a reply until it has ensured that any data referred to + by this request (i.e. written data on a `NBD_CMD_WRITE` or + `NBD_CMD_WRITE_ZEROES`, read data on an `NBD_CMD_READ`) has reached + permanent storage. There will be certain commands + (e.g. `NBD_CMD_DISC`) for which this flag will thus not alter behaviour + (as the command does not refer to any data), in which case the server + MUST ignore this bit. + +[Option #C2] + + MUST NOT send a reply until it has ensured that the data (if any) written + by this request (e.g. by `NBD_CMD_WRITE` or `NBD_CMD_WRITE_ZEROES`) has + reached permanent storage. There will be certain commands (e.g. `NBD_CMD_READ` + and `NBD_CMD_DISC`) for which this flag will thus not alter behaviour + (as the command does not write any data), in which case the server + MUST ignore this bit. + +[All options] + - bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES` extension; see below. - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY` @@ -532,12 +596,6 @@ The following request types exist: message. The server MAY send the reply message before the data has reached permanent storage. -If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the -transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in -the command flags field. If this flag was set, the server MUST NOT send -the re
[Qemu-devel] [PATCH] qemu-iotests: 149: Use "/usr/bin/env python"
Do the same as other scripts, to pick the correct interpreter between python2 and python3 from the environment. Signed-off-by: Fam Zheng --- tests/qemu-iotests/149 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149 index bb5811d..52e23d2 100755 --- a/tests/qemu-iotests/149 +++ b/tests/qemu-iotests/149 @@ -1,4 +1,4 @@ -#!/usr/bin/python +#!/usr/bin/env python # # Copyright (C) 2016 Red Hat, Inc. # -- 2.7.4
[Qemu-devel] [PATCH] doc/proto.md: Clients MUST not set unknown client flags
Clients MUST NOT set unknown client flags. Currently this is permitted (but 'SHOULD NOT' be done), with the result that the server MUST drop the connection if it happens. This in effect gives the client an inappropriate way to close the connection. Signed-off-by: Alex Bligh --- doc/proto.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/proto.md b/doc/proto.md index 8c83382..6a7ebda 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -308,7 +308,7 @@ receiving the handshake flags from the server. set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124 bytes of zeroes at the end of the negotiation. -Clients SHOULD NOT set any other flags; the server MUST drop the +Clients MUST NOT set any other flags; the server MUST drop the connection if the client sets an unknown flag, or a flag that does not match something advertised by the server. -- 1.9.1
Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA
On Fri, Apr 01, 2016 at 10:32:50AM +0100, Alex Bligh wrote: > #define NBD_CMD_MASK_COMMAND 0x > -#define NBD_CMD_FLAG_FUA (1<<16) > +#define NBD_CMD_SHIFT (16) > +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT) That works too, I suppose. However, like I said, I need to clean this up anyway. I thought your "will you take a patch" was going to do that, but if not, I'll get around to doing it myself at some point... -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine
On Fri, 04/01 11:49, Paolo Bonzini wrote: > > > On 01/04/2016 11:46, Fam Zheng wrote: > > + > > +static void bdrv_co_drain_bh_cb(void *opaque) > > +{ > > +BdrvCoDrainData *data = opaque; > > +Coroutine *co = data->co; > > + > > +bdrv_drain(data->bs); > > +data->done = true; > > +qemu_coroutine_enter(co, NULL); > > +} > > + > > +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs) > > +{ > > +QEMUBH *bh; > > +BdrvCoDrainData data; > > + > > +assert(qemu_in_coroutine()); > > +data = (BdrvCoDrainData) { > > +.co = qemu_coroutine_self(), > > +.bs = bs, > > +.done = false, > > +}; > > +bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data), > > +qemu_bh_schedule(bh); > > + > > +do { > > +qemu_coroutine_yield(); > > +} while (!data.done); > > The loop and "done" is not necessary. Also, I was trying to protect against the bugs similar to the one fixed in e424aff5f30, but you're right we can make this an assertion and the loop is not needed. If a calling coroutine is resumed unexpectedly, we will catch it and fix it. > > > +qemu_bh_delete(bh); > > this can be moved to bdrv_co_drain_bh_cb before bdrv_drain, so that the > bottom half doesn't slow down the event loop until bdrv_drain completes. Good point! Will fix. Thanks, Fam
Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH
On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote: > > On 1 Apr 2016, at 09:35, Wouter Verhelst wrote: > > >> +* All write commands (that includes both `NBD_CMD_WRITE` and > >> + `NBD_CMD_TRIM`) that the server completes (i.e. replies to) > >> + prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile > >> + storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD > >> ensure > >> + that all write command received prior to processing the `NBD_CMD_FLUSH` > >> + (whether they are replied to or not) are written to non-volatile > >> + storage prior to processing an `NBD_CMD_FLUSH`; note this is a > >> + stronger condition than the previous 'MUST' condition. This > > > > This seems to make little sense. Are you saying that suddenly now > > sending a reply for FLUSH with outstanding writes is wrong? If not, the > > above should be clarified. > > The MUST sentence does not cover that situation as it only refers > to completed writes. > > The SHOULD sentence says that's a 'SHOULD NOT' situation in respect > of writes that have PROCESSED (i.e actioned) whether or not they > have been replied to. Of course the client has no way of knowing > whether they have been PROCESSED without a reply. > > Personally I think the SHOULD clause is pretty pointless and is > unnecessary, but that's where the conversation got to n years > ago I believe. I'm still not sure what it's supposed to mean, though. Clearly, you should at the very least reword it, if not... > Happy to delete the last sentence if that's wrong. ... remove it instead. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA
On 1 Apr 2016, at 10:43, Wouter Verhelst wrote: > On Fri, Apr 01, 2016 at 10:32:50AM +0100, Alex Bligh wrote: >> #define NBD_CMD_MASK_COMMAND 0x >> -#define NBD_CMD_FLAG_FUA (1<<16) >> +#define NBD_CMD_SHIFT (16) >> +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT) > > That works too, I suppose. > > However, like I said, I need to clean this up anyway. I thought your > "will you take a patch" was going to do that, but if not, I'll get > around to doing it myself at some point... How about you take that one for now and one of us will get do doing it properly in due course. The proper solution is obviously to use two 16 bit fields. -- Alex Bligh
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote: > Hi, > > > I wonder how we can make use of this in qemu and downstream distros? > > Can we have a bios-min.bin which is used with -kernel boots? > > We already build two seabios roms: one full featued and one slightly > stripped down to keep it below 128k, for backward compatibility with old > machine types. > > Adding a third config for -kernel boot should be easy. For that use > case we can probably also turn on seabios logging to the serial console > and drop sgabios. We don't need input (no boot menu) and we also don't > need to hook into int10 (no grub/ipxe using that for output). SeaBIOS logging is slow (or more likely, serial output is slow). And sgabios is useful for debugging. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW
Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH
On 1 Apr 2016, at 11:10, Wouter Verhelst wrote: > On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote: >> >> On 1 Apr 2016, at 09:35, Wouter Verhelst wrote: >> +* All write commands (that includes both `NBD_CMD_WRITE` and + `NBD_CMD_TRIM`) that the server completes (i.e. replies to) + prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile + storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure + that all write command received prior to processing the `NBD_CMD_FLUSH` + (whether they are replied to or not) are written to non-volatile + storage prior to processing an `NBD_CMD_FLUSH`; note this is a + stronger condition than the previous 'MUST' condition. This >>> >>> This seems to make little sense. Are you saying that suddenly now >>> sending a reply for FLUSH with outstanding writes is wrong? If not, the >>> above should be clarified. >> >> The MUST sentence does not cover that situation as it only refers >> to completed writes. >> >> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect >> of writes that have PROCESSED (i.e actioned) whether or not they >> have been replied to. Of course the client has no way of knowing >> whether they have been PROCESSED without a reply. >> >> Personally I think the SHOULD clause is pretty pointless and is >> unnecessary, but that's where the conversation got to n years >> ago I believe. > > I'm still not sure what it's supposed to mean, though. Clearly, you > should at the very least reword it, if not... > >> Happy to delete the last sentence if that's wrong. > > ... remove it instead. If I can't even explain it, it doesn't bode well! I think there are three types of writes that are relevant at the point of replying to a FLUSH: Type A: writes that are truly 'in flight', i.e. have been sent, but have not been 'processed' (i.e. write(2) has not been called). Type B: writes that are have been processed (i.e. write(2) has not been called) but the reply has not yet been sent to the client - either the reply hasn't been made yet or (more likely) it's sitting in some queue. Type C: writes that have been processed and a reply sent. The first sentence (the 'MUST') refers to Type C writes and says these MUST be persisted to permanent storage prior to replying to the FLUSH. Type A writes may be in a buffer server side or even client side, so no one expects anything to happen with respect to those. The 'SHOULD' bit concerns Type B writes. It's saying that if you've actually done the write call (but not yet replied) you SHOULD persist these to disk prior to replying to the FLUSH. This type of situation doesn't (last time I looked) happen in nbd-server.c because it does not process requests in parallel. However it may happen elsewhere. The question is what to do with them. I'm happy dropping anything about these writes and only dealing with Type C writes, but the 'SHOULD' is where I thought we got to before (I think I sent the quote from the mailing list). -- Alex Bligh
Re: [Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes
On Fr, 2016-04-01 at 11:31 +0200, Ladi Prosek wrote: > The write path for pass-through devices, commonly used for controlling > keyboard LEDs via EV_LED, was not implemented. This commit adds the > necessary plumbing to connect the status virtio queue to the host evdev > file descriptor. > > Signed-off-by: Ladi Prosek > --- > Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't > believe that this will be happening in practise for local evdev devices. Agree. > Of course, all bets are off if fd is an arbitrary file, It can't be, it would fail the evdev ioctls. > with the event stream). So part of me would just do one best-effort > write and ignore all errors. I'm curious what you think. Just do best-effort sounds reasonable to me. The code to handle EGAIN will just bitrot b/c it will never be used (and I guess it is untested too, or did you manage to provoke EGAIN somehow?) cheers, Gerd
Re: [Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes
On Fri, Apr 1, 2016 at 12:20 PM, Gerd Hoffmann wrote: > On Fr, 2016-04-01 at 11:31 +0200, Ladi Prosek wrote: >> The write path for pass-through devices, commonly used for controlling >> keyboard LEDs via EV_LED, was not implemented. This commit adds the >> necessary plumbing to connect the status virtio queue to the host evdev >> file descriptor. >> >> Signed-off-by: Ladi Prosek >> --- >> Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't >> believe that this will be happening in practise for local evdev devices. > > Agree. > >> Of course, all bets are off if fd is an arbitrary file, > > It can't be, it would fail the evdev ioctls. Ah! >> with the event stream). So part of me would just do one best-effort >> write and ignore all errors. I'm curious what you think. > > Just do best-effort sounds reasonable to me. Will do, thanks. > The code to handle EGAIN will just bitrot b/c it will never be used (and > I guess it is untested too, or did you manage to provoke EGAIN somehow?) if (rand() & 1) { return EAGAIN; } > cheers, > Gerd >
Re: [Qemu-devel] [PATCH] target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()
On 01/04/2016 05:52, David Gibson wrote: > This seems like the right minimal fix in the qemu-2.6 timeframe to fix > the actual bug. However, longer term it seems like the correct thing > to do might be to set kvm_vcpu_dirty early in the reset path. Thoughts? Isn't it done already? vl.c does: pause_all_vcpus(); cpu_synchronize_all_states(); qemu_system_reset(VMRESET_REPORT); resume_all_vcpus(); Thanks, Paolo > diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c > index 72c4ab5..caf41ce 100644 > --- a/target-ppc/mmu-hash64.c > +++ b/target-ppc/mmu-hash64.c > @@ -283,8 +283,6 @@ void ppc_hash64_set_external_hpt(PowerPCCPU *cpu, void > *hpt, int shift, > CPUPPCState *env = &cpu->env; > Error *local_err = NULL; > > -cpu_synchronize_state(CPU(cpu)); > - > if (hpt) { > env->external_htab = hpt; > } else {
Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property
On 2016/4/1 17:10, Wen Congyang wrote: On 04/01/2016 04:24 PM, Hailiang Zhang wrote: On 2016/4/1 15:39, Jason Wang wrote: On 04/01/2016 03:08 PM, zhanghailiang wrote: After commit 338d3f, we support 'status' property for filter object. The segfault can be triggered by starting qemu with 'status=off' property for filter, when the s->incoming_queue is NULL, we reference it directly in qemu_net_queue_flush(). Let's check the value of 's->incoming_queue' before calling qemu_net_queue_flush(). Signed-off-by: zhanghailiang --- net/filter-buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/filter-buffer.c b/net/filter-buffer.c index cc6bd94..79e2ce3 100644 --- a/net/filter-buffer.c +++ b/net/filter-buffer.c @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf) { FilterBufferState *s = FILTER_BUFFER(nf); -if (!qemu_net_queue_flush(s->incoming_queue)) { +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) { /* Unable to empty the queue, purge remaining packets */ qemu_net_queue_purge(s->incoming_queue, nf->netdev); } We'd better handle this at generic layer and don't let a specific net filter need to worry about this. Looks like the issue is we may trigger status_changed() too early (even before the the filter was initialized). Yes ~ How about not call status_changed() if the initialization is not done? But seems that it is difficult to confirm if the filter is initialized or not ... If nfc->setup() is not called, nf->netdev is NULL. Yes, you right, Jason, what's opinion ? Thanks, hailiang Thanks Wen Congyang . .
Re: [Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist
On 01/04/2016 11:50, Greg Kurz wrote: > > And I tend to agree with Cedric... unless Paolo has a bright idea of course. > :) > > Acked-by: Greg Kurz Yes, the patch is correct. Paolo
[Qemu-devel] [PATCH] doc/proto.md: restore formatting
Restore formatting and correct name of 'length' Signed-off-by: Alex Bligh --- doc/proto.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index 8c83382..03dfe2b 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -501,8 +501,8 @@ The following request types exist: extension was negotiated. If structured replies were not negotiated, the server MUST reply -with a simple reply header, followed immediately by len bytes of -data, read from offset bytes into the file, unless an error +with a simple reply header, followed immediately by *length* bytes +of data, read from *offset* bytes into the file, unless an error condition has occurred. If an error occurs, the server SHOULD set the appropriate error -- 1.9.1
Re: [Qemu-devel] [RFC PATCH v2.1 05/12] qdev: hotplug: Introduce HotplugHandler.pre_plug() callback
On 01/04/2016 05:30, David Gibson wrote: > On Thu, Mar 31, 2016 at 02:09:14PM +0530, Bharata B Rao wrote: >> From: Igor Mammedov >> >> pre_plug callback is to be called before device.realize() is executed. >> This would allow to check/set device's properties from HotplugHandler. >> >> Signed-off-by: Igor Mammedov >> Signed-off-by: Bharata B Rao > > Reviewed-by: David Gibson > > It would be really nice to get some opinion on this from Andreas or > Paolo. Certainly okay for me, Igor did all of the HotplugHandler design and work. Paolo >> --- >> hw/core/hotplug.c| 11 +++ >> hw/core/qdev.c | 9 - >> include/hw/hotplug.h | 14 +- >> 3 files changed, 32 insertions(+), 2 deletions(-) >> >> diff --git a/hw/core/hotplug.c b/hw/core/hotplug.c >> index 645cfca..17ac986 100644 >> --- a/hw/core/hotplug.c >> +++ b/hw/core/hotplug.c >> @@ -13,6 +13,17 @@ >> #include "hw/hotplug.h" >> #include "qemu/module.h" >> >> +void hotplug_handler_pre_plug(HotplugHandler *plug_handler, >> + DeviceState *plugged_dev, >> + Error **errp) >> +{ >> +HotplugHandlerClass *hdc = HOTPLUG_HANDLER_GET_CLASS(plug_handler); >> + >> +if (hdc->pre_plug) { >> +hdc->pre_plug(plug_handler, plugged_dev, errp); >> +} >> +} >> + >> void hotplug_handler_plug(HotplugHandler *plug_handler, >>DeviceState *plugged_dev, >>Error **errp) >> diff --git a/hw/core/qdev.c b/hw/core/qdev.c >> index db41aa1..a0b3aad 100644 >> --- a/hw/core/qdev.c >> +++ b/hw/core/qdev.c >> @@ -1062,6 +1062,14 @@ static void device_set_realized(Object *obj, bool >> value, Error **errp) >> g_free(name); >> } >> >> +hotplug_ctrl = qdev_get_hotplug_handler(dev); >> +if (hotplug_ctrl) { >> +hotplug_handler_pre_plug(hotplug_ctrl, dev, &local_err); >> +if (local_err != NULL) { >> +goto fail; >> +} >> +} >> + >> if (dc->realize) { >> dc->realize(dev, &local_err); >> } >> @@ -1072,7 +1080,6 @@ static void device_set_realized(Object *obj, bool >> value, Error **errp) >> >> DEVICE_LISTENER_CALL(realize, Forward, dev); >> >> -hotplug_ctrl = qdev_get_hotplug_handler(dev); >> if (hotplug_ctrl) { >> hotplug_handler_plug(hotplug_ctrl, dev, &local_err); >> } >> diff --git a/include/hw/hotplug.h b/include/hw/hotplug.h >> index 2db025d..50d84e9 100644 >> --- a/include/hw/hotplug.h >> +++ b/include/hw/hotplug.h >> @@ -46,7 +46,8 @@ typedef void (*hotplug_fn)(HotplugHandler *plug_handler, >> * hardware (un)plug functions. >> * >> * @parent: Opaque parent interface. >> - * @plug: plug callback. >> + * @pre_plug: pre plug callback called at start of device.realize(true) >> + * @plug: plug callback called at end of device.realize(true). >> * @unplug_request: unplug request callback. >> * Used as a means to initiate device unplug for devices >> that >> * require asynchronous unplug handling. >> @@ -59,6 +60,7 @@ typedef struct HotplugHandlerClass { >> InterfaceClass parent; >> >> /* */ >> +hotplug_fn pre_plug; >> hotplug_fn plug; >> hotplug_fn unplug_request; >> hotplug_fn unplug; >> @@ -74,6 +76,16 @@ void hotplug_handler_plug(HotplugHandler *plug_handler, >>Error **errp); >> >> /** >> + * hotplug_handler_pre_plug: >> + * >> + * Call #HotplugHandlerClass.pre_plug callback of @plug_handler. >> + */ >> +void hotplug_handler_pre_plug(HotplugHandler *plug_handler, >> + DeviceState *plugged_dev, >> + Error **errp); >> + >> + >> +/** >> * hotplug_handler_unplug_request: >> * >> * Calls #HotplugHandlerClass.unplug_request callback of @plug_handler. > signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH] doc/memory: update MMIO section
There is no memory_region_io(). And remove a stray '-'. Signed-off-by: Cao jin --- docs/memory.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/memory.txt b/docs/memory.txt index 97134e1..fddc0d9 100644 --- a/docs/memory.txt +++ b/docs/memory.txt @@ -37,7 +37,7 @@ MemoryRegion): - MMIO: a range of guest memory that is implemented by host callbacks; each read or write causes a callback to be called on the host. - You initialize these with memory_region_io(), passing it a MemoryRegionOps + You initialize these with memory_region_init_io(), passing it a MemoryRegionOps structure describing the callbacks. - ROM: a ROM memory region works like RAM for reads (directly accessing @@ -186,7 +186,7 @@ of its own subregions: D of size 0x1000 at offset 0 and E of size 0x1000 at offset 0x2000. As a diagram: 0 1000 2000 3000 4000 5000 6000 70008000 -|--|--|--|--|--|--|--|---| +|--|--|--|--|--|--|--|--| A:[ ] C:[] B: [ ] -- 2.1.0
[Qemu-devel] [PATCHv2 2/2] Correct definition of NBD_CMD_FLAG_FUA
NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but 1<<16 in nbd.h. The code currently treats the command as a 32 bit quantity and masks this off. This is confusing. Until such time as the code is fixed up, make it obvious this isn't really bit 16. Signed-off-by: Alex Bligh --- nbd.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/nbd.h b/nbd.h index f2a32dd..732c605 100644 --- a/nbd.h +++ b/nbd.h @@ -38,7 +38,8 @@ enum { }; #define NBD_CMD_MASK_COMMAND 0x -#define NBD_CMD_FLAG_FUA (1<<16) +#define NBD_CMD_SHIFT (16) +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT) /* values for flags field */ #define NBD_FLAG_HAS_FLAGS (1 << 0)/* Flags are there */ -- 1.9.1
[Qemu-devel] [PATCHv2 1/2] proto.md: Clearly set out NBDMAGIC is the actual value
Clearly set out NBDMAGIC, not the name of a constant equal to some value. Set out the value in hex as well. Document the newstyle magic number is "IHAVEOPT". Signed-off-by: Alex Bligh --- doc/proto.md | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index 03dfe2b..8376021 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -67,7 +67,8 @@ newstyle negotiation. Oldstyle negotiation -S: 64 bits, `NBDMAGIC` (also known as the `INIT_PASSWD`) +S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (also known as + the `INIT_PASSWD`) S: 64 bits, `0x00420281861253` (`cliserv_magic`, a magic number) S: 64 bits, size of the export in bytes (unsigned) S: 32 bits, flags @@ -96,8 +97,10 @@ production purposes. The initial few exchanges in newstyle negotiation look as follows: -S: 64 bits, `NBDMAGIC` (as in the old style handshake) -S: 64 bits, `0x49484156454F5054` (note different magic number) +S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (as in the old + style handshake) +S: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note different + magic number) S: 16 bits, handshake flags C: 32 bits, flags @@ -113,7 +116,8 @@ At this point, we move on to option haggling, during which point the client can send one or (in fixed newstyle) more options to the server. The generic format of setting an option is as follows: -C: 64 bits, `0x49484156454F5054` (note same newstyle handshake's magic number) +C: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note same + newstyle handshake's magic number) C: 32 bits, option C: 32 bits, length of option data (unsigned) C: any data needed for the chosen option, of length as specified above. -- 1.9.1
Re: [Qemu-devel] [PATCH 3/3] nbd: Reject unknown request flags
On 31/03/2016 23:20, Eric Blake wrote: > The NBD protocol says that clients should not send a command flag > that has not been negotiated (whether by the client requesting an > option during a handshake, or because we advertise support for the > flag in response to NBD_OPT_EXPORT_NAME), and that servers should > reject invalid flags with EINVAL. We were silently ignoring the > flags instead. The client can't rely on our behavior, since it is > their fault for passing the bad flag in the first place, but it's > better to be robust up front than to possibly behave differently > than the client was expecting with the attempted flag. > > Signed-off-by: Eric Blake > --- > nbd/server.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/nbd/server.c b/nbd/server.c > index a590773..31bd9c5 100644 > --- a/nbd/server.c > +++ b/nbd/server.c > @@ -974,6 +974,10 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, > struct nbd_request *reque > goto out; > } > > +if (request->flags & ~NBD_CMD_FLAG_FUA) { > +LOG("unsupported flags (got 0x%x)", request->flags); > +return -EINVAL; > +} > if ((request->from + request->len) < request->from) { > LOG("integer overflow detected! " > "you're probably being attacked"); > Queued for 2.6. Paolo
Re: [Qemu-devel] [PATCH v2 0/2] Fixing non-blocking operation of chardevs
On 31/03/2016 17:29, Daniel P. Berrange wrote: > This fixes socket chardevs to always be in non-blocking > mode, as they were before the QIOChannel conversion. The > second patch was already posted before, but dropped when > Peter discovered a problem on OS-X causing ahci-test to > hang: > > https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05807.html > > I traced this down to broken EAGAIN handling affecting > OS-X, hence the first patch in this series. > > Changed in v2: > > - Also fix qemu_chr_fe_write_log method > > Daniel P. Berrange (2): > char: fix broken EAGAIN retry on OS-X due to errno clobbering > char: ensure all clients are in non-blocking mode > > qemu-char.c | 39 --- > 1 file changed, 20 insertions(+), 19 deletions(-) > Thanks, queued for the next pull request (next week). Paolo
Re: [Qemu-devel] [PATCH 2/3] nbd: Fix poor debug message
On 31/03/2016 23:20, Eric Blake wrote: > The client sends messages to the server, not itself. > > Signed-off-by: Eric Blake > --- > nbd/client.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/nbd/client.c b/nbd/client.c > index 5af9f26..d95ad7a 100644 > --- a/nbd/client.c > +++ b/nbd/client.c > @@ -635,7 +635,7 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct > nbd_request *request) > cpu_to_be64w((uint64_t *)(buf + 16), request->from); > cpu_to_be32w((uint32_t *)(buf + 24), request->len); > > -TRACE("Sending request to client: " > +TRACE("Sending request to server: " >"{ .from = %" PRIu64", .len = %u, .handle = %" PRIu64 >", .flags=%x, .type=%i}", >request->from, request->len, request->handle, request->flags, > (Rebased and) queued for 2.6. Paolo
Re: [Qemu-devel] [PATCH] doc/memory: update MMIO section
On 01/04/2016 12:47, Cao jin wrote: > There is no memory_region_io(). And remove a stray '-'. > > Signed-off-by: Cao jin > --- > docs/memory.txt | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/docs/memory.txt b/docs/memory.txt > index 97134e1..fddc0d9 100644 > --- a/docs/memory.txt > +++ b/docs/memory.txt > @@ -37,7 +37,7 @@ MemoryRegion): > > - MMIO: a range of guest memory that is implemented by host callbacks; >each read or write causes a callback to be called on the host. > - You initialize these with memory_region_io(), passing it a MemoryRegionOps > + You initialize these with memory_region_init_io(), passing it a > MemoryRegionOps >structure describing the callbacks. > > - ROM: a ROM memory region works like RAM for reads (directly accessing > @@ -186,7 +186,7 @@ of its own subregions: D of size 0x1000 at offset 0 and E > of size 0x1000 at > offset 0x2000. As a diagram: > > 0 1000 2000 3000 4000 5000 6000 70008000 > -|--|--|--|--|--|--|--|---| > +|--|--|--|--|--|--|--|--| >A:[ ] >C:[] >B: [ ] > Queued for 2.6, thanks. Paolo
Re: [Qemu-devel] Ballooning on TPS!=HPS hosts
CC'ing virtualization list. On (Thu) 31 Mar 2016 [19:00:24], Dr. David Alan Gilbert wrote: > Hi, > I was reading the balloon code and am confused as to how/if ballooning > works on hosts where the host page size is larger than the > target page size. > > static void balloon_page(void *addr, int deflate) > { > #if defined(__linux__) > if (!qemu_balloon_is_inhibited() && (!kvm_enabled() || > kvm_has_sync_mmu())) { > qemu_madvise(addr, TARGET_PAGE_SIZE, > deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED); > } > #endif > } > > The virtio-balloon code only does stuff through ballon_page, > and an madvise DONTNEED should fail if you try and do it on > a size smaller than the host page size. So does ballooning work on > Power/ARM? > > Am I misunderstanding this? I think you're right. Guess no one's tested this in such scenarios yet. > Of course looking at the above we won't actually generate an error since > we don't check the return of qemu_madvise. ... at least we can deflate the balloon in case the madvise fails, so the guest can use the pages it's given us. > We have three sizes: > a) host page size > b) target page size > c) VIRTIO_BALLOON_PFN_SHIFT > > c == 12 (4k) for everyone > > > 1) I think the virtio-balloon code needs to coallesce adjecent requests > and call balloon_page on whole chunks at once passing a length. > 2) why does balloon_page use TARGET_PAGE_SIZE, ignoring anything else >shouldn't it be 1 << VIRTIO_BALLOON_PFN_SHIFT ? > 3) I'm guessing the guest kernel doesn't know the host page size, so >how can it know what size chunks of balloon to work in? Thanks, Amit
Re: [Qemu-devel] [RFC Design Doc]Speed up live migration by skipping free pages
On (Tue) 22 Mar 2016 [19:05:31], Dr. David Alan Gilbert wrote: > * Liang Li (liang.z...@intel.com) wrote: > > b. Implement a new virtio device > > Implementing a brand new virtio device to exchange information > > between host and guest is another choice. It requires modifying the > > virtio spec too. > > If the right solution is to change the spec then we should do it; > we shouldn't use a technically worse solution just to avoid the spec > change; although we have to be even more careful to get the right > solution if we want to change the spec. Yes, absolutely. I didn't mean to suggest virtio-serial because it'll help us save with modifying specs. I suggested it because for me, the most important point was that if a guest is sensitive to latencies or spikes, the balloon driver is definitely going to be not installed by the guest admin. (I'm still not caught up on this thread, but wanted to clarify this right away.) Amit
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fr, 2016-04-01 at 11:17 +0100, Richard W.M. Jones wrote: > On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote: > > Hi, > > > > > I wonder how we can make use of this in qemu and downstream distros? > > > Can we have a bios-min.bin which is used with -kernel boots? > > > > We already build two seabios roms: one full featued and one slightly > > stripped down to keep it below 128k, for backward compatibility with old > > machine types. > > > > Adding a third config for -kernel boot should be easy. For that use > > case we can probably also turn on seabios logging to the serial console > > and drop sgabios. We don't need input (no boot menu) and we also don't > > need to hook into int10 (no grub/ipxe using that for output). > > SeaBIOS logging is slow (or more likely, serial output is slow). > And sgabios is useful for debugging. Ah, I see. debuglevel=1 prints too much and debuglevel=0 has no logging at all. We'd need seabios log at least the version banner to serial even with debuglevel=0 to replace sgabios. https://www.kraxel.org/cgit/qemu/log/?h=work/stripped-seabios Not hooked into -kernel yet, so you have to use "-bios bios-kboot.bin". Comments? cheers, Gerd
Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
On 3/29/2016 5:58 PM, Michael S. Tsirkin wrote: > On Mon, Mar 28, 2016 at 09:46:05AM +0530, Jitendra Kolhe wrote: >> While measuring live migration performance for qemu/kvm guest, it >> was observed that the qemu doesn’t maintain any intelligence for the >> guest ram pages which are released by the guest balloon driver and >> treat such pages as any other normal guest ram pages. This has direct >> impact on overall migration time for the guest which has released >> (ballooned out) memory to the host. >> >> In case of large systems, where we can configure large guests with 1TB >> and with considerable amount of memory release by balloon driver to the, >> host the migration time gets worse. >> >> The solution proposed below is local only to qemu (and does not require >> any modification to Linux kernel or any guest driver). We have verified >> the fix for large guests =1TB on HPE Superdome X (which can support up >> to 240 cores and 12TB of memory) and in case where 90% of memory is >> released by balloon driver the migration time for an idle guests reduces >> to ~600 sec's from ~1200 sec’s. >> >> Detail: During live migration, as part of 1st iteration in ram_save_iterate() >> -> ram_find_and_save_block () will try to migrate ram pages which are >> released by vitrio-balloon driver as part of dynamic memory delete. >> Even though the pages which are returned to the host by virtio-balloon >> driver are zero pages, the migration algorithm will still end up >> scanning the entire page ram_find_and_save_block() -> ram_save_page/ >> ram_save_compressed_page -> save_zero_page() -> is_zero_range(). We >> also end-up sending some control information over network for these >> page during migration. This adds to total migration time. >> >> The proposed fix, uses the existing bitmap infrastructure to create >> a virtio-balloon bitmap. The bits in the bitmap represent a guest ram >> page of size 1UL<< VIRTIO_BALLOON_PFN_SHIFT. The bitmap represents >> entire guest ram memory till max configured memory. Guest ram pages >> claimed by the virtio-balloon driver will be represented by 1 in the >> bitmap. During live migration, each guest ram page (host VA offset) >> is checked against the virtio-balloon bitmap, if the bit is set the >> corresponding ram page will be excluded from scanning and sending >> control information during migration. The bitmap is also migrated to >> the target as part of every ram_save_iterate loop and after the >> guest is stopped remaining balloon bitmap is migrated as part of >> balloon driver save / load interface. > > Migrating the bitmap might help a chained migration case > but will slow down the more typical case a bit. > Make it optional? > > Disabling migration of balloon bitmap would disable the optimization permanently for all other subsequent migrations. The current implementation sends offsets and lengths of 1s in the bitmap. In cases where we see highly fragmented bitmap which may add to the bitmap migration time, but still it would be less compared to total migration time. Here are some values from my test setup for an idle guest with 16GB memory ballooned down to 4GB memory (with no ballooning operation in progress during migration). - Total migration time without proposed patch set - ~7600ms - Total migration time with proposed patch set - ~5700ms . Adds overhead of testing against bitmap - ~320ms . Adds overhead of sending the bitmap - ~1.6ms - Saves zero page scan + protocol time - ~2348ms We may see some higher pct for time taken to migrate bitmap in case ballooning is in progress during migration. The patch does introduces an option to enable/disable testing ram_addr against balloon bitmap during migration (since that’s the major overhead), but we still end-up migrating the bitmap by default. >> >> With the proposed fix, the average migration time for an idle guest >> with 1TB maximum memory and 64vCpus >> - reduces from ~1200 secs to ~600 sec, with guest memory ballooned >>down to 128GB (~10% of 1TB). >> - reduces from ~1300 to ~1200 sec (7%), with guest memory ballooned >>down to 896GB (~90% of 1TB), >> - with no ballooning configured, we don’t expect to see any impact >>on total migration time. >> >> The optimization gets temporarily disabled, if the balloon operation is >> in progress. Since the optimization skips scanning and migrating control >> information for ballooned out pages, we might skip guest ram pages in >> cases where the guest balloon driver has freed the ram page to the guest >> but not yet informed the host/qemu about the ram page >> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we >> might skip migrating ram pages which the guest is using. Since this >> problem is specific to balloon leak, we can restrict balloon operation in >> progress check to only balloon leak operation in progress check. >> >> The optimization also get permanently disabled (for all subsequent >> migrations) in case any of th
Re: [Qemu-devel] tcg: reworking tb_invalidated_flag
Sergey Fedorov writes: > On 31/03/16 16:37, Alex Bennée wrote: >> Sergey Fedorov writes: >>> Looks like no matter which approach we use, it's ultimately necessary to >>> ensure all CPUs have exited from translated code before the translation >>> buffer may be safely flushed. >> One approach would be to have multiple translation contexts with their >> own buffers and then you can safely flush TBs if no vCPUs are currently >> executing in those regions. But I suspect that is a much more complex >> future optimisation. > > Yes, this is much more complex and its performance impact should be > investigated. > >> Having said that is it safe to flush TBs from a given page if we know >> no vCPUs are currently executing in that page? As the execution loop has >> to exit the chained TBs as we cross page boundaries we could just keep >> account of which vCPUs are currently in which page. > > It should be safe to invalidate a TB while some other CPU is executing > its translated code. But it should be guaranteed that no CPU execute any > old TB after tb_flush() since we're going to start reusing those TBs. > > I see how TB cannot be patched if it spans two pages, is there any on > when TCG goto_tb can be generated? Do you mean tcg_gen_goto_tb? AFAIUI all blocks end with goto_tb post-ambles but they should only directly jump to another TB if they are in the same page. -- Alex Bennée
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On Fri, Apr 01, 2016 at 01:07:55PM +0200, Gerd Hoffmann wrote: > On Fr, 2016-04-01 at 11:17 +0100, Richard W.M. Jones wrote: > > On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote: > > > Hi, > > > > > > > I wonder how we can make use of this in qemu and downstream distros? > > > > Can we have a bios-min.bin which is used with -kernel boots? > > > > > > We already build two seabios roms: one full featued and one slightly > > > stripped down to keep it below 128k, for backward compatibility with old > > > machine types. > > > > > > Adding a third config for -kernel boot should be easy. For that use > > > case we can probably also turn on seabios logging to the serial console > > > and drop sgabios. We don't need input (no boot menu) and we also don't > > > need to hook into int10 (no grub/ipxe using that for output). > > > > SeaBIOS logging is slow (or more likely, serial output is slow). > > And sgabios is useful for debugging. > > Ah, I see. debuglevel=1 prints too much and debuglevel=0 has no logging > at all. We'd need seabios log at least the version banner to serial > even with debuglevel=0 to replace sgabios. > > https://www.kraxel.org/cgit/qemu/log/?h=work/stripped-seabios > > Not hooked into -kernel yet, so you have to use "-bios bios-kboot.bin". > > Comments? I think we were working on the same thing ... Attached is my version. Note that you must enable at least CONFIG_MPTABLE else virtio-scsi does not work in the guest. I also enabled ACPI & SMBIOS & PIRTABLE. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org >From 2c9b45d9d03c05573e1dbfc35e6750b127896be6 Mon Sep 17 00:00:00 2001 From: "Richard W.M. Jones" Date: Fri, 1 Apr 2016 11:35:09 +0100 Subject: [PATCH] bios: Add fast variant of SeaBIOS for use with -kernel on x86. This commit adds a fast variant of SeaBIOS called 'bios-fast.bin'. It's designed to be the fastest (also the smallest, but that's not the main aim) SeaBIOS that is just enough to boot a Linux kernel using the -kernel option on i686 and x86_64. This commit does not modify the -kernel option to use this. You have to specify it by doing something like this: -kernel vmlinuz -bios bios-fast.bin Signed-off-by: Richard W.M. Jones --- Makefile | 3 ++- roms/Makefile| 4 +++- roms/config.seabios-fast | 12 3 files changed, 17 insertions(+), 2 deletions(-) create mode 100644 roms/config.seabios-fast diff --git a/Makefile b/Makefile index 1d076a9..c4e939d 100644 --- a/Makefile +++ b/Makefile @@ -389,7 +389,8 @@ common de-ch es fo fr-ca hu ja mk nl-be pt sl tr \ bepocz ifdef INSTALL_BLOBS -BLOBS=bios.bin bios-256k.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \ +BLOBS=bios.bin bios-256k.bin bios-fast.bin \ +sgabios.bin vgabios.bin vgabios-cirrus.bin \ vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin vgabios-virtio.bin \ acpi-dsdt.aml \ ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin QEMU,cgthree.bin \ diff --git a/roms/Makefile b/roms/Makefile index 7bd1252..26b0586 100644 --- a/roms/Makefile +++ b/roms/Makefile @@ -61,9 +61,11 @@ default: @echo " slof -- update slof.bin" @echo " u-boot.e500-- update u-boot.e500" -bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k +bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k \ + build-seabios-config-seabios-fast cp seabios/builds/seabios-128k/bios.bin ../pc-bios/bios.bin cp seabios/builds/seabios-256k/bios.bin ../pc-bios/bios-256k.bin + cp seabios/builds/seabios-fast/bios.bin ../pc-bios/bios-fast.bin seavgabios: $(patsubst %,seavgabios-%,$(vgabios_variants)) diff --git a/roms/config.seabios-fast b/roms/config.seabios-fast new file mode 100644 index 000..98a4c27 --- /dev/null +++ b/roms/config.seabios-fast @@ -0,0 +1,12 @@ +# The most fastest SeaBIOS that can boot Linux using -kernel. +CONFIG_USB=n +CONFIG_DRIVES=n +CONFIG_KEYBOARD=n +CONFIG_MOUSE=n +CONFIG_WRITABLE_UPPERMEMORY=y +CONFIG_TCGBIOS=n +CONFIG_PIRTABLE=y +CONFIG_MPTABLE=y +CONFIG_SMBIOS=y +CONFIG_ACPI=y +CONFIG_DEBUG_LEVEL=0 -- 2.7.4
[Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait
From: Marc-André Lureau If the client socket has the 'reconnect' option, make sure the 'wait' option is also used. That way, an initial connection will be ensured before the VM start and the virtio device is configured. Signed-off-by: Marc-André Lureau --- net/vhost-user.c | 12 1 file changed, 12 insertions(+) diff --git a/net/vhost-user.c b/net/vhost-user.c index 1b9e73a..9007d0b 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -27,6 +27,8 @@ typedef struct VhostUserState { typedef struct VhostUserChardevProps { bool is_socket; bool is_unix; +bool is_reconnect; +bool is_wait; } VhostUserChardevProps; VHostNetState *vhost_user_get_vhost_net(NetClientState *nc) @@ -239,6 +241,10 @@ static int net_vhost_chardev_opts(void *opaque, } else if (strcmp(name, "path") == 0) { props->is_unix = true; } else if (strcmp(name, "server") == 0) { +} else if (strcmp(name, "reconnect") == 0) { +props->is_reconnect = true; +} else if (strcmp(name, "wait") == 0) { +props->is_wait = true; } else { error_setg(errp, "vhost-user does not support a chardev with option %s=%s", @@ -271,6 +277,12 @@ static CharDriverState *net_vhost_parse_chardev( return NULL; } +if (props.is_reconnect && !props.is_wait) { +error_setg(errp, "chardev \"%s\" must also 'wait' with 'reconnect'", + opts->chardev); +return NULL; +} + qemu_chr_fe_claim_no_fail(chr); return chr; -- 2.5.5
[Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event
From: Marc-André Lureau This message is an information rather than an error. Use a trace event instead. This allows using reconnect in tests without extra error messages. Signed-off-by: Marc-André Lureau --- qemu-char.c | 4 ++-- trace-events | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index 270819a..a944ee0 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -40,6 +40,7 @@ #include "io/channel-file.h" #include "io/channel-tls.h" #include "sysemu/replay.h" +#include "trace.h" #include @@ -2639,8 +2640,7 @@ static void check_report_connect_error(CharDriverState *chr, TCPCharDriver *s = chr->opaque; if (!s->connect_err_reported) { -error_report("Unable to connect character device %s: %s", - chr->label, error_get_pretty(err)); +trace_char_socket_reconnect_error(chr->label, error_get_pretty(err)); s->connect_err_reported = true; } qemu_chr_socket_restart_timer(chr); diff --git a/trace-events b/trace-events index 996a77f..19e0a05 100644 --- a/trace-events +++ b/trace-events @@ -689,6 +689,9 @@ grlib_apbuart_readl_unknown(uint64_t addr) "addr 0x%"PRIx64 leon3_set_irq(int intno) "Set CPU IRQ %d" leon3_reset_irq(int intno) "Reset CPU IRQ %d" +# qemu-char.c +char_socket_reconnect_error(const char *label, const char *err) "Unable to connect character device %s: %s" + # spice-qemu-char.c spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d" spice_vmc_read(int bytes, int len) "spice read %d of requested %d" -- 2.5.5
[Qemu-devel] [PATCH 01/18] tests: append i386 tests
From: Marc-André Lureau Do not overwrite x86-64 tests, re-enable vhost-user-test. Signed-off-by: Marc-André Lureau --- tests/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/Makefile b/tests/Makefile index 45b9048..9293d49 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -222,7 +222,7 @@ endif check-qtest-i386-y += tests/test-netfilter$(EXESUF) check-qtest-i386-y += tests/test-filter-mirror$(EXESUF) check-qtest-i386-y += tests/test-filter-redirector$(EXESUF) -check-qtest-x86_64-y = $(check-qtest-i386-y) +check-qtest-x86_64-y += $(check-qtest-i386-y) gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c gcov-files-x86_64-y = $(subst i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y)) check-qtest-mips-y = tests/endianness-test$(EXESUF) -- 2.5.5
[Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection
From: Marc-André Lureau Hi This is a follow-up on previous RFC allowing the slave to request a "managed" shutdown and reconnect later. A new optional communication socket is added for the slave to make request to the master (since vhost-user protocol isn't bidirectional) The initial connection must be made before the guest is started for the feature negotiation to complete. In "server" mode, qemu waits before starting the VM. However, in "client" mode with "reconnect", you have to specify "wait". This will wait for the initial connection before starting the VM (in contrast with the "nowait"+backend features proposed by Tetsuya [1]). In order to do a clean shutdown, the slave should flush all pending buffers so that after VHOST_SET_VRING_BASE, it is enough to resume. The guest is made aware of virtio-net disconnection thanks to VIRTIO_NET_S_LINK_UP status, which is reflected by a link-down on the nic. RFCv2: - rebased, added a few preliminary patches - fix a few mistakes in shutdown message recv/send - enforce "wait" when using "reconnect" - save & restore features & check backend compatibility - save & restore the ring state - add shutdown support to vhost-user-bridge Testing: - the vhost-user-test has a simple reconnect test. - vhost-user-bridge can be used to run test interactively: 1) Run a slirp/vlan in a background process: $ qemu -net none -net socket,vlan=0,udp=localhost:,localaddr=localhost: -net user,vlan=0 2) Start vubr (it'll use the slirp/vlan process above by default): $ tests/vhost-user-bridge 3) Start qemu with vhost-user / virtio-net: $ qemu ... -chardev socket,id=char0,path=/tmp/vubr.sock,reconnect=1,wait -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,netdev=mynet1 4) Play in the guest, interrupt (ctlr-c) vubr, check nic link status, restart vubr, etc.. [1] in a previous series "Add feature to start QEMU without vhost-user backend", Tetsuya Mukawa proposed to allow the vhost-user backend to disconnect and reconnect. However, Michael Tsirkin pointed out that you can't do that without extra care, because the guest and hypervisor don't know the slave ring manipulation state, there might be pending replies for example that could be lost, and suggested to reset the guest queues, but this requires kernel changes, and it may have to clear the ring and lose queued packets. He also introduced a new option to specify backend features on qemu command line to be able to boot the VM without waiting for the backend. I consider this a seperate enhancement. Marc-André Lureau (16): tests: append i386 tests char: lower reconnect error to trace event char: use a trace for when the char is waiting char: add wait support for reconnect vhost-user: check reconnect comes with wait vhost: add vhost_dev stop callback vhost-user: add vhost_user to hold the chr vhost-user: add slave-fd support vhost-user: add shutdown support vhost-user: disconnect on start failure vhost-net: do not crash if backend is not present vhost-net: save & restore vhost-user acked features vhost-net: save & restore vring enable state test: vubr check vring enable state test: start vhost-user reconnect test test: add shutdown support vubr test Tetsuya Mukawa (2): vhost-user: add ability to know vhost-user backend disconnection qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd docs/specs/vhost-user.txt | 38 ++ hw/net/vhost_net.c| 53 - hw/virtio/vhost-user.c| 104 +- include/hw/virtio/vhost.h | 4 + include/net/net.h | 1 + include/net/vhost-user.h | 1 + include/net/vhost_net.h | 3 + include/sysemu/char.h | 7 ++ net/vhost-user.c | 44 ++- qemu-char.c | 46 +--- tests/Makefile| 4 +- tests/vhost-user-bridge.c | 112 ++-- tests/vhost-user-test.c | 184 ++ trace-events | 4 + 14 files changed, 564 insertions(+), 41 deletions(-) -- 2.5.5
[Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting
From: Marc-André Lureau Similarly to the reconnect error, this is an information message rather than an error. Use a trace event instead. Signed-off-by: Marc-André Lureau --- qemu-char.c | 3 +-- trace-events | 1 + 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index a944ee0..8702931 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -4434,8 +4434,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, } s->listen_ioc = sioc; if (is_waitconnect) { -fprintf(stderr, "QEMU waiting for connection on: %s\n", -chr->filename); +trace_char_socket_waiting(chr->filename); tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr); } qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL); diff --git a/trace-events b/trace-events index 19e0a05..e59c0f5 100644 --- a/trace-events +++ b/trace-events @@ -691,6 +691,7 @@ leon3_reset_irq(int intno) "Reset CPU IRQ %d" # qemu-char.c char_socket_reconnect_error(const char *label, const char *err) "Unable to connect character device %s: %s" +char_socket_waiting(const char *filename) "QEMU waiting for connection on: %s" # spice-qemu-char.c spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d" -- 2.5.5
[Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback
From: Marc-André Lureau vhost backend may want to stop the device, for example if it wants to restart itself (translates to a link down for vhost-net). Signed-off-by: Marc-André Lureau --- hw/net/vhost_net.c| 14 ++ include/hw/virtio/vhost.h | 4 2 files changed, 18 insertions(+) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 6e1032f..1e4710d 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -131,6 +131,18 @@ static int vhost_net_get_fd(NetClientState *backend) } } +static void vhost_net_backend_stop(struct vhost_dev *dev) +{ +struct vhost_net *net = container_of(dev, struct vhost_net, dev); +NetClientState *nc = net->nc; +NetClientState *peer = nc->peer; + +peer->link_down = 1; +if (peer->info->link_status_changed) { +peer->info->link_status_changed(peer); +} +} + struct vhost_net *vhost_net_init(VhostNetOptions *options) { int r; @@ -165,6 +177,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) net->dev.vq_index = net->nc->queue_index * net->dev.nvqs; } +net->dev.stop = vhost_net_backend_stop; + r = vhost_dev_init(&net->dev, options->opaque, options->backend_type); if (r < 0) { diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index b60d758..859be64 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -35,6 +35,8 @@ struct vhost_log { vhost_log_chunk_t *log; }; +typedef void (*vhost_stop)(struct vhost_dev *dev); + struct vhost_memory; struct vhost_dev { MemoryListener memory_listener; @@ -61,6 +63,8 @@ struct vhost_dev { void *opaque; struct vhost_log *log; QLIST_ENTRY(vhost_dev) entry; +/* backend request to stop */ +vhost_stop stop; }; int vhost_dev_init(struct vhost_dev *hdev, void *opaque, -- 2.5.5
[Qemu-devel] [PATCH 04/18] char: add wait support for reconnect
From: Marc-André Lureau Until now, 'wait' was solely used for listening sockets. However, it can also be useful for 'reconnect' socket kind, where the first open must succeed before continuing. This allows for instance (with other support patches) having vhost-user wait for an initial connection to setup the vhost-net, before eventually accepting new connections. Signed-off-by: Marc-André Lureau --- qemu-char.c | 31 +++ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/qemu-char.c b/qemu-char.c index 8702931..3e25c08 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -3659,7 +3659,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend, Error **errp) { bool is_listen = qemu_opt_get_bool(opts, "server", false); -bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true); +bool is_wait= qemu_opt_get_bool(opts, "wait", is_listen); bool is_telnet = qemu_opt_get_bool(opts, "telnet", false); bool do_nodelay = !qemu_opt_get_bool(opts, "delay", true); int64_t reconnect = qemu_opt_get_number(opts, "reconnect", 0); @@ -3696,7 +3696,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend, sock->has_telnet = true; sock->telnet = is_telnet; sock->has_wait = true; -sock->wait = is_waitconnect; +sock->wait = is_wait; sock->has_reconnect = true; sock->reconnect = reconnect; sock->tls_creds = g_strdup(tls_creds); @@ -4350,7 +4350,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, bool do_nodelay = sock->has_nodelay ? sock->nodelay : false; bool is_listen = sock->has_server ? sock->server : true; bool is_telnet = sock->has_telnet ? sock->telnet : false; -bool is_waitconnect = sock->has_wait? sock->wait: false; +bool is_wait= sock->has_wait? sock->wait: false; int64_t reconnect = sock->has_reconnect ? sock->reconnect : 0; ChardevCommon *common = qapi_ChardevSocket_base(sock); QIOChannelSocket *sioc = NULL; @@ -4424,7 +4424,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, } sioc = qio_channel_socket_new(); -if (s->reconnect_time) { +if (s->reconnect_time && !is_wait) { qio_channel_socket_connect_async(sioc, s->addr, qemu_chr_socket_connected, chr, NULL); @@ -4433,7 +4433,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, goto error; } s->listen_ioc = sioc; -if (is_waitconnect) { +if (is_wait) { trace_char_socket_waiting(chr->filename); tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr); } @@ -4443,9 +4443,24 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, NULL); } } else { -if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) { -goto error; -} +do { +Error *err = NULL; + +if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) { +if (reconnect) { +trace_char_socket_reconnect_error(chr->label, + error_get_pretty(err)); +error_free(err); +continue; +} else { +error_propagate(errp, err); +goto error; +} +} else { +break; +} +} while (true); + tcp_chr_new_client(chr, sioc); object_unref(OBJECT(sioc)); } -- 2.5.5
[Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test
From: Marc-André Lureau Check that SLAVE_FD & SHUTDOWN message work and that the master asked for the ring read status. Signed-off-by: Marc-André Lureau --- tests/Makefile | 2 +- tests/vhost-user-test.c | 184 2 files changed, 170 insertions(+), 16 deletions(-) diff --git a/tests/Makefile b/tests/Makefile index 9293d49..8e4807e 100644 --- a/tests/Makefile +++ b/tests/Makefile @@ -577,7 +577,7 @@ tests/usb-hcd-uhci-test$(EXESUF): tests/usb-hcd-uhci-test.o $(libqos-usb-obj-y) tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y) tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y) tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o -tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) $(test-io-obj-y) +tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o qemu-timer.o $(qtest-obj-y) $(test-io-obj-y) $(libqos-pc-obj-y) $(libqos-virtio-obj-y) tests/qemu-iotests/socket_scm_helper$(EXESUF): tests/qemu-iotests/socket_scm_helper.o tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y) tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o $(test-block-obj-y) diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c index 6961596..17ebfd1 100644 --- a/tests/vhost-user-test.c +++ b/tests/vhost-user-test.c @@ -21,6 +21,12 @@ #include #include #include +#include "libqos/pci-pc.h" +#include "libqos/virtio.h" +#include "libqos/virtio-pci.h" +#include "libqos/malloc.h" +#include "libqos/malloc-pc.h" +#include "libqos/malloc-generic.h" /* GLIB version compatibility flags */ #if !GLIB_CHECK_VERSION(2, 26, 0) @@ -34,7 +40,7 @@ #define QEMU_CMD_ACCEL " -machine accel=tcg" #define QEMU_CMD_MEM" -m %d -object memory-backend-file,id=mem,size=%dM,"\ "mem-path=%s,share=on -numa node,memdev=mem" -#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s" +#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s" #define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce" #define QEMU_CMD_NET" -device virtio-net-pci,netdev=net0,romfile=./pc-bios/pxe-virtio.rom" @@ -49,6 +55,7 @@ #define VHOST_USER_F_PROTOCOL_FEATURES 30 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 +#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL 3 #define VHOST_LOG_PAGE 0x1000 @@ -71,9 +78,16 @@ typedef enum VhostUserRequest { VHOST_USER_GET_PROTOCOL_FEATURES = 15, VHOST_USER_SET_PROTOCOL_FEATURES = 16, VHOST_USER_SET_VRING_ENABLE = 18, +VHOST_USER_SET_SLAVE_FD = 20, VHOST_USER_MAX } VhostUserRequest; +typedef enum VhostUserSlaveRequest { +VHOST_USER_SLAVE_NONE = 0, +VHOST_USER_SLAVE_SHUTDOWN = 1, +VHOST_USER_SLAVE_MAX +} VhostUserSlaveRequest; + typedef struct VhostUserMemoryRegion { uint64_t guest_phys_addr; uint64_t memory_size; @@ -119,6 +133,8 @@ static VhostUserMsg m __attribute__ ((unused)); /* The version of the protocol we support */ #define VHOST_USER_VERSION(0x1) + +#define VIRTIO_QUEUE_MAX 1024 /*/ typedef struct TestServer { @@ -133,6 +149,8 @@ typedef struct TestServer { GCond data_cond; int log_fd; uint64_t rings; +int slave_fd; +uint16_t set_base[VIRTIO_QUEUE_MAX]; } TestServer; #if !GLIB_CHECK_VERSION(2, 32, 0) @@ -269,7 +287,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size) /* send back features to qemu */ msg.flags |= VHOST_USER_REPLY_MASK; msg.size = sizeof(m.payload.u64); -msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD; +msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD | +1 << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL; p = (uint8_t *) &msg; qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size); break; @@ -324,6 +343,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int size) case VHOST_USER_SET_VRING_BASE: assert(msg.payload.state.index < 2); s->rings |= 0x1ULL << msg.payload.state.index; +s->set_base[msg.payload.state.index] = msg.payload.state.num; +break; + +case VHOST_USER_SET_SLAVE_FD: +qemu_chr_fe_get_msgfds(chr, &s->slave_fd, 1); +g_cond_signal(&s->data_cond); break; default: @@ -360,7 +385,7 @@ static const char *init_hugepagefs(const char *path) return path; } -static TestServer *test_server_new(const gchar *name) +static TestServer *test_server_new(const gchar *name, const gchar *chr_opt) { TestServer *server = g_new0(TestServer, 1); gchar *chr_path; @@ -368,7 +393,8 @@ static TestServer *test_server_new(const gchar *name) server->socket_path = g_strdup_printf("%s/%s.sock", tmpfs, name); server->mig_path = g_strdup_printf("%s/%s.mig", tmpfs, name); -chr_p
[Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
From: Tetsuya Mukawa The patch introduces qemu_chr_disconnect(). The function is used for closing a fd accepted by listen fd. Though we already have qemu_chr_delete(), but it closes not only accepted fd but also listen fd. This new function is used when we still want to keep listen fd. Signed-off-by: Tetsuya Mukawa Reviewed-by: Marc-André Lureau --- include/sysemu/char.h | 7 +++ qemu-char.c | 8 2 files changed, 15 insertions(+) diff --git a/include/sysemu/char.h b/include/sysemu/char.h index 307fd8f..2c39987 100644 --- a/include/sysemu/char.h +++ b/include/sysemu/char.h @@ -75,6 +75,7 @@ struct CharDriverState { IOReadHandler *chr_read; void *handler_opaque; void (*chr_close)(struct CharDriverState *chr); +void (*chr_disconnect)(struct CharDriverState *chr); void (*chr_accept_input)(struct CharDriverState *chr); void (*chr_set_echo)(struct CharDriverState *chr, bool echo); void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open); @@ -143,6 +144,12 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon *backend); */ CharDriverState *qemu_chr_new(const char *label, const char *filename, void (*init)(struct CharDriverState *s)); +/** + * @qemu_chr_disconnect: + * + * Close a fd accpeted by character backend. + */ +void qemu_chr_disconnect(CharDriverState *chr); /** * @qemu_chr_new_noreplay: diff --git a/qemu-char.c b/qemu-char.c index 3e25c08..e249b0a 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -4011,6 +4011,13 @@ void qemu_chr_fe_release(CharDriverState *s) s->avail_connections++; } +void qemu_chr_disconnect(CharDriverState *chr) +{ +if (chr->chr_disconnect) { +chr->chr_disconnect(chr); +} +} + static void qemu_chr_free_common(CharDriverState *chr) { g_free(chr->filename); @@ -4404,6 +4411,7 @@ static CharDriverState *qmp_chardev_open_socket(const char *id, chr->chr_write = tcp_chr_write; chr->chr_sync_read = tcp_chr_sync_read; chr->chr_close = tcp_chr_close; +chr->chr_disconnect = tcp_chr_disconnect; chr->get_msgfds = tcp_get_msgfds; chr->set_msgfds = tcp_set_msgfds; chr->chr_add_client = tcp_chr_add_client; -- 2.5.5
[Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection
From: Tetsuya Mukawa Current QEMU cannot detect vhost-user backend disconnection. The patch adds ability to know it. To know disconnection, add watcher to detect G_IO_HUP event. When G_IO_HUP event is detected, the disconnected socket will be read to cause a CHR_EVENT_CLOSED. Signed-off-by: Tetsuya Mukawa Reviewed-by: Marc-André Lureau --- net/vhost-user.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/net/vhost-user.c b/net/vhost-user.c index 9007d0b..3dae53c 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -22,6 +22,7 @@ typedef struct VhostUserState { NetClientState nc; CharDriverState *chr; VHostNetState *vhost_net; +int watch; } VhostUserState; typedef struct VhostUserChardevProps { @@ -169,6 +170,20 @@ static NetClientInfo net_vhost_user_info = { .has_ufo = vhost_user_has_ufo, }; +static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond, + void *opaque) +{ +VhostUserState *s = opaque; +uint8_t buf[1]; + +/* We don't actually want to read anything, but CHR_EVENT_CLOSED will be + * raised as a side-effect of the read. + */ +qemu_chr_fe_read_all(s->chr, buf, sizeof(buf)); + +return FALSE; +} + static void net_vhost_user_event(void *opaque, int event) { const char *name = opaque; @@ -186,6 +201,8 @@ static void net_vhost_user_event(void *opaque, int event) trace_vhost_user_event(s->chr->label, event); switch (event) { case CHR_EVENT_OPENED: +s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP, + net_vhost_user_watch, s); if (vhost_user_start(queues, ncs) < 0) { exit(1); } @@ -194,6 +211,8 @@ static void net_vhost_user_event(void *opaque, int event) case CHR_EVENT_CLOSED: qmp_set_link(name, false, &err); vhost_user_stop(queues, ncs); +g_source_remove(s->watch); +s->watch = 0; break; } -- 2.5.5
[Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr
From: Marc-André Lureau Next patches will add more fields to the structure Signed-off-by: Marc-André Lureau --- hw/virtio/vhost-user.c | 19 --- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 5914e85..02ac1fc 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -108,6 +108,10 @@ static VhostUserMsg m __attribute__ ((unused)); /* The version of the protocol we support */ #define VHOST_USER_VERSION(0x1) +struct vhost_user { +CharDriverState *chr; +}; + static bool ioeventfd_enabled(void) { return kvm_enabled() && kvm_eventfds_enabled(); @@ -115,7 +119,8 @@ static bool ioeventfd_enabled(void) static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg) { -CharDriverState *chr = dev->opaque; +struct vhost_user *u = dev->opaque; +CharDriverState *chr = u->chr; uint8_t *p = (uint8_t *) msg; int r, size = VHOST_USER_HDR_SIZE; @@ -176,7 +181,8 @@ static bool vhost_user_one_time_request(VhostUserRequest request) static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg, int *fds, int fd_num) { -CharDriverState *chr = dev->opaque; +struct vhost_user *u = dev->opaque; +CharDriverState *chr = u->chr; int size = VHOST_USER_HDR_SIZE + msg->size; /* @@ -510,11 +516,14 @@ static int vhost_user_reset_device(struct vhost_dev *dev) static int vhost_user_init(struct vhost_dev *dev, void *opaque) { uint64_t features; +struct vhost_user *u; int err; assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); -dev->opaque = opaque; +u = g_new0(struct vhost_user, 1); +u->chr = opaque; +dev->opaque = u; err = vhost_user_get_features(dev, &features); if (err < 0) { @@ -559,8 +568,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) static int vhost_user_cleanup(struct vhost_dev *dev) { +struct vhost_user *u; + assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER); +u = dev->opaque; +g_free(u); dev->opaque = 0; return 0; -- 2.5.5
[Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present
From: Marc-André Lureau Do not crash when backend is not present while enabling the ring. A following patch will save the enabled state so it can be restored once the backend is started. Signed-off-by: Marc-André Lureau --- hw/net/vhost_net.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 1e4710d..6c40c4e 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -415,8 +415,13 @@ VHostNetState *get_vhost_net(NetClientState *nc) int vhost_set_vring_enable(NetClientState *nc, int enable) { VHostNetState *net = get_vhost_net(nc); -const VhostOps *vhost_ops = net->dev.vhost_ops; +const VhostOps *vhost_ops; + +if (!net) { +return 0; +} +vhost_ops = net->dev.vhost_ops; if (vhost_ops->vhost_set_vring_enable) { return vhost_ops->vhost_set_vring_enable(&net->dev, enable); } -- 2.5.5
[Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support
From: Marc-André Lureau Learn to give a socket to the slave to let him make requests to the master. Signed-off-by: Marc-André Lureau --- docs/specs/vhost-user.txt | 23 hw/virtio/vhost-user.c| 69 +++ 2 files changed, 92 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 0312d40..8a635fa 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -248,12 +248,25 @@ Once the source has finished migration, rings will be stopped by the source. No further update must be done before rings are restarted. +Slave communication +--- + +An optional communication channel is provided if the slave +declares VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL feature, to allow +the slave to make requests to the master. + +The fd is provided via VHOST_USER_SET_SLAVE_FD ancillary data. + +A slave may then send VHOST_USER_SLAVE_* messages to the master +using this fd communication channel. + Protocol features - #define VHOST_USER_PROTOCOL_F_MQ 0 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1 #define VHOST_USER_PROTOCOL_F_RARP 2 +#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL 3 Message types - @@ -464,3 +477,13 @@ Message types is present in VHOST_USER_GET_PROTOCOL_FEATURES. The first 6 bytes of the payload contain the mac address of the guest to allow the vhost user backend to construct and broadcast the fake RARP. + + * VHOST_USER_SET_SLAVE_FD + Id: 20 + Equivalent ioctl: N/A + Master payload: N/A + + Set the file descriptor for the salve to make VHOST_USER_SLAVE_* + request to the master. It is passed in the ancillary data. + This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL + feature is available. diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 02ac1fc..890c1bd 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -32,6 +32,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -59,9 +60,16 @@ typedef enum VhostUserRequest { VHOST_USER_GET_QUEUE_NUM = 17, VHOST_USER_SET_VRING_ENABLE = 18, VHOST_USER_SEND_RARP = 19, +VHOST_USER_SET_SLAVE_FD = 20, + VHOST_USER_MAX } VhostUserRequest; +typedef enum VhostUserSlaveRequest { +VHOST_USER_SLAVE_NONE = 0, +VHOST_USER_SLAVE_MAX +} VhostUserSlaveRequest; + typedef struct VhostUserMemoryRegion { uint64_t guest_phys_addr; uint64_t memory_size; @@ -110,6 +118,7 @@ static VhostUserMsg m __attribute__ ((unused)); struct vhost_user { CharDriverState *chr; +int slave_fd; }; static bool ioeventfd_enabled(void) @@ -513,6 +522,55 @@ static int vhost_user_reset_device(struct vhost_dev *dev) return 0; } +static void slave_read(void *opaque) +{ +struct vhost_dev *dev = opaque; +struct vhost_user *u = dev->opaque; +VhostUserMsg msg = { 0, }; +int size; + +size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE); +if (size != VHOST_USER_HDR_SIZE) { +error_report("Failed to read from slave."); +qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL); +close(u->slave_fd); +u->slave_fd = -1; +return; +} + +switch (msg.request) { +default: +error_report("Received unexpected msg type."); +} +} + +static int vhost_setup_slave_channel(struct vhost_dev *dev) +{ +VhostUserMsg msg = { +.request = VHOST_USER_SET_SLAVE_FD, +.flags = VHOST_USER_VERSION, +}; +struct vhost_user *u = dev->opaque; +int sv[2]; + +if (!virtio_has_feature(dev->protocol_features, +VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL)) { +return 0; +} + +if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) { +error_report("socketpair() failed"); +return -1; +} + +u->slave_fd = sv[0]; +qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev); + +vhost_user_write(dev, &msg, &sv[1], 1); + +return 0; +} + static int vhost_user_init(struct vhost_dev *dev, void *opaque) { uint64_t features; @@ -523,6 +581,7 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) u = g_new0(struct vhost_user, 1); u->chr = opaque; +u->slave_fd = -1; dev->opaque = u; err = vhost_user_get_features(dev, &features); @@ -563,6 +622,12 @@ static int vhost_user_init(struct vhost_dev *dev, void *opaque) "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature."); } + +err = vhost_setup_slave_channel(dev); +if (err < 0) { +return err; +} + return 0; } @@ -573,6 +638,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev) assert(dev->vhost_ops->backend_type == VHOST_BACKEN
[Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support
From: Marc-André Lureau Signed-off-by: Marc-André Lureau --- docs/specs/vhost-user.txt | 15 +++ hw/virtio/vhost-user.c| 16 2 files changed, 31 insertions(+) diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt index 8a635fa..60d6d13 100644 --- a/docs/specs/vhost-user.txt +++ b/docs/specs/vhost-user.txt @@ -487,3 +487,18 @@ Message types request to the master. It is passed in the ancillary data. This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL feature is available. + +Slave message types +--- + + * VHOST_USER_SLAVE_SHUTDOWN: + Id: 1 + Master payload: N/A + Slave payload: u64 + + Request the master to shutdown the slave. A 0 reply is for + success, in which case the slave may close all connections + immediately and quit. A non-zero reply cancels the request. + + Before a reply comes, the master may make other requests in + order to flush or sync state. diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 890c1bd..f91baee 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -67,6 +67,8 @@ typedef enum VhostUserRequest { typedef enum VhostUserSlaveRequest { VHOST_USER_SLAVE_NONE = 0, +VHOST_USER_SLAVE_SHUTDOWN = 1, + VHOST_USER_SLAVE_MAX } VhostUserSlaveRequest; @@ -539,6 +541,20 @@ static void slave_read(void *opaque) } switch (msg.request) { +case VHOST_USER_SLAVE_SHUTDOWN: { +uint64_t success = 1; /* 0 is for success */ +if (dev->stop) { +dev->stop(dev); +success = 0; +} +msg.payload.u64 = success; +msg.size = sizeof(msg.payload.u64); +size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0); +if (size != VHOST_USER_HDR_SIZE + msg.size) { +error_report("Failed to write reply."); +} +break; +} default: error_report("Received unexpected msg type."); } -- 2.5.5
[Qemu-devel] [PATCH 16/18] test: vubr check vring enable state
From: Marc-André Lureau The rings shouldn't be processed unless previously enabled. Signed-off-by: Marc-André Lureau --- tests/vhost-user-bridge.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c index 0779ba2..42450a6 100644 --- a/tests/vhost-user-bridge.c +++ b/tests/vhost-user-bridge.c @@ -710,7 +710,8 @@ vubr_backend_recv_cb(int sock, void *ctx) int buflen = sizeof(buf); int len; -if (!dev->ready) { +if (!dev->ready || !rx_vq->enable) { +DPRINT("\n NOT READY: dev: %d, rx: %d\n", dev->ready, rx_vq->enable); return; } @@ -747,8 +748,11 @@ vubr_kick_cb(int sock, void *ctx) if (rc == -1) { vubr_die("eventfd_read()"); } else { -DPRINT("Got kick_data: %016"PRIx64"\n", kick_data); -vubr_process_avail(dev, &dev->vq[1]); +DPRINT("Got kick_data: %016"PRIx64", vq enabled: %d\n", + kick_data, dev->vq[1].enable); +if (dev->vq[1].enable) { +vubr_process_avail(dev, &dev->vq[1]); +} } } -- 2.5.5
[Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state
From: Marc-André Lureau A driver may change the vring enable state at run time but vhost-user backend may not be present (a contrived example is when the backend is disconnected and the device is reconfigured after driver rebinding) Restore the vring state when the vhost-user backend is started, so it can process the ring. Signed-off-by: Marc-André Lureau --- hw/net/vhost_net.c | 11 +++ include/net/net.h | 1 + 2 files changed, 12 insertions(+) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index ee39e9c..a56a794 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -343,6 +343,15 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs, if (r < 0) { goto err_start; } + +if (ncs[i].peer->vring_enable) { +/* restore vring enable state */ +r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable); + +if (r < 0) { +goto err_start; +} +} } return 0; @@ -436,6 +445,8 @@ int vhost_set_vring_enable(NetClientState *nc, int enable) VHostNetState *net = get_vhost_net(nc); const VhostOps *vhost_ops; +nc->vring_enable = enable; + if (!net) { return 0; } diff --git a/include/net/net.h b/include/net/net.h index 73e4c46..2c4b23a 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -92,6 +92,7 @@ struct NetClientState { NetClientDestructor *destructor; unsigned int queue_index; unsigned rxfilter_notify_enabled:1; +int vring_enable; QTAILQ_HEAD(NetFilterHead, NetFilterState) filters; }; -- 2.5.5
[Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure
From: Marc-André Lureau If the backend failed to start (for example feature negociation failed), do not exit, but disconnect the char device instead. Signed-off-by: Marc-André Lureau --- net/vhost-user.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/vhost-user.c b/net/vhost-user.c index 3dae53c..54849e9 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -204,7 +204,8 @@ static void net_vhost_user_event(void *opaque, int event) s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP, net_vhost_user_watch, s); if (vhost_user_start(queues, ncs) < 0) { -exit(1); +qemu_chr_disconnect(s->chr); +return; } qmp_set_link(name, true, &err); break; -- 2.5.5
[Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features
From: Marc-André Lureau The initial vhost-user connection sets the features to be negotiated with the driver. Renegotiation isn't possible without device reset. To handle reconnection of vhost-user backend, ensure the same set of features are provided, and reuse already acked features. Signed-off-by: Marc-André Lureau --- hw/net/vhost_net.c | 21 - include/net/vhost-user.h | 1 + include/net/vhost_net.h | 3 +++ net/vhost-user.c | 10 ++ 4 files changed, 34 insertions(+), 1 deletion(-) diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index 6c40c4e..ee39e9c 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -120,6 +120,11 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net) return net->dev.max_queues; } +uint64_t vhost_net_get_acked_features(VHostNetState *net) +{ +return net->dev.acked_features; +} + static int vhost_net_get_fd(NetClientState *backend) { switch (backend->info->type) { @@ -148,6 +153,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) int r; bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL; struct vhost_net *net = g_malloc(sizeof *net); +uint64_t features = 0; if (!options->net_backend) { fprintf(stderr, "vhost-net requires net backend to be setup\n"); @@ -197,8 +203,21 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) goto fail; } } + /* Set sane init value. Override when guest acks. */ -vhost_net_ack_features(net, 0); +if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) { +features = vhost_user_get_acked_features(net->nc); +if (~net->dev.features & features) { +fprintf(stderr, "vhost lacks feature mask %" PRIu64 +" for backend\n", +(uint64_t)(~net->dev.features & features)); +vhost_dev_cleanup(&net->dev); +goto fail; +} +} + +vhost_net_ack_features(net, features); + return net; fail: g_free(net); diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h index 85109f6..efae35d 100644 --- a/include/net/vhost-user.h +++ b/include/net/vhost-user.h @@ -13,5 +13,6 @@ struct vhost_net; struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc); +uint64_t vhost_user_get_acked_features(NetClientState *nc); #endif /* VHOST_USER_H_ */ diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h index 3389b41..0bd4877 100644 --- a/include/net/vhost_net.h +++ b/include/net/vhost_net.h @@ -31,4 +31,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* mac_addr); VHostNetState *get_vhost_net(NetClientState *nc); int vhost_set_vring_enable(NetClientState * nc, int enable); + +uint64_t vhost_net_get_acked_features(VHostNetState *net); + #endif diff --git a/net/vhost-user.c b/net/vhost-user.c index 54849e9..ae63e20 100644 --- a/net/vhost-user.c +++ b/net/vhost-user.c @@ -23,6 +23,7 @@ typedef struct VhostUserState { CharDriverState *chr; VHostNetState *vhost_net; int watch; +uint64_t acked_features; } VhostUserState; typedef struct VhostUserChardevProps { @@ -39,6 +40,13 @@ VHostNetState *vhost_user_get_vhost_net(NetClientState *nc) return s->vhost_net; } +uint64_t vhost_user_get_acked_features(NetClientState *nc) +{ +VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc); +assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER); +return s->acked_features; +} + static int vhost_user_running(VhostUserState *s) { return (s->vhost_net) ? 1 : 0; @@ -58,6 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[]) } if (s->vhost_net) { +/* save acked features */ +s->acked_features = vhost_net_get_acked_features(s->vhost_net); vhost_net_cleanup(s->vhost_net); s->vhost_net = NULL; } -- 2.5.5
[Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test
From: Marc-André Lureau The bridge can now be interrupted with ctrl-c. Once the slave channel is up, it will request a shutdown, and wait for success reply to exit. Signed-off-by: Marc-André Lureau --- tests/vhost-user-bridge.c | 102 -- 1 file changed, 99 insertions(+), 3 deletions(-) diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c index 42450a6..ea123be 100644 --- a/tests/vhost-user-bridge.c +++ b/tests/vhost-user-bridge.c @@ -135,6 +135,9 @@ dispatcher_wait(Dispatcher *dispr, uint32_t timeout) int rc = select(dispr->max_sock + 1, &fdset, 0, 0, &tv); if (rc == -1) { +if (errno == EINTR) { +return 0; +} vubr_die("select"); } @@ -186,6 +189,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_MQ = 0, VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1, VHOST_USER_PROTOCOL_F_RARP = 2, +VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3, VHOST_USER_PROTOCOL_F_MAX }; @@ -213,9 +217,16 @@ typedef enum VhostUserRequest { VHOST_USER_GET_QUEUE_NUM = 17, VHOST_USER_SET_VRING_ENABLE = 18, VHOST_USER_SEND_RARP = 19, +VHOST_USER_SET_SLAVE_FD = 20, VHOST_USER_MAX } VhostUserRequest; +typedef enum VhostUserSlaveRequest { +VHOST_USER_SLAVE_NONE = 0, +VHOST_USER_SLAVE_SHUTDOWN = 1, +VHOST_USER_SLAVE_MAX +} VhostUserSlaveRequest; + typedef struct VhostUserMemoryRegion { uint64_t guest_phys_addr; uint64_t memory_size; @@ -288,6 +299,8 @@ typedef struct VubrDev { int ready; uint64_t features; int hdrlen; +int slave_fd; +bool shutdown_requested; } VubrDev; static const char *vubr_request_str[] = { @@ -311,7 +324,14 @@ static const char *vubr_request_str[] = { [VHOST_USER_GET_QUEUE_NUM] = "VHOST_USER_GET_QUEUE_NUM", [VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE", [VHOST_USER_SEND_RARP] = "VHOST_USER_SEND_RARP", -[VHOST_USER_MAX]= "VHOST_USER_MAX", +[VHOST_USER_SET_SLAVE_FD] = "VHOST_USER_SET_SLAVE_FD", +[VHOST_USER_MAX]= "VHOST_USER_MAX" +}; + +static const char *vubr_slave_request_str[] = { +[VHOST_USER_SLAVE_NONE] = "VHOST_USER_SLAVE_NONE", +[VHOST_USER_SLAVE_SHUTDOWN] = "VHOST_USER_SLAVE_SHUTDOWN", +[VHOST_USER_SLAVE_MAX] = "VHOST_USER_SLAVE_MAX" }; static void @@ -638,7 +658,7 @@ vubr_process_desc(VubrDev *dev, VubrVirtq *vq) size_t buf_size = 4096; uint8_t buf[4096]; -DPRINT("Chunks: "); +DPRINT("Chunks: aidx:%d ", a_index); i = d_index; do { void *chunk_start = (void *)(uintptr_t)gpa_to_va(dev, desc[i].addr); @@ -1063,7 +1083,9 @@ vubr_set_vring_err_exec(VubrDev *dev, VhostUserMsg *vmsg) static int vubr_get_protocol_features_exec(VubrDev *dev, VhostUserMsg *vmsg) { -vmsg->payload.u64 = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD; +vmsg->payload.u64 = +1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD | +1ULL << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL; DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64); vmsg->size = sizeof(vmsg->payload.u64); @@ -1105,6 +1127,46 @@ vubr_send_rarp_exec(VubrDev *dev, VhostUserMsg *vmsg) return 0; } +static void +vubr_handle_slave_reply(VhostUserMsg *vmsg) +{ +DPRINT( +"== Vhost slave reply from QEMU ==\n"); +DPRINT("Request: %s (%d)\n", vubr_slave_request_str[vmsg->request], + vmsg->request); +DPRINT("Flags: 0x%x\n", vmsg->flags); +DPRINT("Size:%d\n", vmsg->size); + +switch (vmsg->request) { +case VHOST_USER_SLAVE_SHUTDOWN: +DPRINT("Shutdown success: 0x%016"PRIx64"\n", vmsg->payload.u64); +if (vmsg->payload.u64 == 0) { +exit(0); +} +default: +DPRINT("Invalid slave reply"); +}; +} + +static void +slave_receive_cb(int sock, void *ctx) +{ +VhostUserMsg vmsg; + +vubr_message_read(sock, &vmsg); +vubr_handle_slave_reply(&vmsg); +} + +static int +vubr_set_slave_fd_exec(VubrDev *dev, VhostUserMsg *vmsg) +{ +assert(vmsg->fd_num == 1); +dev->slave_fd = vmsg->fds[0]; +DPRINT("Got slave_fd: %d\n", vmsg->fds[0]); +dispatcher_add(&dev->dispatcher, dev->slave_fd, dev, slave_receive_cb); +return 0; +} + static int vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg) { @@ -1166,6 +1228,8 @@ vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg) return vubr_set_vring_enable_exec(dev, vmsg); case VHOST_USER_SEND_RARP: return vubr_send_rarp_exec(dev, vmsg); +case VHOST_USER_SET_SLAVE_FD: +return vubr_set_slave_fd_exec(dev, vmsg); case VHOST_USER_MAX: assert(vmsg->request != VHOST_USER_MAX); @@ -1226,6 +1290,7 @@ vubr_new(const char *path) }; } +dev->slave_fd = -1; /* Init log */ dev->log_call_fd = -1; dev->
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
On 01/04/2016 13:20, Richard W.M. Jones wrote: > +# MPTABLE is required by Linux kernel, the others add only a > +# couple of milliseconds so we might as well have them > +CONFIG_PIRTABLE=y > +CONFIG_MPTABLE=y > +CONFIG_SMBIOS=y > +CONFIG_ACPI=y > + MPTABLE is only required is you don't have ACPI. Paolo
Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.
On 3/29/2016 4:18 PM, Paolo Bonzini wrote: > > > On 29/03/2016 12:47, Jitendra Kolhe wrote: >>> Indeed. It is correct for the main system RAM, but hot-plugged RAM >>> would also have a zero-based section.offset_within_region. You need to >>> add memory_region_get_ram_addr(section.mr), just like the call to >>> balloon_page adds memory_region_get_ram_ptr(section.mr). >>> >>> Paolo >> >> I am only interested in the offset from memory region base. >> Would below guest PA to host offset work, as we do in >> address_space_translate_internal()? >> (Guest pa - section.offset_within_address_space + >> section.offset_within_region) > > Yes, that would work. But I'm not sure why you're not interested in the > ram_addr_t. > You are right, I was wrongly calculating offsets for hot-plugged RAMblocks. I will have to use qemu_ram_block_from_host() to get ram_addr_t to get correct offsets in the bitmap. > Paolo >
Re: [Qemu-devel] Why is SeaBIOS used with -kernel?
My patch, plus the configuration and comments from your patch, combined. Plus I tested it with libguestfs boot-analysis and it works and is still fast. Integrating this so it happens automatically when the user adds -kernel on x86 seems quite complicated. The only way I could do it was by adding #ifdef defined(__x86_64__) etc to vl.c, which doesn't seem very nice. The problem is the machine type code doesn't know that you're using -kernel. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html >From 28a442fdf8036bde075c75088f8dae8d8568243a Mon Sep 17 00:00:00 2001 From: "Richard W.M. Jones" Date: Fri, 1 Apr 2016 11:35:09 +0100 Subject: [PATCH] bios: Add fast variant of SeaBIOS for use with -kernel on x86. This commit adds a fast variant of SeaBIOS called 'bios-fast.bin'. It's designed to be the fastest (also the smallest, but that's not the main aim) SeaBIOS that is just enough to boot a Linux kernel using the -kernel option on i686 and x86_64. This commit does not modify the -kernel option to use this. You have to specify it by doing something like this: -kernel vmlinuz -bios bios-fast.bin Signed-off-by: Richard W.M. Jones --- Makefile | 3 ++- roms/Makefile| 4 +++- roms/config.seabios-fast | 32 3 files changed, 37 insertions(+), 2 deletions(-) create mode 100644 roms/config.seabios-fast diff --git a/Makefile b/Makefile index 1d076a9..c4e939d 100644 --- a/Makefile +++ b/Makefile @@ -389,7 +389,8 @@ common de-ch es fo fr-ca hu ja mk nl-be pt sl tr \ bepocz ifdef INSTALL_BLOBS -BLOBS=bios.bin bios-256k.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \ +BLOBS=bios.bin bios-256k.bin bios-fast.bin \ +sgabios.bin vgabios.bin vgabios-cirrus.bin \ vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin vgabios-virtio.bin \ acpi-dsdt.aml \ ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin QEMU,cgthree.bin \ diff --git a/roms/Makefile b/roms/Makefile index 7bd1252..26b0586 100644 --- a/roms/Makefile +++ b/roms/Makefile @@ -61,9 +61,11 @@ default: @echo " slof -- update slof.bin" @echo " u-boot.e500-- update u-boot.e500" -bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k +bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k \ + build-seabios-config-seabios-fast cp seabios/builds/seabios-128k/bios.bin ../pc-bios/bios.bin cp seabios/builds/seabios-256k/bios.bin ../pc-bios/bios-256k.bin + cp seabios/builds/seabios-fast/bios.bin ../pc-bios/bios-fast.bin seavgabios: $(patsubst %,seavgabios-%,$(vgabios_variants)) diff --git a/roms/config.seabios-fast b/roms/config.seabios-fast new file mode 100644 index 000..fbb1ef8 --- /dev/null +++ b/roms/config.seabios-fast @@ -0,0 +1,32 @@ +# The fastest SeaBIOS that can boot Linux using -kernel. + +# general stuff +CONFIG_QEMU=y +CONFIG_ROM_SIZE=128 +CONFIG_XEN=n +CONFIG_THREADS=n +CONFIG_WRITABLE_UPPERMEMORY=y + +# no input, no boot menu +CONFIG_MOUSE=n +CONFIG_KEYBOARD=n +CONFIG_BOOTMENU=n +CONFIG_BOOTSPLASH=n + +# hardware support we don't need +CONFIG_LPT=n +CONFIG_SERIAL=n +CONFIG_USB=n +CONFIG_DRIVES=n +CONFIG_TCGBIOS=n +CONFIG_VGAHOOKS=n + +# MPTABLE is required by Linux kernel, the others add only a +# couple of milliseconds so we might as well have them +CONFIG_PIRTABLE=y +CONFIG_MPTABLE=y +CONFIG_SMBIOS=y +CONFIG_ACPI=y + +# no logging +CONFIG_DEBUG_LEVEL=0 -- 2.7.4
Re: [Qemu-devel] tcg: reworking tb_invalidated_flag
On 01/04/16 14:11, Alex Bennée wrote: > Sergey Fedorov writes: > >> On 31/03/16 16:37, Alex Bennée wrote: >>> Sergey Fedorov writes: Looks like no matter which approach we use, it's ultimately necessary to ensure all CPUs have exited from translated code before the translation buffer may be safely flushed. >>> One approach would be to have multiple translation contexts with their >>> own buffers and then you can safely flush TBs if no vCPUs are currently >>> executing in those regions. But I suspect that is a much more complex >>> future optimisation. >> Yes, this is much more complex and its performance impact should be >> investigated. >> >>> Having said that is it safe to flush TBs from a given page if we know >>> no vCPUs are currently executing in that page? As the execution loop has >>> to exit the chained TBs as we cross page boundaries we could just keep >>> account of which vCPUs are currently in which page. >> It should be safe to invalidate a TB while some other CPU is executing >> its translated code. But it should be guaranteed that no CPU execute any >> old TB after tb_flush() since we're going to start reusing those TBs. >> >> I see how TB cannot be patched if it spans two pages, is there any on >> when TCG goto_tb can be generated? > Do you mean tcg_gen_goto_tb? > > AFAIUI all blocks end with goto_tb post-ambles but they should only > directly jump to another TB if they are in the same page. Thanks, I see the checks. Regards, Sergey