[Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property

2016-04-01 Thread zhanghailiang
After commit 338d3f, we support 'status' property for filter object.
The segfault can be triggered by starting qemu with 'status=off' property
for filter, when the s->incoming_queue is NULL, we reference it directly
in qemu_net_queue_flush().

Let's check the value of 's->incoming_queue' before calling
qemu_net_queue_flush().

Signed-off-by: zhanghailiang 
---
 net/filter-buffer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index cc6bd94..79e2ce3 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf)
 {
 FilterBufferState *s = FILTER_BUFFER(nf);
 
-if (!qemu_net_queue_flush(s->incoming_queue)) {
+if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) {
 /* Unable to empty the queue, purge remaining packets */
 qemu_net_queue_purge(s->incoming_queue, nf->netdev);
 }
-- 
1.8.3.1





Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property

2016-04-01 Thread Jason Wang


On 04/01/2016 03:08 PM, zhanghailiang wrote:
> After commit 338d3f, we support 'status' property for filter object.
> The segfault can be triggered by starting qemu with 'status=off' property
> for filter, when the s->incoming_queue is NULL, we reference it directly
> in qemu_net_queue_flush().
>
> Let's check the value of 's->incoming_queue' before calling
> qemu_net_queue_flush().
>
> Signed-off-by: zhanghailiang 
> ---
>  net/filter-buffer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
> index cc6bd94..79e2ce3 100644
> --- a/net/filter-buffer.c
> +++ b/net/filter-buffer.c
> @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf)
>  {
>  FilterBufferState *s = FILTER_BUFFER(nf);
>  
> -if (!qemu_net_queue_flush(s->incoming_queue)) {
> +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) {
>  /* Unable to empty the queue, purge remaining packets */
>  qemu_net_queue_purge(s->incoming_queue, nf->netdev);
>  }

We'd better handle this at generic layer and don't let a specific net
filter need to worry about this.

Looks like the issue is we may trigger status_changed() too early (even
before the the filter was initialized).

How about not call status_changed() if the initialization is not done?



Re: [Qemu-devel] [Qemu-stable] [ANNOUNCE] QEMU 2.5.1 Stable released

2016-04-01 Thread Peter Lieven
Am 30.03.2016 um 02:11 schrieb Michael Roth:
> Hi everyone,
>
> I am pleased to announce that the QEMU v2.5.1 stable release is now
> available at:
>
>   http://wiki.qemu.org/download/qemu-2.5.1.tar.bz2
>
> v2.5.1 is now tagged in the official qemu.git repository,
> and the stable-2.5 branch has been updated accordingly:
>
>   http://git.qemu.org/?p=qemu.git;a=shortlog;h=refs/heads/stable-2.5
>
> In addition to the normal array of general bug fixes, this release
> includes security fixes/hardening for USB, vmxnet3/e1000/ne2000 NICs,
> NIC checksumming, and management consoles via HMP. Users of earlier
> releases should upgrade accordingly.
>
> Note: For -M pseries-2.3 PPC guests, migration is now restored between
> QEMU 2.3.x and 2.5.1, but migration between 2.5.1 and any versions other
> than 2.3.x now require the -machine enforce-config-section=on option.
>
> Thank you to everyone involved!
>
> CHANGELOG:
>
> a58047f: Update version for 2.5.1 release (Michael Roth)
> 5f409b1: hyperv: cpu hotplug fix with HyperV enabled (Denis V. Lunev)
> 078de11: vmdk: Fix converting to streamOptimized (Fam Zheng)
> acea76c: vmdk: Create streamOptimized as version 3 (Fam Zheng)
> 80b6e57: usb: check USB configuration descriptor object (Prasad J Pandit)
> 9bddb45: usb: check RNDIS message length (Prasad J Pandit)
> e3a2cdf: usb: check RNDIS buffer offsets & length (Prasad J Pandit)
> 4dcd2f1: usb: check page select value while processing iTD (Prasad J Pandit)
> 38e0921: net: ne2000: fix bounds check in ioport operations (Prasad J Pandit)
> d0ee85b: net: check packet payload length (Prasad J Pandit)
> 4f046a6: ide: ahci: reset ncq object to unused on error (Prasad J Pandit)
> b47809c: i386: avoid null pointer dereference (P J P)
> 24fe899: hmp: fix sendkey out of bounds write (CVE-2015-8619) (Wolfgang 
> Bumiller)
> aaf4fb6: ahci: Do not unmap NULL addresses (John Snow)
> a2ae168: migration: allow machine to enforce configuration section migration 
> (Greg Kurz)
> bad094d: vl.c: Fix regression in machine error message (Marcel Apfelbaum)
> 4b0b1ec: quorum: Fix crash in quorum_aio_cb() (Alberto Garcia)
> cab1cc7: target-arm: Make reserved ranges in ID_AA64* spaces RAZ, not UNDEF 
> (Peter Maydell)
> 9ae0217: vhost-user: don't merge regions with different fds (Michael S. 
> Tsirkin)
> 3092979: fw_cfg: unbreak migration compatibility for 2.4 and earlier machines 
> (Laszlo Ersek)
> c5c9841: hw/virtio: group virtio flags into an enum (Marcel Apfelbaum)
> 6b62303: hw/virtio: fix double use of a virtio flag (Marcel Apfelbaum)
> c06f342: spapr: skip configuration section during migration of older machines 
> (Greg Kurz)
> cb873ea: e1000: eliminate infinite loops on out-of-bounds transfer start 
> (Laszlo Ersek)
> 4853a5a: block: qemu-iotests - add test for snapshot, commit, snapshot bug 
> (Jeff Cody)
> a375e0b: block: set device_list.tqe_prev to NULL on BDS removal (Jeff Cody)
> a38a283: qmp: Fix reference-counting of qnull on empty output visit (Eric 
> Blake)
> 225d50f: cpus: use broadcast on qemu_pause_cond (Dr. David Alan Gilbert)
> 020282d: fw_cfg: avoid calculating invalid current entry pointer (Gabriel L. 
> Somlo)
> 091af18: s390x/css: fix control flags during csch (Halil Pasic)
> d983923: s390x/ioinst: set type and len for SEI response (Pierre Morel)
> 643c8d8: block/raw-posix: avoid bogus fixup for cylinders on DASD disks 
> (Christian Borntraeger)
> 3ede27d: ehci: update irq on reset (Gerd Hoffmann)
> 9849b19: net: set endianness on all backend devices (Laurent Vivier)
> fe90bdc: net: ne2000: check ring buffer control registers (Prasad J Pandit)
> aaa5271: net/filter: fix nf->netdev_id leak (Li Zhijian)
> abda95c: net/dump: fix nfds->filename leak (Li Zhijian)
> 6a49a71: blockdev: Fix 'change' for slot devices (Max Reitz)
> e1a8a09: block: Add blk_dev_has_tray() (Max Reitz)
> 7a2c1c8: net: rocker: fix an incorrect array bounds check (Prasad J Pandit)
> 702a8d1: ivshmem: remove redundant assignment, fix crash with msi=off 
> (Marc-André Lureau)
> 3e96d5d: ivshmem: no need for opaque argument (Marc-André Lureau)
> 16a2875: scsi: initialise info object with appropriate size (P J P)
> 4588b0d: virtio-9p: use accessor to get thread_pool (Greg Kurz)
> ff083d3: xenfb: avoid reading twice the same fields from the shared page 
> (Stefano Stabellini)
> 4d59e78: xen/blkif: Avoid double access to src->nr_segments (Stefano 
> Stabellini)
> 52a7b27: configure: Fix shell syntax to placate OpenBSD's pdksh (Peter 
> Maydell)
> d4aed70: target-ppc: kvm: fix floating point registers sync on little-endian 
> hosts (Greg Kurz)
> 42ae4a3: net: vmxnet3: avoid memory leakage in activate_device (P J P)
> 0d33580: ehci: make idt processing more robust (Gerd Hoffmann)
>
>

Unfortunately, this release lacks the following patch:

target-i386: do not read/write MSR_TSC_AUX from KVM if CPUID bit is not set

without it any vServer with a Westmere or older vCPU will freeze with 100% CPU
on vmload / migration.

Peter




Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Paolo Bonzini


On 31/03/2016 22:17, Alex Bligh wrote:
>> > In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
>> > that's the same function it calls for write+FUA.
> That's harmless, but unnecessary in the sense that current documented
> behaviour doesn't require it. Perhaps it should?
> 
> I suppose TRIM etc. should support FUA too?

TRIM is an advisory operation, so it doesn't make sense to force access
to the medium.  The closest you could get would be to add FUA to
WRITE_ZEROES.  But since WRITE_ZEROES is not a particularly common
operation there isn't much to gain compared to FLUSHing after the write
has completed; in fact SCSI doesn't have a FUA bit on its WRITE SAME
command.

Paolo



Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Paolo Bonzini


On 31/03/2016 22:34, Eric Blake wrote:
> give me possibly stale data because I accessed
> the underlying storage rather than paying attention to in-flight writes
> that would change what I read".

Overlapping I/O is always unspecified, so you should expect stale data
if a read overlaps an in-flight write.  Not only there's no guarantee of
atomicity; the disk is a "safe register" rather than a "regular
register"
(http://stackoverflow.com/questions/8871633/whats-the-difference-between-safe-regular-and-atomic-registers)

The way QEMU does "flush first" for FUA on reads provides the guarantee
that "no one else can read something older than what I read", but that
assumes that no one else is doing an overlapping write.

Paolo



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Thu, Mar 31, 2016 at 06:44:32PM -0400, Kevin O'Connor wrote:
> On Thu, Mar 31, 2016 at 11:17:30PM +0100, Richard W.M. Jones wrote:
> > I'd dearly love to get rid of the sgabios option ROM.  It looks like
> > SeaBIOS nearly supports a full serial console now?
> 
> Last I checked, one could disable the option rom by adding "-device
> VGA,romfile=" to the qemu command line.

Sure, I can easily remove sgabios.  The problem is I still want my
serial console to show BIOS messages!

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] [PATCH] slirp: Fix migration from older versions of QEMU to the current one

2016-04-01 Thread Thomas Huth
On 01.04.2016 00:22, Samuel Thibault wrote:
> Samuel Thibault, on Fri 01 Apr 2016 00:00:43 +0200, wrote:
>> Just realizing... We'd need to add AF_INET6 cases here too, to be able
>> to save/restore a VM using ipv6 connections.
> 
> That seems quite involved however, maybe we should postpone that to
> later?

I think it's ok to do it later. The code there is only invoked when the
user specified the "guestfwd=..." option, so that's likely rather a rare
case, I think, which could be postponed to QEMU 2.7.

 Thomas




Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Thu, Mar 31, 2016 at 06:44:32PM -0400, Kevin O'Connor wrote:
> I think you'll find that if you compile out some features from
> SeaBIOS, it will be of a similar speed to that "minimal BIOS".  Try
> this:
> 
> cd /path/to/seabios/
> echo -e 
> 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0'
>  > .config
> make olddefconfig
> make
> 
> What time do you get with the above stripped down seabios (the
> generated bios is in out/bios.bin)?

It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs
during virtio-scsi sensing.  See attached.  Any idea what I need to
enable?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
./run test-tool/libguestfs-test-tool 
 
 *IMPORTANT NOTICE
 *
 * When reporting bugs, include the COMPLETE, UNEDITED
 * output below in your bug report.
 *
 
LIBGUESTFS_PATH=/home/rjones/d/libguestfs/appliance
LIBGUESTFS_CACHEDIR=/home/rjones/d/libguestfs/tmp
LD_LIBRARY_PATH=/home/rjones/d/libguestfs/ruby/ext/guestfs:/home/rjones/d/libguestfs/src/.libs:/home/rjones/d/libguestfs/java/.libs:/home/rjones/d/libguestfs/gobject/.libs
LIBGUESTFS_TMPDIR=/home/rjones/d/libguestfs/tmp
LIBGUESTFS_HV=/home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64
LIBGUESTFS_BACKEND=direct
PATH=/home/rjones/d/libguestfs/v2v:/home/rjones/d/libguestfs/tools:/home/rjones/d/libguestfs/test-tool:/home/rjones/d/libguestfs/sysprep:/home/rjones/d/libguestfs/sparsify:/home/rjones/d/libguestfs/resize:/home/rjones/d/libguestfs/rescue:/home/rjones/d/libguestfs/p2v:/home/rjones/d/libguestfs/make-fs:/home/rjones/d/libguestfs/inspector:/home/rjones/d/libguestfs/get-kernel:/home/rjones/d/libguestfs/fuse:/home/rjones/d/libguestfs/format:/home/rjones/d/libguestfs/fish:/home/rjones/d/libguestfs/erlang:/home/rjones/d/libguestfs/edit:/home/rjones/d/libguestfs/diff:/home/rjones/d/libguestfs/dib:/home/rjones/d/libguestfs/df:/home/rjones/d/libguestfs/customize:/home/rjones/d/libguestfs/cat:/home/rjones/d/libguestfs/builder:/home/rjones/d/libguestfs/align:/usr/lib64/ccache:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/home/rjones/.local/bin:/home/rjones/bin
XDG_RUNTIME_DIR=/run/user/1000
SELinux: Enforcing
guestfs_get_append: (null)
guestfs_get_autosync: 1
guestfs_get_backend: direct
guestfs_get_backend_settings: []
guestfs_get_cachedir: /home/rjones/d/libguestfs/tmp
guestfs_get_direct: 0
guestfs_get_hv: /home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /home/rjones/d/libguestfs/appliance
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_recovery_proc: 1
guestfs_get_selinux: 0
guestfs_get_smp: 1
guestfs_get_sockdir: /run/user/1000
guestfs_get_tmpdir: /home/rjones/d/libguestfs/tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: program=libguestfs-test-tool
libguestfs: launch: version=1.33.16
libguestfs: launch: backend registered: unix
libguestfs: launch: backend registered: uml
libguestfs: launch: backend registered: libvirt
libguestfs: launch: backend registered: direct
libguestfs: launch: backend=direct
libguestfs: launch: tmpdir=/home/rjones/d/libguestfs/tmp/libguestfsT8B1IH
libguestfs: launch: umask=0002
libguestfs: launch: euid=1000
libguestfs: begin building supermin appliance
libguestfs: run supermin
libguestfs: command: run: /usr/bin/supermin
libguestfs: command: run: \ --build
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ --if-newer
libguestfs: command: run: \ --lock 
/home/rjones/d/libguestfs/tmp/.guestfs-1000/lock
libguestfs: command: run: \ --copy-kernel
libguestfs: command: run: \ -f ext2
libguestfs: command: run: \ --host-cpu x86_64
libguestfs: command: run: \ /home/rjones/d/libguestfs/appliance/supermin.d
libguestfs: command: run: \ -o 
/home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d
supermin: version: 5.1.15
supermin: rpm: detected RPM version 4.13
supermin: package handler: fedora/rpm
supermin: acquiring lock on /home/rjones/d/libguestfs/tmp/.guestfs-1000/lock
supermin: if-newer: output does not need rebuilding
libguestfs: finished building supermin appliance
libguestfs: begin testing qemu features
libguestfs: command: run: /home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64
libguestfs: command: run: \ -display none
libguestfs: command: run: \ -help
libguestfs: qemu version 2.5
libguestfs: command: run: /home

Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 09:55, Richard W.M. Jones wrote:
>>> > > I'd dearly love to get rid of the sgabios option ROM.  It looks like
>>> > > SeaBIOS nearly supports a full serial console now?
>> > 
>> > Last I checked, one could disable the option rom by adding "-device
>> > VGA,romfile=" to the qemu command line.
> Sure, I can easily remove sgabios.  The problem is I still want my
> serial console to show BIOS messages!

What do you need it for?

Paolo



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:02, Richard W.M. Jones wrote:
>> > echo -e 
>> > 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0'
>> >  > .config
>> > make olddefconfig
>> > make
>> > 
>> > What time do you get with the above stripped down seabios (the
>> > generated bios is in out/bios.bin)?
> It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs
> during virtio-scsi sensing.  See attached.  Any idea what I need to
> enable?

At least ACPI (I would also add mptable and SMBIOS).

Paolo



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 10:11:46AM +0200, Paolo Bonzini wrote:
> 
> 
> On 01/04/2016 10:02, Richard W.M. Jones wrote:
> >> > echo -e 
> >> > 'CONFIG_USB=n\nCONFIG_DRIVES=n\nCONFIG_KEYBOARD=n\nCONFIG_MOUSE=n\nCONFIG_WRITABLE_UPPERMEMORY=y\nCONFIG_TCGBIOS=n\nCONFIG_PIRTABLE=n\nCONFIG_MPTABLE=n\nCONFIG_SMBIOS=n\nCONFIG_ACPI=n\nCONFIG_DEBUG_LEVEL=0'
> >> >  > .config
> >> > make olddefconfig
> >> > make
> >> > 
> >> > What time do you get with the above stripped down seabios (the
> >> > generated bios is in out/bios.bin)?
> > It's a bit unexpected: The kernel (not SeaBIOS) crashes or hangs
> > during virtio-scsi sensing.  See attached.  Any idea what I need to
> > enable?
> 
> At least ACPI (I would also add mptable and SMBIOS).

Found it: only CONFIG_MPTABLE=y was necessary.  It boots with:

# CONFIG_PIRTABLE is not set
CONFIG_MPTABLE=y
# CONFIG_SMBIOS is not set
# CONFIG_ACPI is not set

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



[Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!

2016-04-01 Thread Paolo Bonzini
Microsoft loves Linux, and Red Hat loves .NET.  Since we can put whatever
we want in the Hyper-V vendor signature, let's show some love too!

Cc: Andreas Färber 
Cc: Alex Williamson 
Cc: Denis V. Lunev 
Cc: Eduardo Habkost 
Cc: Roman Kagan 
Cc: KY Srinivasan 
Signed-off-by: Paolo Bonzini 
---
 target-i386/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 799fdfa..1968f04 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 c = &cpuid_data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
 if (!cpu->hyperv_vendor_id) {
-memcpy(signature, "Microsoft Hv", 12);
+memcpy(signature, "KVM<3HyperV!", 12);
 } else {
 size_t len = strlen(cpu->hyperv_vendor_id);
 
-- 
2.5.5




Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones

Minimal SeaBIOS (+ CONFIG_MPTABLE):
http://oirase.annexia.org/tmp/min-seabios.txt

Stock SeaBIOS from qemu:
http://oirase.annexia.org/tmp/stock-seabios.txt

Both files best viewed with `less -r'.

It does appear to considerably reduce SeaBIOS time.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:14, Richard W.M. Jones wrote:
>> > At least ACPI (I would also add mptable and SMBIOS).
> Found it: only CONFIG_MPTABLE=y was necessary.  It boots with:
> 
> # CONFIG_PIRTABLE is not set
> CONFIG_MPTABLE=y
> # CONFIG_SMBIOS is not set
> # CONFIG_ACPI is not set

If you add all three it should not give any slowdown and will provide
full hardware features to the kernel.  qboot does ACPI and PCI bus
assignment (it doesn't do SMBIOS because I got bored debugging it. :))

Paolo



[Qemu-devel] [PATCH] ui/virtio-gpu: add and use qemu_create_displaysurface_pixman

2016-04-01 Thread Gerd Hoffmann
Add a the new qemu_create_displaysurface_pixman function, to create
a DisplaySurface backed by an existing pixman image.  In that case
there is no need to create a new pixman image pointing to the same
backing storage.  We can just use the existing image directly.

This does not only simplify things a bit, but most importantly it
gets the reference counting right, so the backing storage for the
pixman image wouldn't be released underneath us.

Use new function in virtio-gpu, where using it actually fixes
use-after-free crashes.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Gerd Hoffmann 
---
 hw/display/virtio-gpu.c |  5 +
 include/ui/console.h|  1 +
 trace-events|  1 +
 ui/console.c| 11 +++
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index ddf3bfb..095d7ff 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -572,10 +572,7 @@ static void virtio_gpu_set_scanout(VirtIOGPU *g,
 scanout->width != ss.r.width ||
 scanout->height != ss.r.height) {
 /* realloc the surface ptr */
-scanout->ds = qemu_create_displaysurface_from
-(ss.r.width, ss.r.height, format,
- pixman_image_get_stride(res->image),
- (uint8_t *)pixman_image_get_data(res->image) + offset);
+scanout->ds = qemu_create_displaysurface_pixman(res->image);
 if (!scanout->ds) {
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 return;
diff --git a/include/ui/console.h b/include/ui/console.h
index eb9419d..d5a88d9 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -234,6 +234,7 @@ DisplayState *init_displaystate(void);
 DisplaySurface *qemu_create_displaysurface_from(int width, int height,
 pixman_format_code_t format,
 int linesize, uint8_t *data);
+DisplaySurface *qemu_create_displaysurface_pixman(pixman_image_t *image);
 DisplaySurface *qemu_create_displaysurface_guestmem(int width, int height,
 pixman_format_code_t 
format,
 int linesize,
diff --git a/trace-events b/trace-events
index 996a77f..8350743 100644
--- a/trace-events
+++ b/trace-events
@@ -1161,6 +1161,7 @@ console_select(int nr) "%d"
 console_refresh(int interval) "interval %d ms"
 displaysurface_create(void *display_surface, int w, int h) "surface=%p, %dx%d"
 displaysurface_create_from(void *display_surface, int w, int h, uint32_t 
format) "surface=%p, %dx%d, format 0x%x"
+displaysurface_create_pixman(void *display_surface) "surface=%p"
 displaysurface_free(void *display_surface) "surface=%p"
 displaychangelistener_register(void *dcl, const char *name) "%p [ %s ]"
 displaychangelistener_unregister(void *dcl, const char *name) "%p [ %s ]"
diff --git a/ui/console.c b/ui/console.c
index 1fd4ea4..bf38579 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1292,6 +1292,17 @@ DisplaySurface *qemu_create_displaysurface_from(int 
width, int height,
 return surface;
 }
 
+DisplaySurface *qemu_create_displaysurface_pixman(pixman_image_t *image)
+{
+DisplaySurface *surface = g_new0(DisplaySurface, 1);
+
+trace_displaysurface_create_pixman(surface);
+surface->format = pixman_image_get_format(image);
+surface->image = pixman_image_ref(image);
+
+return surface;
+}
+
 static void qemu_unmap_displaysurface_guestmem(pixman_image_t *image,
void *unused)
 {
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property

2016-04-01 Thread Hailiang Zhang

On 2016/4/1 15:39, Jason Wang wrote:



On 04/01/2016 03:08 PM, zhanghailiang wrote:

After commit 338d3f, we support 'status' property for filter object.
The segfault can be triggered by starting qemu with 'status=off' property
for filter, when the s->incoming_queue is NULL, we reference it directly
in qemu_net_queue_flush().

Let's check the value of 's->incoming_queue' before calling
qemu_net_queue_flush().

Signed-off-by: zhanghailiang 
---
  net/filter-buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index cc6bd94..79e2ce3 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf)
  {
  FilterBufferState *s = FILTER_BUFFER(nf);

-if (!qemu_net_queue_flush(s->incoming_queue)) {
+if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) {
  /* Unable to empty the queue, purge remaining packets */
  qemu_net_queue_purge(s->incoming_queue, nf->netdev);
  }


We'd better handle this at generic layer and don't let a specific net
filter need to worry about this.

Looks like the issue is we may trigger status_changed() too early (even
before the the filter was initialized).



Yes ~


How about not call status_changed() if the initialization is not done?



But seems that it is difficult to confirm if the filter is initialized
or not ...


.






[Qemu-devel] [Bug 1006655] Re: Can't convert to vmdk with the streamOptimized subformat

2016-04-01 Thread Thiago Martins
I have tried, many times, to use qemu-img to create StreamOptimized for
VMWare ESXi 6.0 OVAs.

It does NOT work.

After days of research, I replaced qemu-img, by vboxmanage, then, it
worked!

Now, I'm using something this:

--
vboxmanage convertfromraw packer/output-vm.raw packer/output-vm-disk1.vmdk 
--format vmdk --variant Stream
--

I also had to edit the headers, with dd:

---
dd if=output-vm-disk1.vmdk of=output-vm-disk1.descriptor bs=1 skip=512 
count=1024
sed -i -e 's/ide/lsilogic/g' output-vm-disk1.descriptor
dd conv=notrunc,nocreat if=output-vm-disk1.descriptor of=output-disk1.vmdk bs=1 
seek=512 count=1024
---

This is the only way to build StreamOptimized VMDKs, for inclusion
inside of OVA, that works on VMWare 6.0.

The qemu-img can not be used.

Here is my full solution to this problem (if the same problem):

https://github.com/tmartinx/svauto/blob/dev/image-factory.sh#L510

I'm just not entirely sure if we're talking about the same problem
here...

Cheers!
Thiago

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1006655

Title:
  Can't convert to vmdk with the streamOptimized subformat

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu-kvm package in Ubuntu:
  Invalid

Bug description:
  Hi all,
  I'm trying to convert a qcow2 image file to the vmdk (on Ubuntu Server 12.04):

  # qemu-img convert -f qcow2 -O vmdk -o Stream spv-2912.qcow2 spv-2912-3.vmdk 
-o subformat=streamOptimized
  VMDK: can't write to allocated cluster for streamOptimized
  qemu-img: error while writing sector 65: Input/output error

  Same result with any input format. Others subformats work but the
  StreamOptimized is by far the most important one. The only workaround
  I found is to migrate to raw and then to use VBoxManage (VirtualBox).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1006655/+subscriptions



[Qemu-devel] [Bug 1006655] Re: Can't convert to vmdk with the streamOptimized subformat

2016-04-01 Thread Thiago Martins
It looks like my problem is different, since I'm not seeing this:

qemu-img: error while writing sector 65: Input/output error

But maybe, it is a light in the end of the tunnel...

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1006655

Title:
  Can't convert to vmdk with the streamOptimized subformat

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Fix Released
Status in qemu-kvm package in Ubuntu:
  Invalid

Bug description:
  Hi all,
  I'm trying to convert a qcow2 image file to the vmdk (on Ubuntu Server 12.04):

  # qemu-img convert -f qcow2 -O vmdk -o Stream spv-2912.qcow2 spv-2912-3.vmdk 
-o subformat=streamOptimized
  VMDK: can't write to allocated cluster for streamOptimized
  qemu-img: error while writing sector 65: Input/output error

  Same result with any input format. Others subformats work but the
  StreamOptimized is by far the most important one. The only workaround
  I found is to migrate to raw and then to use VBoxManage (VirtualBox).

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1006655/+subscriptions



Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Wouter Verhelst
On Thu, Mar 31, 2016 at 02:34:24PM -0600, Eric Blake wrote:
> On 03/31/2016 02:17 PM, Alex Bligh wrote:
> >> even if we aren't quite sure
> >> what to document of those flags.  And that means qemu is correct, and
> >> the NBD protocol has a bug.  Since you contributed the FUA flag, is that
> >> something you can try to improve?
> > 
> > Yeah. My mess so I should clean it up. I think FUA should be valid
> > on essentially everything.
> > 
> > I think I might wait until structured replies is in though!
> 
> It's also tricky because we just barely documented that servers SHOULD
> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
> commands where it is not documented; I don't know if we have an adequate
> discovery system in place to learn _which_ commands support FUA,

That's what I'm mostly worried about. Yes, we have FUA, and yes, some
clients may send it on commands that aren't WRITE, but it is not very
well defined what happens then:

- Currently-released versions of nbd-server will accept the flag on
  non-WRITE commands, but ignore it. This is generally not desirable, I
  would say.
- The change that I made a few days ago means future versions will *not*
  accept it. I feel this is better than accepting-and-ignoring, but not
  ideal either in case a client sends out a request that includes FUA.

We can perhaps ignore the former of the above as "buggy behaviour" and
not negotiate anything for a server that does the "proper" thing of
accepting-and-acting-on for non-WRITE FUA, but then it is probably a
good idea to document in the spec that "there are older, buggy, servers
which accept but ignore FUA on anything but WRITE", at least for a few
years until such older servers aren't available anymore.

Alternatively, we could add yet another flag bit to signal "FUA valid
everywhere". Not sure if that is worth it.

[...]

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA

2016-04-01 Thread Wouter Verhelst
On Thu, Mar 31, 2016 at 07:15:32PM +0100, Alex Bligh wrote:
> NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but
> 1<<16 in nbd.h. It is not used anywhere within the code.

Yes it is:

wouter@gangtai:~/code/c/nbd$ grep -rl CMD_FLAG_FUA *
doc/proto.md
make-integrityhuge.c
nbd.h
nbd-server.c
nbd-trdump.c
tests/run/nbd-tester-client.c
wouter@gangtai:~/code/c/nbd$ 

I don't mind bringing the code in sync with what the documentation says,
but it should not change behaviour ;-)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH

2016-04-01 Thread Wouter Verhelst
On Fri, Apr 01, 2016 at 12:03:19AM +0100, Alex Bligh wrote:
> Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
> the latter may be set on any command, and its semantics on commands other
> than NBD_CMD_WRITE need explaining. Further, explain how these relate to
> reordering of commands.
> 
> Signed-off-by: Alex Bligh 
> ---
>  doc/proto.md | 52 ++--
>  1 file changed, 42 insertions(+), 10 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index c1e05c5..bc4483d 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -197,6 +197,37 @@ handle as was sent by the client in the corresponding 
> request.  In
>  this way, the client can correlate which request is receiving a
>  response.
>  
> + Ordering of messages and writes
> +
> +The server MAY process commands out of order, and MAY reply out of
> +order, save that:
> +
> +* All write commands (that includes both `NBD_CMD_WRITE` and
> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> +  (whether they are replied to or not) are written to non-volatile
> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> +  stronger condition than the previous 'MUST' condition. This

This seems to make little sense. Are you saying that suddenly now
sending a reply for FLUSH with outstanding writes is wrong? If not, the
above should be clarified.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [Nbd] [PATCH] Docs: proto.md: Clarify NUL in export name

2016-04-01 Thread Wouter Verhelst
Thanks, applied.

On Fri, Apr 01, 2016 at 12:16:48AM +0100, Alex Bligh wrote:
> Clarify that
> 
> * The name is not NUL terminated (not just that the length
>   'does not include NUL termination' which might be taken to mean
>   there is NUL termination but the length doesn't include it.
> 
> * The name cannot itself include embedded NUL characters (despite
>   it not being NUL terminated).
> 
> Signed-off-by: Alex Bligh 
> ---
>  doc/proto.md | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index c1e05c5..7729051 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -292,7 +292,8 @@ of the newstyle negotiation.
>  haggling, and proceed to the transmission phase. Data: name of the
>  export, free-form UTF-8 text (subject to limitations by server
>  implementation). The length of the name is determined from the
> -option header, and does NOT include a NUL terminator.  If the
> +option header. The name is not NUL terminated, and may not
> +contain embedded NUL characters. If the
>  chosen export does not exist or requirements for the chosen export
>  are not met (e.g., the client did not negotiate TLS for an export
>  where the server requires it), the server should close the
> -- 
> 1.9.1
> 
> 
> --
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785471&iu=/4140
> ___
> Nbd-general mailing list
> nbd-gene...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nbd-general
> 

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [Nbd] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension

2016-04-01 Thread Wouter Verhelst
Hi,

Thanks, applied.

On Thu, Mar 31, 2016 at 04:02:05PM +0300, Denis V. Lunev wrote:
> From: Pavel Borzenkov 
> 
> There exist some cases when a client knows that the data it is going to
> write is all zeroes. Such cases include mirroring or backing up a device
> implemented by a sparse file.
> 
> With current NBD command set, the client has to issue NBD_CMD_WRITE
> command with zeroed payload and transfer these zero bytes through the
> wire. The server has to write the data onto disk, effectively denying
> the sparseness.
> 
> To remedy this, the patch adds WRITE_ZEROES extension with one new
> NBD_CMD_WRITE_ZEROES command.
> 
> Signed-off-by: Pavel Borzenkov 
> Signed-off-by: Denis V. Lunev 
> CC: Wouter Verhelst 
> CC: Paolo Bonzini 
> CC: Kevin Wolf 
> CC: Stefan Hajnoczi 
> CC: Wouter Verhelst 
> CC: Alex Bligh 
> CC: Eric Blake 
> ---
> v2:
>   - rebased on master
>   - explicitly state that the client must not set NBD_CMD_WRITE_ZEROES if
> support for it wasn't negotiated with the server;
>   - add new command flag's description in format suitable for moving to
> "Command flags" section.
> 
> 
>  doc/proto.md | 64 
> 
>  1 file changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/proto.md b/doc/proto.md
> index c1e05c5..a574563 100644
> --- a/doc/proto.md
> +++ b/doc/proto.md
> @@ -261,6 +261,8 @@ immediately after the handshake flags field in oldstyle 
> negotiation:
>schedule I/O accesses as for a rotational medium
>  - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
>`NBD_CMD_TRIM` commands
> +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
> +  supports `NBD_CMD_WRITE_ZEROES` commands
>  
>  Clients SHOULD ignore unknown flags.
>  
> @@ -444,10 +446,13 @@ affects a particular command.  Clients MUST NOT set a 
> command flag bit
>  that is not documented for the particular command; and whether a flag is
>  valid may depend on negotiation during the handshake phase.
>  
> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`.  SHOULD be
> -  set to 1 if the client requires "Force Unit Access" mode of
> -  operation.  MUST NOT be set unless transmission flags included
> -  `NBD_FLAG_SEND_FUA`.
> +- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
> +  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
> +  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
> +  flags included `NBD_FLAG_SEND_FUA`.
> +
> +- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
> +  extension; see below.
>  
>   Request types
>  
> @@ -523,6 +528,10 @@ The following request types exist:
>  A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
>  was set in the transmission flags field.
>  
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +Defined by the experimental `WRITE_ZEROES` extension; see below.
> +
>  * Other requests
>  
>  Some third-party implementations may require additional protocol
> @@ -654,6 +663,53 @@ option reply type.
>message if they do not also send it as a reply to the
>`NBD_OPT_SELECT` message.
>  
> +### `WRITE_ZEROES` extension
> +
> +There exist some cases when a client knows that the data it is going to write
> +is all zeroes. Such cases include mirroring or backing up a device 
> implemented
> +by a sparse file. With current NBD command set, the client has to issue
> +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes
> +through the wire. The server has to write the data onto disk, effectively
> +losing the sparseness.
> +
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
> +one new command and one new command flag.
> +
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +A write request with no payload. Length and offset define the location
> +and amount of data to be zeroed.
> +
> +The server MUST zero out the data on disk, and then send the reply
> +message. The server MAY send the reply message before the data has
> +reached permanent storage.
> +
> +A client MUST NOT send a write zeroes request unless
> +`NBD_FLAG_SEND_WRITE_ZEROES` was set in the transmission flags field.
> +
> +If the `NBD_FLAG_SEND_FUA` flag was set in the transmission flags field,
> +the client MAY set the flag `NBD_CMD_FLAG_FUA` in the command flags 
> field.
> +If this flag was set, the server MUST NOT send the reply until it has
> +ensured that the newly-zeroed data has reached permanent storage.
> +
> +If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
> +flags field, the server MAY use trimming to zero out the area, but it
> +MUST ensure that the data reads back as zero.
> +
> +If an error occurs, the server SHOULD set the appropriate error code
> +in the error field. The server MAY then close the connection.
> +
> +The server SHOULD r

Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote:
> On 01/04/2016 10:14, Richard W.M. Jones wrote:
> > Found it: only CONFIG_MPTABLE=y was necessary.  It boots with:
> > 
> > # CONFIG_PIRTABLE is not set
> > CONFIG_MPTABLE=y
> > # CONFIG_SMBIOS is not set
> > # CONFIG_ACPI is not set
> 
> If you add all three it should not give any slowdown and will provide
> full hardware features to the kernel.  qboot does ACPI and PCI bus
> assignment (it doesn't do SMBIOS because I got bored debugging it. :))

Enabling all 4 adds about 2ms.

However the overhead of SeaBIOS is still down from 68ms to 18ms
(4.0% of total boot time down to 1.1%) so it's still a big gain.

I wonder how we can make use of this in qemu and downstream distros?
Can we have a bios-min.bin which is used with -kernel boots?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 10:03:10AM +0200, Paolo Bonzini wrote:
> 
> 
> On 01/04/2016 09:55, Richard W.M. Jones wrote:
> >>> > > I'd dearly love to get rid of the sgabios option ROM.  It looks like
> >>> > > SeaBIOS nearly supports a full serial console now?
> >> > 
> >> > Last I checked, one could disable the option rom by adding "-device
> >> > VGA,romfile=" to the qemu command line.
> > Sure, I can easily remove sgabios.  The problem is I still want my
> > serial console to show BIOS messages!
> 
> What do you need it for?

It's so we can handle error reports.  When someone reports that
libguestfs "hangs", it's sometimes useful to know if the BIOS was
entered or not, since it points the finger at either qemu, BIOS or
kernel.  (Remember we have to be able to run on a whole variety of
Linux distros which often have old/broken/random qemu/bios/kernel).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:44, Richard W.M. Jones wrote:
> On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote:
>> On 01/04/2016 10:14, Richard W.M. Jones wrote:
>>> Found it: only CONFIG_MPTABLE=y was necessary.  It boots with:
>>>
>>> # CONFIG_PIRTABLE is not set
>>> CONFIG_MPTABLE=y
>>> # CONFIG_SMBIOS is not set
>>> # CONFIG_ACPI is not set
>>
>> If you add all three it should not give any slowdown and will provide
>> full hardware features to the kernel.  qboot does ACPI and PCI bus
>> assignment (it doesn't do SMBIOS because I got bored debugging it. :))
> 
> Enabling all 4 adds about 2ms.
> 
> However the overhead of SeaBIOS is still down from 68ms to 18ms
> (4.0% of total boot time down to 1.1%) so it's still a big gain.
> 
> I wonder how we can make use of this in qemu and downstream distros?
> Can we have a bios-min.bin which is used with -kernel boots?

That's an interesting idea.  We can look at it for 2.7.

Paolo



Re: [Qemu-devel] [PATCH] Whitelist sysinfo call

2016-04-01 Thread Eduardo Otubo
On Mon, Mar 21, 2016 at 08=17=45AM -0400, Miroslav Rezanina wrote:
> 
> 
> - 元のメッセージ -
> > 差出人: "Eduardo Otubo" 
> > 宛先: mreza...@redhat.com
> > Cc: qemu-devel@nongnu.org, arm...@redhat.com
> > 送信済み: 2016年3月11日, 金曜日 午前 9:51:50
> > 件名: Re: [Qemu-devel] [PATCH] Whitelist sysinfo call
> > 
> > On Mon, Mar 07, 2016 at 10=34=46AM +0100, mreza...@redhat.com wrote:
> > > From: Miroslav Rezanina 
> > > 
> > > Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call
> > > so qemu using rbd image hang after start when run in sandbox mode.
> > > 
> > > To allow using rbd images in sandbox mode we have to whitelist it.
> > > 
> > > Signed-off-by: Miroslav Rezanina 
> > > ---
> > >  qemu-seccomp.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/qemu-seccomp.c b/qemu-seccomp.c
> > > index 2866e3c..e29fca1 100644
> > > --- a/qemu-seccomp.c
> > > +++ b/qemu-seccomp.c
> > > @@ -250,6 +250,7 @@ static const struct QemuSeccompSyscall
> > > seccomp_whitelist[] = {
> > >  #ifdef HAVE_CACHEFLUSH
> > >  { SCMP_SYS(cacheflush), 240 },
> > >  #endif
> > > +{ SCMP_SYS(sysinfo), 240 },
> > 
> > Are you sure you want to add this syscall to the bottom of the list? Did
> > you estimate the frequency it is called by running strace?
> > 
> > Thanks for the patch.
> > 
> Hi,
> 
> Yes, it wasn't used before nss update and now is used only for rbd based 
> images
> where it is called just few times upon start so drawback should be minimal. 
> With
> this we do not change cost of other calls.
> 
> Thanks for review and question,
> Mirek

Ok. So, ACK on this patch. I'll roll out a pull request by the end of
the day.
Thanks for the contribution.


-- 
Eduardo Otubo
ProfitBricks GmbH


signature.asc
Description: Digital signature


Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Vasiliy Tolstov
2016-04-01 11:47 GMT+03:00 Paolo Bonzini :
> That's an interesting idea.  We can look at it for 2.7.


That's fine =) I'm waiting too.

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:47, Richard W.M. Jones wrote:
> It's so we can handle error reports.  When someone reports that
> libguestfs "hangs", it's sometimes useful to know if the BIOS was
> entered or not, since it points the finger at either qemu, BIOS or
> kernel.  (Remember we have to be able to run on a whole variety of
> Linux distros which often have old/broken/random qemu/bios/kernel).

Sure, but that can be an extra option if sgabios adds overhead.

Anyhow, you probably want "-vga none" too.

Paolo



Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!

2016-04-01 Thread Denis V. Lunev

On 04/01/2016 11:18 AM, Paolo Bonzini wrote:

Microsoft loves Linux, and Red Hat loves .NET.  Since we can put whatever
we want in the Hyper-V vendor signature, let's show some love too!

Cc: Andreas Färber 
Cc: Alex Williamson 
Cc: Denis V. Lunev 
Cc: Eduardo Habkost 
Cc: Roman Kagan 
Cc: KY Srinivasan 
Signed-off-by: Paolo Bonzini 
---
  target-i386/kvm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 799fdfa..1968f04 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
  c = &cpuid_data.entries[cpuid_i++];
  c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
  if (!cpu->hyperv_vendor_id) {
-memcpy(signature, "Microsoft Hv", 12);
+memcpy(signature, "KVM<3HyperV!", 12);
  } else {
  size_t len = strlen(cpu->hyperv_vendor_id);
  

potentially this could be dangerous. This could break
hypervisor detection code. I'll check this today and
reply.

But you have made my day. Thank you for April 1.



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 10:51:31AM +0200, Paolo Bonzini wrote:
> 
> 
> On 01/04/2016 10:47, Richard W.M. Jones wrote:
> > It's so we can handle error reports.  When someone reports that
> > libguestfs "hangs", it's sometimes useful to know if the BIOS was
> > entered or not, since it points the finger at either qemu, BIOS or
> > kernel.  (Remember we have to be able to run on a whole variety of
> > Linux distros which often have old/broken/random qemu/bios/kernel).
> 
> Sure, but that can be an extra option if sgabios adds overhead.
> 
> Anyhow, you probably want "-vga none" too.

That's the qemu option?  Currently we use
  -nodefconfig -nodefaults -display none
which I assumed would do the same thing (ie. not add any default
devices).  Does -vga none do more than this?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html



Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!

2016-04-01 Thread Daniel P. Berrange
On Fri, Apr 01, 2016 at 10:18:02AM +0200, Paolo Bonzini wrote:
> Microsoft loves Linux, and Red Hat loves .NET.  Since we can put whatever
> we want in the Hyper-V vendor signature, let's show some love too!
> 
> Cc: Andreas Färber 
> Cc: Alex Williamson 
> Cc: Denis V. Lunev 
> Cc: Eduardo Habkost 
> Cc: Roman Kagan 
> Cc: KY Srinivasan 
> Signed-off-by: Paolo Bonzini 
> ---
>  target-i386/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 799fdfa..1968f04 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  c = &cpuid_data.entries[cpuid_i++];
>  c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
>  if (!cpu->hyperv_vendor_id) {
> -memcpy(signature, "Microsoft Hv", 12);
> +memcpy(signature, "KVM<3HyperV!", 12);

I think it'd be nicer to use the unicode heart symbol ♡ encoded as utf-8
eg

  memcpy(signature, "KVM\xe2\x99\xa1HyperV", 12);


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:57, Richard W.M. Jones wrote:
> That's the qemu option?  Currently we use
>   -nodefconfig -nodefaults -display none
> which I assumed would do the same thing (ie. not add any default
> devices).  Does -vga none do more than this?

No, I missed the -nodefaults.

Paolo



Re: [Qemu-devel] [PATCH for-2.6] target-i386: KVM loves Hyper-V!

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 10:58, Daniel P. Berrange wrote:
> On Fri, Apr 01, 2016 at 10:18:02AM +0200, Paolo Bonzini wrote:
>> Microsoft loves Linux, and Red Hat loves .NET.  Since we can put whatever
>> we want in the Hyper-V vendor signature, let's show some love too!
>>
>> Cc: Andreas Färber 
>> Cc: Alex Williamson 
>> Cc: Denis V. Lunev 
>> Cc: Eduardo Habkost 
>> Cc: Roman Kagan 
>> Cc: KY Srinivasan 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  target-i386/kvm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 799fdfa..1968f04 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -592,7 +592,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>  c = &cpuid_data.entries[cpuid_i++];
>>  c->function = HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS;
>>  if (!cpu->hyperv_vendor_id) {
>> -memcpy(signature, "Microsoft Hv", 12);
>> +memcpy(signature, "KVM<3HyperV!", 12);
> 
> I think it'd be nicer to use the unicode heart symbol ♡ encoded as utf-8
> eg
> 
>   memcpy(signature, "KVM\xe2\x99\xa1HyperV", 12);

I'll try to get v2 ready in the next 13 hours and 53 minutes.

Paolo



Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property

2016-04-01 Thread Wen Congyang
On 04/01/2016 04:24 PM, Hailiang Zhang wrote:
> On 2016/4/1 15:39, Jason Wang wrote:
>>
>>
>> On 04/01/2016 03:08 PM, zhanghailiang wrote:
>>> After commit 338d3f, we support 'status' property for filter object.
>>> The segfault can be triggered by starting qemu with 'status=off' property
>>> for filter, when the s->incoming_queue is NULL, we reference it directly
>>> in qemu_net_queue_flush().
>>>
>>> Let's check the value of 's->incoming_queue' before calling
>>> qemu_net_queue_flush().
>>>
>>> Signed-off-by: zhanghailiang 
>>> ---
>>>   net/filter-buffer.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/filter-buffer.c b/net/filter-buffer.c
>>> index cc6bd94..79e2ce3 100644
>>> --- a/net/filter-buffer.c
>>> +++ b/net/filter-buffer.c
>>> @@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf)
>>>   {
>>>   FilterBufferState *s = FILTER_BUFFER(nf);
>>>
>>> -if (!qemu_net_queue_flush(s->incoming_queue)) {
>>> +if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) {
>>>   /* Unable to empty the queue, purge remaining packets */
>>>   qemu_net_queue_purge(s->incoming_queue, nf->netdev);
>>>   }
>>
>> We'd better handle this at generic layer and don't let a specific net
>> filter need to worry about this.
>>
>> Looks like the issue is we may trigger status_changed() too early (even
>> before the the filter was initialized).
>>
> 
> Yes ~
> 
>> How about not call status_changed() if the initialization is not done?
>>
> 
> But seems that it is difficult to confirm if the filter is initialized
> or not ...

If nfc->setup() is not called, nf->netdev is NULL.

Thanks
Wen Congyang

> 
>> .
>>
> 
> 
> 
> 
> 






Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Dr. David Alan Gilbert
* Richard W.M. Jones (rjo...@redhat.com) wrote:
> On Fri, Apr 01, 2016 at 10:24:37AM +0200, Paolo Bonzini wrote:
> > On 01/04/2016 10:14, Richard W.M. Jones wrote:
> > > Found it: only CONFIG_MPTABLE=y was necessary.  It boots with:
> > > 
> > > # CONFIG_PIRTABLE is not set
> > > CONFIG_MPTABLE=y
> > > # CONFIG_SMBIOS is not set
> > > # CONFIG_ACPI is not set
> > 
> > If you add all three it should not give any slowdown and will provide
> > full hardware features to the kernel.  qboot does ACPI and PCI bus
> > assignment (it doesn't do SMBIOS because I got bored debugging it. :))
> 
> Enabling all 4 adds about 2ms.
> 
> However the overhead of SeaBIOS is still down from 68ms to 18ms
> (4.0% of total boot time down to 1.1%) so it's still a big gain.
> 
> I wonder how we can make use of this in qemu and downstream distros?
> Can we have a bios-min.bin which is used with -kernel boots?

Could we pass a flag at runtime that gets the standard image to skip a lot
of the stuff you don't want?

Or is there anything else qemu could provide (e.g. a fast way to get
to PCI config space?)

Dave

--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Gerd Hoffmann
  Hi,

> I wonder how we can make use of this in qemu and downstream distros?
> Can we have a bios-min.bin which is used with -kernel boots?

We already build two seabios roms: one full featued and one slightly
stripped down to keep it below 128k, for backward compatibility with old
machine types.

Adding a third config for -kernel boot should be easy.  For that use
case we can probably also turn on seabios logging to the serial console
and drop sgabios.  We don't need input (no boot menu) and we also don't
need to hook into int10 (no grub/ipxe using that for output).

cheers,
  Gerd




Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 08:43, Paolo Bonzini  wrote:

> 
> On 31/03/2016 22:17, Alex Bligh wrote:
 In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
 that's the same function it calls for write+FUA.
>> That's harmless, but unnecessary in the sense that current documented
>> behaviour doesn't require it. Perhaps it should?
>> 
>> I suppose TRIM etc. should support FUA too?
> 
> TRIM is an advisory operation, so it doesn't make sense to force access
> to the medium.

Reflecting, that's correct. FUA can't mean anything sensible for
TRIM.

-- 
Alex Bligh







Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 08:49, Paolo Bonzini  wrote:
> 
> On 31/03/2016 22:34, Eric Blake wrote:
>> give me possibly stale data because I accessed
>> the underlying storage rather than paying attention to in-flight writes
>> that would change what I read".
> 
> Overlapping I/O is always unspecified, so you should expect stale data
> if a read overlaps an in-flight write.

Sure, that's a consideration for overlapping I/O in general rather
than FUA. You need to wait until a write operation has completed to be
able to guarantee that a read operation will read back the written
data (whether or not it has been persisted to disk rather than
e.g. sitting in a cache somewhere). A read following a FUA write
that has not completed could be processed prior to the write. Similarly
a read following a non-FUA write that has completed must read the written
data even if it's not persisted.

I'm interested with either Qemu or the kernel have a sense of FUA
on reads. Do you have a view on that?

This (which was the inspiration for nbd FUA and now appears slightly
clearer than I remember at the time):
https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt

says

> The REQ_FUA flag can be OR ed into the r/w flags of a bio submitted from the
> filesystem and will make sure that I/O completion for this request is only
> signaled after the data has been committed to non-volatile storage.

and

> For devices that also support the FUA bit the block layer needs
> to be told to pass through the REQ_FUA bit using:
> 
>   blk_queue_flush(sdkp->disk->queue, REQ_FLUSH | REQ_FUA);
> 
> and the driver must handle write requests that have the REQ_FUA bit set
> in prep_fn/request_fn.  If the FUA bit is not natively supported the block
> layer turns it into an empty REQ_FLUSH request after the actual write.

That's pretty ambiguous as to whether you need to handle it for reads
as well. If it does read+FUA means that the read needs an fdatasync()
type operation to happen.

-- 
Alex Bligh







Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 09:35, Wouter Verhelst  wrote:

>> +* All write commands (that includes both `NBD_CMD_WRITE` and
>> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
>> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
>> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD 
>> ensure
>> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
>> +  (whether they are replied to or not) are written to non-volatile
>> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
>> +  stronger condition than the previous 'MUST' condition. This
> 
> This seems to make little sense. Are you saying that suddenly now
> sending a reply for FLUSH with outstanding writes is wrong? If not, the
> above should be clarified.

The MUST sentence does not cover that situation as it only refers
to completed writes.

The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
of writes that have PROCESSED (i.e actioned) whether or not they
have been replied to. Of course the client has no way of knowing
whether they have been PROCESSED without a reply.

Personally I think the SHOULD clause is pretty pointless and is
unnecessary, but that's where the conversation got to n years
ago I believe.

Happy to delete the last sentence if that's wrong.

-- 
Alex Bligh







[Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes

2016-04-01 Thread Ladi Prosek
The write path for pass-through devices, commonly used for controlling
keyboard LEDs via EV_LED, was not implemented. This commit adds the
necessary plumbing to connect the status virtio queue to the host evdev
file descriptor.

Signed-off-by: Ladi Prosek 
---
Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't
believe that this will be happening in practise for local evdev devices.
Of course, all bets are off if fd is an arbitrary file, but we would be
running into issues anyway (right now, if virtio_input_host_event doesn't
read all of sizeof(struct virtio_input_event) bytes, we get out of sync
with the event stream). So part of me would just do one best-effort
write and ignore all errors. I'm curious what you think. 

 hw/input/virtio-input-host.c | 85 
 include/hw/virtio/virtio-input.h |  8 
 2 files changed, 93 insertions(+)

diff --git a/hw/input/virtio-input-host.c b/hw/input/virtio-input-host.c
index 36856d6..bc02d3b 100644
--- a/hw/input/virtio-input-host.c
+++ b/hw/input/virtio-input-host.c
@@ -42,6 +42,55 @@ static void virtio_input_host_event(void *opaque)
 }
 }
 
+static int virtio_input_host_write_event(VirtIOInputHost *vih,
+ struct timeval *time,
+ struct virtio_input_event *virtio)
+{
+struct input_event evdev;
+int rc;
+
+evdev.time = *time;
+evdev.type = le16_to_cpu(virtio->type);
+evdev.code = le16_to_cpu(virtio->code);
+evdev.value = le32_to_cpu(virtio->value);
+
+rc = write(vih->fd, &evdev, sizeof(evdev));
+if (rc == -1) {
+rc = errno;
+if (rc == EAGAIN || rc == EWOULDBLOCK) {
+return EAGAIN;
+}
+perror("virtio_input_host_write_event: write");
+return rc;
+}
+if (rc != sizeof(evdev)) {
+return EINVAL;
+}
+return 0;
+}
+
+static void virtio_input_writable(void *opaque)
+{
+VirtIOInputHost *vih = opaque;
+VirtIOInputEvent *vie;
+int rc;
+
+while (!QSIMPLEQ_EMPTY(&vih->pending_writes)) {
+vie = QSIMPLEQ_FIRST(&vih->pending_writes);
+
+rc = virtio_input_host_write_event(vih, &vie->time, &vie->virtio);
+if (rc == EAGAIN) {
+break;
+}
+QSIMPLEQ_REMOVE_HEAD(&vih->pending_writes, next);
+g_free(vie);
+}
+
+if (QSIMPLEQ_EMPTY(&vih->pending_writes)) {
+qemu_set_fd_handler(vih->fd, virtio_input_host_event, NULL, vih);
+}
+}
+
 static void virtio_input_bits_config(VirtIOInputHost *vih,
  int type, int count)
 {
@@ -127,6 +176,8 @@ static void virtio_input_host_realize(DeviceState *dev, 
Error **errp)
 virtio_input_bits_config(vih, EV_LED, LED_CNT);
 
 qemu_set_fd_handler(vih->fd, virtio_input_host_event, NULL, vih);
+
+QSIMPLEQ_INIT(&vih->pending_writes);
 return;
 
 err_close:
@@ -138,11 +189,44 @@ err_close:
 static void virtio_input_host_unrealize(DeviceState *dev, Error **errp)
 {
 VirtIOInputHost *vih = VIRTIO_INPUT_HOST(dev);
+VirtIOInputEvent *vie;
 
 if (vih->fd > 0) {
 qemu_set_fd_handler(vih->fd, NULL, NULL, NULL);
 close(vih->fd);
 }
+
+while (!QSIMPLEQ_EMPTY(&vih->pending_writes)) {
+vie = QSIMPLEQ_FIRST(&vih->pending_writes);
+QSIMPLEQ_REMOVE_HEAD(&vih->pending_writes, next);
+g_free(vie);
+}
+}
+
+static void virtio_input_host_handle_status(VirtIOInput *vinput,
+virtio_input_event *event)
+{
+VirtIOInputHost *vih = VIRTIO_INPUT_HOST(vinput);
+struct timeval time;
+int rc;
+
+if (gettimeofday(&time, NULL)) {
+perror("virtio_input_host_handle_status: gettimeofday");
+return;
+}
+
+rc = virtio_input_host_write_event(vih, &time, event);
+if (rc == EAGAIN) {
+VirtIOInputEvent *vie = g_malloc(sizeof(*vie));
+vie->time = time;
+vie->virtio = *event;
+
+if (QSIMPLEQ_EMPTY(&vih->pending_writes)) {
+qemu_set_fd_handler(vih->fd, virtio_input_host_event,
+virtio_input_writable, vih);
+}
+QSIMPLEQ_INSERT_TAIL(&vih->pending_writes, vie, next);
+}
 }
 
 static const VMStateDescription vmstate_virtio_input_host = {
@@ -164,6 +248,7 @@ static void virtio_input_host_class_init(ObjectClass 
*klass, void *data)
 dc->props  = virtio_input_host_properties;
 vic->realize   = virtio_input_host_realize;
 vic->unrealize = virtio_input_host_unrealize;
+vic->handle_status = virtio_input_host_handle_status;
 }
 
 static void virtio_input_host_init(Object *obj)
diff --git a/include/hw/virtio/virtio-input.h b/include/hw/virtio/virtio-input.h
index af1c207..d41d650 100644
--- a/include/hw/virtio/virtio-input.h
+++ b/include/hw/virtio/virtio-input.h
@@ -61,6 +61,7 @@ typedef struct VirtIOInputClass VirtIOInputClass;
 typedef struct VirtIOI

Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 08:59, Wouter Verhelst  wrote:

> On Thu, Mar 31, 2016 at 07:15:32PM +0100, Alex Bligh wrote:
>> NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but
>> 1<<16 in nbd.h. It is not used anywhere within the code.
> 
> Yes it is:
> 
> wouter@gangtai:~/code/c/nbd$ grep -rl CMD_FLAG_FUA *
> doc/proto.md
> make-integrityhuge.c
> nbd.h
> nbd-server.c
> nbd-trdump.c
> tests/run/nbd-tester-client.c
> wouter@gangtai:~/code/c/nbd$ 
> 
> I don't mind bringing the code in sync with what the documentation says,
> but it should not change behaviour ;-)

Sorry - I think I must have done 'git grep' in the wrong directory
or something. Ignore this patch and I'll redo it.

I think probably the easiest fix is (beware, manually generated
untested patch) as follows, which at least shows the reason for the
discrepancy.

diff --git a/nbd.h b/nbd.h
index f2a32dd..53b6ca1 100644
--- a/nbd.h
+++ b/nbd.h
@@ -38,7 +38,9 @@ enum {
};

#define NBD_CMD_MASK_COMMAND 0x
-#define NBD_CMD_FLAG_FUA (1<<16)
+#define NBD_CMD_SHIFT (16)
+#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT)

/* values for flags field */
#define NBD_FLAG_HAS_FLAGS  (1 << 0)/* Flags are there */

-- 
Alex Bligh







Re: [Qemu-devel] Virtio-9p

2016-04-01 Thread Greg Kurz
On Fri, 1 Apr 2016 09:09:32 +0200
Pradeep Kiruvale  wrote:

> Hi Greg,
> 

Hi Pradeep,

I'm Cc'ing qemu-devel like in your previous posts, so QEMU experts may jump in.

> What I understand from the requirement for our project is if we use
> virtio-blk it caches the pages in the guest. We would like to avoid that

AFAIK this is true when you pass cache=none to -drive on the QEMU command
line. But there are other options such as writeback or writethrough, which
rely on the host page cache.

> and also we want to share those pages across multiple guests and when they
> need some data they should get it from the host instead of using the cached
> data at each guest.
> 

So you want all the guests to attach to the same block device or backing file in
the host, correct ? AFAIK we cannot do that with virtio-blk indeed... and 
virtio-9p
is only about sharing files, not block devices.

Maybe you could share a big file between the host and all guests with 9p, and
each guest can use the loop device to access the file as a block device... but
even then, you'd have to deal with concurrent accesses...

> Basically we are trying to cut down the memory foot print of the guests.
> 

If you're using KVM and your guests run the same distro or application,
you may try to use KSM (Kernel Same-page Merging) in the host.

> Regards,
> Pradeep
> 
> On 31 March 2016 at 18:12, Greg Kurz  wrote:
> 
> > On Wed, 30 Mar 2016 16:27:48 +0200
> > Pradeep Kiruvale  wrote:
> >
> > > Hi Greg,
> > >
> >
> > Hi Pradeep,
> >
> > > Thanks for the reply.
> > >
> > > Let me put it this way, virtio-blk-pci is used for block IO on the
> > devices
> > > shared between the guest and the host.
> >
> > I don't really understand the "devices shared between the guest and the
> > host" wording... virtio-blk-pci exposes a virtio-blk device through PCI
> > to the guest. The virtio-blk device can be backed by a file or a block
> > device from the host.
> >
> > > Here I want to share the file and have QoS between the guests. So I am
> > > using the Virtio-9p-pci.
> > >
> >
> > What file ?
> >
> > > Basically I want to have QoS for virtio-9p-pci.
> > >
> >
> > Can you provide a more detailed scenario on the result you want to reach ?
> >
> > > Regards,
> > > Pradeep
> > >
> >
> > Cheers.
> >
> > --
> > Greg
> >
> > > On 30 March 2016 at 16:13, Greg Kurz  wrote:
> > >
> > > > On Wed, 30 Mar 2016 14:10:38 +0200
> > > > Pradeep Kiruvale  wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Is virtio-9p-pci device only supports the fsdev deices? I am trying
> > to
> > > > use
> > > > > -drive option for applying QoS for block device using Virtio-9p-pci
> > > > device,
> > > > > but failing to create/add a device other than fsdev. Can you please
> > help
> > > > me
> > > > > on this?
> > > > >
> > > > > Regards,
> > > > > Pradeep
> > > >
> > > > Hi Pradeep,
> > > >
> > > > Not sure to catch what you want to do but I confirm that virti-9p-pci
> > only
> > > > supports
> > > > fsdev... if you want a block device, why don't you use virtio-blk-pci ?
> > > >
> > > > Cheers.
> > > >
> > > > --
> > > > Greg
> > > >
> > > >
> >
> >




Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 09:27, Wouter Verhelst  wrote:

>> It's also tricky because we just barely documented that servers SHOULD
>> reject invalid flags with EINVAL; and that clients MUST NOT send FUA on
>> commands where it is not documented; I don't know if we have an adequate
>> discovery system in place to learn _which_ commands support FUA,
> 
> That's what I'm mostly worried about. Yes, we have FUA, and yes, some
> clients may send it on commands that aren't WRITE, but it is not very
> well defined what happens then:

Actually the text says:

> Clients MUST NOT set a command flag bit that is not documented for the 
> particular command; and whether a flag is valid may depend on negotiation 
> during the handshake phase.


So this gives us an easy out. We document NBD_CMD_FLAG_FUA for every command, 
but say it is meaningless for commands outside some subset (see below), and 
therefore 'SHOULD NOT' be used by the client. For commands outside that subset, 
the server SHOULD ignore it. That means that we don't need to change the above 
text, as it is documented.

However, I'm not clear what the subset of commands NBD_CMD_FLAG_FUA should work 
on is. I originally thought it was anything that writes. Paolo has pointed out 
it's pointless for TRIM given it's advisory. It makes sense for anything that 
writes holes. I am not clear what it should do for reads - the original 
semantic was meant to be the kernel bio layer semantic, and (as per message to 
Paolo) the kernel documentation is less than helpful in whether it can be set 
on a read bio.

-- 
Alex Bligh







[Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist

2016-04-01 Thread Cédric Le Goater
In some occasions, a patch [1] can start with a hunk containing a
simple type cast. At the time annotate_values() is run, the type is
unknown and the cast type is misinterpreted as a identifier, resulting
in an error if it is followed with a negative value:

ERROR: spaces required around that '-' (ctx:WxV)

It seems complex to catch all possible types in a cast expression. So,
as a fallback solution, let's add some common qemu types to the
typeList array.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06741.html

Signed-off-by: Cédric Le Goater 
---
 scripts/checkpatch.pl |1 +
 1 file changed, 1 insertion(+)

Index: qemu-dgibson-for-2.6.git/scripts/checkpatch.pl
===
--- qemu-dgibson-for-2.6.git.orig/scripts/checkpatch.pl
+++ qemu-dgibson-for-2.6.git/scripts/checkpatch.pl
@@ -212,6 +212,7 @@ our @typeList = (
qr{${Ident}_t},
qr{${Ident}_handler},
qr{${Ident}_handler_fn},
+   qr{target_(?:u)?long},
 );
 
 # This can be modified by sub possible.  Since it can be empty, be careful




[Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine

2016-04-01 Thread Fam Zheng
Using the nested aio_poll() in coroutine is a bad idea. This patch
replaces the aio_poll loop in bdrv_drain with a BH, if called in
coroutine.

For example, the bdrv_drain() in mirror.c can hang when a guest issued
request is pending on it in qemu_co_mutex_lock().

Mirror coroutine in this case has just finished a request, and the block
job is about to complete. It calls bdrv_drain() which waits for the
other coroutine to complete. The other coroutine is a scsi-disk request.
The deadlock happens when the latter is in turn pending on the former to
yield/terminate, qemu_co_mutex_lock(). The state flow is as below
(assuming a qcow2 image):

  mirror coroutine   scsi-disk coroutine
  -
  do last write

qcow2:qemu_co_mutex_lock()
...
 scsi disk read

   tracked request begin

   qcow2:qemu_co_mutex_lock.enter

qcow2:qemu_co_mutex_unlock()

  bdrv_drain
while (has tracked request)
  aio_poll()

In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return
because the mirror coroutine is blocked in the aio_poll(blocking=true).

Reported-by: Laurent Vivier 
Suggested-by: Paolo Bonzini 
Signed-off-by: Fam Zheng 
---
 block/io.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/block/io.c b/block/io.c
index c4869b9..d0a4551 100644
--- a/block/io.c
+++ b/block/io.c
@@ -253,6 +253,42 @@ static void bdrv_drain_recurse(BlockDriverState *bs)
 }
 }
 
+typedef struct {
+Coroutine *co;
+BlockDriverState *bs;
+bool done;
+} BdrvCoDrainData;
+
+static void bdrv_co_drain_bh_cb(void *opaque)
+{
+BdrvCoDrainData *data = opaque;
+Coroutine *co = data->co;
+
+bdrv_drain(data->bs);
+data->done = true;
+qemu_coroutine_enter(co, NULL);
+}
+
+static void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
+{
+QEMUBH *bh;
+BdrvCoDrainData data;
+
+assert(qemu_in_coroutine());
+data = (BdrvCoDrainData) {
+.co = qemu_coroutine_self(),
+.bs = bs,
+.done = false,
+};
+bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
+qemu_bh_schedule(bh);
+
+do {
+qemu_coroutine_yield();
+} while (!data.done);
+qemu_bh_delete(bh);
+}
+
 /*
  * Wait for pending requests to complete on a single BlockDriverState subtree,
  * and suspend block driver's internal I/O until next request arrives.
@@ -269,6 +305,10 @@ void bdrv_drain(BlockDriverState *bs)
 bool busy = true;
 
 bdrv_drain_recurse(bs);
+if (qemu_in_coroutine()) {
+bdrv_co_drain(bs);
+return;
+}
 while (busy) {
 /* Keep iterating */
  bdrv_flush_io_queue(bs);
-- 
2.7.4




Re: [Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 11:46, Fam Zheng wrote:
> +
> +static void bdrv_co_drain_bh_cb(void *opaque)
> +{
> +BdrvCoDrainData *data = opaque;
> +Coroutine *co = data->co;
> +
> +bdrv_drain(data->bs);
> +data->done = true;
> +qemu_coroutine_enter(co, NULL);
> +}
> +
> +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
> +{
> +QEMUBH *bh;
> +BdrvCoDrainData data;
> +
> +assert(qemu_in_coroutine());
> +data = (BdrvCoDrainData) {
> +.co = qemu_coroutine_self(),
> +.bs = bs,
> +.done = false,
> +};
> +bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
> +qemu_bh_schedule(bh);
> +
> +do {
> +qemu_coroutine_yield();
> +} while (!data.done);

The loop and "done" is not necessary.  Also,

> +qemu_bh_delete(bh);

this can be moved to bdrv_co_drain_bh_cb before bdrv_drain, so that the
bottom half doesn't slow down the event loop until bdrv_drain completes.

Paolo

> +}



Re: [Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist

2016-04-01 Thread Greg Kurz
On Fri,  1 Apr 2016 11:40:06 +0200
Cédric Le Goater  wrote:

> In some occasions, a patch [1] can start with a hunk containing a
> simple type cast. At the time annotate_values() is run, the type is
> unknown and the cast type is misinterpreted as a identifier, resulting
> in an error if it is followed with a negative value:
> 
>   ERROR: spaces required around that '-' (ctx:WxV)
> 
> It seems complex to catch all possible types in a cast expression. So,
> as a fallback solution, let's add some common qemu types to the
> typeList array.
> 
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg06741.html
> 
> Signed-off-by: Cédric Le Goater 
> ---

Indeed, with this patch, the checkpatch script stops complaining when it
sees:

+if (prefix == (target_ulong) -1ULL) {

And I tend to agree with Cedric... unless Paolo has a bright idea of course. :)

Acked-by: Greg Kurz 

>  scripts/checkpatch.pl |1 +
>  1 file changed, 1 insertion(+)
> 
> Index: qemu-dgibson-for-2.6.git/scripts/checkpatch.pl
> ===
> --- qemu-dgibson-for-2.6.git.orig/scripts/checkpatch.pl
> +++ qemu-dgibson-for-2.6.git/scripts/checkpatch.pl
> @@ -212,6 +212,7 @@ our @typeList = (
>   qr{${Ident}_t},
>   qr{${Ident}_handler},
>   qr{${Ident}_handler_fn},
> + qr{target_(?:u)?long},
>  );
> 
>  # This can be modified by sub possible.  Since it can be empty, be careful




[Qemu-devel] [PATCHv2] Improve documentation of FUA and FLUSH

2016-04-01 Thread Alex Bligh
Improve the documentation of NBD_CMD_FLUSH and NBD_CMD_FLAG_FUA. Specifically
the latter may be set on any command, and its semantics on commands other
than NBD_CMD_WRITE need explaining. Further, explain how these relate to
reordering of commands.

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 83 +---
 1 file changed, 68 insertions(+), 15 deletions(-)

Changes since v1:

* Rebased on master

* Eric's modifications

* Included 2 options as to how we might handle FUA on non-write commands.
  Let's establish whether either the kernel or Qemu have meaningful uses for
  read+FUA.

diff --git a/doc/proto.md b/doc/proto.md
index 8c83382..389b186 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -213,6 +213,47 @@ handle as was sent by the client in the corresponding 
request.  In
 this way, the client can correlate which request is receiving a
 response.
 
+ Ordering of messages and writes
+
+The server MAY process commands out of order, and MAY reply out of
+order, save that:
+
+* All write commands (that includes both `NBD_CMD_WRITE` and
+  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
+  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
+  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD ensure
+  that all write command received prior to processing the `NBD_CMD_FLUSH`
+  (whether they are replied to or not) are written to non-volatile
+  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
+  stronger condition than the previous 'MUST' condition. This
+  paragraph only applies if `NBD_FLAG_SEND_FLUSH` is set within
+  the transmission flags, as otherwise `NBD_CMD_FLUSH` will never
+  be sent by the client to the server.
+
+* A server MUST NOT reply to a command that has `NBD_CMD_FLAG_FUA` set
+
+[Option #C1]
+
+  in its command flags until the data area referred to by that command
+
+[Option #C1]
+
+  in its command flags until the data (if any) written by that command
+
+[All options]
+
+  is persisted to non-volatile storage. This only applies if
+  `NBD_FLAG_SEND_FLUSH` is set within the transmission flags, as otherwise
+  `NBD_CMD_FLAG_FUA` will not be set on any commands sent to the server
+  by the client.
+
+`NBD_CMD_FLUSH` is modelled on the Linux kernel empty bio with
+`REQ_FLUSH` set. `NBD_CMD_FLAG_FUA` is modelled on the Linux
+kernel bio with `REQ_FUA` set. In case of ambiguity in this
+specification, the
+[kernel 
documentation](https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt)
+may be useful.
+
  Request message
 
 The request message, sent by the client, looks as follows:
@@ -480,10 +521,33 @@ affects a particular command.  Clients MUST NOT set a 
command flag bit
 that is not documented for the particular command; and whether a flag is
 valid may depend on negotiation during the handshake phase.
 
-- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
-  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
-  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
-  flags included `NBD_FLAG_SEND_FUA`.
+- bit 0, `NBD_CMD_FLAG_FUA`. This flag is valid for all commands. This bit
+  MUST be set to 0 unless the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access")
+  was set in the transmission flags field. If the `NBD_FLAG_SEND_FUA`
+  is set in the transmission flags field, the client MAY set
+  `NBD_CMD_FLAG_FUA` in any request. If this bit is set, the server
+
+[Option #C1]
+
+  MUST NOT send a reply until it has ensured that any data referred to
+  by this request (i.e. written data on a `NBD_CMD_WRITE` or
+  `NBD_CMD_WRITE_ZEROES`, read data on an `NBD_CMD_READ`) has reached
+  permanent storage. There will be certain commands
+  (e.g. `NBD_CMD_DISC`) for which this flag will thus not alter behaviour
+  (as the command does not refer to any data), in which case the server
+  MUST ignore this bit.
+
+[Option #C2]
+
+  MUST NOT send a reply until it has ensured that the data (if any) written
+  by this request (e.g. by `NBD_CMD_WRITE` or `NBD_CMD_WRITE_ZEROES`) has
+  reached permanent storage. There will be certain commands (e.g. 
`NBD_CMD_READ`
+  and `NBD_CMD_DISC`) for which this flag will thus not alter behaviour
+  (as the command does not write any data), in which case the server
+  MUST ignore this bit.
+
+[All options]
+
 - bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
   extension; see below.
 - bit 2, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
@@ -532,12 +596,6 @@ The following request types exist:
 message. The server MAY send the reply message before the data has
 reached permanent storage.
 
-If the `NBD_FLAG_SEND_FUA` flag ("Force Unit Access") was set in the
-transmission flags field, the client MAY set the flag `NBD_CMD_FLAG_FUA` in
-the command flags field. If this flag was set, the server MUST NOT send
-the re

[Qemu-devel] [PATCH] qemu-iotests: 149: Use "/usr/bin/env python"

2016-04-01 Thread Fam Zheng
Do the same as other scripts, to pick the correct interpreter between
python2 and python3 from the environment.

Signed-off-by: Fam Zheng 
---
 tests/qemu-iotests/149 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
index bb5811d..52e23d2 100755
--- a/tests/qemu-iotests/149
+++ b/tests/qemu-iotests/149
@@ -1,4 +1,4 @@
-#!/usr/bin/python
+#!/usr/bin/env python
 #
 # Copyright (C) 2016 Red Hat, Inc.
 #
-- 
2.7.4




[Qemu-devel] [PATCH] doc/proto.md: Clients MUST not set unknown client flags

2016-04-01 Thread Alex Bligh
Clients MUST NOT set unknown client flags. Currently this is
permitted (but 'SHOULD NOT' be done), with the result that the
server MUST drop the connection if it happens. This in effect
gives the client an inappropriate way to close the connection.

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/proto.md b/doc/proto.md
index 8c83382..6a7ebda 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -308,7 +308,7 @@ receiving the handshake flags from the server.
   set `NBD_FLAG_NO_ZEROES`. If set, the server MUST NOT send the 124
   bytes of zeroes at the end of the negotiation.
 
-Clients SHOULD NOT set any other flags; the server MUST drop the
+Clients MUST NOT set any other flags; the server MUST drop the
 connection if the client sets an unknown flag, or a flag that does
 not match something advertised by the server.
 
-- 
1.9.1




Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA

2016-04-01 Thread Wouter Verhelst
On Fri, Apr 01, 2016 at 10:32:50AM +0100, Alex Bligh wrote:
> #define NBD_CMD_MASK_COMMAND 0x
> -#define NBD_CMD_FLAG_FUA (1<<16)
> +#define NBD_CMD_SHIFT (16)
> +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT)

That works too, I suppose.

However, like I said, I need to clean this up anyway. I thought your
"will you take a patch" was going to do that, but if not, I'll get
around to doing it myself at some point...

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [PATCH] block: Fix bdrv_drain in coroutine

2016-04-01 Thread Fam Zheng
On Fri, 04/01 11:49, Paolo Bonzini wrote:
> 
> 
> On 01/04/2016 11:46, Fam Zheng wrote:
> > +
> > +static void bdrv_co_drain_bh_cb(void *opaque)
> > +{
> > +BdrvCoDrainData *data = opaque;
> > +Coroutine *co = data->co;
> > +
> > +bdrv_drain(data->bs);
> > +data->done = true;
> > +qemu_coroutine_enter(co, NULL);
> > +}
> > +
> > +static void coroutine_fn bdrv_co_drain(BlockDriverState *bs)
> > +{
> > +QEMUBH *bh;
> > +BdrvCoDrainData data;
> > +
> > +assert(qemu_in_coroutine());
> > +data = (BdrvCoDrainData) {
> > +.co = qemu_coroutine_self(),
> > +.bs = bs,
> > +.done = false,
> > +};
> > +bh = aio_bh_new(bdrv_get_aio_context(bs), bdrv_co_drain_bh_cb, &data),
> > +qemu_bh_schedule(bh);
> > +
> > +do {
> > +qemu_coroutine_yield();
> > +} while (!data.done);
> 
> The loop and "done" is not necessary.  Also,

I was trying to protect against the bugs similar to the one fixed in
e424aff5f30, but you're right we can make this an assertion and the loop is not
needed. If a calling coroutine is resumed unexpectedly, we will catch it and
fix it.

> 
> > +qemu_bh_delete(bh);
> 
> this can be moved to bdrv_co_drain_bh_cb before bdrv_drain, so that the
> bottom half doesn't slow down the event loop until bdrv_drain completes.

Good point! Will fix.

Thanks,

Fam



Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH

2016-04-01 Thread Wouter Verhelst
On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote:
> 
> On 1 Apr 2016, at 09:35, Wouter Verhelst  wrote:
> 
> >> +* All write commands (that includes both `NBD_CMD_WRITE` and
> >> +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
> >> +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
> >> +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD 
> >> ensure
> >> +  that all write command received prior to processing the `NBD_CMD_FLUSH`
> >> +  (whether they are replied to or not) are written to non-volatile
> >> +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
> >> +  stronger condition than the previous 'MUST' condition. This
> > 
> > This seems to make little sense. Are you saying that suddenly now
> > sending a reply for FLUSH with outstanding writes is wrong? If not, the
> > above should be clarified.
> 
> The MUST sentence does not cover that situation as it only refers
> to completed writes.
> 
> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
> of writes that have PROCESSED (i.e actioned) whether or not they
> have been replied to. Of course the client has no way of knowing
> whether they have been PROCESSED without a reply.
> 
> Personally I think the SHOULD clause is pretty pointless and is
> unnecessary, but that's where the conversation got to n years
> ago I believe.

I'm still not sure what it's supposed to mean, though. Clearly, you
should at the very least reword it, if not...

> Happy to delete the last sentence if that's wrong.

... remove it instead.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12



Re: [Qemu-devel] [Nbd] [PATCH 2/2] Correct definition of NBD_CMD_FLAG_FUA

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 10:43, Wouter Verhelst  wrote:

> On Fri, Apr 01, 2016 at 10:32:50AM +0100, Alex Bligh wrote:
>> #define NBD_CMD_MASK_COMMAND 0x
>> -#define NBD_CMD_FLAG_FUA (1<<16)
>> +#define NBD_CMD_SHIFT (16)
>> +#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT)
> 
> That works too, I suppose.
> 
> However, like I said, I need to clean this up anyway. I thought your
> "will you take a patch" was going to do that, but if not, I'll get
> around to doing it myself at some point...

How about you take that one for now and one of us will get do
doing it properly in due course. The proper solution is obviously
to use two 16 bit fields.

-- 
Alex Bligh







Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > I wonder how we can make use of this in qemu and downstream distros?
> > Can we have a bios-min.bin which is used with -kernel boots?
> 
> We already build two seabios roms: one full featued and one slightly
> stripped down to keep it below 128k, for backward compatibility with old
> machine types.
> 
> Adding a third config for -kernel boot should be easy.  For that use
> case we can probably also turn on seabios logging to the serial console
> and drop sgabios.  We don't need input (no boot menu) and we also don't
> need to hook into int10 (no grub/ipxe using that for output).

SeaBIOS logging is slow (or more likely, serial output is slow).
And sgabios is useful for debugging.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW



Re: [Qemu-devel] [Nbd] [PATCH] Improve documentation of FUA and FLUSH

2016-04-01 Thread Alex Bligh

On 1 Apr 2016, at 11:10, Wouter Verhelst  wrote:

> On Fri, Apr 01, 2016 at 10:28:03AM +0100, Alex Bligh wrote:
>> 
>> On 1 Apr 2016, at 09:35, Wouter Verhelst  wrote:
>> 
 +* All write commands (that includes both `NBD_CMD_WRITE` and
 +  `NBD_CMD_TRIM`) that the server completes (i.e. replies to)
 +  prior to processing to a `NBD_CMD_FLUSH` MUST be written to non-volatile
 +  storage prior to replying to that `NBD_CMD_FLUSH`. The server SHOULD 
 ensure
 +  that all write command received prior to processing the `NBD_CMD_FLUSH`
 +  (whether they are replied to or not) are written to non-volatile
 +  storage prior to processing an `NBD_CMD_FLUSH`; note this is a
 +  stronger condition than the previous 'MUST' condition. This
>>> 
>>> This seems to make little sense. Are you saying that suddenly now
>>> sending a reply for FLUSH with outstanding writes is wrong? If not, the
>>> above should be clarified.
>> 
>> The MUST sentence does not cover that situation as it only refers
>> to completed writes.
>> 
>> The SHOULD sentence says that's a 'SHOULD NOT' situation in respect
>> of writes that have PROCESSED (i.e actioned) whether or not they
>> have been replied to. Of course the client has no way of knowing
>> whether they have been PROCESSED without a reply.
>> 
>> Personally I think the SHOULD clause is pretty pointless and is
>> unnecessary, but that's where the conversation got to n years
>> ago I believe.
> 
> I'm still not sure what it's supposed to mean, though. Clearly, you
> should at the very least reword it, if not...
> 
>> Happy to delete the last sentence if that's wrong.
> 
> ... remove it instead.

If I can't even explain it, it doesn't bode well!

I think there are three types of writes that are relevant at the point
of replying to a FLUSH:

Type A: writes that are truly 'in flight', i.e. have been sent, but have
not been 'processed' (i.e. write(2) has not been called).

Type B: writes that are have been processed (i.e. write(2) has not been
called) but the reply has not yet been sent to the client - either the
reply hasn't been made yet or (more likely) it's sitting in some queue.

Type C: writes that have been processed and a reply sent.


The first sentence (the 'MUST') refers to Type C writes and says these
MUST be persisted to permanent storage prior to replying to the FLUSH.

Type A writes may be in a buffer server side or even client side, so
no one expects anything to happen with respect to those.

The 'SHOULD' bit concerns Type B writes. It's saying that if you've
actually done the write call (but not yet replied) you SHOULD persist
these to disk prior to replying to the FLUSH. This type of situation
doesn't (last time I looked) happen in nbd-server.c because it does
not process requests in parallel. However it may happen elsewhere.
The question is what to do with them.

I'm happy dropping anything about these writes and only dealing with
Type C writes, but the 'SHOULD' is where I thought we got to before
(I think I sent the quote from the mailing list).


-- 
Alex Bligh







Re: [Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes

2016-04-01 Thread Gerd Hoffmann
On Fr, 2016-04-01 at 11:31 +0200, Ladi Prosek wrote:
> The write path for pass-through devices, commonly used for controlling
> keyboard LEDs via EV_LED, was not implemented. This commit adds the
> necessary plumbing to connect the status virtio queue to the host evdev
> file descriptor.
> 
> Signed-off-by: Ladi Prosek 
> ---
> Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't
> believe that this will be happening in practise for local evdev devices.

Agree.

> Of course, all bets are off if fd is an arbitrary file,

It can't be, it would fail the evdev ioctls.

> with the event stream). So part of me would just do one best-effort
> write and ignore all errors. I'm curious what you think. 

Just do best-effort sounds reasonable to me.

The code to handle EGAIN will just bitrot b/c it will never be used (and
I guess it is untested too, or did you manage to provoke EGAIN somehow?)

cheers,
  Gerd




Re: [Qemu-devel] [PATCH] virtio-input: implement pass-through evdev writes

2016-04-01 Thread Ladi Prosek
On Fri, Apr 1, 2016 at 12:20 PM, Gerd Hoffmann  wrote:
> On Fr, 2016-04-01 at 11:31 +0200, Ladi Prosek wrote:
>> The write path for pass-through devices, commonly used for controlling
>> keyboard LEDs via EV_LED, was not implemented. This commit adds the
>> necessary plumbing to connect the status virtio queue to the host evdev
>> file descriptor.
>>
>> Signed-off-by: Ladi Prosek 
>> ---
>> Most of the new code has to do with handling EAGAIN/EWOULDBLOCK. I don't
>> believe that this will be happening in practise for local evdev devices.
>
> Agree.
>
>> Of course, all bets are off if fd is an arbitrary file,
>
> It can't be, it would fail the evdev ioctls.

Ah!

>> with the event stream). So part of me would just do one best-effort
>> write and ignore all errors. I'm curious what you think.
>
> Just do best-effort sounds reasonable to me.

Will do, thanks.

> The code to handle EGAIN will just bitrot b/c it will never be used (and
> I guess it is untested too, or did you manage to provoke EGAIN somehow?)

if (rand() & 1) {
return EAGAIN;
}

> cheers,
>   Gerd
>



Re: [Qemu-devel] [PATCH] target-ppc: Correct KVM synchronization for ppc_hash64_set_external_hpt()

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 05:52, David Gibson wrote:
> This seems like the right minimal fix in the qemu-2.6 timeframe to fix
> the actual bug.  However, longer term it seems like the correct thing
> to do might be to set kvm_vcpu_dirty early in the reset path.  Thoughts?

Isn't it done already? vl.c does:

pause_all_vcpus();
cpu_synchronize_all_states();
qemu_system_reset(VMRESET_REPORT);
resume_all_vcpus();

Thanks,

Paolo

> diff --git a/target-ppc/mmu-hash64.c b/target-ppc/mmu-hash64.c
> index 72c4ab5..caf41ce 100644
> --- a/target-ppc/mmu-hash64.c
> +++ b/target-ppc/mmu-hash64.c
> @@ -283,8 +283,6 @@ void ppc_hash64_set_external_hpt(PowerPCCPU *cpu, void 
> *hpt, int shift,
>  CPUPPCState *env = &cpu->env;
>  Error *local_err = NULL;
>  
> -cpu_synchronize_state(CPU(cpu));
> -
>  if (hpt) {
>  env->external_htab = hpt;
>  } else {



Re: [Qemu-devel] [PATCH] filter-buffer: fix segfault while start qemu with status=off property

2016-04-01 Thread Hailiang Zhang

On 2016/4/1 17:10, Wen Congyang wrote:

On 04/01/2016 04:24 PM, Hailiang Zhang wrote:

On 2016/4/1 15:39, Jason Wang wrote:



On 04/01/2016 03:08 PM, zhanghailiang wrote:

After commit 338d3f, we support 'status' property for filter object.
The segfault can be triggered by starting qemu with 'status=off' property
for filter, when the s->incoming_queue is NULL, we reference it directly
in qemu_net_queue_flush().

Let's check the value of 's->incoming_queue' before calling
qemu_net_queue_flush().

Signed-off-by: zhanghailiang 
---
   net/filter-buffer.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/filter-buffer.c b/net/filter-buffer.c
index cc6bd94..79e2ce3 100644
--- a/net/filter-buffer.c
+++ b/net/filter-buffer.c
@@ -34,7 +34,7 @@ static void filter_buffer_flush(NetFilterState *nf)
   {
   FilterBufferState *s = FILTER_BUFFER(nf);

-if (!qemu_net_queue_flush(s->incoming_queue)) {
+if (s->incoming_queue && !qemu_net_queue_flush(s->incoming_queue)) {
   /* Unable to empty the queue, purge remaining packets */
   qemu_net_queue_purge(s->incoming_queue, nf->netdev);
   }


We'd better handle this at generic layer and don't let a specific net
filter need to worry about this.

Looks like the issue is we may trigger status_changed() too early (even
before the the filter was initialized).



Yes ~


How about not call status_changed() if the initialization is not done?



But seems that it is difficult to confirm if the filter is initialized
or not ...


If nfc->setup() is not called, nf->netdev is NULL.



Yes, you right, Jason, what's opinion ?

Thanks,
hailiang


Thanks
Wen Congyang




.












.






Re: [Qemu-devel] [PATCH] checkpatch: add target_ulong to typelist

2016-04-01 Thread Paolo Bonzini
On 01/04/2016 11:50, Greg Kurz wrote:
> 
> And I tend to agree with Cedric... unless Paolo has a bright idea of course. 
> :)
> 
> Acked-by: Greg Kurz 

Yes, the patch is correct.

Paolo



[Qemu-devel] [PATCH] doc/proto.md: restore formatting

2016-04-01 Thread Alex Bligh
Restore formatting and correct name of 'length'

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 8c83382..03dfe2b 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -501,8 +501,8 @@ The following request types exist:
 extension was negotiated.
 
 If structured replies were not negotiated, the server MUST reply
-with a simple reply header, followed immediately by len bytes of
-data, read from offset bytes into the file, unless an error
+with a simple reply header, followed immediately by *length* bytes
+of data, read from *offset* bytes into the file, unless an error
 condition has occurred.
 
 If an error occurs, the server SHOULD set the appropriate error
-- 
1.9.1




Re: [Qemu-devel] [RFC PATCH v2.1 05/12] qdev: hotplug: Introduce HotplugHandler.pre_plug() callback

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 05:30, David Gibson wrote:
> On Thu, Mar 31, 2016 at 02:09:14PM +0530, Bharata B Rao wrote:
>> From: Igor Mammedov 
>>
>> pre_plug callback is to be called before device.realize() is executed.
>> This would allow to check/set device's properties from HotplugHandler.
>>
>> Signed-off-by: Igor Mammedov 
>> Signed-off-by: Bharata B Rao 
> 
> Reviewed-by: David Gibson 
> 
> It would be really nice to get some opinion on this from Andreas or
> Paolo.

Certainly okay for me, Igor did all of the HotplugHandler design and work.

Paolo

>> ---
>>  hw/core/hotplug.c| 11 +++
>>  hw/core/qdev.c   |  9 -
>>  include/hw/hotplug.h | 14 +-
>>  3 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/core/hotplug.c b/hw/core/hotplug.c
>> index 645cfca..17ac986 100644
>> --- a/hw/core/hotplug.c
>> +++ b/hw/core/hotplug.c
>> @@ -13,6 +13,17 @@
>>  #include "hw/hotplug.h"
>>  #include "qemu/module.h"
>>  
>> +void hotplug_handler_pre_plug(HotplugHandler *plug_handler,
>> +  DeviceState *plugged_dev,
>> +  Error **errp)
>> +{
>> +HotplugHandlerClass *hdc = HOTPLUG_HANDLER_GET_CLASS(plug_handler);
>> +
>> +if (hdc->pre_plug) {
>> +hdc->pre_plug(plug_handler, plugged_dev, errp);
>> +}
>> +}
>> +
>>  void hotplug_handler_plug(HotplugHandler *plug_handler,
>>DeviceState *plugged_dev,
>>Error **errp)
>> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
>> index db41aa1..a0b3aad 100644
>> --- a/hw/core/qdev.c
>> +++ b/hw/core/qdev.c
>> @@ -1062,6 +1062,14 @@ static void device_set_realized(Object *obj, bool 
>> value, Error **errp)
>>  g_free(name);
>>  }
>>  
>> +hotplug_ctrl = qdev_get_hotplug_handler(dev);
>> +if (hotplug_ctrl) {
>> +hotplug_handler_pre_plug(hotplug_ctrl, dev, &local_err);
>> +if (local_err != NULL) {
>> +goto fail;
>> +}
>> +}
>> +
>>  if (dc->realize) {
>>  dc->realize(dev, &local_err);
>>  }
>> @@ -1072,7 +1080,6 @@ static void device_set_realized(Object *obj, bool 
>> value, Error **errp)
>>  
>>  DEVICE_LISTENER_CALL(realize, Forward, dev);
>>  
>> -hotplug_ctrl = qdev_get_hotplug_handler(dev);
>>  if (hotplug_ctrl) {
>>  hotplug_handler_plug(hotplug_ctrl, dev, &local_err);
>>  }
>> diff --git a/include/hw/hotplug.h b/include/hw/hotplug.h
>> index 2db025d..50d84e9 100644
>> --- a/include/hw/hotplug.h
>> +++ b/include/hw/hotplug.h
>> @@ -46,7 +46,8 @@ typedef void (*hotplug_fn)(HotplugHandler *plug_handler,
>>   * hardware (un)plug functions.
>>   *
>>   * @parent: Opaque parent interface.
>> - * @plug: plug callback.
>> + * @pre_plug: pre plug callback called at start of device.realize(true)
>> + * @plug: plug callback called at end of device.realize(true).
>>   * @unplug_request: unplug request callback.
>>   *  Used as a means to initiate device unplug for devices 
>> that
>>   *  require asynchronous unplug handling.
>> @@ -59,6 +60,7 @@ typedef struct HotplugHandlerClass {
>>  InterfaceClass parent;
>>  
>>  /*  */
>> +hotplug_fn pre_plug;
>>  hotplug_fn plug;
>>  hotplug_fn unplug_request;
>>  hotplug_fn unplug;
>> @@ -74,6 +76,16 @@ void hotplug_handler_plug(HotplugHandler *plug_handler,
>>Error **errp);
>>  
>>  /**
>> + * hotplug_handler_pre_plug:
>> + *
>> + * Call #HotplugHandlerClass.pre_plug callback of @plug_handler.
>> + */
>> +void hotplug_handler_pre_plug(HotplugHandler *plug_handler,
>> +  DeviceState *plugged_dev,
>> +  Error **errp);
>> +
>> +
>> +/**
>>   * hotplug_handler_unplug_request:
>>   *
>>   * Calls #HotplugHandlerClass.unplug_request callback of @plug_handler.
> 



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] doc/memory: update MMIO section

2016-04-01 Thread Cao jin
There is no memory_region_io(). And remove a stray '-'.

Signed-off-by: Cao jin 
---
 docs/memory.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 97134e1..fddc0d9 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -37,7 +37,7 @@ MemoryRegion):
 
 - MMIO: a range of guest memory that is implemented by host callbacks;
   each read or write causes a callback to be called on the host.
-  You initialize these with memory_region_io(), passing it a MemoryRegionOps
+  You initialize these with memory_region_init_io(), passing it a 
MemoryRegionOps
   structure describing the callbacks.
 
 - ROM: a ROM memory region works like RAM for reads (directly accessing
@@ -186,7 +186,7 @@ of its own subregions: D of size 0x1000 at offset 0 and E 
of size 0x1000 at
 offset 0x2000. As a diagram:
 
 0  1000   2000   3000   4000   5000   6000   70008000
-|--|--|--|--|--|--|--|---|
+|--|--|--|--|--|--|--|--|
   A:[   ]
   C:[]
   B:  [  ]
-- 
2.1.0






[Qemu-devel] [PATCHv2 2/2] Correct definition of NBD_CMD_FLAG_FUA

2016-04-01 Thread Alex Bligh
NBD_CMD_FLAG_FUA is defined as 1<<0 in the documentation, but
1<<16 in nbd.h.

The code currently treats the command as a 32 bit quantity
and masks this off. This is confusing. Until such time as the
code is fixed up, make it obvious this isn't really bit 16.

Signed-off-by: Alex Bligh 
---
 nbd.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd.h b/nbd.h
index f2a32dd..732c605 100644
--- a/nbd.h
+++ b/nbd.h
@@ -38,7 +38,8 @@ enum {
 };
 
 #define NBD_CMD_MASK_COMMAND 0x
-#define NBD_CMD_FLAG_FUA (1<<16)
+#define NBD_CMD_SHIFT (16)
+#define NBD_CMD_FLAG_FUA ((1 << 0) << NBD_CMD_SHIFT)
 
 /* values for flags field */
 #define NBD_FLAG_HAS_FLAGS (1 << 0)/* Flags are there */
-- 
1.9.1




[Qemu-devel] [PATCHv2 1/2] proto.md: Clearly set out NBDMAGIC is the actual value

2016-04-01 Thread Alex Bligh
Clearly set out NBDMAGIC, not the name of a constant equal to
some value. Set out the value in hex as well.

Document the newstyle magic number is "IHAVEOPT".

Signed-off-by: Alex Bligh 
---
 doc/proto.md | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/proto.md b/doc/proto.md
index 03dfe2b..8376021 100644
--- a/doc/proto.md
+++ b/doc/proto.md
@@ -67,7 +67,8 @@ newstyle negotiation.
 
  Oldstyle negotiation
 
-S: 64 bits, `NBDMAGIC` (also known as the `INIT_PASSWD`)  
+S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (also known as
+   the `INIT_PASSWD`)  
 S: 64 bits, `0x00420281861253` (`cliserv_magic`, a magic number)  
 S: 64 bits, size of the export in bytes (unsigned)  
 S: 32 bits, flags  
@@ -96,8 +97,10 @@ production purposes.
 
 The initial few exchanges in newstyle negotiation look as follows:
 
-S: 64 bits, `NBDMAGIC` (as in the old style handshake)  
-S: 64 bits, `0x49484156454F5054` (note different magic number)  
+S: 64 bits, `0x4e42444d41474943` (ASCII '`NBDMAGIC`') (as in the old
+   style handshake)  
+S: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note different
+   magic number)  
 S: 16 bits, handshake flags  
 C: 32 bits, flags  
 
@@ -113,7 +116,8 @@ At this point, we move on to option haggling, during which 
point the
 client can send one or (in fixed newstyle) more options to the server.
 The generic format of setting an option is as follows:
 
-C: 64 bits, `0x49484156454F5054` (note same newstyle handshake's magic number) 
 
+C: 64 bits, `0x49484156454F5054` (ASCII '`IHAVEOPT`') (note same
+   newstyle handshake's magic number)  
 C: 32 bits, option  
 C: 32 bits, length of option data (unsigned)  
 C: any data needed for the chosen option, of length as specified above.  
-- 
1.9.1




Re: [Qemu-devel] [PATCH 3/3] nbd: Reject unknown request flags

2016-04-01 Thread Paolo Bonzini


On 31/03/2016 23:20, Eric Blake wrote:
> The NBD protocol says that clients should not send a command flag
> that has not been negotiated (whether by the client requesting an
> option during a handshake, or because we advertise support for the
> flag in response to NBD_OPT_EXPORT_NAME), and that servers should
> reject invalid flags with EINVAL.  We were silently ignoring the
> flags instead.  The client can't rely on our behavior, since it is
> their fault for passing the bad flag in the first place, but it's
> better to be robust up front than to possibly behave differently
> than the client was expecting with the attempted flag.
> 
> Signed-off-by: Eric Blake 
> ---
>  nbd/server.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index a590773..31bd9c5 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -974,6 +974,10 @@ static ssize_t nbd_co_receive_request(NBDRequest *req, 
> struct nbd_request *reque
>  goto out;
>  }
> 
> +if (request->flags & ~NBD_CMD_FLAG_FUA) {
> +LOG("unsupported flags (got 0x%x)", request->flags);
> +return -EINVAL;
> +}
>  if ((request->from + request->len) < request->from) {
>  LOG("integer overflow detected! "
>  "you're probably being attacked");
> 

Queued for 2.6.

Paolo



Re: [Qemu-devel] [PATCH v2 0/2] Fixing non-blocking operation of chardevs

2016-04-01 Thread Paolo Bonzini


On 31/03/2016 17:29, Daniel P. Berrange wrote:
> This fixes socket chardevs to always be in non-blocking
> mode, as they were before the QIOChannel conversion. The
> second patch was already posted before, but dropped when
> Peter discovered a problem on OS-X causing ahci-test to
> hang:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg05807.html
> 
> I traced this down to broken EAGAIN handling affecting
> OS-X, hence the first patch in this series.
> 
> Changed in v2:
> 
>  - Also fix qemu_chr_fe_write_log method
> 
> Daniel P. Berrange (2):
>   char: fix broken EAGAIN retry on OS-X due to errno clobbering
>   char: ensure all clients are in non-blocking mode
> 
>  qemu-char.c | 39 ---
>  1 file changed, 20 insertions(+), 19 deletions(-)
> 

Thanks, queued for the next pull request (next week).

Paolo



Re: [Qemu-devel] [PATCH 2/3] nbd: Fix poor debug message

2016-04-01 Thread Paolo Bonzini


On 31/03/2016 23:20, Eric Blake wrote:
> The client sends messages to the server, not itself.
> 
> Signed-off-by: Eric Blake 
> ---
>  nbd/client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 5af9f26..d95ad7a 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -635,7 +635,7 @@ ssize_t nbd_send_request(QIOChannel *ioc, struct 
> nbd_request *request)
>  cpu_to_be64w((uint64_t *)(buf + 16), request->from);
>  cpu_to_be32w((uint32_t *)(buf + 24), request->len);
> 
> -TRACE("Sending request to client: "
> +TRACE("Sending request to server: "
>"{ .from = %" PRIu64", .len = %u, .handle = %" PRIu64
>", .flags=%x, .type=%i}",
>request->from, request->len, request->handle, request->flags,
> 

(Rebased and) queued for 2.6.

Paolo



Re: [Qemu-devel] [PATCH] doc/memory: update MMIO section

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 12:47, Cao jin wrote:
> There is no memory_region_io(). And remove a stray '-'.
> 
> Signed-off-by: Cao jin 
> ---
>  docs/memory.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/memory.txt b/docs/memory.txt
> index 97134e1..fddc0d9 100644
> --- a/docs/memory.txt
> +++ b/docs/memory.txt
> @@ -37,7 +37,7 @@ MemoryRegion):
>  
>  - MMIO: a range of guest memory that is implemented by host callbacks;
>each read or write causes a callback to be called on the host.
> -  You initialize these with memory_region_io(), passing it a MemoryRegionOps
> +  You initialize these with memory_region_init_io(), passing it a 
> MemoryRegionOps
>structure describing the callbacks.
>  
>  - ROM: a ROM memory region works like RAM for reads (directly accessing
> @@ -186,7 +186,7 @@ of its own subregions: D of size 0x1000 at offset 0 and E 
> of size 0x1000 at
>  offset 0x2000. As a diagram:
>  
>  0  1000   2000   3000   4000   5000   6000   70008000
> -|--|--|--|--|--|--|--|---|
> +|--|--|--|--|--|--|--|--|
>A:[   ]
>C:[]
>B:  [  ]
> 

Queued for 2.6, thanks.

Paolo



Re: [Qemu-devel] Ballooning on TPS!=HPS hosts

2016-04-01 Thread Amit Shah
CC'ing virtualization list.

On (Thu) 31 Mar 2016 [19:00:24], Dr. David Alan Gilbert wrote:
> Hi,
>   I was reading the balloon code and am confused as to how/if ballooning
> works on hosts where the host page size is larger than the
> target page size.
> 
> static void balloon_page(void *addr, int deflate)
> {
> #if defined(__linux__)
> if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
>  kvm_has_sync_mmu())) {
> qemu_madvise(addr, TARGET_PAGE_SIZE,
> deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
> }
> #endif
> }
> 
> The virtio-balloon code only does stuff through ballon_page,
> and an madvise DONTNEED should fail if you try and do it on
> a size smaller than the host page size.  So does ballooning work on
> Power/ARM?
> 
> Am I misunderstanding this?

I think you're right.  Guess no one's tested this in such scenarios
yet.

> Of course looking at the above we won't actually generate an error since
> we don't check the return of qemu_madvise.

... at least we can deflate the balloon in case the madvise fails, so
the guest can use the pages it's given us.

> We have three sizes:
> a) host page size
> b) target page size
> c) VIRTIO_BALLOON_PFN_SHIFT
> 
>  c == 12 (4k) for everyone
>  
> 
> 1) I think the virtio-balloon code needs to coallesce adjecent requests
>   and call balloon_page on whole chunks at once passing a length.
> 2) why does balloon_page use TARGET_PAGE_SIZE, ignoring anything else
>shouldn't it be 1 << VIRTIO_BALLOON_PFN_SHIFT ?
> 3) I'm guessing the guest kernel doesn't know the host page size, so
>how can it know what size chunks of balloon to work in?

Thanks,

Amit



Re: [Qemu-devel] [RFC Design Doc]Speed up live migration by skipping free pages

2016-04-01 Thread Amit Shah
On (Tue) 22 Mar 2016 [19:05:31], Dr. David Alan Gilbert wrote:
> * Liang Li (liang.z...@intel.com) wrote:

> > b. Implement a new virtio device
> > Implementing a brand new virtio device to exchange information
> > between host and guest is another choice. It requires modifying the
> > virtio spec too.
> 
> If the right solution is to change the spec then we should do it;
> we shouldn't use a technically worse solution just to avoid the spec
> change; although we have to be even more careful to get the right
> solution if we want to change the spec.

Yes, absolutely.  I didn't mean to suggest virtio-serial because it'll
help us save with modifying specs.  I suggested it because for me, the
most important point was that if a guest is sensitive to latencies or
spikes, the balloon driver is definitely going to be not installed by
the guest admin.

(I'm still not caught up on this thread, but wanted to clarify this
right away.)

Amit



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Gerd Hoffmann
On Fr, 2016-04-01 at 11:17 +0100, Richard W.M. Jones wrote:
> On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > I wonder how we can make use of this in qemu and downstream distros?
> > > Can we have a bios-min.bin which is used with -kernel boots?
> > 
> > We already build two seabios roms: one full featued and one slightly
> > stripped down to keep it below 128k, for backward compatibility with old
> > machine types.
> > 
> > Adding a third config for -kernel boot should be easy.  For that use
> > case we can probably also turn on seabios logging to the serial console
> > and drop sgabios.  We don't need input (no boot menu) and we also don't
> > need to hook into int10 (no grub/ipxe using that for output).
> 
> SeaBIOS logging is slow (or more likely, serial output is slow).
> And sgabios is useful for debugging.

Ah, I see.  debuglevel=1 prints too much and debuglevel=0 has no logging
at all.  We'd need seabios log at least the version banner to serial
even with debuglevel=0 to replace sgabios.

https://www.kraxel.org/cgit/qemu/log/?h=work/stripped-seabios

Not hooked into -kernel yet, so you have to use "-bios bios-kboot.bin".

Comments?

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-04-01 Thread Jitendra Kolhe
On 3/29/2016 5:58 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 28, 2016 at 09:46:05AM +0530, Jitendra Kolhe wrote:
>> While measuring live migration performance for qemu/kvm guest, it
>> was observed that the qemu doesn’t maintain any intelligence for the
>> guest ram pages which are released by the guest balloon driver and
>> treat such pages as any other normal guest ram pages. This has direct
>> impact on overall migration time for the guest which has released
>> (ballooned out) memory to the host.
>>
>> In case of large systems, where we can configure large guests with 1TB
>> and with considerable amount of memory release by balloon driver to the,
>> host the migration time gets worse.
>>
>> The solution proposed below is local only to qemu (and does not require
>> any modification to Linux kernel or any guest driver). We have verified
>> the fix for large guests =1TB on HPE Superdome X (which can support up
>> to 240 cores and 12TB of memory) and in case where 90% of memory is
>> released by balloon driver the migration time for an idle guests reduces
>> to ~600 sec's from ~1200 sec’s.
>>
>> Detail: During live migration, as part of 1st iteration in ram_save_iterate()
>> -> ram_find_and_save_block () will try to migrate ram pages which are
>> released by vitrio-balloon driver as part of dynamic memory delete.
>> Even though the pages which are returned to the host by virtio-balloon
>> driver are zero pages, the migration algorithm will still end up
>> scanning the entire page ram_find_and_save_block() -> ram_save_page/
>> ram_save_compressed_page -> save_zero_page() -> is_zero_range().  We
>> also end-up sending some control information over network for these
>> page during migration. This adds to total migration time.
>>
>> The proposed fix, uses the existing bitmap infrastructure to create
>> a virtio-balloon bitmap. The bits in the bitmap represent a guest ram
>> page of size 1UL<< VIRTIO_BALLOON_PFN_SHIFT. The bitmap represents
>> entire guest ram memory till max configured memory. Guest ram pages
>> claimed by the virtio-balloon driver will be represented by 1 in the
>> bitmap. During live migration, each guest ram page (host VA offset)
>> is checked against the virtio-balloon bitmap, if the bit is set the
>> corresponding ram page will be excluded from scanning and sending
>> control information during migration. The bitmap is also migrated to
>> the target as part of every ram_save_iterate loop and after the
>> guest is stopped remaining balloon bitmap is migrated as part of
>> balloon driver save / load interface.
> 
> Migrating the bitmap might help a chained migration case
> but will slow down the more typical case a bit.
> Make it optional?
> 
> 

Disabling migration of balloon bitmap would disable the 
optimization permanently for all other subsequent migrations.

The current implementation sends offsets and lengths of 1s 
in the bitmap. In cases where we see highly fragmented 
bitmap which may add to the bitmap migration time, but still 
it would be less compared to total migration time.

Here are some values from my test setup for an idle guest 
with 16GB memory ballooned down to 4GB memory (with no 
ballooning operation in progress during migration).
- Total migration time without proposed patch set - ~7600ms 
- Total migration time with proposed patch set - ~5700ms
  . Adds overhead of testing against bitmap - ~320ms
  . Adds overhead of sending the bitmap - ~1.6ms
- Saves zero page scan + protocol time - ~2348ms  
We may see some higher pct for time taken to migrate bitmap
in case ballooning is in progress during migration. 

The patch does introduces an option to enable/disable 
testing ram_addr against balloon bitmap during migration 
(since that’s the major overhead), but we still end-up 
migrating the bitmap by default. 

>>
>> With the proposed fix, the average migration time for an idle guest
>> with 1TB maximum memory and 64vCpus
>>  - reduces from ~1200 secs to ~600 sec, with guest memory ballooned
>>down to 128GB (~10% of 1TB).
>>  - reduces from ~1300 to ~1200 sec (7%), with guest memory ballooned
>>down to 896GB (~90% of 1TB),
>>  - with no ballooning configured, we don’t expect to see any impact
>>on total migration time.
>>
>> The optimization gets temporarily disabled, if the balloon operation is
>> in progress. Since the optimization skips scanning and migrating control
>> information for ballooned out pages, we might skip guest ram pages in
>> cases where the guest balloon driver has freed the ram page to the guest
>> but not yet informed the host/qemu about the ram page
>> (VIRTIO_BALLOON_F_MUST_TELL_HOST). In such case with optimization, we
>> might skip migrating ram pages which the guest is using. Since this
>> problem is specific to balloon leak, we can restrict balloon operation in
>> progress check to only balloon leak operation in progress check.
>>
>> The optimization also get permanently disabled (for all subsequent
>> migrations) in case any of th

Re: [Qemu-devel] tcg: reworking tb_invalidated_flag

2016-04-01 Thread Alex Bennée

Sergey Fedorov  writes:

> On 31/03/16 16:37, Alex Bennée wrote:
>> Sergey Fedorov  writes:
>>> Looks like no matter which approach we use, it's ultimately necessary to
>>> ensure all CPUs have exited from translated code before the translation
>>> buffer may be safely flushed.
>> One approach would be to have multiple translation contexts with their
>> own buffers and then you can safely flush TBs if no vCPUs are currently
>> executing in those regions. But I suspect that is a much more complex
>> future optimisation.
>
> Yes, this is much more complex and its performance impact should be
> investigated.
>
>> Having said that is it safe to flush TBs from a given page if we know
>> no vCPUs are currently executing in that page? As the execution loop has
>> to exit the chained TBs as we cross page boundaries we could just keep
>> account of which vCPUs are currently in which page.
>
> It should be safe to invalidate a TB while some other CPU is executing
> its translated code. But it should be guaranteed that no CPU execute any
> old TB after tb_flush() since we're going to start reusing those TBs.
>
> I see how TB cannot be patched if it spans two pages, is there any on
> when TCG goto_tb can be generated?

Do you mean tcg_gen_goto_tb?

AFAIUI all blocks end with goto_tb post-ambles but they should only
directly jump to another TB if they are in the same page.

--
Alex Bennée



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones
On Fri, Apr 01, 2016 at 01:07:55PM +0200, Gerd Hoffmann wrote:
> On Fr, 2016-04-01 at 11:17 +0100, Richard W.M. Jones wrote:
> > On Fri, Apr 01, 2016 at 11:18:30AM +0200, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > > > I wonder how we can make use of this in qemu and downstream distros?
> > > > Can we have a bios-min.bin which is used with -kernel boots?
> > > 
> > > We already build two seabios roms: one full featued and one slightly
> > > stripped down to keep it below 128k, for backward compatibility with old
> > > machine types.
> > > 
> > > Adding a third config for -kernel boot should be easy.  For that use
> > > case we can probably also turn on seabios logging to the serial console
> > > and drop sgabios.  We don't need input (no boot menu) and we also don't
> > > need to hook into int10 (no grub/ipxe using that for output).
> > 
> > SeaBIOS logging is slow (or more likely, serial output is slow).
> > And sgabios is useful for debugging.
> 
> Ah, I see.  debuglevel=1 prints too much and debuglevel=0 has no logging
> at all.  We'd need seabios log at least the version banner to serial
> even with debuglevel=0 to replace sgabios.
> 
> https://www.kraxel.org/cgit/qemu/log/?h=work/stripped-seabios
> 
> Not hooked into -kernel yet, so you have to use "-bios bios-kboot.bin".
> 
> Comments?

I think we were working on the same thing ...  Attached is my
version.

Note that you must enable at least CONFIG_MPTABLE else virtio-scsi
does not work in the guest.  I also enabled ACPI & SMBIOS & PIRTABLE.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
>From 2c9b45d9d03c05573e1dbfc35e6750b127896be6 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" 
Date: Fri, 1 Apr 2016 11:35:09 +0100
Subject: [PATCH] bios: Add fast variant of SeaBIOS for use with -kernel on
 x86.

This commit adds a fast variant of SeaBIOS called 'bios-fast.bin'.

It's designed to be the fastest (also the smallest, but that's not the
main aim) SeaBIOS that is just enough to boot a Linux kernel using the
-kernel option on i686 and x86_64.

This commit does not modify the -kernel option to use this.  You have
to specify it by doing something like this:

  -kernel vmlinuz -bios bios-fast.bin

Signed-off-by: Richard W.M. Jones 
---
 Makefile |  3 ++-
 roms/Makefile|  4 +++-
 roms/config.seabios-fast | 12 
 3 files changed, 17 insertions(+), 2 deletions(-)
 create mode 100644 roms/config.seabios-fast

diff --git a/Makefile b/Makefile
index 1d076a9..c4e939d 100644
--- a/Makefile
+++ b/Makefile
@@ -389,7 +389,8 @@ common  de-ch  es fo  fr-ca  hu ja  mk  nl-be  
pt  sl tr \
 bepocz
 
 ifdef INSTALL_BLOBS
-BLOBS=bios.bin bios-256k.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \
+BLOBS=bios.bin bios-256k.bin bios-fast.bin \
+sgabios.bin vgabios.bin vgabios-cirrus.bin \
 vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin vgabios-virtio.bin \
 acpi-dsdt.aml \
 ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin 
QEMU,cgthree.bin \
diff --git a/roms/Makefile b/roms/Makefile
index 7bd1252..26b0586 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -61,9 +61,11 @@ default:
@echo "  slof   -- update slof.bin"
@echo "  u-boot.e500-- update u-boot.e500"
 
-bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k
+bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k \
+  build-seabios-config-seabios-fast
cp seabios/builds/seabios-128k/bios.bin ../pc-bios/bios.bin
cp seabios/builds/seabios-256k/bios.bin ../pc-bios/bios-256k.bin
+   cp seabios/builds/seabios-fast/bios.bin ../pc-bios/bios-fast.bin
 
 seavgabios: $(patsubst %,seavgabios-%,$(vgabios_variants))
 
diff --git a/roms/config.seabios-fast b/roms/config.seabios-fast
new file mode 100644
index 000..98a4c27
--- /dev/null
+++ b/roms/config.seabios-fast
@@ -0,0 +1,12 @@
+# The most fastest SeaBIOS that can boot Linux using -kernel.
+CONFIG_USB=n
+CONFIG_DRIVES=n
+CONFIG_KEYBOARD=n
+CONFIG_MOUSE=n
+CONFIG_WRITABLE_UPPERMEMORY=y
+CONFIG_TCGBIOS=n
+CONFIG_PIRTABLE=y
+CONFIG_MPTABLE=y
+CONFIG_SMBIOS=y
+CONFIG_ACPI=y
+CONFIG_DEBUG_LEVEL=0
-- 
2.7.4



[Qemu-devel] [PATCH 05/18] vhost-user: check reconnect comes with wait

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

If the client socket has the 'reconnect' option, make sure the 'wait'
option is also used. That way, an initial connection will be ensured
before the VM start and the virtio device is configured.

Signed-off-by: Marc-André Lureau 
---
 net/vhost-user.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 1b9e73a..9007d0b 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -27,6 +27,8 @@ typedef struct VhostUserState {
 typedef struct VhostUserChardevProps {
 bool is_socket;
 bool is_unix;
+bool is_reconnect;
+bool is_wait;
 } VhostUserChardevProps;
 
 VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
@@ -239,6 +241,10 @@ static int net_vhost_chardev_opts(void *opaque,
 } else if (strcmp(name, "path") == 0) {
 props->is_unix = true;
 } else if (strcmp(name, "server") == 0) {
+} else if (strcmp(name, "reconnect") == 0) {
+props->is_reconnect = true;
+} else if (strcmp(name, "wait") == 0) {
+props->is_wait = true;
 } else {
 error_setg(errp,
"vhost-user does not support a chardev with option %s=%s",
@@ -271,6 +277,12 @@ static CharDriverState *net_vhost_parse_chardev(
 return NULL;
 }
 
+if (props.is_reconnect && !props.is_wait) {
+error_setg(errp, "chardev \"%s\" must also 'wait' with 'reconnect'",
+   opts->chardev);
+return NULL;
+}
+
 qemu_chr_fe_claim_no_fail(chr);
 
 return chr;
-- 
2.5.5




[Qemu-devel] [PATCH 02/18] char: lower reconnect error to trace event

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

This message is an information rather than an error. Use a trace event
instead.

This allows using reconnect in tests without extra error messages.

Signed-off-by: Marc-André Lureau 
---
 qemu-char.c  | 4 ++--
 trace-events | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 270819a..a944ee0 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -40,6 +40,7 @@
 #include "io/channel-file.h"
 #include "io/channel-tls.h"
 #include "sysemu/replay.h"
+#include "trace.h"
 
 #include 
 
@@ -2639,8 +2640,7 @@ static void check_report_connect_error(CharDriverState 
*chr,
 TCPCharDriver *s = chr->opaque;
 
 if (!s->connect_err_reported) {
-error_report("Unable to connect character device %s: %s",
- chr->label, error_get_pretty(err));
+trace_char_socket_reconnect_error(chr->label, error_get_pretty(err));
 s->connect_err_reported = true;
 }
 qemu_chr_socket_restart_timer(chr);
diff --git a/trace-events b/trace-events
index 996a77f..19e0a05 100644
--- a/trace-events
+++ b/trace-events
@@ -689,6 +689,9 @@ grlib_apbuart_readl_unknown(uint64_t addr) "addr 0x%"PRIx64
 leon3_set_irq(int intno) "Set CPU IRQ %d"
 leon3_reset_irq(int intno) "Reset CPU IRQ %d"
 
+# qemu-char.c
+char_socket_reconnect_error(const char *label, const char *err) "Unable to 
connect character device %s: %s"
+
 # spice-qemu-char.c
 spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d"
 spice_vmc_read(int bytes, int len) "spice read %d of requested %d"
-- 
2.5.5




[Qemu-devel] [PATCH 01/18] tests: append i386 tests

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Do not overwrite x86-64 tests, re-enable vhost-user-test.

Signed-off-by: Marc-André Lureau 
---
 tests/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/Makefile b/tests/Makefile
index 45b9048..9293d49 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -222,7 +222,7 @@ endif
 check-qtest-i386-y += tests/test-netfilter$(EXESUF)
 check-qtest-i386-y += tests/test-filter-mirror$(EXESUF)
 check-qtest-i386-y += tests/test-filter-redirector$(EXESUF)
-check-qtest-x86_64-y = $(check-qtest-i386-y)
+check-qtest-x86_64-y += $(check-qtest-i386-y)
 gcov-files-i386-y += i386-softmmu/hw/timer/mc146818rtc.c
 gcov-files-x86_64-y = $(subst 
i386-softmmu/,x86_64-softmmu/,$(gcov-files-i386-y))
 check-qtest-mips-y = tests/endianness-test$(EXESUF)
-- 
2.5.5




[Qemu-devel] [PATCH 00/18] RFCv2: vhost-user: shutdown and reconnection

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Hi

This is a follow-up on previous RFC allowing the slave to request a
"managed" shutdown and reconnect later. A new optional communication
socket is added for the slave to make request to the master (since
vhost-user protocol isn't bidirectional)

The initial connection must be made before the guest is started for
the feature negotiation to complete. In "server" mode, qemu waits
before starting the VM. However, in "client" mode with "reconnect",
you have to specify "wait". This will wait for the initial connection
before starting the VM (in contrast with the "nowait"+backend features
proposed by Tetsuya [1]).

In order to do a clean shutdown, the slave should flush all pending
buffers so that after VHOST_SET_VRING_BASE, it is enough to
resume. The guest is made aware of virtio-net disconnection thanks to
VIRTIO_NET_S_LINK_UP status, which is reflected by a link-down on the
nic.

RFCv2:
- rebased, added a few preliminary patches
- fix a few mistakes in shutdown message recv/send
- enforce "wait" when using "reconnect"
- save & restore features & check backend compatibility
- save & restore the ring state
- add shutdown support to vhost-user-bridge

Testing:
- the vhost-user-test has a simple reconnect test.
- vhost-user-bridge can be used to run test interactively:

1) Run a slirp/vlan in a background process:
$ qemu -net none -net socket,vlan=0,udp=localhost:,localaddr=localhost: 
-net user,vlan=0

2) Start vubr (it'll use the slirp/vlan process above by default):
$ tests/vhost-user-bridge

3) Start qemu with vhost-user / virtio-net:
$ qemu ... -chardev socket,id=char0,path=/tmp/vubr.sock,reconnect=1,wait 
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device 
virtio-net-pci,netdev=mynet1

4) Play in the guest, interrupt (ctlr-c) vubr, check nic link status, restart 
vubr, etc..

[1] in a previous series "Add feature to start QEMU without
vhost-user backend", Tetsuya Mukawa proposed to allow the vhost-user
backend to disconnect and reconnect. However, Michael Tsirkin pointed
out that you can't do that without extra care, because the guest and
hypervisor don't know the slave ring manipulation state, there might
be pending replies for example that could be lost, and suggested to
reset the guest queues, but this requires kernel changes, and it may
have to clear the ring and lose queued packets. He also introduced a
new option to specify backend features on qemu command line to be able
to boot the VM without waiting for the backend. I consider this a
seperate enhancement.

Marc-André Lureau (16):
  tests: append i386 tests
  char: lower reconnect error to trace event
  char: use a trace for when the char is waiting
  char: add wait support for reconnect
  vhost-user: check reconnect comes with wait
  vhost: add vhost_dev stop callback
  vhost-user: add vhost_user to hold the chr
  vhost-user: add slave-fd support
  vhost-user: add shutdown support
  vhost-user: disconnect on start failure
  vhost-net: do not crash if backend is not present
  vhost-net: save & restore vhost-user acked features
  vhost-net: save & restore vring enable state
  test: vubr check vring enable state
  test: start vhost-user reconnect test
  test: add shutdown support vubr test

Tetsuya Mukawa (2):
  vhost-user: add ability to know vhost-user backend disconnection
  qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

 docs/specs/vhost-user.txt |  38 ++
 hw/net/vhost_net.c|  53 -
 hw/virtio/vhost-user.c| 104 +-
 include/hw/virtio/vhost.h |   4 +
 include/net/net.h |   1 +
 include/net/vhost-user.h  |   1 +
 include/net/vhost_net.h   |   3 +
 include/sysemu/char.h |   7 ++
 net/vhost-user.c  |  44 ++-
 qemu-char.c   |  46 +---
 tests/Makefile|   4 +-
 tests/vhost-user-bridge.c | 112 ++--
 tests/vhost-user-test.c   | 184 ++
 trace-events  |   4 +
 14 files changed, 564 insertions(+), 41 deletions(-)

-- 
2.5.5




[Qemu-devel] [PATCH 03/18] char: use a trace for when the char is waiting

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Similarly to the reconnect error, this is an information message rather
than an error. Use a trace event instead.

Signed-off-by: Marc-André Lureau 
---
 qemu-char.c  | 3 +--
 trace-events | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index a944ee0..8702931 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4434,8 +4434,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 }
 s->listen_ioc = sioc;
 if (is_waitconnect) {
-fprintf(stderr, "QEMU waiting for connection on: %s\n",
-chr->filename);
+trace_char_socket_waiting(chr->filename);
 tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
 }
 qio_channel_set_blocking(QIO_CHANNEL(s->listen_ioc), false, NULL);
diff --git a/trace-events b/trace-events
index 19e0a05..e59c0f5 100644
--- a/trace-events
+++ b/trace-events
@@ -691,6 +691,7 @@ leon3_reset_irq(int intno) "Reset CPU IRQ %d"
 
 # qemu-char.c
 char_socket_reconnect_error(const char *label, const char *err) "Unable to 
connect character device %s: %s"
+char_socket_waiting(const char *filename) "QEMU waiting for connection on: %s"
 
 # spice-qemu-char.c
 spice_vmc_write(ssize_t out, int len) "spice wrottn %zd of requested %d"
-- 
2.5.5




[Qemu-devel] [PATCH 07/18] vhost: add vhost_dev stop callback

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

vhost backend may want to stop the device, for example if it wants to
restart itself (translates to a link down for vhost-net).

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c| 14 ++
 include/hw/virtio/vhost.h |  4 
 2 files changed, 18 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6e1032f..1e4710d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -131,6 +131,18 @@ static int vhost_net_get_fd(NetClientState *backend)
 }
 }
 
+static void vhost_net_backend_stop(struct vhost_dev *dev)
+{
+struct vhost_net *net = container_of(dev, struct vhost_net, dev);
+NetClientState *nc = net->nc;
+NetClientState *peer = nc->peer;
+
+peer->link_down = 1;
+if (peer->info->link_status_changed) {
+peer->info->link_status_changed(peer);
+}
+}
+
 struct vhost_net *vhost_net_init(VhostNetOptions *options)
 {
 int r;
@@ -165,6 +177,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 net->dev.vq_index = net->nc->queue_index * net->dev.nvqs;
 }
 
+net->dev.stop = vhost_net_backend_stop;
+
 r = vhost_dev_init(&net->dev, options->opaque,
options->backend_type);
 if (r < 0) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b60d758..859be64 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -35,6 +35,8 @@ struct vhost_log {
 vhost_log_chunk_t *log;
 };
 
+typedef void (*vhost_stop)(struct vhost_dev *dev);
+
 struct vhost_memory;
 struct vhost_dev {
 MemoryListener memory_listener;
@@ -61,6 +63,8 @@ struct vhost_dev {
 void *opaque;
 struct vhost_log *log;
 QLIST_ENTRY(vhost_dev) entry;
+/* backend request to stop */
+vhost_stop stop;
 };
 
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
-- 
2.5.5




[Qemu-devel] [PATCH 04/18] char: add wait support for reconnect

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Until now, 'wait' was solely used for listening sockets. However, it can
also be useful for 'reconnect' socket kind, where the first open must
succeed before continuing.

This allows for instance (with other support patches) having vhost-user
wait for an initial connection to setup the vhost-net, before eventually
accepting new connections.

Signed-off-by: Marc-André Lureau 
---
 qemu-char.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 8702931..3e25c08 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3659,7 +3659,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
ChardevBackend *backend,
   Error **errp)
 {
 bool is_listen  = qemu_opt_get_bool(opts, "server", false);
-bool is_waitconnect = is_listen && qemu_opt_get_bool(opts, "wait", true);
+bool is_wait= qemu_opt_get_bool(opts, "wait", is_listen);
 bool is_telnet  = qemu_opt_get_bool(opts, "telnet", false);
 bool do_nodelay = !qemu_opt_get_bool(opts, "delay", true);
 int64_t reconnect   = qemu_opt_get_number(opts, "reconnect", 0);
@@ -3696,7 +3696,7 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
ChardevBackend *backend,
 sock->has_telnet = true;
 sock->telnet = is_telnet;
 sock->has_wait = true;
-sock->wait = is_waitconnect;
+sock->wait = is_wait;
 sock->has_reconnect = true;
 sock->reconnect = reconnect;
 sock->tls_creds = g_strdup(tls_creds);
@@ -4350,7 +4350,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 bool do_nodelay = sock->has_nodelay ? sock->nodelay : false;
 bool is_listen  = sock->has_server  ? sock->server  : true;
 bool is_telnet  = sock->has_telnet  ? sock->telnet  : false;
-bool is_waitconnect = sock->has_wait? sock->wait: false;
+bool is_wait= sock->has_wait? sock->wait: false;
 int64_t reconnect   = sock->has_reconnect ? sock->reconnect : 0;
 ChardevCommon *common = qapi_ChardevSocket_base(sock);
 QIOChannelSocket *sioc = NULL;
@@ -4424,7 +4424,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 }
 
 sioc = qio_channel_socket_new();
-if (s->reconnect_time) {
+if (s->reconnect_time && !is_wait) {
 qio_channel_socket_connect_async(sioc, s->addr,
  qemu_chr_socket_connected,
  chr, NULL);
@@ -4433,7 +4433,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 goto error;
 }
 s->listen_ioc = sioc;
-if (is_waitconnect) {
+if (is_wait) {
 trace_char_socket_waiting(chr->filename);
 tcp_chr_accept(QIO_CHANNEL(s->listen_ioc), G_IO_IN, chr);
 }
@@ -4443,9 +4443,24 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 QIO_CHANNEL(s->listen_ioc), G_IO_IN, tcp_chr_accept, chr, 
NULL);
 }
 } else {
-if (qio_channel_socket_connect_sync(sioc, s->addr, errp) < 0) {
-goto error;
-}
+do {
+Error *err = NULL;
+
+if (qio_channel_socket_connect_sync(sioc, s->addr, &err) < 0) {
+if (reconnect) {
+trace_char_socket_reconnect_error(chr->label,
+  error_get_pretty(err));
+error_free(err);
+continue;
+} else {
+error_propagate(errp, err);
+goto error;
+}
+} else {
+break;
+}
+} while (true);
+
 tcp_chr_new_client(chr, sioc);
 object_unref(OBJECT(sioc));
 }
-- 
2.5.5




[Qemu-devel] [PATCH 17/18] test: start vhost-user reconnect test

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Check that SLAVE_FD & SHUTDOWN message work and that the master
asked for the ring read status.

Signed-off-by: Marc-André Lureau 
---
 tests/Makefile  |   2 +-
 tests/vhost-user-test.c | 184 
 2 files changed, 170 insertions(+), 16 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 9293d49..8e4807e 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -577,7 +577,7 @@ tests/usb-hcd-uhci-test$(EXESUF): tests/usb-hcd-uhci-test.o 
$(libqos-usb-obj-y)
 tests/usb-hcd-ehci-test$(EXESUF): tests/usb-hcd-ehci-test.o $(libqos-usb-obj-y)
 tests/usb-hcd-xhci-test$(EXESUF): tests/usb-hcd-xhci-test.o $(libqos-usb-obj-y)
 tests/pc-cpu-test$(EXESUF): tests/pc-cpu-test.o
-tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o 
qemu-timer.o $(qtest-obj-y) $(test-io-obj-y)
+tests/vhost-user-test$(EXESUF): tests/vhost-user-test.o qemu-char.o 
qemu-timer.o $(qtest-obj-y) $(test-io-obj-y) $(libqos-pc-obj-y) 
$(libqos-virtio-obj-y)
 tests/qemu-iotests/socket_scm_helper$(EXESUF): 
tests/qemu-iotests/socket_scm_helper.o
 tests/test-qemu-opts$(EXESUF): tests/test-qemu-opts.o $(test-util-obj-y)
 tests/test-write-threshold$(EXESUF): tests/test-write-threshold.o 
$(test-block-obj-y)
diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
index 6961596..17ebfd1 100644
--- a/tests/vhost-user-test.c
+++ b/tests/vhost-user-test.c
@@ -21,6 +21,12 @@
 #include 
 #include 
 #include 
+#include "libqos/pci-pc.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
 
 /* GLIB version compatibility flags */
 #if !GLIB_CHECK_VERSION(2, 26, 0)
@@ -34,7 +40,7 @@
 #define QEMU_CMD_ACCEL  " -machine accel=tcg"
 #define QEMU_CMD_MEM" -m %d -object memory-backend-file,id=mem,size=%dM,"\
 "mem-path=%s,share=on -numa node,memdev=mem"
-#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s"
+#define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=net0,chardev=%s,vhostforce"
 #define QEMU_CMD_NET" -device 
virtio-net-pci,netdev=net0,romfile=./pc-bios/pxe-virtio.rom"
 
@@ -49,6 +55,7 @@
 
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD 1
+#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL 3
 
 #define VHOST_LOG_PAGE 0x1000
 
@@ -71,9 +78,16 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 VHOST_USER_SET_VRING_ENABLE = 18,
+VHOST_USER_SET_SLAVE_FD = 20,
 VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+VHOST_USER_SLAVE_NONE = 0,
+VHOST_USER_SLAVE_SHUTDOWN = 1,
+VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 uint64_t guest_phys_addr;
 uint64_t memory_size;
@@ -119,6 +133,8 @@ static VhostUserMsg m __attribute__ ((unused));
 
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION(0x1)
+
+#define VIRTIO_QUEUE_MAX 1024
 /*/
 
 typedef struct TestServer {
@@ -133,6 +149,8 @@ typedef struct TestServer {
 GCond data_cond;
 int log_fd;
 uint64_t rings;
+int slave_fd;
+uint16_t set_base[VIRTIO_QUEUE_MAX];
 } TestServer;
 
 #if !GLIB_CHECK_VERSION(2, 32, 0)
@@ -269,7 +287,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 /* send back features to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
 msg.size = sizeof(m.payload.u64);
-msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+msg.payload.u64 = 1 << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+1 << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL;
 p = (uint8_t *) &msg;
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
@@ -324,6 +343,12 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 case VHOST_USER_SET_VRING_BASE:
 assert(msg.payload.state.index < 2);
 s->rings |= 0x1ULL << msg.payload.state.index;
+s->set_base[msg.payload.state.index] = msg.payload.state.num;
+break;
+
+case VHOST_USER_SET_SLAVE_FD:
+qemu_chr_fe_get_msgfds(chr, &s->slave_fd, 1);
+g_cond_signal(&s->data_cond);
 break;
 
 default:
@@ -360,7 +385,7 @@ static const char *init_hugepagefs(const char *path)
 return path;
 }
 
-static TestServer *test_server_new(const gchar *name)
+static TestServer *test_server_new(const gchar *name, const gchar *chr_opt)
 {
 TestServer *server = g_new0(TestServer, 1);
 gchar *chr_path;
@@ -368,7 +393,8 @@ static TestServer *test_server_new(const gchar *name)
 server->socket_path = g_strdup_printf("%s/%s.sock", tmpfs, name);
 server->mig_path = g_strdup_printf("%s/%s.mig", tmpfs, name);
 
-chr_p

[Qemu-devel] [PATCH 09/18] qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

2016-04-01 Thread marcandre . lureau
From: Tetsuya Mukawa 

The patch introduces qemu_chr_disconnect(). The function is used for
closing a fd accepted by listen fd. Though we already have qemu_chr_delete(),
but it closes not only accepted fd but also listen fd. This new function
is used when we still want to keep listen fd.

Signed-off-by: Tetsuya Mukawa 
Reviewed-by: Marc-André Lureau 
---
 include/sysemu/char.h | 7 +++
 qemu-char.c   | 8 
 2 files changed, 15 insertions(+)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index 307fd8f..2c39987 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -75,6 +75,7 @@ struct CharDriverState {
 IOReadHandler *chr_read;
 void *handler_opaque;
 void (*chr_close)(struct CharDriverState *chr);
+void (*chr_disconnect)(struct CharDriverState *chr);
 void (*chr_accept_input)(struct CharDriverState *chr);
 void (*chr_set_echo)(struct CharDriverState *chr, bool echo);
 void (*chr_set_fe_open)(struct CharDriverState *chr, int fe_open);
@@ -143,6 +144,12 @@ void qemu_chr_parse_common(QemuOpts *opts, ChardevCommon 
*backend);
  */
 CharDriverState *qemu_chr_new(const char *label, const char *filename,
   void (*init)(struct CharDriverState *s));
+/**
+ * @qemu_chr_disconnect:
+ *
+ * Close a fd accpeted by character backend.
+ */
+void qemu_chr_disconnect(CharDriverState *chr);
 
 /**
  * @qemu_chr_new_noreplay:
diff --git a/qemu-char.c b/qemu-char.c
index 3e25c08..e249b0a 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4011,6 +4011,13 @@ void qemu_chr_fe_release(CharDriverState *s)
 s->avail_connections++;
 }
 
+void qemu_chr_disconnect(CharDriverState *chr)
+{
+if (chr->chr_disconnect) {
+chr->chr_disconnect(chr);
+}
+}
+
 static void qemu_chr_free_common(CharDriverState *chr)
 {
 g_free(chr->filename);
@@ -4404,6 +4411,7 @@ static CharDriverState *qmp_chardev_open_socket(const 
char *id,
 chr->chr_write = tcp_chr_write;
 chr->chr_sync_read = tcp_chr_sync_read;
 chr->chr_close = tcp_chr_close;
+chr->chr_disconnect = tcp_chr_disconnect;
 chr->get_msgfds = tcp_get_msgfds;
 chr->set_msgfds = tcp_set_msgfds;
 chr->chr_add_client = tcp_chr_add_client;
-- 
2.5.5




[Qemu-devel] [PATCH 06/18] vhost-user: add ability to know vhost-user backend disconnection

2016-04-01 Thread marcandre . lureau
From: Tetsuya Mukawa 

Current QEMU cannot detect vhost-user backend disconnection. The
patch adds ability to know it.
To know disconnection, add watcher to detect G_IO_HUP event. When
G_IO_HUP event is detected, the disconnected socket will be read
to cause a CHR_EVENT_CLOSED.

Signed-off-by: Tetsuya Mukawa 
Reviewed-by: Marc-André Lureau 
---
 net/vhost-user.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 9007d0b..3dae53c 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -22,6 +22,7 @@ typedef struct VhostUserState {
 NetClientState nc;
 CharDriverState *chr;
 VHostNetState *vhost_net;
+int watch;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -169,6 +170,20 @@ static NetClientInfo net_vhost_user_info = {
 .has_ufo = vhost_user_has_ufo,
 };
 
+static gboolean net_vhost_user_watch(GIOChannel *chan, GIOCondition cond,
+   void *opaque)
+{
+VhostUserState *s = opaque;
+uint8_t buf[1];
+
+/* We don't actually want to read anything, but CHR_EVENT_CLOSED will be
+ * raised as a side-effect of the read.
+ */
+qemu_chr_fe_read_all(s->chr, buf, sizeof(buf));
+
+return FALSE;
+}
+
 static void net_vhost_user_event(void *opaque, int event)
 {
 const char *name = opaque;
@@ -186,6 +201,8 @@ static void net_vhost_user_event(void *opaque, int event)
 trace_vhost_user_event(s->chr->label, event);
 switch (event) {
 case CHR_EVENT_OPENED:
+s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
+ net_vhost_user_watch, s);
 if (vhost_user_start(queues, ncs) < 0) {
 exit(1);
 }
@@ -194,6 +211,8 @@ static void net_vhost_user_event(void *opaque, int event)
 case CHR_EVENT_CLOSED:
 qmp_set_link(name, false, &err);
 vhost_user_stop(queues, ncs);
+g_source_remove(s->watch);
+s->watch = 0;
 break;
 }
 
-- 
2.5.5




[Qemu-devel] [PATCH 08/18] vhost-user: add vhost_user to hold the chr

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Next patches will add more fields to the structure

Signed-off-by: Marc-André Lureau 
---
 hw/virtio/vhost-user.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 5914e85..02ac1fc 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -108,6 +108,10 @@ static VhostUserMsg m __attribute__ ((unused));
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION(0x1)
 
+struct vhost_user {
+CharDriverState *chr;
+};
+
 static bool ioeventfd_enabled(void)
 {
 return kvm_enabled() && kvm_eventfds_enabled();
@@ -115,7 +119,8 @@ static bool ioeventfd_enabled(void)
 
 static int vhost_user_read(struct vhost_dev *dev, VhostUserMsg *msg)
 {
-CharDriverState *chr = dev->opaque;
+struct vhost_user *u = dev->opaque;
+CharDriverState *chr = u->chr;
 uint8_t *p = (uint8_t *) msg;
 int r, size = VHOST_USER_HDR_SIZE;
 
@@ -176,7 +181,8 @@ static bool vhost_user_one_time_request(VhostUserRequest 
request)
 static int vhost_user_write(struct vhost_dev *dev, VhostUserMsg *msg,
 int *fds, int fd_num)
 {
-CharDriverState *chr = dev->opaque;
+struct vhost_user *u = dev->opaque;
+CharDriverState *chr = u->chr;
 int size = VHOST_USER_HDR_SIZE + msg->size;
 
 /*
@@ -510,11 +516,14 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
 uint64_t features;
+struct vhost_user *u;
 int err;
 
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
-dev->opaque = opaque;
+u = g_new0(struct vhost_user, 1);
+u->chr = opaque;
+dev->opaque = u;
 
 err = vhost_user_get_features(dev, &features);
 if (err < 0) {
@@ -559,8 +568,12 @@ static int vhost_user_init(struct vhost_dev *dev, void 
*opaque)
 
 static int vhost_user_cleanup(struct vhost_dev *dev)
 {
+struct vhost_user *u;
+
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_USER);
 
+u = dev->opaque;
+g_free(u);
 dev->opaque = 0;
 
 return 0;
-- 
2.5.5




[Qemu-devel] [PATCH 13/18] vhost-net: do not crash if backend is not present

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1e4710d..6c40c4e 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -415,8 +415,13 @@ VHostNetState *get_vhost_net(NetClientState *nc)
 int vhost_set_vring_enable(NetClientState *nc, int enable)
 {
 VHostNetState *net = get_vhost_net(nc);
-const VhostOps *vhost_ops = net->dev.vhost_ops;
+const VhostOps *vhost_ops;
+
+if (!net) {
+return 0;
+}
 
+vhost_ops = net->dev.vhost_ops;
 if (vhost_ops->vhost_set_vring_enable) {
 return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
 }
-- 
2.5.5




[Qemu-devel] [PATCH 10/18] vhost-user: add slave-fd support

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Learn to give a socket to the slave to let him make requests to the
master.

Signed-off-by: Marc-André Lureau 
---
 docs/specs/vhost-user.txt | 23 
 hw/virtio/vhost-user.c| 69 +++
 2 files changed, 92 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 0312d40..8a635fa 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -248,12 +248,25 @@ Once the source has finished migration, rings will be 
stopped by
 the source. No further update must be done before rings are
 restarted.
 
+Slave communication
+---
+
+An optional communication channel is provided if the slave
+declares VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL feature, to allow
+the slave to make requests to the master.
+
+The fd is provided via VHOST_USER_SET_SLAVE_FD ancillary data.
+
+A slave may then send VHOST_USER_SLAVE_* messages to the master
+using this fd communication channel.
+
 Protocol features
 -
 
 #define VHOST_USER_PROTOCOL_F_MQ 0
 #define VHOST_USER_PROTOCOL_F_LOG_SHMFD  1
 #define VHOST_USER_PROTOCOL_F_RARP   2
+#define VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL  3
 
 Message types
 -
@@ -464,3 +477,13 @@ Message types
   is present in VHOST_USER_GET_PROTOCOL_FEATURES.
   The first 6 bytes of the payload contain the mac address of the guest to
   allow the vhost user backend to construct and broadcast the fake RARP.
+
+ * VHOST_USER_SET_SLAVE_FD
+  Id: 20
+  Equivalent ioctl: N/A
+  Master payload: N/A
+
+  Set the file descriptor for the salve to make VHOST_USER_SLAVE_*
+  request to the master. It is passed in the ancillary data.
+  This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
+  feature is available.
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 02ac1fc..890c1bd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -32,6 +32,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_MQ = 0,
 VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
 VHOST_USER_PROTOCOL_F_RARP = 2,
+VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3,
 
 VHOST_USER_PROTOCOL_F_MAX
 };
@@ -59,9 +60,16 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
 VHOST_USER_SEND_RARP = 19,
+VHOST_USER_SET_SLAVE_FD = 20,
+
 VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+VHOST_USER_SLAVE_NONE = 0,
+VHOST_USER_SLAVE_MAX
+}  VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 uint64_t guest_phys_addr;
 uint64_t memory_size;
@@ -110,6 +118,7 @@ static VhostUserMsg m __attribute__ ((unused));
 
 struct vhost_user {
 CharDriverState *chr;
+int slave_fd;
 };
 
 static bool ioeventfd_enabled(void)
@@ -513,6 +522,55 @@ static int vhost_user_reset_device(struct vhost_dev *dev)
 return 0;
 }
 
+static void slave_read(void *opaque)
+{
+struct vhost_dev *dev = opaque;
+struct vhost_user *u = dev->opaque;
+VhostUserMsg msg = { 0, };
+int size;
+
+size = read(u->slave_fd, &msg, VHOST_USER_HDR_SIZE);
+if (size != VHOST_USER_HDR_SIZE) {
+error_report("Failed to read from slave.");
+qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
+close(u->slave_fd);
+u->slave_fd = -1;
+return;
+}
+
+switch (msg.request) {
+default:
+error_report("Received unexpected msg type.");
+}
+}
+
+static int vhost_setup_slave_channel(struct vhost_dev *dev)
+{
+VhostUserMsg msg = {
+.request = VHOST_USER_SET_SLAVE_FD,
+.flags = VHOST_USER_VERSION,
+};
+struct vhost_user *u = dev->opaque;
+int sv[2];
+
+if (!virtio_has_feature(dev->protocol_features,
+VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL)) {
+return 0;
+}
+
+if (socketpair(PF_UNIX, SOCK_STREAM, 0, sv) == -1) {
+error_report("socketpair() failed");
+return -1;
+}
+
+u->slave_fd = sv[0];
+qemu_set_fd_handler(u->slave_fd, slave_read, NULL, dev);
+
+vhost_user_write(dev, &msg, &sv[1], 1);
+
+return 0;
+}
+
 static int vhost_user_init(struct vhost_dev *dev, void *opaque)
 {
 uint64_t features;
@@ -523,6 +581,7 @@ static int vhost_user_init(struct vhost_dev *dev, void 
*opaque)
 
 u = g_new0(struct vhost_user, 1);
 u->chr = opaque;
+u->slave_fd = -1;
 dev->opaque = u;
 
 err = vhost_user_get_features(dev, &features);
@@ -563,6 +622,12 @@ static int vhost_user_init(struct vhost_dev *dev, void 
*opaque)
"VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
 }
 
+
+err = vhost_setup_slave_channel(dev);
+if (err < 0) {
+return err;
+}
+
 return 0;
 }
 
@@ -573,6 +638,10 @@ static int vhost_user_cleanup(struct vhost_dev *dev)
 assert(dev->vhost_ops->backend_type == VHOST_BACKEN

[Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

Signed-off-by: Marc-André Lureau 
---
 docs/specs/vhost-user.txt | 15 +++
 hw/virtio/vhost-user.c| 16 
 2 files changed, 31 insertions(+)

diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 8a635fa..60d6d13 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -487,3 +487,18 @@ Message types
   request to the master. It is passed in the ancillary data.
   This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
   feature is available.
+
+Slave message types
+---
+
+ * VHOST_USER_SLAVE_SHUTDOWN:
+  Id: 1
+  Master payload: N/A
+  Slave payload: u64
+
+  Request the master to shutdown the slave. A 0 reply is for
+  success, in which case the slave may close all connections
+  immediately and quit. A non-zero reply cancels the request.
+
+  Before a reply comes, the master may make other requests in
+  order to flush or sync state.
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 890c1bd..f91baee 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -67,6 +67,8 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
 VHOST_USER_SLAVE_NONE = 0,
+VHOST_USER_SLAVE_SHUTDOWN = 1,
+
 VHOST_USER_SLAVE_MAX
 }  VhostUserSlaveRequest;
 
@@ -539,6 +541,20 @@ static void slave_read(void *opaque)
 }
 
 switch (msg.request) {
+case VHOST_USER_SLAVE_SHUTDOWN: {
+uint64_t success = 1; /* 0 is for success */
+if (dev->stop) {
+dev->stop(dev);
+success = 0;
+}
+msg.payload.u64 = success;
+msg.size = sizeof(msg.payload.u64);
+size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
+if (size != VHOST_USER_HDR_SIZE + msg.size) {
+error_report("Failed to write reply.");
+}
+break;
+}
 default:
 error_report("Received unexpected msg type.");
 }
-- 
2.5.5




[Qemu-devel] [PATCH 16/18] test: vubr check vring enable state

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

The rings shouldn't be processed unless previously enabled.

Signed-off-by: Marc-André Lureau 
---
 tests/vhost-user-bridge.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 0779ba2..42450a6 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -710,7 +710,8 @@ vubr_backend_recv_cb(int sock, void *ctx)
 int buflen = sizeof(buf);
 int len;
 
-if (!dev->ready) {
+if (!dev->ready || !rx_vq->enable) {
+DPRINT("\n NOT READY: dev: %d, rx: %d\n", dev->ready, rx_vq->enable);
 return;
 }
 
@@ -747,8 +748,11 @@ vubr_kick_cb(int sock, void *ctx)
 if (rc == -1) {
 vubr_die("eventfd_read()");
 } else {
-DPRINT("Got kick_data: %016"PRIx64"\n", kick_data);
-vubr_process_avail(dev, &dev->vq[1]);
+DPRINT("Got kick_data: %016"PRIx64", vq enabled: %d\n",
+   kick_data, dev->vq[1].enable);
+if (dev->vq[1].enable) {
+vubr_process_avail(dev, &dev->vq[1]);
+}
 }
 }
 
-- 
2.5.5




[Qemu-devel] [PATCH 15/18] vhost-net: save & restore vring enable state

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)

Restore the vring state when the vhost-user backend is started, so it
can process the ring.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c | 11 +++
 include/net/net.h  |  1 +
 2 files changed, 12 insertions(+)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ee39e9c..a56a794 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -343,6 +343,15 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 if (r < 0) {
 goto err_start;
 }
+
+if (ncs[i].peer->vring_enable) {
+/* restore vring enable state */
+r = vhost_set_vring_enable(ncs[i].peer, ncs[i].peer->vring_enable);
+
+if (r < 0) {
+goto err_start;
+}
+}
 }
 
 return 0;
@@ -436,6 +445,8 @@ int vhost_set_vring_enable(NetClientState *nc, int enable)
 VHostNetState *net = get_vhost_net(nc);
 const VhostOps *vhost_ops;
 
+nc->vring_enable = enable;
+
 if (!net) {
 return 0;
 }
diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..2c4b23a 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,6 +92,7 @@ struct NetClientState {
 NetClientDestructor *destructor;
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
+int vring_enable;
 QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
-- 
2.5.5




[Qemu-devel] [PATCH 12/18] vhost-user: disconnect on start failure

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

If the backend failed to start (for example feature negociation failed),
do not exit, but disconnect the char device instead.

Signed-off-by: Marc-André Lureau 
---
 net/vhost-user.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 3dae53c..54849e9 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -204,7 +204,8 @@ static void net_vhost_user_event(void *opaque, int event)
 s->watch = qemu_chr_fe_add_watch(s->chr, G_IO_HUP,
  net_vhost_user_watch, s);
 if (vhost_user_start(queues, ncs) < 0) {
-exit(1);
+qemu_chr_disconnect(s->chr);
+return;
 }
 qmp_set_link(name, true, &err);
 break;
-- 
2.5.5




[Qemu-devel] [PATCH 14/18] vhost-net: save & restore vhost-user acked features

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.

To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.

Signed-off-by: Marc-André Lureau 
---
 hw/net/vhost_net.c   | 21 -
 include/net/vhost-user.h |  1 +
 include/net/vhost_net.h  |  3 +++
 net/vhost-user.c | 10 ++
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 6c40c4e..ee39e9c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -120,6 +120,11 @@ uint64_t vhost_net_get_max_queues(VHostNetState *net)
 return net->dev.max_queues;
 }
 
+uint64_t vhost_net_get_acked_features(VHostNetState *net)
+{
+return net->dev.acked_features;
+}
+
 static int vhost_net_get_fd(NetClientState *backend)
 {
 switch (backend->info->type) {
@@ -148,6 +153,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 int r;
 bool backend_kernel = options->backend_type == VHOST_BACKEND_TYPE_KERNEL;
 struct vhost_net *net = g_malloc(sizeof *net);
+uint64_t features = 0;
 
 if (!options->net_backend) {
 fprintf(stderr, "vhost-net requires net backend to be setup\n");
@@ -197,8 +203,21 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
 goto fail;
 }
 }
+
 /* Set sane init value. Override when guest acks. */
-vhost_net_ack_features(net, 0);
+if (net->nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER) {
+features = vhost_user_get_acked_features(net->nc);
+if (~net->dev.features & features) {
+fprintf(stderr, "vhost lacks feature mask %" PRIu64
+" for backend\n",
+(uint64_t)(~net->dev.features & features));
+vhost_dev_cleanup(&net->dev);
+goto fail;
+}
+}
+
+vhost_net_ack_features(net, features);
+
 return net;
 fail:
 g_free(net);
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 85109f6..efae35d 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -13,5 +13,6 @@
 
 struct vhost_net;
 struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
+uint64_t vhost_user_get_acked_features(NetClientState *nc);
 
 #endif /* VHOST_USER_H_ */
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 3389b41..0bd4877 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -31,4 +31,7 @@ int vhost_net_notify_migration_done(VHostNetState *net, char* 
mac_addr);
 VHostNetState *get_vhost_net(NetClientState *nc);
 
 int vhost_set_vring_enable(NetClientState * nc, int enable);
+
+uint64_t vhost_net_get_acked_features(VHostNetState *net);
+
 #endif
diff --git a/net/vhost-user.c b/net/vhost-user.c
index 54849e9..ae63e20 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -23,6 +23,7 @@ typedef struct VhostUserState {
 CharDriverState *chr;
 VHostNetState *vhost_net;
 int watch;
+uint64_t acked_features;
 } VhostUserState;
 
 typedef struct VhostUserChardevProps {
@@ -39,6 +40,13 @@ VHostNetState *vhost_user_get_vhost_net(NetClientState *nc)
 return s->vhost_net;
 }
 
+uint64_t vhost_user_get_acked_features(NetClientState *nc)
+{
+VhostUserState *s = DO_UPCAST(VhostUserState, nc, nc);
+assert(nc->info->type == NET_CLIENT_OPTIONS_KIND_VHOST_USER);
+return s->acked_features;
+}
+
 static int vhost_user_running(VhostUserState *s)
 {
 return (s->vhost_net) ? 1 : 0;
@@ -58,6 +66,8 @@ static void vhost_user_stop(int queues, NetClientState *ncs[])
 }
 
 if (s->vhost_net) {
+/* save acked features */
+s->acked_features = vhost_net_get_acked_features(s->vhost_net);
 vhost_net_cleanup(s->vhost_net);
 s->vhost_net = NULL;
 }
-- 
2.5.5




[Qemu-devel] [PATCH 18/18] test: add shutdown support vubr test

2016-04-01 Thread marcandre . lureau
From: Marc-André Lureau 

The bridge can now be interrupted with ctrl-c. Once the slave channel is
up, it will request a shutdown, and wait for success reply to exit.

Signed-off-by: Marc-André Lureau 
---
 tests/vhost-user-bridge.c | 102 --
 1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index 42450a6..ea123be 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -135,6 +135,9 @@ dispatcher_wait(Dispatcher *dispr, uint32_t timeout)
 int rc = select(dispr->max_sock + 1, &fdset, 0, 0, &tv);
 
 if (rc == -1) {
+if (errno == EINTR) {
+return 0;
+}
 vubr_die("select");
 }
 
@@ -186,6 +189,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_MQ = 0,
 VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
 VHOST_USER_PROTOCOL_F_RARP = 2,
+VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL = 3,
 
 VHOST_USER_PROTOCOL_F_MAX
 };
@@ -213,9 +217,16 @@ typedef enum VhostUserRequest {
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
 VHOST_USER_SEND_RARP = 19,
+VHOST_USER_SET_SLAVE_FD = 20,
 VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+VHOST_USER_SLAVE_NONE = 0,
+VHOST_USER_SLAVE_SHUTDOWN = 1,
+VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 uint64_t guest_phys_addr;
 uint64_t memory_size;
@@ -288,6 +299,8 @@ typedef struct VubrDev {
 int ready;
 uint64_t features;
 int hdrlen;
+int slave_fd;
+bool shutdown_requested;
 } VubrDev;
 
 static const char *vubr_request_str[] = {
@@ -311,7 +324,14 @@ static const char *vubr_request_str[] = {
 [VHOST_USER_GET_QUEUE_NUM]  =  "VHOST_USER_GET_QUEUE_NUM",
 [VHOST_USER_SET_VRING_ENABLE]   =  "VHOST_USER_SET_VRING_ENABLE",
 [VHOST_USER_SEND_RARP]  =  "VHOST_USER_SEND_RARP",
-[VHOST_USER_MAX]=  "VHOST_USER_MAX",
+[VHOST_USER_SET_SLAVE_FD]   =  "VHOST_USER_SET_SLAVE_FD",
+[VHOST_USER_MAX]=  "VHOST_USER_MAX"
+};
+
+static const char *vubr_slave_request_str[] = {
+[VHOST_USER_SLAVE_NONE] =  "VHOST_USER_SLAVE_NONE",
+[VHOST_USER_SLAVE_SHUTDOWN] =  "VHOST_USER_SLAVE_SHUTDOWN",
+[VHOST_USER_SLAVE_MAX]  =  "VHOST_USER_SLAVE_MAX"
 };
 
 static void
@@ -638,7 +658,7 @@ vubr_process_desc(VubrDev *dev, VubrVirtq *vq)
 size_t buf_size = 4096;
 uint8_t buf[4096];
 
-DPRINT("Chunks: ");
+DPRINT("Chunks: aidx:%d   ", a_index);
 i = d_index;
 do {
 void *chunk_start = (void *)(uintptr_t)gpa_to_va(dev, desc[i].addr);
@@ -1063,7 +1083,9 @@ vubr_set_vring_err_exec(VubrDev *dev, VhostUserMsg *vmsg)
 static int
 vubr_get_protocol_features_exec(VubrDev *dev, VhostUserMsg *vmsg)
 {
-vmsg->payload.u64 = 1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD;
+vmsg->payload.u64 =
+1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD |
+1ULL << VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL;
 DPRINT("u64: 0x%016"PRIx64"\n", vmsg->payload.u64);
 vmsg->size = sizeof(vmsg->payload.u64);
 
@@ -1105,6 +1127,46 @@ vubr_send_rarp_exec(VubrDev *dev, VhostUserMsg *vmsg)
 return 0;
 }
 
+static void
+vubr_handle_slave_reply(VhostUserMsg *vmsg)
+{
+DPRINT(
+"==   Vhost slave reply from QEMU   
==\n");
+DPRINT("Request: %s (%d)\n", vubr_slave_request_str[vmsg->request],
+   vmsg->request);
+DPRINT("Flags:   0x%x\n", vmsg->flags);
+DPRINT("Size:%d\n", vmsg->size);
+
+switch (vmsg->request) {
+case VHOST_USER_SLAVE_SHUTDOWN:
+DPRINT("Shutdown success: 0x%016"PRIx64"\n", vmsg->payload.u64);
+if (vmsg->payload.u64 == 0) {
+exit(0);
+}
+default:
+DPRINT("Invalid slave reply");
+};
+}
+
+static void
+slave_receive_cb(int sock, void *ctx)
+{
+VhostUserMsg vmsg;
+
+vubr_message_read(sock, &vmsg);
+vubr_handle_slave_reply(&vmsg);
+}
+
+static int
+vubr_set_slave_fd_exec(VubrDev *dev, VhostUserMsg *vmsg)
+{
+assert(vmsg->fd_num == 1);
+dev->slave_fd = vmsg->fds[0];
+DPRINT("Got slave_fd: %d\n", vmsg->fds[0]);
+dispatcher_add(&dev->dispatcher, dev->slave_fd, dev, slave_receive_cb);
+return 0;
+}
+
 static int
 vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg)
 {
@@ -1166,6 +1228,8 @@ vubr_execute_request(VubrDev *dev, VhostUserMsg *vmsg)
 return vubr_set_vring_enable_exec(dev, vmsg);
 case VHOST_USER_SEND_RARP:
 return vubr_send_rarp_exec(dev, vmsg);
+case VHOST_USER_SET_SLAVE_FD:
+return vubr_set_slave_fd_exec(dev, vmsg);
 
 case VHOST_USER_MAX:
 assert(vmsg->request != VHOST_USER_MAX);
@@ -1226,6 +1290,7 @@ vubr_new(const char *path)
 };
 }
 
+dev->slave_fd = -1;
 /* Init log */
 dev->log_call_fd = -1;
 dev->

Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Paolo Bonzini


On 01/04/2016 13:20, Richard W.M. Jones wrote:
> +# MPTABLE is required by Linux kernel, the others add only a
> +# couple of milliseconds so we might as well have them
> +CONFIG_PIRTABLE=y
> +CONFIG_MPTABLE=y
> +CONFIG_SMBIOS=y
> +CONFIG_ACPI=y
> +

MPTABLE is only required is you don't have ACPI.

Paolo



Re: [Qemu-devel] [PATCH v2] migration: skip sending ram pages released by virtio-balloon driver.

2016-04-01 Thread Jitendra Kolhe
On 3/29/2016 4:18 PM, Paolo Bonzini wrote:
> 
> 
> On 29/03/2016 12:47, Jitendra Kolhe wrote:
>>> Indeed.  It is correct for the main system RAM, but hot-plugged RAM
>>> would also have a zero-based section.offset_within_region.  You need to
>>> add memory_region_get_ram_addr(section.mr), just like the call to
>>> balloon_page adds memory_region_get_ram_ptr(section.mr).
>>>
>>> Paolo
>>
>> I am only interested in the offset from memory region base. 
>> Would below guest PA to host offset work, as we do in
>> address_space_translate_internal()?
>> (Guest pa - section.offset_within_address_space + 
>> section.offset_within_region)
> 
> Yes, that would work.  But I'm not sure why you're not interested in the
> ram_addr_t.
> 

You are right, I was wrongly calculating offsets for hot-plugged
RAMblocks. I will have to use qemu_ram_block_from_host() to get 
ram_addr_t to get correct offsets in the bitmap.

> Paolo
> 



Re: [Qemu-devel] Why is SeaBIOS used with -kernel?

2016-04-01 Thread Richard W.M. Jones

My patch, plus the configuration and comments from your patch,
combined.  Plus I tested it with libguestfs boot-analysis and it works
and is still fast.

Integrating this so it happens automatically when the user adds
-kernel on x86 seems quite complicated.  The only way I could do it
was by adding #ifdef defined(__x86_64__) etc to vl.c, which doesn't
seem very nice.  The problem is the machine type code doesn't know
that you're using -kernel.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
>From 28a442fdf8036bde075c75088f8dae8d8568243a Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" 
Date: Fri, 1 Apr 2016 11:35:09 +0100
Subject: [PATCH] bios: Add fast variant of SeaBIOS for use with -kernel on
 x86.

This commit adds a fast variant of SeaBIOS called 'bios-fast.bin'.

It's designed to be the fastest (also the smallest, but that's not the
main aim) SeaBIOS that is just enough to boot a Linux kernel using the
-kernel option on i686 and x86_64.

This commit does not modify the -kernel option to use this.  You have
to specify it by doing something like this:

  -kernel vmlinuz -bios bios-fast.bin

Signed-off-by: Richard W.M. Jones 
---
 Makefile |  3 ++-
 roms/Makefile|  4 +++-
 roms/config.seabios-fast | 32 
 3 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 roms/config.seabios-fast

diff --git a/Makefile b/Makefile
index 1d076a9..c4e939d 100644
--- a/Makefile
+++ b/Makefile
@@ -389,7 +389,8 @@ common  de-ch  es fo  fr-ca  hu ja  mk  nl-be  
pt  sl tr \
 bepocz
 
 ifdef INSTALL_BLOBS
-BLOBS=bios.bin bios-256k.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \
+BLOBS=bios.bin bios-256k.bin bios-fast.bin \
+sgabios.bin vgabios.bin vgabios-cirrus.bin \
 vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin vgabios-virtio.bin \
 acpi-dsdt.aml \
 ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin 
QEMU,cgthree.bin \
diff --git a/roms/Makefile b/roms/Makefile
index 7bd1252..26b0586 100644
--- a/roms/Makefile
+++ b/roms/Makefile
@@ -61,9 +61,11 @@ default:
@echo "  slof   -- update slof.bin"
@echo "  u-boot.e500-- update u-boot.e500"
 
-bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k
+bios: build-seabios-config-seabios-128k build-seabios-config-seabios-256k \
+  build-seabios-config-seabios-fast
cp seabios/builds/seabios-128k/bios.bin ../pc-bios/bios.bin
cp seabios/builds/seabios-256k/bios.bin ../pc-bios/bios-256k.bin
+   cp seabios/builds/seabios-fast/bios.bin ../pc-bios/bios-fast.bin
 
 seavgabios: $(patsubst %,seavgabios-%,$(vgabios_variants))
 
diff --git a/roms/config.seabios-fast b/roms/config.seabios-fast
new file mode 100644
index 000..fbb1ef8
--- /dev/null
+++ b/roms/config.seabios-fast
@@ -0,0 +1,32 @@
+# The fastest SeaBIOS that can boot Linux using -kernel.
+
+# general stuff
+CONFIG_QEMU=y
+CONFIG_ROM_SIZE=128
+CONFIG_XEN=n
+CONFIG_THREADS=n
+CONFIG_WRITABLE_UPPERMEMORY=y
+
+# no input, no boot menu
+CONFIG_MOUSE=n
+CONFIG_KEYBOARD=n
+CONFIG_BOOTMENU=n
+CONFIG_BOOTSPLASH=n
+
+# hardware support we don't need
+CONFIG_LPT=n
+CONFIG_SERIAL=n
+CONFIG_USB=n
+CONFIG_DRIVES=n
+CONFIG_TCGBIOS=n
+CONFIG_VGAHOOKS=n
+
+# MPTABLE is required by Linux kernel, the others add only a
+# couple of milliseconds so we might as well have them
+CONFIG_PIRTABLE=y
+CONFIG_MPTABLE=y
+CONFIG_SMBIOS=y
+CONFIG_ACPI=y
+
+# no logging
+CONFIG_DEBUG_LEVEL=0
-- 
2.7.4



Re: [Qemu-devel] tcg: reworking tb_invalidated_flag

2016-04-01 Thread Sergey Fedorov
On 01/04/16 14:11, Alex Bennée wrote:
> Sergey Fedorov  writes:
>
>> On 31/03/16 16:37, Alex Bennée wrote:
>>> Sergey Fedorov  writes:
 Looks like no matter which approach we use, it's ultimately necessary to
 ensure all CPUs have exited from translated code before the translation
 buffer may be safely flushed.
>>> One approach would be to have multiple translation contexts with their
>>> own buffers and then you can safely flush TBs if no vCPUs are currently
>>> executing in those regions. But I suspect that is a much more complex
>>> future optimisation.
>> Yes, this is much more complex and its performance impact should be
>> investigated.
>>
>>> Having said that is it safe to flush TBs from a given page if we know
>>> no vCPUs are currently executing in that page? As the execution loop has
>>> to exit the chained TBs as we cross page boundaries we could just keep
>>> account of which vCPUs are currently in which page.
>> It should be safe to invalidate a TB while some other CPU is executing
>> its translated code. But it should be guaranteed that no CPU execute any
>> old TB after tb_flush() since we're going to start reusing those TBs.
>>
>> I see how TB cannot be patched if it spans two pages, is there any on
>> when TCG goto_tb can be generated?
> Do you mean tcg_gen_goto_tb?
>
> AFAIUI all blocks end with goto_tb post-ambles but they should only
> directly jump to another TB if they are in the same page.

Thanks, I see the checks.

Regards,
Sergey



  1   2   3   >