Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

2016-11-16 Thread Stefan Hajnoczi
On Tue, Nov 15, 2016 at 10:38 PM, ashish mittal  wrote:
> On Wed, Sep 28, 2016 at 2:45 PM, Stefan Hajnoczi  wrote:
>> On Tue, Sep 27, 2016 at 09:09:49PM -0700, Ashish Mittal wrote:
>> 5.
>> I don't see any endianness handling or portable alignment of struct
>> fields in the network protocol code.  Binary network protocols need to
>> take care of these issues for portability.  This means libqnio compiled
>> for different architectures will not work.  Do you plan to support any
>> other architectures besides x86?
>>
>
> No, we support only x86 and do not plan to support any other arch.
> Please let me know if this necessitates any changes to the configure
> script.

I think no change to ./configure is necessary.  The library will only
ship on x86 so other platforms will never attempt to compile the code.

>> 6.
>> The networking code doesn't look robust: kvset uses assert() on input
>> from the network so the other side of the connection could cause SIGABRT
>> (coredump), the client uses the msg pointer as the cookie for the
>> response packet so the server can easily crash the client by sending a
>> bogus cookie value, etc.  Even on the client side these things are
>> troublesome but on a server they are guaranteed security issues.  I
>> didn't look into it deeply.  Please audit the code.
>>
>
> By design, our solution on the OpenStack platform uses a closed set of
> nodes communicating on dedicated networks. VxHS servers on all the
> nodes are on a dedicated network. Clients (qemu) connect to these
> only after reading the server IP from the XML (read by libvirt). The
> XML cannot be modified without proper access. Therefore, IMO this
> problem would be relevant only if someone were to use qnio as a
> generic mode of communication/data transfer, but for our use-case, we
> will not run into this problem. Is this explanation acceptable?

No.  The trust model is that the guest is untrusted and in the worst
case may gain code execution in QEMU due to security bugs.

You are assuming block/vxhs.c and libqnio are trusted but that
assumption violates the trust model.

In other words:
1. Guest exploits a security hole inside QEMU and gains code execution
on the host.
2. Guest uses VxHS client file descriptor on host to send a malicious
packet to VxHS server.
3. VxHS server is compromised by guest.
4. Compromised VxHS server sends malicious packets to all other
connected clients.
5. All clients have been compromised.

This means both the VxHS client and server must be robust.  They have
to validate inputs to avoid buffer overflows, assertion failures,
infinite loops, etc.
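
As a rough illustration of the kind of validation meant here, the sketch
below parses a made-up message header and rejects bad input with an error
code instead of calling assert().  The header layout, the magic value, the
size limit and all names are hypothetical and are not taken from libqnio.
The cookie is treated as an index into a request table rather than a
pointer, and fields are decoded byte by byte so the result does not depend
on host endianness or struct padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format, for illustration only (not the libqnio one). */
#define MSG_MAGIC       0x514e494fu   /* arbitrary example value */
#define MSG_MAX_PAYLOAD (1u << 20)    /* arbitrary upper bound */
#define MSG_HDR_LEN     12u

struct msg_hdr {
    uint32_t magic;
    uint32_t payload_len;
    uint32_t cookie;      /* index into a request table, never a pointer */
};

/* Decode a big-endian 32-bit value without relying on host byte order. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse and validate a header received from the peer.
 * Returns 0 on success and -1 on any malformed input; it never aborts.
 */
static int parse_msg_hdr(const uint8_t *buf, size_t len,
                         size_t nr_requests, struct msg_hdr *out)
{
    uint32_t magic, payload_len, cookie;

    if (buf == NULL || out == NULL || len < MSG_HDR_LEN) {
        return -1;                      /* truncated packet */
    }
    magic = get_be32(buf);
    payload_len = get_be32(buf + 4);
    cookie = get_be32(buf + 8);

    if (magic != MSG_MAGIC) {
        return -1;                      /* not our protocol */
    }
    if (payload_len > MSG_MAX_PAYLOAD || payload_len > len - MSG_HDR_LEN) {
        return -1;                      /* length field lies about the data */
    }
    if (cookie >= nr_requests) {
        return -1;                      /* bogus cookie from the peer */
    }
    out->magic = magic;
    out->payload_len = payload_len;
    out->cookie = cookie;
    return 0;
}
```

A caller would drop the connection (or fail the request) when this returns
-1, rather than trusting any field of the packet.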

Stefan



Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode

2016-11-16 Thread Fam Zheng
On Mon, 11/14 16:29, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 16:26, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> >> QEMU_AIO_POLL_MAX_NS      IOPs
> >>                unset    31,383
> >>                    1    46,860
> >>                    2    46,440
> >>                    4    35,246
> >>                    8    34,973
> >>                   16    46,794
> >>                   32    46,729
> >>                   64    35,520
> >>                  128    45,902
> > 
> > The environment variable is in nanoseconds.  The range of values you
> > tried is very small (all <1 usec).  It would be interesting to try
> > larger values in the ballpark of the latencies you have traced.  For
> > example 2000, 4000, 8000, 16000, and 32000 ns.
> > 
> > Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without
> > much CPU overhead.
> 
> That basically means "avoid a syscall if you already know there's
> something to do", so in retrospect it's not that surprising.  Still
> interesting though, and it means that the feature is useful even if you
> don't have CPU to waste.

With the "deleted" bug fixed I did a little more testing to understand this.

Setting QEMU_AIO_POLL_MAX_NS=1 doesn't mean run_poll_handlers() will only loop
for 1 ns - the patch only checks at every 1024 polls. The first poll in a
run_poll_handlers() call can hardly succeed, so we poll at least 1024 times.

According to my test, on average each run_poll_handlers() takes ~12000ns, which
is ~160 iterations of the poll loop, before getting a new event (either from
virtio queue or linux-aio, I don't have the ratio here).

So in the worst case (no new event), 1024 iterations take about (12000 / 160 *
1024) = 76800 ns!
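
The estimate above is a plain linear extrapolation, which can be written
down as a small helper.  This is only a sketch: the function name and
parameters are made up for illustration, and it assumes every poll
iteration costs the same as the measured average.

```c
#include <stdint.h>

/*
 * Extrapolate the time spent in one run_poll_handlers() call when no event
 * arrives, i.e. when the loop runs until the next timeout check.
 *
 * avg_ns         measured average duration of a successful poll run
 * avg_iters      measured average number of iterations in such a run
 * check_interval number of iterations between timeout checks (1024 here)
 */
static uint64_t worst_case_poll_ns(uint64_t avg_ns, uint64_t avg_iters,
                                   uint64_t check_interval)
{
    if (avg_iters == 0) {
        return 0;   /* no measurement, nothing to extrapolate */
    }
    /* Multiply first so integer division does not lose precision early. */
    return avg_ns * check_interval / avg_iters;
}
```

With the numbers measured above, worst_case_poll_ns(12000, 160, 1024)
gives 76800.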

The above is with iodepth=1 and jobs=1.  With iodepth=32 and jobs=1, or
iodepth=8 and jobs=4, a new event arrives around the ~30th poll, after ~5600ns.

Fam



Re: [Qemu-devel] [PATCH v2] vhost: Update 'ioeventfd_started' with host notifiers

2016-11-16 Thread Felipe Franciosi

> On 16 Nov 2016, at 04:05, Alexey Kardashevskiy  wrote:
> 
> On 11/11/16 01:45, Christian Borntraeger wrote:
>> On 11/09/2016 01:44 PM, Felipe Franciosi wrote:
>>> Following the recent refactor of virtio notifiers [1], more specifically
>>> the patch that uses virtio_bus_set_host_notifier [2] by default, core
>>> virtio code requires 'ioeventfd_started' to be set to true/false when
>>> the host notifiers are configured. Because not all vhost devices were
>>> updated (e.g. vhost-scsi) to use the new interface, this value is always
>>> set to false.
>>> 
>>> When booting a guest with a vhost-scsi backend controller, SeaBIOS will
>>> initially configure the device which sets all notifiers. The guest will
>>> continue to boot fine until the kernel virtio-scsi driver reinitialises
>>> the device causing a stop followed by another start. Since
>>> ioeventfd_started was never set to true, the 'stop' operation triggered
>>> by virtio_bus_set_host_notifier() will not result in a call to
>>> virtio_pci_ioeventfd_assign(assign=false). This leaves the memory
>>> regions with stale notifiers and results in the next start triggering
>>> the following assertion:
>>> 
>>>  kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
>>>  Aborted
>>> 
>>> This patch updates ioeventfd_started whenever the notifiers are set or
>>> cleared, fixing this issue.
>>> 
>>> Signed-off-by: Felipe Franciosi 
>> 
>> This also fixes vhost-net after reboot on s390/kvm for me
> 
> 
> It does not fix it (the original breakage from e616c2f "virtio: remove
> ioeventfd_disabled altogether") for me:

Can you try Paolo's latest patches for this issue?
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02834.html

Specifically this:
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02837.html

If that doesn't work, can you please attach gdb to your QEMU process and print
a stack trace once you hit the assertion?

Thanks,
Felipe


> 
> /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \
> -chardev stdio,id=STDIO0,signal=off,mux=on \
> -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \
> -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \
> -enable-kvm -m 2G \
> -kernel /home/aik/t/vml450le \
> -initrd /home/aik/t/le.cpio \
> -netdev tap,id=TAP0,vhost=on,helper=/home/aik/qemu-bridge-helper \
> -device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:00,netdev=TAP0" \
> -smp 16,threads=8 \
> -trace events=qemu_trace_events \
> -machine pseries \
> -L /home/aik/t/qemu-ppc64-bios/
> QEMU PID = 22145
> QEMU 2.7.50 monitor - type 'help' for more information
> (qemu)
> 
> 
> SLOF **
> QEMU Starting
> Build Date = Nov 14 2016 19:13:53
> FW Version = git-9b8945ecbde65b06
> Press "s" to enter Open Firmware.
> 
> Populating /vdevice methods
> Populating /vdevice/nvram@7100
> Populating /vdevice/vty@71000100
> Populating /pci@8002000
> 00  (D) : 1af4 1000virtio [ net ]
> qemu-system-ppc64: /home/aik/p/qemu/memory.c:1940:
> memory_region_del_eventfd: Assertion `i != mr->ioeventfd_nb' failed.
> QEMU pid = 22145 returned -6
> 
> 
> 
> 
> Without this one, the breakage looked different (the error happened a
> lot later, in the guest kernel):
> 
> 
> 
> SLOF **
> QEMU Starting
> Build Date = Nov 14 2016 19:13:53
> FW Version = git-9b8945ecbde65b06
> Press "s" to enter Open Firmware.
> 
> Populating /vdevice methods
> Populating /vdevice/nvram@7100
> Populating /vdevice/vty@71000100
> Populating /pci@8002000
> 00  (D) : 1af4 1000virtio [ net ]
> No NVRAM common partition, re-initializing...
> Scanning USB
> Using default console: /vdevice/vty@71000100
> ted RAM kernel at 40 (16ef23c bytes) C08FF
>  Welcome to Open Firmware
> 
>  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
>  This program and the accompanying materials are made available
>  under the terms of the BSD License available at
>  http://www.opensource.org/licenses/bsd-license.php
> 
> Booting from memory...
> OF stdout device is: /vdevice/vty@71000100
> Preparing to boot Linux version 4.5.0-le_v4.5_aik@vpl2-kernel
> (a...@vpl2.ozlabs.ibm.com) (gcc version 5.4.1 20160623 (GCC) ) #59 SMP
> 
> [skipping bunch of boring stuff]
> 
> virtio-pci :00:00.0: enabling device (0100 -> 0103)
> HVCS: Driver registered.
> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> brd: module loaded
> loop: module loaded
> Uniform Multi-Platform E-IDE driver
> ide-gd driver 1.18
> ide-cd driver 5.00
> Loading iSCSI transport class v2.0-870.
> Emulex LightPulse Fibre Channel SCSI driver 11.0.0.10.
> Copyright(c) 2004-2015 Emulex.  All rights reserved.
> ipr: IBM Power RAID SCSI Device Driver version: 2.6.3 (October 17, 2015)
> ibmvfc: IBM Virtual Fibre Channel Driver version: 1.0.11 (April 12, 2013)
> rtas_msi: calc quota for :00:0

[Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Thomas Huth
The ppc64 postcopy test does not work with KVM-PR, and it is also
causing annoying warning messages when run on a x86 host. So let's
use KVM here only if we know that we're running with KVM-HV (which
automatically also means that we're running on a ppc64 host), and
fall back to TCG otherwise.

Signed-off-by: Thomas Huth 
---
 tests/postcopy-test.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
index d6613c5..dafe8be 100644
--- a/tests/postcopy-test.c
+++ b/tests/postcopy-test.c
@@ -380,17 +380,21 @@ static void test_migrate(void)
   " -incoming %s",
   tmpfs, bootpath, uri);
 } else if (strcmp(arch, "ppc64") == 0) {
+const char *accel;
+
+/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
+accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
 init_bootfile_ppc(bootpath);
-cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
+cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
   " -name pcsource,debug-threads=on"
   " -serial file:%s/src_serial"
   " -drive file=%s,if=pflash,format=raw",
-  tmpfs, bootpath);
-cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
+  accel, tmpfs, bootpath);
+cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
   " -name pcdest,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -incoming %s",
-  tmpfs, uri);
+  accel, tmpfs, uri);
 } else {
 g_assert_not_reached();
 }
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2016-11-16 Thread Markus Armbruster
ashish mittal  writes:

> Thanks for concluding on this.
>
> I will rearrange the qnio_api.h header accordingly as follows:
>
> +#include "qemu/osdep.h"

Headers should not include osdep.h.

> +#include <qnio/qnio_api.h>  <=== after osdep.h
> +#include "block/block_int.h"

Including block_int.h in a header is problematic.  Are you sure you need
it?  Will qnio/qnio_api.h ever be included outside block/?

> +#include "qapi/qmp/qerror.h"
> +#include "qapi/qmp/qdict.h"
> +#include "qapi/qmp/qstring.h"
> +#include "trace.h"
> +#include "qemu/uri.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"  < remove

In general, headers should include what they need, but no more.



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Laurent Vivier


On 16/11/2016 09:39, Thomas Huth wrote:
> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> fall back to TCG otherwise.
> 
> Signed-off-by: Thomas Huth 
> ---
>  tests/postcopy-test.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index d6613c5..dafe8be 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -380,17 +380,21 @@ static void test_migrate(void)
>" -incoming %s",
>tmpfs, bootpath, uri);
>  } else if (strcmp(arch, "ppc64") == 0) {
> +const char *accel;
> +
> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";

why not "kvm" instead of "kvm:tcg"?
If it doesn't work it should fail.

Laurent



Re: [Qemu-devel] [PATCH] HACKING: document #include order

2016-11-16 Thread Markus Armbruster
Eric Blake  writes:

> On 11/15/2016 02:29 PM, Stefan Hajnoczi wrote:
>> It was not obvious to me why "qemu/osdep.h" must be the first #include.
>> This documents the rationale and the overall #include order.
>> 
>> Cc: Fam Zheng 
>> Cc: Markus Armbruster 
>> Cc: Eric Blake 
>> Signed-off-by: Stefan Hajnoczi 
>> ---
>>  HACKING | 15 +++
>>  1 file changed, 15 insertions(+)
>> 
>
>> +1.2. Include directives
>> +
>> +Order include directives as follows:
>> +
>> +#include "qemu/osdep.h"  /* Always first... */
>> +#include <...>   /* then system headers... */
>> +#include "..."   /* and finally QEMU headers. */
>> +
>> +The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
>> +of core system headers like <stdint.h>.  It must be the first include so that
>> +core system headers included by external libraries get the preprocessor macros
>> +that QEMU depends on.
>
> Might be worth mentioning that only .c files include osdep.h (.h files
> do not need to, because they can only be included by a .c file that has
> already included osdep.h first).

Yes, please, but make it "headers should not include osdep.h".



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Thomas Huth
On 16.11.2016 10:19, Laurent Vivier wrote:
> 
> 
> On 16/11/2016 09:39, Thomas Huth wrote:
>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>> causing annoying warning messages when run on a x86 host. So let's
>> use KVM here only if we know that we're running with KVM-HV (which
>> automatically also means that we're running on a ppc64 host), and
>> fall back to TCG otherwise.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  tests/postcopy-test.c | 12 
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
>> index d6613c5..dafe8be 100644
>> --- a/tests/postcopy-test.c
>> +++ b/tests/postcopy-test.c
>> @@ -380,17 +380,21 @@ static void test_migrate(void)
>>" -incoming %s",
>>tmpfs, bootpath, uri);
>>  } else if (strcmp(arch, "ppc64") == 0) {
>> +const char *accel;
>> +
>> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
>> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
> 
> why not "kvm" instead of "kvm:tcg"?
> If it doesn't work it should fail.

Yes, sounds right. I'll send a v2...

 Thomas




Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2016-11-16 Thread Fam Zheng
On Wed, 11/16 10:04, Markus Armbruster wrote:
> ashish mittal  writes:
> 
> > Thanks for concluding on this.
> >
> > I will rearrange the qnio_api.h header accordingly as follows:
> >
> > +#include "qemu/osdep.h"
> 
> Headers should not include osdep.h.

This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what
Ashish means looks good to me.

Fam

> 
> > +#include <qnio/qnio_api.h>  <=== after osdep.h
> > +#include "block/block_int.h"
> 
> Including block_int.h in a header is problematic.  Are you sure you need
> it?  Will qnio/qnio_api.h ever be included outside block/?
> 
> > +#include "qapi/qmp/qerror.h"
> > +#include "qapi/qmp/qdict.h"
> > +#include "qapi/qmp/qstring.h"
> > +#include "trace.h"
> > +#include "qemu/uri.h"
> > +#include "qapi/error.h"
> > +#include "qemu/error-report.h"  < remove
> 
> In general, headers should include what they need, but no more.
> 



Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters

2016-11-16 Thread Pavel Dovgalyuk
Kevin,

> From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Am 28.09.2016 um 11:32 hat Pavel Dovgalyuk geschrieben:
> > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > Am 27.09.2016 um 16:06 hat Pavel Dovgalyuk geschrieben:
> > > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > > Am 26.09.2016 um 11:51 hat Pavel Dovgalyuk geschrieben:
> > > > > > > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > > > > > > Am 26.09.2016 um 10:08 hat Pavel Dovgalyuk geschrieben:
> > > > > > Originally, we only called bdrv_goto_snapshot() for all _top level_
> > > > > > BDSes, and this is still what you normally get. However, if you
> > > > > > explicitly create a BDS (e.g. with its own -drive option), it is
> > > > > > considered a top level BDS without actually being top level for the
> > > > > > guest, and therefore the snapshotting function is called for it.
> > > > > >
> > > > > > Of course, this is highly inefficient because the goto_snapshot
> > > > > > request is passed by the filter driver and then called another
> > > > > > time for the lower node, effectively loading the snapshot a
> > > > > > second time.
> > >
> > > Maybe double-saving/loading does the smallest damage then?
> > > And we should just document how to use blkreplay effectively?
> > >
> > > > > >
> > > > > > On the other hand if you use a single -drive option to create both
> > > > > > the qcow2 BDS and the blkreplay filter, we do need to pass down the
> > > > > > goto_snapshot request because it won't be called for the qcow2 layer
> > > > > > otherwise.
> > > > >
> > > > > How this can be specified in command line?
> > > > > I believed that separate -drive option is required.
> > > >
> > > > Something like this:
> > > >
> > > > -drive driver=blkreplay,image.driver=file,image.filename=test.img
> > > >
> > >
> > > I tried the following command line, but VM does not detect the hard drive
> > > and cannot boot.
> > >
> > > -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
> > > -device ide-hd,drive=img-blkreplay
> >
> > My command line was assuming a raw image. It looks like you're using a
> > qcow (hopefully qcow2?) image. If so, then you need to include the qcow2
> > driver:
> >
> > -drive driver=blkreplay,if=none,image.driver=qcow2,\
> > image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
> 
> This doesn't work for some reason. Replay just hangs at some moment.
> 
> Maybe there exists some internal difference between command line with one or
> two -drive options?

I've investigated this issue.
This command line works ok:
 -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
 -device ide-hd,drive=img-blkreplay

And this does not:
 -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
 -device ide-hd,drive=img-blkreplay

QEMU hangs at some moment of replay.

I found that some DMA requests do not pass through the blkreplay driver
because of the following line in block-backend.c:
return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);

This line passes the read request directly to the qcow2 driver, so blkreplay
never sees it and cannot make it deterministic.

Pavel Dovgalyuk





Re: [Qemu-devel] [PATCH] crypto: add virtio-crypto driver

2016-11-16 Thread Gonglei (Arei)
Hi Michael,

Should I convert all __virtio32/64 fields to le32/64 in virtio_crypto.h?


> +#define VIRTIO_CRYPTO_OPCODE(service, op)   (((service) << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header {
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
> +VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> + __virtio32 opcode;
> + __virtio32 algo;
> + __virtio32 flag;
> + /* data virtqueue id */
> + __virtio32 queue_id;
> +};
> +
> +struct virtio_crypto_cipher_session_para {
> +#define VIRTIO_CRYPTO_NO_CIPHER             0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4           1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB        2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC        3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR        4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB        5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC        6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB       7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC       8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR       9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8      10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2    11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8         12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS        13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3       14
> + __virtio32 algo;
> + /* length of key */
> + __virtio32 keylen;
> +
> +#define VIRTIO_CRYPTO_OP_ENCRYPT  1
> +#define VIRTIO_CRYPTO_OP_DECRYPT  2
> + /* encrypt or decrypt */
> + __virtio32 op;
> + __virtio32 padding;
> +};
> +
> +struct virtio_crypto_session_input {
> + /* Device-writable part */
> + __virtio64 session_id;
> + __virtio32 status;
> + __virtio32 padding;
> +};
> +
> +struct virtio_crypto_cipher_session_req {
> + struct virtio_crypto_cipher_session_para para;
> +};
> +
> +struct virtio_crypto_hash_session_para {
> +#define VIRTIO_CRYPTO_NO_HASH    0
> +#define VIRTIO_CRYPTO_HASH_MD5   1
> +#define VIRTIO_CRYPTO_HASH_SHA1  2
> +#define VIRTIO_CRYPTO_HASH_SHA_224   3
> +#define VIRTIO_CRYPTO_HASH_SHA_256   4
> +#define VIRTIO_CRYPTO_HASH_SHA_384   5
> +#define VIRTIO_CRYPTO_HASH_SHA_512   6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224  7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256  8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384  9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512  10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128  11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256  12
> + __virtio32 algo;
> + /* hash result length */
> + __virtio32 hash_result_len;
> +};
> +
> +struct virtio_crypto_hash_create_session_req {
> + struct virtio_crypto_hash_session_para para;
> +};
> +
> +struct virtio_crypto_mac_session_para {
> +#define VIRTIO_CRYPTO_NO_MAC               0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5         1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1        2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224     3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256     4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384     5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512     6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES        25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES         26
> +#define VIRTIO_CRYPTO_MAC_KASUMI_F9        27
> +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA2      28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES         41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH     42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES       49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9 50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES         53
> + __virtio32 algo;
> + /* hash result length */
> + __virtio32 hash_result_len;
> + /* length of authenticated key */
> + __virtio32 auth_key_len;
> + __virtio32 padding;
> +};
> +
> +struct virtio_crypto_mac_create_session_req {
> + struct virtio_crypto_mac_session_para para;
> +};
> +
> +struct virtio_crypto_aead_session_para {
> +#define VIRTIO_CRYPTO_NO_AEAD 0
> +#define VIRTIO_CRYPTO_AEAD_GCM    1
> +#define VIRTIO_CRYPTO_AEAD_CCM    2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305 

[Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Thomas Huth
The ppc64 postcopy test does not work with KVM-PR, and it is also
causing annoying warning messages when run on a x86 host. So let's
use KVM here only if we know that we're running with KVM-HV (which
automatically also means that we're running on a ppc64 host), and
use TCG otherwise.

Signed-off-by: Thomas Huth 
---
 v2:
 - Check also /dev/kvm to make sure that we're allowed to access KVM
 - Use only "accel=kvm" instead of "accel=kvm:tcg" if we feel confident
   that we're running with KVM-HV and can use it

 tests/postcopy-test.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
index d6613c5..e4f0f3f 100644
--- a/tests/postcopy-test.c
+++ b/tests/postcopy-test.c
@@ -380,17 +380,27 @@ static void test_migrate(void)
   " -incoming %s",
   tmpfs, bootpath, uri);
 } else if (strcmp(arch, "ppc64") == 0) {
+const char *accel = "tcg";
+
+/*
+ * We preferably want to test with KVM, but on ppc64, the test only
+ * works with kvm-hv, not with kvm-pr, so we check that here first
+ */
+if (access("/sys/module/kvm_hv", F_OK) == 0 &&
+access("/dev/kvm", R_OK | W_OK) == 0) {
+accel = "kvm";
+}
 init_bootfile_ppc(bootpath);
-cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
+cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
   " -name pcsource,debug-threads=on"
   " -serial file:%s/src_serial"
   " -drive file=%s,if=pflash,format=raw",
-  tmpfs, bootpath);
-cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
+  accel, tmpfs, bootpath);
+cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
   " -name pcdest,debug-threads=on"
   " -serial file:%s/dest_serial"
   " -incoming %s",
-  tmpfs, uri);
+  accel, tmpfs, uri);
 } else {
 g_assert_not_reached();
 }
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Laurent Vivier


On 16/11/2016 11:14, Thomas Huth wrote:
> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> use TCG otherwise.
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2:
>  - Check also /dev/kvm to make sure that we're allowed to access KVM

I'm not sure it's a good idea, as we will fail silently whereas QEMU
sends an error message. It's a common mistake we should be aware of.

Laurent



Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Thomas Huth
On 16.11.2016 11:18, Laurent Vivier wrote:
> 
> 
> On 16/11/2016 11:14, Thomas Huth wrote:
>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>> causing annoying warning messages when run on a x86 host. So let's
>> use KVM here only if we know that we're running with KVM-HV (which
>> automatically also means that we're running on a ppc64 host), and
>> use TCG otherwise.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  v2:
>>  - Check also /dev/kvm to make sure that we're allowed to access KVM
> 
> I'm not sure it's a good idea as we will fail silently whereas QEMU
> sends an error message. It's common mistake we should be aware of.

But if I run "make check" as a normal user who does not have access
rights to /dev/kvm, this is IMHO not a fatal error (since this could be
on purpose), so we should not issue an error message here and should
simply use TCG instead.

If you want to see at least a warning in this case, I think we should
rather go with v1 of this patch that used "kvm:tcg".
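
To make the difference between the two variants concrete, here is a small
sketch of both selection strategies as stand-alone helpers.  The function
names are made up, and the paths are passed in as parameters only so the
helpers can be exercised outside the test; the patches themselves check
"/sys/module/kvm_hv" and "/dev/kvm" directly.

```c
#include <unistd.h>

/*
 * v1 behaviour: if the kvm-hv module directory exists, ask for KVM and
 * let QEMU fall back to TCG (with a warning) when KVM cannot be used.
 */
static const char *pick_accel_v1(const char *kvm_hv_path)
{
    return access(kvm_hv_path, F_OK) == 0 ? "kvm:tcg" : "tcg";
}

/*
 * v2 behaviour: only ask for KVM when kvm-hv is present and the KVM device
 * node is readable and writable; otherwise pick TCG silently.
 */
static const char *pick_accel_v2(const char *kvm_hv_path,
                                 const char *kvm_dev_path)
{
    if (access(kvm_hv_path, F_OK) == 0 &&
        access(kvm_dev_path, R_OK | W_OK) == 0) {
        return "kvm";
    }
    return "tcg";
}
```

With v1 the user still sees QEMU's own warning when /dev/kvm is not
accessible, while v2 hides that situation completely.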

 Thomas




Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Laurent Vivier


On 16/11/2016 11:26, Thomas Huth wrote:
> On 16.11.2016 11:18, Laurent Vivier wrote:
>>
>>
>> On 16/11/2016 11:14, Thomas Huth wrote:
>>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>>> causing annoying warning messages when run on a x86 host. So let's
>>> use KVM here only if we know that we're running with KVM-HV (which
>>> automatically also means that we're running on a ppc64 host), and
>>> use TCG otherwise.
>>>
>>> Signed-off-by: Thomas Huth 
>>> ---
>>>  v2:
>>>  - Check also /dev/kvm to make sure that we're allowed to access KVM
>>
>> I'm not sure it's a good idea as we will fail silently whereas QEMU
>> sends an error message. It's common mistake we should be aware of.
> 
> But if I run "make check" as a normal user who does not have access
> right to /dev/kvm, this is IMHO not a fatal error (since this could be
> on purpose), thus we should not issue an error message here and simply
> use TCG instead.
> 
> If you want to see at least a warning in this case, I think we should
> rather go with v1 of this patch that used "kvm:tcg".

I think it's better to have a warning, so let's go with v1...

Laurent



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Laurent Vivier


On 16/11/2016 09:39, Thomas Huth wrote:
> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> fall back to TCG otherwise.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Laurent Vivier 

> ---
>  tests/postcopy-test.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index d6613c5..dafe8be 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -380,17 +380,21 @@ static void test_migrate(void)
>" -incoming %s",
>tmpfs, bootpath, uri);
>  } else if (strcmp(arch, "ppc64") == 0) {
> +const char *accel;
> +
> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
>  init_bootfile_ppc(bootpath);
> -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcsource,debug-threads=on"
>" -serial file:%s/src_serial"
>" -drive file=%s,if=pflash,format=raw",
> -  tmpfs, bootpath);
> -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +  accel, tmpfs, bootpath);
> +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcdest,debug-threads=on"
>" -serial file:%s/dest_serial"
>" -incoming %s",
> -  tmpfs, uri);
> +  accel, tmpfs, uri);
>  } else {
>  g_assert_not_reached();
>  }
> 



Re: [Qemu-devel] [PATCH] tcg/mips: Add support for mips64el backend

2016-11-16 Thread James Hogan
Hi Richard,

On Tue, Nov 15, 2016 at 10:37:41PM +0100, Richard Henderson wrote:
> On 11/14/2016 10:33 AM, Jin Guojie wrote:
> > I would like to hear your advice. Should I test your v2 patch on Loongson
> > and use it? Or is it worth modifying my patch and resubmitting it
> > according to your review comments?
> 
> I would like very much if you would test my patch on Loongson (or a 
> re-submission of my patch; I could perhaps prepare that against master in the 
> next few days).
> 
> If it is possible, I would like if you could help fix the problems that 
> Aurelien discovered with my patch.  I have no access to mips hardware myself, 
> so all of the development that I was doing was from within a qemu itself.  As 
> you can imagine, qemu-in-qemu is very very slow.
> 
> At the time I was hoping that people from imgtec would be able to help, but 
> that never came to pass.  Oh well.

I'm up for helping a bit with this (testing & debugging), though I admit
it fell off my radar a bit. We could try and run it up on our kernel
test farm too. Please keep me Cc'd on any future patches :)

Cheers
James




Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2016-11-16 Thread Stefan Hajnoczi
On Wed, Nov 16, 2016 at 9:49 AM, Fam Zheng  wrote:
> On Wed, 11/16 10:04, Markus Armbruster wrote:
>> ashish mittal  writes:
>>
>> > Thanks for concluding on this.
>> >
>> > I will rearrange the qnio_api.h header accordingly as follows:
>> >
>> > +#include "qemu/osdep.h"
>>
>> Headers should not include osdep.h.
>
> This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what
> Ashish means looks good to me.

Yes, I think "will rearrange the qnio_api.h header" was a typo and was
supposed to be block/vxhs.c.

Stefan



Re: [Qemu-devel] [PATCH 1/3] virtio: Basic implementation of virtio pstore driver

2016-11-16 Thread Paolo Bonzini
> Not sure how independent ERST is from ACPI and other specs.  It looks
> like it references the UEFI spec at least.

It is just the format of error records that comes from the UEFI spec
(include/linux/cper.h) but you can ignore it, I think.  It should be
handled by tools on the host side.  For you, the error log address
range contains a CPER header followed by a binary blob.  In practice,
you only need the record length field (bytes 20-23 of the header),
though it may be a good idea to validate the signature at the beginning
of the header.
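
A minimal sketch of that header check, under stated assumptions: the signature is taken to be the four ASCII bytes "CPER" at offset 0 and the length field is read as little-endian. The function name and the 24-byte minimum are made up for illustration (the full header is longer; only bytes 0-23 are inspected here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants: this sketch only looks at bytes 0-23. */
#define CPER_SKETCH_MIN_LEN   24
#define CPER_REC_LEN_OFFSET   20

/*
 * Return the record length stored in bytes 20-23 of the CPER header,
 * or 0 if the buffer is too short, the signature does not match, or
 * the length does not fit in the available space.
 */
static uint32_t cper_record_length(const uint8_t *buf, size_t avail)
{
    uint32_t len;

    assert(buf != NULL);

    if (avail < CPER_SKETCH_MIN_LEN || memcmp(buf, "CPER", 4) != 0) {
        return 0;
    }

    /* Assemble the 32-bit field byte by byte (assumed little-endian). */
    len = (uint32_t)buf[CPER_REC_LEN_OFFSET]
        | (uint32_t)buf[CPER_REC_LEN_OFFSET + 1] << 8
        | (uint32_t)buf[CPER_REC_LEN_OFFSET + 2] << 16
        | (uint32_t)buf[CPER_REC_LEN_OFFSET + 3] << 24;

    if (len < CPER_SKETCH_MIN_LEN || len > avail) {
        return 0;
    }
    return len;
}
```

The device would call something like this on the error log address range before copying a record into the store, and report a failure status if it returns 0.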

> Btw, is the ERST used for pstore only (in Linux)?

Yes.  It can store various records, including dmesg and MCE.

There are other examples in QEMU of interfaces with ACPI.  They all use the
DSDT, but the logic is similar.  For example, docs/specs/acpi_mem_hotplug.txt
documents the memory hotplug interface. In all cases, ACPI tables contain small
programs that talk to specialized hardware registers, typically allocated to
hard-coded I/O ports.

In your case, the registers could occupy 16 consecutive I/O ports, like the
following:

 0x00       read/write   operation type (0=write, 1=read, 2=clear,
                         3=dummy write)

 0x01       read-only    bit 7: if set, operation in progress

                         bit 0-6: operation status, see "Command Status
                         Definition" in the ACPI spec

 0x02       read-only    when read:

                         - read a 64-bit record id from the store to memory,
                           from the address that was last written to 0x08.

                         - if the id is valid and is not the last id in the
                           store, write the next 64-bit record id to the same
                           address

                         - otherwise, write the first record id to the same
                           address, or 0x if the store is empty

 0x03                    unused, read as zero

 0x04-0x07  read/write   offset of the error record into the error log
                         address range

 0x08-0x0b  read/write   when read, return number of stored records

                         when written, the written value is a 32-bit memory
                         address, which points to a 64-bit location used to
                         communicate record ids.

 0x0c-0x0f  read/write   when read, always return -1 (together with the
                         "mask" field and READ_REGISTER, this lets ERST
                         instructions return any value!)

                         when written, trigger the pstore operation:

                         - if the current operation is a dummy write, do
                           nothing

                         - if the current operation is a write, write a new
                           record, using the written value as the base of the
                           error log address range.  The length must be parsed
                           from the CPER header.

                         - if the current operation is a clear, read the
                           record id from the memory location that was last
                           written to 0x08 and do the operation.  The value
                           written is ignored.

                         - if the current operation is a read, read the
                           record id from the memory location that was last
                           written to 0x08, using the written value as the
                           base of the error log address range.

In addition, the firmware will need to reserve a few KB of RAM for the error log
address range (I checked a real system and it reserves 8KB).  The first eight
bytes are needed for the record identifier interface, because there's no such
thing as 64-bit I/O ports, and the rest can be used for the actual buffer.
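
As a sketch of that split, assuming an 8KB region and a little-endian guest (both assumptions for illustration; the helper names are invented):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERST_MEM_SIZE  8192u                            /* assumed region size */
#define ERST_ID_SIZE   8u                               /* 64-bit record id slot */
#define ERST_BUF_SIZE  (ERST_MEM_SIZE - ERST_ID_SIZE)   /* room left for records */

/* Store a record id in the first eight bytes, least significant byte first. */
static void erst_store_id(uint8_t *mem, uint64_t id)
{
    assert(mem != NULL);
    for (unsigned i = 0; i < ERST_ID_SIZE; i++) {
        mem[i] = (uint8_t)(id >> (8 * i));
    }
}

/* Load the record id back from the first eight bytes. */
static uint64_t erst_load_id(const uint8_t *mem)
{
    uint64_t id = 0;

    assert(mem != NULL);
    for (unsigned i = 0; i < ERST_ID_SIZE; i++) {
        id |= (uint64_t)mem[i] << (8 * i);
    }
    return id;
}

/* Start of the area usable for the actual error record. */
static uint8_t *erst_record_buf(uint8_t *mem)
{
    assert(mem != NULL);
    return mem + ERST_ID_SIZE;
}
```

With this split, the address written to 0x08 would be the base of the region and the value written to 0x0c would be the base plus eight.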

QEMU already has an interface to allocate RAM and patch the address into an
ACPI table (bios_linker_loader_alloc).  Because this interface is actually meant
to load data from QEMU into the firmware (using the "fw_cfg" interface), you
would have to add a dummy 8KB file to fw_cfg using fw_cfg_add_file (for
example "etc/erst-memory"); it can be just full of zeros.

QEMU supports two chipsets, PIIX and ICH9, and the free I/O port ranges are
different.  You could use 0xa20 for ICH9 and 0xae20 for PIIX.

All in all, the contents of the ERST table would not be very different from a
non-virtual system, except that on real hardware the firmware would use SMIs
as the trap mechanism.  You almost have a one-to-one mapping between ERST
actions and register accesses:

   BEGIN_WRITE_OPERATION          write value 0 to register at 0x00
   BEGIN_READ_OPERATION           write value 1 to register at 0x00
   BEGIN_CLEAR_OPERATION          write value 2 to register at 0x00
   BEGIN_DUMMY_WRITE_OPERATION    write value 3 to register at 0x00
   END_OPERATION                  no-op
   CHECK_BUSY_STATUS              read register at 0x01 with mask 0x80
   GET_COMMAND_STATUS 

Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Greg Kurz
On Wed, 16 Nov 2016 09:39:31 +0100
Thomas Huth  wrote:

> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> fall back to TCG otherwise.
> 

This patch addresses two issues actually:
- the annoying warning when running a ppc64 guest on a non-ppc64 host
- the fact that KVM-PR seems to be currently broken

I agree that the former makes sense, but what about the case of running
a x86 guest on a non-x86 host ?

I'm still feeling uncomfortable with the KVM-PR case... is this a workaround
we want to keep until we find out what's going on or are we starting to
partially deprecate KVM PR ? In any case, I guess we should document this
and probably print some meaningful error message.

> Signed-off-by: Thomas Huth 
> ---
>  tests/postcopy-test.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index d6613c5..dafe8be 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -380,17 +380,21 @@ static void test_migrate(void)
>" -incoming %s",
>tmpfs, bootpath, uri);
>  } else if (strcmp(arch, "ppc64") == 0) {
> +const char *accel;
> +
> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
>  init_bootfile_ppc(bootpath);
> -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcsource,debug-threads=on"
>" -serial file:%s/src_serial"
>" -drive file=%s,if=pflash,format=raw",
> -  tmpfs, bootpath);
> -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +  accel, tmpfs, bootpath);
> +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcdest,debug-threads=on"
>" -serial file:%s/dest_serial"
>" -incoming %s",
> -  tmpfs, uri);
> +  accel, tmpfs, uri);
>  } else {
>  g_assert_not_reached();
>  }




Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters

2016-11-16 Thread Paolo Bonzini


> I've investigated this issue.
> This command line works ok:
>  -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> And this does not:
>  -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> QEMU hangs at some moment of replay.
> 
> I found that some dma requests do not pass through the blkreplay driver
> due to the following line in block-backend.c:
> return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
> 
> This line passes read request directly to qcow driver and blkreplay cannot
> process it to make deterministic.

I don't understand, blk->root should be the blkreplay here.

Paolo



Re: [Qemu-devel] [RFC, v1, 1/2] hw/vfio/platform: add hisilicon hnsvf device

2016-11-16 Thread Auger Eric
Hi Rick,

On 21/10/2016 03:22, Rick Song wrote:
> The platform device class has become abstract. This
> patch introduces a hisilicon hnsvf device that derives
> from it.

In https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03401.html
we discussed whether to make the platform device non-abstract. No
change was submitted, though. I can submit something next week unless
you want to submit a patch yourself.

The idea is we would instantiate the vfio platform device using such an
option:

-device vfio-platform-device,compat="hisilicon,hnsvf-v2"

Once such a change is accepted, only your second patch will be needed.

Thanks

Eric

> 
> Signed-off-by: Rick Song 
> ---
>  hw/vfio/Makefile.objs |  1 +
>  hw/vfio/hisi-hnsvf.c  | 56 +++
>  include/hw/vfio/vfio-hisi-hnsvf.h | 51 +++
>  3 files changed, 108 insertions(+)
>  create mode 100644 hw/vfio/hisi-hnsvf.c
>  create mode 100644 include/hw/vfio/vfio-hisi-hnsvf.h
> 
> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> index c25e32b..d19dffc 100644
> --- a/hw/vfio/Makefile.objs
> +++ b/hw/vfio/Makefile.objs
> @@ -4,5 +4,6 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o
>  obj-$(CONFIG_SOFTMMU) += platform.o
>  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
>  obj-$(CONFIG_SOFTMMU) += amd-xgbe.o
> +obj-$(CONFIG_SOFTMMU) += hisi-hnsvf.o
>  obj-$(CONFIG_SOFTMMU) += spapr.o
>  endif
> diff --git a/hw/vfio/hisi-hnsvf.c b/hw/vfio/hisi-hnsvf.c
> new file mode 100644
> index 000..5b48e27
> --- /dev/null
> +++ b/hw/vfio/hisi-hnsvf.c
> @@ -0,0 +1,56 @@
> +/*
> + * Hisilicon HNS Virtual Function VFIO device
> + *
> + * Copyright Huawei Limited, 2016
> + *
> + * Authors:
> + *  Rick Song 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/vfio/vfio-hisi-hnsvf.h"
> +
> +static void hisi_hnsvf_realize(DeviceState *dev, Error **errp)
> +{
> +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
> +VFIOHisiHnsvfDeviceClass *k = VFIO_HISI_HNSVF_DEVICE_GET_CLASS(dev);
> +
> +vdev->compat = g_strdup("hisilicon,hnsvf-v2");
> +
> +k->parent_realize(dev, errp);
> +}
> +
> +static const VMStateDescription vfio_platform_hisi_hnsvf_vmstate = {
> +.name = TYPE_VFIO_HISI_HNSVF,
> +.unmigratable = 1,
> +};
> +
> +static void vfio_hisi_hnsvf_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +VFIOHisiHnsvfDeviceClass *vcxc =
> +VFIO_HISI_HNSVF_DEVICE_CLASS(klass);
> +vcxc->parent_realize = dc->realize;
> +dc->realize = hisi_hnsvf_realize;
> +dc->desc = "VFIO HISI HNSVF";
> +dc->vmsd = &vfio_platform_hisi_hnsvf_vmstate;
> +}
> +
> +static const TypeInfo vfio_hisi_hnsvf_dev_info = {
> +.name = TYPE_VFIO_HISI_HNSVF,
> +.parent = TYPE_VFIO_PLATFORM,
> +.instance_size = sizeof(VFIOHisiHnsvfDevice),
> +.class_init = vfio_hisi_hnsvf_class_init,
> +.class_size = sizeof(VFIOHisiHnsvfDeviceClass),
> +};
> +
> +static void register_hisi_hnsvf_dev_type(void)
> +{
> +type_register_static(&vfio_hisi_hnsvf_dev_info);
> +}
> +
> +type_init(register_hisi_hnsvf_dev_type)
> diff --git a/include/hw/vfio/vfio-hisi-hnsvf.h b/include/hw/vfio/vfio-hisi-hnsvf.h
> new file mode 100644
> index 000..9208656
> --- /dev/null
> +++ b/include/hw/vfio/vfio-hisi-hnsvf.h
> @@ -0,0 +1,51 @@
> +/*
> + * VFIO Hisilicon HNS Virtual Function device
> + *
> + * Copyright Hisilicon Limited, 2016
> + *
> + * Authors:
> + *  Rick Song 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef HW_VFIO_VFIO_HISI_HNSVF_H
> +#define HW_VFIO_VFIO_HISI_HNSVF_H
> +
> +#include "hw/vfio/vfio-platform.h"
> +
> +#define TYPE_VFIO_HISI_HNSVF "vfio-hisi-hnsvf"
> +
> +/**
> + * This device exposes:
> + * - 5 MMIO regions: MAC, PCS, SerDes Rx/Tx regs,
> + SerDes Integration Registers 1/2 & 2/2
> + * - 2 level sensitive IRQs and optional DMA channel IRQs
> + */
> +struct VFIOHisiHnsvfDevice {
> +VFIOPlatformDevice vdev;
> +};
> +
> +typedef struct VFIOHisiHnsvfDevice VFIOHisiHnsvfDevice;
> +
> +struct VFIOHisiHnsvfDeviceClass {
> +/*< private >*/
> +VFIOPlatformDeviceClass parent_class;
> +/*< public >*/
> +DeviceRealize parent_realize;
> +};
> +
> +typedef struct VFIOHisiHnsvfDeviceClass VFIOHisiHnsvfDeviceClass;
> +
> +#define VFIO_HISI_HNSVF_DEVICE(obj) \
> + OBJECT_CHECK(VFIOHisiHnsvfDevice, (obj), TYPE_VFIO_HISI_HNSVF)
> +#define VFIO_HISI_HNSVF_DEVICE_CLASS(klass) \
> + OBJECT_CLASS_CHECK(VFIOHisiHnsvfDeviceClass, (klass), \
> +TYPE_VFIO_HISI_HNSVF)
> +#define VFIO_HISI_HNSVF_DEVICE_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(VFIOHisiHnsvfDeviceClass, (obj), \
> +  TYPE_VFIO_HISI_HNSVF)
> +
> +#

Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters

2016-11-16 Thread Kevin Wolf
Am 16.11.2016 um 10:49 hat Pavel Dovgalyuk geschrieben:
> > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru]
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > My command line was assuming a raw image. It looks like you're using a
> > > qcow (hopefully qcow2?) image. If so, then you need to include the qcow2
> > > driver:
> > >
> > > -drive driver=blkreplay,if=none,image.driver=qcow2,\
> > > image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
> > 
> > This doesn't work for some reason. Replay just hangs at some moment.
> > 
> > Maybe there exists some internal difference between command line with one
> > or two -drive options?
> 
> I've investigated this issue.
> This command line works ok:
>  -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> And this does not:
>  -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay
>  -device ide-hd,drive=img-blkreplay
> 
> QEMU hangs at some moment of replay.
> 
> I found that some dma requests do not pass through the blkreplay driver
> due to the following line in block-backend.c:
> return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
> 
> This line passes read request directly to qcow driver and blkreplay cannot
> process it to make deterministic.

How does that bypass blkreplay? blk->root is supposed to be the blkreplay
node, do you see something different? If it were the qcow2 node, then I
would expect that no requests at all go through the blkreplay layer.

Kevin



Re: [Qemu-devel] [RFC, v1, 2/2] hw/arm/sysbus-fdt: enable vfio-hisi-hnsvf dynamic instantiation

2016-11-16 Thread Auger Eric
Hi,

On 21/10/2016 03:22, Rick Song wrote:
> This patch allows the instantiation of the vfio-hisi-hnsvf device
> from the QEMU command line (-device vfio-hisi-hnsvf,host="").
> A specialized device tree node is created for the guest, containing
> compat, dma-coherent, reg and interrupts properties.

For additional devices, Peter requested that we restructure the
sysbus-fdt.c file to keep it from getting too large. We need to define
relevant helpers and put the node creation functions elsewhere. Similarly,
I can propose something next week unless you want to do it.

Thanks

Eric
> 
> Signed-off-by: Rick Song 
> ---
>  hw/arm/sysbus-fdt.c | 71 +
>  1 file changed, 71 insertions(+)
> 
> diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
> index d68e3dc..207586f 100644
> --- a/hw/arm/sysbus-fdt.c
> +++ b/hw/arm/sysbus-fdt.c
> @@ -36,6 +36,7 @@
>  #include "hw/vfio/vfio-platform.h"
>  #include "hw/vfio/vfio-calxeda-xgmac.h"
>  #include "hw/vfio/vfio-amd-xgbe.h"
> +#include "hw/vfio/vfio-hisi-hnsvf.h"
>  #include "hw/arm/fdt.h"
>  
>  /*
> @@ -413,6 +414,75 @@ static int add_amd_xgbe_fdt_node(SysBusDevice *sbdev, void *opaque)
>  return 0;
>  }
>  
> +/**
> + * add_hisi_hnsvf_fdt_node
> + *
> + * Generates a simple node with following properties:
> + * compatible string, regs, interrupts, dma-coherent
> + */
> +static int add_hisi_hnsvf_fdt_node(SysBusDevice *sbdev, void *opaque)
> +{
> +PlatformBusFDTData *data = opaque;
> +PlatformBusDevice *pbus = data->pbus;
> +void *fdt = data->fdt;
> +const char *parent_node = data->pbus_node_name;
> +int compat_str_len, i;
> +char *nodename;
> +uint32_t *irq_attr, *reg_attr;
> +uint64_t mmio_base, irq_number;
> +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +VFIODevice *vbasedev = &vdev->vbasedev;
> +VFIOINTp *intp;
> +
> +mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
> +nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +   vbasedev->name, mmio_base);
> +qemu_fdt_add_subnode(fdt, nodename);
> +
> +compat_str_len = strlen(vdev->compat) + 1;
> +qemu_fdt_setprop(fdt, nodename, "compatible",
> +  vdev->compat, compat_str_len);
> +
> +qemu_fdt_setprop(fdt, nodename, "dma-coherent", "", 0);
> +
> +reg_attr = g_new(uint32_t, vbasedev->num_regions * 2);
> +for (i = 0; i < vbasedev->num_regions; i++) {
> +mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
> +reg_attr[2 * i] = cpu_to_be32(mmio_base);
> +reg_attr[2 * i + 1] = cpu_to_be32(
> +memory_region_size(vdev->regions[i]->mem));
> +}
> +qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
> + vbasedev->num_regions * 2 * sizeof(uint32_t));
> +
> +irq_attr = g_new(uint32_t, vbasedev->num_irqs * 3);
> +for (i = 0; i < vbasedev->num_irqs; i++) {
> +irq_number = platform_bus_get_irqn(pbus, sbdev , i)
> + + data->irq_start;
> +irq_attr[3 * i] = cpu_to_be32(GIC_FDT_IRQ_TYPE_SPI);
> +irq_attr[3 * i + 1] = cpu_to_be32(irq_number);
> +
> +QLIST_FOREACH(intp, &vdev->intp_list, next) {
> +if (intp->pin == i) {
> +break;
> +}
> +}
> +
> +if (intp->flags & VFIO_IRQ_INFO_AUTOMASKED) {
> +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_LEVEL_HI);
> +} else {
> +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
> +}
> +}
> +qemu_fdt_setprop(fdt, nodename, "interrupts",
> + irq_attr, vbasedev->num_irqs * 3 * sizeof(uint32_t));
> +g_free(irq_attr);
> +g_free(reg_attr);
> +g_free(nodename);
> +return 0;
> +
> +}
> +
>  #endif /* CONFIG_LINUX */
>  
>  /* list of supported dynamic sysbus devices */
> @@ -420,6 +490,7 @@ static const NodeCreationPair add_fdt_node_functions[] = {
>  #ifdef CONFIG_LINUX
>  {TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node},
>  {TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node},
> +{TYPE_VFIO_HISI_HNSVF, add_hisi_hnsvf_fdt_node},
>  #endif
>  {"", NULL}, /* last element */
>  };
> 



Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Igor Mammedov
On Tue, 15 Nov 2016 15:34:45 -0200
Eduardo Habkost  wrote:

> On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote:
> [...]
> > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data)
> >  if (pcms->fw_cfg) {
> >  pc_build_smbios(pcms->fw_cfg);
> >  pc_build_feature_control_file(pcms);
> > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */
> > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> >  }
> >  
> >  if (pcms->apic_id_limit > 255) {
> > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms)
> >  assert(MACHINE(pcms)->kernel_filename != NULL);
> >  
> >  fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
> > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> >  rom_set_fw(fw_cfg);
> >  
> >  load_linux(pcms, fw_cfg);
> > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
> >  }
> >  }
> >  
> > +/* increment the number of CPUs */
> > +pcms->boot_cpus++;
> >  if (dev->hotplugged) {
> > -/* increment the number of CPUs */
> > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
> > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> >  }
> >  
> >  found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
> > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler 
> > *hotplug_dev,
> >  found_cpu->cpu = NULL;
> >  object_unparent(OBJECT(dev));
> >  
> > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1);
> > +/* decrement the number of CPUs */
> > +pcms->boot_cpus--;
> > +/* Update the number of CPUs in CMOS */
> > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);  
> 
> Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug,
> too?
Indeed, it should be updated; otherwise the BIOS will hang on reboot,
waiting for the wrong number of CPUs, if the CPU count is above 256.

The same bug was present in the reverted
"pc: Add 'etc/boot-cpus'  fw_cfg file for machine with more than 255 CPUs"

Thanks for noticing it!
I'll post v3 as reply to this thread.



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Dr. David Alan Gilbert
* Greg Kurz (gr...@kaod.org) wrote:
> On Wed, 16 Nov 2016 09:39:31 +0100
> Thomas Huth  wrote:
> 
> > The ppc64 postcopy test does not work with KVM-PR, and it is also
> > causing annoying warning messages when run on a x86 host. So let's
> > use KVM here only if we know that we're running with KVM-HV (which
> > automatically also means that we're running on a ppc64 host), and
> > fall back to TCG otherwise.
> > 
> 
> This patch addresses two issues actually:
> - the annoying warning when running a ppc64 guest on a non-ppc64 host
> - the fact that KVM-PR seems to be currently broken
> 
> I agree that the former makes sense, but what about the case of running
> a x86 guest on a non-x86 host ?
> 
> I'm still feeling uncomfortable with the KVM-PR case... is this a workaround
> we want to keep until we find out what's going on or are we starting to
> partially deprecate KVM PR ? In any case, I guess we should document this
> and probably print some meaningful error message.

This is certainly a workaround for now; it doesn't suggest anything about
deprecation.

Dave

> > Signed-off-by: Thomas Huth 
> > ---
> >  tests/postcopy-test.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> > index d6613c5..dafe8be 100644
> > --- a/tests/postcopy-test.c
> > +++ b/tests/postcopy-test.c
> > @@ -380,17 +380,21 @@ static void test_migrate(void)
> >" -incoming %s",
> >tmpfs, bootpath, uri);
> >  } else if (strcmp(arch, "ppc64") == 0) {
> > +const char *accel;
> > +
> > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
> >  init_bootfile_ppc(bootpath);
> > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
> >" -name pcsource,debug-threads=on"
> >" -serial file:%s/src_serial"
> >" -drive file=%s,if=pflash,format=raw",
> > -  tmpfs, bootpath);
> > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> > +  accel, tmpfs, bootpath);
> > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
> >" -name pcdest,debug-threads=on"
> >" -serial file:%s/dest_serial"
> >" -incoming %s",
> > -  tmpfs, uri);
> > +  accel, tmpfs, uri);
> >  } else {
> >  g_assert_not_reached();
> >  }
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH] HACKING: document #include order

2016-11-16 Thread Stefan Hajnoczi
On Wed, Nov 16, 2016 at 9:39 AM, Markus Armbruster  wrote:
> Eric Blake  writes:
>
>> On 11/15/2016 02:29 PM, Stefan Hajnoczi wrote:
>>> It was not obvious to me why "qemu/osdep.h" must be the first #include.
>>> This documents the rationale and the overall #include order.
>>>
>>> Cc: Fam Zheng 
>>> Cc: Markus Armbruster 
>>> Cc: Eric Blake 
>>> Signed-off-by: Stefan Hajnoczi 
>>> ---
>>>  HACKING | 15 +++
>>>  1 file changed, 15 insertions(+)
>>>
>>
>>> +1.2. Include directives
>>> +
>>> +Order include directives as follows:
>>> +
>>> +#include "qemu/osdep.h"  /* Always first... */
>>> +#include <...>   /* then system headers... */
>>> +#include "..."   /* and finally QEMU headers. */
>>> +
>>> +The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
>>> +of core system headers like .  It must be the first include so that
>>> +core system headers included by external libraries get the preprocessor macros
>>> +that QEMU depends on.
>>
>> Might be worth mentioning that only .c files include osdep.h (.h files
>> do not need to, because they can only be included by a .c file that has
>> already included osdep.h first).
>
> Yes, please, but make it "headers should not include osdep.h".

Will send v2.

Stefan



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Greg Kurz
On Wed, 16 Nov 2016 12:24:50 +
"Dr. David Alan Gilbert"  wrote:

> * Greg Kurz (gr...@kaod.org) wrote:
> > On Wed, 16 Nov 2016 09:39:31 +0100
> > Thomas Huth  wrote:
> >   
> > > The ppc64 postcopy test does not work with KVM-PR, and it is also
> > > causing annoying warning messages when run on a x86 host. So let's
> > > use KVM here only if we know that we're running with KVM-HV (which
> > > automatically also means that we're running on a ppc64 host), and
> > > fall back to TCG otherwise.
> > >   
> > 
> > This patch addresses two issues actually:
> > - the annoying warning when running a ppc64 guest on a non-ppc64 host
> > - the fact that KVM-PR seems to be currently broken
> > 
> > I agree that the former makes sense, but what about the case of running
> > a x86 guest on a non-x86 host ?
> > 
> > I'm still feeling uncomfortable with the KVM-PR case... is this a workaround
> > we want to keep until we find out what's going on or are we starting to
> > partially deprecate KVM PR ? In any case, I guess we should document this
> > and probably print some meaningful error message.  
> 
> This is certainly a workaround for now; it doesn't suggest anything about
> deprecation.
> 

Well it doesn't suggest anything actually, it just silently skips KVM PR...
I would at least expect a comment in the code mentioning this is a
workaround and maybe an explicit warning for the user. If the user really
wants to run this test with KVM on ppc64, then she should ensure it is
KVM HV.
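
Something along these lines is what I have in mind. This is only a sketch, not a tested change to the patch: the helper name, the path parameter and the message wording are made up for illustration; only the access() check and the two accelerator strings come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Workaround: on ppc64 the postcopy test currently only works with
 * KVM-HV, not with KVM-PR.  Use KVM only if the kvm_hv module directory
 * exists, and tell the user when falling back to TCG.
 *
 * kvm_hv_path is a parameter only so the sketch can be exercised with
 * other paths; the patch checks "/sys/module/kvm_hv".
 */
static const char *ppc64_pick_accel(const char *kvm_hv_path)
{
    assert(kvm_hv_path != NULL);

    if (access(kvm_hv_path, F_OK) == 0) {
        return "kvm:tcg";
    }

    fprintf(stderr,
            "postcopy-test: %s not found, KVM-HV is not available; "
            "falling back to TCG\n", kvm_hv_path);
    return "tcg";
}
```

The test would then pass the result of ppc64_pick_accel("/sys/module/kvm_hv") as the accel= value for both source and destination.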

Cheers.

--
Greg

> Dave
> 
> > > Signed-off-by: Thomas Huth 
> > > ---
> > >  tests/postcopy-test.c | 12 
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> > > index d6613c5..dafe8be 100644
> > > --- a/tests/postcopy-test.c
> > > +++ b/tests/postcopy-test.c
> > > @@ -380,17 +380,21 @@ static void test_migrate(void)
> > >" -incoming %s",
> > >tmpfs, bootpath, uri);
> > >  } else if (strcmp(arch, "ppc64") == 0) {
> > > +const char *accel;
> > > +
> > > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> > > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
> > >  init_bootfile_ppc(bootpath);
> > > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> > > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
> > >" -name pcsource,debug-threads=on"
> > >" -serial file:%s/src_serial"
> > >" -drive file=%s,if=pflash,format=raw",
> > > -  tmpfs, bootpath);
> > > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> > > +  accel, tmpfs, bootpath);
> > > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
> > >" -name pcdest,debug-threads=on"
> > >" -serial file:%s/dest_serial"
> > >" -incoming %s",
> > > -  tmpfs, uri);
> > > +  accel, tmpfs, uri);
> > >  } else {
> > >  g_assert_not_reached();
> > >  }  
> >   
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Eduardo Habkost
On Wed, Nov 16, 2016 at 01:24:11PM +0100, Igor Mammedov wrote:
> On Tue, 15 Nov 2016 15:34:45 -0200
> Eduardo Habkost  wrote:
> 
> > On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote:
> > [...]
> > > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data)
> > >  if (pcms->fw_cfg) {
> > >  pc_build_smbios(pcms->fw_cfg);
> > >  pc_build_feature_control_file(pcms);
> > > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */
> > > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> > >  }
> > >  
> > >  if (pcms->apic_id_limit > 255) {
> > > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms)
> > >  assert(MACHINE(pcms)->kernel_filename != NULL);
> > >  
> > >  fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> > > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
> > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> > >  rom_set_fw(fw_cfg);
> > >  
> > >  load_linux(pcms, fw_cfg);
> > > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler 
> > > *hotplug_dev,
> > >  }
> > >  }
> > >  
> > > +/* increment the number of CPUs */
> > > +pcms->boot_cpus++;
> > >  if (dev->hotplugged) {
> > > -/* increment the number of CPUs */
> > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
> > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> > >  }
> > >  
> > >  found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
> > > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler 
> > > *hotplug_dev,
> > >  found_cpu->cpu = NULL;
> > >  object_unparent(OBJECT(dev));
> > >  
> > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1);
> > > +/* decrement the number of CPUs */
> > > +pcms->boot_cpus--;
> > > +/* Update the number of CPUs in CMOS */
> > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);  
> > 
> > Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug,
> > too?
> Indeed, it should be updated; otherwise the BIOS will hang on reboot,
> waiting for the wrong number of CPUs, if the CPU count is above 256.
> 
> The same bug was present in the reverted
> "pc: Add 'etc/boot-cpus'  fw_cfg file for machine with more than 255 CPUs"

The "etc/boot-cpus" patch changed boot_cpus_le on the plug/unplug
callbacks.


> Thanks for noticing it!
> I'll post v3 as reply to this thread.

Thanks!

-- 
Eduardo



Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Paolo Bonzini

> If the consensus is that the patch is a QEMU bugfix (as opposed to a
> feature) and that it is eligible for the currently supported upstream
> stable branches, that's the best, no doubt.

The only currently supported upstream stable branch is 2.7. :)

I'm okay with bending the rules and including it in 2.8, but it's
worrisome that you also needed to go back from relaxed to traditional
delivery, meaning that old QEMU + new OVMF will take ages to boot.

If this is the case, I still think this needs some kind of discovery
mechanism, unless OVMF can just say "things were too broken, stop
supporting SMM on QEMUs older than 2.8".

For example:

- OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
setting is used for the PCD; this would be backwards compatibility mode.

- we could have another magic 0xB2 value, which is implemented directly
in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
to detect the new feature.  It can fail to start if using traditional
AP and the new feature is not there.

By the way, in case OVMF needs to use SmmSwDispatch in the future, I
would make QEMU use broadcast behavior for all values in the 0x10-0xff
range, or something like that.
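
Roughly, the dispatch rule I am thinking of would look like the sketch below. The macro and function names are invented for illustration, and the 0x10 threshold is just the range suggested above, not a settled interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APM_CMD_LEGACY         0x00  /* value OVMF uses today: no broadcast */
#define APM_CMD_BROADCAST_MIN  0x10  /* first value in the broadcast range */

/* The legacy value must stay outside the broadcast range. */
static_assert(APM_CMD_BROADCAST_MIN > APM_CMD_LEGACY,
              "legacy command value must not broadcast");

/*
 * Decide whether a write of 'cmd' to port 0xB2 should raise the SMI on
 * all VCPUs (true) or only on the VCPU that performed the write (false).
 */
static bool apm_cmd_broadcasts_smi(uint8_t cmd)
{
    return cmd >= APM_CMD_BROADCAST_MIN;
}
```

Values below 0x10, including 0x00, keep the current single-VCPU behavior, so an old OVMF on a new QEMU is unaffected.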

Paolo

> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
> correct; when I was writing the OVMF docs, I must have misunderstood the
> requirements and needlessly required 2.5+; 2.4+ should have been fine.)
> 
> Which means the fix should be backported as far as stable-2.4.
> 
> Should we proceed with that? CC'ing Mike Roth and the stable list.
> 
> Thanks!
> Laszlo
> 
> > 
> > 
> >>>
> >>> Paolo
> >>>
>  ---
>   hw/isa/lpc_ich9.c | 12 +++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
> 
>  diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>  index 10d1ee8b9310..f2fe644fdaa4 100644
>  --- a/hw/isa/lpc_ich9.c
>  +++ b/hw/isa/lpc_ich9.c
>  @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool
>  smm_enabled)
>   
>   /* APM */
>   
>  +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
>  +
>   static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>   {
>   ICH9LPCState *lpc = arg;
>  @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,
>  void *arg)
>   
>   /* SMI_EN = PMBASE + 30. SMI control and enable register */
>   if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
>  -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>  +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
>  +CPUState *cs;
>  +
>  +CPU_FOREACH(cs) {
>  +cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>  +}
>  +} else {
>  +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>  +}
>   }
>   }
>   
> 
> 
> 



Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Andrew Jones
On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
> 
> 
> On 11/14/2016 09:12 AM, Christopher Covington wrote:
> > Hi Drew, Wei,
> > 
> > On 11/14/2016 05:05 AM, Andrew Jones wrote:
> >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
> >>>
> >>>
> >>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>  On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> > From: Christopher Covington 
> >
> > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> > even for the smallest delta of two subsequent reads.
> >
> > Signed-off-by: Christopher Covington 
> > Signed-off-by: Wei Huang 
> > ---
> >  arm/pmu.c | 98 
> > +++
> >  1 file changed, 98 insertions(+)
> >
> > diff --git a/arm/pmu.c b/arm/pmu.c
> > index 0b29088..d5e3ac3 100644
> > --- a/arm/pmu.c
> > +++ b/arm/pmu.c
> > @@ -14,6 +14,7 @@
> >   */
> >  #include "libcflat.h"
> >  
> > +#define PMU_PMCR_E (1 << 0)
> >  #define PMU_PMCR_N_SHIFT   11
> >  #define PMU_PMCR_N_MASK0x1f
> >  #define PMU_PMCR_ID_SHIFT  16
> > @@ -21,6 +22,10 @@
> >  #define PMU_PMCR_IMP_SHIFT 24
> >  #define PMU_PMCR_IMP_MASK  0xff
> >  
> > +#define PMU_CYCLE_IDX  31
> > +
> > +#define NR_SAMPLES 10
> > +
> >  #if defined(__arm__)
> >  static inline uint32_t pmcr_read(void)
> >  {
> > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
> > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
> > return ret;
> >  }
> > +
> > +static inline void pmcr_write(uint32_t value)
> > +{
> > +   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> > +}
> > +
> > +static inline void pmselr_write(uint32_t value)
> > +{
> > +   asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> > +}
> > +
> > +static inline void pmxevtyper_write(uint32_t value)
> > +{
> > +   asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> > +}
> > +
> > +/*
> > + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
> > returning 64
> > + * bits doesn't seem worth the trouble when differential usage of the 
> > result is
> > + * expected (with differences that can easily fit in 32 bits). So just 
> > return
> > + * the lower 32 bits of the cycle count in AArch32.
> 
>  Like I said in the last review, I'd rather we not do this. We should
>  return the full value and then the test case should confirm the upper
>  32 bits are zero.
> >>>
> >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> >>> register. We can force it to a more coarse-grained cycle counter with
> >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
> > 
> > AArch32 System Register Descriptions
> > Performance Monitors registers
> > PMCCNTR, Performance Monitors Cycle Count Register
> > 
> > To access the PMCCNTR when accessing as a 32-bit register:
> > > MRC p15,0,<Rt>,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> > > MCR p15,0,<Rt>,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are unchanged
> > > 
> > > To access the PMCCNTR when accessing as a 64-bit register:
> > > MRRC p15,0,<Rt>,<Rt2>,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] into Rt2
> > > MCRR p15,0,<Rt>,<Rt2>,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to PMCCNTR[63:32]
> > 
> 
> Thanks. I did some research based on your info and came back with the
> following proposals (Cov, correct me if I am wrong):
> 
> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
> think this 64-bit cycle register is only available when running under
> aarch32 compatibility mode on ARMv8 because it is not specified in A15
> TRM.

OK, I hadn't realized that there would be differences between v7 and
AArch32. It looks like we need to add a function to the kvm-unit-tests
framework that enables unit tests to make that distinction, because we'll
want to explicitly test those differences in order to flush out emulation
bugs. I see now that Appendix K5 of the v8 ARM ARM lists some differences,
but this PMCCNTR difference isn't there...

As v8-A32 is an update/extension of v7-A, I'd expect there to be a RES0
bit in some v7 ID register that, on v8, is no longer reserved and a 1.
Unfortunately I just did some ARM doc skimming but can't find anything
like that. As we currently only use the cortex-a15 for our v7 processor,
then I guess we can just check MIDR, but yuck. Anyway, I'll send a
patch for that.
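
To make the MIDR idea concrete, here is a minimal sketch of the check, written as plain C on an already-read MIDR value so it can be tried outside the guest. The macro and function names are made up for illustration and are not part of kvm-unit-tests; the field layout (implementer in bits [31:24], primary part number in bits [15:4]) and the Cortex-A15 part number 0xC0F are the only facts it relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MIDR field extraction; names are illustrative only. */
#define MIDR_IMPLEMENTER(midr)  (((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)      (((midr) >> 4) & 0xfffu)

#define MIDR_IMPLEMENTER_ARM    0x41u   /* 'A' */
#define MIDR_PARTNUM_CORTEX_A15 0xc0fu

/*
 * Returns true if the MIDR value identifies an ARM Ltd. Cortex-A15.
 * A test could use this to decide that it is on a v7 part and should
 * stick to the 32-bit PMCCNTR access.
 */
static bool midr_is_cortex_a15(uint32_t midr)
{
    return MIDR_IMPLEMENTER(midr) == MIDR_IMPLEMENTER_ARM &&
           MIDR_PARTNUM(midr) == MIDR_PARTNUM_CORTEX_A15;
}
```

This only distinguishes the one v7 CPU model we currently use, which is exactly why it is a stopgap rather than a real v7 vs. v8-A32 test.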

> To further verify it, I tested 32-bit pmu code on QEMU with TCG
> mode. The result is: accessing 64-bit PMCCNTR using the following
> assembly failed on A15:
> 
>    asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
> or
>    asm volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));
> 
> Given this difference, I think there are 

Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Igor Mammedov
On Wed, 16 Nov 2016 10:39:33 -0200
Eduardo Habkost  wrote:

> On Wed, Nov 16, 2016 at 01:24:11PM +0100, Igor Mammedov wrote:
> > On Tue, 15 Nov 2016 15:34:45 -0200
> > Eduardo Habkost  wrote:
> >   
> > > On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote:
> > > [...]  
> > > > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void 
> > > > *data)
> > > >  if (pcms->fw_cfg) {
> > > >  pc_build_smbios(pcms->fw_cfg);
> > > >  pc_build_feature_control_file(pcms);
> > > > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */
> > > > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, 
> > > > pcms->boot_cpus);
> > > >  }
> > > >  
> > > >  if (pcms->apic_id_limit > 255) {
> > > > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms)
> > > >  assert(MACHINE(pcms)->kernel_filename != NULL);
> > > >  
> > > >  fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
> > > > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
> > > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
> > > >  rom_set_fw(fw_cfg);
> > > >  
> > > >  load_linux(pcms, fw_cfg);
> > > > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler 
> > > > *hotplug_dev,
> > > >  }
> > > >  }
> > > >  
> > > > +/* increment the number of CPUs */
> > > > +pcms->boot_cpus++;
> > > >  if (dev->hotplugged) {
> > > > -/* increment the number of CPUs */
> > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 
> > > > 0x5f) + 1);
> > > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> > > >  }
> > > >  
> > > >  found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
> > > > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler 
> > > > *hotplug_dev,
> > > >  found_cpu->cpu = NULL;
> > > >  object_unparent(OBJECT(dev));
> > > >  
> > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 
> > > > 1);
> > > > +/* decrement the number of CPUs */
> > > > +pcms->boot_cpus--;
> > > > +/* Update the number of CPUs in CMOS */
> > > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
> > > 
> > > Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug,
> > > too?  
> > Indeed, it should be updated
> > otherwise it will hang on reboot in BIOS waiting for wrong number of CPUs
> > if CPUs count is above 256.
> > 
> > the same bug has been present in the reverted
> > "pc: Add 'etc/boot-cpus'  fw_cfg file for machine with more than 255 CPUs"  
> 
> The "etc/boot-cpus" patch changed boot_cpus_le on the plug/unplug
> callbacks.
Ah yes,
I had forgotten that boot_cpus_le was directly accessible via fw_cfg.

> 
> 
> > Thanks for noticing it!
> > I'll post v3 as reply to this thread.  
> 
> Thanks!
> 




[Qemu-devel] [PATCH for-2.8 v3 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Igor Mammedov
Signed-off-by: Igor Mammedov 
---
v3:
  - Update FW_CFG_NB_CPUS on CPU hot(un)plug to avoid
hang in BIOS on reboot if number of CPUs is over 256
(Eduardo)
---
 include/hw/i386/pc.h |  2 ++
 hw/i386/pc.c | 44 +++-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index e32e957..67a1a9e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -36,6 +36,7 @@
 /**
  * PCMachineState:
  * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
+ * @boot_cpus: number of present VCPUs
  */
 struct PCMachineState {
 /*< private >*/
@@ -70,6 +71,7 @@ struct PCMachineState {
 bool apic_xrupt_override;
 unsigned apic_id_limit;
 CPUArchIdList *possible_cpus;
+uint16_t boot_cpus;
 
 /* NUMA information: */
 uint64_t numa_nodes;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5aeae7d..677a594 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -744,7 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms)
 int i, j;
 
 fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
-fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 
 /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
  *
@@ -1087,17 +1087,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level)
 }
 }
 
-static int pc_present_cpus_count(PCMachineState *pcms)
-{
-int i, boot_cpus = 0;
-for (i = 0; i < pcms->possible_cpus->len; i++) {
-if (pcms->possible_cpus->cpus[i].cpu) {
-boot_cpus++;
-}
-}
-return boot_cpus;
-}
-
 static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id,
   Error **errp)
 {
@@ -1234,6 +1223,19 @@ static void pc_build_feature_control_file(PCMachineState *pcms)
 fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val));
 }
 
+static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
+{
+if (cpus_count > 0xff) {
+/* If the number of CPUs can't be represented in 8 bits, the
+ * BIOS must use "FW_CFG_NB_CPUS". Set RTC field to 0 just
+ * to make old BIOSes fail more predictably.
+ */
+rtc_set_memory(rtc, 0x5f, 0);
+} else {
+rtc_set_memory(rtc, 0x5f, cpus_count - 1);
+}
+}
+
 static
 void pc_machine_done(Notifier *notifier, void *data)
 {
@@ -1242,7 +1244,7 @@ void pc_machine_done(Notifier *notifier, void *data)
 PCIBus *bus = pcms->bus;
 
 /* set the number of CPUs */
-rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1);
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
 
 if (bus) {
 int extra_hosts = 0;
@@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data)
 if (pcms->fw_cfg) {
 pc_build_smbios(pcms->fw_cfg);
 pc_build_feature_control_file(pcms);
+/* update FW_CFG_NB_CPUS to account for -device added CPUs */
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 }
 
 if (pcms->apic_id_limit > 255) {
@@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms)
 assert(MACHINE(pcms)->kernel_filename != NULL);
 
 fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
-fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 rom_set_fw(fw_cfg);
 
 load_linux(pcms, fw_cfg);
@@ -1824,9 +1828,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
 }
 }
 
+/* increment the number of CPUs */
+pcms->boot_cpus++;
 if (dev->hotplugged) {
-/* increment the number of CPUs */
-rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 }
 
 found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
@@ -1880,7 +1886,11 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
 found_cpu->cpu = NULL;
 object_unparent(OBJECT(dev));
 
-rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1);
+/* decrement the number of CPUs */
+pcms->boot_cpus--;
+/* Update the number of CPUs in CMOS */
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
  out:
 error_propagate(errp, local_err);
 }
-- 
2.7.4




Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support

2016-11-16 Thread Jiri Denemark
On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote:
> CCing qemu-devel.
> 
> CCing Markus, in case he has any insights about the interface
> introspection.
> 
> On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote:
> > On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote:
> > > On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote:
> > > > cpu features are passed to the qemu command with feature=on/off
> > > > instead of +/-feature.
> > > > 
> > > > Signed-off-by: Collin L. Walling 
> > > 
> > > If I'm not mistaken, the "feature=on|off" syntax was added on
> > > QEMU 2.0.0. Does current libvirt support older QEMU versions?
> > 
> > Of course it does. I'd love to switch to feature=on|off, but how can we
> > check if QEMU supports it? We can't really start using this syntax
> > without it.
> 
> Actually, I was wrong, this was added in v2.4.0. "feat=on|off"
> needs two things to work (in x86):
> 
> * Translation of all "foo=bar" options to QOM property setting.
>   This was added in v2.0.0-rc0~162^2
> * The actual QOM properties for feature names to be present. They
>   were added in v2.4.0-rc0~101^2~1
> 
> So you can be sure "feat=on" is supported by checking if the
> feature flags are present in device-list-properties output for
> the CPU model. But device-list-properties is also messy[1].
> 
> Maybe we can use the availability of query-cpu-model-expansion to
> check if we can safely use the new "feat=on|off" system? It's
> easier than taking all the variables above into account.

Yeah, this could work since s390 already supports
query-cpu-model-expansion. It would cause feature=on|off not to be used
on x86_64 with QEMU older than 2.9.0, but I guess that's not a big deal,
is it?
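
To illustrate what the switch amounts to on the command-line side, here is a small sketch that appends one feature to a -cpu argument in either syntax. It is not libvirt code: append_cpu_feature() and its qom_syntax flag are hypothetical, and how the flag gets set (e.g. from the presence of query-cpu-model-expansion) is left to the capability probing discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Append ",name=on|off" (new syntax) or ",+name" / ",-name" (legacy
 * syntax) to the NUL-terminated -cpu argument in arg.
 * Returns false if the result would not fit in len bytes.
 */
static bool append_cpu_feature(char *arg, size_t len, const char *name,
                               bool enable, bool qom_syntax)
{
    size_t used;
    int n;

    assert(arg != NULL && name != NULL);
    used = strlen(arg);
    if (used >= len) {
        return false;
    }
    if (qom_syntax) {
        n = snprintf(arg + used, len - used, ",%s=%s",
                     name, enable ? "on" : "off");
    } else {
        n = snprintf(arg + used, len - used, ",%c%s",
                     enable ? '+' : '-', name);
    }
    return n >= 0 && (size_t)n < len - used;
}
```

With "Haswell" as the base model, enabling avx2 and disabling rtm gives "Haswell,avx2=on,rtm=off" in the new form and "Haswell,+avx2,-rtm" in the legacy form.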

Jirka



Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Thomas Huth
On 16.11.2016 13:37, Greg Kurz wrote:
> On Wed, 16 Nov 2016 12:24:50 +
> "Dr. David Alan Gilbert"  wrote:
> 
>> * Greg Kurz (gr...@kaod.org) wrote:
>>> On Wed, 16 Nov 2016 09:39:31 +0100
>>> Thomas Huth  wrote:
>>>   
>>>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>>>> causing annoying warning messages when run on a x86 host. So let's
>>>> use KVM here only if we know that we're running with KVM-HV (which
>>>> automatically also means that we're running on a ppc64 host), and
>>>> fall back to TCG otherwise.
>>>>
>>>
>>> This patch addresses two issues actually:
>>> - the annoying warning when running on a ppc64 guest on a non-ppc64 host
>>> - the fact that KVM-PR seems to be currently broken
>>>
>>> I agree that the former makes sense, but what about the case of running
>>> a x86 guest on a non-x86 host ?

Of course you also get these '"kvm" accelerator not found' messages
there. But so far, I think nobody complained about that yet (only for
ppc64 running on x86). And at least the test succeeds there - unlike
with KVM-PR, where the test fails completely.
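
To spell out the selection logic from the patch description, a minimal sketch follows. It is not the test code itself: postcopy_accel() is a made-up helper, and how kvm_hv_present is determined is an assumption left to the caller (the kvm_hv module being loaded on the host is one plausible signal).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Pick the accelerator list for the postcopy test.
 * ppc64 guests get KVM only when KVM-HV is known to be available;
 * every other guest keeps the existing "try KVM, fall back to TCG".
 */
static const char *postcopy_accel(const char *guest_arch, bool kvm_hv_present)
{
    assert(guest_arch != NULL);
    if (strcmp(guest_arch, "ppc64") == 0) {
        return kvm_hv_present ? "kvm:tcg" : "tcg";
    }
    return "kvm:tcg";
}
```

Since KVM-HV only exists on ppc64 hosts, the ppc64 branch also avoids the accelerator warning on x86 hosts, while KVM-PR hosts silently end up on TCG.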

>>> I'm still feeling uncomfortable with the KVM-PR case... is this a workaround
>>> we want to keep until we find out what's going on or are we starting to
>>> partially deprecate KVM PR ? In any case, I guess we should document this
>>> and probably print some meaningful error message.  
>>
>> This is certainly a work around for now, it doesn't suggest anything about
>> deprecation.
> 
> Well it doesn't suggest anything actually, it just silently skips KVM PR...
> I would at least expect a comment in the code mentioning this is a
> workaround and maybe an explicit warning for the user. If the user really
> wants to run this test with KVM on ppc64, then she should ensure it is
> KVM HV.

Honestly, also considering the number of patches that Laurent already
wrote here and that have never been accepted, all this has become quite
an ugly bike-shed painting discussion.

My opinion:

- If we want to properly test KVM (be it KVM-HV or KVM-PR), write
  a proper kvm-unit-test instead. I.e. I personally don't care if this
  test in QEMU is only run with TCG or with KVM.

- The current status of "make check" is broken, since it does not
  work on KVM-PR. We've got to fix that before the release.

That means I currently really don't care whether we spill out a warning
message for KVM-PR here or not - sure, somebody has got to look at
KVM-PR later, but that's IMHO off-topic for the test here in the QEMU
context.

So if you think that the patch for fixing this issue here with the QEMU
test should look differently, please propose a different patch instead.
I'm fine with every other approach as long as we get this fixed in time
for QEMU 2.8.

 Thomas




Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Michael S. Tsirkin
On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote:
> 
> > If the consensus is that the patch is a QEMU bugfix (as opposed to a
> > feature) and that it is eligible for the currently supported upstream
> > stable branches, that's the best, no doubt.
> 
> The currently supported upstream stable branches is just 2.7. :)
> 
> I'm okay with bending the rules and including it in 2.8, but it's
> worrisome that you also needed to go back from relaxed to traditional
> delivery, meaning that old QEMU + new OVMF will take ages to boot.
> 
> If this is the case, I still think this needs some kind of discovery
> mechanism, unless OVMF can just say "things were too broken, stop
> supporting SMM on QEMUs older than 2.8".
> 
> For example:
> 
> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
> setting is used for the PCD; this would be backwards compatibility mode.
> 
> - we could have another magic 0xB2 value, which is implemented directly
> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> to detect the new feature.  It can fail to start if using traditional
> AP and the new feature is not there.

If we keep collecting these magic values, should we architect it
and do a host/guest bitmap like virtio does?

> By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> would make QEMU use broadcast behavior for all values in the 0x10-0xff
> range, or something like that.
> 
> Paolo

What bothers me with all these ideas is that it's PV.
Unavoidable?

> > For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
> > SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
> > correct; when I was writing the OVMF docs, I must have misunderstood the
> > requirements and needlessly required 2.5+; 2.4+ should have been fine.)
> > 
> > Which means the fix should be backported as far as stable-2.4.
> > 
> > Should we proceed with that? CC'ing Mike Roth and the stable list.
> > 
> > Thanks!
> > Laszlo
> > 
> > > 
> > > 
> > >>>
> > >>> Paolo
> > >>>
> >  ---
> >   hw/isa/lpc_ich9.c | 12 +++-
> >   1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> >  diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> >  index 10d1ee8b9310..f2fe644fdaa4 100644
> >  --- a/hw/isa/lpc_ich9.c
> >  +++ b/hw/isa/lpc_ich9.c
> >  @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool
> >  smm_enabled)
> >   
> >   /* APM */
> >   
> >  +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
> >  +
> >   static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
> >   {
> >   ICH9LPCState *lpc = arg;
> >  @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val,
> >  void *arg)
> >   
> >   /* SMI_EN = PMBASE + 30. SMI control and enable register */
> >   if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
> >  -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
> >  +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
> >  +CPUState *cs;
> >  +
> >  +CPU_FOREACH(cs) {
> >  +cpu_interrupt(cs, CPU_INTERRUPT_SMI);
> >  +}
> >  +} else {
> >  +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
> >  +}
> >   }
> >   }
> >   
> > 
> > 
> > 



Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters

2016-11-16 Thread Pavel Dovgalyuk
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> > I've investigated this issue.
> > This command line works ok:
> >  -drive
> >  
> > driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay
> >  -device ide-hd,drive=img-blkreplay
> >
> > And this does not:
> >  -drive
> >
> driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdis
> k.qcow
> > ,id=img-blkreplay
> >  -device ide-hd,drive=img-blkreplay
> >
> > QEMU hangs at some moment of replay.
> >
> > I found that some dma requests do not pass through the blkreplay driver
> > due to the following line in block-backend.c:
> > return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags);
> >
> > This line passes read request directly to qcow driver and blkreplay cannot
> > process it to make deterministic.
> 
> I don't understand, blk->root should be the blkreplay here.

I've got some more logs. I used a disk image which references a backing file.
It seems that some weird things happen with both command lines.

== For the first command line (blkreplay separated from image):
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(blkreplay, temp_overlay)
 -> bdrv_co_preadv(qcow2, temp_overlay2)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(file, temp_overlay1)

== For the second command line (blkreplay combined with image):

In most cases we have the following call stack:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay)
 -> bdrv_co_preadv(blkreplay, image_overlay)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:
blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp overlay)
 -> bdrv_co_preadv(file, temp overlay)



It seems that a temporary overlay is created over blkreplay, which
is intended to work as a simple filter. Is that correct?
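
To state the property I am checking more precisely: every read that starts at the BlockBackend should reach the filter before it reaches a file node. Here is a toy model of that check (not QEMU code; Node and read_path_hits_filter() are made up, and the chains are simplified to single-child lists like the call stacks above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One block node in a simplified, single-child graph. */
typedef struct Node {
    const char *driver;
    const struct Node *child;
} Node;

/* Walk from the top node down and report whether the filter is on the path. */
static bool read_path_hits_filter(const Node *top, const char *filter)
{
    assert(filter != NULL);
    for (const Node *n = top; n != NULL; n = n->child) {
        if (strcmp(n->driver, filter) == 0) {
            return true;
        }
    }
    return false;
}
```

The usual stack (qcow2 overlay, blkreplay, qcow2, qcow2, file) passes this check; the short one (qcow2 overlay, file) does not, and those are the requests that escape the replay.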

Pavel Dovgalyuk




Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-11-16 Thread Michael S. Tsirkin
On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> On Thu, 10 Nov 2016 21:20:36 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:  
> > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > "Michael S. Tsirkin"  wrote:
> > > > > 
> > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > > > > > From: "Aviv Ben-David" 
> > > > > > > 
> > > > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > > > >   This capability is controlled by "cache-mode" property of 
> > > > > > > intel-iommu device.
> > > > > > >   To enable this option call QEMU with "-device 
> > > > > > > intel-iommu,cache-mode=true".
> > > > > > > 
> > > > > > > * On page cache invalidation in intel vIOMMU, check if the domain 
> > > > > > > belong to
> > > > > > >   registered notifier, and notify accordingly.  
> > > > > > 
> > > > > > This looks sane I think. Alex, care to comment?
> > > > > > Merging will have to wait until after the release.
> > > > > > Pls remember to re-test and re-ping then.
> > > > > 
> > > > > I don't think it's suitable for upstream until there's a reasonable
> > > > > replay mechanism
> > > > 
> > > > Could you pls clarify what do you mean by replay?
> > > > Is this when you attach a device by hotplug to
> > > > a running system?
> > > > 
> > > > If yes this can maybe be addressed by disabling hotplug temporarily.  
> > > 
> > > No, hotplug is not required, moving a device between existing domains
> > > requires replay, ie. actually using it for nested device assignment.  
> > 
> > Good point, that one is a correctness thing. Aviv,
> > could you add this in TODO list in a cover letter pls?
> > 
> > > > > and we straighten out whether it's expected to get
> > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > them or if the notif-er should do filtering.
> > > > 
> > > > OK this is a documentation thing.  
> > > 
> > > Well no, it needs to be decided and if necessary implemented.  
> > 
> > Let's assume it's the notif-ee for now. Less is more and all that.
> 
> I think this is opposite of the approach dwg suggested.
>  
> > > > >  Without those, this is
> > > > > effectively just an RFC.
> > > > 
> > > > It's infrastructure without users so it doesn't break things,
> > > > I'm more interested in seeing whether it's broken in
> > > > some way than whether it's complete.  
> > > 
> > > If it allows use with vfio but doesn't fully implement the complete set
> > > of interfaces, it does break things.  We currently prevent viommu usage
> > > with vfio because it is incomplete.  
> > 
> > Right - that bit is still in as far as I can see.
> 
> Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> vfio even though it's still incomplete.  We would at least need
> something like a replay callback for VT-d that triggers an abort if you
> still want to accept it incomplete.  Thanks,
> 
> Alex

IIUC practically things seem to work, right?
So how about disabling by default with a flag for people that want to
experiment with it?
E.g. x-vfio-allow-broken-translations ?

I would like to help this make progress such that 1. Aviv
gets credit for the work he has done so far and 2. more people can join
development and help complete it.

> > > > The patchset spent out of tree too long and I'd like to see
> > > > us make progress towards device assignment working with
> > > > vIOMMU sooner rather than later, so if it's broken I won't
> > > > merge it but if it's incomplete I will.  
> > > 
> > > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > > merging it, but I don't want to enable vfio usage until it's complete.
> > > Thanks,
> > > 
> > > Alex
> > >   
> > > > > > > Currently this patch still doesn't enabling VFIO devices support 
> > > > > > > with vIOMMU 
> > > > > > > present. Current problems:
> > > > > > > * vfio_iommu_map_notify is not aware about memory range belong to 
> > > > > > > specific 
> > > > > > >   VFIOGuestIOMMU.
> > > > > > > * memory_region_iommu_replay hangs QEMU on start up while it 
> > > > > > > itterate over 
> > > > > > >   64bit address space. Commenting out the call to this function 
> > > > > > > enables 
> > > > > > >   workable VFIO device while vIOMMU present.
> > > > > > > * vfio_iommu_map_notify should check if address space range is 
> > > > > > > suitable for 
> > > > > > >   current notifier.
> > > > > > > 
> > > > > > > Changes from v1 to v2:
> > > > > > > * remove assumption that the cache do not clears
> > > > > > > * fix lockup on high load.
> > > > > > > 
> > > > > > > Changes from v2 to v3:
> > > > > > > * remove debug leftovers
> > > > > > > * split to sepearate commits
> > > > > > > * change is_write to flags in vtd_do_iommu_translate, add 
> > >

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Paolo Bonzini


On 16/11/2016 14:18, Michael S. Tsirkin wrote:
> > - we could have another magic 0xB2 value, which is implemented directly
> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > to detect the new feature.  It can fail to start if using traditional
> > AP and the new feature is not there.
> 
> If we keep collecting these magic values, should architect it
> and do a host/guest bitmap like virtio does?

The value written in 0xB3 can certainly be a feature bitmap.  For now we
would have for example

bit 0   if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
bit 1-7 zero

Paolo



Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support

2016-11-16 Thread Eduardo Habkost
On Wed, Nov 16, 2016 at 02:15:02PM +0100, Jiri Denemark wrote:
> On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote:
> > CCing qemu-devel.
> > 
> > CCing Markus, in case he has any insights about the interface
> > introspection.
> > 
> > On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote:
> > > On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote:
> > > > On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote:
> > > > > cpu features are passed to the qemu command with feature=on/off
> > > > > instead of +/-feature.
> > > > > 
> > > > > Signed-off-by: Collin L. Walling 
> > > > 
> > > > If I'm not mistaken, the "feature=on|off" syntax was added on
> > > > QEMU 2.0.0. Does current libvirt support older QEMU versions?
> > > 
> > > Of course it does. I'd love to switch to feature=on|off, but how can we
> > > check if QEMU supports it? We can't really start using this syntax
> > > without it.
> > 
> > Actually, I was wrong, this was added in v2.4.0. "feat=on|off"
> > needs two things to work (in x86):
> > 
> > * Translation of all "foo=bar" options to QOM property setting.
> >   This was added in v2.0.0-rc0~162^2
> > * The actual QOM properties for feature names to be present. They
> >   were added in v2.4.0-rc0~101^2~1
> > 
> > So you can be sure "feat=on" is supported by checking if the
> > feature flags are present in device-list-properties output for
> > the CPU model. But device-list-properties is also messy[1].
> > 
> > Maybe we can use the availability of query-cpu-model-expansion to
> > check if we can safely use the new "feat=on|off" system? It's
> > easier than taking all the variables above into account.
> 
> Yeah, this could work since s390 already supports
> query-cpu-model-expansion. It would cause feature=on|off not to be used
> on x86_64 with QEMU older than 2.9.0, but I guess that's not a big deal,
> is it?

Not a problem, as we have no plans to remove +feat/-feat support
in x86 anymore.

-- 
Eduardo



Re: [Qemu-devel] [PATCH for-2.8 v3 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Eduardo Habkost
On Wed, Nov 16, 2016 at 02:04:41PM +0100, Igor Mammedov wrote:
> Signed-off-by: Igor Mammedov 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo



Re: [Qemu-devel] [PATCH] display: cirrus: check vga bits per pixel(bpp) value

2016-11-16 Thread Marc-André Lureau
Hi

On Tue, Oct 18, 2016 at 11:46 AM P J P  wrote:

> From: Prasad J Pandit 
>
> In Cirrus CLGD 54xx VGA Emulator, if cirrus graphics mode is VGA,
> 'cirrus_get_bpp' returns zero(0), which could lead to a divide
> by zero error while copying pixel data. The same could occur
> via blit pitch values. Add check to avoid it.
>

For completeness, do you have a reproducer and/or a backtrace?


>
> Reported-by: Huawei PSIRT 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/display/cirrus_vga.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
> index 3d712d5..bdb092e 100644
> --- a/hw/display/cirrus_vga.c
> +++ b/hw/display/cirrus_vga.c
> @@ -272,6 +272,9 @@ static void cirrus_update_memory_access(CirrusVGAState
> *s);
>  static bool blit_region_is_unsafe(struct CirrusVGAState *s,
>int32_t pitch, int32_t addr)
>  {
> +if (!pitch) {
> +return true;
> +}
>

That doesn't look directly related to 'cirrus_get_bpp', care to explain?

>  if (pitch < 0) {
>  int64_t min = addr
>  + ((int64_t)s->cirrus_blt_height-1) * pitch;
> @@ -715,7 +718,7 @@ static int
> cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s)
>  s->cirrus_addr_mask));
>  }
>
> -static void cirrus_do_copy(CirrusVGAState *s, int dst, int src, int w,
> int h)
> +static int cirrus_do_copy(CirrusVGAState *s, int dst, int src, int w, int
> h)
>  {
>  int sx = 0, sy = 0;
>  int dx = 0, dy = 0;
> @@ -729,6 +732,9 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst,
> int src, int w, int h)
>  int width, height;
>
>  depth = s->vga.get_bpp(&s->vga) / 8;
> +if (!depth) {
> +return 0;
> +}
>

Makes sense, since 'cirrus_get_bpp' would return 0 in VGA mode. But isn't
this a cirrus operation (not VGA)? How did it get there? Perhaps this
should be caught earlier (invalid VGA operations).
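
For reference, a simplified model of why a zero pitch and a zero depth are both dangerous. This is only a sketch, assuming the copy path derives its start coordinates with a modulo by the pitch and a division by the byte depth; blit_start_coords() is a hypothetical helper, not the actual cirrus_do_copy() code.

```c
#include <stdlib.h>

/*
 * Hypothetical model of the start-coordinate math in a blit copy.
 * pitch == 0 would make the modulo undefined, and bpp < 8 (0 in
 * plain VGA mode) would make depth == 0 and the division undefined.
 * Returns 1 and fills *x and *y on success, 0 if the request is
 * rejected.
 */
static int blit_start_coords(int addr, int pitch, int bpp, int *x, int *y)
{
    int depth = bpp / 8;
    int abs_pitch = abs(pitch);

    if (abs_pitch == 0 || depth == 0) {
        return 0;
    }
    *x = (addr % abs_pitch) / depth;
    *y = addr / abs_pitch;
    return 1;
}
```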

>  s->vga.get_resolution(&s->vga, &width, &height);
>
>  /* extra x, y */
> @@ -783,6 +789,8 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst,
> int src, int w, int h)
>  cirrus_invalidate_region(s, s->cirrus_blt_dstaddr,
> s->cirrus_blt_dstpitch,
> s->cirrus_blt_width,
> s->cirrus_blt_height);
> +
> +return 1;
>  }
>
>  static int cirrus_bitblt_videotovideo_copy(CirrusVGAState * s)
> @@ -790,11 +798,9 @@ static int
> cirrus_bitblt_videotovideo_copy(CirrusVGAState * s)
>  if (blit_is_unsafe(s))
>  return 0;
>
> -cirrus_do_copy(s, s->cirrus_blt_dstaddr - s->vga.start_addr,
> +return cirrus_do_copy(s, s->cirrus_blt_dstaddr - s->vga.start_addr,
>  s->cirrus_blt_srcaddr - s->vga.start_addr,
>  s->cirrus_blt_width, s->cirrus_blt_height);
> -
> -return 1;
>

Btw, not directly related to your patch: in cirrus_bitblt_videotovideo(),
cirrus_bitblt_reset() is called if (ret), and later it is called again
if (!ret) in cirrus_bitblt_start(). That looks a bit weird, but it
may be fine.

I hope someone more familiar with the code can help review your patch.

Thanks


>  }
>
>  /***
> --
> 2.7.4
>
>
> --
Marc-André Lureau


[Qemu-devel] [PULL 2/3] fw_cfg: move FW_CFG_NB_CPUS out of fw_cfg_init1()

2016-11-16 Thread Eduardo Habkost
From: Igor Mammedov 

PC will use this field in a different way, so move it out of the common
code so that PC can set a different value, i.e. all CPUs
regardless of where they come from (-smp X | -device cpu...).

It's a quick and dirty hack, as it could be implemented in a more generic
way in MachineClass. But do it the simple way since only PC is affected
so far.

Later we can generalize it when another affected target gets support
for -device cpu.

Signed-off-by: Igor Mammedov 
Message-Id: <1479212236-183810-3-git-send-email-imamm...@redhat.com>
Reviewed-by: Eduardo Habkost 
Signed-off-by: Eduardo Habkost 
---
 hw/arm/virt.c | 4 +++-
 hw/i386/pc.c  | 2 ++
 hw/nvram/fw_cfg.c | 1 -
 hw/ppc/mac_newworld.c | 1 +
 hw/ppc/mac_oldworld.c | 1 +
 hw/sparc/sun4m.c  | 1 +
 hw/sparc64/sun4u.c| 1 +
 7 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 54a8b28..d04e4ac 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -929,9 +929,11 @@ static void create_fw_cfg(const VirtBoardInfo *vbi, 
AddressSpace *as)
 {
 hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
 hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
+FWCfgState *fw_cfg;
 char *nodename;
 
-fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
+fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 
 nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
 qemu_fdt_add_subnode(vbi->fdt, nodename);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c227ead..e8757b4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -744,6 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, 
PCMachineState *pcms)
 int i, j;
 
 fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 
 /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
  *
@@ -1341,6 +1342,7 @@ void xen_load_linux(PCMachineState *pcms)
 assert(MACHINE(pcms)->kernel_filename != NULL);
 
 fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 rom_set_fw(fw_cfg);
 
 load_linux(pcms, fw_cfg);
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 1f0c3e9..3ebecb2 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -884,7 +884,6 @@ static void fw_cfg_init1(DeviceState *dev)
 fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
 fw_cfg_add_bytes(s, FW_CFG_UUID, &qemu_uuid, 16);
 fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)!machine->enable_graphics);
-fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu);
 fw_cfg_bootsplash(s);
 fw_cfg_reboot(s);
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 7d25106..2bfdb64 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -466,6 +466,7 @@ static void ppc_core99_init(MachineState *machine)
 /* No PCI init: the BIOS will do it */
 
 fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch);
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 4479487..56282c5 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -319,6 +319,7 @@ static void ppc_heathrow_init(MachineState *machine)
 /* No PCI init: the BIOS will do it */
 
 fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW);
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 6224288..f5b6efd 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -1033,6 +1033,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
  hwdef->ecc_version);
 
 fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR + 2);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef->machine_id);
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index 271d8bc..4663315 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -855,6 +855,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
(uint8_t *)&nd_table[0].macaddr);
 
 fw_cfg = fw_cfg_init_io(BIOS_CFG_IOPORT);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
 fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus);
 fw_cfg

[Qemu-devel] [PULL 1/3] Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs"

2016-11-16 Thread Eduardo Habkost
From: Igor Mammedov 

This reverts commit 080ac219cc7d9c55adf925c3545b7450055ad625.

Legacy FW_CFG_NB_CPUS will be reused instead of the 'etc/boot-cpus'
fw_cfg file since it does the same thing and there is no point
in maintaining a duplicate guest ABI if it can be helped.

Signed-off-by: Igor Mammedov 
Message-Id: <1479212236-183810-2-git-send-email-imamm...@redhat.com>
Reviewed-by: Eduardo Habkost 
Signed-off-by: Eduardo Habkost 
---
 hw/i386/pc.c | 44 +++-
 include/hw/i386/pc.h |  2 --
 2 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index a9b1950..c227ead 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1086,6 +1086,17 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 }
 }
 
+static int pc_present_cpus_count(PCMachineState *pcms)
+{
+int i, boot_cpus = 0;
+for (i = 0; i < pcms->possible_cpus->len; i++) {
+if (pcms->possible_cpus->cpus[i].cpu) {
+boot_cpus++;
+}
+}
+return boot_cpus;
+}
+
 static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id,
   Error **errp)
 {
@@ -1222,19 +1233,6 @@ static void pc_build_feature_control_file(PCMachineState 
*pcms)
 fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, 
sizeof(*val));
 }
 
-static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
-{
-if (cpus_count > 0xff) {
-/* If the number of CPUs can't be represented in 8 bits, the
- * BIOS must use "etc/boot-cpus". Set RTC field to 0 just
- * to make old BIOSes fail more predictably.
- */
-rtc_set_memory(rtc, 0x5f, 0);
-} else {
-rtc_set_memory(rtc, 0x5f, cpus_count - 1);
-}
-}
-
 static
 void pc_machine_done(Notifier *notifier, void *data)
 {
@@ -1243,7 +1241,7 @@ void pc_machine_done(Notifier *notifier, void *data)
 PCIBus *bus = pcms->bus;
 
 /* set the number of CPUs */
-rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le));
+rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1);
 
 if (bus) {
 int extra_hosts = 0;
@@ -1264,15 +1262,8 @@ void pc_machine_done(Notifier *notifier, void *data)
 
 acpi_setup();
 if (pcms->fw_cfg) {
-MachineClass *mc = MACHINE_GET_CLASS(pcms);
-
 pc_build_smbios(pcms->fw_cfg);
 pc_build_feature_control_file(pcms);
-
-if (mc->max_cpus > 255) {
-fw_cfg_add_file(pcms->fw_cfg, "etc/boot-cpus", &pcms->boot_cpus_le,
-sizeof(pcms->boot_cpus_le));
-}
 }
 
 if (pcms->apic_id_limit > 255) {
@@ -1819,11 +1810,9 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
 }
 }
 
-/* increment the number of CPUs */
-pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) + 1);
 if (dev->hotplugged) {
-/* Update the number of CPUs in CMOS */
-rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le));
+/* increment the number of CPUs */
+rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
 }
 
 found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
@@ -1877,10 +1866,7 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
 found_cpu->cpu = NULL;
 object_unparent(OBJECT(dev));
 
-/* decrement the number of CPUs */
-pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) - 1);
-/* Update the number of CPUs in CMOS */
-rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le));
+rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1);
  out:
 error_propagate(errp, local_err);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 8eb517f..e32e957 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -36,7 +36,6 @@
 /**
  * PCMachineState:
  * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
- * @boot_cpus_le: number of present VCPUs, referenced by 'etc/boot-cpus' fw_cfg
  */
 struct PCMachineState {
 /*< private >*/
@@ -71,7 +70,6 @@ struct PCMachineState {
 bool apic_xrupt_override;
 unsigned apic_id_limit;
 CPUArchIdList *possible_cpus;
-uint16_t boot_cpus_le;
 
 /* NUMA information: */
 uint64_t numa_nodes;
-- 
2.7.4




Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Greg Kurz
On Wed, 16 Nov 2016 14:17:47 +0100
Thomas Huth  wrote:

> On 16.11.2016 13:37, Greg Kurz wrote:
> > On Wed, 16 Nov 2016 12:24:50 +
> > "Dr. David Alan Gilbert"  wrote:
> >   
> >> * Greg Kurz (gr...@kaod.org) wrote:  
> >>> On Wed, 16 Nov 2016 09:39:31 +0100
> >>> Thomas Huth  wrote:
> >>> 
>  The ppc64 postcopy test does not work with KVM-PR, and it is also
>  causing annoying warning messages when run on a x86 host. So let's
>  use KVM here only if we know that we're running with KVM-HV (which
>  automatically also means that we're running on a ppc64 host), and
>  fall back to TCG otherwise.
>  
> >>>
> >>> This patch addresses two issues actually:
> >>> - the annoying warning when running a ppc64 guest on a non-ppc64 host
> >>> - the fact that KVM-PR seems to be currently broken
> >>>
> >>> I agree that the former makes sense, but what about the case of running
> >>> a x86 guest on a non-x86 host ?  
> 
> Of course you also get these '"kvm" accelerator not found' messages
> there. But so far, I think nobody complained about that yet (only for
> ppc64 running on x86). And at least the test succeeds there - unlike
> with KVM-PR, where the test fails completely.
> 
> >>> I'm still feeling uncomfortable with the KVM-PR case... is this a 
> >>> workaround
> >>> we want to keep until we find out what's going on or are we starting to
> >>> partially deprecate KVM PR ? In any case, I guess we should document this
> >>> and probably print some meaningful error message.
> >>
> >> This is certainly a work around for now, it doesn't suggest anything about
> >> deprecation.  
> > 
> > Well it doesn't suggest anything actually, it just silently skips KVM PR...
> > I would at least expect a comment in the code mentioning this is a
> > workaround and maybe an explicit warning for the user. If the user really
> > wants to run this test with KVM on ppc64, then she should ensure it is
> > KVM HV.  
> 
> Honestly, also considering the number of patches that Laurent already
> wrote here that have never been accepted, all this has become quite an
> ugly bike-shed painting discussion.
> 

Understood. I'm done with the trivial details ;)

> My opinion:
> 
> - If we want to properly test KVM (be it KVM-HV or KVM-PR), write
>   a proper kvm-unit-test instead. I.e. I personally don't care if this
>   test in QEMU is only run with TCG or with KVM.
> 

Agreed.

> - The current status of "make check" is broken, since it does not
>   work on KVM-PR. We've got to fix that before the release.
> 
> That means I currently really don't care if we spill out a warning
> message for KVM-PR here or not - sure, somebody just got to look at
> KVM-PR later, but that's IMHO off-topic for the test here in the QEMU
> context.
> 
> So if you think that the patch for fixing this issue here with the QEMU
> test should look differently, please propose a different patch instead.
> I'm fine with every other approach as long as we get this fixed in time
> for QEMU 2.8.
> 

The changes to the code look ok and I prefer to spend time chasing the
KVM PR issue rather than arguing on a comment...

Cheers.

--
Greg

>  Thomas
> 




[Qemu-devel] [PULL 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

2016-11-16 Thread Eduardo Habkost
From: Igor Mammedov 

Signed-off-by: Igor Mammedov 
Message-Id: <1479301481-197333-1-git-send-email-imamm...@redhat.com>
Reviewed-by: Eduardo Habkost 
Signed-off-by: Eduardo Habkost 
---
 hw/i386/pc.c | 44 +++-
 include/hw/i386/pc.h |  2 ++
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index e8757b4..a9e64a8 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -744,7 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, 
PCMachineState *pcms)
 int i, j;
 
 fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as);
-fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 
 /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
  *
@@ -1087,17 +1087,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int 
level)
 }
 }
 
-static int pc_present_cpus_count(PCMachineState *pcms)
-{
-int i, boot_cpus = 0;
-for (i = 0; i < pcms->possible_cpus->len; i++) {
-if (pcms->possible_cpus->cpus[i].cpu) {
-boot_cpus++;
-}
-}
-return boot_cpus;
-}
-
 static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id,
   Error **errp)
 {
@@ -1234,6 +1223,19 @@ static void pc_build_feature_control_file(PCMachineState 
*pcms)
 fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, 
sizeof(*val));
 }
 
+static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count)
+{
+if (cpus_count > 0xff) {
+/* If the number of CPUs can't be represented in 8 bits, the
+ * BIOS must use "FW_CFG_NB_CPUS". Set RTC field to 0 just
+ * to make old BIOSes fail more predictably.
+ */
+rtc_set_memory(rtc, 0x5f, 0);
+} else {
+rtc_set_memory(rtc, 0x5f, cpus_count - 1);
+}
+}
+
 static
 void pc_machine_done(Notifier *notifier, void *data)
 {
@@ -1242,7 +1244,7 @@ void pc_machine_done(Notifier *notifier, void *data)
 PCIBus *bus = pcms->bus;
 
 /* set the number of CPUs */
-rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1);
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
 
 if (bus) {
 int extra_hosts = 0;
@@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data)
 if (pcms->fw_cfg) {
 pc_build_smbios(pcms->fw_cfg);
 pc_build_feature_control_file(pcms);
+/* update FW_CFG_NB_CPUS to account for -device added CPUs */
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 }
 
 if (pcms->apic_id_limit > 255) {
@@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms)
 assert(MACHINE(pcms)->kernel_filename != NULL);
 
 fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE);
-fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 rom_set_fw(fw_cfg);
 
 load_linux(pcms, fw_cfg);
@@ -1812,9 +1816,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev,
 }
 }
 
+/* increment the number of CPUs */
+pcms->boot_cpus++;
 if (dev->hotplugged) {
-/* increment the number of CPUs */
-rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1);
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
 }
 
 found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL);
@@ -1868,7 +1874,11 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev,
 found_cpu->cpu = NULL;
 object_unparent(OBJECT(dev));
 
-rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1);
+/* decrement the number of CPUs */
+pcms->boot_cpus--;
+/* Update the number of CPUs in CMOS */
+rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus);
+fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus);
  out:
 error_propagate(errp, local_err);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index e32e957..67a1a9e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -36,6 +36,7 @@
 /**
  * PCMachineState:
  * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
+ * @boot_cpus: number of present VCPUs
  */
 struct PCMachineState {
 /*< private >*/
@@ -70,6 +71,7 @@ struct PCMachineState {
 bool apic_xrupt_override;
 unsigned apic_id_limit;
 CPUArchIdList *possible_cpus;
+uint16_t boot_cpus;
 
 /* NUMA information: */
 uint64_t numa_nodes;
-- 
2.7.4
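
The CMOS encoding used by rtc_set_cpus_count() in the patch above can be illustrated standalone. This is a minimal sketch of the same arithmetic; cmos_cpus_field() is a hypothetical name, and the count is assumed to be at least 1.

```c
#include <stdint.h>

/*
 * Standalone model of the value written to CMOS offset 0x5f:
 * the register holds (number of CPUs - 1) in 8 bits. Counts above
 * 255 cannot be represented, so 0 is stored and the firmware is
 * expected to read FW_CFG_NB_CPUS instead.
 * Assumes cpus_count >= 1.
 */
static uint8_t cmos_cpus_field(uint16_t cpus_count)
{
    if (cpus_count > 0xff) {
        return 0;
    }
    return (uint8_t)(cpus_count - 1);
}
```

Note that a single CPU and any count above 255 both yield 0 in this field, so the CMOS value alone is ambiguous for large guests.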




[Qemu-devel] [PULL 0/3] pc: remove redundant fw_cfg file "etc/boot-cpus"

2016-11-16 Thread Eduardo Habkost
Unfortunately not in time for -rc0, but we still want to remove
"etc/boot-cpus" before 2.8.0 is released.

The following changes since commit b0bcc86d2a87456f5a276f941dc775b265b309cf:

  Update version for v2.8.0-rc0 release (2016-11-15 20:55:12 +)

are available in the git repository at:

  git://github.com/ehabkost/qemu.git tags/machine-pull-request

for you to fetch changes up to e3cadac073a99489df1627be56c3f487f5cb9e31:

  pc: fix FW_CFG_NB_CPUS to account for -device added CPUs (2016-11-16 12:10:00 
-0200)


pc: remove redundant fw_cfg file "etc/boot-cpus"



Igor Mammedov (3):
  Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than
255 CPUs"
  fw_cfg: move FW_CFG_NB_CPUS out of fw_cfg_init1()
  pc: fix FW_CFG_NB_CPUS to account for -device added CPUs

 hw/arm/virt.c |  4 +++-
 hw/i386/pc.c  | 26 --
 hw/nvram/fw_cfg.c |  1 -
 hw/ppc/mac_newworld.c |  1 +
 hw/ppc/mac_oldworld.c |  1 +
 hw/sparc/sun4m.c  |  1 +
 hw/sparc64/sun4u.c|  1 +
 include/hw/i386/pc.h  |  4 ++--
 8 files changed, 21 insertions(+), 18 deletions(-)

-- 
2.7.4




Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV

2016-11-16 Thread Greg Kurz
On Wed, 16 Nov 2016 09:39:31 +0100
Thomas Huth  wrote:

> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> fall back to TCG otherwise.
> 
> Signed-off-by: Thomas Huth 
> ---

FWIW

Reviewed-by: Greg Kurz 

>  tests/postcopy-test.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c
> index d6613c5..dafe8be 100644
> --- a/tests/postcopy-test.c
> +++ b/tests/postcopy-test.c
> @@ -380,17 +380,21 @@ static void test_migrate(void)
>" -incoming %s",
>tmpfs, bootpath, uri);
>  } else if (strcmp(arch, "ppc64") == 0) {
> +const char *accel;
> +
> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */
> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg";
>  init_bootfile_ppc(bootpath);
> -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +cmd_src = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcsource,debug-threads=on"
>" -serial file:%s/src_serial"
>" -drive file=%s,if=pflash,format=raw",
> -  tmpfs, bootpath);
> -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M"
> +  accel, tmpfs, bootpath);
> +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M"
>" -name pcdest,debug-threads=on"
>" -serial file:%s/dest_serial"
>" -incoming %s",
> -  tmpfs, uri);
> +  accel, tmpfs, uri);
>  } else {
>  g_assert_not_reached();
>  }
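
The accelerator selection in the patch can be sketched in isolation as follows. pick_ppc64_accel() is a hypothetical helper; the path is a parameter here only so that the logic can be exercised with arbitrary paths, whereas the patch hardcodes "/sys/module/kvm_hv".

```c
#include <unistd.h>

/*
 * Pick the accelerator list for the ppc64 test.
 * access() returns 0 when the path exists, so a loaded kvm_hv module
 * selects "kvm:tcg" and anything else falls back to plain "tcg".
 */
static const char *pick_ppc64_accel(const char *module_path)
{
    return access(module_path, F_OK) == 0 ? "kvm:tcg" : "tcg";
}
```

On a host without KVM-HV (for example x86, or ppc64 with only KVM-PR loaded) the module directory is absent, so the test runs under TCG and the '"kvm" accelerator not found' warning is never triggered.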




[Qemu-devel] QMP event on reboot when -no-reboot is set

2016-11-16 Thread Dirk Braunschweiger

Hey Guys,

I want to get a QMP event when QEMU shuts down due to the
-no-reboot flag. Looking at the code, I realized that the -no-reboot flag
just changes any reset request into a shutdown request.
Has anybody already patched QEMU to emit some kind of reboot event on
the QMP socket?


If no one has patched it yet, would you accept such a patch? Or is it an
unwanted feature?
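
A simplified model of the behaviour described above, and of the extra piece of state such an event would need. This is only a sketch: the struct, its fields and request_reset() are hypothetical names and do not correspond to QEMU's actual code.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the pending-request flags. */
struct vm_requests {
    bool reset_requested;
    bool shutdown_requested;
    /* Hypothetical: remembers that the shutdown started as a reset,
     * which is the information a new QMP event could report. */
    bool shutdown_was_reboot;
};

/*
 * With no_reboot set, a reset request is turned into a shutdown
 * request; otherwise it stays a reset request.
 */
static void request_reset(struct vm_requests *r, bool no_reboot)
{
    if (no_reboot) {
        r->shutdown_requested = true;
        r->shutdown_was_reboot = true;
    } else {
        r->reset_requested = true;
    }
}
```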


Best regards,
Dirk Braunschweiger



[Qemu-devel] [PATCH v2] HACKING: document #include order

2016-11-16 Thread Stefan Hajnoczi
It was not obvious to me why "qemu/osdep.h" must be the first #include.
This documents the rationale and the overall #include order.

Cc: Fam Zheng 
Cc: Markus Armbruster 
Cc: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 HACKING | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/HACKING b/HACKING
index 20a9101..4125c97 100644
--- a/HACKING
+++ b/HACKING
@@ -1,10 +1,28 @@
 1. Preprocessor
 
+1.1. Variadic macros
+
 For variadic macros, stick with this C99-like syntax:
 
 #define DPRINTF(fmt, ...)   \
 do { printf("IRQ: " fmt, ## __VA_ARGS__); } while (0)
 
+1.2. Include directives
+
+Order include directives as follows:
+
+#include "qemu/osdep.h"  /* Always first... */
+#include <...>   /* then system headers... */
+#include "..."   /* and finally QEMU headers. */
+
+The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
+of core system headers like .  It must be the first include so that
+core system headers included by external libraries get the preprocessor macros
+that QEMU depends on.
+
+Do not include "qemu/osdep.h" from header files since the .c file will have
+already included it.
+
 2. C types
 
 It should be common sense to use the right type, but we have collected
-- 
2.7.4




Re: [Qemu-devel] [PATCH v2] HACKING: document #include order

2016-11-16 Thread Eric Blake
On 11/16/2016 08:39 AM, Stefan Hajnoczi wrote:
> It was not obvious to me why "qemu/osdep.h" must be the first #include.
> This documents the rationale and the overall #include order.
> 
> Cc: Fam Zheng 
> Cc: Markus Armbruster 
> Cc: Eric Blake 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  HACKING | 18 ++
>  1 file changed, 18 insertions(+)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v13 02/22] vfio: VFIO based driver for Mediated devices

2016-11-16 Thread Kirti Wankhede


On 11/16/2016 7:59 AM, Dong Jia Shi wrote:
> * Kirti Wankhede  [2016-11-15 20:59:45 +0530]:
> 
> Hi Kirti,
> 
>> vfio_mdev driver registers with mdev core driver.
>> mdev core driver creates mediated device and calls probe routine of
>> vfio_mdev driver for each device.
>> Probe routine of vfio_mdev driver adds mediated device to VFIO core module
>>
>> This driver forms a shim layer that passes VFIO device operations
>> through to the vendor driver for mediated devices.
>>
>> Signed-off-by: Kirti Wankhede 
>> Signed-off-by: Neo Jia 
>> Reviewed-by: Jike Song 
>>
>> Change-Id: I583f4734752971d3d112324d69e2508c88f359ec
>> ---
>>  drivers/vfio/mdev/Kconfig |   7 ++
>>  drivers/vfio/mdev/Makefile|   1 +
>>  drivers/vfio/mdev/mdev_core.c |  16 -
>>  drivers/vfio/mdev/vfio_mdev.c | 148 
>> ++
>>  4 files changed, 171 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>>
>> diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
>> index 258481d65ebd..1aa0391d74f2 100644
>> --- a/drivers/vfio/mdev/Kconfig
>> +++ b/drivers/vfio/mdev/Kconfig
>> @@ -7,3 +7,10 @@ config VFIO_MDEV
>>Provides a framework to virtualize devices.
>>
>>If you don't know what do here, say N.
>> +
>> +config VFIO_MDEV_DEVICE
>> +tristate "VFIO support for Mediated devices"
>
> 
>> +depends on VFIO && VFIO_MDEV
>> +default n
>> +help
>> +  VFIO based driver for mediated devices.
> 
> nit:
> s/mediated/Mediated/
> 
> I saw in many places you use the term "Mediated device", so I guess this
> is what you preferred to name them.
> 
>> diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
>> index 31bc04801d94..fa2d5ea466ee 100644
>> --- a/drivers/vfio/mdev/Makefile
>> +++ b/drivers/vfio/mdev/Makefile
>> @@ -2,3 +2,4 @@
>>  mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
>>
>>  obj-$(CONFIG_VFIO_MDEV) += mdev.o
>> +obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o
>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>> index 613e8a8a3b2a..1e0714ebc56a 100644
>> --- a/drivers/vfio/mdev/mdev_core.c
>> +++ b/drivers/vfio/mdev/mdev_core.c
>> @@ -354,7 +354,21 @@ int mdev_device_remove(struct device *dev, bool 
>> force_remove)
>>
>>  static int __init mdev_init(void)
>>  {
>> -return mdev_bus_register();
>> +int ret;
>> +
>> +ret = mdev_bus_register();
>> +if (ret) {
>> +pr_err("Failed to register mdev bus\n");
> If you want to report an error message here, you should do it in a
> previous patch where you introduce the call for mdev_bus_register.
> 

Removing this error message.

>> +return ret;
>> +}
>> +
>> +/*
>> + * Attempt to load known vfio_mdev.  This gives us a working environment
>> + * without the user needing to explicitly load vfio_mdev driver.
>> + */
>> +request_module_nowait("vfio_mdev");
>> +
>> +return ret;
>>  }
>>
>>  static void __exit mdev_exit(void)
> [...]
> 
> Please:
> Reviewed-by: Dong Jia Shi 
> 

Thanks.




Re: [Qemu-devel] [PATCH v13 09/22] vfio iommu type1: Add task structure to vfio_dma

2016-11-16 Thread Kirti Wankhede


On 11/16/2016 11:36 AM, Dong Jia Shi wrote:
> * Kirti Wankhede  [2016-11-15 20:59:52 +0530]:
> 
> Hi Kirti,
> 
> [...]
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> 
>> @@ -331,13 +338,16 @@ static long vfio_pin_pages_remote(unsigned long vaddr, 
>> long npage,
>>  }
>>
>>  if (!rsvd)
>> -vfio_lock_acct(current, i);
>> +vfio_lock_acct(dma->task, i);
>> +ret = i;
>>
>> -return i;
>> +pin_pg_remote_exit:
> out_mmput sounds a better name to me.
> 
>> +mmput(mm);
>> +return ret;
>>  }
>>
> [...]
> 
>> @@ -510,6 +521,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>  while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) {
>>  if (!iommu->v2 && unmap->iova > dma->iova)
>>  break;
>> +/*
>> + * Task with same address space who mapped this iova range is
>> + * allowed to unmap the iova range.
>> + */
>> +if (dma->task->mm != current->mm)
> How about:
>   if (dma->task != current)
> 

As I mentioned in the comment above this line and in the commit
description, if a process calls DMA_MAP, forks a thread, and the child
thread then calls DMA_UNMAP, this should be allowed since the address
space is the same for the parent process and the child. QEMU also works
that way.
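
The difference between the two checks can be modelled in userspace. This is a minimal sketch with hypothetical stand-in types, not the kernel's task_struct or mm_struct.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct mm {
    int id;
};

struct task {
    struct mm *mm;   /* threads of one process share this pointer */
};

/* Allow the unmap when the caller shares the mapper's address space,
 * even if the caller is a different task than the one that mapped. */
static bool may_unmap(const struct task *mapper, const struct task *caller)
{
    return mapper->mm == caller->mm;
}
```

Comparing the task pointers instead would reject the thread case, because the parent and the child thread are distinct tasks that only share the mm.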

>> +break;
>>  unmapped += dma->size;
>>  vfio_remove_dma(iommu, dma);
>>  }
>> @@ -576,17 +593,55 @@ unwind:
>>  return ret;
>>  }
>>
>> +static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
>> +size_t map_size)
> Do you factor out this function for future usage?
> I didn't find the other callers.
>

This is pulled out to make the caller simple and short. Otherwise
vfio_dma_do_map() would have become a long function.


>> +{
>> +dma_addr_t iova = dma->iova;
>> +unsigned long vaddr = dma->vaddr;
>> +size_t size = map_size;
>> +long npage;
>> +unsigned long pfn;
>> +int ret = 0;
>> +
>> +while (size) {
>> +/* Pin a contiguous chunk of memory */
>> +npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
>> +  size >> PAGE_SHIFT, dma->prot,
>> +  &pfn);
>> +if (npage <= 0) {
>> +WARN_ON(!npage);
>> +ret = (int)npage;
>> +break;
>> +}
>> +
>> +/* Map it! */
>> +ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
>> + dma->prot);
>> +if (ret) {
>> +vfio_unpin_pages_remote(dma, pfn, npage,
>> + dma->prot, true);
>> +break;
>> +}
>> +
>> +size -= npage << PAGE_SHIFT;
>> +dma->size += npage << PAGE_SHIFT;
>> +}
>> +
>> +if (ret)
>> +vfio_remove_dma(iommu, dma);
>> +
>> +return ret;
>> +}
>> +
>>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
>> struct vfio_iommu_type1_dma_map *map)
>>  {
>>  dma_addr_t iova = map->iova;
>>  unsigned long vaddr = map->vaddr;
>>  size_t size = map->size;
>> -long npage;
>>  int ret = 0, prot = 0;
>>  uint64_t mask;
>>  struct vfio_dma *dma;
>> -unsigned long pfn;
>>
>>  /* Verify that none of our __u64 fields overflow */
>>  if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>> @@ -612,47 +667,27 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  mutex_lock(&iommu->lock);
>>
>>  if (vfio_find_dma(iommu, iova, size)) {
>> -mutex_unlock(&iommu->lock);
>> -return -EEXIST;
>> +ret = -EEXIST;
>> +goto do_map_err;
>>  }
>>
>>  dma = kzalloc(sizeof(*dma), GFP_KERNEL);
>>  if (!dma) {
>> -mutex_unlock(&iommu->lock);
>> -return -ENOMEM;
>> +ret = -ENOMEM;
>> +goto do_map_err;
>>  }
>>
>>  dma->iova = iova;
>>  dma->vaddr = vaddr;
>>  dma->prot = prot;
>> +get_task_struct(current);
>> +dma->task = current;
>>
>>  /* Insert zero-sized and grow as we map chunks of it */
>>  vfio_link_dma(iommu, dma);
>>
>> -while (size) {
>> -/* Pin a contiguous chunk of memory */
>> -npage = vfio_pin_pages_remote(vaddr + dma->size,
>> -  size >> PAGE_SHIFT, prot, &pfn);
>> -if (npage <= 0) {
>> -WARN_ON(!npage);
>> -ret = (int)npage;
>> -break;
>> -}
>> -
>> -/* Map it! */
>> -ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage, prot);
>> -if (ret) {
>> -vfio_unpin_pages_remote(pfn, npage, prot, true);
>> -  

Re: [Qemu-devel] [PATCH v13 05/22] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops

2016-11-16 Thread Kirti Wankhede


On 11/16/2016 8:33 AM, Dong Jia Shi wrote:
> * Kirti Wankhede  [2016-11-15 20:59:48 +0530]:
> 
> Hi Kirti,
> 
>> Added APIs for pinning and unpinning a set of pages. These call back into
>> backend iommu module to actually pin and unpin pages.
>> Added two new callback functions to struct vfio_iommu_driver_ops. Backend
>> IOMMU module that supports pinning and unpinning pages for mdev devices
>> should provide these functions.
>>
>> Renamed static functions in vfio_type1_iommu.c to resolve conflicts
>>
>> Signed-off-by: Kirti Wankhede 
>> Signed-off-by: Neo Jia 
>> Change-Id: Ia7417723aaae86bec2959ad9ae6c2915ddd340e0
>> ---
>>  drivers/vfio/vfio.c | 103 
>> 
>>  drivers/vfio/vfio_iommu_type1.c |  20 
>>  include/linux/vfio.h|  14 +-
>>  3 files changed, 126 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index 2e83bdf007fe..3bf8a01bf67b 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -1799,6 +1799,109 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, 
>> size_t offset)
>>  }
>>  EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
>>
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for local
>> + * domain only.
>> + * @dev [in] : device
>> + * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of 
>> user/guest
>> + *PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
> Move the second sentence to the @npage section?
> 
>> + * @npage [in] :count of elements in array.  This count should not be 
>> greater
>> + *  than PAGE_SIZE.
> And remove the second sentence here.
> 
>> + * @prot [in] : protection flags
>> + * @phys_pfn[out] : array of host PFNs
> nit:
> I saw three different styles here:
>  @xxx [in] :xxx
>  @xxx [in]: xxx
>  @xxx[out]: xxx
> 
> Frankly speaking, I don't think the [in|out] flags help much.
> 
>> + * Return error or number of pages pinned.
>> + */
>> +int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
>> +   int prot, unsigned long *phys_pfn)
>> +{
>> +struct vfio_container *container;
>> +struct vfio_group *group;
>> +struct vfio_iommu_driver *driver;
>> +int ret;
>> +
>> +if (!dev || !user_pfn || !phys_pfn || !npage)
>> +return -EINVAL;
>> +
>> +if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
>> +return -E2BIG;
>> +
>> +group = vfio_group_get_from_dev(dev);
>> +if (IS_ERR(group))
>> +return PTR_ERR(group);
>> +
>> +ret = vfio_group_add_container_user(group);
>> +if (ret)
>> +goto err_pin_pages;
>> +
>> +container = group->container;
>> +down_read(&container->group_lock);
>> +
>> +driver = container->iommu_driver;
>> +if (likely(driver && driver->ops->pin_pages))
>> +ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
>> + npage, prot, phys_pfn);
>> +else
>> +ret = -ENOTTY;
>> +
>> +up_read(&container->group_lock);
>> +vfio_group_try_dissolve_container(group);
>> +
>> +err_pin_pages:
>> +vfio_group_put(group);
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +
>> +/*
>> + * Unpin set of host PFNs for local domain only.
>> + * @dev [in] : device
>> + * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of 
>> user/guest
>> + *PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
>> + * @npage [in] :count of elements in array.  This count should not be 
>> greater
>> + *  than PAGE_SIZE.
> Same nits as above here.
> 
>> + * Return error or number of pages unpinned.
>> + */
> [...]
> 
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 0ecae0b1cd34..420cdc928786 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -75,7 +75,11 @@ struct vfio_iommu_driver_ops {
>>  struct iommu_group *group);
>>  void(*detach_group)(void *iommu_data,
>>  struct iommu_group *group);
>> -
>> +int (*pin_pages)(void *iommu_data, unsigned long *user_pfn,
>> + int npage, int prot,
>> + unsigned long *phys_pfn);
>> +int (*unpin_pages)(void *iommu_data,
>> +   unsigned long *user_pfn, int npage);
>>  };
>>
>>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops 
>> *ops);
>> @@ -127,6 +131,14 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
>> iommu_group *group,
>>  }
>>  #endif /* CONFIG_EEH */
>>
>> +#define VFIO_PIN_PAGES_MAX_ENTRIES  (PAGE_SIZE/sizeof(unsigned long))
>> +
>> +extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
>> +  int npage, int prot, unsigned long *phys_pfn);
>> +
>> +extern int vfio_unp

Re: [Qemu-devel] [PATCH v13 12/22] vfio: Add notifier callback to parent's ops structure of mdev

2016-11-16 Thread Kirti Wankhede


On 11/16/2016 12:07 PM, Dong Jia Shi wrote:
> * Kirti Wankhede  [2016-11-15 20:59:55 +0530]:
> 
> Hi Kirti,
> 
> [...]
> 
>> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
>> index ffc36758cb84..4fc63db38829 100644
>> --- a/drivers/vfio/mdev/vfio_mdev.c
>> +++ b/drivers/vfio/mdev/vfio_mdev.c
>> @@ -24,6 +24,15 @@
>>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>>  #define DRIVER_DESC "VFIO based driver for Mediated device"
>>
>> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long 
>> action,
>> +  void *data)
>> +{
>> +struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
>> +struct parent_device *parent = mdev->parent;
>> +
>> +return parent->ops->notifier(mdev, action, data);
>> +}
>> +
>>  static int vfio_mdev_open(void *device_data)
>>  {
>>  struct mdev_device *mdev = device_data;
>> @@ -36,9 +45,18 @@ static int vfio_mdev_open(void *device_data)
>>  if (!try_module_get(THIS_MODULE))
>>  return -ENODEV;
>>
>> +if (likely(parent->ops->notifier)) {
>> +mdev->nb.notifier_call = vfio_mdev_notifier;
>> +if (vfio_register_notifier(&mdev->dev, &mdev->nb))
>> +pr_err("Failed to register notifier for mdev\n");
> I think we should just return here if the error value is not -ENOTTY.
> 

The iommu backend module might not support .register_notifier(). In that
case vfio_register_notifier() returns -ENOTTY, and that should not fail
this open() call. Changing it to:

    ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
    if (ret && (ret != -ENOTTY)) {
        pr_err("Failed to register notifier for mdev\n");
        module_put(THIS_MODULE);
        return ret;
    }

Thanks,
Kirti
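
The error handling described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct backend_ops`, `register_notifier()`, `device_open()` and the two hook functions are hypothetical stand-ins, not the VFIO API. The point is the rule from the reply: a missing hook yields -ENOTTY and is tolerated, any other error fails the open.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an IOMMU backend's ops table. */
struct backend_ops {
    int (*register_notifier)(void *data);   /* optional, may be NULL */
};

/* Mirrors the described behaviour: -ENOTTY when the hook is missing. */
static int register_notifier(const struct backend_ops *ops, void *data)
{
    if (!ops->register_notifier)
        return -ENOTTY;
    return ops->register_notifier(data);
}

/* Open succeeds when registration works or is simply unsupported. */
static int device_open(const struct backend_ops *ops, void *data)
{
    int ret;

    assert(ops != NULL);
    ret = register_notifier(ops, data);
    if (ret && ret != -ENOTTY)
        return ret;     /* real failure: propagate to the caller */
    return 0;
}

/* Hypothetical hooks used to exercise both paths. */
static int hook_ok(void *data)   { (void)data; return 0; }
static int hook_fail(void *data) { (void)data; return -ENOMEM; }
```

In the real driver the failure path would also need to undo earlier steps (the reply shows module_put()), which the sketch leaves out.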




Re: [Qemu-devel] [PATCH v13 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP

2016-11-16 Thread Kirti Wankhede


On 11/16/2016 10:06 AM, Alex Williamson wrote:
> On Wed, 16 Nov 2016 09:46:20 +0530
> Kirti Wankhede  wrote:
> 
>> On 11/16/2016 9:28 AM, Alex Williamson wrote:
>>> On Wed, 16 Nov 2016 09:13:37 +0530
>>> Kirti Wankhede  wrote:
>>>   
>>>> On 11/16/2016 8:55 AM, Alex Williamson wrote:
>>>>> On Tue, 15 Nov 2016 20:16:12 -0700
>>>>> Alex Williamson  wrote:
>>>>>
>>>>>> On Wed, 16 Nov 2016 08:16:15 +0530
>>>>>> Kirti Wankhede  wrote:
>>>>>>
>>>>>>> On 11/16/2016 3:49 AM, Alex Williamson wrote:
>>>>>>>> On Tue, 15 Nov 2016 20:59:54 +0530
>>>>>>>> Kirti Wankhede  wrote:
>>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> @@ -854,7 +857,28 @@ static int vfio_dma_do_unmap(struct vfio_iommu
>>>>>>>>> *iommu,
>>>>>>>>>*/
>>>>>>>>>   if (dma->task->mm != current->mm)
>>>>>>>>>   break;
>>>>>>>>> +
>>>>>>>>>   unmapped += dma->size;
>>>>>>>>> +
>>>>>>>>> + if (iommu->external_domain &&
>>>>>>>>> !RB_EMPTY_ROOT(&dma->pfn_list)) {
>>>>>>>>> + struct vfio_iommu_type1_dma_unmap nb_unmap;
>>>>>>>>> +
>>>>>>>>> + nb_unmap.iova = dma->iova;
>>>>>>>>> + nb_unmap.size = dma->size;
>>>>>>>>> +
>>>>>>>>> + /*
>>>>>>>>> +  * Notifier callback would call
>>>>>>>>> vfio_unpin_pages() which
>>>>>>>>> +  * would acquire iommu->lock. Release lock here
>>>>>>>>> and
>>>>>>>>> +  * reacquire it again.
>>>>>>>>> +  */
>>>>>>>>> + mutex_unlock(&iommu->lock);
>>>>>>>>> + blocking_notifier_call_chain(&iommu->notifier,
>>>>>>>>> +
>>>>>>>>> VFIO_IOMMU_NOTIFY_DMA_UNMAP,
>>>>>>>>> + &nb_unmap);
>>>>>>>>> + mutex_lock(&iommu->lock);
>>>>>>>>> + if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
>>>>>>>>> + break;
>>>>>>>>> + }
>>>>>>>>
>>>>>>>>
>>>>>>>> Why exactly do we need to notify per vfio_dma rather than per unmap
>>>>>>>> request?  If we do the latter we can send the notify first, limiting us
>>>>>>>> to races where a page is pinned between the notify and the locking,
>>>>>>>> whereas here, even our dma pointer is suspect once we re-acquire the
>>>>>>>> lock, we don't technically know if another unmap could have removed
>>>>>>>> that already.  Perhaps something like this (untested):
>>>>>>>>
>>>>>>>
>>>>>>> There are checks to validate unmap request, like v2 check and who is
>>>>>>> calling unmap and is it allowed for that task to unmap. Before these
>>>>>>> checks its not sure that unmap region range which asked for would be
>>>>>>> unmapped all. Notify call should be at the place where its sure that the
>>>>>>> range provided to notify call is definitely going to be removed. My
>>>>>>> change do that.
>>>>>>
>>>>>> Ok, but that does solve the problem.  What about this (untested):
>>>>>
>>>>> s/does/does not/
>>>>>
>>>>> BTW, I like how the retries here fill the gap in my previous proposal
>>>>> where we could still race re-pinning.  We've given it an honest shot or
>>>>> someone is not participating if we've retried 10 times.  I don't
>>>>> understand why the test for iommu->external_domain was there, clearly
>>>>> if the list is not empty, we need to notify.  Thanks,
>>>>>
>>>>
>>>> Ok. Retry is good to give a chance to unpin all. But is it really
>>>> required to use BUG_ON() that would panic the host. I think WARN_ON
>>>> should be fine and then when container is closed or when the last group
>>>> is removed from the container, vfio_iommu_type1_release() is called and
>>>> we have a chance to unpin it all.
>>>
>>> See my comments on patch 10/22, we need to be vigilant that the vendor
>>> driver is participating.  I don't think we should be cleaning up after
>>> the vendor driver on release, if we need to do that, it implies we
>>> already have problems in multi-mdev containers since we'll be left with
>>> pfn_list entries that no longer have an owner.  Thanks,
>>>   
>>
>> If any vendor driver doesn't clean its pinned pages and there are
>> entries in pfn_list with no owner, that would be indicated by WARN_ON,
>> which should be fixed by that vendor driver. I still feel it shouldn't
>> cause host panic.
>> When such warning is seen with multiple mdev devices in container, it is
>> easy to isolate and find which vendor driver is not cleaning their
>> stuff, same warning would be seen with single mdev device in a
>> container. To isolate and find which vendor driver is culprit check with
>> one mdev device at a time.
>> Finally, we have a chance to clean all residue from
>> vfio_iommu_type1_release() so that vfio_iommu_type1 module doesn't leave
>> any leaks.
> 
> How can we claim that we've resolved anything by unpinning the
> residue?  In 

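The retry idea discussed in this thread (drop the lock, notify, retake the lock, give up after 10 attempts) can be sketched as a small user-space model. The proposed kernel code itself is not part of this excerpt, so everything below is an assumption for illustration: `struct mapping`, `notify_until_unpinned()` and the two callbacks are hypothetical, and a boolean flag stands in for iommu->lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RETRIES 10  /* retry count mentioned in the thread */

/* Hypothetical model of one mapping with outstanding pinned pages. */
struct mapping {
    int pinned;                         /* pages still pinned by vendor drivers */
    bool locked;                        /* stands in for iommu->lock */
    void (*notify)(struct mapping *m);  /* vendor callback, expected to unpin */
};

static void lock(struct mapping *m)   { assert(!m->locked); m->locked = true; }
static void unlock(struct mapping *m) { assert(m->locked);  m->locked = false; }

/*
 * Called with the lock held. Returns true once no pages remain pinned,
 * false if pages are still pinned after MAX_RETRIES notifications.
 * The lock is held again on return in both cases.
 */
static bool notify_until_unpinned(struct mapping *m)
{
    for (int tries = 0; m->pinned > 0; tries++) {
        if (tries == MAX_RETRIES)
            return false;   /* caller decides between WARN_ON and BUG_ON */
        unlock(m);          /* the callback takes the lock itself */
        m->notify(m);
        lock(m);
    }
    return true;
}

/* Well-behaved driver: releases everything it pinned. */
static void notify_unpin_all(struct mapping *m)
{
    lock(m);
    m->pinned = 0;
    unlock(m);
}

/* Non-participating driver: ignores the notification. */
static void notify_ignore(struct mapping *m) { (void)m; }
```

Whether the failure case should warn or panic is exactly the open question between the two participants; the sketch only reports it to the caller.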
Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-11-16 Thread Alex Williamson
On Wed, 16 Nov 2016 15:54:56 +0200
"Michael S. Tsirkin"  wrote:

> On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 21:20:36 +0200
> > "Michael S. Tsirkin"  wrote:
> >   
> > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:  
> > > > On Thu, 10 Nov 2016 17:54:35 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > > 
> > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > >   
> > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:  
> > > > > > > > From: "Aviv Ben-David" 
> > > > > > > > 
> > > > > > > > * Advertize Cache Mode capability in iommu cap register. 
> > > > > > > >   This capability is controlled by "cache-mode" property of 
> > > > > > > > intel-iommu device.
> > > > > > > >   To enable this option call QEMU with "-device 
> > > > > > > > intel-iommu,cache-mode=true".
> > > > > > > > 
> > > > > > > > * On page cache invalidation in intel vIOMMU, check if the 
> > > > > > > > domain belong to
> > > > > > > >   registered notifier, and notify accordingly.
> > > > > > > 
> > > > > > > This looks sane I think. Alex, care to comment?
> > > > > > > Merging will have to wait until after the release.
> > > > > > > Pls remember to re-test and re-ping then.  
> > > > > > 
> > > > > > I don't think it's suitable for upstream until there's a reasonable
> > > > > > replay mechanism  
> > > > > 
> > > > > Could you pls clarify what do you mean by replay?
> > > > > Is this when you attach a device by hotplug to
> > > > > a running system?
> > > > > 
> > > > > If yes this can maybe be addressed by disabling hotplug temporarily.  
> > > > >   
> > > > 
> > > > No, hotplug is not required, moving a device between existing domains
> > > > requires replay, ie. actually using it for nested device assignment.
> > > 
> > > Good point, that one is a correctness thing. Aviv,
> > > could you add this in TODO list in a cover letter pls?
> > >   
> > > > > > and we straighten out whether it's expected to get
> > > > > > multiple notifies and the notif-ee is responsible for filtering
> > > > > > them or if the notif-er should do filtering.  
> > > > > 
> > > > > OK this is a documentation thing.
> > > > 
> > > > Well no, it needs to be decided and if necessary implemented.
> > > 
> > > Let's assume it's the notif-ee for now. Less is more and all that.  
> > 
> > I think this is opposite of the approach dwg suggested.
> >
> > > > > >  Without those, this is
> > > > > > effectively just an RFC.  
> > > > > 
> > > > > It's infrastructure without users so it doesn't break things,
> > > > > I'm more interested in seeing whether it's broken in
> > > > > some way than whether it's complete.
> > > > 
> > > > If it allows use with vfio but doesn't fully implement the complete set
> > > > of interfaces, it does break things.  We currently prevent viommu usage
> > > > with vfio because it is incomplete.
> > > 
> > > Right - that bit is still in as far as I can see.  
> > 
> > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with
> > vfio even though it's still incomplete.  We would at least need
> > something like a replay callback for VT-d that triggers an abort if you
> > still want to accept it incomplete.  Thanks,
> > 
> > Alex  
> 
> IIUC practically things seems to work, right?

AFAIK, no.

> So how about disabling by default with a flag for people that want to
> experiment with it?
> E.g. x-vfio-allow-broken-translations ?

We've already been through one round of "intel-iommu is incomplete for
use with device assignment, how can we prevent it from being used",
which led to the notify_flag_changed callback on MemoryRegionIOMMUOps.
This series now claims to fix that yet still doesn't provide a
mechanism to do memory_region_iommu_replay() given that VT-d has a much
larger address width.  Why is the onus on vfio to resolve this or
provide some sort of workaround?  vfio is using the QEMU iommu
interface correctly, intel-iommu is still incomplete. The least it
could do is add an optional replay callback to MemoryRegionIOMMUOps
that supersedes the existing memory_region_iommu_replay() code and
triggers an abort when it gets called.  I don't know what an
x-vfio-allow-broken-translations option would do, how I'd implement it,
or why I'd bother to implement it.  Thanks,

Alex
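
A minimal sketch of the "optional replay callback that supersedes the generic code" idea follows. It is an assumption-laden illustration, not QEMU code: `struct iommu_ops`, `iommu_replay()`, `generic_replay()` and `replay_unsupported()` are hypothetical names, and the real MemoryRegionIOMMUOps has a different layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; not the real MemoryRegionIOMMUOps. */
struct iommu_ops {
    /* Optional: when set, replaces the generic replay walk. */
    int (*replay)(void *opaque);
};

static int generic_replay_calls;

/* Stand-in for the generic walk over the whole address space. */
static int generic_replay(void *opaque)
{
    (void)opaque;
    generic_replay_calls++;
    return 0;
}

/* Dispatch: a model-specific callback supersedes the generic code. */
static int iommu_replay(const struct iommu_ops *ops, void *opaque)
{
    assert(ops != NULL);
    if (ops->replay)
        return ops->replay(opaque);
    return generic_replay(opaque);
}

/*
 * What an incomplete model could install: refuse rather than misbehave.
 * The mail suggests triggering an abort; returning -1 keeps this sketch
 * testable.
 */
static int replay_unsupported(void *opaque)
{
    (void)opaque;
    return -1;
}
```

With this shape, the decision to refuse replay lives in the IOMMU model rather than in vfio, which is the division of responsibility the mail argues for.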



[Qemu-devel] [PATCH] translate-all: Enable locking debug in a debug build

2016-11-16 Thread Pranith Kumar
Unconditionally enable locking checks in debug builds so that we get
wider testing. Using tcg_debug_assert() allows us to remove
DEBUG_LOCKING define.

Signed-off-by: Pranith Kumar 
---
 translate-all.c | 50 +-
 1 file changed, 17 insertions(+), 33 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index cf828aa..a03f323 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -60,7 +60,6 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
-/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -75,23 +74,13 @@
  * access to the memory related structures are protected with the
  * mmap_lock.
  */
-#ifdef DEBUG_LOCKING
-#define DEBUG_MEM_LOCKS 1
-#else
-#define DEBUG_MEM_LOCKS 0
-#endif
-
 #ifdef CONFIG_SOFTMMU
 #define assert_memory_lock() do {   \
-if (DEBUG_MEM_LOCKS) {  \
-g_assert(have_tb_lock); \
-}   \
+tcg_debug_assert(have_tb_lock); \
 } while (0)
 #else
 #define assert_memory_lock() do {   \
-if (DEBUG_MEM_LOCKS) {  \
-g_assert(have_mmap_lock()); \
-}   \
+tcg_debug_assert(have_mmap_lock()); \
 } while (0)
 #endif
 
@@ -172,16 +161,24 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
+#define assert_tb_locked() do { \
+tcg_debug_assert(have_tb_lock); \
+} while (0)
+
+#define assert_tb_unlocked() do {   \
+tcg_debug_assert(!have_tb_lock);\
+} while (0)
+
 void tb_lock(void)
 {
-assert(!have_tb_lock);
+assert_tb_unlocked();
 qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock++;
 }
 
 void tb_unlock(void)
 {
-assert(have_tb_lock);
+assert_tb_locked();
 have_tb_lock--;
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 }
@@ -194,19 +191,6 @@ void tb_lock_reset(void)
 }
 }
 
-#ifdef DEBUG_LOCKING
-#define DEBUG_TB_LOCKS 1
-#else
-#define DEBUG_TB_LOCKS 0
-#endif
-
-#define assert_tb_lock() do {   \
-if (DEBUG_TB_LOCKS) {   \
-g_assert(have_tb_lock); \
-}   \
-} while (0)
-
-
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -840,7 +824,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
 TranslationBlock *tb;
 
-assert_tb_lock();
+assert_tb_locked();
 
 if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
 return NULL;
@@ -855,7 +839,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
-assert_tb_lock();
+assert_tb_locked();
 
 /* In practice this is mostly used for single use temporary TB
Ignore the hard cases and just back up if this TB happens to
@@ -1097,7 +1081,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 uint32_t h;
 tb_page_addr_t phys_pc;
 
-assert_tb_lock();
+assert_tb_locked();
 
 atomic_set(&tb->invalid, true);
 
@@ -1412,7 +1396,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t 
start, tb_page_addr_t end)
 #ifdef CONFIG_SOFTMMU
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
-assert_tb_lock();
+assert_tb_locked();
 tb_invalidate_phys_range_1(start, end);
 }
 #else
@@ -1455,7 +1439,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
 assert_memory_lock();
-assert_tb_lock();
+assert_tb_locked();
 
 p = page_find(start >> TARGET_PAGE_BITS);
 if (!p) {
-- 
2.10.2
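
For readers unfamiliar with the pattern, here is a stand-alone sketch of lock assertions that are only active in debug builds. It assumes a hypothetical build switch `SKETCH_DEBUG` and a hypothetical `debug_assert()` macro; the real tcg_debug_assert() may behave differently in non-debug builds, and the mutex calls are left as comments.

```c
#include <assert.h>

/* Hypothetical build switch standing in for a TCG debug configuration. */
#ifdef SKETCH_DEBUG
#define debug_assert(x) assert(x)
#else
#define debug_assert(x) ((void)0)
#endif

static int have_lock;   /* held-count for the current thread */

#define assert_locked()   debug_assert(have_lock)
#define assert_unlocked() debug_assert(!have_lock)

static void sketch_lock(void)
{
    assert_unlocked();
    /* a real implementation would take the mutex here */
    have_lock++;
}

static void sketch_unlock(void)
{
    assert_locked();
    have_lock--;
    /* a real implementation would release the mutex here */
}

/* Example of a function documented as "called with the lock held". */
static int needs_lock(void)
{
    assert_locked();
    return have_lock;
}
```

Because the check is a single expression, the helper macros can be one-liners; no extra `if (DEBUG_...)` wrapper is needed.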




Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Andrew Jones

Just crossed my mind that we're missing isb's.

On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> From: Christopher Covington 
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington 
> Signed-off-by: Wei Huang 
> ---
>  arm/pmu.c | 98 
> +++
>  1 file changed, 98 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 0b29088..d5e3ac3 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libcflat.h"
>  
> +#define PMU_PMCR_E (1 << 0)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -21,6 +22,10 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define PMU_CYCLE_IDX  31
> +
> +#define NR_SAMPLES 10
> +
>  #if defined(__arm__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>   return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> +}
> +
> +static inline void pmselr_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));

We probably want an isb here; callers will follow this with another PMU
register write right away, as is done below.

> +}
> +
> +static inline void pmxevtyper_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> +}
> +
> +/*
> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning 
> 64
> + * bits doesn't seem worth the trouble when differential usage of the result 
> is
> + * expected (with differences that can easily fit in 32 bits). So just return
> + * the lower 32 bits of the cycle count in AArch32.

Also, while we're discussing confirming upper bits are as expected, I
guess we should confirm no overflow too. We should clear the overflow
bit PMOVSCLR_EL0.C before we use the counter, and then check it at some
point to confirm it's as expected. I guess that could be separate test
cases though.

> + */
> +static inline uint32_t pmccntr_read(void)
> +{
> + uint32_t cycles;
> +
> + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> + return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
> +}
> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> + pmselr_write(PMU_CYCLE_IDX);
> + pmxevtyper_write(value);
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void)
>   asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>   return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> + asm volatile("msr pmcr_el0, %0" : : "r" (value));
> +}
> +
> +static inline uint32_t pmccntr_read(void)
> +{
> + uint32_t cycles;
> +
> + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> + return cycles;
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> + asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
> +}
> +
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
> +}
>  #endif
>  
>  /*
> @@ -63,11 +132,40 @@ static bool check_pmcr(void)
>   return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> + pmcr_write(pmcr_read() | PMU_PMCR_E);

Need isb() here

> +
> + for (int i = 0; i < NR_SAMPLES; i++) {
> + unsigned long a, b;
> +
> + a = pmccntr_read();
> + b = pmccntr_read();
> +
> + if (a >= b) {
> + printf("Read %ld then %ld.\n", a, b);
> + return false;
> + }
> + }
> +
> + pmcr_write(pmcr_read() & ~PMU_PMCR_E);
> +

Need isb() here

> + return true;
> +}
> +
>  int main(void)
>  {
>   report_prefix_push("pmu");
>  
> + /* init for PMU event access, right now only care about cycle count */
> + pmcntenset_write(1 << PMU_CYCLE_IDX);
> + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */

Need isb() here

> +
>   report("Control register", check_pmcr());
> + report("Monotonically increasing cycle count", check_cycles_increase());
>  
>   return report_summary();
>  }
> -- 
> 1.8.3.1
> 
>

Thanks,
drew



Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking

2016-11-16 Thread Andrew Jones
On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote:
> From: Christopher Covington 
> 
> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for the -icount option in TCG mode in the configuration file.
> 
> Signed-off-by: Christopher Covington 
> Signed-off-by: Wei Huang 
> ---
>  arm/pmu.c | 101 
> +-
>  arm/unittests.cfg |  14 
>  2 files changed, 114 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index d5e3ac3..09aff89 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -15,6 +15,7 @@
>  #include "libcflat.h"
>  
>  #define PMU_PMCR_E (1 << 0)
> +#define PMU_PMCR_C (1 << 2)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value)
>   pmselr_write(PMU_CYCLE_IDX);
>   pmxevtyper_write(value);
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to 
> compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> + asm volatile(
> + "   mcr p15, 0, %[pmcr], c9, c12, 0\n"

isb

> + "1: subs%[i], %[i], #1\n"
> + "   bgt 1b\n"
> + "   mcr p15, 0, %[z], c9, c12, 0\n"

isb

> + : [i] "+r" (i)
> + : [pmcr] "r" (pmcr), [z] "r" (0)
> + : "cc");
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value)
>  {
>   asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to 
> compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting.
> + */
> +static inline void loop(int i, uint32_t pmcr)
> +{
> + asm volatile(
> + "   msr pmcr_el0, %[pmcr]\n"

isb

> + "1: subs%[i], %[i], #1\n"
> + "   b.gt1b\n"
> + "   msr pmcr_el0, xzr\n"

isb

> + : [i] "+r" (i)
> + : [pmcr] "r" (pmcr)
> + : "cc");
> +}
>  #endif
>  
>  /*
> @@ -156,8 +191,71 @@ static bool check_cycles_increase(void)
>   return true;
>  }
>  
> -int main(void)
> +/*
> + * Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The
> + * control register (PMCR_EL0) is initialized with the provided value 
> (allowing
> + * for example for the cycle counter or event counters to be reset). At the 
> end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> + int i = (num - 1) / 2;
> +
> + assert(num >= 3 && ((num - 1) % 2 == 0));
> + loop(i, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E;
> + 
> + if (cpi > 0)
> + printf("Checking for CPI=%d.\n", cpi);
> + printf("instrs : cycles0 cycles1 ...\n");
> +
> + for (int i = 3; i < 300; i += 32) {
> + int avg, sum = 0;
> +
> + printf("%d :", i);
> + for (int j = 0; j < NR_SAMPLES; j++) {
> + int cycles;
> +
> + measure_instrs(i, pmcr);
> + cycles =pmccntr_read();
> + printf(" %d", cycles);
> +
> + if (!cycles || (cpi > 0 && cycles != i * cpi)) {
> + printf("\n");
> + return false;
> + }
> +
> + sum += cycles;
> + }
> + avg = sum / NR_SAMPLES;
> + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n",
> + sum, avg, i / avg, avg / i);
> + }
> +
> + return true;
> +}
> +
> +int main(int argc, char *argv[])
>  {
> + int cpi = 0;
> +
> + if (argc >= 1)
> + cpi = atol(argv[0]);
> +
>   report_prefix_push("pmu");
>  
>   /* init for PMU event access, right now only care about cycle count */
> @@ -166,6 +264,7 @@ int main(void)
>  
>   report("Control register", check_p

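The arithmetic behind the strict CPI check in the patch above can be isolated into two small helpers. This is a sketch with hypothetical names (`loop_iterations()`, `sample_ok()`); it only restates the conditions visible in the quoted code: odd instruction counts of at least 3, a nonzero cycle count, and an exact match with num * cpi when a positive CPI is given.

```c
#include <assert.h>
#include <stdbool.h>

/* Loop iterations needed for an odd instruction count >= 3. */
static int loop_iterations(int num_instrs)
{
    assert(num_instrs >= 3 && (num_instrs - 1) % 2 == 0);
    return (num_instrs - 1) / 2;
}

/*
 * One sample passes when the counter moved and, if cpi > 0,
 * the cycle count matches the instruction count times cpi exactly.
 */
static bool sample_ok(int num_instrs, int cycles, int cpi)
{
    if (cycles == 0)
        return false;
    if (cpi > 0 && cycles != num_instrs * cpi)
        return false;
    return true;
}
```

For example, 35 instructions map to 17 loop iterations, and with cpi=1 only a reading of exactly 35 cycles passes.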
Re: [Qemu-devel] [PATCH] translate-all: Enable locking debug in a debug build

2016-11-16 Thread Alex Bennée

Pranith Kumar  writes:

> Unconditionally enable locking checks in debug builds so that we get
> wider testing. Using tcg_debug_assert() allows us to remove
> DEBUG_LOCKING define.

Interesting. The other option would be to add a debug build to
.travis.yml that defines this (and others) with -DFOO_DEBUG.

>
> Signed-off-by: Pranith Kumar 
> ---
>  translate-all.c | 50 +-
>  1 file changed, 17 insertions(+), 33 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index cf828aa..a03f323 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -60,7 +60,6 @@
>
>  /* #define DEBUG_TB_INVALIDATE */
>  /* #define DEBUG_TB_FLUSH */
> -/* #define DEBUG_LOCKING */
>  /* make various TB consistency checks */
>  /* #define DEBUG_TB_CHECK */

So if we are enabling this for tcg_debug builds why not the other cases?

>
> @@ -75,23 +74,13 @@
>   * access to the memory related structures are protected with the
>   * mmap_lock.
>   */
> -#ifdef DEBUG_LOCKING
> -#define DEBUG_MEM_LOCKS 1
> -#else
> -#define DEBUG_MEM_LOCKS 0
> -#endif
> -

In retrospect I should probably have had a comment in here about the role
of tb_lock in CONFIG_SOFTMMU versus the mmap_lock.

>  #ifdef CONFIG_SOFTMMU
>  #define assert_memory_lock() do {   \
> -if (DEBUG_MEM_LOCKS) {  \
> -g_assert(have_tb_lock); \
> -}   \
> +tcg_debug_assert(have_tb_lock); \
>  } while (0)
>  #else
>  #define assert_memory_lock() do {   \
> -if (DEBUG_MEM_LOCKS) {  \
> -g_assert(have_mmap_lock()); \
> -}   \
> +tcg_debug_assert(have_mmap_lock()); \
>  } while (0)
>  #endif
>
> @@ -172,16 +161,24 @@ static void page_table_config_init(void)
>  assert(v_l2_levels >= 0);
>  }
>
> +#define assert_tb_locked() do { \
> +tcg_debug_assert(have_tb_lock); \
> +} while (0)
> +
> +#define assert_tb_unlocked() do {   \
> +tcg_debug_assert(!have_tb_lock);\
> +} while (0)
> +

I'm not sure we need all this multi-line stuff for a simple
substitution? Richard?

>  void tb_lock(void)
>  {
> -assert(!have_tb_lock);
> +assert_tb_unlocked();

Hmm why introduce a helper for exactly one use?

>  qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>  have_tb_lock++;
>  }
>
>  void tb_unlock(void)
>  {
> -assert(have_tb_lock);
> +assert_tb_locked();
>  have_tb_lock--;
>  qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>  }
> @@ -194,19 +191,6 @@ void tb_lock_reset(void)
>  }
>  }
>
> -#ifdef DEBUG_LOCKING
> -#define DEBUG_TB_LOCKS 1
> -#else
> -#define DEBUG_TB_LOCKS 0
> -#endif
> -
> -#define assert_tb_lock() do {   \
> -if (DEBUG_TB_LOCKS) {   \
> -g_assert(have_tb_lock); \
> -}   \
> -} while (0)
> -
> -
>  static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
>
>  void cpu_gen_init(void)
> @@ -840,7 +824,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
>  {
>  TranslationBlock *tb;
>
> -assert_tb_lock();
> +assert_tb_locked();
>
>  if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
>  return NULL;
> @@ -855,7 +839,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
>  /* Called with tb_lock held.  */
>  void tb_free(TranslationBlock *tb)
>  {
> -assert_tb_lock();
> +assert_tb_locked();
>
>  /* In practice this is mostly used for single use temporary TB
> Ignore the hard cases and just back up if this TB happens to
> @@ -1097,7 +1081,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
> tb_page_addr_t page_addr)
>  uint32_t h;
>  tb_page_addr_t phys_pc;
>
> -assert_tb_lock();
> +assert_tb_locked();
>
>  atomic_set(&tb->invalid, true);
>
> @@ -1412,7 +1396,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t 
> start, tb_page_addr_t end)
>  #ifdef CONFIG_SOFTMMU
>  void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>  {
> -assert_tb_lock();
> +assert_tb_locked();
>  tb_invalidate_phys_range_1(start, end);
>  }
>  #else
> @@ -1455,7 +1439,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t 
> start, tb_page_addr_t end,
>  #endif /* TARGET_HAS_PRECISE_SMC */
>
>  assert_memory_lock();
> -assert_tb_lock();
> +assert_tb_locked();
>
>  p = page_find(start >> TARGET_PAGE_BITS);
>  if (!p) {


--
Alex Bennée



Re: [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev

2016-11-16 Thread Alberto Garcia
On Thu 10 Nov 2016 06:19:04 PM CET, Kevin Wolf wrote:

> +typedef struct QuorumCo {
> +QuorumAIOCB *acb;
>  int i;

Maybe 'i' could be renamed to something a bit more descriptive ('idx', I
don't know).

> +} QuorumCo;
> +
> +static void read_quorum_children_entry(void *opaque)
> +{
> +QuorumCo *co = opaque;
> +QuorumAIOCB *acb = co->acb;
> +BDRVQuorumState *s = acb->bs->opaque;
> +int i = co->i;
> +int ret;
> +co = NULL; /* Not valid after the first yield */

I also don't understand this last line. Is it to make sure that no one
tries to use it after the bdrv_co_preadv() call?

> +acb->qcrs[i].bs = s->children[i]->bs;
> +ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
> + acb->nb_sectors * BDRV_SECTOR_SIZE,
> + &acb->qcrs[i].qiov, 0);
> +quorum_aio_cb(&acb->qcrs[i], ret);
> +}

Otherwise the patch looks good to me.

Berto



Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support

2016-11-16 Thread Collin L. Walling

On 11/16/2016 09:05 AM, Eduardo Habkost wrote:

On Wed, Nov 16, 2016 at 02:15:02PM +0100, Jiri Denemark wrote:

On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote:

CCing qemu-devel.

CCing Markus, in case he has any insights about the interface
introspection.

On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote:

On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote:

On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote:

cpu features are passed to the qemu command with feature=on/off
instead of +/-feature.

Signed-off-by: Collin L. Walling 

If I'm not mistaken, the "feature=on|off" syntax was added on
QEMU 2.0.0. Does current libvirt support older QEMU versions?

Of course it does. I'd love to switch to feature=on|off, but how can we
check if QEMU supports it? We can't really start using this syntax
without it.

Actually, I was wrong, this was added in v2.4.0. "feat=on|off"
needs two things to work (in x86):

* Translation of all "foo=bar" options to QOM property setting.
   This was added in v2.0.0-rc0~162^2
* The actual QOM properties for feature names to be present. They
   were added in v2.4.0-rc0~101^2~1

So you can be sure "feat=on" is supported by checking if the
feature flags are present in device-list-properties output for
the CPU model. But device-list-properties is also messy[1].

Maybe we can use the availability of query-cpu-model-expansion to
check if we can safely use the new "feat=on|off" system? It's
easier than taking all the variables above into account.

Yeah, this could work since s390 already supports
query-cpu-model-expansion. It would cause feature=on|off not to be used
on x86_64 with QEMU older than 2.9.0, but I guess that's not a big deal,
is it?

Not a problem, as we have no plans to remove +feat/-feat support
in x86 anymore.


Beautiful. Thanks for your responses everyone. :)
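
For illustration only, the two -cpu syntaxes discussed in this thread
("feature=on|off" versus the legacy "+feature"/"-feature") could be
generated along the following lines. This is a minimal sketch, not libvirt
code: the helper name cpu_arg_append_feature() and the use_prop_syntax flag
are assumptions. In practice the flag would be derived from a capability
probe such as the query-cpu-model-expansion check suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libvirt or QEMU): append one CPU
 * feature to a NUL-terminated "-cpu" argument held in buf.
 *
 * use_prop_syntax == true  -> ",feature=on" / ",feature=off"
 * use_prop_syntax == false -> ",+feature"   / ",-feature"
 *
 * Returns the number of characters appended, or -1 if buf is too small,
 * in which case buf is left unchanged.
 */
static int cpu_arg_append_feature(char *buf, size_t size, const char *feature,
                                  bool enable, bool use_prop_syntax)
{
    size_t used = strlen(buf);
    int n;

    assert(used < size);

    if (use_prop_syntax) {
        n = snprintf(buf + used, size - used, ",%s=%s",
                     feature, enable ? "on" : "off");
    } else {
        n = snprintf(buf + used, size - used, ",%c%s",
                     enable ? '+' : '-', feature);
    }

    if (n < 0 || (size_t)n >= size - used) {
        buf[used] = '\0';   /* undo the truncated write */
        return -1;
    }
    return n;
}
```

Starting from "Haswell", enabling "avx" with the property syntax yields
"Haswell,avx=on", while disabling it with the legacy syntax yields
"Haswell,-avx".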




[Qemu-devel] [PATCH RFC 2/2] numa: make -numa parser dynamically allocate CPUs masks

2016-11-16 Thread Igor Mammedov
so it won't impose additional limits on the max_cpus limits
supported by different targets.

It removes the global MAX_CPUMASK_BITS constant and the need to
bump it up whenever max_cpus is increased for
a target above the MAX_CPUMASK_BITS value.

Use the runtime max_cpus value instead to allocate sufficiently
sized node_cpu bitmasks in the numa parser.

Signed-off-by: Igor Mammedov 
---
 include/sysemu/numa.h   |  2 +-
 include/sysemu/sysemu.h |  7 ---
 numa.c  | 19 ---
 vl.c|  5 -
 4 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 4da808a..8f09dcf 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -17,7 +17,7 @@ struct numa_addr_range {
 
 typedef struct node_info {
 uint64_t node_mem;
-DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
+unsigned long *node_cpu;
 struct HostMemoryBackend *node_memdev;
 bool present;
 QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 66c6f15..cccde56 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -168,13 +168,6 @@ extern int mem_prealloc;
 #define MAX_NODES 128
 #define NUMA_NODE_UNASSIGNED MAX_NODES
 
-/* The following shall be true for all CPUs:
- *   cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
- *
- * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS.
- */
-#define MAX_CPUMASK_BITS 288
-
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
 const char *name;
diff --git a/numa.c b/numa.c
index 9c09e45..5542e40 100644
--- a/numa.c
+++ b/numa.c
@@ -266,20 +266,20 @@ static char *enumerate_cpus(unsigned long *cpus, int 
max_cpus)
 static void validate_numa_cpus(void)
 {
 int i;
-DECLARE_BITMAP(seen_cpus, MAX_CPUMASK_BITS);
+unsigned long *seen_cpus = bitmap_new(max_cpus);
 
-bitmap_zero(seen_cpus, MAX_CPUMASK_BITS);
+bitmap_zero(seen_cpus, max_cpus);
 for (i = 0; i < nb_numa_nodes; i++) {
-if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu,
-  MAX_CPUMASK_BITS)) {
+if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) {
 bitmap_and(seen_cpus, seen_cpus,
-   numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+   numa_info[i].node_cpu, max_cpus);
 error_report("CPU(s) present in multiple NUMA nodes: %s",
  enumerate_cpus(seen_cpus, max_cpus));
+bitmap_free(seen_cpus);
 exit(EXIT_FAILURE);
 }
 bitmap_or(seen_cpus, seen_cpus,
-  numa_info[i].node_cpu, MAX_CPUMASK_BITS);
+  numa_info[i].node_cpu, max_cpus);
 }
 
 if (!bitmap_full(seen_cpus, max_cpus)) {
@@ -291,12 +291,17 @@ static void validate_numa_cpus(void)
  "in NUMA config");
 g_free(msg);
 }
+bitmap_free(seen_cpus);
 }
 
 void parse_numa_opts(MachineClass *mc)
 {
 int i;
 
+for (i = 0; i < MAX_NODES; i++) {
+numa_info[i].node_cpu = bitmap_new(max_cpus);
+}
+
 if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) {
 exit(1);
 }
@@ -362,7 +367,7 @@ void parse_numa_opts(MachineClass *mc)
 numa_set_mem_ranges();
 
 for (i = 0; i < nb_numa_nodes; i++) {
-if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
+if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) {
 break;
 }
 }
diff --git a/vl.c b/vl.c
index d77dd86..37790e5 100644
--- a/vl.c
+++ b/vl.c
@@ -1277,11 +1277,6 @@ static void smp_parse(QemuOpts *opts)
 
 max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
 
-if (max_cpus > MAX_CPUMASK_BITS) {
-error_report("unsupported number of maxcpus");
-exit(1);
-}
-
 if (max_cpus < cpus) {
 error_report("maxcpus must be equal to or greater than smp");
 exit(1);
-- 
2.7.4




[Qemu-devel] [PATCH RFC 0/2] numa: allocate CPUs masks dynamically

2016-11-16 Thread Igor Mammedov
This series removes the global MAX_CPUMASK_BITS constant
so that it won't indirectly influence the maximum CPU count
supported by different targets.

It replaces statically allocated bitmasks with dynamically
allocated ones, using the '-smp maxcpus' value to set
the bitmask size.
That allocates just enough memory to handle all
CPU indexes that a QEMU instance would ever have.
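
The idea can be sketched outside of QEMU as follows. This is a standalone
illustration using plain calloc(), not the code from the series: the
cpumask_* names and the two macros are made up for the example, whereas the
series itself uses QEMU's bitmap_new() and related helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * Allocate a zeroed bitmap large enough for nbits bits, where nbits is
 * only known at runtime (e.g. the maxcpus value). Returns NULL on failure.
 */
static unsigned long *cpumask_new(size_t nbits)
{
    assert(nbits > 0);
    return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Mark one CPU index as present in the mask. */
static void cpumask_set(unsigned long *map, size_t bit)
{
    map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Report whether one CPU index is present in the mask. */
static bool cpumask_test(const unsigned long *map, size_t bit)
{
    return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}

/*
 * True if the two masks share at least one set bit. Whole words are
 * compared, so this assumes bits at or beyond nbits are never set.
 */
static bool cpumask_intersects(const unsigned long *a, const unsigned long *b,
                               size_t nbits)
{
    for (size_t i = 0; i < BITS_TO_LONGS(nbits); i++) {
        if (a[i] & b[i]) {
            return true;
        }
    }
    return false;
}
```

With this shape no compile-time upper bound is needed: a mask for 300 CPUs
is simply cpumask_new(300), and it is released with free().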

CC: Alexey Kardashevskiy 
CC: Greg Kurz 
CC: David Gibson 
CC: Eduardo Habkost 
CC: Paolo Bonzini 


Igor Mammedov (2):
  add bitmap_free() wrapper
  numa: make -numa parser dynamically allocate CPUs masks

 include/qemu/bitmap.h   |  5 +
 include/sysemu/numa.h   |  2 +-
 include/sysemu/sysemu.h |  7 ---
 numa.c  | 19 ---
 vl.c|  5 -
 5 files changed, 18 insertions(+), 20 deletions(-)

-- 
2.7.4




Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Christopher Covington
On 11/16/2016 08:01 AM, Andrew Jones wrote:
> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
>>
>>
>> On 11/14/2016 09:12 AM, Christopher Covington wrote:
>>> Hi Drew, Wei,
>>>
>>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
 On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>
>
> On 11/11/2016 01:43 AM, Andrew Jones wrote:
>> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
>>> From: Christopher Covington 
>>>
>>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
>>> even for the smallest delta of two subsequent reads.
>>>
>>> Signed-off-by: Christopher Covington 
>>> Signed-off-by: Wei Huang 
>>> ---
>>>  arm/pmu.c | 98 
>>> +++
>>>  1 file changed, 98 insertions(+)
>>>
>>> diff --git a/arm/pmu.c b/arm/pmu.c
>>> index 0b29088..d5e3ac3 100644
>>> --- a/arm/pmu.c
>>> +++ b/arm/pmu.c
>>> @@ -14,6 +14,7 @@
>>>   */
>>>  #include "libcflat.h"
>>>  
>>> +#define PMU_PMCR_E (1 << 0)
>>>  #define PMU_PMCR_N_SHIFT   11
>>>  #define PMU_PMCR_N_MASK0x1f
>>>  #define PMU_PMCR_ID_SHIFT  16
>>> @@ -21,6 +22,10 @@
>>>  #define PMU_PMCR_IMP_SHIFT 24
>>>  #define PMU_PMCR_IMP_MASK  0xff
>>>  
>>> +#define PMU_CYCLE_IDX  31
>>> +
>>> +#define NR_SAMPLES 10
>>> +
>>>  #if defined(__arm__)
>>>  static inline uint32_t pmcr_read(void)
>>>  {
>>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>>> return ret;
>>>  }
>>> +
>>> +static inline void pmcr_write(uint32_t value)
>>> +{
>>> +   asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
>>> +}
>>> +
>>> +static inline void pmselr_write(uint32_t value)
>>> +{
>>> +   asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
>>> +}
>>> +
>>> +static inline void pmxevtyper_write(uint32_t value)
>>> +{
>>> +   asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
>>> +}
>>> +
>>> +/*
>>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
>>> returning 64
>>> + * bits doesn't seem worth the trouble when differential usage of the 
>>> result is
>>> + * expected (with differences that can easily fit in 32 bits). So just 
>>> return
>>> + * the lower 32 bits of the cycle count in AArch32.
>>
>> Like I said in the last review, I'd rather we not do this. We should
>> return the full value and then the test case should confirm the upper
>> 32 bits are zero.
>
> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> register. We can force it to a more coarse-grained cycle counter with
> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
>>>
>>> AArch32 System Register Descriptions
>>> Performance Monitors registers
>>> PMCCNTR, Performance Monitors Cycle Count Register
>>>
>>> To access the PMCCNTR when accessing as a 32-bit register:
>>> MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
>>> MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are 
>>> unchanged
>>>
>>> To access the PMCCNTR when accessing as a 64-bit register:
>>> MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] 
>>> into Rt2
>>> MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to 
>>> PMCCNTR[63:32]
>>>
>>
>> Thanks. I did some research based on your info and came back with the
>> following proposals (Cov, correct me if I am wrong):
>>
>> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
>> think this 64-bit cycle register is only available when running under
>> aarch32 compatibility mode on ARMv8 because it is not specified in A15
>> TRM.

That interpretation sounds really strange to me. My recollection is that the
cycle counter was available as a 64 bit register in ARMv7 as well. I would
expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
Manual is the complete and authoritative source.

>> To further verify it, I tested 32-bit pmu code on QEMU with TCG
>> mode. The result is: accessing 64-bit PMCCNTR using the following
>> assembly failed on A15:
>>
>>volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
>> or
>>volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));

The PMU implementation on QEMU TCG mode is infantile. (I was trying to
write these tests to help guide fixes and enhancements in a
test-driven-development manner.) I would not trust QEMU TCG to behave
properly here. If you want to execute those instructions, is there anything
preventing you from doing it on hardware, or at least the Foundation Model?

>> Given this difference, I think there are two solutions for 64-bit
>> AArch32 pmccntr_read, as requested by Drew:
>>
>> 1) The PMU un

[Qemu-devel] [PATCH RFC 1/2] add bitmap_free() wrapper

2016-11-16 Thread Igor Mammedov
It will be used for freeing bitmaps allocated with bitmap_[try_]new().

Signed-off-by: Igor Mammedov 
---
 include/qemu/bitmap.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 63ea2d0..0289836 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -98,6 +98,11 @@ static inline unsigned long *bitmap_new(long nbits)
 return ptr;
 }
 
+static inline void bitmap_free(unsigned long *bitmap)
+{
+g_free(bitmap);
+}
+
 static inline void bitmap_zero(unsigned long *dst, long nbits)
 {
 if (small_nbits(nbits)) {
-- 
2.7.4




Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Andrew Jones
On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote:
> On 11/16/2016 08:01 AM, Andrew Jones wrote:
> > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:
> >>
> >>
> >> On 11/14/2016 09:12 AM, Christopher Covington wrote:
> >>> Hi Drew, Wei,
> >>>
> >>> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>  On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
> >
> >
> > On 11/11/2016 01:43 AM, Andrew Jones wrote:
> >> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> >>> From: Christopher Covington 
> >>>
> >>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> >>> even for the smallest delta of two subsequent reads.
> >>>
> >>> Signed-off-by: Christopher Covington 
> >>> Signed-off-by: Wei Huang 
> >>> ---
> >>>  arm/pmu.c | 98 
> >>> +++
> >>>  1 file changed, 98 insertions(+)
> >>>
> >>> diff --git a/arm/pmu.c b/arm/pmu.c
> >>> index 0b29088..d5e3ac3 100644
> >>> --- a/arm/pmu.c
> >>> +++ b/arm/pmu.c
> >>> @@ -14,6 +14,7 @@
> >>>   */
> >>>  #include "libcflat.h"
> >>>  
> >>> +#define PMU_PMCR_E (1 << 0)
> >>>  #define PMU_PMCR_N_SHIFT   11
> >>>  #define PMU_PMCR_N_MASK0x1f
> >>>  #define PMU_PMCR_ID_SHIFT  16
> >>> @@ -21,6 +22,10 @@
> >>>  #define PMU_PMCR_IMP_SHIFT 24
> >>>  #define PMU_PMCR_IMP_MASK  0xff
> >>>  
> >>> +#define PMU_CYCLE_IDX  31
> >>> +
> >>> +#define NR_SAMPLES 10
> >>> +
> >>>  #if defined(__arm__)
> >>>  static inline uint32_t pmcr_read(void)
> >>>  {
> >>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
> >>>   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
> >>>   return ret;
> >>>  }
> >>> +
> >>> +static inline void pmcr_write(uint32_t value)
> >>> +{
> >>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> >>> +}
> >>> +
> >>> +static inline void pmselr_write(uint32_t value)
> >>> +{
> >>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> >>> +}
> >>> +
> >>> +static inline void pmxevtyper_write(uint32_t value)
> >>> +{
> >>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> >>> +}
> >>> +
> >>> +/*
> >>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
> >>> returning 64
> >>> + * bits doesn't seem worth the trouble when differential usage of 
> >>> the result is
> >>> + * expected (with differences that can easily fit in 32 bits). So 
> >>> just return
> >>> + * the lower 32 bits of the cycle count in AArch32.
> >>
> >> Like I said in the last review, I'd rather we not do this. We should
> >> return the full value and then the test case should confirm the upper
> >> 32 bits are zero.
> >
> > Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
> > register. We can force it to a more coarse-grained cycle counter with
> > PMCR.D bit=1 (see below). But it is still not a 64-bit register.
> >>>
> >>> AArch32 System Register Descriptions
> >>> Performance Monitors registers
> >>> PMCCNTR, Performance Monitors Cycle Count Register
> >>>
> >>> To access the PMCCNTR when accessing as a 32-bit register:
> >>> MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> >>> MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are 
> >>> unchanged
> >>>
> >>> To access the PMCCNTR when accessing as a 64-bit register:
> >>> MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] 
> >>> into Rt2
> >>> MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to 
> >>> PMCCNTR[63:32]
> >>>
> >>
> >> Thanks. I did some research based on your info and came back with the
> >> following proposals (Cov, correct me if I am wrong):
> >>
> >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
> >> think this 64-bit cycle register is only available when running under
> >> aarch32 compatibility mode on ARMv8 because it is not specified in A15
> >> TRM.
> 
> That interpretation sounds really strange to me. My recollection is that the
> cycle counter was available as a 64 bit register in ARMv7 as well. I would
> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
> Manual is the complete and authoritative source.

Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit.
Whereas the v8 ARM ARM, with respect to AArch32 mode, says it's both 32 and 64.
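
As a rough illustration of the checks under discussion (return the full
value, confirm the upper 32 bits are zero, and confirm that successive
reads increase), the test-side logic could look something like the sketch
below. It is portable C with no register access: the helper names are
hypothetical and are not taken from arm/pmu.c, and treating "increasing"
as strictly increasing is an assumption about what the test wants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Combine the low and high 32-bit halves of a 64-bit counter read. */
static uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True if the upper 32 bits of a counter value are all zero. */
static bool upper_bits_clear(uint64_t value)
{
    return (value >> 32) == 0;
}

/* True if every sample is strictly greater than the one before it. */
static bool strictly_increasing(const uint64_t *samples, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (samples[i] <= samples[i - 1]) {
            return false;
        }
    }
    return true;
}
```

Where only the 32-bit access exists, a caller would presumably pass hi = 0,
so upper_bits_clear() holds trivially. Where the 64-bit access exists, both
halves come from the register read.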

> 
> >> To further verify it, I tested 32-bit pmu code on QEMU with TCG
> >> mode. The result is: accessing 64-bit PMCCNTR using the following
> >> assembly failed on A15:
> >>
> >>volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
> >> or
> >>volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val));
> 
> The PMU implementation on QEMU TCG mode is

Re: [Qemu-devel] QMP event on reboot when -no-reboot is set

2016-11-16 Thread John Snow



On 11/16/2016 09:01 AM, Dirk Braunschweiger wrote:

Hey Guys,

I want to get a QMP event when QEMU shuts down due to the
-no-reboot flag. Looking at the code I realized that the -no-reboot flag
just changes any reset request to a shutdown request.
Has anybody already patched QEMU to emit some kind of reboot event to
the QMP socket?

If no one has already patched it, would you accept such a patch? Or is it
an unwanted feature?

Best regards,
Dirk Braunschweiger



Is the existing "STOP" event insufficient for some reason? Is it 
important to distinguish between a 'real' stop and a stop that was 
originally intended to be a reboot?


If you can elaborate on that case, you have a good chance of amending 
the event spec to add some new events.


--js



Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases

2016-11-16 Thread Christopher Covington
On 11/16/2016 11:25 AM, Andrew Jones wrote:
> On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote:
>> On 11/16/2016 08:01 AM, Andrew Jones wrote:
>>> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote:


 On 11/14/2016 09:12 AM, Christopher Covington wrote:
> Hi Drew, Wei,
>
> On 11/14/2016 05:05 AM, Andrew Jones wrote:
>> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote:
>>>
>>>
>>> On 11/11/2016 01:43 AM, Andrew Jones wrote:
 On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote:
> From: Christopher Covington 
>
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
>
> Signed-off-by: Christopher Covington 
> Signed-off-by: Wei Huang 
> ---
>  arm/pmu.c | 98 
> +++
>  1 file changed, 98 insertions(+)
>
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 0b29088..d5e3ac3 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libcflat.h"
>  
> +#define PMU_PMCR_E (1 << 0)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -21,6 +22,10 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define PMU_CYCLE_IDX  31
> +
> +#define NR_SAMPLES 10
> +
>  #if defined(__arm__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void)
>   asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>   return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> +}
> +
> +static inline void pmselr_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> +}
> +
> +static inline void pmxevtyper_write(uint32_t value)
> +{
> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> +}
> +
> +/*
> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, 
> returning 64
> + * bits doesn't seem worth the trouble when differential usage of 
> the result is
> + * expected (with differences that can easily fit in 32 bits). So 
> just return
> + * the lower 32 bits of the cycle count in AArch32.

 Like I said in the last review, I'd rather we not do this. We should
 return the full value and then the test case should confirm the upper
 32 bits are zero.
>>>
>>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit
>>> register. We can force it to a more coarse-grained cycle counter with
>>> PMCR.D bit=1 (see below). But it is still not a 64-bit register.
>
> AArch32 System Register Descriptions
> Performance Monitors registers
> PMCCNTR, Performance Monitors Cycle Count Register
>
> To access the PMCCNTR when accessing as a 32-bit register:
> MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt
> MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are 
> unchanged
>
> To access the PMCCNTR when accessing as a 64-bit register:
> MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] 
> into Rt2
> MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to 
> PMCCNTR[63:32]
>

 Thanks. I did some research based on your info and came back with the
 following proposals (Cov, correct me if I am wrong):

 By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I
 think this 64-bit cycle register is only available when running under
 aarch32 compatibility mode on ARMv8 because it is not specified in A15
 TRM.
>>
>> That interpretation sounds really strange to me. My recollection is that the
>> cycle counter was available as a 64 bit register in ARMv7 as well. I would
>> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference
>> Manual is the complete and authoritative source.
> 
> Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit.
> Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64.

Just looked it up as well in the good old ARM DDI 0406C.c and you're absolutely
right. Sorry for the bad recollection.

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Col

Re: [Qemu-devel] [PATCH] hw/pci: disable pci-bridge's shpc by default

2016-11-16 Thread Andrew Jones
On Sat, Nov 05, 2016 at 06:46:34PM +0200, Marcel Apfelbaum wrote:
> On 11/03/2016 09:40 PM, Michael S. Tsirkin wrote:
> > On Thu, Nov 03, 2016 at 01:05:44PM +0200, Marcel Apfelbaum wrote:
> > > On 11/03/2016 06:18 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Nov 02, 2016 at 05:16:42PM +0200, Marcel Apfelbaum wrote:
> > > > > The shpc component is optional while  ACPI hotplug is used
> > > > > for hot-plugging PCI devices into a PCI-PCI bridge.
> > > > > Disabling the shpc by default will make slot 0 usable at boot time
> > > 
> > > Hi Michael
> > > 
> > > > 
> > > > at the cost of breaking all hotplug for all non-acpi users.
> > > > 
> > > 
> > > Do we have a non-acpi user that is able to use the shpc component as-is 
> > > today?
> > 
> > power and some arm systems I guess?
> > 
> 
> Adding Andrew , maybe he can give us an answer.

Not really :-) My lack of PCI knowledge makes that difficult. I'd be happy
to help with an experiment though. Can you give me command line arguments,
qmp commands, etc. that I should use to try it out? I imagine I should
just boot an ARM guest using DT (instead of ACPI) and then attempt to
hotplug a PCI device. I'm not sure, however, what, if any, special
configuration I need in order to ensure I'm testing what you're
interested in.

Thanks,
drew


> 
> Anybody else can help answering this?
> 
> > > I remember we need to even tweak QEMU before it can be used, but I might 
> > > be wrong.
> > > 
> > > And we don't touch the current machines < 2.8 .
> > > 
> > > > > and not only for hot-plug, without loosing any functionality.
> > > > > Older machines will have shpc enabled for compatibility reasons.
> > > > > 
> > > > > Signed-off-by: Marcel Apfelbaum 
> > > > 
> > > > Is an extra slot such a big deal? You can always add more bridges ...
> > > > 
> > > 
> > > It is not only about the slot itself, but more about the usage model.
> > > The PCIe Upstream ports/DMI-PCI devices are also pci-bridges,
> > > but for them slot 0 is allowed.
> > 
> > The reason is that these devices are not themselves
> > hotpluggable. Isn't there a flag that allows adding
> > a non hotpluggable device? Allowing these would be one solution.
> > 
> > > And what about the hotplug? Slot 0 is not usable at boot, but then is
> > > usable again (for ACPI users) making people wondering:
> > >  https://bugzilla.redhat.com/show_bug.cgi?id=1175113
> > 
> > Let's just disallow that then for consistency?
> > 
> 
> I suppose we can do that... not sure if it worth it.
> 
> Thanks,
> Marcel
> 
> > 
> > > My point is - can shpc be used as-is today? Even so, I suspect there are 
> > > much (much)
> > > less users using SHPC than ACPI based hotplug. If this is the case, why 
> > > bother the
> > > majority of the users? And for the shpc users, they can keep the prev 
> > > machines
> > > or change the command line, I think changes like this happens over the 
> > > time.
> > > 
> > > Adding Markus for his opinion on command line changes.
> > > 
> > > Thanks,
> > > Marcel
> > > > > ---
> > > > >  hw/pci-bridge/pci_bridge_dev.c | 2 +-
> > > > >  include/hw/compat.h| 4 
> > > > >  2 files changed, 5 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/pci-bridge/pci_bridge_dev.c 
> > > > > b/hw/pci-bridge/pci_bridge_dev.c
> > > > > index 5dbd933..647ad80 100644
> > > > > --- a/hw/pci-bridge/pci_bridge_dev.c
> > > > > +++ b/hw/pci-bridge/pci_bridge_dev.c
> > > > > @@ -163,7 +163,7 @@ static Property pci_bridge_dev_properties[] = {
> > > > >  DEFINE_PROP_ON_OFF_AUTO(PCI_BRIDGE_DEV_PROP_MSI, PCIBridgeDev, 
> > > > > msi,
> > > > >  ON_OFF_AUTO_AUTO),
> > > > >  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
> > > > > -PCI_BRIDGE_DEV_F_SHPC_REQ, true),
> > > > > +PCI_BRIDGE_DEV_F_SHPC_REQ, false),
> > > > >  DEFINE_PROP_END_OF_LIST(),
> > > > >  };
> > > > > 
> > > > > diff --git a/include/hw/compat.h b/include/hw/compat.h
> > > > > index 0f06e11..388b7ec 100644
> > > > > --- a/include/hw/compat.h
> > > > > +++ b/include/hw/compat.h
> > > > > @@ -18,6 +18,10 @@
> > > > >  .driver   = "intel-iommu",\
> > > > >  .property = "x-buggy-eim",\
> > > > >  .value= "true",\
> > > > > +},{\
> > > > > +.driver   = "pci-bridge",\
> > > > > +.property = "shpc",\
> > > > > +.value= "on",\
> > > > >  },
> > > > > 
> > > > >  #define HW_COMPAT_2_6 \
> > > > > --
> > > > > 2.5.5
> 
> 



Re: [Qemu-devel] [PATCH] hw/pci: disable pci-bridge's shpc by default

2016-11-16 Thread Marcel Apfelbaum

On 11/16/2016 06:44 PM, Andrew Jones wrote:

On Sat, Nov 05, 2016 at 06:46:34PM +0200, Marcel Apfelbaum wrote:

On 11/03/2016 09:40 PM, Michael S. Tsirkin wrote:

On Thu, Nov 03, 2016 at 01:05:44PM +0200, Marcel Apfelbaum wrote:

On 11/03/2016 06:18 AM, Michael S. Tsirkin wrote:

On Wed, Nov 02, 2016 at 05:16:42PM +0200, Marcel Apfelbaum wrote:

The shpc component is optional while  ACPI hotplug is used
for hot-plugging PCI devices into a PCI-PCI bridge.
Disabling the shpc by default will make slot 0 usable at boot time


Hi Michael



at the cost of breaking all hotplug for all non-acpi users.



Do we have a non-acpi user that is able to use the shpc component as-is today?


power and some arm systems I guess?



Adding Andrew , maybe he can give us an answer.


Not really :-) My lack of PCI knowledge makes that difficult. I'd be happy
to help with an experiment though. Can you give me command line arguments,
qmp commands, etc. that I should use to try it out? I imagine I should
just boot an ARM guest using DT (instead of ACPI) and then attempt to
hotplug a PCI device. I'm not sure, however, what, if any, special
configuration I need in order to ensure I'm testing what you're
interested in.



Hi Drew,


Just run QEMU with '-device pci-bridge,chassis_nr=1,id=bridge1 -monitor stdio'
with an ARM guest using DT and wait until the guest finishes booting.

Then run at the HMP prompt:
device_add virtio-net-pci,bus=bridge1,id=net2

Next run lspci in the guest to see the new device.


BTW, will an ARM guest run 'fast' enough to be usable on an x86 machine?
If yes, any pointers on how to create such a guest?


Thanks,
Marcel





Thanks,
drew




[...]



Re: [Qemu-devel] [PATCH RFC 1/2] add bitmap_free() wrapper

2016-11-16 Thread Eduardo Habkost
On Wed, Nov 16, 2016 at 05:02:55PM +0100, Igor Mammedov wrote:
> it will be used for freeing bitmaps allocated with bitmap_[try]_new()
> 
> Signed-off-by: Igor Mammedov 

We need to change all code using g_free() for bitmaps to use
bitmap_free(), as people in the future might assume that changing
bitmap_free() is safe (and it won't be).

Personally, I think g_free() is good enough and we don't need
bitmap_free(). The assumption that bitmap_new() returns
g_free()-able memory is part of the API.
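
To make the concern concrete, here is a hypothetical sketch of what could
go wrong if the allocator and its free function were ever changed while
some callers still released bitmaps directly. It is not QEMU code: the
hdr_bitmap_* names and the hidden one-word header are invented purely for
the illustration, and plain calloc()/free() stand in for the glib calls.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical allocator that stores the bit count in a hidden word in
 * front of the bitmap. The returned pointer is NOT the start of the
 * allocation, so passing it straight to free() would be a bug.
 */
static unsigned long *hdr_bitmap_new(size_t nbits)
{
    size_t words = (nbits + WORD_BITS - 1) / WORD_BITS;
    unsigned long *raw = calloc(words + 1, sizeof(unsigned long));

    if (!raw) {
        return NULL;
    }
    raw[0] = (unsigned long)nbits;  /* hidden header */
    return raw + 1;                 /* caller sees only the bitmap words */
}

/* Read back the bit count recorded by hdr_bitmap_new(). */
static size_t hdr_bitmap_nbits(const unsigned long *bitmap)
{
    return (size_t)bitmap[-1];
}

/* The only correct way to release memory from hdr_bitmap_new(). */
static void hdr_bitmap_free(unsigned long *bitmap)
{
    if (bitmap) {
        free(bitmap - 1);           /* step back over the header */
    }
}
```

With such a layout, any remaining caller that freed the bitmap pointer
directly would hand the allocator an address it never returned. That is why
a dedicated free wrapper is only safe once every caller goes through it.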

> ---
>  include/qemu/bitmap.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 63ea2d0..0289836 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -98,6 +98,11 @@ static inline unsigned long *bitmap_new(long nbits)
>  return ptr;
>  }
>  
> +static inline void bitmap_free(unsigned long *bitmap)
> +{
> +g_free(bitmap);
> +}
> +
>  static inline void bitmap_zero(unsigned long *dst, long nbits)
>  {
>  if (small_nbits(nbits)) {
> -- 
> 2.7.4
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"

2016-11-16 Thread ashish mittal
On Wed, Nov 16, 2016 at 3:27 AM, Stefan Hajnoczi  wrote:
> On Wed, Nov 16, 2016 at 9:49 AM, Fam Zheng  wrote:
>> On Wed, 11/16 10:04, Markus Armbruster wrote:
>>> ashish mittal  writes:
>>>
>>> > Thanks for concluding on this.
>>> >
>>> > I will rearrange the qnio_api.h header accordingly as follows:
>>> >
>>> > +#include "qemu/osdep.h"
>>>
>>> Headers should not include osdep.h.
>>
>> This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what
>> Ashish means looks good to me.
>
> Yes, I think "will rearrange the qnio_api.h header" was a typo and was
> supposed to be block/vxhs.c.
>
> Stefan

Thanks for the correction. Yes, I meant rearranging the headers in block/vxhs.c.



Re: [Qemu-devel] [PATCH RFC 2/2] numa: make -numa parser dynamically allocate CPUs masks

2016-11-16 Thread Eduardo Habkost
On Wed, Nov 16, 2016 at 05:02:56PM +0100, Igor Mammedov wrote:
> so it won't impose an additional limits on max_cpus limits
> supported by different targets.
> 
> It removes global MAX_CPUMASK_BITS constant and need to
> bump it up whenever max_cpus is being increased for
> a target above MAX_CPUMASK_BITS value.
> 
> Use runtime max_cpus value instead to allocate sufficiently
> sized node_cpu bitmasks in numa parser.
> 
> Signed-off-by: Igor Mammedov 
> ---
>  include/sysemu/numa.h   |  2 +-
>  include/sysemu/sysemu.h |  7 ---
>  numa.c  | 19 ---
>  vl.c|  5 -
>  4 files changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 4da808a..8f09dcf 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -17,7 +17,7 @@ struct numa_addr_range {
>  
>  typedef struct node_info {
>  uint64_t node_mem;
> -DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
> +unsigned long *node_cpu;
>  struct HostMemoryBackend *node_memdev;
>  bool present;
>  QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 66c6f15..cccde56 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -168,13 +168,6 @@ extern int mem_prealloc;
>  #define MAX_NODES 128
>  #define NUMA_NODE_UNASSIGNED MAX_NODES
>  
> -/* The following shall be true for all CPUs:
> - *   cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS
> - *
> - * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS.
> - */
> -#define MAX_CPUMASK_BITS 288
> -

Nice!

>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
>  const char *name;
> diff --git a/numa.c b/numa.c
> index 9c09e45..5542e40 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -266,20 +266,20 @@ static char *enumerate_cpus(unsigned long *cpus, int max_cpus)
>  static void validate_numa_cpus(void)
>  {
>  int i;
> -DECLARE_BITMAP(seen_cpus, MAX_CPUMASK_BITS);
> +unsigned long *seen_cpus = bitmap_new(max_cpus);
>  
> -bitmap_zero(seen_cpus, MAX_CPUMASK_BITS);
> +bitmap_zero(seen_cpus, max_cpus);

bitmap_new() already returns a zeroed bitmap.

>  for (i = 0; i < nb_numa_nodes; i++) {
> -if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu,
> -  MAX_CPUMASK_BITS)) {
> +if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) {
>  bitmap_and(seen_cpus, seen_cpus,
> -   numa_info[i].node_cpu, MAX_CPUMASK_BITS);
> +   numa_info[i].node_cpu, max_cpus);
>  error_report("CPU(s) present in multiple NUMA nodes: %s",
>   enumerate_cpus(seen_cpus, max_cpus));
> +bitmap_free(seen_cpus);
>  exit(EXIT_FAILURE);
>  }
>  bitmap_or(seen_cpus, seen_cpus,
> -  numa_info[i].node_cpu, MAX_CPUMASK_BITS);
> +  numa_info[i].node_cpu, max_cpus);
>  }
>  
>  if (!bitmap_full(seen_cpus, max_cpus)) {
> @@ -291,12 +291,17 @@ static void validate_numa_cpus(void)
>   "in NUMA config");
>  g_free(msg);
>  }
> +bitmap_free(seen_cpus);

See comment about bitmap_free() on patch 1/2. I think g_free() is
good enough (unless you really want to review all callers of
bitmap_[try_]new()).
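
To illustrate both review comments (a freshly allocated bitmap is already
zeroed, and plain freeing is sufficient), here is a minimal stand-alone
sketch.  It is not QEMU's bitmap code: every sketch_* name is hypothetical,
and calloc()/free() stand in for whatever allocator the real helpers use.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
    (((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical stand-in for a bitmap allocator: calloc() zeroes the
 * storage, so no separate zeroing pass is needed afterwards. */
static unsigned long *sketch_bitmap_new(size_t nbits)
{
    unsigned long *map;

    assert(nbits > 0);
    map = calloc(SKETCH_BITS_TO_LONGS(nbits), sizeof(unsigned long));
    assert(map != NULL);
    return map;
}

static void sketch_set_bit(unsigned long *map, size_t bit)
{
    map[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

static bool sketch_test_bit(const unsigned long *map, size_t bit)
{
    return (map[bit / SKETCH_BITS_PER_LONG] >>
            (bit % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* True if any of the first nbits bits is set in both maps. */
static bool sketch_bitmap_intersects(const unsigned long *a,
                                     const unsigned long *b, size_t nbits)
{
    size_t i;

    for (i = 0; i < nbits; i++) {
        if (sketch_test_bit(a, i) && sketch_test_bit(b, i)) {
            return true;
        }
    }
    return false;
}
```

Because the size is a runtime value, a caller can size the map from the
configured CPU count instead of a compile-time maximum, and release it with
a plain free().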

>  }
>  
>  void parse_numa_opts(MachineClass *mc)
>  {
>  int i;
>  
> +for (i = 0; i < MAX_NODES; i++) {
> +numa_info[i].node_cpu = bitmap_new(max_cpus);
> +}
> +
>  if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) {
>  exit(1);
>  }
> @@ -362,7 +367,7 @@ void parse_numa_opts(MachineClass *mc)
>  numa_set_mem_ranges();
>  
>  for (i = 0; i < nb_numa_nodes; i++) {
> -if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
> +if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) {
>  break;
>  }
>  }
> diff --git a/vl.c b/vl.c
> index d77dd86..37790e5 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1277,11 +1277,6 @@ static void smp_parse(QemuOpts *opts)
>  
>  max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
>  
> -if (max_cpus > MAX_CPUMASK_BITS) {
> -error_report("unsupported number of maxcpus");
> -exit(1);
> -}
> -
>  if (max_cpus < cpus) {
>  error_report("maxcpus must be equal to or greater than smp");
>  exit(1);
> -- 
> 2.7.4
> 

-- 
Eduardo



Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Laszlo Ersek
On 11/16/16 13:47, Paolo Bonzini wrote:
> 
>> If the consensus is that the patch is a QEMU bugfix (as opposed to a
>> feature) and that it is eligible for the currently supported upstream
>> stable branches, that's the best, no doubt.
> 
> The currently supported upstream stable branches is just 2.7. :)
> 
> I'm okay with bending the rules and including it in 2.8, but it's
> worrisome that you also needed to go back from relaxed to traditional
> delivery, meaning that old QEMU + new OVMF will take ages to boot.
> 
> If this is the case, I still think this needs some kind of discovery
> mechanism, unless OVMF can just say "things were too broken, stop
> supporting SMM on QEMUs older than 2.8".
> 
> For example:
> 
> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
> setting is used for the PCD; this would be backwards compatibility mode.

Okay, but this still means that the PCD has to become dynamic, and we
must set the PCD earlier (likely in PlatformPei) based on something.

I guess that's what the next paragraph is about:

> - we could have another magic 0xB2 value, which is implemented directly
> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> to detect the new feature.  It can fail to start if using traditional
> AP and the new feature is not there.

Please explain in more detail. If I write to 0xB2 (by invoking the
Trigger() method or somehow else), then on old QEMU's that will raise a
sync / unicast SMI. The SMI handler in edk2 will run, but no request
parameters will have been set up by OVMF, so the SMI handler will do...
no clue what. I don't think this is a good idea.

My preference is fw_cfg ATM. It provides a proven, flexible and
extensible interface (it's easy to add new files for future features).
If we expect more knobs in the area, I can modify my proposal to use
"etc/smi/broadcast", so we can add more files under "etc/smi/" later.

Do you have any specific arguments against fw_cfg? As I suggested in my
previous email, with fw_cfg I can implement the change in OVMF such that
the default behavior wouldn't change -- the default delivery would
remain relaxed, and the broadcast wouldn't be requested, unless the
fw_cfg file told OVMF otherwise.

> By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> would make QEMU use broadcast behavior for all values in the 0x10-0xff
> range, or something like that.

Are we talking control/command (0xB2) or scratch/data (0xB3) register
values? My patches currently use the scratch/data register to provide
the hint to QEMU; that register is less likely to interfere with
anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch
uses command/control values to distinguish the called functions. Should
we keep the broadcast / unicast decision separate from the
control/command value ?

Thanks
Laszlo

> 
> Paolo
> 
>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
>> correct; when I was writing the OVMF docs, I must have misunderstood the
>> requirements and needlessly required 2.5+; 2.4+ should have been fine.)
>>
>> Which means the fix should be backported as far as stable-2.4.
>>
>> Should we proceed with that? CC'ing Mike Roth and the stable list.
>>
>> Thanks!
>> Laszlo
>>
>>>
>>>
>
> Paolo
>
>> ---
>>  hw/isa/lpc_ich9.c | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>> index 10d1ee8b9310..f2fe644fdaa4 100644
>> --- a/hw/isa/lpc_ich9.c
>> +++ b/hw/isa/lpc_ich9.c
>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)
>>  
>>  /* APM */
>>  
>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
>> +
>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>>  {
>>  ICH9LPCState *lpc = arg;
>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>>  
>>  /* SMI_EN = PMBASE + 30. SMI control and enable register */
>>  if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
>> -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>> +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
>> +CPUState *cs;
>> +
>> +CPU_FOREACH(cs) {
>> +cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>> +}
>> +} else {
>> +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>> +}
>>  }
>>  }
>>  
>>
>>
>>




[Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode

2016-11-16 Thread Stefan Hajnoczi
v2:
 * Uninitialized node->deleted gone [Fam]
 * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
   created a weird step pattern [Fam]
 * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
   (actually I think Paolo had more in mind but this is the first step)
 * Only poll when all event loop resources support it [Paolo]
 * Added run_poll_handlers_begin/end trace events for perf analysis
 * Sorry, Christian, no virtqueue kick suppression yet

Recent performance investigation work done by Karl Rister shows that the
guest->host notification takes around 20 us.  This is more than the "overhead"
of QEMU itself (e.g. block layer).

One way to avoid the costly exit is to use polling instead of notification.
The main drawback of polling is that it consumes CPU resources.  In order to
benefit performance the host must have extra CPU cycles available on physical
CPUs that aren't used by the guest.

This is an experimental AioContext polling implementation.  It adds a polling
callback into the event loop.  Polling functions are implemented for virtio-blk
virtqueue guest->host kick and Linux AIO completion.

The QEMU_AIO_POLL_MAX_NS environment variable sets the number of nanoseconds to
poll before entering the usual blocking poll(2) syscall.  Try setting this
variable to the time from old request completion to new virtqueue kick.

By default no polling is done.  Set QEMU_AIO_POLL_MAX_NS to get any
polling!
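
As a rough illustration of the "off unless the variable is set" behaviour
described above, the following sketch reads the variable and treats a
missing or malformed value as zero nanoseconds (no polling).  The sketch_*
helpers are hypothetical; the parsing code in the actual series is not shown
in this message and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a nanosecond count from a string.
 * Returns 0 (polling disabled) for NULL, empty, malformed or negative input. */
static int64_t sketch_parse_poll_max_ns(const char *value)
{
    char *end;
    long long ns;

    if (value == NULL || *value == '\0') {
        return 0;
    }
    errno = 0;
    ns = strtoll(value, &end, 10);
    if (errno != 0 || *end != '\0' || ns < 0) {
        return 0;
    }
    return (int64_t)ns;
}

/* Read the environment variable named in the cover letter. */
static int64_t sketch_poll_max_ns_from_env(void)
{
    int64_t ns = sketch_parse_poll_max_ns(getenv("QEMU_AIO_POLL_MAX_NS"));

    assert(ns >= 0);
    return ns;
}
```

With this shape, an unset variable and an explicit "0" behave the same way:
the event loop goes straight to the blocking syscall.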

Stefan Hajnoczi (4):
  aio: add AioPollFn and io_poll() interface
  aio: add polling mode to AioContext
  virtio: poll virtqueues for new buffers
  linux-aio: poll ring for completions

 aio-posix.c | 115 ++--
 async.c |  14 +-
 block/curl.c|   8 +--
 block/iscsi.c   |   3 +-
 block/linux-aio.c   |  19 +++-
 block/nbd-client.c  |   8 +--
 block/nfs.c |   7 +--
 block/sheepdog.c|  26 +-
 block/ssh.c |   4 +-
 block/win32-aio.c   |   4 +-
 hw/virtio/virtio.c  |  18 ++-
 include/block/aio.h |   8 ++-
 iohandler.c |   2 +-
 nbd/server.c|   9 ++--
 stubs/set-fd-handler.c  |   1 +
 tests/test-aio.c|   4 +-
 trace-events|   4 ++
 util/event_notifier-posix.c |   2 +-
 18 files changed, 207 insertions(+), 49 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v2 1/4] aio: add AioPollFn and io_poll() interface

2016-11-16 Thread Stefan Hajnoczi
The new AioPollFn io_poll() argument to aio_set_fd_handler() and
aio_set_event_notifier() is used in the next patch.

Keep this code change separate due to the number of files it touches.

Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c |  8 +---
 async.c |  5 +++--
 block/curl.c|  8 
 block/iscsi.c   |  3 ++-
 block/linux-aio.c   |  4 ++--
 block/nbd-client.c  |  8 
 block/nfs.c |  7 ---
 block/sheepdog.c| 26 +-
 block/ssh.c |  4 ++--
 block/win32-aio.c   |  4 ++--
 hw/virtio/virtio.c  |  4 ++--
 include/block/aio.h |  5 -
 iohandler.c |  2 +-
 nbd/server.c|  9 -
 stubs/set-fd-handler.c  |  1 +
 tests/test-aio.c|  4 ++--
 util/event_notifier-posix.c |  2 +-
 17 files changed, 56 insertions(+), 48 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index e13b9ab..4379c13 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -200,6 +200,7 @@ void aio_set_fd_handler(AioContext *ctx,
 bool is_external,
 IOHandler *io_read,
 IOHandler *io_write,
+AioPollFn *io_poll,
 void *opaque)
 {
 AioHandler *node;
@@ -258,10 +259,11 @@ void aio_set_fd_handler(AioContext *ctx,
 void aio_set_event_notifier(AioContext *ctx,
 EventNotifier *notifier,
 bool is_external,
-EventNotifierHandler *io_read)
+EventNotifierHandler *io_read,
+AioPollFn *io_poll)
 {
-aio_set_fd_handler(ctx, event_notifier_get_fd(notifier),
-   is_external, (IOHandler *)io_read, NULL, notifier);
+aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external,
+   (IOHandler *)io_read, NULL, io_poll, notifier);
 }
 
 bool aio_prepare(AioContext *ctx)
diff --git a/async.c b/async.c
index b2de360..c8fbd63 100644
--- a/async.c
+++ b/async.c
@@ -282,7 +282,7 @@ aio_ctx_finalize(GSource *source)
 }
 qemu_mutex_unlock(&ctx->bh_lock);
 
-aio_set_event_notifier(ctx, &ctx->notifier, false, NULL);
+aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
 event_notifier_cleanup(&ctx->notifier);
 qemu_rec_mutex_destroy(&ctx->lock);
 qemu_mutex_destroy(&ctx->bh_lock);
@@ -366,7 +366,8 @@ AioContext *aio_context_new(Error **errp)
 aio_set_event_notifier(ctx, &ctx->notifier,
false,
(EventNotifierHandler *)
-   event_notifier_dummy_cb);
+   event_notifier_dummy_cb,
+   NULL);
 #ifdef CONFIG_LINUX_AIO
 ctx->linux_aio = NULL;
 #endif
diff --git a/block/curl.c b/block/curl.c
index 0404c1b..792fef8 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -192,19 +192,19 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action,
 switch (action) {
 case CURL_POLL_IN:
 aio_set_fd_handler(s->aio_context, fd, false,
-   curl_multi_read, NULL, state);
+   curl_multi_read, NULL, NULL, state);
 break;
 case CURL_POLL_OUT:
 aio_set_fd_handler(s->aio_context, fd, false,
-   NULL, curl_multi_do, state);
+   NULL, curl_multi_do, NULL, state);
 break;
 case CURL_POLL_INOUT:
 aio_set_fd_handler(s->aio_context, fd, false,
-   curl_multi_read, curl_multi_do, state);
+   curl_multi_read, curl_multi_do, NULL, state);
 break;
 case CURL_POLL_REMOVE:
 aio_set_fd_handler(s->aio_context, fd, false,
-   NULL, NULL, NULL);
+   NULL, NULL, NULL, NULL);
 break;
 }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 71bd523..76d0308 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -362,6 +362,7 @@ iscsi_set_events(IscsiLun *iscsilun)
false,
(ev & POLLIN) ? iscsi_process_read : NULL,
(ev & POLLOUT) ? iscsi_process_write : NULL,
+   NULL,
iscsilun);
 iscsilun->events = ev;
 }
@@ -1524,7 +1525,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs)
 IscsiLun *iscsilun = bs->opaque;
 
 aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi),
-   false, NULL, NULL, NULL);
+   false, NULL, NULL, NULL, NULL);
 iscsilun->events = 0;
 
 if (iscsilun->nop_timer) {
diff --git a/block/linux-aio.c b/block/

[Qemu-devel] [PATCH v2 4/4] linux-aio: poll ring for completions

2016-11-16 Thread Stefan Hajnoczi
The Linux AIO userspace ABI includes a ring that is shared with the
kernel.  This allows userspace programs to process completions without
system calls.

Add an AioContext poll handler to check for completions in the ring.

Signed-off-by: Stefan Hajnoczi 
---
 block/linux-aio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 69c4ed5..03ab741 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -255,6 +255,20 @@ static void qemu_laio_completion_cb(EventNotifier *e)
 }
 }
 
+static bool qemu_laio_poll_cb(void *opaque)
+{
+EventNotifier *e = opaque;
+LinuxAioState *s = container_of(e, LinuxAioState, e);
+struct io_event *events;
+
+if (!io_getevents_peek(s->ctx, &events)) {
+return false;
+}
+
+qemu_laio_process_completions_and_submit(s);
+return true;
+}
+
 static void laio_cancel(BlockAIOCB *blockacb)
 {
 struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@@ -448,7 +462,8 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
 s->aio_context = new_context;
 s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
 aio_set_event_notifier(new_context, &s->e, false,
-   qemu_laio_completion_cb, NULL);
+   qemu_laio_completion_cb,
+   qemu_laio_poll_cb);
 }
 
 LinuxAioState *laio_init(void)
-- 
2.7.4




[Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext

2016-11-16 Thread Stefan Hajnoczi
The AioContext event loop uses ppoll(2) or epoll_wait(2) to wait until file
descriptors become ready or a timer expires.  In cases like virtqueues, Linux
AIO, and ThreadPool it is technically possible to wait for events via
polling (i.e. continuously checking for events without blocking).

Polling can be faster than blocking syscalls because file descriptors,
the process scheduler, and system calls are bypassed.

The main disadvantage to polling is that it increases CPU utilization.
In classic polling configuration a full host CPU thread might run at
100% to respond to events as quickly as possible.  This patch implements
a timeout so we fall back to blocking syscalls if polling detects no
activity.  After the timeout no CPU cycles are wasted on polling until
the next event loop iteration.

This patch implements an experimental polling mode that can be
controlled with the QEMU_AIO_POLL_MAX_NS environment
variable.  The aio_poll() event loop function will attempt to poll
instead of using blocking syscalls.

The run_poll_handlers_begin() and run_poll_handlers_end() trace events
are added to aid performance analysis and troubleshooting.  If you need
to know whether polling mode is being used, trace these events to find
out.
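
The "poll with a timeout, then fall back to blocking" idea described above
can be sketched in isolation as follows.  This is only an illustration under
assumptions: the SketchHandler type and sketch_* functions are hypothetical,
and the wall clock from C11 timespec_get() stands in for QEMU's clock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical poll callback: returns true if it made progress. */
typedef bool SketchPollFn(void *opaque);

typedef struct {
    SketchPollFn *poll;
    void *opaque;
} SketchHandler;

static int64_t sketch_now_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Busy-poll all handlers until one reports progress or max_ns elapses.
 * A false return tells the caller to fall back to a blocking wait. */
static bool sketch_run_poll_handlers(SketchHandler *handlers, size_t n,
                                     int64_t max_ns)
{
    int64_t end_time;

    assert(max_ns >= 0);
    end_time = sketch_now_ns() + max_ns;

    do {
        bool progress = false;
        size_t i;

        for (i = 0; i < n; i++) {
            if (handlers[i].poll(handlers[i].opaque)) {
                progress = true;
            }
        }
        if (progress) {
            return true;
        }
    } while (sketch_now_ns() < end_time);

    return false;
}

/* Example handler: reports progress once its counter has reached zero. */
static bool sketch_countdown_poll(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The handlers are always run at least once, so a max_ns of zero still gives
each source one non-blocking check before the caller blocks.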

Signed-off-by: Stefan Hajnoczi 
---
 aio-posix.c | 107 +++-
 async.c |  11 +-
 include/block/aio.h |   3 ++
 trace-events|   4 ++
 4 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/aio-posix.c b/aio-posix.c
index 4379c13..5e5a561 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -18,6 +18,8 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 #include "qemu/sockets.h"
+#include "qemu/cutils.h"
+#include "trace.h"
 #ifdef CONFIG_EPOLL_CREATE1
 #include <sys/epoll.h>
 #endif
@@ -27,12 +29,16 @@ struct AioHandler
 GPollFD pfd;
 IOHandler *io_read;
 IOHandler *io_write;
+AioPollFn *io_poll;
 int deleted;
 void *opaque;
 bool is_external;
 QLIST_ENTRY(AioHandler) node;
 };
 
+/* How long to poll AioPollHandlers before monitoring file descriptors */
+static int64_t aio_poll_max_ns;
+
 #ifdef CONFIG_EPOLL_CREATE1
 
 /* The fd number threashold to switch to epoll */
@@ -206,11 +212,12 @@ void aio_set_fd_handler(AioContext *ctx,
 AioHandler *node;
 bool is_new = false;
 bool deleted = false;
+int poll_disable_cnt = 0;
 
 node = find_aio_handler(ctx, fd);
 
 /* Are we deleting the fd handler? */
-if (!io_read && !io_write) {
+if (!io_read && !io_write && !io_poll) {
 if (node == NULL) {
 return;
 }
@@ -229,6 +236,10 @@ void aio_set_fd_handler(AioContext *ctx,
 QLIST_REMOVE(node, node);
 deleted = true;
 }
+
+if (!node->io_poll) {
+poll_disable_cnt = -1;
+}
 } else {
 if (node == NULL) {
 /* Alloc and insert if it's not already there */
@@ -238,10 +249,22 @@ void aio_set_fd_handler(AioContext *ctx,
 
 g_source_add_poll(&ctx->source, &node->pfd);
 is_new = true;
+
+if (!io_poll) {
+poll_disable_cnt = 1;
+}
+} else {
+if (!node->io_poll && io_poll) {
+poll_disable_cnt = -1;
+} else if (node->io_poll && !io_poll) {
+poll_disable_cnt = 1;
+}
 }
+
 /* Update handler with latest information */
 node->io_read = io_read;
 node->io_write = io_write;
+node->io_poll = io_poll;
 node->opaque = opaque;
 node->is_external = is_external;
 
@@ -251,6 +274,9 @@ void aio_set_fd_handler(AioContext *ctx,
 
 aio_epoll_update(ctx, node, is_new);
 aio_notify(ctx);
+
+ctx->poll_disable_cnt += poll_disable_cnt;
+
 if (deleted) {
 g_free(node);
 }
@@ -268,6 +294,7 @@ void aio_set_event_notifier(AioContext *ctx,
 
 bool aio_prepare(AioContext *ctx)
 {
+/* TODO run poll handlers? */
 return false;
 }
 
@@ -402,6 +429,56 @@ static void add_pollfd(AioHandler *node)
 npfd++;
 }
 
+/* run_poll_handlers:
+ * @ctx: the AioContext
+ * @max_ns: maximum time to poll for, in nanoseconds
+ *
+ * Polls for a given time.
+ *
+ * Note that ctx->notify_me must be non-zero so this function can detect
+ * aio_notify().
+ *
+ * Note that the caller must have incremented ctx->walking_handlers.
+ *
+ * Returns: true if progress was made, false otherwise
+ */
+static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
+{
+bool progress = false;
+int64_t end_time;
+
+assert(ctx->notify_me);
+assert(ctx->walking_handlers > 0);
+assert(ctx->poll_disable_cnt == 0);
+
+trace_run_poll_handlers_begin(ctx, max_ns);
+
+end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns;
+
+do {
+AioHandler *node;
+
+/* Bail if aio_notify() was called (e.g. BH was scheduled) */
+if (atomic_read(&ctx->notified)) {
+progres

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Laszlo Ersek
On 11/16/16 14:18, Michael S. Tsirkin wrote:
> On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote:
>>
>>> If the consensus is that the patch is a QEMU bugfix (as opposed to a
>>> feature) and that it is eligible for the currently supported upstream
>>> stable branches, that's the best, no doubt.
>>
>> The currently supported upstream stable branches is just 2.7. :)
>>
>> I'm okay with bending the rules and including it in 2.8, but it's
>> worrisome that you also needed to go back from relaxed to traditional
>> delivery, meaning that old QEMU + new OVMF will take ages to boot.
>>
>> If this is the case, I still think this needs some kind of discovery
>> mechanism, unless OVMF can just say "things were too broken, stop
>> supporting SMM on QEMUs older than 2.8".
>>
>> For example:
>>
>> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
>> setting is used for the PCD; this would be backwards compatibility mode.
>>
>> - we could have another magic 0xB2 value, which is implemented directly
>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
>> to detect the new feature.  It can fail to start if using traditional
>> AP and the new feature is not there.
> 
> If we keep collecting these magic values, should architect it
> and do a host/guest bitmap like virtio does?

A feature bitmap is not a bad idea; I can modify my proposal to say,
'"etc/smi/features" is a little-endian uint64_t feature bitmap, where
bit #0 is the availability of broadcast SMIs. Request it by writing 'Q'
to STS before triggering an SMI via writing CNT'.

Another example where we use a feature bitmap is fw_cfg itself (the DMA
capability is signaled by bit 1).

However, feature *negotiation* is overkill, in my opinion.
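
For concreteness, here is a small sketch of how a guest could interpret the
proposed "etc/smi/features" blob.  It only decodes a little-endian uint64_t
and tests bit #0; how the blob is fetched from fw_cfg is left out, and all
sketch_* / SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit assignment taken from the proposal above: bit #0 = broadcast SMI. */
#define SKETCH_SMI_FEAT_BROADCAST (UINT64_C(1) << 0)

/* Decode an 8-byte little-endian value regardless of host endianness. */
static uint64_t sketch_le64_decode(const uint8_t *buf, size_t len)
{
    uint64_t v = 0;
    size_t i;

    assert(len == 8);
    for (i = 0; i < 8; i++) {
        v |= (uint64_t)buf[i] << (8 * i);
    }
    return v;
}

/* A missing or wrongly sized file means "feature not available", so the
 * guest keeps its historical (unicast) behaviour on older hosts. */
static bool sketch_smi_broadcast_available(const uint8_t *blob, size_t len)
{
    if (blob == NULL || len != 8) {
        return false;
    }
    return (sketch_le64_decode(blob, len) & SKETCH_SMI_FEAT_BROADCAST) != 0;
}
```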

> 
>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I
>> would make QEMU use broadcast behavior for all values in the 0x10-0xff
>> range, or something like that.
>>
>> Paolo
> 
> It bothers me with all these ideas is that it's PV.
> Unavoidable?

It seems so, yes -- as I understand it, the software-initiated SMI on
bare metal Q35 is meant to be broadcast unconditionally, but we had
diverged from that in our Q35 implementation, historically. SeaBIOS came
to rely on the unicast nature of QEMU's SMI (AIUI) and now we have to
invent a way to select the non-historical broadcast.

(

BTW, I foresee further Frankensteinization of Q35, as the maximum amount
of SMRAM (TSEG) it provides, by spec, is 8MB, and that might not be
enough for a very large VCPU count.

(The SMM stack was originally tested against 255 VCPUs, yes, but the
VCPU max continues to grow, plus edk2 developers keep adding SMM
features that require more SMRAM -- sometimes more SMRAM even per CPU.)

We have one unused bit pattern left in the TSEG_SZ bit field of the
ESMRAMC register, namely binary 11, which stands for "reserved". We
might want to commandeer that down the line, and associate a really
large SMRAM / TSEG size with it -- 128MB or 256MB, for example. Or, we
could use it to signal some other way for TSEG size configuration.

The TSEG is carved out of the end of the <4GB RAM, so larger TSEGs than
8MB should fit, as long as the guest is started with enough memory.

Anyway, I digress...

)

Thanks
Laszlo

> 
>>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
>>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
>>> correct; when I was writing the OVMF docs, I must have misunderstood the
>>> requirements and needlessly required 2.5+; 2.4+ should have been fine.)
>>>
>>> Which means the fix should be backported as far as stable-2.4.
>>>
>>> Should we proceed with that? CC'ing Mike Roth and the stable list.
>>>
>>> Thanks!
>>> Laszlo
>>>


>>
>> Paolo
>>
>>> ---
>>>  hw/isa/lpc_ich9.c | 12 +++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>> index 10d1ee8b9310..f2fe644fdaa4 100644
>>> --- a/hw/isa/lpc_ich9.c
>>> +++ b/hw/isa/lpc_ich9.c
>>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool smm_enabled)
>>>  
>>>  /* APM */
>>>  
>>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q'
>>> +
>>>  static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>>>  {
>>>  ICH9LPCState *lpc = arg;
>>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>>>  
>>>  /* SMI_EN = PMBASE + 30. SMI control and enable register */
>>>  if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
>>> -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>>> +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) {
>>> +CPUState *cs;
>>> +
>>> +CPU_FOREACH(cs) {
>>> +cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>>> +

[Qemu-devel] [PATCH v2 3/4] virtio: poll virtqueues for new buffers

2016-11-16 Thread Stefan Hajnoczi
Add an AioContext poll handler to detect new virtqueue buffers without
waiting for a guest->host notification.

Signed-off-by: Stefan Hajnoczi 
---
 hw/virtio/virtio.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8985a2f..982ba85 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2015,13 +2015,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
 }
 }
 
+static bool virtio_queue_host_notifier_aio_poll(void *opaque)
+{
+EventNotifier *n = opaque;
+VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+
+if (virtio_queue_empty(vq)) {
+return false;
+}
+
+virtio_queue_notify_aio_vq(vq);
+return true;
+}
+
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
                                                 VirtIOHandleOutput handle_output)
 {
 if (handle_output) {
 vq->handle_aio_output = handle_output;
 aio_set_event_notifier(ctx, &vq->host_notifier, true,
-   virtio_queue_host_notifier_aio_read, NULL);
+   virtio_queue_host_notifier_aio_read,
+   virtio_queue_host_notifier_aio_poll);
 } else {
 aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
 /* Test and clear notifier before after disabling event,
-- 
2.7.4




Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Laszlo Ersek
On 11/16/16 15:05, Paolo Bonzini wrote:
> 
> 
> On 16/11/2016 14:18, Michael S. Tsirkin wrote:
>>> - we could have another magic 0xB2 value, which is implemented directly
>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
>>> to detect the new feature.  It can fail to start if using traditional
>>> AP and the new feature is not there.
>>
>> If we keep collecting these magic values, should architect it
>> and do a host/guest bitmap like virtio does?
> 
> The value written in 0xB3 can certainly be a feature bitmap.  For now we
> would have for example
> 
> bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
> bit 1-7   zero

Doable, but:
- doesn't address how OVMF learns about the broadcast SMI availability,
- the command value OVMF currently writes is 0.

How about this:
- etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0
stands for broadcast SMI availability
- 0xB2 is the command value (independent of 0xB3)
- 0xB3 is a guest feature bitmap (valid for the next request). SeaBIOS
reserves bit#0 already (uses values 0 and 1), so we can use the
remaining 7 bits for requesting features. Bit#1 (value 2) could be the
broadcast SMI.

This does resemble a kind of feature negotiation, except the host cannot
signal back an error (unsupported combination of features), like
virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.
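
To make the proposed split between the host bitmap and the 0xB3 request
bits concrete, here is a hedged sketch of both sides.  The bit positions
follow the proposal in this message; the names are hypothetical and nothing
here is existing QEMU or OVMF code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host side: etc/smi/features bit #0 advertises broadcast SMI. */
#define SKETCH_HOST_FEAT_BROADCAST  (UINT64_C(1) << 0)
/* Guest side: 0xB3 bit #0 is left to SeaBIOS, bit #1 requests broadcast. */
#define SKETCH_STS_SEABIOS_RESERVED (1u << 0)
#define SKETCH_STS_REQ_BROADCAST    (1u << 1)

/* Guest: compute the byte to write to 0xB3 before writing 0xB2. */
static uint8_t sketch_apm_sts_value(uint64_t host_features,
                                    bool want_broadcast)
{
    uint8_t sts = 0;

    if (want_broadcast && (host_features & SKETCH_HOST_FEAT_BROADCAST)) {
        sts |= SKETCH_STS_REQ_BROADCAST;
    }
    /* Never touch the bit that SeaBIOS already uses. */
    assert((sts & SKETCH_STS_SEABIOS_RESERVED) == 0);
    return sts;
}

/* Host: decide the delivery mode from the byte last written to 0xB3. */
static bool sketch_host_should_broadcast(uint8_t sts)
{
    return (sts & SKETCH_STS_REQ_BROADCAST) != 0;
}
```

A guest that sees no broadcast bit in the host bitmap simply writes 0 and
gets the historical unicast SMI.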

Thanks
Laszlo



[Qemu-devel] [PATCH 2/3] virtio: access ISR atomically

2016-11-16 Thread Paolo Bonzini
This will be needed once dataplane is able to set it outside
the big QEMU lock.
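
The access patterns this patch converts (set bits, acknowledge bits, and
read-to-clear) can be sketched with plain C11 atomics.  This swaps in
<stdatomic.h> for QEMU's own atomic macros purely for illustration;
SketchDevice and the sketch_* functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    _Atomic uint8_t isr;
} SketchDevice;

/* Set interrupt status bits; skip the read-modify-write when the bits
 * are already set, which keeps the common case to a single load. */
static void sketch_set_isr(SketchDevice *dev, uint8_t value)
{
    uint8_t old;

    assert(dev != NULL);
    old = atomic_load(&dev->isr);
    if ((old & value) != value) {
        atomic_fetch_or(&dev->isr, value);
    }
}

/* Acknowledge (clear) the given bits. */
static void sketch_ack_isr(SketchDevice *dev, uint8_t bits)
{
    atomic_fetch_and(&dev->isr, (uint8_t)~bits);
}

/* Read-to-clear: one exchange, so a bit set concurrently by another
 * thread is either returned here or still pending afterwards. */
static uint8_t sketch_read_and_clear_isr(SketchDevice *dev)
{
    return atomic_exchange(&dev->isr, 0);
}
```

The exchange is the important part: a separate load followed by a store of
zero could drop a bit that another thread sets in between.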

Signed-off-by: Paolo Bonzini 
---
v1->v2: squash syntax error fix from patch 3 [Christian]

 hw/virtio/virtio-mmio.c |  6 +++---
 hw/virtio/virtio-pci.c  |  9 +++--
 hw/virtio/virtio.c  | 18 +-
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index a30270f..17412cb 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -191,7 +191,7 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr offset, unsigned size)
 return virtio_queue_get_addr(vdev, vdev->queue_sel)
 >> proxy->guest_page_shift;
 case VIRTIO_MMIO_INTERRUPTSTATUS:
-return vdev->isr;
+return atomic_read(&vdev->isr);
 case VIRTIO_MMIO_STATUS:
 return vdev->status;
 case VIRTIO_MMIO_HOSTFEATURESSEL:
@@ -299,7 +299,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
 }
 break;
 case VIRTIO_MMIO_INTERRUPTACK:
-vdev->isr &= ~value;
+atomic_and(&vdev->isr, ~value);
 virtio_update_irq(vdev);
 break;
 case VIRTIO_MMIO_STATUS:
@@ -347,7 +347,7 @@ static void virtio_mmio_update_irq(DeviceState *opaque, uint16_t vector)
 if (!vdev) {
 return;
 }
-level = (vdev->isr != 0);
+level = (atomic_read(&vdev->isr) != 0);
 DPRINTF("virtio_mmio setting IRQ %d\n", level);
 qemu_set_irq(proxy->irq, level);
 }
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 97b32fe..521ba0b 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -73,7 +73,7 @@ static void virtio_pci_notify(DeviceState *d, uint16_t vector)
 msix_notify(&proxy->pci_dev, vector);
 else {
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-pci_set_irq(&proxy->pci_dev, vdev->isr & 1);
+pci_set_irq(&proxy->pci_dev, atomic_read(&vdev->isr) & 1);
 }
 }
 
@@ -449,8 +449,7 @@ static uint32_t virtio_ioport_read(VirtIOPCIProxy *proxy, uint32_t addr)
 break;
 case VIRTIO_PCI_ISR:
 /* reading from the ISR also clears it. */
-ret = vdev->isr;
-vdev->isr = 0;
+ret = atomic_xchg(&vdev->isr, 0);
 pci_irq_deassert(&proxy->pci_dev);
 break;
 case VIRTIO_MSI_CONFIG_VECTOR:
@@ -1379,9 +1378,7 @@ static uint64_t virtio_pci_isr_read(void *opaque, hwaddr addr,
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint64_t val = vdev->isr;
-
-vdev->isr = 0;
+uint64_t val = atomic_xchg(&vdev->isr, 0);
 pci_irq_deassert(&proxy->pci_dev);
 
 return val;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index b7d5828..ecf13bd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -945,7 +945,7 @@ void virtio_reset(void *opaque)
 vdev->guest_features = 0;
 vdev->queue_sel = 0;
 vdev->status = 0;
-vdev->isr = 0;
+atomic_set(&vdev->isr, 0);
 vdev->config_vector = VIRTIO_NO_VECTOR;
 virtio_notify_vector(vdev, vdev->config_vector);
 
@@ -1318,10 +1318,18 @@ void virtio_del_queue(VirtIODevice *vdev, int n)
 vdev->vq[n].vring.num_default = 0;
 }
 
+static void virtio_set_isr(VirtIODevice *vdev, int value)
+{
+uint8_t old = atomic_read(&vdev->isr);
+if ((old & value) != value) {
+atomic_or(&vdev->isr, value);
+}
+}
+
 void virtio_irq(VirtQueue *vq)
 {
 trace_virtio_irq(vq);
-vq->vdev->isr |= 0x01;
+virtio_set_isr(vq->vdev, 0x1);
 virtio_notify_vector(vq->vdev, vq->vector);
 }
 
@@ -1355,7 +1363,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq)
 }
 
 trace_virtio_notify(vdev, vq);
-vdev->isr |= 0x01;
+virtio_set_isr(vq->vdev, 0x1);
 virtio_notify_vector(vdev, vq->vector);
 }
 
@@ -1364,7 +1372,7 @@ void virtio_notify_config(VirtIODevice *vdev)
 if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))
 return;
 
-vdev->isr |= 0x03;
+virtio_set_isr(vdev, 0x3);
 vdev->generation++;
 virtio_notify_vector(vdev, vdev->config_vector);
 }
@@ -1895,7 +1903,7 @@ void virtio_init(VirtIODevice *vdev, const char *name,
 
 vdev->device_id = device_id;
 vdev->status = 0;
-vdev->isr = 0;
+atomic_set(&vdev->isr, 0);
 vdev->queue_sel = 0;
 vdev->config_vector = VIRTIO_NO_VECTOR;
 vdev->vq = g_malloc0(sizeof(VirtQueue) * VIRTIO_QUEUE_MAX);
-- 
2.9.3





[Qemu-devel] [PATCH 1/3] virtio: introduce grab/release_ioeventfd to fix vhost

2016-11-16 Thread Paolo Bonzini
Following the recent refactoring of virtio notifiers [1], more specifically
the patch ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to
start/stop ioeventfd") that uses virtio_bus_set_host_notifier [2]
by default, core virtio code requires 'ioeventfd_started' to be set
to true/false when the host notifiers are configured.

When vhost is stopped and started, however, there is a stop followed by
another start. Since ioeventfd_started was never set to true, the 'stop'
operation triggered by virtio_bus_set_host_notifier() will not result
in a call to virtio_pci_ioeventfd_assign(assign=false). This leaves
the memory regions with stale notifiers and results in the next start
triggering the following assertion:

  kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
  Aborted

This patch reintroduces (hopefully in a cleaner way) the concept
that was present with ioeventfd_disabled before the refactoring.
When ioeventfd_grabbed>0, ioeventfd_started tracks whether ioeventfd
should be enabled or not, but ioeventfd is actually not started at
all until vhost releases the host notifiers.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07748.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07760.html
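
To make the grab/release idea concrete, here is a minimal standalone model of
the counter logic described above. It is only a sketch, not the code in this
patch: struct bus_model, its "active" field and the model_* helpers are
hypothetical names, and the start/stop behaviour while grabbed is an
assumption based on the description ("ioeventfd_started tracks whether
ioeventfd should be enabled").

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the bus state; not the real VirtioBusState. */
struct bus_model {
    unsigned grabbed;  /* models ioeventfd_grabbed */
    bool started;      /* models ioeventfd_started: "should be enabled" */
    bool active;       /* hypothetical: QEMU's own handlers really running */
};

/* Assumed behaviour: record the intent, act only when nobody holds a grab. */
void model_start(struct bus_model *b)
{
    if (b->started) {
        return;
    }
    if (b->grabbed == 0) {
        b->active = true;
    }
    b->started = true;
}

void model_stop(struct bus_model *b)
{
    if (!b->started) {
        return;
    }
    if (b->grabbed == 0) {
        b->active = false;
    }
    b->started = false;
}

/* First grab stops QEMU's handling but keeps the "should be enabled" intent. */
void model_grab(struct bus_model *b)
{
    if (b->grabbed == 0 && b->started) {
        b->active = false;
    }
    b->grabbed++;
}

/* Last release restarts QEMU's handling if it should be enabled. */
void model_release(struct bus_model *b)
{
    assert(b->grabbed != 0);
    if (--b->grabbed == 0 && b->started) {
        b->active = true;
    }
}
```

In this model a stop/start pair issued while vhost holds a grab only changes
the recorded intent, so nothing is assigned twice.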

Reported-by: Felipe Franciosi 
Reported-by: Christian Borntraeger 
Reported-by: Alex Williamson 
Fixes: ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to start/stop ioeventfd")
Signed-off-by: Paolo Bonzini 
Message-Id: <2016192855.26350-1-pbonz...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
v1->v2: more comments [Cornelia]

 hw/virtio/vhost.c  | 14 +-
 hw/virtio/virtio-bus.c | 58 ++
 hw/virtio/virtio.c | 16 
 include/hw/virtio/virtio-bus.h | 14 ++
 include/hw/virtio/virtio.h |  2 ++
 5 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 30aee88..f7f7023 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1214,17 +1214,17 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 {
 BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
-VirtioBusState *vbus = VIRTIO_BUS(qbus);
-VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
 int i, r, e;
 
-if (!k->ioeventfd_assign) {
+/* We will pass the notifiers to the kernel, make sure that QEMU
+ * doesn't interfere.
+ */
+r = virtio_device_grab_ioeventfd(vdev);
+if (r < 0) {
 error_report("binding does not support host notifiers");
-r = -ENOSYS;
 goto fail;
 }
 
-virtio_device_stop_ioeventfd(vdev);
 for (i = 0; i < hdev->nvqs; ++i) {
 r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i,
  true);
@@ -1244,7 +1244,7 @@ fail_vq:
 }
 assert (e >= 0);
 }
-virtio_device_start_ioeventfd(vdev);
+virtio_device_release_ioeventfd(vdev);
 fail:
 return r;
 }
@@ -1267,7 +1267,7 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev)
 }
 assert (r >= 0);
 }
-virtio_device_start_ioeventfd(vdev);
+virtio_device_release_ioeventfd(vdev);
 }
 
 /* Test and clear event pending status.
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index bf61f66..d6c0c72 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -147,6 +147,39 @@ void virtio_bus_set_vdev_config(VirtioBusState *bus, uint8_t *config)
 }
 }
 
+/* On success, ioeventfd ownership belongs to the caller.  */
+int virtio_bus_grab_ioeventfd(VirtioBusState *bus)
+{
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+
+/* vhost can be used even if ioeventfd=off in the proxy device,
+ * so do not check k->ioeventfd_enabled.
+ */
+if (!k->ioeventfd_assign) {
+return -ENOSYS;
+}
+
+if (bus->ioeventfd_grabbed == 0 && bus->ioeventfd_started) {
+virtio_bus_stop_ioeventfd(bus);
+/* Remember that we need to restart ioeventfd
+ * when ioeventfd_grabbed becomes zero.
+ */
+bus->ioeventfd_started = true;
+}
+bus->ioeventfd_grabbed++;
+return 0;
+}
+
+void virtio_bus_release_ioeventfd(VirtioBusState *bus)
+{
+assert(bus->ioeventfd_grabbed != 0);
+if (--bus->ioeventfd_grabbed == 0 && bus->ioeventfd_started) {
+/* Force virtio_bus_start_ioeventfd to act.  */
+bus->ioeventfd_started = false;
+virtio_bus_start_ioeventfd(bus);
+}
+}
+
 int virtio_bus_start_ioeventfd(VirtioBusState *bus)
 {
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
@@ -161,10 +194,14 @@ int virtio_bus_start_ioeventfd(VirtioBusState *bus)
 if (bus->ioeventfd_started) {
 return 0;
 }
-r = vdc->start_ioeventfd(vdev);
-if (r < 0) {
-error_report("%s: failed. Fallback to userspace (slower).", __

Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Paolo Bonzini
> I guess that's what the next paragraph is about:
> 
> > - we could have another magic 0xB2 value, which is implemented directly
> > in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > to detect the new feature.  It can fail to start if using traditional
> > AP and the new feature is not there.
> 
> Please explain in more detail. If I write to 0xB2 (by invoking the
> Trigger() method or somehow else), then on old QEMU's that will raise a
> sync / unicast SMI. The SMI handler in edk2 will run, but no request
> parameters will have been set up by OVMF, so the SMI handler will do...
> no clue what.

It should hopefully do nothing.  A spurious SMI (such as the one caused
by the write to 0xB2) should not crash OVMF.

SMBASE relocation uses IPIs, so my hope was to use the
SmmCpuFeaturesSmmRelocationComplete hook.

> My preference is fw_cfg ATM. It provides a proven, flexible and
> extensible interface (it's easy to add new files for future features).
> If we expect more knobs in the area, I can modify my proposal to use
> "etc/smi/broadcast", so we can add "etc/smi/" later.

Did you know there are only 16 entries for fw_cfg files? :)  And we're
already using 20 in the worst case:

genroms/linuxboot.bin
genroms/kvmvapic.bin
NVDIMM_DSM_MEM_FILE
"etc/smbios/smbios-tables"
"etc/smbios/smbios-anchor"
"etc/acpi/tables"
"etc/table-loader"
ACPI_BUILD_TPMLOG_FILE
ACPI_BUILD_RSDP_FILE
"etc/e820"
"etc/msr_feature_control"
"etc/reserved-memory-end"
"etc/pvpanic-port"
"etc/boot-menu-wait"
"bootsplash.jpg"
"etc/boot-fail-wait"
"etc/igd-opregion"
"etc/igd-bdsm-size"
"etc/extra-pci-roots"
"bootorder"

Therefore, so close to the release I'm a bit worried about doing
changes to fw_cfg or adding more fw_cfg files.  Though we just got
rid of one file for the number of CPUs, so I guess we might not care.

> Do you have any specific arguments against fw_cfg? As I suggested in my
> previous email, with fw_cfg I can implement the change in OVMF such that
> the default behavior wouldn't change -- the default delivery would
> remain relaxed, and the broadcast wouldn't be requested, unless the
> fw_cfg file told OVMF otherwise.
> 
> > By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> > would make QEMU use broadcast behavior for all values in the 0x10-0xff
> > range, or something like that.
> 
> Are we talking control/command (0xB2) or scratch/data (0xB3) register
> values? My patches currently use the scratch/data register to provide
> the hint to QEMU; that register is less likely to interfere with
> anything the SMM core in edk2 does.

Sorry I confused the two registers.  0xb3 is more or less unused as far
as I can see indeed.

Paolo



Re: [Qemu-devel] [PATCH v14 1/2] virtio-crypto: Add virtio crypto device specification

2016-11-16 Thread Halil Pasic


On 11/11/2016 10:23 AM, Gonglei wrote:
> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). Currently, the virtio crypto device provides
> the following crypto services: CIPHER, MAC, HASH, and AEAD.
> 
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
> 
> VIRTIO-153
> 
> Signed-off-by: Gonglei 
> CC: Michael S. Tsirkin 
> CC: Cornelia Huck 
> CC: Stefan Hajnoczi 
> CC: Lingli Deng 
> CC: Jani Kokkonen 
> CC: Ola Liljedahl 
> CC: Varun Sethi 
> CC: Zeng Xin 
> CC: Keating Brian 
> CC: Ma Liang J 
> CC: Griffin John 
> CC: Hanweidong 
> CC: Mihai Claudiu Caraman 
> ---
>  content.tex   |   2 +
>  virtio-crypto.tex | 945 
> ++
>  2 files changed, 947 insertions(+)
>  create mode 100644 virtio-crypto.tex
> 
> diff --git a/content.tex b/content.tex
> index 4b45678..ab75f78 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual},
>  \field{status_qualifier}, \field{status}, \field{response} and
>  \field{sense} fields.
> 
> +\input{virtio-crypto.tex}
> +
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> 
>  Currently there are three device-independent feature bits defined:
> diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> new file mode 100644
> index 000..9f7faf0
> --- /dev/null
> +++ b/virtio-crypto.tex
> @@ -0,0 +1,945 @@
> +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> +
> +The virtio crypto device is a virtual cryptography device as well as a kind 
> of
> +virtual hardware accelerator for virtual machines. The encryption and
> +decryption requests are placed in the data queue and are ultimately handled 
> by the
 ~~
The data queue can be misleading since it's rather any of the active data
queues.

> +backend crypto accelerators. The second queue is the control queue used to 
> create 

This could be confusing since it is a second type or kind of queue but
not necessarily the queue with index 1. 
 
> +or destroy sessions for symmetric algorithms and will control some advanced
> +features in the future. The virtio crypto device provides the following 
> crypto

Promising future advanced features seems to be out of scope for this
specification.

> +services: CIPHER, MAC, HASH, and AEAD.
> +
> +
> +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> +
> +20
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] dataq1
> +\item[\ldots]
> +\item[N-1] dataqN
> +\item[N] controlq
> +\end{description}
> +
> +N is set by \field{max_dataqueues}.
> +
> +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature bits}
> +
> +Undefined currently.

Could use "None currently defined." like entropy device.

> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout}
> +
> +The following driver-read-only configuration fields are defined:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_config {
> +le32 status;
> +le32 max_dataqueues;
> +le32 crypto_services;
> +/* Detailed algorithms mask */
> +le32 cipher_algo_l;
> +le32 cipher_algo_h;
> +le32 hash_algo;
> +le32 mac_algo_l;
> +le32 mac_algo_h;
> +le32 aead_algo;
> +/* Maximum length of cipher key */
> +le32 max_cipher_key_len;
> +/* Maximum length of authenticated key */
> +le32 max_auth_key_len;
> +le32 reserve;
> +/* Maximum size of each crypto request's content */
> +le64 max_size;
> +};
> +\end{lstlisting}
> +
> +The value of the \field{status} field is VIRTIO_CRYPTO_S_HW_READY or VIRTIO_CRYPTO_S_STARTED.
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
> +#define VIRTIO_CRYPTO_S_STARTED  (1 << 1)
> +\end{lstlisting}
> +

Could not really figure out what this status actually does and how it
relates to the device status field, if at all.

Furthermore I see no mention of VIRTIO_CRYPTO_S_STARTED except for this
one, so the only thing I can think of is that it's the initial value and
means hardware not ready (you state these are the only two values).

This however does not seem consistent with what your QEMU reference
implementation does. Another thing is that your implementations seem to
use VIRTIO_CRYPTO_S_HW_READY as a flag, but your specification would
prohibit combining flags (because you get another value).

There are more comments on this topic below.
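
To illustrate the flag-versus-value point, here is a small sketch contrasting
the two readings of the status field. The two #defines are copied from the
quoted spec text; the helper names status_valid_as_enum() and
status_hw_ready() are made up for this example and appear in neither the spec
nor the QEMU implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CRYPTO_S_HW_READY  (1 << 0)
#define VIRTIO_CRYPTO_S_STARTED   (1 << 1)

/* Reading 1 (the spec wording): status is exactly one of the two values. */
bool status_valid_as_enum(uint32_t status)
{
    return status == VIRTIO_CRYPTO_S_HW_READY ||
           status == VIRTIO_CRYPTO_S_STARTED;
}

/* Reading 2 (how a flag would be used): status is a bitmask. */
bool status_hw_ready(uint32_t status)
{
    return (status & VIRTIO_CRYPTO_S_HW_READY) != 0;
}
```

With both bits set the status is 3: the flag test still reports "ready", but
the value is neither of the two the specification allows.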

> +The following driver-read-only fields include \field{max_dataqueues}, which 
> specifies the
> +maximum number of data virtqueues (dataq1\ldots dataqN), and 
> \field{crypto_services},
> +which indicates the crypto services the virtio crypto supports.
> +
> +The following services are defined:
> +
> +\begin{lstlisting}
> +/* CIPH

Re: [Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext

2016-11-16 Thread Paolo Bonzini


On 16/11/2016 18:47, Stefan Hajnoczi wrote:
> The AioContext event loop uses ppoll(2) or epoll_wait(2) to monitor file
> descriptors or until a timer expires.  In cases like virtqueues, Linux
> AIO, and ThreadPool it is technically possible to wait for events via
> polling (i.e. continuously checking for events without blocking).
> 
> Polling can be faster than blocking syscalls because file descriptors,
> the process scheduler, and system calls are bypassed.
> 
> The main disadvantage to polling is that it increases CPU utilization.
> In classic polling configuration a full host CPU thread might run at
> 100% to respond to events as quickly as possible.  This patch implements
> a timeout so we fall back to blocking syscalls if polling detects no
> activity.  After the timeout no CPU cycles are wasted on polling until
> the next event loop iteration.
> 
> This patch implements an experimental polling mode that can be
> controlled with the QEMU_AIO_POLL_MAX_NS environment
> variable.  The aio_poll() event loop function will attempt to poll
> instead of using blocking syscalls.
> 
> The run_poll_handlers_begin() and run_poll_handlers_end() trace events
> are added to aid performance analysis and troubleshooting.  If you need
> to know whether polling mode is being used, trace these events to find
> out.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  aio-posix.c | 107 
> +++-
>  async.c |  11 +-
>  include/block/aio.h |   3 ++
>  trace-events|   4 ++
>  4 files changed, 123 insertions(+), 2 deletions(-)

Nice!


> diff --git a/aio-posix.c b/aio-posix.c
> index 4379c13..5e5a561 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -18,6 +18,8 @@
>  #include "block/block.h"
>  #include "qemu/queue.h"
>  #include "qemu/sockets.h"
> +#include "qemu/cutils.h"
> +#include "trace.h"
>  #ifdef CONFIG_EPOLL_CREATE1
>  #include <sys/epoll.h>
>  #endif
> @@ -27,12 +29,16 @@ struct AioHandler
>  GPollFD pfd;
>  IOHandler *io_read;
>  IOHandler *io_write;
> +AioPollFn *io_poll;
>  int deleted;
>  void *opaque;
>  bool is_external;
>  QLIST_ENTRY(AioHandler) node;
>  };
>  
> +/* How long to poll AioPollHandlers before monitoring file descriptors */
> +static int64_t aio_poll_max_ns;
> +
>  #ifdef CONFIG_EPOLL_CREATE1
>  
>  /* The fd number threashold to switch to epoll */
> @@ -206,11 +212,12 @@ void aio_set_fd_handler(AioContext *ctx,
>  AioHandler *node;
>  bool is_new = false;
>  bool deleted = false;
> +int poll_disable_cnt = 0;

poll_disable_cnt = !io_poll - !node->io_poll

?  Not the most readable thing, but effective...
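
For what it's worth, a quick standalone check that the one-liner agrees with
the if/else chain for the case where an existing handler is updated. This is
only a sketch: the two booleans stand in for "node->io_poll != NULL" and
"io_poll != NULL", and the function names are invented for the comparison. It
does not cover the deletion path, where the node goes away entirely.

```c
#include <stdbool.h>

/* Branch form, mirroring the update-existing-node case in the patch. */
int delta_branches(bool old_has_poll, bool new_has_poll)
{
    if (!old_has_poll && new_has_poll) {
        return -1;  /* one fewer handler without a poll function */
    } else if (old_has_poll && !new_has_poll) {
        return 1;   /* one more handler without a poll function */
    }
    return 0;
}

/* Arithmetic form: !io_poll - !node->io_poll */
int delta_arith(bool old_has_poll, bool new_has_poll)
{
    return !new_has_poll - !old_has_poll;
}
```

For a freshly allocated node, treating the old state as "has poll" (so the
second term is zero) gives the same result as the "if (!io_poll)" branch.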

>  node = find_aio_handler(ctx, fd);
>  
>  /* Are we deleting the fd handler? */
> -if (!io_read && !io_write) {
> +if (!io_read && !io_write && !io_poll) {
>  if (node == NULL) {
>  return;
>  }
> @@ -229,6 +236,10 @@ void aio_set_fd_handler(AioContext *ctx,
>  QLIST_REMOVE(node, node);
>  deleted = true;
>  }
> +
> +if (!node->io_poll) {
> +poll_disable_cnt = -1;
> +}
>  } else {
>  if (node == NULL) {
>  /* Alloc and insert if it's not already there */
> @@ -238,10 +249,22 @@ void aio_set_fd_handler(AioContext *ctx,
>  
>  g_source_add_poll(&ctx->source, &node->pfd);
>  is_new = true;
> +
> +if (!io_poll) {
> +poll_disable_cnt = 1;
> +}
> +} else {
> +if (!node->io_poll && io_poll) {
> +poll_disable_cnt = -1;
> +} else if (node->io_poll && !io_poll) {
> +poll_disable_cnt = 1;
> +}
>  }
> +
>  /* Update handler with latest information */
>  node->io_read = io_read;
>  node->io_write = io_write;
> +node->io_poll = io_poll;
>  node->opaque = opaque;
>  node->is_external = is_external;
>  
> @@ -251,6 +274,9 @@ void aio_set_fd_handler(AioContext *ctx,
>  
>  aio_epoll_update(ctx, node, is_new);
>  aio_notify(ctx);
> +
> +ctx->poll_disable_cnt += poll_disable_cnt;
> +
>  if (deleted) {
>  g_free(node);
>  }
> @@ -268,6 +294,7 @@ void aio_set_event_notifier(AioContext *ctx,
>  
>  bool aio_prepare(AioContext *ctx)
>  {
> +/* TODO run poll handlers? */
>  return false;
>  }
>  
> @@ -402,6 +429,56 @@ static void add_pollfd(AioHandler *node)
>  npfd++;
>  }
>  
> +/* run_poll_handlers:
> + * @ctx: the AioContext
> + * @max_ns: maximum time to poll for, in nanoseconds
> + *
> + * Polls for a given time.
> + *
> + * Note that ctx->notify_me must be non-zero so this function can detect
> + * aio_notify().
> + *
> + * Note that the caller must have incremented ctx->walking_handlers.
> + *
> + * Returns: true if progress was made, false otherwise
> + */
> +static bool run_poll_handlers(AioContext *ctx, int64_t max_ns)
> +{
> +bool p

[Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications

2016-11-16 Thread Paolo Bonzini
Dataplane has always omitted the step of setting ISR when
an interrupt is raised.  This caused little breakage, because the
specification actually says that ISR may not be updated in MSI mode.

Some versions of the Windows drivers however didn't clear MSI mode
correctly, and proceeded using polling mode (using ISR, not the used
ring index!) for crashdump and hibernation.  If it were just crashdump
and hibernation it would not be a big deal, but recent releases of
Windows do not really shut down, but rather log out and hibernate to
make the next startup faster.  Hence, this manifested as a more serious
hang during shutdown with e.g. Windows 8.1 and virtio-win 1.8.0 RPMs.
Newer versions fixed this, while older versions do not use MSI at all.

The failure has always been there for virtio dataplane, but it became
visible after commits 9ffe337 ("virtio-blk: always use dataplane path
if ioeventfd is active", 2016-10-30) and ad07cd6 ("virtio-scsi: always
use dataplane path if ioeventfd is active", 2016-10-30) made virtio-blk
and virtio-scsi always use the dataplane code under KVM.  The good news
therefore is that it was not a bug in the patches---they were doing
exactly what they were meant for, i.e. shaking out remaining dataplane bugs.

The fix is not hard, so it's worth accommodating the broken drivers.
The virtio_should_notify+event_notifier_set pair that is common to
virtio-blk and virtio-scsi dataplane is replaced with a new public
function virtio_notify_irqfd that also sets ISR.  The irqfd emulation
code now need not set ISR anymore, so virtio_irq is removed.
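
As a rough illustration of the ordering, here is a standalone sketch using
C11 atomics rather than QEMU's atomic_* macros. struct dev_model, set_isr()
and notify_irqfd_model() are hypothetical stand-ins: the "signals" counter
takes the place of event_notifier_set(), and the should_notify argument takes
the place of the virtio_should_notify() result.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy device: an ISR byte plus a counter standing in for the irqfd. */
struct dev_model {
    _Atomic uint8_t isr;
    unsigned signals;
};

/* Set ISR bits; skip the atomic read-modify-write if they are already set. */
void set_isr(_Atomic uint8_t *isr, uint8_t value)
{
    uint8_t old = atomic_load(isr);
    if ((old & value) != value) {
        atomic_fetch_or(isr, value);
    }
}

/* Assumed order: check whether to notify, set ISR, then signal the guest. */
void notify_irqfd_model(struct dev_model *d, bool should_notify)
{
    if (!should_notify) {
        return;
    }
    set_isr(&d->isr, 0x1);
    d->signals++;
}
```

A driver that polls ISR instead of the used ring index then sees the bit even
though the interrupt itself went through the irqfd.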

Signed-off-by: Paolo Bonzini 
---
 hw/block/dataplane/virtio-blk.c |  4 +---
 hw/scsi/virtio-scsi-dataplane.c |  7 ---
 hw/scsi/virtio-scsi.c   |  2 +-
 hw/virtio/trace-events  |  2 +-
 hw/virtio/virtio.c  | 20 
 include/hw/virtio/virtio-scsi.h |  1 -
 include/hw/virtio/virtio.h  |  2 +-
 7 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 90ef557..d1f9f63 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -68,9 +68,7 @@ static void notify_guest_bh(void *opaque)
 unsigned i = j + ctzl(bits);
 VirtQueue *vq = virtio_get_queue(s->vdev, i);
 
-if (virtio_should_notify(s->vdev, vq)) {
-event_notifier_set(virtio_queue_get_guest_notifier(vq));
-}
+virtio_notify_irqfd(s->vdev, vq);
 
 bits &= bits - 1; /* clear right-most bit */
 }
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index f2ea29d..6b8d0f0 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -95,13 +95,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
 return 0;
 }
 
-void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq *req)
-{
-if (virtio_should_notify(vdev, req->vq)) {
-event_notifier_set(virtio_queue_get_guest_notifier(req->vq));
-}
-}
-
 /* assumes s->ctx held */
 static void virtio_scsi_clear_aio(VirtIOSCSI *s)
 {
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 3e5ae6a..10fd687 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -69,7 +69,7 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 qemu_iovec_from_buf(&req->resp_iov, 0, &req->resp, req->resp_size);
 virtqueue_push(vq, &req->elem, req->qsgl.size + req->resp_iov.size);
 if (s->dataplane_started && !s->dataplane_fenced) {
-virtio_scsi_dataplane_notify(vdev, req);
+virtio_notify_irqfd(vdev, vq);
 } else {
 virtio_notify(vdev, vq);
 }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 8756cef..7b6f55e 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -5,7 +5,7 @@ virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "
 virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
 virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
 virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
-virtio_irq(void *vq) "vq %p"
+virtio_notify_irqfd(void *vdev, void *vq) "vdev %p vq %p"
 virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
 virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index ecf13bd..860ebdb 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1326,13 +1326,6 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
 }
 }
 
-void virtio_irq(VirtQueue *vq)
-{
-trace_virtio_irq(vq);
-virtio_set_isr(vq->vdev, 0x1);
-virtio_notify_vector(vq->vdev, vq->vector);
-}
-
 bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
 uint16_t old, new;
@@ -1356,6 +1349,17 @@ bool virtio_should_notify(VirtIODevice *vdev, VirtQ

[Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes

2016-11-16 Thread Paolo Bonzini
Patch 1 fixes vhost, patches 2-3 fix Windows hibernation.

Paolo

v1->v2: more comments [Cornelia]
squash syntax error fix from patch 3 into patch 2 [Christian]

Paolo Bonzini (3):
  virtio: introduce grab/release_ioeventfd to fix vhost
  virtio: access ISR atomically
  virtio: set ISR on dataplane notifications

 hw/block/dataplane/virtio-blk.c |  4 +--
 hw/scsi/virtio-scsi-dataplane.c |  7 -
 hw/scsi/virtio-scsi.c   |  2 +-
 hw/virtio/trace-events  |  2 +-
 hw/virtio/vhost.c   | 14 +-
 hw/virtio/virtio-bus.c  | 58 +
 hw/virtio/virtio-mmio.c |  6 ++---
 hw/virtio/virtio-pci.c  |  9 +++
 hw/virtio/virtio.c  | 46 +---
 include/hw/virtio/virtio-bus.h  | 14 ++
 include/hw/virtio/virtio-scsi.h |  1 -
 include/hw/virtio/virtio.h  |  4 ++-
 12 files changed, 117 insertions(+), 50 deletions(-)

-- 
2.9.3




[Qemu-devel] [PATCH for-2.9] qmp: Report QOM type name on query-cpu-definitions

2016-11-16 Thread Eduardo Habkost
The new typename attribute on query-cpu-definitions will be used
to help management software use device-list-properties to check
which properties can be set using -cpu or -global for the CPU
model.

Signed-off-by: Eduardo Habkost 
---
 qapi-schema.json| 4 +++-
 target-arm/helper.c | 1 +
 target-i386/cpu.c   | 1 +
 target-ppc/translate_init.c | 1 +
 target-s390x/cpu_models.c   | 1 +
 5 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/qapi-schema.json b/qapi-schema.json
index b0b4bf6..9a3bdd4 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3216,6 +3216,8 @@
 # @unavailable-features: #optional List of properties that prevent
 #the CPU model from running in the current
 #host. (since 2.8)
+# @typename: Type name that can be used as argument to @device-list-properties,
+#to introspect properties configurable using -cpu or -global.
 #
 # @unavailable-features is a list of QOM property names that
 # represent CPU model attributes that prevent the CPU from running.
@@ -3237,7 +3239,7 @@
 ##
 { 'struct': 'CpuDefinitionInfo',
   'data': { 'name': 'str', '*migration-safe': 'bool', 'static': 'bool',
-'*unavailable-features': [ 'str' ] } }
+'*unavailable-features': [ 'str' ], 'typename': 'str' } }
 
 ##
 # @query-cpu-definitions:
diff --git a/target-arm/helper.c b/target-arm/helper.c
index b5b65ca..3fc01b5 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -5207,6 +5207,7 @@ static void arm_cpu_add_definition(gpointer data, gpointer user_data)
 info = g_malloc0(sizeof(*info));
 info->name = g_strndup(typename,
strlen(typename) - strlen("-" TYPE_ARM_CPU));
+info->q_typename = g_strdup(typename);
 
 entry = g_malloc0(sizeof(*entry));
 entry->value = info;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 6eec5dc..725f6cb 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2239,6 +2239,7 @@ static void x86_cpu_definition_entry(gpointer data, gpointer user_data)
 info->name = x86_cpu_class_get_model_name(cc);
 x86_cpu_class_check_missing_features(cc, &info->unavailable_features);
 info->has_unavailable_features = true;
+info->q_typename = g_strdup(object_class_get_name(oc));
 
 entry = g_malloc0(sizeof(*entry));
 entry->value = info;
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 208fa1e..42b9274 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -10305,6 +10305,7 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
 
 info = g_malloc0(sizeof(*info));
 info->name = g_strdup(alias->alias);
+info->q_typename = g_strdup(object_class_get_name(oc));
 
 entry = g_malloc0(sizeof(*entry));
 entry->value = info;
diff --git a/target-s390x/cpu_models.c b/target-s390x/cpu_models.c
index c1e729d..5b66d33 100644
--- a/target-s390x/cpu_models.c
+++ b/target-s390x/cpu_models.c
@@ -290,6 +290,7 @@ static void create_cpu_model_list(ObjectClass *klass, void *opaque)
 info->has_migration_safe = true;
 info->migration_safe = scc->is_migration_safe;
 info->q_static = scc->is_static;
+info->q_typename = g_strdup(object_class_get_name(klass));
 
 
 entry = g_malloc0(sizeof(*entry));
-- 
2.7.4




Re: [Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext

2016-11-16 Thread Paolo Bonzini


On 16/11/2016 18:47, Stefan Hajnoczi wrote:
> +if (max_ns && run_poll_handlers(ctx, max_ns)) {
> +atomic_sub(&ctx->notify_me, 2);
> +blocking = false; /* poll again, don't block */

You don't need to poll---you only need to run bottom halves and timers.

Paolo

> +progress = true;
> +}
> +}



Re: [Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode

2016-11-16 Thread no-reply
Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Subject: [Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode
Message-id: 1479318422-10979-1-git-send-email-stefa...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
7f175bc linux-aio: poll ring for completions
937de16 virtio: poll virtqueues for new buffers
3d0f4c1 aio: add polling mode to AioContext
3e75e2a aio: add AioPollFn and io_poll() interface

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-r21_4ojm/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=1956184f8abf
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix/var/tmp/qemu-build/install
BIOS directory/var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compilercc
Host C compiler   cc
C++ compiler  
Objective-C compiler cc
ARFLAGS   rv
CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS   -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 
-I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wmissing-include-dirs 
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition 
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make  make
install   install
pythonpython -B
smbd  /usr/sbin/smbd
module supportno
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabledno
strip binariesyes
profiler  no
static build  no
pixmansystem
SDL support   yes (1.2.14)
GTK support   no 
GTK GL supportno
VTE support   no 
TLS priority  NORMAL
GNUTLS supportno
GNUTLS rndno
libgcrypt no
libgcrypt kdf no
nettleno 
nettle kdfno
libtasn1  no
curses supportno
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS supportno
VNC support   yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support   no
brlapi supportno
bluez  supportno
Documentation no
PIE   yes
vde support   no
netmap supportno
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support   yes
COLO support  yes
RDMA support  no
TCG interpreter   no
fdt suppor

Re: [Qemu-devel] [PATCH for-2.9] qmp: Report QOM type name on query-cpu-definitions

2016-11-16 Thread Eric Blake
On 11/16/2016 12:21 PM, Eduardo Habkost wrote:
> The new typename attribute on query-cpu-definitions will be used
> to help management software use device-list-properties to check
> which properties can be set using -cpu or -global for the CPU
> model.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  qapi-schema.json| 4 +++-
>  target-arm/helper.c | 1 +
>  target-i386/cpu.c   | 1 +
>  target-ppc/translate_init.c | 1 +
>  target-s390x/cpu_models.c   | 1 +
>  5 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index b0b4bf6..9a3bdd4 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -3216,6 +3216,8 @@
>  # @unavailable-features: #optional List of properties that prevent
>  #the CPU model from running in the current
>  #host. (since 2.8)
> +# @typename: Type name that can be used as argument to @device-list-properties,
> +#to introspect properties configurable using -cpu or -global.

Missing a '(since 2.9)' designation.

>  #
>  # @unavailable-features is a list of QOM property names that
>  # represent CPU model attributes that prevent the CPU from running.
> @@ -3237,7 +3239,7 @@
>  ##
>  { 'struct': 'CpuDefinitionInfo',
>'data': { 'name': 'str', '*migration-safe': 'bool', 'static': 'bool',
> -'*unavailable-features': [ 'str' ] } }
> +'*unavailable-features': [ 'str' ], 'typename': 'str' } }
>  

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'

2016-11-16 Thread Laszlo Ersek
On 11/16/16 19:04, Paolo Bonzini wrote:
>> I guess that's what the next paragraph is about:
>>
>>> - we could have another magic 0xB2 value, which is implemented directly
>>> in QEMU and sets 0xB3 to a magic value.  Then OVMF can invoke it
>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
>>> to detect the new feature.  It can fail to start if using traditional
>>> AP and the new feature is not there.
>>
>> Please explain in more detail. If I write to 0xB2 (by invoking the
>> Trigger() method or somehow else), then on old QEMU's that will raise a
>> sync / unicast SMI. The SMI handler in edk2 will run, but no request
>> parameters will have been set up by OVMF, so the SMI handler will do...
>> no clue what.
> 
> It should hopefully do nothing.  A spurious SMI (such as the one caused
> by the write to 0xB2) should not crash OVMF.
> 
> SMBASE relocation uses IPIs, so my hope was to use the
> SmmCpuFeaturesSmmRelocationComplete hook.

From a cursory look, SmmCpuFeaturesSmmRelocationComplete() seems to be
called early enough from PiSmmCpuDxeSmm that we might be able to call
PcdSet() from it, for updating PcdCpuSmmApSyncTimeout and
PcdCpuSmmSyncMode. I perceive it a bit too close to the edge :)

>> My preference is fw_cfg ATM. It provides a proven, flexible and
>> extensible interface (it's easy to add new files for future features).
>> If we expect more knobs in the area, I can modify my proposal to use
>> "etc/smi/broadcast", so we can add "etc/smi/" later.
> 
> Did you know there are 16 entries only for fw_cfg files? :)

Yes, I've known that, but it can be changed by redefining
FW_CFG_FILE_SLOTS, can't it? The key type for fw_cfg is uint16_t, so we
should have some reserves.
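
To put rough numbers on the slot discussion, here is a minimal C sketch that checks a file count against a slot limit and computes how many file keys a 16-bit selector could address in principle. The constant values are assumptions about what QEMU's fw_cfg headers defined at the time (the two high bits of the key used as flags, file keys starting at 0x20, 16 file slots); verify them against your tree. The helper names are hypothetical and not part of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on QEMU's fw_cfg key layout of this era. */
#define FW_CFG_WRITE_CHANNEL 0x4000  /* flag bit, not part of the entry index */
#define FW_CFG_ARCH_LOCAL    0x8000  /* flag bit, not part of the entry index */
#define FW_CFG_ENTRY_MASK    ((uint16_t)~(FW_CFG_WRITE_CHANNEL | FW_CFG_ARCH_LOCAL))
#define FW_CFG_FILE_FIRST    0x20    /* first key usable for named files */
#define FW_CFG_FILE_SLOTS    0x10    /* 16 file slots */

/* Hypothetical helper: do 'files' named files fit into 'slots' slots? */
static bool fw_cfg_files_fit(unsigned slots, unsigned files)
{
    return files <= slots;
}

/*
 * Hypothetical helper: upper bound on file slots if every entry index
 * from FW_CFG_FILE_FIRST up to the end of the masked key space were
 * given to files.
 */
static unsigned fw_cfg_max_file_slots(void)
{
    return (unsigned)FW_CFG_ENTRY_MASK + 1u - FW_CFG_FILE_FIRST;
}
```

Under these assumptions a worst case of 20 files does not fit into 16 slots, while the key space itself would leave room for several thousand, which is the "reserves" argument made above.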

> And we're
> using already 20 in the worst case:
> 
> genroms/linuxboot.bin
> genroms/kvmvapic.bin
> NVDIMM_DSM_MEM_FILE
> "etc/smbios/smbios-tables"
> "etc/smbios/smbios-anchor"
> "etc/acpi/tables"
> "etc/table-loader"
> ACPI_BUILD_TPMLOG_FILE
> ACPI_BUILD_RSDP_FILE
> "etc/e820"
> "etc/msr_feature_control"
> "etc/reserved-memory-end"
> "etc/pvpanic-port"
> "etc/boot-menu-wait"
> "bootsplash.jpg"
> "etc/boot-fail-wait"
> "etc/igd-opregion"
> "etc/igd-bdsm-size"
> "etc/extra-pci-roots"
> "bootorder"
> 
> Therefore, so close to the release I'm a bit worried about doing
> changes to fw_cfg or adding more fw_cfg files.  Though we just got
> rid of one file for the number of CPUs, so I guess we might not care.

I agree with your caution about this. I'm also perfectly fine if this
update misses 2.8. :)

> 
>> Do you have any specific arguments against fw_cfg? As I suggested in my
>> previous email, with fw_cfg I can implement the change in OVMF such that
>> the default behavior wouldn't change -- the default delivery would
>> remain relaxed, and the broadcast wouldn't be requested, unless the
>> fw_cfg file told OVMF otherwise.
>>
>>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I
>>> would make QEMU use broadcast behavior for all values in the 0x10-0xff
>>> range, or something like that.
>>
>> Are we talking control/command (0xB2) or scratch/data (0xB3) register
>> values? My patches currently use the scratch/data register to provide
>> the hint to QEMU; that register is less likely to interfere with
>> anything the SMM core in edk2 does.
> 
> Sorry I confused the two registers.  0xb3 is more or less unused as far
> as I can see indeed.

Thanks
Laszlo
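
For readers following the register discussion, the behavior named in the subject line can be modelled in a few lines of C: a write to the scratch/data register (0xB3) only stores a value, and a later write to the control/command register (0xB2) raises the SMI, on all VCPUs if the stored value is 'Q' and only on the writing VCPU otherwise. This is a toy model under those assumptions, not the actual hw/isa/lpc_ich9 code; the type, the function name and the VCPU limit are hypothetical.

```c
#include <stdint.h>

#define APM_CNT_IOPORT 0xB2  /* control/command register */
#define APM_STS_IOPORT 0xB3  /* scratch/data register */
#define TOY_MAX_VCPUS  8     /* hypothetical limit for this model */

typedef struct {
    uint8_t  apm_sts;                   /* last value written to 0xB3 */
    unsigned num_vcpus;                 /* must be <= TOY_MAX_VCPUS */
    unsigned smi_count[TOY_MAX_VCPUS];  /* SMIs delivered per VCPU */
} ToyMachine;

/* Model of a guest I/O port write issued by VCPU 'cpu'. */
static void toy_apm_write(ToyMachine *m, unsigned cpu, uint16_t port,
                          uint8_t val)
{
    if (cpu >= m->num_vcpus || m->num_vcpus > TOY_MAX_VCPUS) {
        return;  /* ignore malformed input in the model */
    }
    if (port == APM_STS_IOPORT) {
        m->apm_sts = val;  /* only records the hint, no SMI */
        return;
    }
    if (port != APM_CNT_IOPORT) {
        return;
    }
    if (m->apm_sts == 'Q') {
        /* broadcast: every VCPU gets the SMI */
        for (unsigned i = 0; i < m->num_vcpus; i++) {
            m->smi_count[i]++;
        }
    } else {
        /* default: unicast SMI to the VCPU that wrote 0xB2 */
        m->smi_count[cpu]++;
    }
}
```

Keeping the hint in the scratch register means that firmware which never writes 'Q' keeps the unicast behavior.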




Re: [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes

2016-11-16 Thread no-reply
Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Type: series
Subject: [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes
Message-id: 20161116180551.9611-1-pbonz...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4476079 virtio: set ISR on dataplane notifications
f45efd4 virtio: access ISR atomically
9fd4e4a virtio: introduce grab/release_ioeventfd to fix vhost

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-5tzxa5rp/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=4fce3ac805f5
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix        /var/tmp/qemu-build/install
BIOS directory        /var/tmp/qemu-build/install/share/qemu
binary directory      /var/tmp/qemu-build/install/bin
library directory     /var/tmp/qemu-build/install/lib
module directory      /var/tmp/qemu-build/install/lib/qemu
libexec directory     /var/tmp/qemu-build/install/libexec
include directory     /var/tmp/qemu-build/install/include
config directory      /var/tmp/qemu-build/install/etc
local state directory /var/tmp/qemu-build/install/var
Manual directory      /var/tmp/qemu-build/install/share/man
ELF interp prefix     /usr/gnemul/qemu-%M
Source path           /tmp/qemu-test/src
C compiler            cc
Host C compiler       cc
C++ compiler
Objective-C compiler  cc
ARFLAGS               rv
CFLAGS                -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS           -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0
                      -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE
                      -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
                      -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes
                      -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wmissing-include-dirs
                      -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self
                      -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition
                      -Wtype-limits -fstack-protector-all
LDFLAGS               -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make                  make
install               install
python                python -B
smbd                  /usr/sbin/smbd
module support        no
host CPU              x86_64
host big endian       no
target list           x86_64-softmmu aarch64-softmmu
tcg debug enabled     no
gprof enabled         no
sparse enabled        no
strip binaries        yes
profiler              no
static build          no
pixman                system
SDL support           yes (1.2.14)
GTK support           no
GTK GL support        no
VTE support           no
TLS priority          NORMAL
GNUTLS support        no
GNUTLS rnd            no
libgcrypt             no
libgcrypt kdf         no
nettle                no
nettle kdf            no
libtasn1              no
curses support        no
virgl support         no
curl support          no
mingw32 support       no
Audio drivers         oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support        no
VNC support           yes
VNC SASL support      no
VNC JPEG support      no
VNC PNG support       no
xen support           no
brlapi support        no
bluez  support        no
Documentation         no
PIE                   yes
vde support           no
netmap support        no
Linux AIO support     no
ATTR/XATTR support    yes
Install blobs         yes
KVM support           yes
COLO support          yes
RDMA support          no
TCG interpreter       no
fdt support           yes
preadv support        yes
fdatasync             yes
madvise   

[Qemu-devel] [PATCH for-2.9 0/2] qom, qdev: Cleanup release functions

2016-11-16 Thread Eduardo Habkost
While working on the qdev class properties series, I've noticed
that the release function for class properties is never called,
and has unclear semantics (should it be called when the object
is destroyed, or when the class is destroyed?). Patch 1/2 removes
the unused feature.

Patch 2/2 changes the function signature of qdev property release
functions to make their implementations simpler and safer, and
make them not depend on the way property release functions are
implemented (so the functions don't need to be rewritten if we
change qdev to use class properties).
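
The shape of the change in patch 2/2 can be sketched outside of QEMU: property-specific release functions take typed arguments, and a single generic trampoline does the opaque-pointer unpacking once. The code below is a toy model with stand-in types, not the QEMU definitions; the plain cast stands in for the DEVICE() macro, and all toy_* names and release_label are hypothetical.

```c
#include <stdlib.h>

/* Stand-in types; the real Object/DeviceState/Property are much richer. */
typedef struct Object { int unused; } Object;
typedef struct DeviceState {
    Object parent;   /* first member, so Object* and DeviceState* alias */
    char *label;
} DeviceState;

typedef struct Property Property;
typedef struct PropertyInfo {
    /* new-style typed release hook */
    void (*release)(DeviceState *dev, Property *prop);
} PropertyInfo;
struct Property {
    const char *name;
    const PropertyInfo *info;
};

/* Generic object-level callback shape with an opaque pointer. */
typedef void (GenericRelease)(Object *obj, const char *name, void *opaque);

/* A typed release function: no casts, no void pointers. */
static void release_label(DeviceState *dev, Property *prop)
{
    (void)prop;
    free(dev->label);
    dev->label = NULL;
}

/* The one place that converts generic arguments into typed ones. */
static void toy_release_prop(Object *obj, const char *name, void *opaque)
{
    DeviceState *dev = (DeviceState *)obj;  /* stands in for DEVICE(obj) */
    Property *prop = opaque;

    (void)name;
    prop->info->release(dev, prop);
}

/* Register the trampoline only if the property has a release hook. */
static GenericRelease *toy_pick_release(const Property *prop)
{
    return prop->info->release ? toy_release_prop : NULL;
}
```

With this split, a change in how the generic callback is invoked would only touch the trampoline, not every release function.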

Eduardo Habkost (2):
  qom: Remove release function from class properties
  qdev: Change signature of PropertyInfo::release

 backends/hostmem.c   |  4 ++--
 hw/core/machine.c|  6 +++---
 hw/core/qdev-properties-system.c |  8 ++--
 hw/core/qdev-properties.c| 10 +-
 hw/core/qdev.c   | 10 +-
 hw/i386/pc.c |  8 
 hw/ppc/pnv.c |  2 +-
 include/hw/qdev-core.h   |  2 +-
 include/qom/object.h |  1 -
 qom/object.c | 14 --
 10 files changed, 31 insertions(+), 34 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH for-2.9 1/2] qom: Remove release function from class properties

2016-11-16 Thread Eduardo Habkost
The release functions are never called for class properties, and
their semantics aren't even defined clearly (should the release
function be called when an instance is destroyed, or when a class
is destroyed?). Remove the unused functionality.

Signed-off-by: Eduardo Habkost 
---
 backends/hostmem.c   |  4 ++--
 hw/core/machine.c|  6 +++---
 hw/i386/pc.c |  8 
 hw/ppc/pnv.c |  2 +-
 include/qom/object.h |  1 -
 qom/object.c | 14 --
 6 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/backends/hostmem.c b/backends/hostmem.c
index 4256d24..856e96e 100644
--- a/backends/hostmem.c
+++ b/backends/hostmem.c
@@ -368,11 +368,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data)
 object_class_property_add(oc, "size", "int",
 host_memory_backend_get_size,
 host_memory_backend_set_size,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_add(oc, "host-nodes", "int",
 host_memory_backend_get_host_nodes,
 host_memory_backend_set_host_nodes,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_add_enum(oc, "policy", "HostMemPolicy",
 HostMemPolicy_lookup,
 host_memory_backend_get_policy,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index b0fd91f..c64e5f1 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -372,13 +372,13 @@ static void machine_class_init(ObjectClass *oc, void *data)
 
 object_class_property_add(oc, "kernel-irqchip", "OnOffSplit",
 NULL, machine_set_kernel_irqchip,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_set_description(oc, "kernel-irqchip",
 "Configure KVM in-kernel irqchip", &error_abort);
 
 object_class_property_add(oc, "kvm-shadow-mem", "int",
 machine_get_kvm_shadow_mem, machine_set_kvm_shadow_mem,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_set_description(oc, "kvm-shadow-mem",
 "KVM shadow MMU size", &error_abort);
 
@@ -409,7 +409,7 @@ static void machine_class_init(ObjectClass *oc, void *data)
 
 object_class_property_add(oc, "phandle-start", "int",
 machine_get_phandle_start, machine_set_phandle_start,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_set_description(oc, "phandle-start",
 "The first phandle ID we may generate dynamically", &error_abort);
 
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index a9b1950..46f95bf 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -2308,24 +2308,24 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 
 object_class_property_add(oc, PC_MACHINE_MEMHP_REGION_SIZE, "int",
 pc_machine_get_hotplug_memory_region_size, NULL,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 
 object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size",
 pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 
 object_class_property_set_description(oc, PC_MACHINE_MAX_RAM_BELOW_4G,
 "Maximum ram below the 4G boundary (32bit boundary)", &error_abort);
 
 object_class_property_add(oc, PC_MACHINE_SMM, "OnOffAuto",
 pc_machine_get_smm, pc_machine_set_smm,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_set_description(oc, PC_MACHINE_SMM,
 "Enable SMM (pc & q35)", &error_abort);
 
 object_class_property_add(oc, PC_MACHINE_VMPORT, "OnOffAuto",
 pc_machine_get_vmport, pc_machine_set_vmport,
-NULL, NULL, &error_abort);
+NULL, &error_abort);
 object_class_property_set_description(oc, PC_MACHINE_VMPORT,
 "Enable vmport (pc & q35)", &error_abort);
 
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 9df7b25..3fb68c3 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -777,7 +777,7 @@ static void powernv_machine_class_props_init(ObjectClass *oc)
 {
 object_class_property_add(oc, "num-chips", "uint32_t",
   pnv_get_num_chips, pnv_set_num_chips,
-  NULL, NULL, NULL);
+  NULL, NULL);
 object_class_property_set_description(oc, "num-chips",
   "Specifies the number of processor chips",
   NULL);
diff --git a/include/qom/object.h b/include/qom/object.h
index 5ecc2d1..fbf9df2 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -945,7 +945,6 @@ ObjectProperty *object_class_property_add(ObjectClass *klass, const char *name,
   const char *type,
   ObjectPropertyAccessor *get,
   ObjectPropertyAccessor *set,
-  ObjectPropertyR

[Qemu-devel] [PATCH for-2.9 2/2] qdev: Change signature of PropertyInfo::release

2016-11-16 Thread Eduardo Habkost
Change the function signature to make implementations simpler and
safer. No void pointers and Object->DeviceState casts inside each
release function.

Signed-off-by: Eduardo Habkost 
---
 hw/core/qdev-properties-system.c |  8 ++--
 hw/core/qdev-properties.c| 10 +-
 hw/core/qdev.c   | 10 +-
 include/hw/qdev-core.h   |  2 +-
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1b7ea50..4f49109 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -112,10 +112,8 @@ fail:
 }
 }
 
-static void release_drive(Object *obj, const char *name, void *opaque)
+static void release_drive(DeviceState *dev, Property *prop)
 {
-DeviceState *dev = DEVICE(obj);
-Property *prop = opaque;
 BlockBackend **ptr = qdev_get_prop_ptr(dev, prop);
 
 if (*ptr) {
@@ -210,10 +208,8 @@ static void set_chr(Object *obj, Visitor *v, const char *name, void *opaque,
 g_free(str);
 }
 
-static void release_chr(Object *obj, const char *name, void *opaque)
+static void release_chr(DeviceState *dev, Property *prop)
 {
-DeviceState *dev = DEVICE(obj);
-Property *prop = opaque;
 CharBackend *be = qdev_get_prop_ptr(dev, prop);
 
 qemu_chr_fe_deinit(be);
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 2a82768..3709050 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -383,10 +383,9 @@ PropertyInfo qdev_prop_uint64 = {
 
 /* --- string --- */
 
-static void release_string(Object *obj, const char *name, void *opaque)
+static void release_string(DeviceState *dev, Property *prop)
 {
-Property *prop = opaque;
-g_free(*(char **)qdev_get_prop_ptr(DEVICE(obj), prop));
+g_free(*(char **)qdev_get_prop_ptr(dev, prop));
 }
 
 static void get_string(Object *obj, Visitor *v, const char *name,
@@ -823,7 +822,7 @@ PropertyInfo qdev_prop_pci_host_devaddr = {
 typedef struct {
 struct Property prop;
 char *propname;
-ObjectPropertyRelease *release;
+void (*release)(DeviceState *dev, Property *prop);
 } ArrayElementProperty;
 
 /* object property release callback for array element properties:
@@ -832,9 +831,10 @@ typedef struct {
  */
 static void array_element_release(Object *obj, const char *name, void *opaque)
 {
+DeviceState *dev = DEVICE(obj);
 ArrayElementProperty *p = opaque;
 if (p->release) {
-p->release(obj, name, opaque);
+p->release(dev, &p->prop);
 }
 g_free(p->propname);
 g_free(p);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 5783442..b859e15 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -774,6 +774,14 @@ static void qdev_property_add_legacy(DeviceState *dev, Property *prop,
 g_free(name);
 }
 
+static void qdev_release_prop(Object *obj, const char *name, void *opaque)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+
+prop->info->release(dev, prop);
+}
+
 /**
  * qdev_property_add_static:
  * @dev: Device to add the property to.
@@ -801,7 +809,7 @@ void qdev_property_add_static(DeviceState *dev, Property *prop,
 
 object_property_add(obj, prop->name, prop->info->name,
 prop->info->get, prop->info->set,
-prop->info->release,
+prop->info->release ? qdev_release_prop : NULL,
 prop, &local_err);
 
 if (local_err) {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 2c97347..5ea2095 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -251,7 +251,7 @@ struct PropertyInfo {
 int (*print)(DeviceState *dev, Property *prop, char *dest, size_t len);
 ObjectPropertyAccessor *get;
 ObjectPropertyAccessor *set;
-ObjectPropertyRelease *release;
+void (*release)(DeviceState *dev, Property *prop);
 };
 
 /**
-- 
2.7.4




Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications

2016-11-16 Thread Aviv B.D.
On Thu, Nov 10, 2016 at 9:20 PM, Michael S. Tsirkin  wrote:

> On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 17:54:35 +0200
> > "Michael S. Tsirkin"  wrote:
> >
> > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > "Michael S. Tsirkin"  wrote:
> > > >
> > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > > > > From: "Aviv Ben-David" 
> > > > > >
> > > > > > > * Advertise Cache Mode capability in iommu cap register.
> > > > > > >   This capability is controlled by the "cache-mode" property of
> > > > > > >   the intel-iommu device.
> > > > > > >   To enable this option call QEMU with
> > > > > > >   "-device intel-iommu,cache-mode=true".
> > > > > > >
> > > > > > > * On page cache invalidation in intel vIOMMU, check if the
> > > > > > >   domain belongs to a registered notifier, and notify accordingly.
> > > > >
> > > > > This looks sane I think. Alex, care to comment?
> > > > > Merging will have to wait until after the release.
> > > > > Pls remember to re-test and re-ping then.
> > > >
> > > > I don't think it's suitable for upstream until there's a reasonable
> > > > replay mechanism
> > >
> > > Could you pls clarify what do you mean by replay?
> > > Is this when you attach a device by hotplug to
> > > a running system?
> > >
> > > If yes this can maybe be addressed by disabling hotplug temporarily.
> >
> > No, hotplug is not required, moving a device between existing domains
> > requires replay, ie. actually using it for nested device assignment.
>
> Good point, that one is a correctness thing. Aviv,
> could you add this in TODO list in a cover letter pls?
>

Sure, no problem.


>
> > > > and we straighten out whether it's expected to get
> > > > multiple notifies and the notif-ee is responsible for filtering
> > > > them or if the notif-er should do filtering.
> > >
> > > OK this is a documentation thing.
> >
> > Well no, it needs to be decided and if necessary implemented.
>
> Let's assume it's the notif-ee for now. Less is more and all that.
>
> > > >  Without those, this is
> > > > effectively just an RFC.
> > >
> > > It's infrastructure without users so it doesn't break things,
> > > I'm more interested in seeing whether it's broken in
> > > some way than whether it's complete.
> >
> > If it allows use with vfio but doesn't fully implement the complete set
> > of interfaces, it does break things.  We currently prevent viommu usage
> > with vfio because it is incomplete.
>
> Right - that bit is still in as far as I can see.
>
> > > The patchset spent out of tree too long and I'd like to see
> > > us make progress towards device assignment working with
> > > vIOMMU sooner rather than later, so if it's broken I won't
> > > merge it but if it's incomplete I will.
> >
> > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > merging it, but I don't want to enable vfio usage until it's complete.
> > Thanks,
> >
> > Alex
> >
> > > > > > > Currently this patch still doesn't enable VFIO device support
> > > > > > > with vIOMMU present. Current problems:
> > > > > > > * vfio_iommu_map_notify is not aware of the memory range that
> > > > > > >   belongs to a specific VFIOGuestIOMMU.
> > > > > > > * memory_region_iommu_replay hangs QEMU on start up while it
> > > > > > >   iterates over the 64-bit address space. Commenting out the call
> > > > > > >   to this function enables a workable VFIO device while vIOMMU
> > > > > > >   is present.
> > > > > > > * vfio_iommu_map_notify should check if the address space range is
> > > > > > >   suitable for the current notifier.
> > > > > >
> > > > > > > Changes from v1 to v2:
> > > > > > > * remove assumption that the cache does not clear
> > > > > > > * fix lockup on high load.
> > > > > > >
> > > > > > > Changes from v2 to v3:
> > > > > > > * remove debug leftovers
> > > > > > > * split into separate commits
> > > > > > > * change is_write to flags in vtd_do_iommu_translate, add
> > > > > > >   IOMMU_NO_FAIL to suppress error propagation to the guest.
> > > > > > >
> > > > > > > Changes from v3 to v4:
> > > > > > > * Add property to intel_iommu device to control the CM
> > > > > > >   capability, default to False.
> > > > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > > >
> > > > > > > Changes from v4 to v4 RESEND:
> > > > > > > * Fix coding style issues pointed out by the checkpatch.pl script.
> > > > > > >
> > > > > > > Changes from v4 to v5:
> > > > > > > * Reduce the number of changes in patch 2 and make flags a real
> > > > > > >   bitfield.
> > > > > > > * Revert deleted debug prints.
> > > > > > > * Fix memory leak in patch 3.
> > > > > > >
> > > > > > > Changes from v5 to v6:
> > > > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > > > * VFIO will be notified only on the difference, without unmap
> > > > > > >   before change to maps.
> > > > > >
> > > > > > Aviv Ben-David (3):
> > > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility
> exposoed to
> > > > > > guest
> > > > > >   IOMMU: change iommu_op->transla
