Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support
On Tue, Nov 15, 2016 at 10:38 PM, ashish mittal wrote:
> On Wed, Sep 28, 2016 at 2:45 PM, Stefan Hajnoczi wrote:
>> On Tue, Sep 27, 2016 at 09:09:49PM -0700, Ashish Mittal wrote:
>> 5.
>> I don't see any endianness handling or portable alignment of struct
>> fields in the network protocol code. Binary network protocols need to
>> take care of these issues for portability. This means libqnio compiled
>> for different architectures will not work. Do you plan to support any
>> other architectures besides x86?
>>
>
> No, we support only x86 and do not plan to support any other arch.
> Please let me know if this necessitates any changes to the configure
> script.

I think no change to ./configure is necessary. The library will only
ship on x86 so other platforms will never attempt to compile the code.

>> 6.
>> The networking code doesn't look robust: kvset uses assert() on input
>> from the network so the other side of the connection could cause SIGABRT
>> (coredump), the client uses the msg pointer as the cookie for the
>> response packet so the server can easily crash the client by sending a
>> bogus cookie value, etc. Even on the client side these things are
>> troublesome but on a server they are guaranteed security issues. I
>> didn't look into it deeply. Please audit the code.
>>
>
> By design, our solution on OpenStack platform uses a closed set of
> nodes communicating on dedicated networks. VxHS servers on all the
> nodes are on a dedicated network. Clients (qemu) connect to these
> only after reading the server IP from the XML (read by libvirt). The
> XML cannot be modified without proper access. Therefore, IMO this
> problem would be relevant only if someone were to use qnio as a
> generic mode of communication/data transfer, but for our use-case, we
> will not run into this problem. Is this explanation acceptable?

No. The trust model is that the guest is untrusted and in the worst
case may gain code execution in QEMU due to security bugs.

You are assuming block/vxhs.c and libqnio are trusted but that
assumption violates the trust model.

In other words:
1. Guest exploits a security hole inside QEMU and gains code execution
   on the host.
2. Guest uses VxHS client file descriptor on host to send a malicious
   packet to VxHS server.
3. VxHS server is compromised by guest.
4. Compromised VxHS server sends malicious packets to all other
   connected clients.
5. All clients have been compromised.

This means both the VxHS client and server must be robust. They have
to validate inputs to avoid buffer overflows, assertion failures,
infinite loops, etc.

Stefan
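As an illustration of the kind of input validation asked for above, here is a minimal sketch in C. The wire layout, the limits, and every name in it (`msg_hdr`, `parse_msg_hdr`, `MSG_MAX_PAYLOAD`) are hypothetical; libqnio's real message format is not shown in this thread. The point is only that malformed input produces an error return rather than an assert(), and that the cookie is treated as a bounded table index rather than a raw pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits and layout, for illustration only. */
#define MSG_HDR_LEN     8u
#define MSG_MAX_PAYLOAD 4096u

struct msg_hdr {
    uint32_t payload_len; /* bytes following the header */
    uint32_t cookie;      /* index into a pending-request table, not a pointer */
};

/* Read a 32-bit little-endian value without relying on host byte order
 * or on the buffer being aligned. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 and fills *out on success, -1 on any malformed input.
 * Never aborts: the peer is assumed to be potentially hostile. */
int parse_msg_hdr(const uint8_t *buf, size_t avail,
                  uint32_t pending_slots, struct msg_hdr *out)
{
    uint32_t len, cookie;

    if (buf == NULL || out == NULL || avail < MSG_HDR_LEN) {
        return -1;
    }
    len = read_le32(buf);
    cookie = read_le32(buf + 4);

    if (len > MSG_MAX_PAYLOAD) {
        return -1;                 /* oversized payload */
    }
    if (len > avail - MSG_HDR_LEN) {
        return -1;                 /* truncated packet */
    }
    if (cookie >= pending_slots) {
        return -1;                 /* cookie does not name a known request */
    }
    out->payload_len = len;
    out->cookie = cookie;
    return 0;
}
```

A caller would drop the connection (or the single request) when this returns -1, instead of crashing the process.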
Re: [Qemu-devel] [RFC 0/3] aio: experimental virtio-blk polling mode
On Mon, 11/14 16:29, Paolo Bonzini wrote:
>
>
> On 14/11/2016 16:26, Stefan Hajnoczi wrote:
> > On Fri, Nov 11, 2016 at 01:59:25PM -0600, Karl Rister wrote:
> >> QEMU_AIO_POLL_MAX_NS      IOPs
> >>                unset    31,383
> >>                    1    46,860
> >>                    2    46,440
> >>                    4    35,246
> >>                    8    34,973
> >>                   16    46,794
> >>                   32    46,729
> >>                   64    35,520
> >>                  128    45,902
> >
> > The environment variable is in nanoseconds. The range of values you
> > tried are very small (all <1 usec). It would be interesting to try
> > larger values in the ballpark of the latencies you have traced. For
> > example 2000, 4000, 8000, 16000, and 32000 ns.
> >
> > Very interesting that QEMU_AIO_POLL_MAX_NS=1 performs so well without
> > much CPU overhead.
>
> That basically means "avoid a syscall if you already know there's
> something to do", so in retrospect it's not that surprising. Still
> interesting though, and it means that the feature is useful even if you
> don't have CPU to waste.

With the "deleted" bug fixed I did a little more testing to understand
this.

Setting QEMU_AIO_POLL_MAX_NS=1 doesn't mean run_poll_handlers() will
only loop for 1 ns - the patch only checks the time at every 1024 polls.
The first poll in a run_poll_handlers() call can hardly succeed, so we
poll at least 1024 times.

According to my test, on average each run_poll_handlers() takes
~12000ns, which is ~160 iterations of the poll loop, before getting a
new event (either from virtio queue or linux-aio, I don't have the ratio
here). So in the worst case (no new event), 1024 iterations is basically
(12000 / 160 * 1024) = 76800 ns!

The above is with iodepth=1 and jobs=1. With iodepth=32 and jobs=1, or
iodepth=8 and jobs=4, the new event arrives at about the 30th poll,
after ~5600ns.

Fam
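The effect Fam describes can be reproduced with a small stand-alone sketch. This is not the code from the RFC patch: `run_poll_sketch`, the callback types, and the two test doubles are hypothetical names introduced here. The only behaviour taken from the message is that the deadline is consulted once every 1024 polls, so even a 1 ns budget results in at least 1024 iterations when no event arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the message above: the clock is only consulted every 1024 polls. */
#define POLL_CHECK_INTERVAL 1024

typedef bool (*poll_fn)(void *opaque);     /* true when an event was found */
typedef int64_t (*clock_fn)(void *opaque); /* current time in ns */

/* Busy-poll until an event shows up or the deadline has passed.
 * Returns the number of iterations performed. */
uint64_t run_poll_sketch(poll_fn poll, clock_fn now_ns, void *opaque,
                         int64_t max_ns, bool *progress)
{
    int64_t deadline = now_ns(opaque) + max_ns;
    uint64_t iters = 0;

    *progress = false;
    for (;;) {
        iters++;
        if (poll(opaque)) {
            *progress = true;
            break;
        }
        /* Reading the clock is comparatively expensive, so only do it
         * once per POLL_CHECK_INTERVAL iterations. */
        if (iters % POLL_CHECK_INTERVAL == 0 && now_ns(opaque) >= deadline) {
            break;
        }
    }
    return iters;
}

/* Worst-case time of one full check interval, given a measured average
 * of ns_per_run nanoseconds over iters_per_run iterations. */
int64_t worst_case_poll_ns(int64_t ns_per_run, int64_t iters_per_run)
{
    return ns_per_run * POLL_CHECK_INTERVAL / iters_per_run;
}

/* Test doubles: a poller that never finds work, and a fake clock that
 * advances 75 ns on every call (opaque points at an int64_t). */
bool never_ready(void *opaque)
{
    (void)opaque;
    return false;
}

int64_t fake_clock(void *opaque)
{
    int64_t *t = opaque;
    *t += 75;
    return *t;
}
```

With the figures quoted above, `worst_case_poll_ns(12000, 160)` gives the same 76800 ns, and `run_poll_sketch(never_ready, fake_clock, &t, 1, &progress)` performs 1024 iterations despite the 1 ns budget.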
Re: [Qemu-devel] [PATCH v2] vhost: Update 'ioeventfd_started' with host notifiers
> On 16 Nov 2016, at 04:05, Alexey Kardashevskiy wrote:
>
> On 11/11/16 01:45, Christian Borntraeger wrote:
>> On 11/09/2016 01:44 PM, Felipe Franciosi wrote:
>>> Following the recent refactor of virtio notifiers [1], more specifically
>>> the patch that uses virtio_bus_set_host_notifier [2] by default, core
>>> virtio code requires 'ioeventfd_started' to be set to true/false when
>>> the host notifiers are configured. Because not all vhost devices were
>>> updated (eg. vhost-scsi) to use the new interface, this value is always
>>> set to false.
>>>
>>> When booting a guest with a vhost-scsi backend controller, SeaBIOS will
>>> initially configure the device which sets all notifiers. The guest will
>>> continue to boot fine until the kernel virtio-scsi driver reinitialises
>>> the device causing a stop followed by another start. Since
>>> ioeventfd_started was never set to true, the 'stop' operation triggered
>>> by virtio_bus_set_host_notifier() will not result in a call to
>>> virtio_pci_ioeventfd_assign(assign=false). This leaves the memory
>>> regions with stale notifiers and results in the next start triggering
>>> the following assertion:
>>>
>>> kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
>>> Aborted
>>>
>>> This patch updates ioeventfd_started whenever the notifiers are set or
>>> cleared, fixing this issue.
>>>
>>> Signed-off-by: Felipe Franciosi
>>
>> This also fixes vhost-net after reboot on s390/kvm for me
>
>
> It does not fix it (the original breakage from e616c2f "virtio: remove
> ioeventfd_disabled altogether") for me:

Can you try Paolo's latest patches for this issue?
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02834.html

Specifically this:
http://lists.nongnu.org/archive/html/qemu-devel/2016-11/msg02837.html

If that doesn't work, can you please attach gdb to your qemu and print
a stack trace once you hit the assertion?

Thanks, Felipe > > /home/aik/p/qemu/ppc64-softmmu/qemu-system-ppc64 -nodefaults \ > -chardev stdio,id=STDIO0,signal=off,mux=on \ > -device spapr-vty,id=svty0,chardev=STDIO0,reg=0x71000100 \ > -mon id=MON0,chardev=STDIO0,mode=readline -nographic -vga none \ > -enable-kvm -m 2G \ > -kernel /home/aik/t/vml450le \ > -initrd /home/aik/t/le.cpio \ > -netdev tap,id=TAP0,vhost=on,helper=/home/aik/qemu-bridge-helper \ > -device "virtio-net-pci,id=vnet0,mac=C0:41:49:4b:00:00,netdev=TAP0" \ > -smp 16,threads=8 \ > -trace events=qemu_trace_events \ > -machine pseries \ > -L /home/aik/t/qemu-ppc64-bios/ > QEMU PID = 22145 > QEMU 2.7.50 monitor - type 'help' for more information > (qemu) > > > SLOF ** > QEMU Starting > Build Date = Nov 14 2016 19:13:53 > FW Version = git-9b8945ecbde65b06 > Press "s" to enter Open Firmware. > > Populating /vdevice methods > Populating /vdevice/nvram@7100 > Populating /vdevice/vty@71000100 > Populating /pci@8002000 > 00 (D) : 1af4 1000virtio [ net ] > qemu-system-ppc64: /home/aik/p/qemu/memory.c:1940: > memory_region_del_eventfd: Assertion `i != mr->ioeventfd_nb' failed. > QEMU pid = 22145 returned -6 > > > > > Without this one, the breakage looked different (error would have happened > lot later, when in the guest kernel): > > > > SLOF ** > QEMU Starting > Build Date = Nov 14 2016 19:13:53 > FW Version = git-9b8945ecbde65b06 > Press "s" to enter Open Firmware. > > Populating /vdevice methods > Populating /vdevice/nvram@7100 > Populating /vdevice/vty@71000100 > Populating /pci@8002000 > 00 (D) : 1af4 1000virtio [ net ] > No NVRAM common partition, re-initializing... > Scanning USB > Using default console: /vdevice/vty@71000100 > ted RAM kernel at 40 (16ef23c bytes) C08FF > Welcome to Open Firmware > > Copyright (c) 2004, 2011 IBM Corporation All rights reserved. 
> This program and the accompanying materials are made available > under the terms of the BSD License available at > http://www.opensource.org/licenses/bsd-license.php > > Booting from memory... > OF stdout device is: /vdevice/vty@71000100 > Preparing to boot Linux version 4.5.0-le_v4.5_aik@vpl2-kernel > (a...@vpl2.ozlabs.ibm.com) (gcc version 5.4.1 20160623 (GCC) ) #59 SMP > > [skipping bunch of boring stuff] > > virtio-pci :00:00.0: enabling device (0100 -> 0103) > HVCS: Driver registered. > Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled > brd: module loaded > loop: module loaded > Uniform Multi-Platform E-IDE driver > ide-gd driver 1.18 > ide-cd driver 5.00 > Loading iSCSI transport class v2.0-870. > Emulex LightPulse Fibre Channel SCSI driver 11.0.0.10. > Copyright(c) 2004-2015 Emulex. All rights reserved. > ipr: IBM Power RAID SCSI Device Driver version: 2.6.3 (October 17, 2015) > ibmvfc: IBM Virtual Fibre Channel Driver version: 1.0.11 (April 12, 2013) > rtas_msi: calc quota for :00:0
[Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
The ppc64 postcopy test does not work with KVM-PR, and it is also causing annoying warning messages when run on a x86 host. So let's use KVM here only if we know that we're running with KVM-HV (which automatically also means that we're running on a ppc64 host), and fall back to TCG otherwise. Signed-off-by: Thomas Huth --- tests/postcopy-test.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c index d6613c5..dafe8be 100644 --- a/tests/postcopy-test.c +++ b/tests/postcopy-test.c @@ -380,17 +380,21 @@ static void test_migrate(void) " -incoming %s", tmpfs, bootpath, uri); } else if (strcmp(arch, "ppc64") == 0) { +const char *accel; + +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; init_bootfile_ppc(bootpath); -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" " -name pcsource,debug-threads=on" " -serial file:%s/src_serial" " -drive file=%s,if=pflash,format=raw", - tmpfs, bootpath); -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" + accel, tmpfs, bootpath); +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" " -name pcdest,debug-threads=on" " -serial file:%s/dest_serial" " -incoming %s", - tmpfs, uri); + accel, tmpfs, uri); } else { g_assert_not_reached(); } -- 1.8.3.1
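The selection logic in this patch can be isolated into a small helper for experimentation. The function name `ppc64_accel_for` and the path parameter are introduced here for illustration; the patch itself hard-codes "/sys/module/kvm_hv" inside test_migrate(). Note that access() returns 0 on success, which is why the patch's ternary lists "tcg" first.

```c
#include <unistd.h>

/* Pick the -machine accel= value the way the v1 patch does:
 * "kvm:tcg" only when the kvm_hv module directory exists,
 * plain "tcg" otherwise. */
const char *ppc64_accel_for(const char *kvm_hv_module_dir)
{
    /* access() returns 0 if the path exists, -1 (errno set) if not. */
    return access(kvm_hv_module_dir, F_OK) == 0 ? "kvm:tcg" : "tcg";
}
```

Calling `ppc64_accel_for("/sys/module/kvm_hv")` on an x86 host returns "tcg", which is what avoids the warning messages mentioned in the commit message.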
Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
ashish mittal writes: > Thanks for concluding on this. > > I will rearrange the qnio_api.h header accordingly as follows: > > +#include "qemu/osdep.h" Headers should not include osdep.h. > +#include<=== after osdep.h > +#include "block/block_int.h" Including block_int.h in a header is problematic. Are you sure you need it? Will qnio/qnio_api.h ever be included outside block/? > +#include "qapi/qmp/qerror.h" > +#include "qapi/qmp/qdict.h" > +#include "qapi/qmp/qstring.h" > +#include "trace.h" > +#include "qemu/uri.h" > +#include "qapi/error.h" > +#include "qemu/error-report.h" < remove In general, headers should include what they need, but no more.
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16/11/2016 09:39, Thomas Huth wrote: > The ppc64 postcopy test does not work with KVM-PR, and it is also > causing annoying warning messages when run on a x86 host. So let's > use KVM here only if we know that we're running with KVM-HV (which > automatically also means that we're running on a ppc64 host), and > fall back to TCG otherwise. > > Signed-off-by: Thomas Huth > --- > tests/postcopy-test.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > index d6613c5..dafe8be 100644 > --- a/tests/postcopy-test.c > +++ b/tests/postcopy-test.c > @@ -380,17 +380,21 @@ static void test_migrate(void) >" -incoming %s", >tmpfs, bootpath, uri); > } else if (strcmp(arch, "ppc64") == 0) { > +const char *accel; > + > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; why not "kvm" instead of "kvm:tcg"? If it doesn't work it should fail. Laurent
Re: [Qemu-devel] [PATCH] HACKING: document #include order
Eric Blake writes: > On 11/15/2016 02:29 PM, Stefan Hajnoczi wrote: >> It was not obvious to me why "qemu/osdep.h" must be the first #include. >> This documents the rationale and the overall #include order. >> >> Cc: Fam Zheng >> Cc: Markus Armbruster >> Cc: Eric Blake >> Signed-off-by: Stefan Hajnoczi >> --- >> HACKING | 15 +++ >> 1 file changed, 15 insertions(+) >> > >> +1.2. Include directives >> + >> +Order include directives as follows: >> + >> +#include "qemu/osdep.h" /* Always first... */ >> +#include <...> /* then system headers... */ >> +#include "..." /* and finally QEMU headers. */ >> + >> +The "qemu/osdep.h" header contains preprocessor macros that affect the >> behavior >> +of core system headers like . It must be the first include so >> that >> +core system headers included by external libraries get the preprocessor >> macros >> +that QEMU depends on. > > Might be worth mentioning that only .c files include osdep.h (.h files > do not need to, because they can only be included by a .c file that has > already included osdep.h first). Yes, please, but make it "headers should not include osdep.h".
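The general mechanism behind the ordering rule can be shown without QEMU's tree. The sketch below uses `_FILE_OFFSET_BITS` purely as a well-known example of a macro that changes what a system header declares; it is an assumption chosen for illustration, not a statement about what "qemu/osdep.h" defines. The helper name `sketch_off_t_size` is likewise hypothetical.

```c
/* Stand-in for the role of "qemu/osdep.h": anything that configures the
 * system headers has to appear before the first system #include. */
#define _FILE_OFFSET_BITS 64

#include <stddef.h>
#include <sys/types.h>

/* With the macro defined first, off_t is a 64-bit type even on 32-bit
 * glibc targets. If <sys/types.h> had been included before the #define,
 * the definition would have come too late to have any effect. */
size_t sketch_off_t_size(void)
{
    return sizeof(off_t);
}
```

This is also why a header that includes such a configuration file itself cannot help: by the time the header is reached, the including .c file may already have pulled in system headers.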
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16.11.2016 10:19, Laurent Vivier wrote: > > > On 16/11/2016 09:39, Thomas Huth wrote: >> The ppc64 postcopy test does not work with KVM-PR, and it is also >> causing annoying warning messages when run on a x86 host. So let's >> use KVM here only if we know that we're running with KVM-HV (which >> automatically also means that we're running on a ppc64 host), and >> fall back to TCG otherwise. >> >> Signed-off-by: Thomas Huth >> --- >> tests/postcopy-test.c | 12 >> 1 file changed, 8 insertions(+), 4 deletions(-) >> >> diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c >> index d6613c5..dafe8be 100644 >> --- a/tests/postcopy-test.c >> +++ b/tests/postcopy-test.c >> @@ -380,17 +380,21 @@ static void test_migrate(void) >>" -incoming %s", >>tmpfs, bootpath, uri); >> } else if (strcmp(arch, "ppc64") == 0) { >> +const char *accel; >> + >> +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ >> +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; > > why not "kvm" instead of "kvm:tcg"? > If it doesn't work it should fail. Yes, sounds right. I'll send a v2... Thomas
Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Wed, 11/16 10:04, Markus Armbruster wrote: > ashish mittal writes: > > > Thanks for concluding on this. > > > > I will rearrange the qnio_api.h header accordingly as follows: > > > > +#include "qemu/osdep.h" > > Headers should not include osdep.h. This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what Ashish means looks good to me. Fam > > > +#include<=== after osdep.h > > +#include "block/block_int.h" > > Including block_int.h in a header is problematic. Are you sure you need > it? Will qnio/qnio_api.h ever be included outside block/? > > > +#include "qapi/qmp/qerror.h" > > +#include "qapi/qmp/qdict.h" > > +#include "qapi/qmp/qstring.h" > > +#include "trace.h" > > +#include "qemu/uri.h" > > +#include "qapi/error.h" > > +#include "qemu/error-report.h" < remove > > In general, headers should include what they need, but no more. >
Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters
Kevin, > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru] > > From: Kevin Wolf [mailto:kw...@redhat.com] > > Am 28.09.2016 um 11:32 hat Pavel Dovgalyuk geschrieben: > > > > From: Kevin Wolf [mailto:kw...@redhat.com] > > > > Am 27.09.2016 um 16:06 hat Pavel Dovgalyuk geschrieben: > > > > > > From: Kevin Wolf [mailto:kw...@redhat.com] > > > > > > Am 26.09.2016 um 11:51 hat Pavel Dovgalyuk geschrieben: > > > > > > > > From: Kevin Wolf [mailto:kw...@redhat.com] > > > > > > > > Am 26.09.2016 um 10:08 hat Pavel Dovgalyuk geschrieben: > > > > > > Originally, we only called bdrv_goto_snapshot() for all _top level_ > > > > > > BDSes, and this is still what you normally get. However, if you > > > > > > explicitly create a BDS (e.g. with its own -drive option), it is > > > > > > considered a top level BDS without actually being top level for the > > > > > > guest, and therefore the snapshotting function is called for it. > > > > > > > > > > > > Of course, this is highly inefficient because the goto_snapshot > > > > > > request > > > > > > is passed by the filter driver and then called another time for the > > > > > > lower node, effectively loading the snapshot a second time. > > > > > > Maybe double-saving/loading does the smallest damage then? > > > And we should just document how to use blkreplay effectively? > > > > > > > > > > > > > > > On the other hand if you use a single -drive option to create both > > > > > > the > > > > > > qcow2 BDS and the blkreplay filter, we do need to pass down the > > > > > > goto_snapshot request because it won't be called for the qcow2 layer > > > > > > otherwise. > > > > > > > > > > How this can be specified in command line? > > > > > I believed that separate -drive option is required. > > > > > > > > Something like this: > > > > > > > > -drive driver=blkreplay,image.driver=file,image.filename=test.img > > > > > > > > > > I tried the following command line, but VM does not detect the hard drive > > > and cannot boot. 
> > > > > > -drive > > > driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img- > > blkreplay > > > -device ide-hd,drive=img-blkreplay > > > > My command line was assuming a raw image. It looks like you're using a > > qcow (hopefully qcow2?) image. If so, then you need to include the qcow2 > > driver: > > > > -drive driver=blkreplay,if=none,image.driver=qcow2,\ > > image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay > > This doesn't work for some reason. Replay just hangs at some moment. > > Maybe there exists some internal difference between command line with one or > two -drive > options? I've investigated this issue. This command line works ok: -drive driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay -device ide-hd,drive=img-blkreplay And this does not: -drive driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow ,id=img-blkreplay -device ide-hd,drive=img-blkreplay QEMU hangs at some moment of replay. I found that some dma requests do not pass through the blkreplay driver due to the following line in block-backend.c: return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags); This line passes read request directly to qcow driver and blkreplay cannot process it to make deterministic. Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH] crypto: add virtio-crypto driver
Hi Michael,

Should I convert all __virtio32/64 to le32/64 in virtio_crypto.h?

> +#define VIRTIO_CRYPTO_OPCODE(service, op) (((service) << 8) | (op))
> +
> +struct virtio_crypto_ctrl_header {
> +#define VIRTIO_CRYPTO_CIPHER_CREATE_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x02)
> +#define VIRTIO_CRYPTO_CIPHER_DESTROY_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_CIPHER, 0x03)
> +#define VIRTIO_CRYPTO_HASH_CREATE_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x02)
> +#define VIRTIO_CRYPTO_HASH_DESTROY_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_HASH, 0x03)
> +#define VIRTIO_CRYPTO_MAC_CREATE_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x02)
> +#define VIRTIO_CRYPTO_MAC_DESTROY_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_MAC, 0x03)
> +#define VIRTIO_CRYPTO_AEAD_CREATE_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x02)
> +#define VIRTIO_CRYPTO_AEAD_DESTROY_SESSION \
> +	VIRTIO_CRYPTO_OPCODE(VIRTIO_CRYPTO_SERVICE_AEAD, 0x03)
> +	__virtio32 opcode;
> +	__virtio32 algo;
> +	__virtio32 flag;
> +	/* data virtqueue id */
> +	__virtio32 queue_id;
> +};
> +
> +struct virtio_crypto_cipher_session_para {
> +#define VIRTIO_CRYPTO_NO_CIPHER                 0
> +#define VIRTIO_CRYPTO_CIPHER_ARC4               1
> +#define VIRTIO_CRYPTO_CIPHER_AES_ECB            2
> +#define VIRTIO_CRYPTO_CIPHER_AES_CBC            3
> +#define VIRTIO_CRYPTO_CIPHER_AES_CTR            4
> +#define VIRTIO_CRYPTO_CIPHER_DES_ECB            5
> +#define VIRTIO_CRYPTO_CIPHER_DES_CBC            6
> +#define VIRTIO_CRYPTO_CIPHER_3DES_ECB           7
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CBC           8
> +#define VIRTIO_CRYPTO_CIPHER_3DES_CTR           9
> +#define VIRTIO_CRYPTO_CIPHER_KASUMI_F8          10
> +#define VIRTIO_CRYPTO_CIPHER_SNOW3G_UEA2        11
> +#define VIRTIO_CRYPTO_CIPHER_AES_F8             12
> +#define VIRTIO_CRYPTO_CIPHER_AES_XTS            13
> +#define VIRTIO_CRYPTO_CIPHER_ZUC_EEA3           14
> +	__virtio32 algo;
> +	/* length of key */
> +	__virtio32 keylen;
> +
> +#define VIRTIO_CRYPTO_OP_ENCRYPT  1
> +#define VIRTIO_CRYPTO_OP_DECRYPT  2
> +	/* encrypt or decrypt */
> +	__virtio32 op;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_session_input {
> +	/* Device-writable part */
> +	__virtio64 session_id;
> +	__virtio32 status;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_cipher_session_req {
> +	struct virtio_crypto_cipher_session_para para;
> +};
> +
> +struct virtio_crypto_hash_session_para {
> +#define VIRTIO_CRYPTO_NO_HASH            0
> +#define VIRTIO_CRYPTO_HASH_MD5           1
> +#define VIRTIO_CRYPTO_HASH_SHA1          2
> +#define VIRTIO_CRYPTO_HASH_SHA_224       3
> +#define VIRTIO_CRYPTO_HASH_SHA_256       4
> +#define VIRTIO_CRYPTO_HASH_SHA_384       5
> +#define VIRTIO_CRYPTO_HASH_SHA_512       6
> +#define VIRTIO_CRYPTO_HASH_SHA3_224      7
> +#define VIRTIO_CRYPTO_HASH_SHA3_256      8
> +#define VIRTIO_CRYPTO_HASH_SHA3_384      9
> +#define VIRTIO_CRYPTO_HASH_SHA3_512      10
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE128 11
> +#define VIRTIO_CRYPTO_HASH_SHA3_SHAKE256 12
> +	__virtio32 algo;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +};
> +
> +struct virtio_crypto_hash_create_session_req {
> +	struct virtio_crypto_hash_session_para para;
> +};
> +
> +struct virtio_crypto_mac_session_para {
> +#define VIRTIO_CRYPTO_NO_MAC                0
> +#define VIRTIO_CRYPTO_MAC_HMAC_MD5          1
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA1         2
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_224      3
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_256      4
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_384      5
> +#define VIRTIO_CRYPTO_MAC_HMAC_SHA_512      6
> +#define VIRTIO_CRYPTO_MAC_CMAC_3DES         25
> +#define VIRTIO_CRYPTO_MAC_CMAC_AES          26
> +#define VIRTIO_CRYPTO_MAC_KASUMI_F9         27
> +#define VIRTIO_CRYPTO_MAC_SNOW3G_UIA2       28
> +#define VIRTIO_CRYPTO_MAC_GMAC_AES          41
> +#define VIRTIO_CRYPTO_MAC_GMAC_TWOFISH      42
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_AES        49
> +#define VIRTIO_CRYPTO_MAC_CBCMAC_KASUMI_F9  50
> +#define VIRTIO_CRYPTO_MAC_XCBC_AES          53
> +	__virtio32 algo;
> +	/* hash result length */
> +	__virtio32 hash_result_len;
> +	/* length of authenticated key */
> +	__virtio32 auth_key_len;
> +	__virtio32 padding;
> +};
> +
> +struct virtio_crypto_mac_create_session_req {
> +	struct virtio_crypto_mac_session_para para;
> +};
> +
> +struct virtio_crypto_aead_session_para {
> +#define VIRTIO_CRYPTO_NO_AEAD 0
> +#define VIRTIO_CRYPTO_AEAD_GCM 1
> +#define VIRTIO_CRYPTO_AEAD_CCM 2
> +#define VIRTIO_CRYPTO_AEAD_CHACHA20_POLY1305
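To make the question about fixed little-endian fields concrete, here is a small user-space sketch. The `VIRTIO_CRYPTO_OPCODE` macro is copied from the quoted hunk; everything else is an assumption for illustration: the `SKETCH_SERVICE_*` values stand in for the `VIRTIO_CRYPTO_SERVICE_*` constants, whose values are not shown in the quoted text, and `put_le32`/`get_le32` are hypothetical helpers, not kernel APIs. They show what "always little-endian on the wire" means independently of host byte order.

```c
#include <stdint.h>

/* Copied from the quoted patch. */
#define VIRTIO_CRYPTO_OPCODE(service, op) (((service) << 8) | (op))

/* Assumed service ids, for illustration only. */
enum {
    SKETCH_SERVICE_CIPHER = 0,
    SKETCH_SERVICE_HASH   = 1
};

/* Store a 32-bit value as little-endian bytes, whatever the host is. */
void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value, whatever the host is. */
uint32_t get_le32(const uint8_t in[4])
{
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Under these assumed ids, `VIRTIO_CRYPTO_OPCODE(SKETCH_SERVICE_HASH, 0x02)` is 0x102, and `put_le32` stores it as the bytes 02 01 00 00 on both little- and big-endian hosts.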
[Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
The ppc64 postcopy test does not work with KVM-PR, and it is also causing annoying warning messages when run on a x86 host. So let's use KVM here only if we know that we're running with KVM-HV (which automatically also means that we're running on a ppc64 host), and use TCG otherwise. Signed-off-by: Thomas Huth --- v2: - Check also /dev/kvm to make sure that we're allowed to access KVM - Use only "accel=kvm" instead of "accel=kvm:tcg" if we feel confident that we're running with KVM-HV and can use it tests/postcopy-test.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c index d6613c5..e4f0f3f 100644 --- a/tests/postcopy-test.c +++ b/tests/postcopy-test.c @@ -380,17 +380,27 @@ static void test_migrate(void) " -incoming %s", tmpfs, bootpath, uri); } else if (strcmp(arch, "ppc64") == 0) { +const char *accel = "tcg"; + +/* + * We preferably want to test with KVM, but on ppc64, the test only + * works with kvm-hv, not with kvm-pr, so we check that here first + */ +if (access("/sys/module/kvm_hv", F_OK) == 0 && +access("/dev/kvm", R_OK | W_OK) == 0) { +accel = "kvm"; +} init_bootfile_ppc(bootpath); -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" " -name pcsource,debug-threads=on" " -serial file:%s/src_serial" " -drive file=%s,if=pflash,format=raw", - tmpfs, bootpath); -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" + accel, tmpfs, bootpath); +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" " -name pcdest,debug-threads=on" " -serial file:%s/dest_serial" " -incoming %s", - tmpfs, uri); + accel, tmpfs, uri); } else { g_assert_not_reached(); } -- 1.8.3.1
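The v2 check differs from v1 in two ways: it also requires read/write access to the KVM device node, and it selects plain "kvm" rather than "kvm:tcg". A stand-alone sketch of that decision follows; the helper name `ppc64_accel_v2` and its path parameters are hypothetical, since the patch hard-codes "/sys/module/kvm_hv" and "/dev/kvm".

```c
#include <unistd.h>

/* v2 logic: use KVM only if the kvm_hv module is loaded AND the caller
 * may open the KVM device for reading and writing; otherwise use TCG.
 * Note that the TCG fallback here is silent: QEMU is never asked to try
 * KVM, so it has no chance to print a warning. */
const char *ppc64_accel_v2(const char *kvm_hv_module_dir, const char *kvm_dev)
{
    if (access(kvm_hv_module_dir, F_OK) == 0 &&
        access(kvm_dev, R_OK | W_OK) == 0) {
        return "kvm";
    }
    return "tcg";
}
```

Because the function returns "tcg" without any diagnostic when the device node is inaccessible, a misconfigured /dev/kvm goes unnoticed, whereas "accel=kvm:tcg" lets QEMU itself report the failed KVM attempt before falling back.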
Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16/11/2016 11:14, Thomas Huth wrote:
> The ppc64 postcopy test does not work with KVM-PR, and it is also
> causing annoying warning messages when run on a x86 host. So let's
> use KVM here only if we know that we're running with KVM-HV (which
> automatically also means that we're running on a ppc64 host), and
> use TCG otherwise.
>
> Signed-off-by: Thomas Huth
> ---
> v2:
> - Check also /dev/kvm to make sure that we're allowed to access KVM

I'm not sure it's a good idea as we will fail silently whereas QEMU
sends an error message. It's a common mistake we should be aware of.

Laurent
Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16.11.2016 11:18, Laurent Vivier wrote:
>
>
> On 16/11/2016 11:14, Thomas Huth wrote:
>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>> causing annoying warning messages when run on a x86 host. So let's
>> use KVM here only if we know that we're running with KVM-HV (which
>> automatically also means that we're running on a ppc64 host), and
>> use TCG otherwise.
>>
>> Signed-off-by: Thomas Huth
>> ---
>> v2:
>> - Check also /dev/kvm to make sure that we're allowed to access KVM
>
> I'm not sure it's a good idea as we will fail silently whereas QEMU
> sends an error message. It's common mistake we should be aware of.

But if I run "make check" as a normal user who does not have access
rights to /dev/kvm, this is IMHO not a fatal error (since this could be
on purpose), thus we should not issue an error message here and simply
use TCG instead.

If you want to see at least a warning in this case, I think we should
rather go with v1 of this patch that used "kvm:tcg".

Thomas
Re: [Qemu-devel] [PATCH v2] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16/11/2016 11:26, Thomas Huth wrote:
> On 16.11.2016 11:18, Laurent Vivier wrote:
>>
>>
>> On 16/11/2016 11:14, Thomas Huth wrote:
>>> The ppc64 postcopy test does not work with KVM-PR, and it is also
>>> causing annoying warning messages when run on a x86 host. So let's
>>> use KVM here only if we know that we're running with KVM-HV (which
>>> automatically also means that we're running on a ppc64 host), and
>>> use TCG otherwise.
>>>
>>> Signed-off-by: Thomas Huth
>>> ---
>>> v2:
>>> - Check also /dev/kvm to make sure that we're allowed to access KVM
>>
>> I'm not sure it's a good idea as we will fail silently whereas QEMU
>> sends an error message. It's common mistake we should be aware of.
>
> But if I run "make check" as a normal user who does not have access
> right to /dev/kvm, this is IMHO not a fatal error (since this could be
> on purpose), thus we should not issue an error message here and simply
> use TCG instead.
>
> If you want to see at least a warning in this case, I think we should
> rather go with v1 of this patch that used "kvm:tcg".

I think it's better to have a warning, so let's go with v1...

Laurent
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16/11/2016 09:39, Thomas Huth wrote: > The ppc64 postcopy test does not work with KVM-PR, and it is also > causing annoying warning messages when run on a x86 host. So let's > use KVM here only if we know that we're running with KVM-HV (which > automatically also means that we're running on a ppc64 host), and > fall back to TCG otherwise. > > Signed-off-by: Thomas Huth Reviewed-by: Laurent Vivier > --- > tests/postcopy-test.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > index d6613c5..dafe8be 100644 > --- a/tests/postcopy-test.c > +++ b/tests/postcopy-test.c > @@ -380,17 +380,21 @@ static void test_migrate(void) >" -incoming %s", >tmpfs, bootpath, uri); > } else if (strcmp(arch, "ppc64") == 0) { > +const char *accel; > + > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; > init_bootfile_ppc(bootpath); > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcsource,debug-threads=on" >" -serial file:%s/src_serial" >" -drive file=%s,if=pflash,format=raw", > - tmpfs, bootpath); > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > + accel, tmpfs, bootpath); > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcdest,debug-threads=on" >" -serial file:%s/dest_serial" >" -incoming %s", > - tmpfs, uri); > + accel, tmpfs, uri); > } else { > g_assert_not_reached(); > } >
Re: [Qemu-devel] [PATCH] tcg/mips: Add support for mips64el backend
Hi Richard,

On Tue, Nov 15, 2016 at 10:37:41PM +0100, Richard Henderson wrote:
> On 11/14/2016 10:33 AM, Jin Guojie wrote:
> > I want to listen to your advice. Should I test your v2 patch on Loongson
> > and use it? Or whether it is worth modifying my patch and resubmit it
> > according to your review comments?
>
> I would like very much if you would test my patch on Loongson (or a
> re-submission of my patch; I could perhaps prepare that against master in the
> next few days).
>
> If it is possible, I would like if you could help fix the problems that
> Aurelien discovered with my patch. I have no access to mips hardware myself,
> so all of the development that I was doing was from within a qemu itself. As
> you can imagine, qemu-in-qemu is very very slow.
>
> At the time I was hoping that people from imgtec would be able to help, but
> that never came to pass. Oh well.

I'm up for helping a bit with this (testing & debugging), though I admit
it fell off my radar a bit. We could try and run it up on our kernel
test farm too. Please keep me Cc'd on any future patches :)

Cheers
James
Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Wed, Nov 16, 2016 at 9:49 AM, Fam Zheng wrote: > On Wed, 11/16 10:04, Markus Armbruster wrote: >> ashish mittal writes: >> >> > Thanks for concluding on this. >> > >> > I will rearrange the qnio_api.h header accordingly as follows: >> > >> > +#include "qemu/osdep.h" >> >> Headers should not include osdep.h. > > This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what > Ashish means looks good to me. Yes, I think "will rearrange the qnio_api.h header" was a typo and was supposed to be block/vxhs.c. Stefan
Re: [Qemu-devel] [PATCH 1/3] virtio: Basic implementation of virtio pstore driver
> Not sure how independent ERST is from ACPI and other specs. It looks
> like referencing UEFI spec at least.

It is just the format of error records that comes from the UEFI spec (include/linux/cper.h) but you can ignore it, I think. It should be handled by tools on the host side. For you, the error log address range contains a CPER header followed by a binary blob. In practice, you only need the record length field (bytes 20-23 of the header), though it may be a good idea to validate the signature at the beginning of the header.

> Btw, is the ERST used for pstore only (in Linux)?

Yes. It can store various records, including dmesg and MCE.

There are other examples in QEMU of interfaces with ACPI. They all use the DSDT, but the logic is similar. For example, docs/specs/acpi_mem_hotplug.txt documents the memory hotplug interface. In all cases, ACPI tables contain small programs that talk to specialized hardware registers, typically allocated to hard-coded I/O ports.

In your case, the registers could occupy 16 consecutive I/O ports, like the following:

  0x00        read/write   operation type (0=write,1=read,2=clear,3=dummy write)

  0x01        read-only    bit 7: if set, operation in progress
                           bit 0-6: operation status, see "Command Status Definition" in the ACPI spec

  0x02        read-only    when read:
                           - read a 64-bit record id from the store to memory, from the address that was last written to 0x08.
                           - if the id is valid and is not the last id in the store, write the next 64-bit record id to the same address
                           - otherwise, write the first record id to the same address, or 0x if the store is empty

  0x03        unused, read as zero

  0x04-0x07   read/write   offset of the error record into the error log address range

  0x08-0x0b   read/write   when read, return number of stored records
                           when written, the written value is a 32-bit memory address, which points to a 64-bit location used to communicate record ids.

  0x0c-0x0f   read/write   when read, always return -1 (together with the "mask" field and READ_REGISTER, this lets ERST instructions return any value!)
                           when written, trigger the pstore operation:
                           - if the current operation is a dummy write, do nothing
                           - if the current operation is a write, write a new record, using the written value as the base of the error log address range. The length must be parsed from the CPER header.
                           - if the current operation is a clear, read the record id from the memory location that was last written to 0x08 and do the operation. the value written is ignored.
                           - if the current operation is a read, read the record id from the memory location that was last written to 0x08, using the written value as the base of the error log address range.

In addition, the firmware will need to reserve a few KB of RAM for the error log address range (I checked a real system and it reserves 8KB). The first eight bytes are needed for the record identifier interface, because there's no such thing as 64-bit I/O ports, and the rest can be used for the actual buffer.

QEMU already has an interface to allocate RAM and patch the address into an ACPI table (bios_linker_loader_alloc). Because this interface is actually meant to load data from QEMU into the firmware (using the "fw_cfg" interface), you would have to add a dummy 8KB file to fw_cfg using fw_cfg_add_file (for example "etc/erst-memory"), it can be just full of zeros.

QEMU supports two chipsets, PIIX and ICH9, and the free I/O port ranges are different. You could use 0xa20 for ICH9 and 0xae20 for PIIX.

All in all, the contents of the ERST table would not be very different from a non-virtual system, except that on real hardware the firmware would use SMIs as the trap mechanism.
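The remark above about parsing the length from the CPER header can be sketched in a few lines of C. This is only an illustration, not QEMU code: the helper name cper_record_length is hypothetical, and it assumes the signature is the four ASCII bytes "CPER" and that the record length in bytes 20-23 is stored little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the discussion: the signature sits at the start of the
 * header and the record length occupies bytes 20-23. */
#define CPER_SIG_OFFSET     0
#define CPER_RECLEN_OFFSET  20
#define CPER_MIN_HEADER     24  /* enough bytes to cover the length field */

/*
 * Hypothetical helper (not an existing QEMU function).
 * Returns 0 and stores the record length in *len_out, or -1 if the
 * buffer is too small, the signature does not match, or the length
 * does not fit inside the buffer.
 * Assumptions: ASCII "CPER" signature, little-endian length field.
 */
int cper_record_length(const uint8_t *buf, size_t buf_size, uint32_t *len_out)
{
    const uint8_t *p = buf + CPER_RECLEN_OFFSET;
    uint32_t len;

    if (buf_size < CPER_MIN_HEADER) {
        return -1;
    }
    if (memcmp(buf + CPER_SIG_OFFSET, "CPER", 4) != 0) {
        return -1;
    }

    /* Assemble the 32-bit value byte by byte so host endianness and
     * alignment do not matter. */
    len = (uint32_t)p[0] |
          ((uint32_t)p[1] << 8) |
          ((uint32_t)p[2] << 16) |
          ((uint32_t)p[3] << 24);

    if (len < CPER_MIN_HEADER || len > buf_size) {
        return -1;
    }
    *len_out = len;
    return 0;
}
```

A device model handling a write to the 0x0c-0x0f register could call such a helper on the guest buffer before copying the record, so that a bogus length from the guest is rejected instead of trusted.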
You almost have a one-to-one mapping between ERST actions and register accesses:

  BEGIN_WRITE_OPERATION         write value 0 to register at 0x00
  BEGIN_READ_OPERATION          write value 1 to register at 0x00
  BEGIN_CLEAR_OPERATION         write value 2 to register at 0x00
  BEGIN_DUMMY_WRITE_OPERATION   write value 3 to register at 0x00
  END_OPERATION                 no-op
  CHECK_BUSY_STATUS             read register at 0x01 with mask 0x80
  GET_COMMAND_STATUS
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On Wed, 16 Nov 2016 09:39:31 +0100 Thomas Huth wrote: > The ppc64 postcopy test does not work with KVM-PR, and it is also > causing annoying warning messages when run on a x86 host. So let's > use KVM here only if we know that we're running with KVM-HV (which > automatically also means that we're running on a ppc64 host), and > fall back to TCG otherwise. > This patch addresses two issues actually: - the annoying warning when running on a ppc64 guest on a non-ppc64 host - the fact that KVM-PR seems to be currently broken I agree that the former makes sense, but what about the case of running a x86 guest on a non-x86 host ? I'm still feeling uncomfortable with the KVM-PR case... is this a workaround we want to keep until we find out what's going on or are we starting to partially deprecate KVM PR ? In any case, I guess we should document this and probably print some meaningful error message. > Signed-off-by: Thomas Huth > --- > tests/postcopy-test.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > index d6613c5..dafe8be 100644 > --- a/tests/postcopy-test.c > +++ b/tests/postcopy-test.c > @@ -380,17 +380,21 @@ static void test_migrate(void) >" -incoming %s", >tmpfs, bootpath, uri); > } else if (strcmp(arch, "ppc64") == 0) { > +const char *accel; > + > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ > +accel = access("/sys/module/kvm_hv", F_OK) ? 
"tcg" : "kvm:tcg"; > init_bootfile_ppc(bootpath); > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcsource,debug-threads=on" >" -serial file:%s/src_serial" >" -drive file=%s,if=pflash,format=raw", > - tmpfs, bootpath); > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > + accel, tmpfs, bootpath); > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcdest,debug-threads=on" >" -serial file:%s/dest_serial" >" -incoming %s", > - tmpfs, uri); > + accel, tmpfs, uri); > } else { > g_assert_not_reached(); > }
Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters
> I've investigated this issue. > This command line works ok: > -drive > > driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay > -device ide-hd,drive=img-blkreplay > > And this does not: > -drive > driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow > ,id=img-blkreplay > -device ide-hd,drive=img-blkreplay > > QEMU hangs at some moment of replay. > > I found that some dma requests do not pass through the blkreplay driver > due to the following line in block-backend.c: > return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags); > > This line passes read request directly to qcow driver and blkreplay cannot > process it to make deterministic. I don't understand, blk->root should be the blkreplay here. Paolo
Re: [Qemu-devel] [RFC, v1, 1/2] hw/vfio/platform: add hisilicon hnsvf device
Hi Rick, On 21/10/2016 03:22, Rick Song wrote: > The platform device class has become abstract. This > patch introduces a hisilicon hnsvf device that derives > from it. in https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03401.html we discussed the relevance to get the platform device non abstract. No change was submitted though. I can submit something next week except if you want to submit a patch yourself. The idea is we would instantiate the vfio platform device using such an option: -device vfio-platform-device,compat="hisilicon,hnsvf-v2" Once such change is accepted, only your second patch will be requested. Thanks Eric > > Signed-off-by: Rick Song > --- > hw/vfio/Makefile.objs | 1 + > hw/vfio/hisi-hnsvf.c | 56 > +++ > include/hw/vfio/vfio-hisi-hnsvf.h | 51 +++ > 3 files changed, 108 insertions(+) > create mode 100644 hw/vfio/hisi-hnsvf.c > create mode 100644 include/hw/vfio/vfio-hisi-hnsvf.h > > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs > index c25e32b..d19dffc 100644 > --- a/hw/vfio/Makefile.objs > +++ b/hw/vfio/Makefile.objs > @@ -4,5 +4,6 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o > obj-$(CONFIG_SOFTMMU) += platform.o > obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o > obj-$(CONFIG_SOFTMMU) += amd-xgbe.o > +obj-$(CONFIG_SOFTMMU) += hisi-hnsvf.o > obj-$(CONFIG_SOFTMMU) += spapr.o > endif > diff --git a/hw/vfio/hisi-hnsvf.c b/hw/vfio/hisi-hnsvf.c > new file mode 100644 > index 000..5b48e27 > --- /dev/null > +++ b/hw/vfio/hisi-hnsvf.c > @@ -0,0 +1,56 @@ > +/* > + * Hisilicon HNS Virtual Function VFIO device > + * > + * Copyright Huawei Limited, 2016 > + * > + * Authors: > + * Rick Song > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See > + * the COPYING file in the top-level directory. 
> + * > + */ > + > +#include "qemu/osdep.h" > +#include "hw/vfio/vfio-hisi-hnsvf.h" > + > +static void hisi_hnsvf_realize(DeviceState *dev, Error **errp) > +{ > +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev); > +VFIOHisiHnsvfDeviceClass *k = VFIO_HISI_HNSVF_DEVICE_GET_CLASS(dev); > + > +vdev->compat = g_strdup("hisilicon,hnsvf-v2"); > + > +k->parent_realize(dev, errp); > +} > + > +static const VMStateDescription vfio_platform_hisi_hnsvf_vmstate = { > +.name = TYPE_VFIO_HISI_HNSVF, > +.unmigratable = 1, > +}; > + > +static void vfio_hisi_hnsvf_class_init(ObjectClass *klass, void *data) > +{ > +DeviceClass *dc = DEVICE_CLASS(klass); > +VFIOHisiHnsvfDeviceClass *vcxc = > +VFIO_HISI_HNSVF_DEVICE_CLASS(klass); > +vcxc->parent_realize = dc->realize; > +dc->realize = hisi_hnsvf_realize; > +dc->desc = "VFIO HISI HNSVF"; > +dc->vmsd = &vfio_platform_hisi_hnsvf_vmstate; > +} > + > +static const TypeInfo vfio_hisi_hnsvf_dev_info = { > +.name = TYPE_VFIO_HISI_HNSVF, > +.parent = TYPE_VFIO_PLATFORM, > +.instance_size = sizeof(VFIOHisiHnsvfDevice), > +.class_init = vfio_hisi_hnsvf_class_init, > +.class_size = sizeof(VFIOHisiHnsvfDeviceClass), > +}; > + > +static void register_hisi_hnsvf_dev_type(void) > +{ > +type_register_static(&vfio_hisi_hnsvf_dev_info); > +} > + > +type_init(register_hisi_hnsvf_dev_type) > diff --git a/include/hw/vfio/vfio-hisi-hnsvf.h > b/include/hw/vfio/vfio-hisi-hnsvf.h > new file mode 100644 > index 000..9208656 > --- /dev/null > +++ b/include/hw/vfio/vfio-hisi-hnsvf.h > @@ -0,0 +1,51 @@ > +/* > + * VFIO Hisilicon HNS Virtual Function device > + * > + * Copyright Hisilicon Limited, 2016 > + * > + * Authors: > + * Rick Song > + * > + * This work is licensed under the terms of the GNU GPL, version 2. See > + * the COPYING file in the top-level directory. 
> + * > + */ > + > +#ifndef HW_VFIO_VFIO_HISI_HNSVF_H > +#define HW_VFIO_VFIO_HISI_HNSVF_H > + > +#include "hw/vfio/vfio-platform.h" > + > +#define TYPE_VFIO_HISI_HNSVF "vfio-hisi-hnsvf" > + > +/** > + * This device exposes: > + * - 5 MMIO regions: MAC, PCS, SerDes Rx/Tx regs, > + SerDes Integration Registers 1/2 & 2/2 > + * - 2 level sensitive IRQs and optional DMA channel IRQs > + */ > +struct VFIOHisiHnsvfDevice { > +VFIOPlatformDevice vdev; > +}; > + > +typedef struct VFIOHisiHnsvfDevice VFIOHisiHnsvfDevice; > + > +struct VFIOHisiHnsvfDeviceClass { > +/*< private >*/ > +VFIOPlatformDeviceClass parent_class; > +/*< public >*/ > +DeviceRealize parent_realize; > +}; > + > +typedef struct VFIOHisiHnsvfDeviceClass VFIOHisiHnsvfDeviceClass; > + > +#define VFIO_HISI_HNSVF_DEVICE(obj) \ > + OBJECT_CHECK(VFIOHisiHnsvfDevice, (obj), TYPE_VFIO_HISI_HNSVF) > +#define VFIO_HISI_HNSVF_DEVICE_CLASS(klass) \ > + OBJECT_CLASS_CHECK(VFIOHisiHnsvfDeviceClass, (klass), \ > +TYPE_VFIO_HISI_HNSVF) > +#define VFIO_HISI_HNSVF_DEVICE_GET_CLASS(obj) \ > + OBJECT_GET_CLASS(VFIOHisiHnsvfDeviceClass, (obj), \ > + TYPE_VFIO_HISI_HNSVF) > + > +#
Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters
Am 16.11.2016 um 10:49 hat Pavel Dovgalyuk geschrieben: > > From: Pavel Dovgalyuk [mailto:dovga...@ispras.ru] > > > From: Kevin Wolf [mailto:kw...@redhat.com] > > > My command line was assuming a raw image. It looks like you're using a > > > qcow (hopefully qcow2?) image. If so, then you need to include the qcow2 > > > driver: > > > > > > -drive driver=blkreplay,if=none,image.driver=qcow2,\ > > > image.file.driver=file,image.file.filename=testdisk.qcow,id=img-blkreplay > > > > This doesn't work for some reason. Replay just hangs at some moment. > > > > Maybe there exists some internal difference between command line with one > > or two -drive > > options? > > I've investigated this issue. > This command line works ok: > -drive > driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay > > -device ide-hd,drive=img-blkreplay > > And this does not: > -drive > driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdisk.qcow > ,id=img-blkreplay > -device ide-hd,drive=img-blkreplay > > QEMU hangs at some moment of replay. > > I found that some dma requests do not pass through the blkreplay driver > due to the following line in block-backend.c: > return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags); > > This line passes read request directly to qcow driver and blkreplay cannot > process it to make deterministic. How does that bypass blkreplay? blk->root is supposed to be the blkreply node, do you see something different? If it were the qcow2 node, then I would expect that no requests at all go through the blkreplay layer. Kevin
Re: [Qemu-devel] [RFC, v1, 2/2] hw/arm/sysbus-fdt: enable vfio-hisi-hnsvf dynamic instantiation
Hi, On 21/10/2016 03:22, Rick Song wrote: > This patch allows the instantiation of the vfio-hisi-hnsvf device > from the QEMU command line (-device vfio-hisi-hnsvf,host=""). > A specialized device tree node is created for the guest, containing > compat, dma-coherent, reg and interrupts properties. For additional devices, Peter requested we re-structured the sysbus-fdt.c file to avoid it gets too large. We need to define relevant helpers and put node creation function elsewhere. Similarly I can propose something next week except if you want to do it. Thanks Eric > > Signed-off-by: Rick Song > --- > hw/arm/sysbus-fdt.c | 71 > + > 1 file changed, 71 insertions(+) > > diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c > index d68e3dc..207586f 100644 > --- a/hw/arm/sysbus-fdt.c > +++ b/hw/arm/sysbus-fdt.c > @@ -36,6 +36,7 @@ > #include "hw/vfio/vfio-platform.h" > #include "hw/vfio/vfio-calxeda-xgmac.h" > #include "hw/vfio/vfio-amd-xgbe.h" > +#include "hw/vfio/vfio-hisi-hnsvf.h" > #include "hw/arm/fdt.h" > > /* > @@ -413,6 +414,75 @@ static int add_amd_xgbe_fdt_node(SysBusDevice *sbdev, > void *opaque) > return 0; > } > > +/** > + * add_hisi_hnsvf_fdt_node > + * > + * Generates a simple node with following properties: > + * compatible string, regs, interrupts, dma-coherent > + */ > +static int add_hisi_hnsvf_fdt_node(SysBusDevice *sbdev, void *opaque) > +{ > +PlatformBusFDTData *data = opaque; > +PlatformBusDevice *pbus = data->pbus; > +void *fdt = data->fdt; > +const char *parent_node = data->pbus_node_name; > +int compat_str_len, i; > +char *nodename; > +uint32_t *irq_attr, *reg_attr; > +uint64_t mmio_base, irq_number; > +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev); > +VFIODevice *vbasedev = &vdev->vbasedev; > +VFIOINTp *intp; > + > +mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, 0); > +nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node, > + vbasedev->name, mmio_base); > +qemu_fdt_add_subnode(fdt, nodename); > + > +compat_str_len = 
strlen(vdev->compat) + 1; > +qemu_fdt_setprop(fdt, nodename, "compatible", > + vdev->compat, compat_str_len); > + > +qemu_fdt_setprop(fdt, nodename, "dma-coherent", "", 0); > + > +reg_attr = g_new(uint32_t, vbasedev->num_regions * 2); > +for (i = 0; i < vbasedev->num_regions; i++) { > +mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i); > +reg_attr[2 * i] = cpu_to_be32(mmio_base); > +reg_attr[2 * i + 1] = cpu_to_be32( > +memory_region_size(vdev->regions[i]->mem)); > +} > +qemu_fdt_setprop(fdt, nodename, "reg", reg_attr, > + vbasedev->num_regions * 2 * sizeof(uint32_t)); > + > +irq_attr = g_new(uint32_t, vbasedev->num_irqs * 3); > +for (i = 0; i < vbasedev->num_irqs; i++) { > +irq_number = platform_bus_get_irqn(pbus, sbdev , i) > + + data->irq_start; > +irq_attr[3 * i] = cpu_to_be32(GIC_FDT_IRQ_TYPE_SPI); > +irq_attr[3 * i + 1] = cpu_to_be32(irq_number); > + > +QLIST_FOREACH(intp, &vdev->intp_list, next) { > +if (intp->pin == i) { > +break; > +} > +} > + > +if (intp->flags & VFIO_IRQ_INFO_AUTOMASKED) { > +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_LEVEL_HI); > +} else { > +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI); > +} > +} > +qemu_fdt_setprop(fdt, nodename, "interrupts", > + irq_attr, vbasedev->num_irqs * 3 * sizeof(uint32_t)); > +g_free(irq_attr); > +g_free(reg_attr); > +g_free(nodename); > +return 0; > + > +} > + > #endif /* CONFIG_LINUX */ > > /* list of supported dynamic sysbus devices */ > @@ -420,6 +490,7 @@ static const NodeCreationPair add_fdt_node_functions[] = { > #ifdef CONFIG_LINUX > {TYPE_VFIO_CALXEDA_XGMAC, add_calxeda_midway_xgmac_fdt_node}, > {TYPE_VFIO_AMD_XGBE, add_amd_xgbe_fdt_node}, > +{TYPE_VFIO_HISI_HNSVF, add_hisi_hnsvf_fdt_node}, > #endif > {"", NULL}, /* last element */ > }; >
Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
On Tue, 15 Nov 2016 15:34:45 -0200 Eduardo Habkost wrote: > On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote: > [...] > > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data) > > if (pcms->fw_cfg) { > > pc_build_smbios(pcms->fw_cfg); > > pc_build_feature_control_file(pcms); > > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ > > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); > > } > > > > if (pcms->apic_id_limit > 255) { > > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms) > > assert(MACHINE(pcms)->kernel_filename != NULL); > > > > fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); > > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); > > rom_set_fw(fw_cfg); > > > > load_linux(pcms, fw_cfg); > > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, > > } > > } > > > > +/* increment the number of CPUs */ > > +pcms->boot_cpus++; > > if (dev->hotplugged) { > > -/* increment the number of CPUs */ > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + > > 1); > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); > > } > > > > found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); > > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler > > *hotplug_dev, > > found_cpu->cpu = NULL; > > object_unparent(OBJECT(dev)); > > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1); > > +/* decrement the number of CPUs */ > > +pcms->boot_cpus--; > > +/* Update the number of CPUs in CMOS */ > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); > > Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug, > too? Indeed, it should be updated otherwise it will hang on reboot in BIOS waiting for wrong number of CPUs if CPUs count is above 256. 
The same bug was present in the reverted "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs". Thanks for noticing it! I'll post v3 as a reply to this thread.
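The point raised in this exchange, that the value the BIOS reads on reboot has to follow every plug and unplug, can be sketched as below. This is a toy model, not the actual patch: struct cpu_counts and the three functions are hypothetical stand-ins for pcms->boot_cpus, the FW_CFG_NB_CPUS entry and the CMOS counter, and the sketch ignores the dev->hotplugged distinction made in the real code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the machine state discussed above. */
struct cpu_counts {
    uint16_t boot_cpus;      /* QEMU-side counter (pcms->boot_cpus) */
    uint16_t fw_cfg_nb_cpus; /* value the firmware reads on (re)boot */
    uint16_t cmos_cpus;      /* value mirrored into CMOS */
};

/* Publish the current count to every place the firmware may look.
 * Keeping this in one helper makes it hard to update one copy and
 * forget the other. */
static void publish_cpu_count(struct cpu_counts *c)
{
    c->fw_cfg_nb_cpus = c->boot_cpus;
    c->cmos_cpus = c->boot_cpus;
}

void cpu_plugged(struct cpu_counts *c)
{
    c->boot_cpus++;
    publish_cpu_count(c);
}

void cpu_unplugged(struct cpu_counts *c)
{
    c->boot_cpus--;
    publish_cpu_count(c);
}
```

In the real code the equivalent of publish_cpu_count would presumably call fw_cfg_modify_i16() and rtc_set_cpus_count(), the two functions that appear in the quoted patch.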
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
* Greg Kurz (gr...@kaod.org) wrote: > On Wed, 16 Nov 2016 09:39:31 +0100 > Thomas Huth wrote: > > > The ppc64 postcopy test does not work with KVM-PR, and it is also > > causing annoying warning messages when run on a x86 host. So let's > > use KVM here only if we know that we're running with KVM-HV (which > > automatically also means that we're running on a ppc64 host), and > > fall back to TCG otherwise. > > > > This patch addresses two issues actually: > - the annoying warning when running on a ppc64 guest on a non-ppc64 host > - the fact that KVM-PR seems to be currently broken > > I agree that the former makes sense, but what about the case of running > a x86 guest on a non-x86 host ? > > I'm still feeling uncomfortable with the KVM-PR case... is this a workaround > we want to keep until we find out what's going on or are we starting to > partially deprecate KVM PR ? In any case, I guess we should document this > and probably print some meaningful error message. This is certainly a work around for now, it doesn't suggest anything about deprecation. Dave > > Signed-off-by: Thomas Huth > > --- > > tests/postcopy-test.c | 12 > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > > index d6613c5..dafe8be 100644 > > --- a/tests/postcopy-test.c > > +++ b/tests/postcopy-test.c > > @@ -380,17 +380,21 @@ static void test_migrate(void) > >" -incoming %s", > >tmpfs, bootpath, uri); > > } else if (strcmp(arch, "ppc64") == 0) { > > +const char *accel; > > + > > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr > > */ > > +accel = access("/sys/module/kvm_hv", F_OK) ? 
"tcg" : "kvm:tcg"; > > init_bootfile_ppc(bootpath); > > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" > >" -name pcsource,debug-threads=on" > >" -serial file:%s/src_serial" > >" -drive file=%s,if=pflash,format=raw", > > - tmpfs, bootpath); > > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > > + accel, tmpfs, bootpath); > > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" > >" -name pcdest,debug-threads=on" > >" -serial file:%s/dest_serial" > >" -incoming %s", > > - tmpfs, uri); > > + accel, tmpfs, uri); > > } else { > > g_assert_not_reached(); > > } > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH] HACKING: document #include order
On Wed, Nov 16, 2016 at 9:39 AM, Markus Armbruster wrote: > Eric Blake writes: > >> On 11/15/2016 02:29 PM, Stefan Hajnoczi wrote: >>> It was not obvious to me why "qemu/osdep.h" must be the first #include. >>> This documents the rationale and the overall #include order. >>> >>> Cc: Fam Zheng >>> Cc: Markus Armbruster >>> Cc: Eric Blake >>> Signed-off-by: Stefan Hajnoczi >>> --- >>> HACKING | 15 +++ >>> 1 file changed, 15 insertions(+) >>> >> >>> +1.2. Include directives >>> + >>> +Order include directives as follows: >>> + >>> +#include "qemu/osdep.h" /* Always first... */ >>> +#include <...> /* then system headers... */ >>> +#include "..." /* and finally QEMU headers. */ >>> + >>> +The "qemu/osdep.h" header contains preprocessor macros that affect the >>> behavior >>> +of core system headers like . It must be the first include so >>> that >>> +core system headers included by external libraries get the preprocessor >>> macros >>> +that QEMU depends on. >> >> Might be worth mentioning that only .c files include osdep.h (.h files >> do not need to, because they can only be included by a .c file that has >> already included osdep.h first). > > Yes, please, but make it "headers should not include osdep.h". Will send v2. Stefan
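The rationale in the quoted HACKING text, that some macros change what system headers declare and therefore must come first, can be shown with a small standalone C file. This is a sketch under an assumption: _FILE_OFFSET_BITS is used here only as a well-known example of such a macro; the thread does not list which macros "qemu/osdep.h" actually sets.

```c
/*
 * Feature-test macros only take effect if they are defined before the
 * first system header is included. That is the role "qemu/osdep.h"
 * plays as the first include of a .c file.
 * Assumption: _FILE_OFFSET_BITS is just an illustrative macro here.
 */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>  /* system headers come after the macros ... */
#include <stddef.h>

/* ... and project headers would follow the system headers. */

/* With the macro defined first, off_t is a 64-bit type even on 32-bit
 * hosts; defined after <sys/types.h> it would have no effect. */
size_t off_t_size(void)
{
    return sizeof(off_t);
}
```

If an external library header pulled in <sys/types.h> before the macro was defined, the same file could silently see a different off_t, which is the kind of mismatch the include-order rule is meant to prevent.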
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On Wed, 16 Nov 2016 12:24:50 + "Dr. David Alan Gilbert" wrote: > * Greg Kurz (gr...@kaod.org) wrote: > > On Wed, 16 Nov 2016 09:39:31 +0100 > > Thomas Huth wrote: > > > > > The ppc64 postcopy test does not work with KVM-PR, and it is also > > > causing annoying warning messages when run on a x86 host. So let's > > > use KVM here only if we know that we're running with KVM-HV (which > > > automatically also means that we're running on a ppc64 host), and > > > fall back to TCG otherwise. > > > > > > > This patch addresses two issues actually: > > - the annoying warning when running on a ppc64 guest on a non-ppc64 host > > - the fact that KVM-PR seems to be currently broken > > > > I agree that the former makes sense, but what about the case of running > > a x86 guest on a non-x86 host ? > > > > I'm still feeling uncomfortable with the KVM-PR case... is this a workaround > > we want to keep until we find out what's going on or are we starting to > > partially deprecate KVM PR ? In any case, I guess we should document this > > and probably print some meaningful error message. > > This is certainly a work around for now, it doesn't suggest anything about > deprecation. > Well it doesn't suggest anything actually, it just silently skips KVM PR... I would at least expect a comment in the code mentioning this is a workaround and maybe an explicit warning for the user. If the user really wants to run this test with KVM on ppc64, then she should ensure it is KVM HV. Cheers. 
-- Greg > Dave > > > > Signed-off-by: Thomas Huth > > > --- > > > tests/postcopy-test.c | 12 > > > 1 file changed, 8 insertions(+), 4 deletions(-) > > > > > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > > > index d6613c5..dafe8be 100644 > > > --- a/tests/postcopy-test.c > > > +++ b/tests/postcopy-test.c > > > @@ -380,17 +380,21 @@ static void test_migrate(void) > > >" -incoming %s", > > >tmpfs, bootpath, uri); > > > } else if (strcmp(arch, "ppc64") == 0) { > > > +const char *accel; > > > + > > > +/* On ppc64, the test only works with kvm-hv, but not with > > > kvm-pr */ > > > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; > > > init_bootfile_ppc(bootpath); > > > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > > > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" > > >" -name pcsource,debug-threads=on" > > >" -serial file:%s/src_serial" > > >" -drive file=%s,if=pflash,format=raw", > > > - tmpfs, bootpath); > > > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > > > + accel, tmpfs, bootpath); > > > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" > > >" -name pcdest,debug-threads=on" > > >" -serial file:%s/dest_serial" > > >" -incoming %s", > > > - tmpfs, uri); > > > + accel, tmpfs, uri); > > > } else { > > > g_assert_not_reached(); > > > } > > > -- > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
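What Greg asks for, a comment marking the check as a workaround plus an explicit message for the user, might look roughly like this. It is a sketch only: the function name and the warning text are made up for illustration, and the path is passed as a parameter so the logic can be exercised without a ppc64 host.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of tests/postcopy-test.c.
 *
 * Workaround: the ppc64 postcopy test currently only works with
 * KVM-HV, not with KVM-PR. Use KVM only when the kvm_hv module
 * directory exists, and tell the user when falling back to TCG.
 */
const char *ppc64_pick_accel(const char *kvm_hv_path)
{
    if (access(kvm_hv_path, F_OK) == 0) {
        return "kvm:tcg";
    }

    fprintf(stderr,
            "postcopy-test: KVM-HV not available, falling back to TCG "
            "(KVM-PR is skipped as a workaround)\n");
    return "tcg";
}
```

The test would then call ppc64_pick_accel("/sys/module/kvm_hv") and pass the result to the "-machine accel=%s" format strings shown in the quoted patch.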
Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
On Wed, Nov 16, 2016 at 01:24:11PM +0100, Igor Mammedov wrote: > On Tue, 15 Nov 2016 15:34:45 -0200 > Eduardo Habkost wrote: > > > On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote: > > [...] > > > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data) > > > if (pcms->fw_cfg) { > > > pc_build_smbios(pcms->fw_cfg); > > > pc_build_feature_control_file(pcms); > > > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ > > > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); > > > } > > > > > > if (pcms->apic_id_limit > 255) { > > > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms) > > > assert(MACHINE(pcms)->kernel_filename != NULL); > > > > > > fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); > > > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); > > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); > > > rom_set_fw(fw_cfg); > > > > > > load_linux(pcms, fw_cfg); > > > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler > > > *hotplug_dev, > > > } > > > } > > > > > > +/* increment the number of CPUs */ > > > +pcms->boot_cpus++; > > > if (dev->hotplugged) { > > > -/* increment the number of CPUs */ > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) > > > + 1); > > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); > > > } > > > > > > found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); > > > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler > > > *hotplug_dev, > > > found_cpu->cpu = NULL; > > > object_unparent(OBJECT(dev)); > > > > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1); > > > +/* decrement the number of CPUs */ > > > +pcms->boot_cpus--; > > > +/* Update the number of CPUs in CMOS */ > > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); > > > > Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug, > > too? 
> Indeed, it should be updated > otherwise it will hang on reboot in BIOS waiting for wrong number of CPUs > if CPUs count is above 256. > > the same bug has been present in the reverted > "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs" The "etc/boot-cpus" patch changed boot_cpus_le on the plug/unplug callbacks. > Thanks for noticing it! > I'll post v3 as reply to this thread. Thanks! -- Eduardo
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
> If the consensus is that the patch is a QEMU bugfix (as opposed to a
> feature) and that it is eligible for the currently supported upstream
> stable branches, that's the best, no doubt.

The only currently supported upstream stable branch is 2.7. :)

I'm okay with bending the rules and including it in 2.8, but it's worrisome that you also needed to go back from relaxed to traditional delivery, meaning that old QEMU + new OVMF will take ages to boot.

If this is the case, I still think this needs some kind of discovery mechanism, unless OVMF can just say "things were too broken, stop supporting SMM on QEMUs older than 2.8". For example:

- OVMF should keep on using 0x00 (no broadcast) if the relaxed AP setting is used for the PCD; this would be the backwards compatibility mode.

- we could have another magic 0xB2 value, which is implemented directly in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) to detect the new feature. It can fail to start if using traditional AP and the new feature is not there.

By the way, in case OVMF needs to use SmmSwDispatch in the future, I would make QEMU use broadcast behavior for all values in the 0x10-0xff range, or something like that.

Paolo

> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The
> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually
> correct; when I was writing the OVMF docs, I must have misunderstood the
> requirements and needlessly required 2.5+; 2.4+ should have been fine.)
>
> Which means the fix should be backported as far as stable-2.4.
>
> Should we proceed with that? CC'ing Mike Roth and the stable list.
>
> Thanks!
> Laszlo > > > > > > >>> > >>> Paolo > >>> > --- > hw/isa/lpc_ich9.c | 12 +++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > index 10d1ee8b9310..f2fe644fdaa4 100644 > --- a/hw/isa/lpc_ich9.c > +++ b/hw/isa/lpc_ich9.c > @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool > smm_enabled) > > /* APM */ > > +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > + > static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > { > ICH9LPCState *lpc = arg; > @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, > void *arg) > > /* SMI_EN = PMBASE + 30. SMI control and enable register */ > if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > +CPUState *cs; > + > +CPU_FOREACH(cs) { > +cpu_interrupt(cs, CPU_INTERRUPT_SMI); > +} > +} else { > +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > +} > } > } > > > >
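The rule implemented by the quoted patch, broadcast the SMI to every VCPU when APM_STS holds 'Q' and otherwise deliver it only to the current one, can be modelled in isolation. This is a toy sketch: select_smi_targets and the pending array are hypothetical, and no real QEMU CPU state is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Same magic value as QEMU_ICH9_APM_STS_BROADCAST_SMI in the patch. */
#define APM_STS_BROADCAST_SMI 'Q'

/*
 * Hypothetical model of the delivery decision.
 * pending[i] is set to 1 for every CPU that would receive the SMI and
 * to 0 for all others; the return value is the number of targets.
 */
size_t select_smi_targets(uint8_t apm_sts, size_t current_cpu,
                          uint8_t *pending, size_t n_cpus)
{
    size_t i, count = 0;

    for (i = 0; i < n_cpus; i++) {
        int hit = (apm_sts == APM_STS_BROADCAST_SMI) || (i == current_cpu);

        pending[i] = hit ? 1 : 0;
        count += (size_t)hit;
    }
    return count;
}
```

A discovery mechanism of the kind Paolo describes would sit in front of this decision: firmware would first probe whether the broadcast value is understood at all, and only then rely on it.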
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: > > > On 11/14/2016 09:12 AM, Christopher Covington wrote: > > Hi Drew, Wei, > > > > On 11/14/2016 05:05 AM, Andrew Jones wrote: > >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > >>> > >>> > >>> On 11/11/2016 01:43 AM, Andrew Jones wrote: > On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > > From: Christopher Covington > > > > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > > even for the smallest delta of two subsequent reads. > > > > Signed-off-by: Christopher Covington > > Signed-off-by: Wei Huang > > --- > > arm/pmu.c | 98 > > +++ > > 1 file changed, 98 insertions(+) > > > > diff --git a/arm/pmu.c b/arm/pmu.c > > index 0b29088..d5e3ac3 100644 > > --- a/arm/pmu.c > > +++ b/arm/pmu.c > > @@ -14,6 +14,7 @@ > > */ > > #include "libcflat.h" > > > > +#define PMU_PMCR_E (1 << 0) > > #define PMU_PMCR_N_SHIFT 11 > > #define PMU_PMCR_N_MASK0x1f > > #define PMU_PMCR_ID_SHIFT 16 > > @@ -21,6 +22,10 @@ > > #define PMU_PMCR_IMP_SHIFT 24 > > #define PMU_PMCR_IMP_MASK 0xff > > > > +#define PMU_CYCLE_IDX 31 > > + > > +#define NR_SAMPLES 10 > > + > > #if defined(__arm__) > > static inline uint32_t pmcr_read(void) > > { > > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > > return ret; > > } > > + > > +static inline void pmcr_write(uint32_t value) > > +{ > > + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > > +} > > + > > +static inline void pmselr_write(uint32_t value) > > +{ > > + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > > +} > > + > > +static inline void pmxevtyper_write(uint32_t value) > > +{ > > + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > > +} > > + > > +/* > > + * While PMCCNTR can be accessed as a 64 bit coprocessor register, > > returning 64 > > + * bits doesn't seem worth the trouble when differential usage of the > > result is > > + * 
expected (with differences that can easily fit in 32 bits). So just > > return > > + * the lower 32 bits of the cycle count in AArch32. > > Like I said in the last review, I'd rather we not do this. We should > return the full value and then the test case should confirm the upper > 32 bits are zero. > >>> > >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > >>> register. We can force it to a more coarse-grained cycle counter with > >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > > > > AArch32 System Register Descriptions > > Performance Monitors registers > > PMCCNTR, Performance Monitors Cycle Count Register > > > > To access the PMCCNTR when accessing as a 32-bit register: > > MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > > MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are > > unchanged > > > > To access the PMCCNTR when accessing as a 64-bit register: > > MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] > > into Rt2 > > MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to > > PMCCNTR[63:32] > > > > Thanks. I did some research based on your info and came back with the > following proposals (Cov, correct me if I am wrong): > > By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I > think this 64-bit cycle register is only available when running under > aarch32 compatibility mode on ARMv8 because it is not specified in A15 > TRM. OK, I hadn't realized that there would be differences between v7 and AArch32. It looks like we need to add a function to the kvm-unit-tests framework that enables unit tests to make that distinction, because we'll want to explicitly test those differences in order to flush out emulation bugs. I see now that Appendix K5 of the v8 ARM ARM lists some differences, but this PMCCNTR difference isn't there... 
As v8-A32 is an update/extension of v7-A, I'd expect there to be a RES0 bit in some v7 ID register that, on v8, is no longer reserved and a 1. Unfortunately I just did some ARM doc skimming but can't find anything like that. As we currently only use the cortex-a15 for our v7 processor, then I guess we can just check MIDR, but yuck. Anyway, I'll send a patch for that. > To further verify it, I tested 32-bit pmu code on QEMU with TCG > mode. The result is: accessing 64-bit PMCCNTR using the following > assembly failed on A15: > >volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); > or >volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); > > Given this difference, I think there are
Re: [Qemu-devel] [PATCH for-2.8 v2 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
On Wed, 16 Nov 2016 10:39:33 -0200 Eduardo Habkost wrote: > On Wed, Nov 16, 2016 at 01:24:11PM +0100, Igor Mammedov wrote: > > On Tue, 15 Nov 2016 15:34:45 -0200 > > Eduardo Habkost wrote: > > > > > On Tue, Nov 15, 2016 at 01:17:16PM +0100, Igor Mammedov wrote: > > > [...] > > > > @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void > > > > *data) > > > > if (pcms->fw_cfg) { > > > > pc_build_smbios(pcms->fw_cfg); > > > > pc_build_feature_control_file(pcms); > > > > +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ > > > > +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, > > > > pcms->boot_cpus); > > > > } > > > > > > > > if (pcms->apic_id_limit > 255) { > > > > @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms) > > > > assert(MACHINE(pcms)->kernel_filename != NULL); > > > > > > > > fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); > > > > -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); > > > > +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); > > > > rom_set_fw(fw_cfg); > > > > > > > > load_linux(pcms, fw_cfg); > > > > @@ -1824,9 +1828,10 @@ static void pc_cpu_plug(HotplugHandler > > > > *hotplug_dev, > > > > } > > > > } > > > > > > > > +/* increment the number of CPUs */ > > > > +pcms->boot_cpus++; > > > > if (dev->hotplugged) { > > > > -/* increment the number of CPUs */ > > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, > > > > 0x5f) + 1); > > > > +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); > > > > } > > > > > > > > found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); > > > > @@ -1880,7 +1885,10 @@ static void pc_cpu_unplug_cb(HotplugHandler > > > > *hotplug_dev, > > > > found_cpu->cpu = NULL; > > > > object_unparent(OBJECT(dev)); > > > > > > > > -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - > > > > 1); > > > > +/* decrement the number of CPUs */ > > > > +pcms->boot_cpus--; > > > > +/* Update the number of CPUs in CMOS */ > > > > +rtc_set_cpus_count(pcms->rtc, 
pcms->boot_cpus); > > > > > > Don't we need to call fw_cfg_modify_i16() on hotplug/hot-unplug, > > > too? > > Indeed, it should be updated > > otherwise it will hang on reboot in BIOS waiting for wrong number of CPUs > > if CPUs count is above 256. > > > > the same bug has been present in the reverted > > "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs" > > The "etc/boot-cpus" patch changed boot_cpus_le on the plug/unplug > callbacks. Ah yes, I'd forgotten that boot_cpus_le was directly accessible by fw_cfg. > > > > Thanks for noticing it! > > I'll post v3 as reply to this thread. > > Thanks! >
[Qemu-devel] [PATCH for-2.8 v3 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
Signed-off-by: Igor Mammedov --- v3: - Update FW_CFG_NB_CPUS on CPU hot(un)plug to avoid hang in BIOS on reboot if number of CPUs is over 256 (Eduardo) --- include/hw/i386/pc.h | 2 ++ hw/i386/pc.c | 44 +++- 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index e32e957..67a1a9e 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -36,6 +36,7 @@ /** * PCMachineState: * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling + * @boot_cpus: number of present VCPUs */ struct PCMachineState { /*< private >*/ @@ -70,6 +71,7 @@ struct PCMachineState { bool apic_xrupt_override; unsigned apic_id_limit; CPUArchIdList *possible_cpus; +uint16_t boot_cpus; /* NUMA information: */ uint64_t numa_nodes; diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 5aeae7d..677a594 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -744,7 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms) int i, j; fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as); -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86: * @@ -1087,17 +1087,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) } } -static int pc_present_cpus_count(PCMachineState *pcms) -{ -int i, boot_cpus = 0; -for (i = 0; i < pcms->possible_cpus->len; i++) { -if (pcms->possible_cpus->cpus[i].cpu) { -boot_cpus++; -} -} -return boot_cpus; -} - static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id, Error **errp) { @@ -1234,6 +1223,19 @@ static void pc_build_feature_control_file(PCMachineState *pcms) fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val)); } +static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count) +{ +if (cpus_count > 0xff) { +/* If the number of CPUs can't be represented in 8 bits, the + * BIOS must use "FW_CFG_NB_CPUS". 
Set RTC field to 0 just + * to make old BIOSes fail more predictably. + */ +rtc_set_memory(rtc, 0x5f, 0); +} else { +rtc_set_memory(rtc, 0x5f, cpus_count - 1); +} +} + static void pc_machine_done(Notifier *notifier, void *data) { @@ -1242,7 +1244,7 @@ void pc_machine_done(Notifier *notifier, void *data) PCIBus *bus = pcms->bus; /* set the number of CPUs */ -rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1); +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); if (bus) { int extra_hosts = 0; @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data) if (pcms->fw_cfg) { pc_build_smbios(pcms->fw_cfg); pc_build_feature_control_file(pcms); +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); } if (pcms->apic_id_limit > 255) { @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms) assert(MACHINE(pcms)->kernel_filename != NULL); fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); rom_set_fw(fw_cfg); load_linux(pcms, fw_cfg); @@ -1824,9 +1828,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, } } +/* increment the number of CPUs */ +pcms->boot_cpus++; if (dev->hotplugged) { -/* increment the number of CPUs */ -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1); +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); } found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); @@ -1880,7 +1886,11 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev, found_cpu->cpu = NULL; object_unparent(OBJECT(dev)); -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1); +/* decrement the number of CPUs */ +pcms->boot_cpus--; +/* Update the number of CPUs in CMOS */ +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, 
pcms->boot_cpus); out: error_propagate(errp, local_err); } -- 2.7.4
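For readers following the CMOS side of this change, the encoding done by rtc_set_cpus_count() can be sketched in isolation. The helper name cmos_cpus_field below is hypothetical and not part of the patch; it only mirrors the arithmetic visible in the diff, assuming the byte at CMOS offset 0x5f holds "number of present CPUs minus one" and that counts above 0xff are signalled with 0.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone helper (not part of the patch): returns the
 * byte that would be stored at CMOS offset 0x5f for a given number of
 * present CPUs, following the logic shown in rtc_set_cpus_count().
 */
static uint8_t cmos_cpus_field(uint16_t cpus_count)
{
    if (cpus_count > 0xff) {
        /* Count does not fit in 8 bits: firmware is expected to read
         * FW_CFG_NB_CPUS instead, so the CMOS field is set to 0. */
        return 0;
    }
    /* The field stores "CPUs - 1", so one CPU is encoded as 0. */
    return (uint8_t)(cpus_count - 1);
}
```

Note that a machine with one CPU and a machine with more than 255 CPUs both end up with 0 in this field, which is presumably why the patch also keeps FW_CFG_NB_CPUS up to date on plug and unplug.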
Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support
On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote: > CCing qemu-devel. > > CCing Markus, in case he has any insights about the interface > introspection. > > On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote: > > On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote: > > > On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote: > > > > cpu features are passed to the qemu command with feature=on/off > > > > instead of +/-feature. > > > > > > > > Signed-off-by: Collin L. Walling > > > > > > If I'm not mistaken, the "feature=on|off" syntax was added on > > > QEMU 2.0.0. Does current libvirt support older QEMU versions? > > > > Of course it does. I'd love to switch to feature=on|off, but how can we > > check if QEMU supports it? We can't really start using this syntax > > without it. > > Actually, I was wrong, this was added in v2.4.0. "feat=on|off" > needs two things to work (in x86): > > * Translation of all "foo=bar" options to QOM property setting. > This was added in v2.0.0-rc0~162^2 > * The actual QOM properties for feature names to be present. They > were added in v2.4.0-rc0~101^2~1 > > So you can be sure "feat=on" is supported by checking if the > feature flags are present in device-list-properties output for > the CPU model. But device-list-properties is also messy[1]. > > Maybe we can use the availability of query-cpu-model-expansion to > check if we can safely use the new "feat=on|off" system? It's > easier than taking all the variables above into account. Yeah, this could work since s390 already supports query-cpu-model-expansion. It would cause feature=on|off not to be used on x86_64 with QEMU older than 2.9.0, but I guess that's not a big deal, is it? Jirka
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On 16.11.2016 13:37, Greg Kurz wrote: > On Wed, 16 Nov 2016 12:24:50 + > "Dr. David Alan Gilbert" wrote: > >> * Greg Kurz (gr...@kaod.org) wrote: >>> On Wed, 16 Nov 2016 09:39:31 +0100 >>> Thomas Huth wrote: >>> The ppc64 postcopy test does not work with KVM-PR, and it is also causing annoying warning messages when run on a x86 host. So let's use KVM here only if we know that we're running with KVM-HV (which automatically also means that we're running on a ppc64 host), and fall back to TCG otherwise. >>> >>> This patch addresses two issues actually: >>> - the annoying warning when running on a ppc64 guest on a non-ppc64 host >>> - the fact that KVM-PR seems to be currently broken >>> >>> I agree that the former makes sense, but what about the case of running >>> a x86 guest on a non-x86 host ? Of course you also get these '"kvm" accelerator not found' messages there. But so far, I think nobody complained about that yet (only for ppc64 running on x86). And at least the test succeeds there - unlike with KVM-PR, where the test fails completely. >>> I'm still feeling uncomfortable with the KVM-PR case... is this a workaround >>> we want to keep until we find out what's going on or are we starting to >>> partially deprecate KVM PR ? In any case, I guess we should document this >>> and probably print some meaningful error message. >> >> This is certainly a work around for now, it doesn't suggest anything about >> deprecation. > > Well it doesn't suggest anything actually, it just silently skips KVM PR... > I would at least expect a comment in the code mentioning this is a > workaround and maybe an explicit warning for the user. If the user really > wants to run this test with KVM on ppc64, then she should ensure it is > KVM HV. Honestly, also considering the number of patches that Laurent already wrote here and never have been accepted, all this has become quite an ugly bike-shed painting discussion. 
My opinion:

- If we want to properly test KVM (be it KVM-HV or KVM-PR), write a proper kvm-unit-test instead. I.e. I personally don't care if this test in QEMU is only run with TCG or with KVM.

- The current status of "make check" is broken, since it does not work on KVM-PR. We've got to fix that before the release. That means I currently really don't care whether we spit out a warning message for KVM-PR here or not. Sure, somebody has got to look at KVM-PR later, but that's IMHO off-topic for the test here in the QEMU context.

So if you think that the patch for fixing this issue here with the QEMU test should look different, please propose a different patch instead. I'm fine with any other approach as long as we get this fixed in time for QEMU 2.8.

Thomas
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote: > > > If the consensus is that the patch is a QEMU bugfix (as opposed to a > > feature) and that it is eligible for the currently supported upstream > > stable branches, that's the best, no doubt. > > The currently supported upstream stable branches is just 2.7. :) > > I'm okay with bending the rules and including it in 2.8, but it's > worrisome that you also needed to go back from relaxed to traditional > delivery, meaning that old QEMU + new OVMF will take ages to boot. > > If this is the case, I still think this needs some kind of discovery > mechanism, unless OVMF can just say "things were too broken, stop > supporting SMM on QEMUs older than 2.8". > > For example: > > - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP > setting is used for the PCD; this would be backwards compatibility mode. > > - we could have another magic 0xB2 value, which is implemented directly > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) > to detect the new feature. It can fail to start if using traditional > AP and the new feature is not there. If we keep collecting these magic values, should we architect it and do a host/guest bitmap like virtio does? > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > would make QEMU use broadcast behavior for all values in the 0x10-0xff > range, or something like that. > > Paolo What bothers me about all these ideas is that it's PV. Unavoidable? > > For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The > > SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually > > correct; when I was writing the OVMF docs, I must have misunderstood the > > requirements and needlessly required 2.5+; 2.4+ should have been fine.) > > > > Which means the fix should be backported as far as stable-2.4. > > > > Should we proceed with that?
CC'ing Mike Roth and the stable list. > > > > Thanks! > > Laszlo > > > > > > > > > > >>> > > >>> Paolo > > >>> > > --- > > hw/isa/lpc_ich9.c | 12 +++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c > > index 10d1ee8b9310..f2fe644fdaa4 100644 > > --- a/hw/isa/lpc_ich9.c > > +++ b/hw/isa/lpc_ich9.c > > @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool > > smm_enabled) > > > > /* APM */ > > > > +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' > > + > > static void ich9_apm_ctrl_changed(uint32_t val, void *arg) > > { > > ICH9LPCState *lpc = arg; > > @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, > > void *arg) > > > > /* SMI_EN = PMBASE + 30. SMI control and enable register */ > > if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { > > -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > > +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { > > +CPUState *cs; > > + > > +CPU_FOREACH(cs) { > > +cpu_interrupt(cs, CPU_INTERRUPT_SMI); > > +} > > +} else { > > +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); > > +} > > } > > } > > > > > > > >
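The selection logic in the quoted patch can be modelled outside QEMU as a small sketch. Everything below is a simplified stand-in written for illustration: smi_targets and its pending[] array are hypothetical and replace QEMU's CPU_FOREACH() and cpu_interrupt(); only the 'Q' value and the "broadcast versus current CPU" decision are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APM_STS_BROADCAST_SMI 'Q'   /* value used by the patch */

/*
 * Hypothetical model of ich9_apm_ctrl_changed(): marks in pending[]
 * which of the n_cpus CPUs would receive an SMI and returns how many
 * were marked. apmc_en stands in for the ICH9_PMIO_SMI_EN_APMC_EN check.
 */
static size_t smi_targets(uint8_t apm_sts, bool apmc_en, size_t current_cpu,
                          bool *pending, size_t n_cpus)
{
    size_t i;

    for (i = 0; i < n_cpus; i++) {
        pending[i] = false;
    }
    if (!apmc_en || current_cpu >= n_cpus) {
        return 0;                   /* SMI generation disabled or bad index */
    }
    if (apm_sts == APM_STS_BROADCAST_SMI) {
        for (i = 0; i < n_cpus; i++) {
            pending[i] = true;      /* broadcast: every CPU gets the SMI */
        }
        return n_cpus;
    }
    pending[current_cpu] = true;    /* default: only the writing CPU */
    return 1;
}
```

With four CPUs, a status of 'Q' marks all four, while any other status value marks only the CPU that wrote to the APM control port.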
Re: [Qemu-devel] [PATCH v5 7/9] block: don't make snapshots for filters
> From: Paolo Bonzini [mailto:pbonz...@redhat.com] > > I've investigated this issue. > > This command line works ok: > > -drive > > > > driver=blkreplay,if=none,image.driver=file,image.filename=testdisk.qcow,id=img-blkreplay > > -device ide-hd,drive=img-blkreplay > > > > And this does not: > > -drive > > > driver=blkreplay,if=none,image.driver=qcow2,image.file.driver=file,image.file.filename=testdis > k.qcow > > ,id=img-blkreplay > > -device ide-hd,drive=img-blkreplay > > > > QEMU hangs at some moment of replay. > > > > I found that some dma requests do not pass through the blkreplay driver > > due to the following line in block-backend.c: > > return bdrv_co_preadv(blk->root, offset, bytes, qiov, flags); > > > > This line passes read request directly to qcow driver and blkreplay cannot > > process it to make deterministic. > > I don't understand, blk->root should be the blkreplay here. I've got some more logs. I used the disk image which references the backing file. It seems that some weird things happen with both command lines. 
== For the first command line (blkreplay separated from image):

blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(blkreplay, temp_overlay)
 -> bdrv_co_preadv(qcow2, temp_overlay2)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:

blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay1)
 -> bdrv_co_preadv(file, temp_overlay1)

== For the second command line (blkreplay combined with image):

In most cases we have the following call stack:

blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp_overlay)
 -> bdrv_co_preadv(blkreplay, image_overlay)
 -> bdrv_co_preadv(qcow2, image_overlay)
 -> bdrv_co_preadv(qcow2, image_backing)
 -> bdrv_co_preadv(file, image_backing)

But sometimes it changes to:

blk_co_preadv(img-blkreplay)
 -> bdrv_co_preadv(qcow2, temp overlay)
 -> bdrv_co_preadv(file, temp overlay)

It seems that a temporary overlay is created over blkreplay, which is intended to work as a simple filter. Is that correct?

Pavel Dovgalyuk
Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote: > On Thu, 10 Nov 2016 21:20:36 +0200 > "Michael S. Tsirkin" wrote: > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote: > > > On Thu, 10 Nov 2016 17:54:35 +0200 > > > "Michael S. Tsirkin" wrote: > > > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote: > > > > > On Thu, 10 Nov 2016 17:14:24 +0200 > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote: > > > > > > > From: "Aviv Ben-David" > > > > > > > > > > > > > > * Advertize Cache Mode capability in iommu cap register. > > > > > > > This capability is controlled by "cache-mode" property of > > > > > > > intel-iommu device. > > > > > > > To enable this option call QEMU with "-device > > > > > > > intel-iommu,cache-mode=true". > > > > > > > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the domain > > > > > > > belong to > > > > > > > registered notifier, and notify accordingly. > > > > > > > > > > > > This looks sane I think. Alex, care to comment? > > > > > > Merging will have to wait until after the release. > > > > > > Pls remember to re-test and re-ping then. > > > > > > > > > > I don't think it's suitable for upstream until there's a reasonable > > > > > replay mechanism > > > > > > > > Could you pls clarify what do you mean by replay? > > > > Is this when you attach a device by hotplug to > > > > a running system? > > > > > > > > If yes this can maybe be addressed by disabling hotplug temporarily. > > > > > > No, hotplug is not required, moving a device between existing domains > > > requires replay, ie. actually using it for nested device assignment. > > > > Good point, that one is a correctness thing. Aviv, > > could you add this in TODO list in a cover letter pls? 
> > > > > > > and we straighten out whether it's expected to get > > > > > multiple notifies and the notif-ee is responsible for filtering > > > > > them or if the notif-er should do filtering. > > > > > > > > OK this is a documentation thing. > > > > > > Well no, it needs to be decided and if necessary implemented. > > > > Let's assume it's the notif-ee for now. Less is more and all that. > > I think this is opposite of the approach dwg suggested. > > > > > > Without those, this is > > > > > effectively just an RFC. > > > > > > > > It's infrastructure without users so it doesn't break things, > > > > I'm more interested in seeing whether it's broken in > > > > some way than whether it's complete. > > > > > > If it allows use with vfio but doesn't fully implement the complete set > > > of interfaces, it does break things. We currently prevent viommu usage > > > with vfio because it is incomplete. > > > > Right - that bit is still in as far as I can see. > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with > vfio even though it's still incomplete. We would at least need > something like a replay callback for VT-d that triggers an abort if you > still want to accept it incomplete. Thanks, > > Alex IIUC practically things seems to work, right? So how about disabling by default with a flag for people that want to experiment with it? E.g. x-vfio-allow-broken-translations ? I would like to help this make progress such that 1. Aviv gets the credit he did so far and 2. more people can join development and help complete it. > > > > The patchset spent out of tree too long and I'd like to see > > > > us make progress towards device assignment working with > > > > vIOMMU sooner rather than later, so if it's broken I won't > > > > merge it but if it's incomplete I will. > > > > > > So long as it's incomplete and still prevents vfio usage, I'm ok with > > > merging it, but I don't want to enable vfio usage until it's complete. 
> > > Thanks, > > > > > > Alex > > > > > > > > > > Currently this patch still doesn't enable VFIO device support > > > > > > > with vIOMMU > > > > > > > present. Current problems: > > > > > > > * vfio_iommu_map_notify is not aware of the memory range belonging to a > > > > > > > specific > > > > > > > VFIOGuestIOMMU. > > > > > > > * memory_region_iommu_replay hangs QEMU on start up while it > > > > > > > iterates over > > > > > > > 64bit address space. Commenting out the call to this function > > > > > > > enables > > > > > > > workable VFIO device while vIOMMU present. > > > > > > > * vfio_iommu_map_notify should check if address space range is > > > > > > > suitable for > > > > > > > current notifier. > > > > > > > > > > > > > > Changes from v1 to v2: > > > > > > > * remove assumption that the cache do not clears > > > > > > > * fix lockup on high load. > > > > > > > > > > > > > > Changes from v2 to v3: > > > > > > > * remove debug leftovers > > > > > > > * split to separate commits > > > > > > > * change is_write to flags in vtd_do_iommu_translate, add > > >
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On 16/11/2016 14:18, Michael S. Tsirkin wrote:
> > - we could have another magic 0xB2 value, which is implemented directly
> > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it
> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > to detect the new feature. It can fail to start if using traditional
> > AP and the new feature is not there.
>
> If we keep collecting these magic values, should we architect it
> and do a host/guest bitmap like virtio does?

The value written in 0xB3 can certainly be a feature bitmap. For now we would have, for example:

bit 0      if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
bits 1-7   zero

Paolo
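A minimal sketch of how such a feature bitmap might be interpreted, assuming the layout proposed above. The macro and function names (SMI_FEAT_BROADCAST, smi_should_broadcast) are hypothetical and do not exist in QEMU or OVMF; only the meaning of bit 0 and the 0x10-0xFF range come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 0 of the feature byte reported in 0xB3. */
#define SMI_FEAT_BROADCAST  (1u << 0)

/* Lowest 0xB2 command value that would trigger a broadcast SMI. */
#define SMI_BROADCAST_MIN   0x10u

/*
 * Returns true if a write of apm_cnt to port 0xB2 would be delivered as
 * a broadcast SMI, given the negotiated feature byte. apm_cnt is 8 bits
 * wide, so the upper bound 0xFF holds implicitly.
 */
static bool smi_should_broadcast(uint8_t features, uint8_t apm_cnt)
{
    return (features & SMI_FEAT_BROADCAST) != 0 &&
           apm_cnt >= SMI_BROADCAST_MIN;
}
```

Keeping values below 0x10 non-broadcast leaves 0x00 available as the backwards-compatible, relaxed-AP command mentioned earlier in the thread.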
Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support
On Wed, Nov 16, 2016 at 02:15:02PM +0100, Jiri Denemark wrote: > On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote: > > CCing qemu-devel. > > > > CCing Markus, in case he has any insights about the interface > > introspection. > > > > On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote: > > > On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote: > > > > On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote: > > > > > cpu features are passed to the qemu command with feature=on/off > > > > > instead of +/-feature. > > > > > > > > > > Signed-off-by: Collin L. Walling > > > > > > > > If I'm not mistaken, the "feature=on|off" syntax was added on > > > > QEMU 2.0.0. Does current libvirt support older QEMU versions? > > > > > > Of course it does. I'd love to switch to feature=on|off, but how can we > > > check if QEMU supports it? We can't really start using this syntax > > > without it. > > > > Actually, I was wrong, this was added in v2.4.0. "feat=on|off" > > needs two things to work (in x86): > > > > * Translation of all "foo=bar" options to QOM property setting. > > This was added in v2.0.0-rc0~162^2 > > * The actual QOM properties for feature names to be present. They > > were added in v2.4.0-rc0~101^2~1 > > > > So you can be sure "feat=on" is supported by checking if the > > feature flags are present in device-list-properties output for > > the CPU model. But device-list-properties is also messy[1]. > > > > Maybe we can use the availability of query-cpu-model-expansion to > > check if we can safely use the new "feat=on|off" system? It's > > easier than taking all the variables above into account. > > Yeah, this could work since s390 already supports > query-cpu-model-expansion. It would cause feature=on|off not to be used > on x86_64 with QEMU older than 2.9.0, but I guess that's not a big deal, > is it? Not a problem, as we have no plans to remove +feat/-feat support in x86 anymore. -- Eduardo
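The approach discussed in this thread could be sketched roughly as follows. format_cpu_feature and its qom_syntax flag are hypothetical names, not libvirt's actual API; the flag stands in for whatever capability probe is eventually chosen (for example, the presence of query-cpu-model-expansion), and the comma-separated layout is only an assumption about how the result would be appended to a -cpu argument.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical formatter: writes one CPU feature flag into buf using
 * either the newer "feature=on|off" syntax or the legacy "+feature" /
 * "-feature" syntax. Returns the value of snprintf(), i.e. the number
 * of characters that the full string requires.
 */
static int format_cpu_feature(char *buf, size_t len, const char *name,
                              bool enable, bool qom_syntax)
{
    if (qom_syntax) {
        /* Newer syntax, usable once the probe says it is supported. */
        return snprintf(buf, len, ",%s=%s", name, enable ? "on" : "off");
    }
    /* Legacy fallback for QEMU versions without the probe. */
    return snprintf(buf, len, ",%c%s", enable ? '+' : '-', name);
}
```

Since +feat/-feat is staying supported on x86, falling back to the legacy form when the probe is missing costs nothing functionally.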
Re: [Qemu-devel] [PATCH for-2.8 v3 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
On Wed, Nov 16, 2016 at 02:04:41PM +0100, Igor Mammedov wrote: > Signed-off-by: Igor Mammedov Reviewed-by: Eduardo Habkost -- Eduardo
Re: [Qemu-devel] [PATCH] display: cirrus: check vga bits per pixel(bpp) value
Hi On Tue, Oct 18, 2016 at 11:46 AM P J P wrote: > From: Prasad J Pandit > > In Cirrus CLGD 54xx VGA Emulator, if cirrus graphics mode is VGA, > 'cirrus_get_bpp' returns zero(0), which could lead to a divide > by zero error in while copying pixel data. The same could occur > via blit pitch values. Add check to avoid it. > For completeness, do you have a reproducer and/or a backtrace? > > Reported-by: Huawei PSIRT > Signed-off-by: Prasad J Pandit > --- > hw/display/cirrus_vga.c | 14 ++ > 1 file changed, 10 insertions(+), 4 deletions(-) > > diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c > index 3d712d5..bdb092e 100644 > --- a/hw/display/cirrus_vga.c > +++ b/hw/display/cirrus_vga.c > @@ -272,6 +272,9 @@ static void cirrus_update_memory_access(CirrusVGAState > *s); > static bool blit_region_is_unsafe(struct CirrusVGAState *s, >int32_t pitch, int32_t addr) > { > +if (!pitch) { > +return true; > +} > That doesn't look directly related to 'cirrus_get_bpp', care to explain? if (pitch < 0) { > int64_t min = addr > + ((int64_t)s->cirrus_blt_height-1) * pitch; > @@ -715,7 +718,7 @@ static int > cirrus_bitblt_videotovideo_patterncopy(CirrusVGAState * s) > s->cirrus_addr_mask)); > } > > -static void cirrus_do_copy(CirrusVGAState *s, int dst, int src, int w, > int h) > +static int cirrus_do_copy(CirrusVGAState *s, int dst, int src, int w, int > h) > { > int sx = 0, sy = 0; > int dx = 0, dy = 0; > @@ -729,6 +732,9 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst, > int src, int w, int h) > int width, height; > > depth = s->vga.get_bpp(&s->vga) / 8; > +if (!depth) { > +return 0; > +} > Makes sense, since 'cirrus_get_bpp' would return 0 in VGA mode. But isn't this a Cirrus operation (not VGA)? How did it get there? Perhaps this should be caught earlier (invalid VGA operations). 
s->vga.get_resolution(&s->vga, &width, &height); > > /* extra x, y */ > @@ -783,6 +789,8 @@ static void cirrus_do_copy(CirrusVGAState *s, int dst, > int src, int w, int h) > cirrus_invalidate_region(s, s->cirrus_blt_dstaddr, > s->cirrus_blt_dstpitch, > s->cirrus_blt_width, > s->cirrus_blt_height); > + > +return 1; > } > > static int cirrus_bitblt_videotovideo_copy(CirrusVGAState * s) > @@ -790,11 +798,9 @@ static int > cirrus_bitblt_videotovideo_copy(CirrusVGAState * s) > if (blit_is_unsafe(s)) > return 0; > > -cirrus_do_copy(s, s->cirrus_blt_dstaddr - s->vga.start_addr, > +return cirrus_do_copy(s, s->cirrus_blt_dstaddr - s->vga.start_addr, > s->cirrus_blt_srcaddr - s->vga.start_addr, > s->cirrus_blt_width, s->cirrus_blt_height); > - > -return 1; > btw, not directly related to your patch, but the code looks strange in cirrus_bitblt_videotovideo(), cirrus_bitblt_reset() is called if(ret), and later if (!ret) in cirrus_bitblt_start(), that looks a bit weird, but it may be fine. I hope someone more familiar with the code can help review your patch. Thanks } > > /*** > -- > 2.7.4 > > > -- Marc-André Lureau
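The two guards under review can be illustrated with a stand-alone sketch. offset_to_x is a hypothetical helper written for this note, not code from cirrus_vga.c; it assumes only what the patch description says, namely that a bits-per-pixel value of 0 yields a depth of 0 and that both depth and pitch end up as divisors when a byte offset is turned into a pixel coordinate.

```c
/*
 * Hypothetical helper: converts a byte offset into an x coordinate for
 * a given pitch (bytes per line, may be negative) and bits per pixel.
 * Returns 1 and stores the result in *x on success, or 0 without
 * touching *x when the parameters would cause a division by zero.
 */
static int offset_to_x(int offset, int pitch, int bpp, int *x)
{
    int depth = bpp / 8;                 /* bytes per pixel; 0 if bpp < 8 */
    int p = pitch < 0 ? -pitch : pitch;  /* use the magnitude of the pitch */

    if (!depth || !p) {
        return 0;                        /* refuse instead of dividing by 0 */
    }
    *x = (offset % p) / depth;
    return 1;
}
```

Returning a status instead of void mirrors the change to cirrus_do_copy() in the patch, which lets the caller treat a rejected blit the same way as an unsafe one.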
[Qemu-devel] [PULL 2/3] fw_cfg: move FW_CFG_NB_CPUS out of fw_cfg_init1()
From: Igor Mammedov PC will use this field in a different way, so move it outside the common code so PC can set a different value, i.e. all CPUs regardless of where they are coming from (-smp X | -device cpu...). It's a quick and dirty hack, as it could be implemented in a more generic way in MachineClass. But do it in a simple way since only PC is affected so far. Later we can generalize it when another affected target gets support for -device cpu. Signed-off-by: Igor Mammedov Message-Id: <1479212236-183810-3-git-send-email-imamm...@redhat.com> Reviewed-by: Eduardo Habkost Signed-off-by: Eduardo Habkost --- hw/arm/virt.c | 4 +++- hw/i386/pc.c | 2 ++ hw/nvram/fw_cfg.c | 1 - hw/ppc/mac_newworld.c | 1 + hw/ppc/mac_oldworld.c | 1 + hw/sparc/sun4m.c | 1 + hw/sparc64/sun4u.c| 1 + 7 files changed, 9 insertions(+), 2 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 54a8b28..d04e4ac 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -929,9 +929,11 @@ static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as) { hwaddr base = vbi->memmap[VIRT_FW_CFG].base; hwaddr size = vbi->memmap[VIRT_FW_CFG].size; +FWCfgState *fw_cfg; char *nodename; -fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as); +fw_cfg = fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base); qemu_fdt_add_subnode(vbi->fdt, nodename); diff --git a/hw/i386/pc.c b/hw/i386/pc.c index c227ead..e8757b4 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -744,6 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms) int i, j; fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86: * @@ -1341,6 +1342,7 @@ void xen_load_linux(PCMachineState *pcms) assert(MACHINE(pcms)->kernel_filename != NULL); fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); 
+fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); rom_set_fw(fw_cfg); load_linux(pcms, fw_cfg); diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index 1f0c3e9..3ebecb2 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -884,7 +884,6 @@ static void fw_cfg_init1(DeviceState *dev) fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4); fw_cfg_add_bytes(s, FW_CFG_UUID, &qemu_uuid, 16); fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)!machine->enable_graphics); -fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(s, FW_CFG_BOOT_MENU, (uint16_t)boot_menu); fw_cfg_bootsplash(s); fw_cfg_reboot(s); diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c index 7d25106..2bfdb64 100644 --- a/hw/ppc/mac_newworld.c +++ b/hw/ppc/mac_newworld.c @@ -466,6 +466,7 @@ static void ppc_core99_init(MachineState *machine) /* No PCI init: the BIOS will do it */ fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, machine_arch); diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c index 4479487..56282c5 100644 --- a/hw/ppc/mac_oldworld.c +++ b/hw/ppc/mac_oldworld.c @@ -319,6 +319,7 @@ static void ppc_heathrow_init(MachineState *machine) /* No PCI init: the BIOS will do it */ fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR + 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, ARCH_HEATHROW); diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index 6224288..f5b6efd 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -1033,6 +1033,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, hwdef->ecc_version); fw_cfg = fw_cfg_init_mem(CFG_ADDR, CFG_ADDR 
+ 2); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); fw_cfg_add_i16(fw_cfg, FW_CFG_MACHINE_ID, hwdef->machine_id); diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c index 271d8bc..4663315 100644 --- a/hw/sparc64/sun4u.c +++ b/hw/sparc64/sun4u.c @@ -855,6 +855,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem, (uint8_t *)&nd_table[0].macaddr); fw_cfg = fw_cfg_init_io(BIOS_CFG_IOPORT); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)max_cpus); fw_cfg
[Qemu-devel] [PULL 1/3] Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs"
From: Igor Mammedov This reverts commit 080ac219cc7d9c55adf925c3545b7450055ad625. Legacy FW_CFG_NB_CPUS will be reused instead of the 'etc/boot-cpus' fw_cfg file since it does the same thing and there is no point in maintaining a duplicate guest ABI if it can be helped. Signed-off-by: Igor Mammedov Message-Id: <1479212236-183810-2-git-send-email-imamm...@redhat.com> Reviewed-by: Eduardo Habkost Signed-off-by: Eduardo Habkost --- hw/i386/pc.c | 44 +++- include/hw/i386/pc.h | 2 -- 2 files changed, 15 insertions(+), 31 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index a9b1950..c227ead 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1086,6 +1086,17 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) } } +static int pc_present_cpus_count(PCMachineState *pcms) +{ +int i, boot_cpus = 0; +for (i = 0; i < pcms->possible_cpus->len; i++) { +if (pcms->possible_cpus->cpus[i].cpu) { +boot_cpus++; +} +} +return boot_cpus; +} + static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id, Error **errp) { @@ -1222,19 +1233,6 @@ static void pc_build_feature_control_file(PCMachineState *pcms) fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val)); } -static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count) -{ -if (cpus_count > 0xff) { -/* If the number of CPUs can't be represented in 8 bits, the - * BIOS must use "etc/boot-cpus". Set RTC field to 0 just - * to make old BIOSes fail more predictably.
- */ -rtc_set_memory(rtc, 0x5f, 0); -} else { -rtc_set_memory(rtc, 0x5f, cpus_count - 1); -} -} - static void pc_machine_done(Notifier *notifier, void *data) { @@ -1243,7 +1241,7 @@ void pc_machine_done(Notifier *notifier, void *data) PCIBus *bus = pcms->bus; /* set the number of CPUs */ -rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le)); +rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1); if (bus) { int extra_hosts = 0; @@ -1264,15 +1262,8 @@ void pc_machine_done(Notifier *notifier, void *data) acpi_setup(); if (pcms->fw_cfg) { -MachineClass *mc = MACHINE_GET_CLASS(pcms); - pc_build_smbios(pcms->fw_cfg); pc_build_feature_control_file(pcms); - -if (mc->max_cpus > 255) { -fw_cfg_add_file(pcms->fw_cfg, "etc/boot-cpus", &pcms->boot_cpus_le, -sizeof(pcms->boot_cpus_le)); -} } if (pcms->apic_id_limit > 255) { @@ -1819,11 +1810,9 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, } } -/* increment the number of CPUs */ -pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) + 1); if (dev->hotplugged) { -/* Update the number of CPUs in CMOS */ -rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le)); +/* increment the number of CPUs */ +rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1); } found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); @@ -1877,10 +1866,7 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev, found_cpu->cpu = NULL; object_unparent(OBJECT(dev)); -/* decrement the number of CPUs */ -pcms->boot_cpus_le = cpu_to_le16(le16_to_cpu(pcms->boot_cpus_le) - 1); -/* Update the number of CPUs in CMOS */ -rtc_set_cpus_count(pcms->rtc, le16_to_cpu(pcms->boot_cpus_le)); +rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1); out: error_propagate(errp, local_err); } diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h index 8eb517f..e32e957 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -36,7 +36,6 @@ /** * PCMachineState: * @acpi_dev: link to ACPI PM device 
that performs ACPI hotplug handling - * @boot_cpus_le: number of present VCPUs, referenced by 'etc/boot-cpus' fw_cfg */ struct PCMachineState { /*< private >*/ @@ -71,7 +70,6 @@ struct PCMachineState { bool apic_xrupt_override; unsigned apic_id_limit; CPUArchIdList *possible_cpus; -uint16_t boot_cpus_le; /* NUMA information: */ uint64_t numa_nodes; -- 2.7.4
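The counting helper that this revert restores can be tried in isolation. The following is only a minimal sketch: `struct cpu_slot` and `struct cpu_slot_list` are hypothetical stand-ins for QEMU's possible-CPU list, not the real types, and only the loop mirrors pc_present_cpus_count() from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the possible-CPU list used by the patch. */
struct cpu_slot {
    void *cpu;                  /* non-NULL when a CPU occupies the slot */
};

struct cpu_slot_list {
    int len;                    /* number of possible CPU slots */
    struct cpu_slot *cpus;
};

/* Count occupied slots, following the loop in pc_present_cpus_count(). */
static int present_cpus_count(const struct cpu_slot_list *list)
{
    int i, present = 0;

    assert(list != NULL);
    for (i = 0; i < list->len; i++) {
        if (list->cpus[i].cpu) {
            present++;
        }
    }
    return present;
}
```

With four possible slots of which two are populated, the helper returns 2, and the patch would then store that value minus one in CMOS register 0x5f.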
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On Wed, 16 Nov 2016 14:17:47 +0100 Thomas Huth wrote: > On 16.11.2016 13:37, Greg Kurz wrote: > > On Wed, 16 Nov 2016 12:24:50 + > > "Dr. David Alan Gilbert" wrote: > > > >> * Greg Kurz (gr...@kaod.org) wrote: > >>> On Wed, 16 Nov 2016 09:39:31 +0100 > >>> Thomas Huth wrote: > >>> > The ppc64 postcopy test does not work with KVM-PR, and it is also > causing annoying warning messages when run on a x86 host. So let's > use KVM here only if we know that we're running with KVM-HV (which > automatically also means that we're running on a ppc64 host), and > fall back to TCG otherwise. > > >>> > >>> This patch addresses two issues actually: > >>> - the annoying warning when running on a ppc64 guest on a non-ppc64 host > >>> - the fact that KVM-PR seems to be currently broken > >>> > >>> I agree that the former makes sense, but what about the case of running > >>> a x86 guest on a non-x86 host ? > > Of course you also get these '"kvm" accelerator not found' messages > there. But so far, I think nobody complained about that yet (only for > ppc64 running on x86). And at least the test succeeds there - unlike > with KVM-PR, where the test fails completely. > > >>> I'm still feeling uncomfortable with the KVM-PR case... is this a > >>> workaround > >>> we want to keep until we find out what's going on or are we starting to > >>> partially deprecate KVM PR ? In any case, I guess we should document this > >>> and probably print some meaningful error message. > >> > >> This is certainly a work around for now, it doesn't suggest anything about > >> deprecation. > > > > Well it doesn't suggest anything actually, it just silently skips KVM PR... > > I would at least expect a comment in the code mentioning this is a > > workaround and maybe an explicit warning for the user. If the user really > > wants to run this test with KVM on ppc64, then she should ensure it is > > KVM HV. 
> > Honestly, also considering the number of patches that Laurent already > wrote here and never have been accepted, all this has become quite an > ugly bike-shed painting discussion. > Understood. I'm done with the trivial details ;) > My opinion: > > - If we want to properly test KVM (be it KVM-HV or KVM-PR), write > a proper kvm-unit-test instead. I.e. I personally don't care if this > test in QEMU is only run with TCG or with KVM. > Agreed. > - The current status of "make check" is broken, since it does not > work on KVM-PR. We've got to fix that before the release. > > That means I currently really don't care if we've spill out a warning > message for KVM-PR here or not - sure, somebody just got to look at > KVM-PR later, but that's IMHO off-topic for the test here in the QEMU > context. > > So if you think that the patch for fixing this issue here with the QEMU > test should look differently, please propose a different patch instead. > I'm fine with every other approach as long as we get this fixed in time > for QEMU 2.8. > The changes to the code look ok and I prefer to spend time chasing the KVM PR issue rather than arguing on a comment... Cheers. -- Greg > Thomas >
[Qemu-devel] [PULL 3/3] pc: fix FW_CFG_NB_CPUS to account for -device added CPUs
From: Igor Mammedov Signed-off-by: Igor Mammedov Message-Id: <1479301481-197333-1-git-send-email-imamm...@redhat.com> Reviewed-by: Eduardo Habkost Signed-off-by: Eduardo Habkost --- hw/i386/pc.c | 44 +++- include/hw/i386/pc.h | 2 ++ 2 files changed, 29 insertions(+), 17 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index e8757b4..a9e64a8 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -744,7 +744,7 @@ static FWCfgState *bochs_bios_init(AddressSpace *as, PCMachineState *pcms) int i, j; fw_cfg = fw_cfg_init_io_dma(FW_CFG_IO_BASE, FW_CFG_IO_BASE + 4, as); -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86: * @@ -1087,17 +1087,6 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) } } -static int pc_present_cpus_count(PCMachineState *pcms) -{ -int i, boot_cpus = 0; -for (i = 0; i < pcms->possible_cpus->len; i++) { -if (pcms->possible_cpus->cpus[i].cpu) { -boot_cpus++; -} -} -return boot_cpus; -} - static X86CPU *pc_new_cpu(const char *typename, int64_t apic_id, Error **errp) { @@ -1234,6 +1223,19 @@ static void pc_build_feature_control_file(PCMachineState *pcms) fw_cfg_add_file(pcms->fw_cfg, "etc/msr_feature_control", val, sizeof(*val)); } +static void rtc_set_cpus_count(ISADevice *rtc, uint16_t cpus_count) +{ +if (cpus_count > 0xff) { +/* If the number of CPUs can't be represented in 8 bits, the + * BIOS must use "FW_CFG_NB_CPUS". Set RTC field to 0 just + * to make old BIOSes fail more predictably. 
+ */ +rtc_set_memory(rtc, 0x5f, 0); +} else { +rtc_set_memory(rtc, 0x5f, cpus_count - 1); +} +} + static void pc_machine_done(Notifier *notifier, void *data) { @@ -1242,7 +1244,7 @@ void pc_machine_done(Notifier *notifier, void *data) PCIBus *bus = pcms->bus; /* set the number of CPUs */ -rtc_set_memory(pcms->rtc, 0x5f, pc_present_cpus_count(pcms) - 1); +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); if (bus) { int extra_hosts = 0; @@ -1265,6 +1267,8 @@ void pc_machine_done(Notifier *notifier, void *data) if (pcms->fw_cfg) { pc_build_smbios(pcms->fw_cfg); pc_build_feature_control_file(pcms); +/* update FW_CFG_NB_CPUS to account for -device added CPUs */ +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); } if (pcms->apic_id_limit > 255) { @@ -1342,7 +1346,7 @@ void xen_load_linux(PCMachineState *pcms) assert(MACHINE(pcms)->kernel_filename != NULL); fw_cfg = fw_cfg_init_io(FW_CFG_IO_BASE); -fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, (uint16_t)smp_cpus); +fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); rom_set_fw(fw_cfg); load_linux(pcms, fw_cfg); @@ -1812,9 +1816,11 @@ static void pc_cpu_plug(HotplugHandler *hotplug_dev, } } +/* increment the number of CPUs */ +pcms->boot_cpus++; if (dev->hotplugged) { -/* increment the number of CPUs */ -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) + 1); +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); } found_cpu = pc_find_cpu_slot(pcms, CPU(dev), NULL); @@ -1868,7 +1874,11 @@ static void pc_cpu_unplug_cb(HotplugHandler *hotplug_dev, found_cpu->cpu = NULL; object_unparent(OBJECT(dev)); -rtc_set_memory(pcms->rtc, 0x5f, rtc_get_memory(pcms->rtc, 0x5f) - 1); +/* decrement the number of CPUs */ +pcms->boot_cpus--; +/* Update the number of CPUs in CMOS */ +rtc_set_cpus_count(pcms->rtc, pcms->boot_cpus); +fw_cfg_modify_i16(pcms->fw_cfg, FW_CFG_NB_CPUS, pcms->boot_cpus); out: error_propagate(errp, local_err); } diff --git 
a/include/hw/i386/pc.h b/include/hw/i386/pc.h index e32e957..67a1a9e 100644 --- a/include/hw/i386/pc.h +++ b/include/hw/i386/pc.h @@ -36,6 +36,7 @@ /** * PCMachineState: * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling + * @boot_cpus: number of present VCPUs */ struct PCMachineState { /*< private >*/ @@ -70,6 +71,7 @@ struct PCMachineState { bool apic_xrupt_override; unsigned apic_id_limit; CPUArchIdList *possible_cpus; +uint16_t boot_cpus; /* NUMA information: */ uint64_t numa_nodes; -- 2.7.4
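The bookkeeping in this patch can be modelled outside QEMU. The sketch below is an assumption-laden simplification: `struct machine_model` and its helpers are hypothetical, and unlike the patch (which updates the RTC only for hotplugged CPUs and refreshes fw_cfg in pc_machine_done()), this model refreshes both values on every plug and unplug. What it keeps from the patch is the rule in rtc_set_cpus_count(): the 8-bit CMOS field holds the count minus one, or 0 when the count exceeds 0xff, while the 16-bit FW_CFG_NB_CPUS value always holds the full count.

```c
#include <assert.h>
#include <stdint.h>

#define CMOS_CPU_COUNT_REG 0x5f

/* Hypothetical machine model: CMOS bytes plus the 16-bit fw_cfg value. */
struct machine_model {
    uint8_t cmos[128];
    uint16_t fw_cfg_nb_cpus;
    uint16_t boot_cpus;         /* number of present VCPUs */
};

/* Publish boot_cpus to both guest-visible locations. */
static void set_cpus_count(struct machine_model *m)
{
    if (m->boot_cpus > 0xff) {
        /* Count does not fit in 8 bits: store 0 in CMOS. */
        m->cmos[CMOS_CPU_COUNT_REG] = 0;
    } else {
        m->cmos[CMOS_CPU_COUNT_REG] = (uint8_t)(m->boot_cpus - 1);
    }
    m->fw_cfg_nb_cpus = m->boot_cpus;
}

static void cpu_plug(struct machine_model *m)
{
    m->boot_cpus++;
    set_cpus_count(m);
}

static void cpu_unplug(struct machine_model *m)
{
    assert(m->boot_cpus > 0);
    m->boot_cpus--;
    set_cpus_count(m);
}
```

Keeping a single counter and deriving both guest-visible values from it avoids the read-modify-write on the CMOS byte that the intermediate revert used.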
[Qemu-devel] [PULL 0/3] pc: remove redundant fw_cfg file "etc/boot-cpus"
Unfortunately not in time for -rc0, but we still want to remove "etc/boot-cpus" before 2.8.0 is released. The following changes since commit b0bcc86d2a87456f5a276f941dc775b265b309cf: Update version for v2.8.0-rc0 release (2016-11-15 20:55:12 +) are available in the git repository at: git://github.com/ehabkost/qemu.git tags/machine-pull-request for you to fetch changes up to e3cadac073a99489df1627be56c3f487f5cb9e31: pc: fix FW_CFG_NB_CPUS to account for -device added CPUs (2016-11-16 12:10:00 -0200) pc: remove redundant fw_cfg file "etc/boot-cpus" Igor Mammedov (3): Revert "pc: Add 'etc/boot-cpus' fw_cfg file for machine with more than 255 CPUs" fw_cfg: move FW_CFG_NB_CPUS out of fw_cfg_init1() pc: fix FW_CFG_NB_CPUS to account for -device added CPUs hw/arm/virt.c | 4 +++- hw/i386/pc.c | 26 -- hw/nvram/fw_cfg.c | 1 - hw/ppc/mac_newworld.c | 1 + hw/ppc/mac_oldworld.c | 1 + hw/sparc/sun4m.c | 1 + hw/sparc64/sun4u.c| 1 + include/hw/i386/pc.h | 4 ++-- 8 files changed, 21 insertions(+), 18 deletions(-) -- 2.7.4
Re: [Qemu-devel] [PATCH] tests/postcopy: Use KVM on ppc64 only if it is KVM-HV
On Wed, 16 Nov 2016 09:39:31 +0100 Thomas Huth wrote: > The ppc64 postcopy test does not work with KVM-PR, and it is also > causing annoying warning messages when run on a x86 host. So let's > use KVM here only if we know that we're running with KVM-HV (which > automatically also means that we're running on a ppc64 host), and > fall back to TCG otherwise. > > Signed-off-by: Thomas Huth > --- FWIW Reviewed-by: Greg Kurz > tests/postcopy-test.c | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/tests/postcopy-test.c b/tests/postcopy-test.c > index d6613c5..dafe8be 100644 > --- a/tests/postcopy-test.c > +++ b/tests/postcopy-test.c > @@ -380,17 +380,21 @@ static void test_migrate(void) >" -incoming %s", >tmpfs, bootpath, uri); > } else if (strcmp(arch, "ppc64") == 0) { > +const char *accel; > + > +/* On ppc64, the test only works with kvm-hv, but not with kvm-pr */ > +accel = access("/sys/module/kvm_hv", F_OK) ? "tcg" : "kvm:tcg"; > init_bootfile_ppc(bootpath); > -cmd_src = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > +cmd_src = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcsource,debug-threads=on" >" -serial file:%s/src_serial" >" -drive file=%s,if=pflash,format=raw", > - tmpfs, bootpath); > -cmd_dst = g_strdup_printf("-machine accel=kvm:tcg -m 256M" > + accel, tmpfs, bootpath); > +cmd_dst = g_strdup_printf("-machine accel=%s -m 256M" >" -name pcdest,debug-threads=on" >" -serial file:%s/dest_serial" >" -incoming %s", > - tmpfs, uri); > + accel, tmpfs, uri); > } else { > g_assert_not_reached(); > }
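The accelerator choice in the quoted patch can be exercised on its own. This is a small sketch, assuming a POSIX host; `pick_accel` is a hypothetical helper name, and taking the path as a parameter (instead of hard-coding /sys/module/kvm_hv as the patch does) is only there to make the check easy to try.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Return the -machine accel= value to use: "kvm:tcg" when the given
 * module directory exists, plain "tcg" otherwise.
 */
static const char *pick_accel(const char *module_path)
{
    assert(module_path != NULL);
    /* access() returns 0 when the path exists and -1 otherwise. */
    return access(module_path, F_OK) == 0 ? "kvm:tcg" : "tcg";
}
```

Calling pick_accel("/sys/module/kvm_hv") reproduces the test from the patch: on hosts without the kvm_hv module (KVM-PR or non-ppc64 hosts) it falls back to TCG.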
[Qemu-devel] QMP event on reboot when -no-reboot is set
Hey guys, I want to get a QMP event when QEMU shuts down because of the -no-reboot flag. Looking at the code, I realized that the -no-reboot flag just changes any reset request into a shutdown request. Has anybody already patched QEMU to emit some kind of reboot event on the QMP socket? If not, would you accept such a patch, or is this an unwanted feature? Best regards, Dirk Braunschweiger
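The behaviour described in this message, and the extra information such an event would need, can be sketched as follows. Everything here is hypothetical: `struct vm_model`, `system_reset_request` and the `shutdown_from_reset` flag are illustrative names and do not correspond to QEMU's actual symbols or to any existing QMP event.

```c
#include <stdbool.h>

enum pending_request { REQ_NONE, REQ_RESET, REQ_SHUTDOWN };

/* Hypothetical VM state: the -no-reboot setting and the pending request. */
struct vm_model {
    bool no_reboot;
    enum pending_request pending;
    bool shutdown_from_reset;   /* assumed flag an event emitter could read */
};

/* A reset request becomes a shutdown request when -no-reboot is set. */
static void system_reset_request(struct vm_model *vm)
{
    if (vm->no_reboot) {
        vm->pending = REQ_SHUTDOWN;
        vm->shutdown_from_reset = true;
    } else {
        vm->pending = REQ_RESET;
        vm->shutdown_from_reset = false;
    }
}
```

The point of the extra flag is that, once the request has been converted, a management client can no longer tell a guest-initiated reboot from an ordinary shutdown unless the cause is recorded at conversion time.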
[Qemu-devel] [PATCH v2] HACKING: document #include order
It was not obvious to me why "qemu/osdep.h" must be the first #include.
This documents the rationale and the overall #include order.

Cc: Fam Zheng
Cc: Markus Armbruster
Cc: Eric Blake
Signed-off-by: Stefan Hajnoczi
---
 HACKING | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/HACKING b/HACKING
index 20a9101..4125c97 100644
--- a/HACKING
+++ b/HACKING
@@ -1,10 +1,28 @@
 1. Preprocessor
 
+1.1. Variadic macros
+
 For variadic macros, stick with this C99-like syntax:
 
 #define DPRINTF(fmt, ...)                                       \
     do { printf("IRQ: " fmt, ## __VA_ARGS__); } while (0)
 
+1.2. Include directives
+
+Order include directives as follows:
+
+#include "qemu/osdep.h"  /* Always first... */
+#include <...>           /* then system headers... */
+#include "..."           /* and finally QEMU headers. */
+
+The "qemu/osdep.h" header contains preprocessor macros that affect the behavior
+of core system headers like <stdint.h>.  It must be the first include so that
+core system headers included by external libraries get the preprocessor macros
+that QEMU depends on.
+
+Do not include "qemu/osdep.h" from header files since the .c file will have
+already included it.
+
 2. C types
 
 It should be common sense to use the right type, but we have collected
-- 
2.7.4
Re: [Qemu-devel] [PATCH v2] HACKING: document #include order
On 11/16/2016 08:39 AM, Stefan Hajnoczi wrote: > It was not obvious to me why "qemu/osdep.h" must be the first #include. > This documents the rationale and the overall #include order. > > Cc: Fam Zheng > Cc: Markus Armbruster > Cc: Eric Blake > Signed-off-by: Stefan Hajnoczi > --- > HACKING | 18 ++ > 1 file changed, 18 insertions(+) Reviewed-by: Eric Blake -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH v13 02/22] vfio: VFIO based driver for Mediated devices
On 11/16/2016 7:59 AM, Dong Jia Shi wrote: > * Kirti Wankhede [2016-11-15 20:59:45 +0530]: > > Hi Kirti, > >> vfio_mdev driver registers with mdev core driver. >> mdev core driver creates mediated device and calls probe routine of >> vfio_mdev driver for each device. >> Probe routine of vfio_mdev driver adds mediated device to VFIO core module >> >> This driver forms a shim layer that pass through VFIO devices operations >> to vendor driver for mediated devices. >> >> Signed-off-by: Kirti Wankhede >> Signed-off-by: Neo Jia >> Reviewed-by: Jike Song >> >> Change-Id: I583f4734752971d3d112324d69e2508c88f359ec >> --- >> drivers/vfio/mdev/Kconfig | 7 ++ >> drivers/vfio/mdev/Makefile| 1 + >> drivers/vfio/mdev/mdev_core.c | 16 - >> drivers/vfio/mdev/vfio_mdev.c | 148 >> ++ >> 4 files changed, 171 insertions(+), 1 deletion(-) >> create mode 100644 drivers/vfio/mdev/vfio_mdev.c >> >> diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig >> index 258481d65ebd..1aa0391d74f2 100644 >> --- a/drivers/vfio/mdev/Kconfig >> +++ b/drivers/vfio/mdev/Kconfig >> @@ -7,3 +7,10 @@ config VFIO_MDEV >>Provides a framework to virtualize devices. >> >>If you don't know what do here, say N. >> + >> +config VFIO_MDEV_DEVICE >> +tristate "VFIO support for Mediated devices" > > >> +depends on VFIO && VFIO_MDEV >> +default n >> +help >> + VFIO based driver for mediated devices. > > nit: > s/mediated/Mediated/ > > I saw in many places you use the term "Mediated device", so I guess this > is what you preferred to name them. 
> >> diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile >> index 31bc04801d94..fa2d5ea466ee 100644 >> --- a/drivers/vfio/mdev/Makefile >> +++ b/drivers/vfio/mdev/Makefile >> @@ -2,3 +2,4 @@ >> mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o >> >> obj-$(CONFIG_VFIO_MDEV) += mdev.o >> +obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o >> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c >> index 613e8a8a3b2a..1e0714ebc56a 100644 >> --- a/drivers/vfio/mdev/mdev_core.c >> +++ b/drivers/vfio/mdev/mdev_core.c >> @@ -354,7 +354,21 @@ int mdev_device_remove(struct device *dev, bool >> force_remove) >> >> static int __init mdev_init(void) >> { >> -return mdev_bus_register(); >> +int ret; >> + >> +ret = mdev_bus_register(); >> +if (ret) { >> +pr_err("Failed to register mdev bus\n"); > If you want to report an error message here, you should do it in a > previous patch where you introduce the call for mdev_bus_register. > Removing this error message. >> +return ret; >> +} >> + >> +/* >> + * Attempt to load known vfio_mdev. This gives us a working environment >> + * without the user needing to explicitly load vfio_mdev driver. >> + */ >> +request_module_nowait("vfio_mdev"); >> + >> +return ret; >> } >> >> static void __exit mdev_exit(void) > [...] > > Please: > Reviewed-by: Dong Jia Shi > Thanks.
Re: [Qemu-devel] [PATCH v13 09/22] vfio iommu type1: Add task structure to vfio_dma
On 11/16/2016 11:36 AM, Dong Jia Shi wrote: > * Kirti Wankhede [2016-11-15 20:59:52 +0530]: > > Hi Kirti, > > [...] > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> @@ -331,13 +338,16 @@ static long vfio_pin_pages_remote(unsigned long vaddr, >> long npage, >> } >> >> if (!rsvd) >> -vfio_lock_acct(current, i); >> +vfio_lock_acct(dma->task, i); >> +ret = i; >> >> -return i; >> +pin_pg_remote_exit: > out_mmput sounds a better name to me. > >> +mmput(mm); >> +return ret; >> } >> > [...] > >> @@ -510,6 +521,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >> while ((dma = vfio_find_dma(iommu, unmap->iova, unmap->size))) { >> if (!iommu->v2 && unmap->iova > dma->iova) >> break; >> +/* >> + * Task with same address space who mapped this iova range is >> + * allowed to unmap the iova range. >> + */ >> +if (dma->task->mm != current->mm) > How about: > if (dma->task != current) > As I mentioned in comment above this and commit description, if a process calls DMA_MAP, forks a thread and then child thread calls DMA_UNMAP, this should be allowed since address space is same for parent process and child. QEMU also works that way. >> +break; >> unmapped += dma->size; >> vfio_remove_dma(iommu, dma); >> } >> @@ -576,17 +593,55 @@ unwind: >> return ret; >> } >> >> +static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma, >> +size_t map_size) > Do you factor out this function for future usage? > I didn't find the other callers. > This is pulled out to make caller simple and short. Otherwise vfio_dma_do_map() would have become a long function. 
>> +{ >> +dma_addr_t iova = dma->iova; >> +unsigned long vaddr = dma->vaddr; >> +size_t size = map_size; >> +long npage; >> +unsigned long pfn; >> +int ret = 0; >> + >> +while (size) { >> +/* Pin a contiguous chunk of memory */ >> +npage = vfio_pin_pages_remote(dma, vaddr + dma->size, >> + size >> PAGE_SHIFT, dma->prot, >> + &pfn); >> +if (npage <= 0) { >> +WARN_ON(!npage); >> +ret = (int)npage; >> +break; >> +} >> + >> +/* Map it! */ >> +ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage, >> + dma->prot); >> +if (ret) { >> +vfio_unpin_pages_remote(dma, pfn, npage, >> + dma->prot, true); >> +break; >> +} >> + >> +size -= npage << PAGE_SHIFT; >> +dma->size += npage << PAGE_SHIFT; >> +} >> + >> +if (ret) >> +vfio_remove_dma(iommu, dma); >> + >> +return ret; >> +} >> + >> static int vfio_dma_do_map(struct vfio_iommu *iommu, >> struct vfio_iommu_type1_dma_map *map) >> { >> dma_addr_t iova = map->iova; >> unsigned long vaddr = map->vaddr; >> size_t size = map->size; >> -long npage; >> int ret = 0, prot = 0; >> uint64_t mask; >> struct vfio_dma *dma; >> -unsigned long pfn; >> >> /* Verify that none of our __u64 fields overflow */ >> if (map->size != size || map->vaddr != vaddr || map->iova != iova) >> @@ -612,47 +667,27 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu, >> mutex_lock(&iommu->lock); >> >> if (vfio_find_dma(iommu, iova, size)) { >> -mutex_unlock(&iommu->lock); >> -return -EEXIST; >> +ret = -EEXIST; >> +goto do_map_err; >> } >> >> dma = kzalloc(sizeof(*dma), GFP_KERNEL); >> if (!dma) { >> -mutex_unlock(&iommu->lock); >> -return -ENOMEM; >> +ret = -ENOMEM; >> +goto do_map_err; >> } >> >> dma->iova = iova; >> dma->vaddr = vaddr; >> dma->prot = prot; >> +get_task_struct(current); >> +dma->task = current; >> >> /* Insert zero-sized and grow as we map chunks of it */ >> vfio_link_dma(iommu, dma); >> >> -while (size) { >> -/* Pin a contiguous chunk of memory */ >> -npage = vfio_pin_pages_remote(vaddr + dma->size, >> - size >> PAGE_SHIFT, prot, &pfn); 
>> -if (npage <= 0) { >> -WARN_ON(!npage); >> -ret = (int)npage; >> -break; >> -} >> - >> -/* Map it! */ >> -ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage, prot); >> -if (ret) { >> -vfio_unpin_pages_remote(pfn, npage, prot, true); >> -
Re: [Qemu-devel] [PATCH v13 05/22] vfio iommu: Added pin and unpin callback functions to vfio_iommu_driver_ops
On 11/16/2016 8:33 AM, Dong Jia Shi wrote: > * Kirti Wankhede [2016-11-15 20:59:48 +0530]: > > Hi Kirti, > >> Added APIs for pining and unpining set of pages. These call back into >> backend iommu module to actually pin and unpin pages. >> Added two new callback functions to struct vfio_iommu_driver_ops. Backend >> IOMMU module that supports pining and unpinning pages for mdev devices >> should provide these functions. >> >> Renamed static functions in vfio_type1_iommu.c to resolve conflicts >> >> Signed-off-by: Kirti Wankhede >> Signed-off-by: Neo Jia >> Change-Id: Ia7417723aaae86bec2959ad9ae6c2915ddd340e0 >> --- >> drivers/vfio/vfio.c | 103 >> >> drivers/vfio/vfio_iommu_type1.c | 20 >> include/linux/vfio.h| 14 +- >> 3 files changed, 126 insertions(+), 11 deletions(-) >> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c >> index 2e83bdf007fe..3bf8a01bf67b 100644 >> --- a/drivers/vfio/vfio.c >> +++ b/drivers/vfio/vfio.c >> @@ -1799,6 +1799,109 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, >> size_t offset) >> } >> EXPORT_SYMBOL_GPL(vfio_info_cap_shift); >> >> + >> +/* >> + * Pin a set of guest PFNs and return their associated host PFNs for local >> + * domain only. >> + * @dev [in] : device >> + * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of >> user/guest >> + *PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES. > Move the second sentence to the @npage section? > >> + * @npage [in] :count of elements in array. This count should not be >> greater >> + * than PAGE_SIZE. > And remove the second sentence here. > >> + * @prot [in] : protection flags >> + * @phys_pfn[out] : array of host PFNs > nit: > I saw three differnt styles here: > @xxx [in] :xxx > @xxx [in]: xxx > @xxx[out]: xxx > > Frankly speeking, I didn't think the [in|out] flags helps much. > >> + * Return error or number of pages pinned. 
>> + */ >> +int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage, >> + int prot, unsigned long *phys_pfn) >> +{ >> +struct vfio_container *container; >> +struct vfio_group *group; >> +struct vfio_iommu_driver *driver; >> +int ret; >> + >> +if (!dev || !user_pfn || !phys_pfn || !npage) >> +return -EINVAL; >> + >> +if (npage > VFIO_PIN_PAGES_MAX_ENTRIES) >> +return -E2BIG; >> + >> +group = vfio_group_get_from_dev(dev); >> +if (IS_ERR(group)) >> +return PTR_ERR(group); >> + >> +ret = vfio_group_add_container_user(group); >> +if (ret) >> +goto err_pin_pages; >> + >> +container = group->container; >> +down_read(&container->group_lock); >> + >> +driver = container->iommu_driver; >> +if (likely(driver && driver->ops->pin_pages)) >> +ret = driver->ops->pin_pages(container->iommu_data, user_pfn, >> + npage, prot, phys_pfn); >> +else >> +ret = -ENOTTY; >> + >> +up_read(&container->group_lock); >> +vfio_group_try_dissolve_container(group); >> + >> +err_pin_pages: >> +vfio_group_put(group); >> +return ret; >> +} >> +EXPORT_SYMBOL(vfio_pin_pages); >> + >> +/* >> + * Unpin set of host PFNs for local domain only. >> + * @dev [in] : device >> + * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of >> user/guest >> + *PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES. >> + * @npage [in] :count of elements in array. This count should not be >> greater >> + * than PAGE_SIZE. > Same nits as above here. > >> + * Return error or number of pages unpinned. >> + */ > [...] 
> >> diff --git a/include/linux/vfio.h b/include/linux/vfio.h >> index 0ecae0b1cd34..420cdc928786 100644 >> --- a/include/linux/vfio.h >> +++ b/include/linux/vfio.h >> @@ -75,7 +75,11 @@ struct vfio_iommu_driver_ops { >> struct iommu_group *group); >> void(*detach_group)(void *iommu_data, >> struct iommu_group *group); >> - >> +int (*pin_pages)(void *iommu_data, unsigned long *user_pfn, >> + int npage, int prot, >> + unsigned long *phys_pfn); >> +int (*unpin_pages)(void *iommu_data, >> + unsigned long *user_pfn, int npage); >> }; >> >> extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops >> *ops); >> @@ -127,6 +131,14 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct >> iommu_group *group, >> } >> #endif /* CONFIG_EEH */ >> >> +#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) >> + >> +extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, >> + int npage, int prot, unsigned long *phys_pfn); >> + >> +extern int vfio_unp
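The argument checks at the top of vfio_pin_pages() in the quoted patch can be tried in user space. This is a sketch under stated assumptions: `MODEL_PAGE_SIZE` is fixed at 4096 for illustration (the kernel uses the architecture's PAGE_SIZE), and `check_pin_args` is a hypothetical helper that covers only the validation, not the pinning itself.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */
#define PIN_PAGES_MAX_ENTRIES (MODEL_PAGE_SIZE / sizeof(unsigned long))

/* Return 0 when the arguments are acceptable, a negative errno otherwise. */
static int check_pin_args(const void *dev, const unsigned long *user_pfn,
                          int npage, const unsigned long *phys_pfn)
{
    if (!dev || !user_pfn || !phys_pfn || !npage) {
        return -EINVAL;
    }
    /*
     * As in the patch, the count is compared against an unsigned limit,
     * so a negative npage converts to a huge value and is rejected here.
     */
    if ((size_t)npage > PIN_PAGES_MAX_ENTRIES) {
        return -E2BIG;
    }
    return 0;
}
```

The limit is the number of PFN entries that fit in one page, which is why the review asks for the kernel-doc to refer to VFIO_PIN_PAGES_MAX_ENTRIES rather than PAGE_SIZE.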
Re: [Qemu-devel] [PATCH v13 12/22] vfio: Add notifier callback to parent's ops structure of mdev
On 11/16/2016 12:07 PM, Dong Jia Shi wrote:
> * Kirti Wankhede [2016-11-15 20:59:55 +0530]:
>
> Hi Kirti,
>
> [...]
>
>> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
>> index ffc36758cb84..4fc63db38829 100644
>> --- a/drivers/vfio/mdev/vfio_mdev.c
>> +++ b/drivers/vfio/mdev/vfio_mdev.c
>> @@ -24,6 +24,15 @@
>>  #define DRIVER_AUTHOR   "NVIDIA Corporation"
>>  #define DRIVER_DESC     "VFIO based driver for Mediated device"
>>
>> +static int vfio_mdev_notifier(struct notifier_block *nb, unsigned long action,
>> +                              void *data)
>> +{
>> +    struct mdev_device *mdev = container_of(nb, struct mdev_device, nb);
>> +    struct parent_device *parent = mdev->parent;
>> +
>> +    return parent->ops->notifier(mdev, action, data);
>> +}
>> +
>>  static int vfio_mdev_open(void *device_data)
>>  {
>>      struct mdev_device *mdev = device_data;
>> @@ -36,9 +45,18 @@ static int vfio_mdev_open(void *device_data)
>>      if (!try_module_get(THIS_MODULE))
>>          return -ENODEV;
>>
>> +    if (likely(parent->ops->notifier)) {
>> +        mdev->nb.notifier_call = vfio_mdev_notifier;
>> +        if (vfio_register_notifier(&mdev->dev, &mdev->nb))
>> +            pr_err("Failed to register notifier for mdev\n");
> I think we should just return here if the error value is not -ENOTTY.
>

The iommu backend module might not support .register_notifier(). In that
case vfio_register_notifier() returns -ENOTTY, and that should not fail
this open() call. Changing it to:

    ret = vfio_register_notifier(&mdev->dev, &mdev->nb);
    if (ret && (ret != -ENOTTY)) {
        pr_err("Failed to register notifier for mdev\n");
        module_put(THIS_MODULE);
        return ret;
    }

Thanks,
Kirti
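The error policy agreed on in this exchange (treat -ENOTTY as "notifier not supported" and carry on, fail the open on anything else) can be checked with a small user-space model. All names below are hypothetical; the reference counter only stands in for try_module_get() and module_put(), and the three stub functions stand in for possible outcomes of vfio_register_notifier().

```c
#include <errno.h>

typedef int (*register_fn)(void);   /* hypothetical registration hook */

/* Stubs for the three interesting outcomes of notifier registration. */
static int reg_ok(void)          { return 0; }
static int reg_unsupported(void) { return -ENOTTY; }
static int reg_fail(void)        { return -ENOMEM; }

/* Model of open(): take a reference, then register the notifier. */
static int open_with_optional_notifier(register_fn reg, int *module_refs)
{
    int ret;

    (*module_refs)++;               /* models try_module_get() */
    ret = reg();
    if (ret && ret != -ENOTTY) {
        (*module_refs)--;           /* models module_put() on failure */
        return ret;
    }
    return 0;                       /* success, with or without notifier */
}
```

The important detail is that the failure path drops the reference taken earlier, so a failed open leaves the count unchanged.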
Re: [Qemu-devel] [PATCH v13 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP
On 11/16/2016 10:06 AM, Alex Williamson wrote: > On Wed, 16 Nov 2016 09:46:20 +0530 > Kirti Wankhede wrote: > >> On 11/16/2016 9:28 AM, Alex Williamson wrote: >>> On Wed, 16 Nov 2016 09:13:37 +0530 >>> Kirti Wankhede wrote: >>> On 11/16/2016 8:55 AM, Alex Williamson wrote: > On Tue, 15 Nov 2016 20:16:12 -0700 > Alex Williamson wrote: > >> On Wed, 16 Nov 2016 08:16:15 +0530 >> Kirti Wankhede wrote: >> >>> On 11/16/2016 3:49 AM, Alex Williamson wrote: On Tue, 15 Nov 2016 20:59:54 +0530 Kirti Wankhede wrote: >>> ... >>> > @@ -854,7 +857,28 @@ static int vfio_dma_do_unmap(struct vfio_iommu > *iommu, >*/ > if (dma->task->mm != current->mm) > break; > + > unmapped += dma->size; > + > + if (iommu->external_domain && > !RB_EMPTY_ROOT(&dma->pfn_list)) { > + struct vfio_iommu_type1_dma_unmap nb_unmap; > + > + nb_unmap.iova = dma->iova; > + nb_unmap.size = dma->size; > + > + /* > + * Notifier callback would call > vfio_unpin_pages() which > + * would acquire iommu->lock. Release lock here > and > + * reacquire it again. > + */ > + mutex_unlock(&iommu->lock); > + blocking_notifier_call_chain(&iommu->notifier, > + > VFIO_IOMMU_NOTIFY_DMA_UNMAP, > + &nb_unmap); > + mutex_lock(&iommu->lock); > + if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list))) > + break; > + } Why exactly do we need to notify per vfio_dma rather than per unmap request? If we do the latter we can send the notify first, limiting us to races where a page is pinned between the notify and the locking, whereas here, even our dma pointer is suspect once we re-acquire the lock, we don't technically know if another unmap could have removed that already. Perhaps something like this (untested): >>> >>> There are checks to validate unmap request, like v2 check and who is >>> calling unmap and is it allowed for that task to unmap. Before these >>> checks its not sure that unmap region range which asked for would be >>> unmapped all. 
Notify call should be at the place where its sure that the >>> range provided to notify call is definitely going to be removed. My >>> change do that. >> >> Ok, but that does solve the problem. What about this (untested): > > s/does/does not/ > > BTW, I like how the retries here fill the gap in my previous proposal > where we could still race re-pinning. We've given it an honest shot or > someone is not participating if we've retried 10 times. I don't > understand why the test for iommu->external_domain was there, clearly > if the list is not empty, we need to notify. Thanks, > Ok. Retry is good to give a chance to unpin all. But is it really required to use BUG_ON() that would panic the host. I think WARN_ON should be fine and then when container is closed or when the last group is removed from the container, vfio_iommu_type1_release() is called and we have a chance to unpin it all. >>> >>> See my comments on patch 10/22, we need to be vigilant that the vendor >>> driver is participating. I don't think we should be cleaning up after >>> the vendor driver on release, if we need to do that, it implies we >>> already have problems in multi-mdev containers since we'll be left with >>> pfn_list entries that no longer have an owner. Thanks, >>> >> >> If any vendor driver doesn't clean its pinned pages and there are >> entries in pfn_list with no owner, that would be indicated by WARN_ON, >> which should be fixed by that vendor driver. I still feel it shouldn't >> cause host panic. >> When such warning is seen with multiple mdev devices in container, it is >> easy to isolate and find which vendor driver is not cleaning their >> stuff, same warning would be seen with single mdev device in a >> container. To isolate and find which vendor driver is culprit check with >> one mdev device at a time. >> Finally, we have a chance to clean all residue from >> vfio_iommu_type1_release() so that vfio_iommu_type1 module doesn't leave >> any leaks. 
> > How can we claim that we've resolved anything by unpinning the > residue? In
Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
On Wed, 16 Nov 2016 15:54:56 +0200 "Michael S. Tsirkin" wrote: > On Thu, Nov 10, 2016 at 12:44:47PM -0700, Alex Williamson wrote: > > On Thu, 10 Nov 2016 21:20:36 +0200 > > "Michael S. Tsirkin" wrote: > > > > > On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote: > > > > On Thu, 10 Nov 2016 17:54:35 +0200 > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote: > > > > > > On Thu, 10 Nov 2016 17:14:24 +0200 > > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote: > > > > > > > > From: "Aviv Ben-David" > > > > > > > > > > > > > > > > * Advertize Cache Mode capability in iommu cap register. > > > > > > > > This capability is controlled by "cache-mode" property of > > > > > > > > intel-iommu device. > > > > > > > > To enable this option call QEMU with "-device > > > > > > > > intel-iommu,cache-mode=true". > > > > > > > > > > > > > > > > * On page cache invalidation in intel vIOMMU, check if the > > > > > > > > domain belong to > > > > > > > > registered notifier, and notify accordingly. > > > > > > > > > > > > > > This looks sane I think. Alex, care to comment? > > > > > > > Merging will have to wait until after the release. > > > > > > > Pls remember to re-test and re-ping then. > > > > > > > > > > > > I don't think it's suitable for upstream until there's a reasonable > > > > > > replay mechanism > > > > > > > > > > Could you pls clarify what do you mean by replay? > > > > > Is this when you attach a device by hotplug to > > > > > a running system? > > > > > > > > > > If yes this can maybe be addressed by disabling hotplug temporarily. > > > > > > > > > > > > > No, hotplug is not required, moving a device between existing domains > > > > requires replay, ie. actually using it for nested device assignment. > > > > > > Good point, that one is a correctness thing. Aviv, > > > could you add this in TODO list in a cover letter pls? 
> > > > > > > > > and we straighten out whether it's expected to get > > > > > > multiple notifies and the notif-ee is responsible for filtering > > > > > > them or if the notif-er should do filtering. > > > > > > > > > > OK this is a documentation thing. > > > > > > > > Well no, it needs to be decided and if necessary implemented. > > > > > > Let's assume it's the notif-ee for now. Less is more and all that. > > > > I think this is opposite of the approach dwg suggested. > > > > > > > > Without those, this is > > > > > > effectively just an RFC. > > > > > > > > > > It's infrastructure without users so it doesn't break things, > > > > > I'm more interested in seeing whether it's broken in > > > > > some way than whether it's complete. > > > > > > > > If it allows use with vfio but doesn't fully implement the complete set > > > > of interfaces, it does break things. We currently prevent viommu usage > > > > with vfio because it is incomplete. > > > > > > Right - that bit is still in as far as I can see. > > > > Nope, 3/3 changes vtd_iommu_notify_flag_changed() to allow use with > > vfio even though it's still incomplete. We would at least need > > something like a replay callback for VT-d that triggers an abort if you > > still want to accept it incomplete. Thanks, > > > > Alex > > IIUC practically things seems to work, right? AFAIK, no. > So how about disabling by default with a flag for people that want to > experiment with it? > E.g. x-vfio-allow-broken-translations ? We've already been through one round of "intel-iommu is incomplete for use with device assignment, how can we prevent it from being used", which led to the notify_flag_changed callback on MemoryRegionIOMMUOps. This series now claims to fix that yet still doesn't provide a mechanism to do memory_region_iommu_replay() given that VT-d has a much larger address width. Why is the onus on vfio to resolve this or provide some sort of workaround? 
vfio is using the QEMU iommu interface correctly, intel-iommu is still incomplete. The least it could do is add an optional replay callback to MemoryRegionIOMMUOps that supersedes the existing memory_region_iommu_replay() code and triggers an abort when it gets called. I don't know what an x-vfio-allow-broken-translations option would do, how I'd implement it, or why I'd bother to implement it. Thanks, Alex
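The "optional replay callback that supersedes the generic code" suggested above can be pictured with a small dispatch sketch. This is an illustration under assumptions, not QEMU code: DemoIOMMUOps, demo_iommu_replay(), demo_generic_replay() and the callbacks below are hypothetical names, and the real MemoryRegionIOMMUOps layout is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified ops table; not QEMU's MemoryRegionIOMMUOps. */
typedef struct DemoIOMMUOps {
    /* Optional: when set, it replaces the generic replay walk. */
    void (*replay)(void *opaque);
} DemoIOMMUOps;

static int demo_generic_replays;

/* Stand-in for the existing generic replay code. */
static void demo_generic_replay(void *opaque)
{
    (void)opaque;
    demo_generic_replays++;
}

/* Dispatch: prefer the model-specific callback when one is installed. */
static void demo_iommu_replay(const DemoIOMMUOps *ops, void *opaque)
{
    assert(ops != NULL);
    if (ops->replay) {
        ops->replay(opaque);
    } else {
        demo_generic_replay(opaque);
    }
}

/* A callback that counts its invocations through the opaque pointer. */
static void demo_counting_replay(void *opaque)
{
    (*(int *)opaque)++;
}

/* What an incomplete IOMMU model could install: refuse loudly. */
static void demo_unsupported_replay(void *opaque)
{
    (void)opaque;
    abort();
}

const DemoIOMMUOps demo_incomplete_model_ops = {
    .replay = demo_unsupported_replay,
};
```

The point of the design is that the caller never needs a special case: a model that cannot replay yet installs the aborting callback, and the failure is reported by the incomplete component rather than worked around in its user.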
[Qemu-devel] [PATCH] translate-all: Enable locking debug in a debug build
Unconditionally enable locking checks in debug builds so that we get wider testing. Using tcg_debug_assert() allows us to remove DEBUG_LOCKING define. Signed-off-by: Pranith Kumar --- translate-all.c | 50 +- 1 file changed, 17 insertions(+), 33 deletions(-) diff --git a/translate-all.c b/translate-all.c index cf828aa..a03f323 100644 --- a/translate-all.c +++ b/translate-all.c @@ -60,7 +60,6 @@ /* #define DEBUG_TB_INVALIDATE */ /* #define DEBUG_TB_FLUSH */ -/* #define DEBUG_LOCKING */ /* make various TB consistency checks */ /* #define DEBUG_TB_CHECK */ @@ -75,23 +74,13 @@ * access to the memory related structures are protected with the * mmap_lock. */ -#ifdef DEBUG_LOCKING -#define DEBUG_MEM_LOCKS 1 -#else -#define DEBUG_MEM_LOCKS 0 -#endif - #ifdef CONFIG_SOFTMMU #define assert_memory_lock() do { \ -if (DEBUG_MEM_LOCKS) { \ -g_assert(have_tb_lock); \ -} \ +tcg_debug_assert(have_tb_lock); \ } while (0) #else #define assert_memory_lock() do { \ -if (DEBUG_MEM_LOCKS) { \ -g_assert(have_mmap_lock()); \ -} \ +tcg_debug_assert(have_mmap_lock()); \ } while (0) #endif @@ -172,16 +161,24 @@ static void page_table_config_init(void) assert(v_l2_levels >= 0); } +#define assert_tb_locked() do { \ +tcg_debug_assert(have_tb_lock); \ +} while (0) + +#define assert_tb_unlocked() do { \ +tcg_debug_assert(!have_tb_lock);\ +} while (0) + void tb_lock(void) { -assert(!have_tb_lock); +assert_tb_unlocked(); qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock); have_tb_lock++; } void tb_unlock(void) { -assert(have_tb_lock); +assert_tb_locked(); have_tb_lock--; qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock); } @@ -194,19 +191,6 @@ void tb_lock_reset(void) } } -#ifdef DEBUG_LOCKING -#define DEBUG_TB_LOCKS 1 -#else -#define DEBUG_TB_LOCKS 0 -#endif - -#define assert_tb_lock() do { \ -if (DEBUG_TB_LOCKS) { \ -g_assert(have_tb_lock); \ -} \ -} while (0) - - static TranslationBlock *tb_find_pc(uintptr_t tc_ptr); void cpu_gen_init(void) @@ -840,7 +824,7 @@ static TranslationBlock *tb_alloc(target_ulong pc) { 
TranslationBlock *tb; -assert_tb_lock(); +assert_tb_locked(); if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) { return NULL; @@ -855,7 +839,7 @@ static TranslationBlock *tb_alloc(target_ulong pc) /* Called with tb_lock held. */ void tb_free(TranslationBlock *tb) { -assert_tb_lock(); +assert_tb_locked(); /* In practice this is mostly used for single use temporary TB Ignore the hard cases and just back up if this TB happens to @@ -1097,7 +1081,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr) uint32_t h; tb_page_addr_t phys_pc; -assert_tb_lock(); +assert_tb_locked(); atomic_set(&tb->invalid, true); @@ -1412,7 +1396,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end) #ifdef CONFIG_SOFTMMU void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end) { -assert_tb_lock(); +assert_tb_locked(); tb_invalidate_phys_range_1(start, end); } #else @@ -1455,7 +1439,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end, #endif /* TARGET_HAS_PRECISE_SMC */ assert_memory_lock(); -assert_tb_lock(); +assert_tb_locked(); p = page_find(start >> TARGET_PAGE_BITS); if (!p) { -- 2.10.2
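The pattern in the patch above, a lock-state assertion that is active only in debug builds, can be sketched in isolation. The names below (DEMO_DEBUG, demo_debug_assert(), have_demo_lock and the lock helpers) are hypothetical stand-ins; in particular, this sketch compiles the check out entirely when DEMO_DEBUG is not defined, which is an assumption and not a statement about how tcg_debug_assert() itself is implemented.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for tcg_debug_assert(): a real check when
 * DEMO_DEBUG is defined, compiled out otherwise.
 */
#ifdef DEMO_DEBUG
#define demo_debug_assert(x) assert(x)
#else
#define demo_debug_assert(x) ((void)0)
#endif

/* Models the have_tb_lock counter from the patch. */
static int have_demo_lock;

#define assert_demo_locked()   demo_debug_assert(have_demo_lock)
#define assert_demo_unlocked() demo_debug_assert(!have_demo_lock)

static void demo_lock(void)
{
    assert_demo_unlocked();     /* taking the lock twice is a bug */
    have_demo_lock++;
}

static void demo_unlock(void)
{
    assert_demo_locked();       /* releasing an unheld lock is a bug */
    have_demo_lock--;
}
```

Building with -DDEMO_DEBUG turns a double demo_lock() into an assertion failure; without it the helpers reduce to plain counter updates.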
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
Just crossed my mind that we're missing isb's. On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > From: Christopher Covington > > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > even for the smallest delta of two subsequent reads. > > Signed-off-by: Christopher Covington > Signed-off-by: Wei Huang > --- > arm/pmu.c | 98 > +++ > 1 file changed, 98 insertions(+) > > diff --git a/arm/pmu.c b/arm/pmu.c > index 0b29088..d5e3ac3 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -14,6 +14,7 @@ > */ > #include "libcflat.h" > > +#define PMU_PMCR_E (1 << 0) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -21,6 +22,10 @@ > #define PMU_PMCR_IMP_SHIFT 24 > #define PMU_PMCR_IMP_MASK 0xff > > +#define PMU_CYCLE_IDX 31 > + > +#define NR_SAMPLES 10 > + > #if defined(__arm__) > static inline uint32_t pmcr_read(void) > { > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > +} > + > +static inline void pmselr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); Probably want an isb here, users will call this and then immediately another PMU reg write, like is done below > +} > + > +static inline void pmxevtyper_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > +} > + > +/* > + * While PMCCNTR can be accessed as a 64 bit coprocessor register, returning > 64 > + * bits doesn't seem worth the trouble when differential usage of the result > is > + * expected (with differences that can easily fit in 32 bits). So just return > + * the lower 32 bits of the cycle count in AArch32. Also, while we're discussing confirming upper bits are as expected, I guess we should confirm no overflow too. 
We should clear the overflow bit PMOVSCLR_EL0.C before we use the counter, and then check it at some point to confirm it's as expected. I guess that could be separate test cases though. > + */ > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value)); > +} > + > +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */ > +static inline void pmccfiltr_write(uint32_t value) > +{ > + pmselr_write(PMU_CYCLE_IDX); > + pmxevtyper_write(value); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -37,6 +83,29 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrs %0, pmcr_el0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("msr pmcr_el0, %0" : : "r" (value)); > +} > + > +static inline uint32_t pmccntr_read(void) > +{ > + uint32_t cycles; > + > + asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles)); > + return cycles; > +} > + > +static inline void pmcntenset_write(uint32_t value) > +{ > + asm volatile("msr pmcntenset_el0, %0" : : "r" (value)); > +} > + > +static inline void pmccfiltr_write(uint32_t value) > +{ > + asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > +} > #endif > > /* > @@ -63,11 +132,40 @@ static bool check_pmcr(void) > return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0; > } > > +/* > + * Ensure that the cycle counter progresses between back-to-back reads. 
> + */ > +static bool check_cycles_increase(void) > +{ > + pmcr_write(pmcr_read() | PMU_PMCR_E); Need isb() here > + > + for (int i = 0; i < NR_SAMPLES; i++) { > + unsigned long a, b; > + > + a = pmccntr_read(); > + b = pmccntr_read(); > + > + if (a >= b) { > + printf("Read %ld then %ld.\n", a, b); > + return false; > + } > + } > + > + pmcr_write(pmcr_read() & ~PMU_PMCR_E); > + Need isb() here > + return true; > +} > + > int main(void) > { > report_prefix_push("pmu"); > > + /* init for PMU event access, right now only care about cycle count */ > + pmcntenset_write(1 << PMU_CYCLE_IDX); > + pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */ Need isb() here > + > report("Control register", check_pmcr()); > + report("Monotonically increasing cycle count", check_cycles_increase()); > > return report_summary(); > } > -- > 1.8.3.1 > > Thanks, drew
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 3/3] arm: pmu: Add CPI checking
On Tue, Nov 08, 2016 at 12:17:15PM -0600, Wei Huang wrote: > From: Christopher Covington > > Calculate the numbers of cycles per instruction (CPI) implied by ARM > PMU cycle counter values. The code includes a strict checking facility > intended for the -icount option in TCG mode in the configuration file. > > Signed-off-by: Christopher Covington > Signed-off-by: Wei Huang > --- > arm/pmu.c | 101 > +- > arm/unittests.cfg | 14 > 2 files changed, 114 insertions(+), 1 deletion(-) > > diff --git a/arm/pmu.c b/arm/pmu.c > index d5e3ac3..09aff89 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -15,6 +15,7 @@ > #include "libcflat.h" > > #define PMU_PMCR_E (1 << 0) > +#define PMU_PMCR_C (1 << 2) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -75,6 +76,23 @@ static inline void pmccfiltr_write(uint32_t value) > pmselr_write(PMU_CYCLE_IDX); > pmxevtyper_write(value); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to > compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. > + */ > +static inline void loop(int i, uint32_t pmcr) > +{ > + asm volatile( > + " mcr p15, 0, %[pmcr], c9, c12, 0\n" isb > + "1: subs%[i], %[i], #1\n" > + " bgt 1b\n" > + " mcr p15, 0, %[z], c9, c12, 0\n" isb > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr), [z] "r" (0) > + : "cc"); > +} > #elif defined(__aarch64__) > static inline uint32_t pmcr_read(void) > { > @@ -106,6 +124,23 @@ static inline void pmccfiltr_write(uint32_t value) > { > asm volatile("msr pmccfiltr_el0, %0" : : "r" (value)); > } > + > +/* > + * Extra instructions inserted by the compiler would be difficult to > compensate > + * for, so hand assemble everything between, and including, the PMCR accesses > + * to start and stop counting. 
> + */ > +static inline void loop(int i, uint32_t pmcr) > +{ > + asm volatile( > + " msr pmcr_el0, %[pmcr]\n" isb > + "1: subs%[i], %[i], #1\n" > + " b.gt1b\n" > + " msr pmcr_el0, xzr\n" isb > + : [i] "+r" (i) > + : [pmcr] "r" (pmcr) > + : "cc"); > +} > #endif > > /* > @@ -156,8 +191,71 @@ static bool check_cycles_increase(void) > return true; > } > > -int main(void) > +/* > + * Execute a known number of guest instructions. Only odd instruction counts > + * greater than or equal to 3 are supported by the in-line assembly code. The > + * control register (PMCR_EL0) is initialized with the provided value > (allowing > + * for example for the cycle counter or event counters to be reset). At the > end > + * of the exact instruction loop, zero is written to PMCR_EL0 to disable > + * counting, allowing the cycle counter or event counters to be read at the > + * leisure of the calling code. > + */ > +static void measure_instrs(int num, uint32_t pmcr) > +{ > + int i = (num - 1) / 2; > + > + assert(num >= 3 && ((num - 1) % 2 == 0)); > + loop(i, pmcr); > +} > + > +/* > + * Measure cycle counts for various known instruction counts. Ensure that the > + * cycle counter progresses (similar to check_cycles_increase() but with more > + * instructions and using reset and stop controls). If supplied a positive, > + * nonzero CPI parameter, also strictly check that every measurement matches > + * it. Strict CPI checking is used to test -icount mode. 
> + */ > +static bool check_cpi(int cpi) > +{ > + uint32_t pmcr = pmcr_read() | PMU_PMCR_C | PMU_PMCR_E; > + > + if (cpi > 0) > + printf("Checking for CPI=%d.\n", cpi); > + printf("instrs : cycles0 cycles1 ...\n"); > + > + for (int i = 3; i < 300; i += 32) { > + int avg, sum = 0; > + > + printf("%d :", i); > + for (int j = 0; j < NR_SAMPLES; j++) { > + int cycles; > + > + measure_instrs(i, pmcr); > + cycles =pmccntr_read(); > + printf(" %d", cycles); > + > + if (!cycles || (cpi > 0 && cycles != i * cpi)) { > + printf("\n"); > + return false; > + } > + > + sum += cycles; > + } > + avg = sum / NR_SAMPLES; > + printf(" sum=%d avg=%d avg_ipc=%d avg_cpi=%d\n", > + sum, avg, i / avg, avg / i); > + } > + > + return true; > +} > + > +int main(int argc, char *argv[]) > { > + int cpi = 0; > + > + if (argc >= 1) > + cpi = atol(argv[0]); > + > report_prefix_push("pmu"); > > /* init for PMU event access, right now only care about cycle count */ > @@ -166,6 +264,7 @@ int main(void) > > report("Control register", check_p
Re: [Qemu-devel] [PATCH] translate-all: Enable locking debug in a debug build
Pranith Kumar writes:

> Unconditionally enable locking checks in debug builds so that we get
> wider testing. Using tcg_debug_assert() allows us to remove
> DEBUG_LOCKING define.

Interesting. The other option would be to add a debug build to .travis.yml that defines this (and others) with -DFOO_DEBUG.

>
> Signed-off-by: Pranith Kumar
> ---
>  translate-all.c | 50 +-
>  1 file changed, 17 insertions(+), 33 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index cf828aa..a03f323 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -60,7 +60,6 @@
>
>  /* #define DEBUG_TB_INVALIDATE */
>  /* #define DEBUG_TB_FLUSH */
> -/* #define DEBUG_LOCKING */
>  /* make various TB consistency checks */
>  /* #define DEBUG_TB_CHECK */

So if we are enabling this for tcg_debug builds why not the other cases?

>
> @@ -75,23 +74,13 @@
>   * access to the memory related structures are protected with the
>   * mmap_lock.
>   */
> -#ifdef DEBUG_LOCKING
> -#define DEBUG_MEM_LOCKS 1
> -#else
> -#define DEBUG_MEM_LOCKS 0
> -#endif
> -

In retrospect I should probably have had a comment in here about the role of tb_lock in CONFIG_SOFTMMU versus the mmap_lock.

>  #ifdef CONFIG_SOFTMMU
>  #define assert_memory_lock() do { \
> -        if (DEBUG_MEM_LOCKS) { \
> -            g_assert(have_tb_lock); \
> -        } \
> +        tcg_debug_assert(have_tb_lock); \
>      } while (0)
>  #else
>  #define assert_memory_lock() do { \
> -        if (DEBUG_MEM_LOCKS) { \
> -            g_assert(have_mmap_lock()); \
> -        } \
> +        tcg_debug_assert(have_mmap_lock()); \
>      } while (0)
>  #endif
>
> @@ -172,16 +161,24 @@ static void page_table_config_init(void)
>      assert(v_l2_levels >= 0);
>  }
>
> +#define assert_tb_locked() do { \
> +        tcg_debug_assert(have_tb_lock); \
> +    } while (0)
> +
> +#define assert_tb_unlocked() do { \
> +        tcg_debug_assert(!have_tb_lock); \
> +    } while (0)
> +

I'm not sure we need all this multi-line stuff for a simple substitution? Richard?

>  void tb_lock(void)
>  {
> -    assert(!have_tb_lock);
> +    assert_tb_unlocked();

Hmm, why introduce a helper for exactly one use?

>      qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
>      have_tb_lock++;
>  }
>
>  void tb_unlock(void)
>  {
> -    assert(have_tb_lock);
> +    assert_tb_locked();
>      have_tb_lock--;
>      qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
>  }
> @@ -194,19 +191,6 @@ void tb_lock_reset(void)
>      }
>  }
>
> -#ifdef DEBUG_LOCKING
> -#define DEBUG_TB_LOCKS 1
> -#else
> -#define DEBUG_TB_LOCKS 0
> -#endif
> -
> -#define assert_tb_lock() do { \
> -        if (DEBUG_TB_LOCKS) { \
> -            g_assert(have_tb_lock); \
> -        } \
> -    } while (0)
> -
> -
>  static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
>
>  void cpu_gen_init(void)
> @@ -840,7 +824,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
>  {
>      TranslationBlock *tb;
>
> -    assert_tb_lock();
> +    assert_tb_locked();
>
>      if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
>          return NULL;
> @@ -855,7 +839,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
>  /* Called with tb_lock held. */
>  void tb_free(TranslationBlock *tb)
>  {
> -    assert_tb_lock();
> +    assert_tb_locked();
>
>      /* In practice this is mostly used for single use temporary TB
>         Ignore the hard cases and just back up if this TB happens to
> @@ -1097,7 +1081,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
>      uint32_t h;
>      tb_page_addr_t phys_pc;
>
> -    assert_tb_lock();
> +    assert_tb_locked();
>
>      atomic_set(&tb->invalid, true);
>
> @@ -1412,7 +1396,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
>  #ifdef CONFIG_SOFTMMU
>  void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
>  {
> -    assert_tb_lock();
> +    assert_tb_locked();
>      tb_invalidate_phys_range_1(start, end);
>  }
>  #else
> @@ -1455,7 +1439,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
>  #endif /* TARGET_HAS_PRECISE_SMC */
>
>      assert_memory_lock();
> -    assert_tb_lock();
> +    assert_tb_locked();
>
>      p = page_find(start >> TARGET_PAGE_BITS);
>      if (!p) {

--
Alex Bennée
Re: [Qemu-devel] [RFC PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
On Thu 10 Nov 2016 06:19:04 PM CET, Kevin Wolf wrote:

> +typedef struct QuorumCo {
> +    QuorumAIOCB *acb;
> +    int i;

Maybe 'i' could be renamed to something a bit more descriptive ('idx', I don't know).

> +} QuorumCo;
> +
> +static void read_quorum_children_entry(void *opaque)
> +{
> +    QuorumCo *co = opaque;
> +    QuorumAIOCB *acb = co->acb;
> +    BDRVQuorumState *s = acb->bs->opaque;
> +    int i = co->i;
> +    int ret;
> +    co = NULL; /* Not valid after the first yield */

I also don't understand this last line. Is it to make sure that no one tries to use it after the bdrv_co_preadv() call?

> +    acb->qcrs[i].bs = s->children[i]->bs;
> +    ret = bdrv_co_preadv(s->children[i], acb->sector_num * BDRV_SECTOR_SIZE,
> +                         acb->nb_sectors * BDRV_SECTOR_SIZE,
> +                         &acb->qcrs[i].qiov, 0);
> +    quorum_aio_cb(&acb->qcrs[i], ret);
> +}

Otherwise the patch looks good to me.

Berto
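One plausible reading of the "co = NULL" line questioned above is the usual copy-then-invalidate idiom: the argument struct belongs to the caller, so the entry function copies what it needs before the first yield and then drops the pointer so that any later use fails immediately. The sketch below only simulates that situation in plain C, without coroutines; DemoCo, demo_yield() and demo_entry() are hypothetical names, and the "yield" is modelled by overwriting the caller's struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument block that lives in the caller's frame. */
typedef struct DemoCo {
    int *results;
    int idx;
} DemoCo;

/* Simulated yield: the caller's frame is reused, so *stale is overwritten. */
static void demo_yield(DemoCo *stale)
{
    stale->results = NULL;
    stale->idx = -1;
}

static void demo_entry(void *opaque)
{
    DemoCo *co = opaque;
    int *results = co->results;     /* copy everything needed up front */
    int idx = co->idx;

    demo_yield(co);                 /* from here on *co must not be read */
    co = NULL;                      /* any accidental use now faults at once */

    results[idx] = 42;              /* only the local copies are used */
}
```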
Re: [Qemu-devel] [libvirt] [PATCH v1] qemu: command: rework cpu feature argument support
On 11/16/2016 09:05 AM, Eduardo Habkost wrote:
> On Wed, Nov 16, 2016 at 02:15:02PM +0100, Jiri Denemark wrote:
>> On Tue, Nov 15, 2016 at 11:44:00 -0200, Eduardo Habkost wrote:
>>> CCing qemu-devel.
>>>
>>> CCing Markus, in case he has any insights about the interface
>>> introspection.
>>>
>>> On Tue, Nov 15, 2016 at 08:42:12AM +0100, Jiri Denemark wrote:
>>>> On Mon, Nov 14, 2016 at 18:02:29 -0200, Eduardo Habkost wrote:
>>>>> On Mon, Nov 14, 2016 at 02:26:03PM -0500, Collin L. Walling wrote:
>>>>>> cpu features are passed to the qemu command with feature=on/off
>>>>>> instead of +/-feature.
>>>>>>
>>>>>> Signed-off-by: Collin L. Walling
>>>>>
>>>>> If I'm not mistaken, the "feature=on|off" syntax was added on QEMU
>>>>> 2.0.0. Does current libvirt support older QEMU versions?
>>>>
>>>> Of course it does. I'd love to switch to feature=on|off, but how can
>>>> we check if QEMU supports it? We can't really start using this syntax
>>>> without it.
>>>
>>> Actually, I was wrong, this was added in v2.4.0. "feat=on|off" needs
>>> two things to work (in x86):
>>>
>>> * Translation of all "foo=bar" options to QOM property setting. This
>>>   was added in v2.0.0-rc0~162^2
>>> * The actual QOM properties for feature names to be present. They were
>>>   added in v2.4.0-rc0~101^2~1
>>>
>>> So you can be sure "feat=on" is supported by checking if the feature
>>> flags are present in device-list-properties output for the CPU model.
>>>
>>> But device-list-properties is also messy[1].
>>>
>>> Maybe we can use the availability of query-cpu-model-expansion to
>>> check if we can safely use the new "feat=on|off" system? It's easier
>>> than taking all the variables above into account.
>>
>> Yeah, this could work since s390 already supports
>> query-cpu-model-expansion. It would cause feature=on|off not to be used
>> on x86_64 with QEMU older than 2.9.0, but I guess that's not a big
>> deal, is it?
>
> Not a problem, as we have no plans to remove +feat/-feat support in x86
> anymore.

Beautiful. Thanks for your responses everyone. :)
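The two argument styles discussed in this thread differ only in spelling: "+feat" becomes "feat=on" and "-feat" becomes "feat=off". A minimal sketch of that rewrite follows. It is not libvirt code; demo_cpu_feature_arg() is a hypothetical helper, and the feature names used with it are merely example strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a legacy "+feat" / "-feat" token as
 * "feat=on" / "feat=off". Returns 0 on success, -1 on malformed input
 * or if the output buffer is too small.
 */
static int demo_cpu_feature_arg(const char *legacy, char *out, size_t outlen)
{
    const char *state;
    int n;

    if (!legacy || !out ||
        (legacy[0] != '+' && legacy[0] != '-') || legacy[1] == '\0') {
        return -1;
    }

    state = legacy[0] == '+' ? "on" : "off";
    n = snprintf(out, outlen, "%s=%s", legacy + 1, state);

    /* snprintf reports the length it wanted; reject truncated output. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A caller would pick this form only after establishing that the running QEMU accepts it, for example via the query-cpu-model-expansion probe proposed above; that capability check is outside the scope of the sketch.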
[Qemu-devel] [PATCH RFC 2/2] numa: make -numa parser dynamically allocate CPUs masks
so it won't impose an additional limits on max_cpus limits supported by different targets. It removes global MAX_CPUMASK_BITS constant and need to bump it up whenever max_cpus is being increased for a target above MAX_CPUMASK_BITS value. Use runtime max_cpus value instead to allocate sufficiently sized node_cpu bitmasks in numa parser. Signed-off-by: Igor Mammedov --- include/sysemu/numa.h | 2 +- include/sysemu/sysemu.h | 7 --- numa.c | 19 --- vl.c| 5 - 4 files changed, 13 insertions(+), 20 deletions(-) diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h index 4da808a..8f09dcf 100644 --- a/include/sysemu/numa.h +++ b/include/sysemu/numa.h @@ -17,7 +17,7 @@ struct numa_addr_range { typedef struct node_info { uint64_t node_mem; -DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); +unsigned long *node_cpu; struct HostMemoryBackend *node_memdev; bool present; QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */ diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 66c6f15..cccde56 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -168,13 +168,6 @@ extern int mem_prealloc; #define MAX_NODES 128 #define NUMA_NODE_UNASSIGNED MAX_NODES -/* The following shall be true for all CPUs: - * cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS - * - * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS. 
- */ -#define MAX_CPUMASK_BITS 288 - #define MAX_OPTION_ROMS 16 typedef struct QEMUOptionRom { const char *name; diff --git a/numa.c b/numa.c index 9c09e45..5542e40 100644 --- a/numa.c +++ b/numa.c @@ -266,20 +266,20 @@ static char *enumerate_cpus(unsigned long *cpus, int max_cpus) static void validate_numa_cpus(void) { int i; -DECLARE_BITMAP(seen_cpus, MAX_CPUMASK_BITS); +unsigned long *seen_cpus = bitmap_new(max_cpus); -bitmap_zero(seen_cpus, MAX_CPUMASK_BITS); +bitmap_zero(seen_cpus, max_cpus); for (i = 0; i < nb_numa_nodes; i++) { -if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, - MAX_CPUMASK_BITS)) { +if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) { bitmap_and(seen_cpus, seen_cpus, - numa_info[i].node_cpu, MAX_CPUMASK_BITS); + numa_info[i].node_cpu, max_cpus); error_report("CPU(s) present in multiple NUMA nodes: %s", enumerate_cpus(seen_cpus, max_cpus)); +bitmap_free(seen_cpus); exit(EXIT_FAILURE); } bitmap_or(seen_cpus, seen_cpus, - numa_info[i].node_cpu, MAX_CPUMASK_BITS); + numa_info[i].node_cpu, max_cpus); } if (!bitmap_full(seen_cpus, max_cpus)) { @@ -291,12 +291,17 @@ static void validate_numa_cpus(void) "in NUMA config"); g_free(msg); } +bitmap_free(seen_cpus); } void parse_numa_opts(MachineClass *mc) { int i; +for (i = 0; i < MAX_NODES; i++) { +numa_info[i].node_cpu = bitmap_new(max_cpus); +} + if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) { exit(1); } @@ -362,7 +367,7 @@ void parse_numa_opts(MachineClass *mc) numa_set_mem_ranges(); for (i = 0; i < nb_numa_nodes; i++) { -if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) { +if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) { break; } } diff --git a/vl.c b/vl.c index d77dd86..37790e5 100644 --- a/vl.c +++ b/vl.c @@ -1277,11 +1277,6 @@ static void smp_parse(QemuOpts *opts) max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); -if (max_cpus > MAX_CPUMASK_BITS) { -error_report("unsupported number of maxcpus"); -exit(1); -} - if (max_cpus < 
cpus) { error_report("maxcpus must be equal to or greater than smp"); exit(1); -- 2.7.4
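The core idea of the patch above, sizing CPU masks from a runtime value instead of a compile-time MAX_CPUMASK_BITS, can be sketched with plain libc. The helpers below (demo_bitmap_new(), demo_set_bit(), demo_bitmap_intersects()) are hypothetical stand-ins written for this example; QEMU has its own bitmap API, which the sketch does not use or describe.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
    (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Allocate a zeroed bitmap with room for nbits bits (nbits > 0). */
static unsigned long *demo_bitmap_new(size_t nbits)
{
    unsigned long *map = calloc(DEMO_BITS_TO_LONGS(nbits),
                                sizeof(unsigned long));

    if (!map) {
        abort();
    }
    return map;
}

static void demo_set_bit(unsigned long *map, size_t bit)
{
    map[bit / DEMO_BITS_PER_LONG] |= 1UL << (bit % DEMO_BITS_PER_LONG);
}

/* True if any bit is set in both maps; both must hold nbits bits. */
static bool demo_bitmap_intersects(const unsigned long *a,
                                   const unsigned long *b, size_t nbits)
{
    for (size_t i = 0; i < DEMO_BITS_TO_LONGS(nbits); i++) {
        if (a[i] & b[i]) {
            return true;
        }
    }
    return false;
}
```

Because the size comes from the caller, a configuration with 300 CPUs (above the removed 288-bit limit) needs no constant to be bumped; the cost is that every mask now has to be freed explicitly, which is why the series adds a bitmap_free() wrapper.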
[Qemu-devel] [PATCH RFC 0/2] numa: allocate CPUs masks dynamically
This series removes the global MAX_CPUMASK_BITS constant so that it won't
indirectly influence the maximum CPU count supported by different targets.
It replaces statically allocated bitmasks with dynamically allocated ones,
using the '-smp maxcpus' value to set the bitmask size. That allocates just
enough memory to handle all CPU indexes that a QEMU instance would ever
have.

CC: Alexey Kardashevskiy
CC: Greg Kurz
CC: David Gibson
CC: Eduardo Habkost
CC: Paolo Bonzini

Igor Mammedov (2):
  add bitmap_free() wrapper
  numa: make -numa parser dynamically allocate CPUs masks

 include/qemu/bitmap.h   |  5 +
 include/sysemu/numa.h   |  2 +-
 include/sysemu/sysemu.h |  7 ---
 numa.c                  | 19 ---
 vl.c                    |  5 -
 5 files changed, 18 insertions(+), 20 deletions(-)

--
2.7.4
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
On 11/16/2016 08:01 AM, Andrew Jones wrote: > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: >> >> >> On 11/14/2016 09:12 AM, Christopher Covington wrote: >>> Hi Drew, Wei, >>> >>> On 11/14/2016 05:05 AM, Andrew Jones wrote: On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > > > On 11/11/2016 01:43 AM, Andrew Jones wrote: >> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: >>> From: Christopher Covington >>> >>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, >>> even for the smallest delta of two subsequent reads. >>> >>> Signed-off-by: Christopher Covington >>> Signed-off-by: Wei Huang >>> --- >>> arm/pmu.c | 98 >>> +++ >>> 1 file changed, 98 insertions(+) >>> >>> diff --git a/arm/pmu.c b/arm/pmu.c >>> index 0b29088..d5e3ac3 100644 >>> --- a/arm/pmu.c >>> +++ b/arm/pmu.c >>> @@ -14,6 +14,7 @@ >>> */ >>> #include "libcflat.h" >>> >>> +#define PMU_PMCR_E (1 << 0) >>> #define PMU_PMCR_N_SHIFT 11 >>> #define PMU_PMCR_N_MASK0x1f >>> #define PMU_PMCR_ID_SHIFT 16 >>> @@ -21,6 +22,10 @@ >>> #define PMU_PMCR_IMP_SHIFT 24 >>> #define PMU_PMCR_IMP_MASK 0xff >>> >>> +#define PMU_CYCLE_IDX 31 >>> + >>> +#define NR_SAMPLES 10 >>> + >>> #if defined(__arm__) >>> static inline uint32_t pmcr_read(void) >>> { >>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) >>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); >>> return ret; >>> } >>> + >>> +static inline void pmcr_write(uint32_t value) >>> +{ >>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); >>> +} >>> + >>> +static inline void pmselr_write(uint32_t value) >>> +{ >>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); >>> +} >>> + >>> +static inline void pmxevtyper_write(uint32_t value) >>> +{ >>> + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); >>> +} >>> + >>> +/* >>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, >>> returning 64 >>> + * bits doesn't seem worth the trouble when differential usage 
of the >>> result is >>> + * expected (with differences that can easily fit in 32 bits). So just >>> return >>> + * the lower 32 bits of the cycle count in AArch32. >> >> Like I said in the last review, I'd rather we not do this. We should >> return the full value and then the test case should confirm the upper >> 32 bits are zero. > > Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > register. We can force it to a more coarse-grained cycle counter with > PMCR.D bit=1 (see below). But it is still not a 64-bit register. >>> >>> AArch32 System Register Descriptions >>> Performance Monitors registers >>> PMCCNTR, Performance Monitors Cycle Count Register >>> >>> To access the PMCCNTR when accessing as a 32-bit register: >>> MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt >>> MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are >>> unchanged >>> >>> To access the PMCCNTR when accessing as a 64-bit register: >>> MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] >>> into Rt2 >>> MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to >>> PMCCNTR[63:32] >>> >> >> Thanks. I did some research based on your info and came back with the >> following proposals (Cov, correct me if I am wrong): >> >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I >> think this 64-bit cycle register is only available when running under >> aarch32 compatibility mode on ARMv8 because it is not specified in A15 >> TRM. That interpretation sounds really strange to me. My recollection is that the cycle counter was available as a 64 bit register in ARMv7 as well. I would expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference Manual is the complete and authoritative source. >> To further verify it, I tested 32-bit pmu code on QEMU with TCG >> mode. 
The result is: accessing 64-bit PMCCNTR using the following >> assembly failed on A15: >> >>volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); >> or >>volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); The PMU implementation on QEMU TCG mode is infantile. (I was trying to write these tests to help guide fixes and enhancements in a test-driven-development manner.) I would not trust QEMU TCG to behave properly here. If you want to execute those instructions, is there anything preventing you from doing it on hardware, or at least the Foundation Model? >> Given this difference, I think there are two solutions for 64-bit >> AArch32 pmccntr_read, as requested by Drew: >> >> 1) The PMU un
[Qemu-devel] [PATCH RFC 1/2] add bitmap_free() wrapper
it will be used for freeing bitmaps allocated with bitmap_[try]_new()

Signed-off-by: Igor Mammedov
---
 include/qemu/bitmap.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 63ea2d0..0289836 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -98,6 +98,11 @@ static inline unsigned long *bitmap_new(long nbits)
     return ptr;
 }
 
+static inline void bitmap_free(unsigned long *bitmap)
+{
+    g_free(bitmap);
+}
+
 static inline void bitmap_zero(unsigned long *dst, long nbits)
 {
     if (small_nbits(nbits)) {
-- 
2.7.4
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote: > On 11/16/2016 08:01 AM, Andrew Jones wrote: > > On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: > >> > >> > >> On 11/14/2016 09:12 AM, Christopher Covington wrote: > >>> Hi Drew, Wei, > >>> > >>> On 11/14/2016 05:05 AM, Andrew Jones wrote: > On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: > > > > > > On 11/11/2016 01:43 AM, Andrew Jones wrote: > >> On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > >>> From: Christopher Covington > >>> > >>> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > >>> even for the smallest delta of two subsequent reads. > >>> > >>> Signed-off-by: Christopher Covington > >>> Signed-off-by: Wei Huang > >>> --- > >>> arm/pmu.c | 98 > >>> +++ > >>> 1 file changed, 98 insertions(+) > >>> > >>> diff --git a/arm/pmu.c b/arm/pmu.c > >>> index 0b29088..d5e3ac3 100644 > >>> --- a/arm/pmu.c > >>> +++ b/arm/pmu.c > >>> @@ -14,6 +14,7 @@ > >>> */ > >>> #include "libcflat.h" > >>> > >>> +#define PMU_PMCR_E (1 << 0) > >>> #define PMU_PMCR_N_SHIFT 11 > >>> #define PMU_PMCR_N_MASK0x1f > >>> #define PMU_PMCR_ID_SHIFT 16 > >>> @@ -21,6 +22,10 @@ > >>> #define PMU_PMCR_IMP_SHIFT 24 > >>> #define PMU_PMCR_IMP_MASK 0xff > >>> > >>> +#define PMU_CYCLE_IDX 31 > >>> + > >>> +#define NR_SAMPLES 10 > >>> + > >>> #if defined(__arm__) > >>> static inline uint32_t pmcr_read(void) > >>> { > >>> @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > >>> asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > >>> return ret; > >>> } > >>> + > >>> +static inline void pmcr_write(uint32_t value) > >>> +{ > >>> + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > >>> +} > >>> + > >>> +static inline void pmselr_write(uint32_t value) > >>> +{ > >>> + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > >>> +} > >>> + > >>> +static inline void pmxevtyper_write(uint32_t value) > >>> +{ > >>> + asm volatile("mcr p15, 0, %0, c9, 
c13, 1" : : "r" (value)); > >>> +} > >>> + > >>> +/* > >>> + * While PMCCNTR can be accessed as a 64 bit coprocessor register, > >>> returning 64 > >>> + * bits doesn't seem worth the trouble when differential usage of > >>> the result is > >>> + * expected (with differences that can easily fit in 32 bits). So > >>> just return > >>> + * the lower 32 bits of the cycle count in AArch32. > >> > >> Like I said in the last review, I'd rather we not do this. We should > >> return the full value and then the test case should confirm the upper > >> 32 bits are zero. > > > > Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit > > register. We can force it to a more coarse-grained cycle counter with > > PMCR.D bit=1 (see below). But it is still not a 64-bit register. > >>> > >>> AArch32 System Register Descriptions > >>> Performance Monitors registers > >>> PMCCNTR, Performance Monitors Cycle Count Register > >>> > >>> To access the PMCCNTR when accessing as a 32-bit register: > >>> MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > >>> MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are > >>> unchanged > >>> > >>> To access the PMCCNTR when accessing as a 64-bit register: > >>> MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] > >>> into Rt2 > >>> MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to > >>> PMCCNTR[63:32] > >>> > >> > >> Thanks. I did some research based on your info and came back with the > >> following proposals (Cov, correct me if I am wrong): > >> > >> By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I > >> think this 64-bit cycle register is only available when running under > >> aarch32 compatibility mode on ARMv8 because it is not specified in A15 > >> TRM. > > That interpretation sounds really strange to me. My recollection is that the > cycle counter was available as a 64 bit register in ARMv7 as well. I would > expect the Cortex TRMs to omit such details. 
The ARMv7 Architecture Reference > Manual is the complete and authoritative source. Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit. Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64. > > >> To further verify it, I tested 32-bit pmu code on QEMU with TCG > >> mode. The result is: accessing 64-bit PMCCNTR using the following > >> assembly failed on A15: > >> > >>volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi)); > >> or > >>volatile("mrrc p15, 0, %Q0, %R0, c9" : "=r" (val)); > > The PMU implementation on QEMU TCG mode is
Re: [Qemu-devel] QMP event on reboot when -no-reboot is set
On 11/16/2016 09:01 AM, Dirk Braunschweiger wrote:
> Hey Guys,
>
> I want to get a QMP event when QEMU does a shutdown due to the
> -no-reboot flag. Looking at the code I realized that the -no-reboot flag
> just changes any reset request to a shutdown request.
>
> Has anybody already patched QEMU to emit some kind of reboot event to
> the QMP socket? If no one has patched it yet, would you accept such a
> patch? Or is it an unwanted feature?
>
> Best regards,
> Dirk Braunschweiger

Is the existing "STOP" event insufficient for some reason? Is it important
to distinguish between a 'real' stop and a stop that was originally
intended to be a reboot?

If you can elaborate on that case, you have a good chance of amending the
event spec to add some new events.

--js
Re: [Qemu-devel] [kvm-unit-tests PATCH v8 2/3] arm: pmu: Check cycle count increases
On 11/16/2016 11:25 AM, Andrew Jones wrote: > On Wed, Nov 16, 2016 at 11:08:42AM -0500, Christopher Covington wrote: >> On 11/16/2016 08:01 AM, Andrew Jones wrote: >>> On Tue, Nov 15, 2016 at 04:50:53PM -0600, Wei Huang wrote: On 11/14/2016 09:12 AM, Christopher Covington wrote: > Hi Drew, Wei, > > On 11/14/2016 05:05 AM, Andrew Jones wrote: >> On Fri, Nov 11, 2016 at 01:55:49PM -0600, Wei Huang wrote: >>> >>> >>> On 11/11/2016 01:43 AM, Andrew Jones wrote: On Tue, Nov 08, 2016 at 12:17:14PM -0600, Wei Huang wrote: > From: Christopher Covington > > Ensure that reads of the PMCCNTR_EL0 are monotonically increasing, > even for the smallest delta of two subsequent reads. > > Signed-off-by: Christopher Covington > Signed-off-by: Wei Huang > --- > arm/pmu.c | 98 > +++ > 1 file changed, 98 insertions(+) > > diff --git a/arm/pmu.c b/arm/pmu.c > index 0b29088..d5e3ac3 100644 > --- a/arm/pmu.c > +++ b/arm/pmu.c > @@ -14,6 +14,7 @@ > */ > #include "libcflat.h" > > +#define PMU_PMCR_E (1 << 0) > #define PMU_PMCR_N_SHIFT 11 > #define PMU_PMCR_N_MASK0x1f > #define PMU_PMCR_ID_SHIFT 16 > @@ -21,6 +22,10 @@ > #define PMU_PMCR_IMP_SHIFT 24 > #define PMU_PMCR_IMP_MASK 0xff > > +#define PMU_CYCLE_IDX 31 > + > +#define NR_SAMPLES 10 > + > #if defined(__arm__) > static inline uint32_t pmcr_read(void) > { > @@ -29,6 +34,47 @@ static inline uint32_t pmcr_read(void) > asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret)); > return ret; > } > + > +static inline void pmcr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value)); > +} > + > +static inline void pmselr_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value)); > +} > + > +static inline void pmxevtyper_write(uint32_t value) > +{ > + asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value)); > +} > + > +/* > + * While PMCCNTR can be accessed as a 64 bit coprocessor register, > returning 64 > + * bits doesn't seem worth the trouble when differential usage of > 
the result is > + * expected (with differences that can easily fit in 32 bits). So > just return > + * the lower 32 bits of the cycle count in AArch32. Like I said in the last review, I'd rather we not do this. We should return the full value and then the test case should confirm the upper 32 bits are zero. >>> >>> Unless I miss something in ARM documentation, ARMv7 PMCCNTR is a 32-bit >>> register. We can force it to a more coarse-grained cycle counter with >>> PMCR.D bit=1 (see below). But it is still not a 64-bit register. > > AArch32 System Register Descriptions > Performance Monitors registers > PMCCNTR, Performance Monitors Cycle Count Register > > To access the PMCCNTR when accessing as a 32-bit register: > MRC p15,0,,c9,c13,0 ; Read PMCCNTR[31:0] into Rt > MCR p15,0,,c9,c13,0 ; Write Rt to PMCCNTR[31:0]. PMCCNTR[63:32] are > unchanged > > To access the PMCCNTR when accessing as a 64-bit register: > MRRC p15,0,,,c9 ; Read PMCCNTR[31:0] into Rt and PMCCNTR[63:32] > into Rt2 > MCRR p15,0,,,c9 ; Write Rt to PMCCNTR[31:0] and Rt2 to > PMCCNTR[63:32] > Thanks. I did some research based on your info and came back with the following proposals (Cov, correct me if I am wrong): By comparing A57 TRM (page 394 in [1]) with A15 TRM (page 273 in [2]), I think this 64-bit cycle register is only available when running under aarch32 compatibility mode on ARMv8 because it is not specified in A15 TRM. >> >> That interpretation sounds really strange to me. My recollection is that the >> cycle counter was available as a 64 bit register in ARMv7 as well. I would >> expect the Cortex TRMs to omit such details. The ARMv7 Architecture Reference >> Manual is the complete and authoritative source. > > Yes, the v7 ARM ARM is the authoritative source, and it says 32-bit. > Whereas the v8 ARM ARM wrt to AArch32 mode says it's both 32 and 64. Just looked it up as well in the good old ARM DDI 0406C.c and you're absolutely right. Sorry for the bad recollection. 
Cov
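For readers following the thread: the property under test (back-to-back cycle counter reads strictly increase) and Drew's request (return the full value and let the test inspect the upper 32 bits) can be sketched independently of the register access. This is only a portable sketch, not the kvm-unit-tests code. The reader callback, the fake_* counters and the helper names are assumptions made for illustration; on real AArch32 hardware the reader would wrap the MRC or MRRC access discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SAMPLES 10

/* Hypothetical reader type: returns the current cycle count. */
typedef uint64_t (*cycle_reader_fn)(void);

/* Combine the two halves that a 64-bit (MRRC-style) read would return. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True if the upper 32 bits are clear, as expected for a 32-bit counter. */
static inline bool upper_half_is_zero(uint64_t value)
{
    return (value >> 32) == 0;
}

/* True if NR_SAMPLES back-to-back reads are strictly increasing. */
static bool check_cycles_increase(cycle_reader_fn read_cycles)
{
    assert(read_cycles != NULL);

    uint64_t prev = read_cycles();
    for (int i = 0; i < NR_SAMPLES; i++) {
        uint64_t now = read_cycles();
        if (now <= prev) {
            return false;   /* counter stalled or went backwards */
        }
        prev = now;
    }
    return true;
}

/* Stand-in counters so the sketch runs without PMU hardware. */
static uint64_t fake_ticks;
static uint64_t fake_running_counter(void) { return ++fake_ticks; }
static uint64_t fake_stuck_counter(void) { return 42; }
```

Passing the reader as a callback keeps the check identical for the 32-bit and 64-bit access paths; only the reader differs.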
Re: [Qemu-devel] [PATCH] hw/pci: disable pci-bridge's shpc by default
On Sat, Nov 05, 2016 at 06:46:34PM +0200, Marcel Apfelbaum wrote: > On 11/03/2016 09:40 PM, Michael S. Tsirkin wrote: > > On Thu, Nov 03, 2016 at 01:05:44PM +0200, Marcel Apfelbaum wrote: > > > On 11/03/2016 06:18 AM, Michael S. Tsirkin wrote: > > > > On Wed, Nov 02, 2016 at 05:16:42PM +0200, Marcel Apfelbaum wrote: > > > > > The shpc component is optional while ACPI hotplug is used > > > > > for hot-plugging PCI devices into a PCI-PCI bridge. > > > > > Disabling the shpc by default will make slot 0 usable at boot time > > > > > > Hi Michael > > > > > > > > > > > at the cost of breaking all hotplug for all non-acpi users. > > > > > > > > > > Do we have a non-acpi user that is able to use the shpc component as-is > > > today? > > > > power and some arm systems I guess? > > > > Adding Andrew , maybe he can give us an answer. Not really :-) My lack of PCI knowledge makes that difficult. I'd be happy to help with an experiment though. Can you give me command line arguments, qmp commands, etc. that I should use to try it out? I imagine I should just boot an ARM guest using DT (instead of ACPI) and then attempt to hotplug a PCI device. I'm not sure, however, what, if any, special configuration I need in order to ensure I'm testing what you're interested in. Thanks, drew > > Anybody else can help answering this? > > > > I remember we need to even tweak QEMU before it can be used, but I might > > > be wrong. > > > > > > And we don't touch the current machines < 2.8 . > > > > > > > > and not only for hot-plug, without loosing any functionality. > > > > > Older machines will have shpc enabled for compatibility reasons. > > > > > > > > > > Signed-off-by: Marcel Apfelbaum > > > > > > > > Is an extra slot such a big deal? You can always add more bridges ... > > > > > > > > > > It is not only about the slot itself, but more about the usage model. > > > The PCIe Upstream ports/DMI-PCI devices are also pci-bridges, > > > but for them slot 0 is allowed. 
> > > > The reason is that these devices are not themselves > > hotpluggable. Isn't there a flag that allows adding > > a non hotpluggable device? Allowing these would be one solution. > > > > > And what about the hotplug? Slot 0 is not usable at boot, but then is > > > usable again (for ACPI users) making people wondering: > > > https://bugzilla.redhat.com/show_bug.cgi?id=1175113 > > > > Let's just disallow that then for consistency? > > > > I suppose we can do that... not sure if it worth it. > > Thanks, > Marcel > > > > > > My point is - can shpc be used as-is today? Even so, I suspect there are > > > much (much) > > > less users using SHPC than ACPI based hotplug. If this is the case, why > > > bother the > > > majority of the users? And for the shpc users, they can keep the prev > > > machines > > > or change the command line, I think changes like this happens over the > > > time. > > > > > > Adding Markus for his opinion on command line changes. > > > > > > Thanks, > > > Marcel > > > > > --- > > > > > hw/pci-bridge/pci_bridge_dev.c | 2 +- > > > > > include/hw/compat.h| 4 > > > > > 2 files changed, 5 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/hw/pci-bridge/pci_bridge_dev.c > > > > > b/hw/pci-bridge/pci_bridge_dev.c > > > > > index 5dbd933..647ad80 100644 > > > > > --- a/hw/pci-bridge/pci_bridge_dev.c > > > > > +++ b/hw/pci-bridge/pci_bridge_dev.c > > > > > @@ -163,7 +163,7 @@ static Property pci_bridge_dev_properties[] = { > > > > > DEFINE_PROP_ON_OFF_AUTO(PCI_BRIDGE_DEV_PROP_MSI, PCIBridgeDev, > > > > > msi, > > > > > ON_OFF_AUTO_AUTO), > > > > > DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags, > > > > > -PCI_BRIDGE_DEV_F_SHPC_REQ, true), > > > > > +PCI_BRIDGE_DEV_F_SHPC_REQ, false), > > > > > DEFINE_PROP_END_OF_LIST(), > > > > > }; > > > > > > > > > > diff --git a/include/hw/compat.h b/include/hw/compat.h > > > > > index 0f06e11..388b7ec 100644 > > > > > --- a/include/hw/compat.h > > > > > +++ b/include/hw/compat.h > > > > > 
@@ -18,6 +18,10 @@ > > > > > .driver = "intel-iommu",\ > > > > > .property = "x-buggy-eim",\ > > > > > .value= "true",\ > > > > > +},{\ > > > > > +.driver = "pci-bridge",\ > > > > > +.property = "shpc",\ > > > > > +.value= "on",\ > > > > > }, > > > > > > > > > > #define HW_COMPAT_2_6 \ > > > > > -- > > > > > 2.5.5 > >
Re: [Qemu-devel] [PATCH] hw/pci: disable pci-bridge's shpc by default
On 11/16/2016 06:44 PM, Andrew Jones wrote:
> On Sat, Nov 05, 2016 at 06:46:34PM +0200, Marcel Apfelbaum wrote:
> > On 11/03/2016 09:40 PM, Michael S. Tsirkin wrote:
> > > On Thu, Nov 03, 2016 at 01:05:44PM +0200, Marcel Apfelbaum wrote:
> > > > On 11/03/2016 06:18 AM, Michael S. Tsirkin wrote:
> > > > > On Wed, Nov 02, 2016 at 05:16:42PM +0200, Marcel Apfelbaum wrote:
> > > > > > The shpc component is optional while ACPI hotplug is used
> > > > > > for hot-plugging PCI devices into a PCI-PCI bridge.
> > > > > > Disabling the shpc by default will make slot 0 usable at boot time
> > > >
> > > > Hi Michael
> > > >
> > > > > at the cost of breaking all hotplug for all non-acpi users.
> > > >
> > > > Do we have a non-acpi user that is able to use the shpc component as-is
> > > > today?
> > >
> > > power and some arm systems I guess?
> >
> > Adding Andrew, maybe he can give us an answer.
>
> Not really :-) My lack of PCI knowledge makes that difficult. I'd be
> happy to help with an experiment though. Can you give me command line
> arguments, qmp commands, etc. that I should use to try it out? I imagine
> I should just boot an ARM guest using DT (instead of ACPI) and then
> attempt to hotplug a PCI device. I'm not sure, however, what, if any,
> special configuration I need in order to ensure I'm testing what you're
> interested in.

Hi Drew,

Just run QEMU with '-device pci-bridge,chassis_nr=1,id=bridge1 -monitor stdio'
with an ARM guest using DT and wait until the guest finishes booting.

Then run in HMP:
device_add virtio-net-pci,bus=bridge1,id=net2

Next run lspci in the guest to see the new device.

BTW, will an ARM guest run 'fast' enough to be usable on an x86 machine?
If yes, any pointers on how to create such a guest?

Thanks,
Marcel

> Thanks,
> drew
>
[...]
Re: [Qemu-devel] [PATCH RFC 1/2] add bitmap_free() wrapper
On Wed, Nov 16, 2016 at 05:02:55PM +0100, Igor Mammedov wrote:
> it will be used for freeing bitmaps allocated with bitmap_[try]_new()
>
> Signed-off-by: Igor Mammedov

We need to change all code using g_free() for bitmaps to use
bitmap_free(), as people in the future might assume that changing
bitmap_free() is safe (and it won't be).

Personally, I think g_free() is good enough and we don't need
bitmap_free(). The assumption that bitmap_new() returns g_free()-able
memory is part of the API.

> ---
>  include/qemu/bitmap.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 63ea2d0..0289836 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -98,6 +98,11 @@ static inline unsigned long *bitmap_new(long nbits)
>      return ptr;
>  }
>
> +static inline void bitmap_free(unsigned long *bitmap)
> +{
> +    g_free(bitmap);
> +}
> +
>  static inline void bitmap_zero(unsigned long *dst, long nbits)
>  {
>      if (small_nbits(nbits)) {
> --
> 2.7.4
>

-- 
Eduardo
Re: [Qemu-devel] [PATCH v6 1/2] block/vxhs.c: Add support for a new block device type called "vxhs"
On Wed, Nov 16, 2016 at 3:27 AM, Stefan Hajnoczi wrote:
> On Wed, Nov 16, 2016 at 9:49 AM, Fam Zheng wrote:
>> On Wed, 11/16 10:04, Markus Armbruster wrote:
>>> ashish mittal writes:
>>>
>>> > Thanks for concluding on this.
>>> >
>>> > I will rearrange the qnio_api.h header accordingly as follows:
>>> >
>>> > +#include "qemu/osdep.h"
>>>
>>> Headers should not include osdep.h.
>>
>> This is about including "osdep.h" _and_ "qnio_api.h" in block/vxhs.c, so what
>> Ashish means looks good to me.
>
> Yes, I think "will rearrange the qnio_api.h header" was a typo and was
> supposed to be block/vxhs.c.
>
> Stefan

Thanks for the correction. Yes, I meant rearranging the headers in
block/vxhs.c.
Re: [Qemu-devel] [PATCH RFC 2/2] numa: make -numa parser dynamically allocate CPUs masks
On Wed, Nov 16, 2016 at 05:02:56PM +0100, Igor Mammedov wrote: > so it won't impose an additional limits on max_cpus limits > supported by different targets. > > It removes global MAX_CPUMASK_BITS constant and need to > bump it up whenever max_cpus is being increased for > a target above MAX_CPUMASK_BITS value. > > Use runtime max_cpus value instead to allocate sufficiently > sized node_cpu bitmasks in numa parser. > > Signed-off-by: Igor Mammedov > --- > include/sysemu/numa.h | 2 +- > include/sysemu/sysemu.h | 7 --- > numa.c | 19 --- > vl.c| 5 - > 4 files changed, 13 insertions(+), 20 deletions(-) > > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h > index 4da808a..8f09dcf 100644 > --- a/include/sysemu/numa.h > +++ b/include/sysemu/numa.h > @@ -17,7 +17,7 @@ struct numa_addr_range { > > typedef struct node_info { > uint64_t node_mem; > -DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); > +unsigned long *node_cpu; > struct HostMemoryBackend *node_memdev; > bool present; > QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */ > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h > index 66c6f15..cccde56 100644 > --- a/include/sysemu/sysemu.h > +++ b/include/sysemu/sysemu.h > @@ -168,13 +168,6 @@ extern int mem_prealloc; > #define MAX_NODES 128 > #define NUMA_NODE_UNASSIGNED MAX_NODES > > -/* The following shall be true for all CPUs: > - * cpu->cpu_index < max_cpus <= MAX_CPUMASK_BITS > - * > - * Note that cpu->get_arch_id() may be larger than MAX_CPUMASK_BITS. > - */ > -#define MAX_CPUMASK_BITS 288 > - Nice! 
> #define MAX_OPTION_ROMS 16 > typedef struct QEMUOptionRom { > const char *name; > diff --git a/numa.c b/numa.c > index 9c09e45..5542e40 100644 > --- a/numa.c > +++ b/numa.c > @@ -266,20 +266,20 @@ static char *enumerate_cpus(unsigned long *cpus, int > max_cpus) > static void validate_numa_cpus(void) > { > int i; > -DECLARE_BITMAP(seen_cpus, MAX_CPUMASK_BITS); > +unsigned long *seen_cpus = bitmap_new(max_cpus); > > -bitmap_zero(seen_cpus, MAX_CPUMASK_BITS); > +bitmap_zero(seen_cpus, max_cpus); bitmap_new() already returns a zeroed bitmap. > for (i = 0; i < nb_numa_nodes; i++) { > -if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, > - MAX_CPUMASK_BITS)) { > +if (bitmap_intersects(seen_cpus, numa_info[i].node_cpu, max_cpus)) { > bitmap_and(seen_cpus, seen_cpus, > - numa_info[i].node_cpu, MAX_CPUMASK_BITS); > + numa_info[i].node_cpu, max_cpus); > error_report("CPU(s) present in multiple NUMA nodes: %s", > enumerate_cpus(seen_cpus, max_cpus)); > +bitmap_free(seen_cpus); > exit(EXIT_FAILURE); > } > bitmap_or(seen_cpus, seen_cpus, > - numa_info[i].node_cpu, MAX_CPUMASK_BITS); > + numa_info[i].node_cpu, max_cpus); > } > > if (!bitmap_full(seen_cpus, max_cpus)) { > @@ -291,12 +291,17 @@ static void validate_numa_cpus(void) > "in NUMA config"); > g_free(msg); > } > +bitmap_free(seen_cpus); See comment about bitmap_free() on patch 1/2. I think g_free() is good enough (unless you really want to review all callers of bitmap_[try_]new()). 
> } > > void parse_numa_opts(MachineClass *mc) > { > int i; > > +for (i = 0; i < MAX_NODES; i++) { > +numa_info[i].node_cpu = bitmap_new(max_cpus); > +} > + > if (qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, NULL, NULL)) { > exit(1); > } > @@ -362,7 +367,7 @@ void parse_numa_opts(MachineClass *mc) > numa_set_mem_ranges(); > > for (i = 0; i < nb_numa_nodes; i++) { > -if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) { > +if (!bitmap_empty(numa_info[i].node_cpu, max_cpus)) { > break; > } > } > diff --git a/vl.c b/vl.c > index d77dd86..37790e5 100644 > --- a/vl.c > +++ b/vl.c > @@ -1277,11 +1277,6 @@ static void smp_parse(QemuOpts *opts) > > max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); > > -if (max_cpus > MAX_CPUMASK_BITS) { > -error_report("unsupported number of maxcpus"); > -exit(1); > -} > - > if (max_cpus < cpus) { > error_report("maxcpus must be equal to or greater than smp"); > exit(1); > -- > 2.7.4 > -- Eduardo
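To make the two review points concrete (a bitmap sized at run time replaces the fixed MAX_CPUMASK_BITS array, and a freshly allocated bitmap is already zeroed so no extra bitmap_zero() is needed), here is a minimal standalone sketch. It does not use QEMU's bitmap API: every demo_* name is a hypothetical stand-in built on calloc()/free(), and the overlap check only mirrors the idea behind validate_numa_cpus().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS_FOR(nbits) (((nbits) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Allocate a zero-filled bitmap large enough for nbits bits. */
static unsigned long *demo_bitmap_new(size_t nbits)
{
    size_t words = WORDS_FOR(nbits);
    if (words == 0) {
        words = 1;              /* avoid a zero-sized allocation */
    }
    unsigned long *map = calloc(words, sizeof(unsigned long));
    assert(map != NULL);
    return map;
}

/* Counterpart of demo_bitmap_new(); here simply free(). */
static void demo_bitmap_free(unsigned long *map)
{
    free(map);
}

static void demo_set_bit(unsigned long *map, size_t bit)
{
    map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static bool demo_test_bit(const unsigned long *map, size_t bit)
{
    return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* True if any CPU index is set in more than one node mask. */
static bool demo_cpus_overlap(unsigned long *const *node_cpu,
                              size_t nb_nodes, size_t max_cpus)
{
    unsigned long *seen = demo_bitmap_new(max_cpus); /* starts all-zero */
    bool overlap = false;

    for (size_t n = 0; n < nb_nodes && !overlap; n++) {
        for (size_t w = 0; w < WORDS_FOR(max_cpus); w++) {
            if (seen[w] & node_cpu[n][w]) {
                overlap = true;
                break;
            }
            seen[w] |= node_cpu[n][w];
        }
    }

    demo_bitmap_free(seen);
    return overlap;
}
```

With this shape the mask size follows max_cpus, so a configuration with, say, 300 CPUs needs no constant to be bumped.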
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On 11/16/16 13:47, Paolo Bonzini wrote:
>
>> If the consensus is that the patch is a QEMU bugfix (as opposed to a
>> feature) and that it is eligible for the currently supported upstream
>> stable branches, that's the best, no doubt.
>
> The currently supported upstream stable branches is just 2.7. :)
>
> I'm okay with bending the rules and including it in 2.8, but it's
> worrisome that you also needed to go back from relaxed to traditional
> delivery, meaning that old QEMU + new OVMF will take ages to boot.
>
> If this is the case, I still think this needs some kind of discovery
> mechanism, unless OVMF can just say "things were too broken, stop
> supporting SMM on QEMUs older than 2.8".
>
> For example:
>
> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP
> setting is used for the PCD; this would be backwards compatibility mode.

Okay, but this still means that the PCD has to become dynamic, and we
must set the PCD earlier (likely in PlatformPei) based on something. I
guess that's what the next paragraph is about:

> - we could have another magic 0xB2 value, which is implemented directly
> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it
> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> to detect the new feature. It can fail to start if using traditional
> AP and the new feature is not there.

Please explain in more detail. If I write to 0xB2 (by invoking the
Trigger() method or somehow else), then on old QEMUs that will raise a
sync / unicast SMI. The SMI handler in edk2 will run, but no request
parameters will have been set up by OVMF, so the SMI handler will do...
no clue what. I don't think this is a good idea.

My preference is fw_cfg ATM. It provides a proven, flexible and
extensible interface (it's easy to add new files for future features).
If we expect more knobs in the area, I can modify my proposal to use
"etc/smi/broadcast", so we can add "etc/smi/" later.

Do you have any specific arguments against fw_cfg? As I suggested in my previous email, with fw_cfg I can implement the change in OVMF such that the default behavior wouldn't change -- the default delivery would remain relaxed, and the broadcast wouldn't be requested, unless the fw_cfg file told OVMF otherwise. > By the way, in case OVMF needs to use SmmSwDispatch in the future, I > would make QEMU use broadcast behavior for all values in the 0x10-0xff > range, or something like that. Are we talking control/command (0xB2) or scratch/data (0xB3) register values? My patches currently use the scratch/data register to provide the hint to QEMU; that register is less likely to interfere with anything the SMM core in edk2 does. I seem to recall that SmmSwDispatch uses command/control values to distinguish the called functions. Should we keep the broadcast / unicast decision separate from the control/command value ? Thanks Laszlo > > Paolo > >> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The >> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually >> correct; when I was writing the OVMF docs, I must have misunderstood the >> requirements and needlessly required 2.5+; 2.4+ should have been fine.) >> >> Which means the fix should be backported as far as stable-2.4. >> >> Should we proceed with that? CC'ing Mike Roth and the stable list. >> >> Thanks! 
>> Laszlo >> >>> >>> > > Paolo > >> --- >> hw/isa/lpc_ich9.c | 12 +++- >> 1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >> index 10d1ee8b9310..f2fe644fdaa4 100644 >> --- a/hw/isa/lpc_ich9.c >> +++ b/hw/isa/lpc_ich9.c >> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool >> smm_enabled) >> >> /* APM */ >> >> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >> + >> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >> { >> ICH9LPCState *lpc = arg; >> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, >> void *arg) >> >> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >> -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >> +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >> +CPUState *cs; >> + >> +CPU_FOREACH(cs) { >> +cpu_interrupt(cs, CPU_INTERRUPT_SMI); >> +} >> +} else { >> +cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >> +} >> } >> } >> >> >> >>
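For reference, the decision made by the quoted hunk (raise the SMI on every VCPU when the APM status register holds 'Q', otherwise only on the VCPU that wrote the control register) can be modelled outside QEMU. This is a rough sketch under stated assumptions: struct demo_cpu, demo_apm_ctrl_changed() and its parameters are hypothetical and merely stand in for CPUState, ich9_apm_ctrl_changed(), current_cpu and the SMI_EN APMC_EN check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hint value taken from the patch: 'Q' in APM_STS requests a broadcast. */
#define DEMO_APM_STS_BROADCAST_SMI 'Q'

/* Hypothetical stand-in for a VCPU; only tracks a pending SMI. */
struct demo_cpu {
    bool smi_pending;
};

/*
 * Model of the APM control write handler.
 * Returns the number of CPUs on which an SMI was raised.
 */
static size_t demo_apm_ctrl_changed(struct demo_cpu *cpus, size_t nr_cpus,
                                    size_t current, uint8_t apm_sts,
                                    bool apmc_en)
{
    assert(current < nr_cpus);

    if (!apmc_en) {
        return 0;               /* SMI generation on APMC writes disabled */
    }

    if (apm_sts == DEMO_APM_STS_BROADCAST_SMI) {
        for (size_t i = 0; i < nr_cpus; i++) {
            cpus[i].smi_pending = true;     /* broadcast */
        }
        return nr_cpus;
    }

    cpus[current].smi_pending = true;       /* unicast to the writer */
    return 1;
}
```

Any status value other than 'Q' keeps the previous unicast behavior, which is why firmware that never sets the hint sees no change.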
[Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode
v2:
 * Uninitialized node->deleted gone [Fam]
 * Removed 1024 polling loop iteration qemu_clock_get_ns() optimization which
   created a weird step pattern [Fam]
 * Unified with AioHandler, dropped AioPollHandler struct [Paolo]
   (actually I think Paolo had more in mind but this is the first step)
 * Only poll when all event loop resources support it [Paolo]
 * Added run_poll_handlers_begin/end trace events for perf analysis
 * Sorry, Christian, no virtqueue kick suppression yet

Recent performance investigation work done by Karl Rister shows that the
guest->host notification takes around 20 us. This is more than the "overhead"
of QEMU itself (e.g. block layer).

One way to avoid the costly exit is to use polling instead of notification.
The main drawback of polling is that it consumes CPU resources. In order to
benefit performance the host must have extra CPU cycles available on physical
CPUs that aren't used by the guest.

This is an experimental AioContext polling implementation. It adds a polling
callback into the event loop. Polling functions are implemented for virtio-blk
virtqueue guest->host kick and Linux AIO completion.

The QEMU_AIO_POLL_MAX_NS environment variable sets the number of nanoseconds to
poll before entering the usual blocking poll(2) syscall. Try setting this
variable to the time from old request completion to new virtqueue kick.

By default no polling is done. The QEMU_AIO_POLL_MAX_NS must be set to get any
polling!

Stefan Hajnoczi (4):
  aio: add AioPollFn and io_poll() interface
  aio: add polling mode to AioContext
  virtio: poll virtqueues for new buffers
  linux-aio: poll ring for completions

 aio-posix.c                 | 115 ++--
 async.c                     |  14 +-
 block/curl.c                |   8 +--
 block/iscsi.c               |   3 +-
 block/linux-aio.c           |  19 +++-
 block/nbd-client.c          |   8 +--
 block/nfs.c                 |   7 +--
 block/sheepdog.c            |  26 +-
 block/ssh.c                 |   4 +-
 block/win32-aio.c           |   4 +-
 hw/virtio/virtio.c          |  18 ++-
 include/block/aio.h         |   8 ++-
 iohandler.c                 |   2 +-
 nbd/server.c                |   9 ++--
 stubs/set-fd-handler.c      |   1 +
 tests/test-aio.c            |   4 +-
 trace-events                |   4 ++
 util/event_notifier-posix.c |   2 +-
 18 files changed, 207 insertions(+), 49 deletions(-)

-- 
2.7.4
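The "poll for a bounded time, then fall back to the blocking wait" idea described in the cover letter can be sketched in a few lines. This is only an illustration under stated assumptions and not the code from the series: all demo_* names are hypothetical, the single-callback loop is a simplification of an event loop with many handlers, and only the QEMU_AIO_POLL_MAX_NS variable name is taken from the text above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical poll callback: returns true if it made progress. */
typedef bool (*demo_poll_fn)(void *opaque);

static int64_t demo_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Read the polling budget; unset or invalid means 0, i.e. no polling. */
static int64_t demo_poll_max_ns_from_env(void)
{
    const char *s = getenv("QEMU_AIO_POLL_MAX_NS");
    long long v = s ? strtoll(s, NULL, 10) : 0;
    return v > 0 ? (int64_t)v : 0;
}

/*
 * Busy-poll for at most max_ns nanoseconds.
 * Returns true if the callback reported progress, in which case the caller
 * can skip the blocking wait; false means fall back to blocking.
 */
static bool demo_run_poll_handler(demo_poll_fn poll, void *opaque,
                                  int64_t max_ns)
{
    assert(poll != NULL);

    if (max_ns <= 0) {
        return false;           /* polling disabled (the default) */
    }

    int64_t deadline = demo_now_ns() + max_ns;
    do {
        if (poll(opaque)) {
            return true;
        }
    } while (demo_now_ns() < deadline);

    return false;
}

/* Stand-in callbacks so the sketch can be exercised without devices. */
static bool demo_countdown_poll(void *opaque)
{
    int *remaining = opaque;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}

static bool demo_never_ready(void *opaque)
{
    (void)opaque;
    return false;
}
```

The trade-off is the one stated above: the loop burns host CPU for up to max_ns in exchange for avoiding the notification latency, so it only pays off when spare physical CPU time is available.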
[Qemu-devel] [PATCH v2 1/4] aio: add AioPollFn and io_poll() interface
The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi --- aio-posix.c | 8 +--- async.c | 5 +++-- block/curl.c| 8 block/iscsi.c | 3 ++- block/linux-aio.c | 4 ++-- block/nbd-client.c | 8 block/nfs.c | 7 --- block/sheepdog.c| 26 +- block/ssh.c | 4 ++-- block/win32-aio.c | 4 ++-- hw/virtio/virtio.c | 4 ++-- include/block/aio.h | 5 - iohandler.c | 2 +- nbd/server.c| 9 - stubs/set-fd-handler.c | 1 + tests/test-aio.c| 4 ++-- util/event_notifier-posix.c | 2 +- 17 files changed, 56 insertions(+), 48 deletions(-) diff --git a/aio-posix.c b/aio-posix.c index e13b9ab..4379c13 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -200,6 +200,7 @@ void aio_set_fd_handler(AioContext *ctx, bool is_external, IOHandler *io_read, IOHandler *io_write, +AioPollFn *io_poll, void *opaque) { AioHandler *node; @@ -258,10 +259,11 @@ void aio_set_fd_handler(AioContext *ctx, void aio_set_event_notifier(AioContext *ctx, EventNotifier *notifier, bool is_external, -EventNotifierHandler *io_read) +EventNotifierHandler *io_read, +AioPollFn *io_poll) { -aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), - is_external, (IOHandler *)io_read, NULL, notifier); +aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external, + (IOHandler *)io_read, NULL, io_poll, notifier); } bool aio_prepare(AioContext *ctx) diff --git a/async.c b/async.c index b2de360..c8fbd63 100644 --- a/async.c +++ b/async.c @@ -282,7 +282,7 @@ aio_ctx_finalize(GSource *source) } qemu_mutex_unlock(&ctx->bh_lock); -aio_set_event_notifier(ctx, &ctx->notifier, false, NULL); +aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL); event_notifier_cleanup(&ctx->notifier); qemu_rec_mutex_destroy(&ctx->lock); qemu_mutex_destroy(&ctx->bh_lock); @@ -366,7 +366,8 @@ AioContext *aio_context_new(Error **errp) aio_set_event_notifier(ctx, &ctx->notifier, false, 
(EventNotifierHandler *) - event_notifier_dummy_cb); + event_notifier_dummy_cb, + NULL); #ifdef CONFIG_LINUX_AIO ctx->linux_aio = NULL; #endif diff --git a/block/curl.c b/block/curl.c index 0404c1b..792fef8 100644 --- a/block/curl.c +++ b/block/curl.c @@ -192,19 +192,19 @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action, switch (action) { case CURL_POLL_IN: aio_set_fd_handler(s->aio_context, fd, false, - curl_multi_read, NULL, state); + curl_multi_read, NULL, NULL, state); break; case CURL_POLL_OUT: aio_set_fd_handler(s->aio_context, fd, false, - NULL, curl_multi_do, state); + NULL, curl_multi_do, NULL, state); break; case CURL_POLL_INOUT: aio_set_fd_handler(s->aio_context, fd, false, - curl_multi_read, curl_multi_do, state); + curl_multi_read, curl_multi_do, NULL, state); break; case CURL_POLL_REMOVE: aio_set_fd_handler(s->aio_context, fd, false, - NULL, NULL, NULL); + NULL, NULL, NULL, NULL); break; } diff --git a/block/iscsi.c b/block/iscsi.c index 71bd523..76d0308 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -362,6 +362,7 @@ iscsi_set_events(IscsiLun *iscsilun) false, (ev & POLLIN) ? iscsi_process_read : NULL, (ev & POLLOUT) ? iscsi_process_write : NULL, + NULL, iscsilun); iscsilun->events = ev; } @@ -1524,7 +1525,7 @@ static void iscsi_detach_aio_context(BlockDriverState *bs) IscsiLun *iscsilun = bs->opaque; aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi), - false, NULL, NULL, NULL); + false, NULL, NULL, NULL, NULL); iscsilun->events = 0; if (iscsilun->nop_timer) { diff --git a/block/linux-aio.c b/block/
[Qemu-devel] [PATCH v2 4/4] linux-aio: poll ring for completions
The Linux AIO userspace ABI includes a ring that is shared with the
kernel. This allows userspace programs to process completions without
system calls.

Add an AioContext poll handler to check for completions in the ring.

Signed-off-by: Stefan Hajnoczi
---
 block/linux-aio.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 69c4ed5..03ab741 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -255,6 +255,20 @@ static void qemu_laio_completion_cb(EventNotifier *e)
     }
 }
 
+static bool qemu_laio_poll_cb(void *opaque)
+{
+    EventNotifier *e = opaque;
+    LinuxAioState *s = container_of(e, LinuxAioState, e);
+    struct io_event *events;
+
+    if (!io_getevents_peek(s->ctx, &events)) {
+        return false;
+    }
+
+    qemu_laio_process_completions_and_submit(s);
+    return true;
+}
+
 static void laio_cancel(BlockAIOCB *blockacb)
 {
     struct qemu_laiocb *laiocb = (struct qemu_laiocb *)blockacb;
@@ -448,7 +462,8 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
     s->aio_context = new_context;
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
     aio_set_event_notifier(new_context, &s->e, false,
-                           qemu_laio_completion_cb, NULL);
+                           qemu_laio_completion_cb,
+                           qemu_laio_poll_cb);
 }
 
 LinuxAioState *laio_init(void)
-- 
2.7.4
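The idea in this patch, checking a shared ring for completions without entering the kernel, can be illustrated with a toy single-producer/single-consumer ring. This is only a sketch under stated assumptions: `toy_ring`, `toy_state`, `toy_ring_push()`, `toy_ring_peek()` and `toy_poll_cb()` are hypothetical names, the layout is not the real Linux AIO ring, and C11 `<stdatomic.h>` stands in for whatever barriers the real code uses. The only thing it shares with the patch is the handler contract: return false when the ring is empty, true when completions were processed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u /* hypothetical capacity; a power of two */

/* Toy stand-in for a completion ring shared between a producer
 * (the kernel, in the real ABI) and a consumer (userspace). */
struct toy_ring {
    atomic_uint head;      /* next slot the consumer will read */
    atomic_uint tail;      /* next slot the producer will fill */
    int events[RING_SIZE]; /* completion payloads */
};

struct toy_state {
    struct toy_ring ring;
    int completed_sum;     /* what "processing" a completion does here */
};

static void toy_state_init(struct toy_state *s)
{
    atomic_init(&s->ring.head, 0);
    atomic_init(&s->ring.tail, 0);
    s->completed_sum = 0;
}

/* Producer side: publish one completion; false if the ring is full. */
static bool toy_ring_push(struct toy_ring *r, int ev)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

    if (tail - head == RING_SIZE) {
        return false;
    }
    r->events[tail % RING_SIZE] = ev;
    /* Release: the payload must be visible before the new tail is. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: number of completions ready; no system call involved. */
static unsigned toy_ring_peek(struct toy_ring *r)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    return tail - head;
}

/* Poll handler: false if nothing was pending, true if progress was made. */
static bool toy_poll_cb(void *opaque)
{
    struct toy_state *s = opaque;
    unsigned pending = toy_ring_peek(&s->ring);
    unsigned head, i;

    if (pending == 0) {
        return false;
    }

    head = atomic_load_explicit(&s->ring.head, memory_order_relaxed);
    for (i = 0; i < pending; i++) {
        s->completed_sum += s->ring.events[(head + i) % RING_SIZE];
    }
    /* Hand the consumed slots back to the producer. */
    atomic_store_explicit(&s->ring.head, head + pending, memory_order_release);
    return true;
}
```

The empty check is a pair of loads from shared memory, which is what makes such a handler cheap enough to call repeatedly from a polling loop.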
[Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext
The AioContext event loop uses ppoll(2) or epoll_wait(2) to monitor file
descriptors or until a timer expires. In cases like virtqueues, Linux
AIO, and ThreadPool it is technically possible to wait for events via
polling (i.e. continuously checking for events without blocking).

Polling can be faster than blocking syscalls because file descriptors,
the process scheduler, and system calls are bypassed.

The main disadvantage to polling is that it increases CPU utilization.
In classic polling configuration a full host CPU thread might run at
100% to respond to events as quickly as possible. This patch implements
a timeout so we fall back to blocking syscalls if polling detects no
activity. After the timeout no CPU cycles are wasted on polling until
the next event loop iteration.

This patch implements an experimental polling mode that can be
controlled with the QEMU_AIO_POLL_MAX_NS= environment
variable. The aio_poll() event loop function will attempt to poll
instead of using blocking syscalls.

The run_poll_handlers_begin() and run_poll_handlers_end() trace events
are added to aid performance analysis and troubleshooting. If you need
to know whether polling mode is being used, trace these events to find
out.

Signed-off-by: Stefan Hajnoczi --- aio-posix.c | 107 +++- async.c | 11 +- include/block/aio.h | 3 ++ trace-events| 4 ++ 4 files changed, 123 insertions(+), 2 deletions(-) diff --git a/aio-posix.c b/aio-posix.c index 4379c13..5e5a561 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -18,6 +18,8 @@ #include "block/block.h" #include "qemu/queue.h" #include "qemu/sockets.h" +#include "qemu/cutils.h" +#include "trace.h" #ifdef CONFIG_EPOLL_CREATE1 #include #endif @@ -27,12 +29,16 @@ struct AioHandler GPollFD pfd; IOHandler *io_read; IOHandler *io_write; +AioPollFn *io_poll; int deleted; void *opaque; bool is_external; QLIST_ENTRY(AioHandler) node; }; +/* How long to poll AioPollHandlers before monitoring file descriptors */ +static int64_t aio_poll_max_ns; + #ifdef CONFIG_EPOLL_CREATE1 /* The fd number threashold to switch to epoll */ @@ -206,11 +212,12 @@ void aio_set_fd_handler(AioContext *ctx, AioHandler *node; bool is_new = false; bool deleted = false; +int poll_disable_cnt = 0; node = find_aio_handler(ctx, fd); /* Are we deleting the fd handler? 
*/ -if (!io_read && !io_write) { +if (!io_read && !io_write && !io_poll) { if (node == NULL) { return; } @@ -229,6 +236,10 @@ void aio_set_fd_handler(AioContext *ctx, QLIST_REMOVE(node, node); deleted = true; } + +if (!node->io_poll) { +poll_disable_cnt = -1; +} } else { if (node == NULL) { /* Alloc and insert if it's not already there */ @@ -238,10 +249,22 @@ void aio_set_fd_handler(AioContext *ctx, g_source_add_poll(&ctx->source, &node->pfd); is_new = true; + +if (!io_poll) { +poll_disable_cnt = 1; +} +} else { +if (!node->io_poll && io_poll) { +poll_disable_cnt = -1; +} else if (node->io_poll && !io_poll) { +poll_disable_cnt = 1; +} } + /* Update handler with latest information */ node->io_read = io_read; node->io_write = io_write; +node->io_poll = io_poll; node->opaque = opaque; node->is_external = is_external; @@ -251,6 +274,9 @@ void aio_set_fd_handler(AioContext *ctx, aio_epoll_update(ctx, node, is_new); aio_notify(ctx); + +ctx->poll_disable_cnt += poll_disable_cnt; + if (deleted) { g_free(node); } @@ -268,6 +294,7 @@ void aio_set_event_notifier(AioContext *ctx, bool aio_prepare(AioContext *ctx) { +/* TODO run poll handlers? */ return false; } @@ -402,6 +429,56 @@ static void add_pollfd(AioHandler *node) npfd++; } +/* run_poll_handlers: + * @ctx: the AioContext + * @max_ns: maximum time to poll for, in nanoseconds + * + * Polls for a given time. + * + * Note that ctx->notify_me must be non-zero so this function can detect + * aio_notify(). + * + * Note that the caller must have incremented ctx->walking_handlers. 
+ * + * Returns: true if progress was made, false otherwise + */ +static bool run_poll_handlers(AioContext *ctx, int64_t max_ns) +{ +bool progress = false; +int64_t end_time; + +assert(ctx->notify_me); +assert(ctx->walking_handlers > 0); +assert(ctx->poll_disable_cnt == 0); + +trace_run_poll_handlers_begin(ctx, max_ns); + +end_time = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + max_ns; + +do { +AioHandler *node; + +/* Bail if aio_notify() was called (e.g. BH was scheduled) */ +if (atomic_read(&ctx->notified)) { +progres
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On 11/16/16 14:18, Michael S. Tsirkin wrote: > On Wed, Nov 16, 2016 at 07:47:42AM -0500, Paolo Bonzini wrote: >> >>> If the consensus is that the patch is a QEMU bugfix (as opposed to a >>> feature) and that it is eligible for the currently supported upstream >>> stable branches, that's the best, no doubt. >> >> The currently supported upstream stable branches is just 2.7. :) >> >> I'm okay with bending the rules and including it in 2.8, but it's >> worrisome that you also needed to go back from relaxed to traditional >> delivery, meaning that old QEMU + new OVMF will take ages to boot. >> >> If this is the case, I still think this needs some kind of discovery >> mechanism, unless OVMF can just say "things were too broken, stop >> supporting SMM on QEMUs older than 2.8". >> >> For example: >> >> - OVMF should keep on using 0x00 (no broadcast) if the relaxed AP >> setting is used for the PCD; this would be backwards compatibility mode. >> >> - we could have another magic 0xB2 value, which is implemented directly >> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >> to detect the new feature. It can fail to start if using traditional >> AP and the new feature is not there. > > If we keep collecting these magic values, should architect it > and do a host/guest bitmap like virtio does? A feature bitmap is not a bad idea; I can modify my proposal to say, '"etc/smi/features" is a little-endian uint64_t feature bitmap, where bit #0 is the availability of broadcast SMIs. Request it by writing 'Q' to STS before triggering an SMI via writing CNT'. Another example where we use a feature bitmap is fw_cfg itself (the DMA capability is signaled by bit 1). However, feature *negotiation* is overkill, in my opinion. > >> By the way, in case OVMF needs to use SmmSwDispatch in the future, I >> would make QEMU use broadcast behavior for all values in the 0x10-0xff >> range, or something like that. 
>> >> Paolo > > It bothers me with all these ideas is that it's PV. > Unavoidable? It seems so, yes -- as I understand it, the software-initiated SMI on bare metal Q35 is meant to be broadcast unconditionally, but we had diverged from that in our Q35 implementation, historically. SeaBIOS came to rely on the unicast nature of QEMU's SMI (AIUI) and now we have to invent a way to select the non-historical broadcast. ( BTW, I foresee further Frankensteinization of Q35, as the maximum amount of SMRAM (TSEG) it provides, by spec, is 8MB, and that might not be enough for a very large VCPU count. (The SMM stack was originally tested against 255 VCPUs, yes, but the VCPU max continues to grow, plus edk2 developers keep adding SMM features that require more SMRAM -- sometimes more SMRAM even per CPU.) We have one unused bit pattern left in the TSEG_SZ bit field of the ESMRAMC register, namely binary 11, which stands for "reserved". We might want to commandeer that down the line, and associate a really large SMRAM / TSEG size with it -- 128MB or 256MB, for example. Or, we could use it to signal some other way for TSEG size configuration. The TSEG is carved out of the end of the <4GB RAM, so larger TSEGs than 8MB should fit, as long as the guest is started with enough memory. Anyway, I digress... ) Thanks Laszlo > >>> For reference, the OVMF documentation recommends QEMU 2.5+ for SMM. The >>> SMM enablement in libvirt enforces QEMU 2.4+. (Libvirt is actually >>> correct; when I was writing the OVMF docs, I must have misunderstood the >>> requirements and needlessly required 2.5+; 2.4+ should have been fine.) >>> >>> Which means the fix should be backported as far as stable-2.4. >>> >>> Should we proceed with that? CC'ing Mike Roth and the stable list. >>> >>> Thanks! 
>>> Laszlo >>> >> >> Paolo >> >>> --- >>> hw/isa/lpc_ich9.c | 12 +++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c >>> index 10d1ee8b9310..f2fe644fdaa4 100644 >>> --- a/hw/isa/lpc_ich9.c >>> +++ b/hw/isa/lpc_ich9.c >>> @@ -372,6 +372,8 @@ void ich9_lpc_pm_init(PCIDevice *lpc_pci, bool >>> smm_enabled) >>> >>> /* APM */ >>> >>> +#define QEMU_ICH9_APM_STS_BROADCAST_SMI 'Q' >>> + >>> static void ich9_apm_ctrl_changed(uint32_t val, void *arg) >>> { >>> ICH9LPCState *lpc = arg; >>> @@ -386,7 +388,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, >>> void *arg) >>> >>> /* SMI_EN = PMBASE + 30. SMI control and enable register */ >>> if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) { >>> -cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI); >>> +if (lpc->apm.apms == QEMU_ICH9_APM_STS_BROADCAST_SMI) { >>> +CPUState *cs; >>> + >>> +CPU_FOREACH(cs) { >>> +cpu_interrupt(cs, CPU_INTERRUPT_SMI); >>> +
[Qemu-devel] [PATCH v2 3/4] virtio: poll virtqueues for new buffers
Add an AioContext poll handler to detect new virtqueue buffers without
waiting for a guest->host notification.

Signed-off-by: Stefan Hajnoczi
---
 hw/virtio/virtio.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 8985a2f..982ba85 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2015,13 +2015,27 @@ static void virtio_queue_host_notifier_aio_read(EventNotifier *n)
     }
 }
 
+static bool virtio_queue_host_notifier_aio_poll(void *opaque)
+{
+    EventNotifier *n = opaque;
+    VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
+
+    if (virtio_queue_empty(vq)) {
+        return false;
+    }
+
+    virtio_queue_notify_aio_vq(vq);
+    return true;
+}
+
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
                                                 VirtIOHandleOutput handle_output)
 {
     if (handle_output) {
         vq->handle_aio_output = handle_output;
         aio_set_event_notifier(ctx, &vq->host_notifier, true,
-                               virtio_queue_host_notifier_aio_read, NULL);
+                               virtio_queue_host_notifier_aio_read,
+                               virtio_queue_host_notifier_aio_poll);
     } else {
         aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL);
         /* Test and clear notifier before after disabling event,
-- 
2.7.4
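A poll handler like the one in this patch reports progress through its boolean return value. The following sketch shows one way an event loop could drive such handlers under a time budget and give up when nothing happens, which is the behaviour the series describes for polling mode. It is not the run_poll_handlers() code from the series: `poll_handler`, `run_poll_handlers_sketch()`, `toy_queue_poll()` and `fake_clock_ns()` are hypothetical names, and the clock is passed in as a callback only so the example stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool PollFn(void *opaque);     /* true if the handler made progress */
typedef int64_t ClockFn(void *opaque); /* current time in nanoseconds */

struct poll_handler {
    PollFn *poll;
    void *opaque;
};

/* Busy-poll all handlers until one makes progress or max_ns elapses.
 * Returns true on progress; on false the caller would fall back to a
 * blocking wait such as ppoll(2) or epoll_wait(2). */
static bool run_poll_handlers_sketch(const struct poll_handler *handlers,
                                     size_t n, int64_t max_ns,
                                     ClockFn *now, void *clock_opaque)
{
    int64_t end_time;
    size_t i;

    assert(max_ns >= 0);
    end_time = now(clock_opaque) + max_ns;

    do {
        for (i = 0; i < n; i++) {
            if (handlers[i].poll && handlers[i].poll(handlers[i].opaque)) {
                return true;
            }
        }
    } while (now(clock_opaque) < end_time);

    return false;
}

/* Toy handler: "the queue" is just a counter of pending buffers. */
static bool toy_queue_poll(void *opaque)
{
    int *pending = opaque;

    if (*pending == 0) {
        return false; /* queue empty, nothing to do */
    }
    *pending = 0;     /* process everything that was queued */
    return true;
}

/* Deterministic clock for the example: advances 10 ns per reading. */
static int64_t fake_clock_ns(void *opaque)
{
    int64_t *t = opaque;

    *t += 10;
    return *t;
}
```

With an empty queue the loop spins until the budget is used up and returns false; with buffers pending it returns true on the first pass without consuming the budget.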
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On 11/16/16 15:05, Paolo Bonzini wrote:
>
>
> On 16/11/2016 14:18, Michael S. Tsirkin wrote:
>>> - we could have another magic 0xB2 value, which is implemented directly
>>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it
>>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
>>> to detect the new feature. It can fail to start if using traditional
>>> AP and the new feature is not there.
>>
>> If we keep collecting these magic values, should architect it
>> and do a host/guest bitmap like virtio does?
>
> The value written in 0xB3 can certainly be a feature bitmap. For now we
> would have for example
>
> bit 0 if set, writing 0x10-0xFF to 0xB2 results in a broadcast SMI
> bit 1-7 zero

Doable, but:
- doesn't address how OVMF learns about the broadcast SMI availability,
- the command value OVMF currently writes is 0.

How about this:
- etc/smi/features is the LE uint64_t bitmap proposed earlier, bit#0
  stands for broadcast SMI availability
- 0xB2 is the command value (independent of 0xB3)
- 0xB3 is a guest feature bitmap (valid for the next request). SeaBIOS
  reserves bit#0 already (uses values 0 and 1), so we can use the
  remaining 7 bits for requesting features. Bit#1 (value 2) could be the
  broadcast SMI.

This does resemble a kind of feature negotiation, except the host cannot
signal back an error (unsupported combination of features), like
virtio-1.0 can. We can make QEMU abort in that case, or ignore the flags.

Thanks
Laszlo
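To make the proposal concrete, here is a rough sketch of the guest-side logic it implies: decode the little-endian uint64_t from "etc/smi/features", test bit#0, and compute the value to place in 0xB3 before writing the command to 0xB2. This reflects only the scheme discussed in this message, not an interface that is known to exist in QEMU or OVMF; the macro and function names are hypothetical, and reading the fw_cfg file and the port I/O itself are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the bits proposed in the thread. */
#define SMI_HOST_FEAT_BROADCAST (UINT64_C(1) << 0) /* bit#0 of etc/smi/features */
#define APM_STS_REQ_BROADCAST   (1u << 1)          /* bit#1 (value 2) of 0xB3 */

/* Decode a little-endian uint64_t regardless of host byte order. */
static uint64_t le64_decode(const uint8_t raw[8])
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--) {
        v = (v << 8) | raw[i];
    }
    return v;
}

/* True if the host advertises broadcast SMI in the feature bitmap. */
static bool host_has_broadcast_smi(const uint8_t raw[8])
{
    return (le64_decode(raw) & SMI_HOST_FEAT_BROADCAST) != 0;
}

/* Value the guest would write to 0xB3 before triggering the SMI via 0xB2.
 * A feature is requested only if the host advertised it, so a guest on an
 * older host keeps writing 0 and retains the unicast behaviour. */
static uint8_t apm_sts_for_request(uint64_t host_features, bool want_broadcast)
{
    uint8_t sts = 0;

    if (want_broadcast && (host_features & SMI_HOST_FEAT_BROADCAST)) {
        sts |= APM_STS_REQ_BROADCAST;
    }
    return sts;
}
```

Because the guest masks its request against the host bitmap, the "host cannot signal back an error" case mentioned above only arises if the guest ignores the bitmap.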
[Qemu-devel] [PATCH 2/3] virtio: access ISR atomically
This will be needed once dataplane will be able to set it outside the big QEMU lock. Signed-off-by: Paolo Bonzini --- v1->v2: squash syntax error fix from patch 3 [Christian] hw/virtio/virtio-mmio.c | 6 +++--- hw/virtio/virtio-pci.c | 9 +++-- hw/virtio/virtio.c | 18 +- 3 files changed, 19 insertions(+), 14 deletions(-) diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c index a30270f..17412cb 100644 --- a/hw/virtio/virtio-mmio.c +++ b/hw/virtio/virtio-mmio.c @@ -191,7 +191,7 @@ static uint64_t virtio_mmio_read(void *opaque, hwaddr offset, unsigned size) return virtio_queue_get_addr(vdev, vdev->queue_sel) >> proxy->guest_page_shift; case VIRTIO_MMIO_INTERRUPTSTATUS: -return vdev->isr; +return atomic_read(&vdev->isr); case VIRTIO_MMIO_STATUS: return vdev->status; case VIRTIO_MMIO_HOSTFEATURESSEL: @@ -299,7 +299,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value, } break; case VIRTIO_MMIO_INTERRUPTACK: -vdev->isr &= ~value; +atomic_and(&vdev->isr, ~value); virtio_update_irq(vdev); break; case VIRTIO_MMIO_STATUS: @@ -347,7 +347,7 @@ static void virtio_mmio_update_irq(DeviceState *opaque, uint16_t vector) if (!vdev) { return; } -level = (vdev->isr != 0); +level = (atomic_read(&vdev->isr) != 0); DPRINTF("virtio_mmio setting IRQ %d\n", level); qemu_set_irq(proxy->irq, level); } diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 97b32fe..521ba0b 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -73,7 +73,7 @@ static void virtio_pci_notify(DeviceState *d, uint16_t vector) msix_notify(&proxy->pci_dev, vector); else { VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); -pci_set_irq(&proxy->pci_dev, vdev->isr & 1); +pci_set_irq(&proxy->pci_dev, atomic_read(&vdev->isr) & 1); } } @@ -449,8 +449,7 @@ static uint32_t virtio_ioport_read(VirtIOPCIProxy *proxy, uint32_t addr) break; case VIRTIO_PCI_ISR: /* reading from the ISR also clears it. 
*/ -ret = vdev->isr; -vdev->isr = 0; +ret = atomic_xchg(&vdev->isr, 0); pci_irq_deassert(&proxy->pci_dev); break; case VIRTIO_MSI_CONFIG_VECTOR: @@ -1379,9 +1378,7 @@ static uint64_t virtio_pci_isr_read(void *opaque, hwaddr addr, { VirtIOPCIProxy *proxy = opaque; VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus); -uint64_t val = vdev->isr; - -vdev->isr = 0; +uint64_t val = atomic_xchg(&vdev->isr, 0); pci_irq_deassert(&proxy->pci_dev); return val; diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index b7d5828..ecf13bd 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -945,7 +945,7 @@ void virtio_reset(void *opaque) vdev->guest_features = 0; vdev->queue_sel = 0; vdev->status = 0; -vdev->isr = 0; +atomic_set(&vdev->isr, 0); vdev->config_vector = VIRTIO_NO_VECTOR; virtio_notify_vector(vdev, vdev->config_vector); @@ -1318,10 +1318,18 @@ void virtio_del_queue(VirtIODevice *vdev, int n) vdev->vq[n].vring.num_default = 0; } +static void virtio_set_isr(VirtIODevice *vdev, int value) +{ +uint8_t old = atomic_read(&vdev->isr); +if ((old & value) != value) { +atomic_or(&vdev->isr, value); +} +} + void virtio_irq(VirtQueue *vq) { trace_virtio_irq(vq); -vq->vdev->isr |= 0x01; +virtio_set_isr(vq->vdev, 0x1); virtio_notify_vector(vq->vdev, vq->vector); } @@ -1355,7 +1363,7 @@ void virtio_notify(VirtIODevice *vdev, VirtQueue *vq) } trace_virtio_notify(vdev, vq); -vdev->isr |= 0x01; +virtio_set_isr(vq->vdev, 0x1); virtio_notify_vector(vdev, vq->vector); } @@ -1364,7 +1372,7 @@ void virtio_notify_config(VirtIODevice *vdev) if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) return; -vdev->isr |= 0x03; +virtio_set_isr(vdev, 0x3); vdev->generation++; virtio_notify_vector(vdev, vdev->config_vector); } @@ -1895,7 +1903,7 @@ void virtio_init(VirtIODevice *vdev, const char *name, vdev->device_id = device_id; vdev->status = 0; -vdev->isr = 0; +atomic_set(&vdev->isr, 0); vdev->queue_sel = 0; vdev->config_vector = VIRTIO_NO_VECTOR; vdev->vq = g_malloc0(sizeof(VirtQueue) * 
VIRTIO_QUEUE_MAX); -- 2.9.3
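The three access patterns this patch converts (set bits, acknowledge bits, read and clear) can be tried outside QEMU with C11 atomics. This is a sketch only: the patch uses QEMU's own atomic_read(), atomic_or(), atomic_and() and atomic_xchg() macros, whereas the code below substitutes `<stdatomic.h>`, and `toy_vdev` with its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state that holds the ISR byte. */
struct toy_vdev {
    _Atomic uint8_t isr;
};

/* Set ISR bits. The read first avoids an atomic read-modify-write when
 * the bits are already set, mirroring virtio_set_isr() in the patch. */
static void toy_set_isr(struct toy_vdev *vdev, uint8_t value)
{
    uint8_t old = atomic_load(&vdev->isr);

    if ((old & value) != value) {
        atomic_fetch_or(&vdev->isr, value);
    }
}

/* Acknowledge (clear) selected bits, as the MMIO INTERRUPTACK write does. */
static void toy_ack_isr(struct toy_vdev *vdev, uint8_t value)
{
    atomic_fetch_and(&vdev->isr, (uint8_t)~value);
}

/* Read and clear in one step, as the PCI ISR read does. With a separate
 * load and store, a bit set in between by another thread would be lost. */
static uint8_t toy_read_and_clear_isr(struct toy_vdev *vdev)
{
    return atomic_exchange(&vdev->isr, 0);
}
```

The read-and-clear case is where atomicity matters most: the exchange guarantees that every bit set by another thread is either returned by this read or left in place for the next one.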
[Qemu-devel] [PATCH 1/3] virtio: introduce grab/release_ioeventfd to fix vhost
Following the recent refactoring of virtio notifiers [1], more specifically
the patch ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to
start/stop ioeventfd") that uses virtio_bus_set_host_notifier [2] by
default, core virtio code requires 'ioeventfd_started' to be set to
true/false when the host notifiers are configured.

When vhost is stopped and started, however, there is a stop followed by
another start. Since ioeventfd_started was never set to true, the 'stop'
operation triggered by virtio_bus_set_host_notifier() will not result in a
call to virtio_pci_ioeventfd_assign(assign=false). This leaves the memory
regions with stale notifiers and results in the next start triggering the
following assertion:

  kvm_mem_ioeventfd_add: error adding ioeventfd: File exists
  Aborted

This patch reintroduces (hopefully in a cleaner way) the concept that was
present with ioeventfd_disabled before the refactoring. When
ioeventfd_grabbed>0, ioeventfd_started tracks whether ioeventfd should be
enabled or not, but ioeventfd is actually not started at all until vhost
releases the host notifiers.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07748.html [2] http://lists.nongnu.org/archive/html/qemu-devel/2016-10/msg07760.html Reported-by: Felipe Franciosi Reported-by: Christian Borntraeger Reported-by: Alex Williamson Fixes: ed08a2a0b ("virtio: use virtio_bus_set_host_notifier to start/stop ioeventfd") Signed-off-by: Paolo Bonzini Message-Id: <2016192855.26350-1-pbonz...@redhat.com> Signed-off-by: Paolo Bonzini --- v1->v2: more comments [Cornelia] hw/virtio/vhost.c | 14 +- hw/virtio/virtio-bus.c | 58 ++ hw/virtio/virtio.c | 16 include/hw/virtio/virtio-bus.h | 14 ++ include/hw/virtio/virtio.h | 2 ++ 5 files changed, 86 insertions(+), 18 deletions(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 30aee88..f7f7023 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1214,17 +1214,17 @@ void vhost_dev_cleanup(struct vhost_dev *hdev) int vhost_dev_enable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) { BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev))); -VirtioBusState *vbus = VIRTIO_BUS(qbus); -VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus); int i, r, e; -if (!k->ioeventfd_assign) { +/* We will pass the notifiers to the kernel, make sure that QEMU + * doesn't interfere. + */ +r = virtio_device_grab_ioeventfd(vdev); +if (r < 0) { error_report("binding does not support host notifiers"); -r = -ENOSYS; goto fail; } -virtio_device_stop_ioeventfd(vdev); for (i = 0; i < hdev->nvqs; ++i) { r = virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), hdev->vq_index + i, true); @@ -1244,7 +1244,7 @@ fail_vq: } assert (e >= 0); } -virtio_device_start_ioeventfd(vdev); +virtio_device_release_ioeventfd(vdev); fail: return r; } @@ -1267,7 +1267,7 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, VirtIODevice *vdev) } assert (r >= 0); } -virtio_device_start_ioeventfd(vdev); +virtio_device_release_ioeventfd(vdev); } /* Test and clear event pending status. 
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c index bf61f66..d6c0c72 100644 --- a/hw/virtio/virtio-bus.c +++ b/hw/virtio/virtio-bus.c @@ -147,6 +147,39 @@ void virtio_bus_set_vdev_config(VirtioBusState *bus, uint8_t *config) } } +/* On success, ioeventfd ownership belongs to the caller. */ +int virtio_bus_grab_ioeventfd(VirtioBusState *bus) +{ +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus); + +/* vhost can be used even if ioeventfd=off in the proxy device, + * so do not check k->ioeventfd_enabled. + */ +if (!k->ioeventfd_assign) { +return -ENOSYS; +} + +if (bus->ioeventfd_grabbed == 0 && bus->ioeventfd_started) { +virtio_bus_stop_ioeventfd(bus); +/* Remember that we need to restart ioeventfd + * when ioeventfd_grabbed becomes zero. + */ +bus->ioeventfd_started = true; +} +bus->ioeventfd_grabbed++; +return 0; +} + +void virtio_bus_release_ioeventfd(VirtioBusState *bus) +{ +assert(bus->ioeventfd_grabbed != 0); +if (--bus->ioeventfd_grabbed == 0 && bus->ioeventfd_started) { +/* Force virtio_bus_start_ioeventfd to act. */ +bus->ioeventfd_started = false; +virtio_bus_start_ioeventfd(bus); +} +} + int virtio_bus_start_ioeventfd(VirtioBusState *bus) { VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus); @@ -161,10 +194,14 @@ int virtio_bus_start_ioeventfd(VirtioBusState *bus) if (bus->ioeventfd_started) { return 0; } -r = vdc->start_ioeventfd(vdev); -if (r < 0) { -error_report("%s: failed. Fallback to userspace (slower).", __
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
> I guess that's what the next paragraph is about:
>
> > - we could have another magic 0xB2 value, which is implemented directly
> > in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it
> > after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs)
> > to detect the new feature. It can fail to start if using traditional
> > AP and the new feature is not there.
>
> Please explain in more detail. If I write to 0xB2 (by invoking the
> Trigger() method or somehow else), then on old QEMUs that will raise a
> sync / unicast SMI. The SMI handler in edk2 will run, but no request
> parameters will have been set up by OVMF, so the SMI handler will do...
> no clue what.

It should hopefully do nothing. A spurious SMI (such as the one caused by
the write to 0xB2) should not crash OVMF.

SMBASE relocation uses IPIs, so my hope was to use the
SmmCpuFeaturesSmmRelocationComplete hook.

> My preference is fw_cfg ATM. It provides a proven, flexible and
> extensible interface (it's easy to add new files for future features).
> If we expect more knobs in the area, I can modify my proposal to use
> "etc/smi/broadcast", so we can add "etc/smi/" later.

Did you know there are only 16 entries for fw_cfg files? :) And we're
already using 20 in the worst case:

    genroms/linuxboot.bin
    genroms/kvmvapic.bin
    NVDIMM_DSM_MEM_FILE
    "etc/smbios/smbios-tables"
    "etc/smbios/smbios-anchor"
    "etc/acpi/tables"
    "etc/table-loader"
    ACPI_BUILD_TPMLOG_FILE
    ACPI_BUILD_RSDP_FILE
    "etc/e820"
    "etc/msr_feature_control"
    "etc/reserved-memory-end"
    "etc/pvpanic-port"
    "etc/boot-menu-wait"
    "bootsplash.jpg"
    "etc/boot-fail-wait"
    "etc/igd-opregion"
    "etc/igd-bdsm-size"
    "etc/extra-pci-roots"
    "bootorder"

Therefore, so close to the release I'm a bit worried about making changes
to fw_cfg or adding more fw_cfg files. Though we just got rid of one file
for the number of CPUs, so I guess we might not care.

> Do you have any specific arguments against fw_cfg? As I suggested in my
> previous email, with fw_cfg I can implement the change in OVMF such that
> the default behavior wouldn't change -- the default delivery would
> remain relaxed, and the broadcast wouldn't be requested, unless the
> fw_cfg file told OVMF otherwise.
>
> > By the way, in case OVMF needs to use SmmSwDispatch in the future, I
> > would make QEMU use broadcast behavior for all values in the 0x10-0xff
> > range, or something like that.
>
> Are we talking control/command (0xB2) or scratch/data (0xB3) register
> values? My patches currently use the scratch/data register to provide
> the hint to QEMU; that register is less likely to interfere with
> anything the SMM core in edk2 does.

Sorry, I confused the two registers. 0xb3 is indeed more or less unused as
far as I can see.

Paolo
Re: [Qemu-devel] [PATCH v14 1/2] virtio-crypto: Add virtio crypto device specification
On 11/11/2016 10:23 AM, Gonglei wrote:
> The virtio crypto device is a virtual crypto device (ie. hardware
> crypto accelerator card). Currently, the virtio crypto device provides
> the following crypto services: CIPHER, MAC, HASH, and AEAD.
>
> In this patch, CIPHER, MAC, HASH, AEAD services are introduced.
>
> VIRTIO-153
>
> Signed-off-by: Gonglei
> CC: Michael S. Tsirkin
> CC: Cornelia Huck
> CC: Stefan Hajnoczi
> CC: Lingli Deng
> CC: Jani Kokkonen
> CC: Ola Liljedahl
> CC: Varun Sethi
> CC: Zeng Xin
> CC: Keating Brian
> CC: Ma Liang J
> CC: Griffin John
> CC: Hanweidong
> CC: Mihai Claudiu Caraman
> ---
> content.tex | 2 +
> virtio-crypto.tex | 945 ++
> 2 files changed, 947 insertions(+)
> create mode 100644 virtio-crypto.tex
>
> diff --git a/content.tex b/content.tex
> index 4b45678..ab75f78 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -5750,6 +5750,8 @@ descriptor for the \field{sense_len}, \field{residual},
> \field{status_qualifier}, \field{status}, \field{response} and
> \field{sense} fields.
>
> +\input{virtio-crypto.tex}
> +
> \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>
> Currently there are three device-independent feature bits defined:
> diff --git a/virtio-crypto.tex b/virtio-crypto.tex
> new file mode 100644
> index 000..9f7faf0
> --- /dev/null
> +++ b/virtio-crypto.tex
> @@ -0,0 +1,945 @@
> +\section{Crypto Device}\label{sec:Device Types / Crypto Device}
> +
> +The virtio crypto device is a virtual cryptography device as well as a kind of
> +virtual hardware accelerator for virtual machines. The encryption and
> +decryption requests are placed in the data queue and are ultimately handled by the

"The data queue" can be misleading, since it is rather any of the active
data queues.

> +backend crypto accelerators. The second queue is the control queue used to create

This could be confusing, since it is a second type or kind of queue but
not necessarily the queue with index 1.

> +or destroy sessions for symmetric algorithms and will control some advanced
> +features in the future. The virtio crypto device provides the following crypto

Promising future advanced features seems to be out of scope for this
specification.

> +services: CIPHER, MAC, HASH, and AEAD.
> +
> +
> +\subsection{Device ID}\label{sec:Device Types / Crypto Device / Device ID}
> +
> +20
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Crypto Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] dataq1
> +\item[\ldots]
> +\item[N-1] dataqN
> +\item[N] controlq
> +\end{description}
> +
> +N is set by \field{max_dataqueues}.
> +
> +\subsection{Feature bits}\label{sec:Device Types / Crypto Device / Feature bits}
> +
> +Undefined currently.

Could use "None currently defined." like the entropy device.

> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Crypto Device / Device configuration layout}
> +
> +The following driver-read-only configuration fields are defined:
> +
> +\begin{lstlisting}
> +struct virtio_crypto_config {
> +    le32 status;
> +    le32 max_dataqueues;
> +    le32 crypto_services;
> +    /* Detailed algorithms mask */
> +    le32 cipher_algo_l;
> +    le32 cipher_algo_h;
> +    le32 hash_algo;
> +    le32 mac_algo_l;
> +    le32 mac_algo_h;
> +    le32 aead_algo;
> +    /* Maximum length of cipher key */
> +    le32 max_cipher_key_len;
> +    /* Maximum length of authenticated key */
> +    le32 max_auth_key_len;
> +    le32 reserve;
> +    /* Maximum size of each crypto request's content */
> +    le64 max_size;
> +};
> +\end{lstlisting}
> +
> +The value of the \field{status} field is VIRTIO_CRYPTO_S_HW_READY or VIRTIO_CRYPTO_S_STARTED.
> +
> +\begin{lstlisting}
> +#define VIRTIO_CRYPTO_S_HW_READY (1 << 0)
> +#define VIRTIO_CRYPTO_S_STARTED (1 << 1)
> +\end{lstlisting}
> +

I could not really figure out what this status actually does and how it
relates to the device status field, if at all. Furthermore, I see no
mention of VIRTIO_CRYPTO_S_STARTED except for this one, so the only thing
I can think of is that it's the initial value and means hardware not ready
(you state these are the only two values). This, however, does not seem
consistent with what your QEMU reference implementation does. Another
thing is that your implementations seem to use VIRTIO_CRYPTO_S_HW_READY as
a flag, but your specification would prohibit combining flags (because you
would get another value). There are more comments on this topic below.

> +The following driver-read-only fields include \field{max_dataqueues}, which specifies the
> +maximum number of data virtqueues (dataq1\ldots dataqN), and \field{crypto_services},
> +which indicates the crypto services the virtio crypto supports.
> +
> +The following services are defined:
> +
> +\begin{lstlisting}
> +/* CIPH
Re: [Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext
On 16/11/2016 18:47, Stefan Hajnoczi wrote: > The AioContext event loop uses ppoll(2) or epoll_wait(2) to monitor file > descriptors or until a timer expires. In cases like virtqueues, Linux > AIO, and ThreadPool it is technically possible to wait for events via > polling (i.e. continuously checking for events without blocking). > > Polling can be faster than blocking syscalls because file descriptors, > the process scheduler, and system calls are bypassed. > > The main disadvantage to polling is that it increases CPU utilization. > In classic polling configuration a full host CPU thread might run at > 100% to respond to events as quickly as possible. This patch implements > a timeout so we fall back to blocking syscalls if polling detects no > activity. After the timeout no CPU cycles are wasted on polling until > the next event loop iteration. > > This patch implements an experimental polling mode that can be > controlled with the QEMU_AIO_POLL_MAX_NS= environment > variable. The aio_poll() event loop function will attempt to poll > instead of using blocking syscalls. > > The run_poll_handlers_begin() and run_poll_handlers_end() trace events > are added to aid performance analysis and troubleshooting. If you need > to know whether polling mode is being used, trace these events to find > out. > > Signed-off-by: Stefan Hajnoczi > --- > aio-posix.c | 107 > +++- > async.c | 11 +- > include/block/aio.h | 3 ++ > trace-events| 4 ++ > 4 files changed, 123 insertions(+), 2 deletions(-) Nice! 
> diff --git a/aio-posix.c b/aio-posix.c > index 4379c13..5e5a561 100644 > --- a/aio-posix.c > +++ b/aio-posix.c > @@ -18,6 +18,8 @@ > #include "block/block.h" > #include "qemu/queue.h" > #include "qemu/sockets.h" > +#include "qemu/cutils.h" > +#include "trace.h" > #ifdef CONFIG_EPOLL_CREATE1 > #include > #endif > @@ -27,12 +29,16 @@ struct AioHandler > GPollFD pfd; > IOHandler *io_read; > IOHandler *io_write; > +AioPollFn *io_poll; > int deleted; > void *opaque; > bool is_external; > QLIST_ENTRY(AioHandler) node; > }; > > +/* How long to poll AioPollHandlers before monitoring file descriptors */ > +static int64_t aio_poll_max_ns; > + > #ifdef CONFIG_EPOLL_CREATE1 > > /* The fd number threashold to switch to epoll */ > @@ -206,11 +212,12 @@ void aio_set_fd_handler(AioContext *ctx, > AioHandler *node; > bool is_new = false; > bool deleted = false; > +int poll_disable_cnt = 0; poll_disable_cnt = !io_poll - !node->io_poll ? Not the most readable thing, but effective... > node = find_aio_handler(ctx, fd); > > /* Are we deleting the fd handler? 
*/ > -if (!io_read && !io_write) { > +if (!io_read && !io_write && !io_poll) { > if (node == NULL) { > return; > } > @@ -229,6 +236,10 @@ void aio_set_fd_handler(AioContext *ctx, > QLIST_REMOVE(node, node); > deleted = true; > } > + > +if (!node->io_poll) { > +poll_disable_cnt = -1; > +} > } else { > if (node == NULL) { > /* Alloc and insert if it's not already there */ > @@ -238,10 +249,22 @@ void aio_set_fd_handler(AioContext *ctx, > > g_source_add_poll(&ctx->source, &node->pfd); > is_new = true; > + > +if (!io_poll) { > +poll_disable_cnt = 1; > +} > +} else { > +if (!node->io_poll && io_poll) { > +poll_disable_cnt = -1; > +} else if (node->io_poll && !io_poll) { > +poll_disable_cnt = 1; > +} > } > + > /* Update handler with latest information */ > node->io_read = io_read; > node->io_write = io_write; > +node->io_poll = io_poll; > node->opaque = opaque; > node->is_external = is_external; > > @@ -251,6 +274,9 @@ void aio_set_fd_handler(AioContext *ctx, > > aio_epoll_update(ctx, node, is_new); > aio_notify(ctx); > + > +ctx->poll_disable_cnt += poll_disable_cnt; > + > if (deleted) { > g_free(node); > } > @@ -268,6 +294,7 @@ void aio_set_event_notifier(AioContext *ctx, > > bool aio_prepare(AioContext *ctx) > { > +/* TODO run poll handlers? */ > return false; > } > > @@ -402,6 +429,56 @@ static void add_pollfd(AioHandler *node) > npfd++; > } > > +/* run_poll_handlers: > + * @ctx: the AioContext > + * @max_ns: maximum time to poll for, in nanoseconds > + * > + * Polls for a given time. > + * > + * Note that ctx->notify_me must be non-zero so this function can detect > + * aio_notify(). > + * > + * Note that the caller must have incremented ctx->walking_handlers. > + * > + * Returns: true if progress was made, false otherwise > + */ > +static bool run_poll_handlers(AioContext *ctx, int64_t max_ns) > +{ > +bool p
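The one-line form suggested in the review (poll_disable_cnt = !io_poll - !node->io_poll) can be checked against the branch-based version from the hunk. What follows is a minimal standalone sketch, not QEMU code: delta_branchy() and delta_arith() are hypothetical helpers, their bool arguments stand in for "node->io_poll != NULL" and "io_poll != NULL", and only the path where an existing handler is updated is covered (the new-node and delete paths in the patch are handled separately).

```c
#include <assert.h>
#include <stdbool.h>

/* Branch-based delta, mirroring the update path in the quoted hunk.
 * old_has_poll stands in for (node->io_poll != NULL),
 * new_has_poll stands in for (io_poll != NULL). */
static int delta_branchy(bool old_has_poll, bool new_has_poll)
{
    if (!old_has_poll && new_has_poll) {
        return -1;   /* one fewer handler lacks a poll callback */
    } else if (old_has_poll && !new_has_poll) {
        return 1;    /* one more handler lacks a poll callback */
    }
    return 0;        /* no change in either direction */
}

/* Arithmetic form of the same delta, as suggested in the review. */
static int delta_arith(bool old_has_poll, bool new_has_poll)
{
    return !new_has_poll - !old_has_poll;
}
```

Both helpers agree on all four combinations of inputs, which is why the shorter expression is a drop-in replacement for the if/else chain in that path.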
[Qemu-devel] [PATCH 3/3] virtio: set ISR on dataplane notifications
Dataplane has always omitted the step of setting ISR when an interrupt is raised. This caused little breakage, because the specification actually says that ISR may not be updated in MSI mode. Some versions of the Windows drivers however didn't clear MSI mode correctly, and proceeded using polling mode (using ISR, not the used ring index!) for crashdump and hibernation. If it were just crashdump and hibernation it would not be a big deal, but recent releases of Windows do not really shut down, but rather log out and hibernate to make the next startup faster. Hence, this manifested as a more serious hang during shutdown with e.g. Windows 8.1 and virtio-win 1.8.0 RPMs. Newer versions fixed this, while older versions do not use MSI at all. The failure has always been there for virtio dataplane, but it became visible after commits 9ffe337 ("virtio-blk: always use dataplane path if ioeventfd is active", 2016-10-30) and ad07cd6 ("virtio-scsi: always use dataplane path if ioeventfd is active", 2016-10-30) made virtio-blk and virtio-scsi always use the dataplane code under KVM. The good news therefore is that it was not a bug in the patches; they were doing exactly what they were meant to do, i.e. shake out remaining dataplane bugs. The fix is not hard, so it's worth accommodating the broken drivers. The virtio_should_notify+event_notifier_set pair that is common to virtio-blk and virtio-scsi dataplane is replaced with a new public function virtio_notify_irqfd that also sets ISR. The irqfd emulation code now need not set ISR anymore, so virtio_irq is removed. 
Signed-off-by: Paolo Bonzini --- hw/block/dataplane/virtio-blk.c | 4 +--- hw/scsi/virtio-scsi-dataplane.c | 7 --- hw/scsi/virtio-scsi.c | 2 +- hw/virtio/trace-events | 2 +- hw/virtio/virtio.c | 20 include/hw/virtio/virtio-scsi.h | 1 - include/hw/virtio/virtio.h | 2 +- 7 files changed, 16 insertions(+), 22 deletions(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index 90ef557..d1f9f63 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -68,9 +68,7 @@ static void notify_guest_bh(void *opaque) unsigned i = j + ctzl(bits); VirtQueue *vq = virtio_get_queue(s->vdev, i); -if (virtio_should_notify(s->vdev, vq)) { -event_notifier_set(virtio_queue_get_guest_notifier(vq)); -} +virtio_notify_irqfd(s->vdev, vq); bits &= bits - 1; /* clear right-most bit */ } diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index f2ea29d..6b8d0f0 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -95,13 +95,6 @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n, return 0; } -void virtio_scsi_dataplane_notify(VirtIODevice *vdev, VirtIOSCSIReq *req) -{ -if (virtio_should_notify(vdev, req->vq)) { -event_notifier_set(virtio_queue_get_guest_notifier(req->vq)); -} -} - /* assumes s->ctx held */ static void virtio_scsi_clear_aio(VirtIOSCSI *s) { diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index 3e5ae6a..10fd687 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -69,7 +69,7 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req) qemu_iovec_from_buf(&req->resp_iov, 0, &req->resp, req->resp_size); virtqueue_push(vq, &req->elem, req->qsgl.size + req->resp_iov.size); if (s->dataplane_started && !s->dataplane_fenced) { -virtio_scsi_dataplane_notify(vdev, req); +virtio_notify_irqfd(vdev, vq); } else { virtio_notify(vdev, vq); } diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index 8756cef..7b6f55e 100644 --- 
a/hw/virtio/trace-events +++ b/hw/virtio/trace-events @@ -5,7 +5,7 @@ virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) " virtqueue_flush(void *vq, unsigned int count) "vq %p count %u" virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u" virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p" -virtio_irq(void *vq) "vq %p" +virtio_notify_irqfd(void *vdev, void *vq) "vdev %p vq %p" virtio_notify(void *vdev, void *vq) "vdev %p vq %p" virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u" diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index ecf13bd..860ebdb 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -1326,13 +1326,6 @@ static void virtio_set_isr(VirtIODevice *vdev, int value) } } -void virtio_irq(VirtQueue *vq) -{ -trace_virtio_irq(vq); -virtio_set_isr(vq->vdev, 0x1); -virtio_notify_vector(vq->vdev, vq->vector); -} - bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq) { uint16_t old, new; @@ -1356,6 +1349,17 @@ bool virtio_should_notify(VirtIODevice *vdev, VirtQ
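The body of the new function is cut off in this archive, so the following is only a sketch of the idea described in the commit message (set ISR, then signal the guest notifier), written with stand-in types. DemoDevice, DemoQueue, demo_notify_irqfd() and demo_read_isr() are hypothetical names, the should_notify flag replaces the real virtio_should_notify() check, and a plain counter replaces the event notifier. C11 atomics are an assumption made here to keep the example self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types; these are NOT QEMU's VirtIODevice/VirtQueue. */
typedef struct {
    atomic_uint_fast8_t isr;   /* interrupt status, bit 0 = queue interrupt */
} DemoDevice;

typedef struct {
    DemoDevice *dev;
    bool should_notify;        /* stand-in for the should-notify check */
    unsigned signalled;        /* stand-in for an event notifier */
} DemoQueue;

/* Set the ISR bit first, then signal, so a driver that polls ISR
 * instead of the used ring index still observes the interrupt. */
static bool demo_notify_irqfd(DemoQueue *vq)
{
    if (!vq->should_notify) {
        return false;
    }
    atomic_fetch_or(&vq->dev->isr, 0x1);
    vq->signalled++;
    return true;
}

/* A read of ISR returns the pending bits and clears them. */
static uint8_t demo_read_isr(DemoDevice *dev)
{
    return (uint8_t)atomic_exchange(&dev->isr, 0);
}
```

With this ordering, a driver stuck in ISR polling mode sees bit 0 set after a completion even though the notification was delivered through the irqfd path.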
[Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes
Patch 1 fixes vhost, patches 2-3 fix Windows hibernation. Paolo v1->v2: more comments [Cornelia] squash syntax error fix from patch 3 into patch 2 [Christian] Paolo Bonzini (3): virtio: introduce grab/release_ioeventfd to fix vhost virtio: access ISR atomically virtio: set ISR on dataplane notifications hw/block/dataplane/virtio-blk.c | 4 +-- hw/scsi/virtio-scsi-dataplane.c | 7 - hw/scsi/virtio-scsi.c | 2 +- hw/virtio/trace-events | 2 +- hw/virtio/vhost.c | 14 +- hw/virtio/virtio-bus.c | 58 + hw/virtio/virtio-mmio.c | 6 ++--- hw/virtio/virtio-pci.c | 9 +++ hw/virtio/virtio.c | 46 +--- include/hw/virtio/virtio-bus.h | 14 ++ include/hw/virtio/virtio-scsi.h | 1 - include/hw/virtio/virtio.h | 4 ++- 12 files changed, 117 insertions(+), 50 deletions(-) -- 2.9.3
[Qemu-devel] [PATCH for-2.9] qmp: Report QOM type name on query-cpu-definitions
The new typename attribute on query-cpu-definitions will be used to help management software use device-list-properties to check which properties can be set using -cpu or -global for the CPU model. Signed-off-by: Eduardo Habkost --- qapi-schema.json| 4 +++- target-arm/helper.c | 1 + target-i386/cpu.c | 1 + target-ppc/translate_init.c | 1 + target-s390x/cpu_models.c | 1 + 5 files changed, 7 insertions(+), 1 deletion(-) diff --git a/qapi-schema.json b/qapi-schema.json index b0b4bf6..9a3bdd4 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -3216,6 +3216,8 @@ # @unavailable-features: #optional List of properties that prevent #the CPU model from running in the current #host. (since 2.8) +# @typename: Type name that can be used as argument to @device-list-properties, +#to introspect properties configurable using -cpu or -global. # # @unavailable-features is a list of QOM property names that # represent CPU model attributes that prevent the CPU from running. @@ -3237,7 +3239,7 @@ ## { 'struct': 'CpuDefinitionInfo', 'data': { 'name': 'str', '*migration-safe': 'bool', 'static': 'bool', -'*unavailable-features': [ 'str' ] } } +'*unavailable-features': [ 'str' ], 'typename': 'str' } } ## # @query-cpu-definitions: diff --git a/target-arm/helper.c b/target-arm/helper.c index b5b65ca..3fc01b5 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -5207,6 +5207,7 @@ static void arm_cpu_add_definition(gpointer data, gpointer user_data) info = g_malloc0(sizeof(*info)); info->name = g_strndup(typename, strlen(typename) - strlen("-" TYPE_ARM_CPU)); +info->q_typename = g_strdup(typename); entry = g_malloc0(sizeof(*entry)); entry->value = info; diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 6eec5dc..725f6cb 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -2239,6 +2239,7 @@ static void x86_cpu_definition_entry(gpointer data, gpointer user_data) info->name = x86_cpu_class_get_model_name(cc); x86_cpu_class_check_missing_features(cc, 
&info->unavailable_features); info->has_unavailable_features = true; +info->q_typename = g_strdup(object_class_get_name(oc)); entry = g_malloc0(sizeof(*entry)); entry->value = info; diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c index 208fa1e..42b9274 100644 --- a/target-ppc/translate_init.c +++ b/target-ppc/translate_init.c @@ -10305,6 +10305,7 @@ CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp) info = g_malloc0(sizeof(*info)); info->name = g_strdup(alias->alias); +info->q_typename = g_strdup(object_class_get_name(oc)); entry = g_malloc0(sizeof(*entry)); entry->value = info; diff --git a/target-s390x/cpu_models.c b/target-s390x/cpu_models.c index c1e729d..5b66d33 100644 --- a/target-s390x/cpu_models.c +++ b/target-s390x/cpu_models.c @@ -290,6 +290,7 @@ static void create_cpu_model_list(ObjectClass *klass, void *opaque) info->has_migration_safe = true; info->migration_safe = scc->is_migration_safe; info->q_static = scc->is_static; +info->q_typename = g_strdup(object_class_get_name(klass)); entry = g_malloc0(sizeof(*entry)); -- 2.7.4
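In the ARM hunk, info->name is the QOM type name with the "-" TYPE_ARM_CPU suffix cut off, while the new q_typename field keeps the full type name. A minimal libc-only sketch of that relationship follows; demo_strip_suffix() is a hypothetical helper (the patch itself uses g_strndup()), and the strings used with it are illustrative assumptions rather than values taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of type_name with suffix removed,
 * or NULL if type_name does not end in suffix (or allocation fails).
 * The caller frees the result. */
static char *demo_strip_suffix(const char *type_name, const char *suffix)
{
    size_t tlen = strlen(type_name);
    size_t slen = strlen(suffix);

    if (tlen < slen || strcmp(type_name + tlen - slen, suffix) != 0) {
        return NULL;
    }

    char *model = malloc(tlen - slen + 1);
    if (model) {
        memcpy(model, type_name, tlen - slen);
        model[tlen - slen] = '\0';
    }
    return model;
}
```

Management software would pass the full type name, not the shortened model name, to device-list-properties, which is why the patch reports both.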
Re: [Qemu-devel] [PATCH v2 2/4] aio: add polling mode to AioContext
On 16/11/2016 18:47, Stefan Hajnoczi wrote: > +if (max_ns && run_poll_handlers(ctx, max_ns)) { > +atomic_sub(&ctx->notify_me, 2); > +blocking = false; /* poll again, don't block */ You don't need to poll---you only need to run bottom halves and timers. Paolo > +progress = true; > +} > +}
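For readers following the thread, the control flow under discussion (poll for a bounded time, and only fall back to a blocking wait if nothing showed up) can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: demo_run_poll(), the callback types, and the fake clock and event source are all hypothetical, and the real run_poll_handlers() walks the AioContext handler list rather than calling a single callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef bool (*DemoPollFn)(void *opaque);      /* true when progress was made */
typedef int64_t (*DemoClockFn)(void *opaque);  /* current time in nanoseconds */

/* Poll until progress is made or max_ns has elapsed.
 * Returns true if the caller can skip the blocking syscall. */
static bool demo_run_poll(DemoPollFn poll, void *poll_opaque,
                          DemoClockFn now, void *clock_opaque,
                          int64_t max_ns)
{
    int64_t deadline = now(clock_opaque) + max_ns;

    do {
        if (poll(poll_opaque)) {
            return true;
        }
    } while (now(clock_opaque) < deadline);

    return false;   /* budget exhausted: fall back to blocking */
}

/* Deterministic helpers for demonstration only: a clock that advances
 * 10 ns per reading, and an event source that fires after N polls. */
static int64_t demo_fake_clock(void *opaque)
{
    int64_t *t = opaque;
    *t += 10;
    return *t;
}

static bool demo_fire_after(void *opaque)
{
    int *remaining = opaque;
    return --(*remaining) <= 0;
}
```

The time budget is what bounds the extra CPU use: once it expires without progress, the loop gives up and the event loop blocks as before.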
Re: [Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode
Hi, Your series failed automatic build test. Please find the testing commands and their output below. If you have docker installed, you can probably reproduce it locally. Type: series Subject: [Qemu-devel] [PATCH v2 0/4] aio: experimental virtio-blk polling mode Message-id: 1479318422-10979-1-git-send-email-stefa...@redhat.com === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=16 make docker-test-quick@centos6 make docker-test-mingw@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 7f175bc linux-aio: poll ring for completions 937de16 virtio: poll virtqueues for new buffers 3d0f4c1 aio: add polling mode to AioContext 3e75e2a aio: add AioPollFn and io_poll() interface === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into 'dtc'... Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf' BUILD centos6 make[1]: Entering directory `/var/tmp/patchew-tester-tmp-r21_4ojm/src' ARCHIVE qemu.tgz ARCHIVE dtc.tgz COPYRUNNER RUN test-quick in qemu:centos6 Packages installed: SDL-devel-1.2.14-7.el6_7.1.x86_64 ccache-3.1.6-2.el6.x86_64 epel-release-6-8.noarch gcc-4.4.7-17.el6.x86_64 git-1.7.1-4.el6_7.1.x86_64 glib2-devel-2.28.8-5.el6.x86_64 libfdt-devel-1.4.0-1.el6.x86_64 make-3.81-23.el6.x86_64 package g++ is not installed pixman-devel-0.32.8-1.el6.x86_64 tar-1.23-15.el6_8.x86_64 zlib-devel-1.2.3-29.el6.x86_64 Environment variables: PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel glib2-devel SDL-devel pixman-devel epel-release HOSTNAME=1956184f8abf TERM=xterm MAKEFLAGS= -j16 HISTSIZE=1000 J=16 USER=root CCACHE_DIR=/var/tmp/ccache EXTRA_CONFIGURE_OPTS= V= SHOW_ENV=1 MAIL=/var/spool/mail/root PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ LANG=en_US.UTF-8 TARGET_LIST= 
HISTCONTROL=ignoredups SHLVL=1 HOME=/root TEST_DIR=/tmp/qemu-test LOGNAME=root LESSOPEN=||/usr/bin/lesspipe.sh %s FEATURES= dtc DEBUG= G_BROKEN_FILENAMES=1 CCACHE_HASHDIR= _=/usr/bin/env Configure options: --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install No C++ compiler available; disabling C++ specific optional code Install prefix/var/tmp/qemu-build/install BIOS directory/var/tmp/qemu-build/install/share/qemu binary directory /var/tmp/qemu-build/install/bin library directory /var/tmp/qemu-build/install/lib module directory /var/tmp/qemu-build/install/lib/qemu libexec directory /var/tmp/qemu-build/install/libexec include directory /var/tmp/qemu-build/install/include config directory /var/tmp/qemu-build/install/etc local state directory /var/tmp/qemu-build/install/var Manual directory /var/tmp/qemu-build/install/share/man ELF interp prefix /usr/gnemul/qemu-%M Source path /tmp/qemu-test/src C compilercc Host C compiler cc C++ compiler Objective-C compiler cc ARFLAGS rv CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g QEMU_CFLAGS -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all LDFLAGS -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g make make install install pythonpython -B smbd /usr/sbin/smbd module supportno host CPU x86_64 host big endian no target list x86_64-softmmu aarch64-softmmu tcg debug enabled no gprof enabled no sparse enabledno strip binariesyes profiler no static build no pixmansystem SDL support yes (1.2.14) GTK support no GTK GL supportno 
VTE support no TLS priority NORMAL GNUTLS supportno GNUTLS rndno libgcrypt no libgcrypt kdf no nettleno nettle kdfno libtasn1 no curses supportno virgl support no curl support no mingw32 support no Audio drivers oss Block whitelist (rw) Block whitelist (ro) VirtFS supportno VNC support yes VNC SASL support no VNC JPEG support no VNC PNG support no xen support no brlapi supportno bluez supportno Documentation no PIE yes vde support no netmap supportno Linux AIO support no ATTR/XATTR support yes Install blobs yes KVM support yes COLO support yes RDMA support no TCG interpreter no fdt suppor
Re: [Qemu-devel] [PATCH for-2.9] qmp: Report QOM type name on query-cpu-definitions
On 11/16/2016 12:21 PM, Eduardo Habkost wrote: > The new typename attribute on query-cpu-definitions will be used > to help management software use device-list-properties to check > which properties can be set using -cpu or -global for the CPU > model. > > Signed-off-by: Eduardo Habkost > --- > qapi-schema.json| 4 +++- > target-arm/helper.c | 1 + > target-i386/cpu.c | 1 + > target-ppc/translate_init.c | 1 + > target-s390x/cpu_models.c | 1 + > 5 files changed, 7 insertions(+), 1 deletion(-) > > diff --git a/qapi-schema.json b/qapi-schema.json > index b0b4bf6..9a3bdd4 100644 > --- a/qapi-schema.json > +++ b/qapi-schema.json > @@ -3216,6 +3216,8 @@ > # @unavailable-features: #optional List of properties that prevent > #the CPU model from running in the current > #host. (since 2.8) > +# @typename: Type name that can be used as argument to > @device-list-properties, > +#to introspect properties configurable using -cpu or -global. Missing a '(since 2.9)' designation. > # > # @unavailable-features is a list of QOM property names that > # represent CPU model attributes that prevent the CPU from running. > @@ -3237,7 +3239,7 @@ > ## > { 'struct': 'CpuDefinitionInfo', >'data': { 'name': 'str', '*migration-safe': 'bool', 'static': 'bool', > -'*unavailable-features': [ 'str' ] } } > +'*unavailable-features': [ 'str' ], 'typename': 'str' } } > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH v2] hw/isa/lpc_ich9: inject SMI on all VCPUs if APM_STS == 'Q'
On 11/16/16 19:04, Paolo Bonzini wrote: >> I guess that's what the next paragraph is about: >> >>> - we could have another magic 0xB2 value, which is implemented directly >>> in QEMU and sets 0xB3 to a magic value. Then OVMF can invoke it >>> after SMBASE relocation and SMM IPL (so as not to crash on old QEMUs) >>> to detect the new feature. It can fail to start if using traditional >>> AP and the new feature is not there. >> >> Please explain in more detail. If I write to 0xB2 (by invoking the >> Trigger() method or somehow else), then on old QEMUs that will raise a >> sync / unicast SMI. The SMI handler in edk2 will run, but no request >> parameters will have been set up by OVMF, so the SMI handler will do... >> no clue what. > > It should hopefully do nothing. A spurious SMI (such as the one caused > by the write to 0xB2) should not crash OVMF. > > SMBASE relocation uses IPIs, so my hope was to use the > SmmCpuFeaturesSmmRelocationComplete hook. From a cursory look, SmmCpuFeaturesSmmRelocationComplete() seems to be called early enough from PiSmmCpuDxeSmm that we might be able to call PcdSet() from it, for updating PcdCpuSmmApSyncTimeout and PcdCpuSmmSyncMode. I perceive it a bit too close to the edge :) >> My preference is fw_cfg ATM. It provides a proven, flexible and >> extensible interface (it's easy to add new files for future features). >> If we expect more knobs in the area, I can modify my proposal to use >> "etc/smi/broadcast", so we can add "etc/smi/" later. > > Did you know there are 16 entries only for fw_cfg files? :) Yes, I've known that, but it can be changed by redefining FW_CFG_FILE_SLOTS, can't it? The key type for fw_cfg is uint16_t, so we should have some reserves. 
> And we're > using already 20 in the worst case: > > genroms/linuxboot.bin > genroms/kvmvapic.bin > NVDIMM_DSM_MEM_FILE > "etc/smbios/smbios-tables" > "etc/smbios/smbios-anchor" > "etc/acpi/tables" > "etc/table-loader" > ACPI_BUILD_TPMLOG_FILE > ACPI_BUILD_RSDP_FILE > "etc/e820" > "etc/msr_feature_control" > "etc/reserved-memory-end" > "etc/pvpanic-port" > "etc/boot-menu-wait" > "bootsplash.jpg" > "etc/boot-fail-wait" > "etc/igd-opregion" > "etc/igd-bdsm-size" > "etc/extra-pci-roots" > "bootorder" > > Therefore, so close to the release I'm a bit worried about doing > changes to fw_cfg or adding more fw_cfg files. Though we just got > rid of one file for the number of CPUs, so I guess we might not care. I agree with your caution about this. I'm also perfectly fine if this update misses 2.8. :) > >> Do you have any specific arguments against fw_cfg? As I suggested in my >> previous email, with fw_cfg I can implement the change in OVMF such that >> the default behavior wouldn't change -- the default delivery would >> remain relaxed, and the broadcast wouldn't be requested, unless the >> fw_cfg file told OVMF otherwise. >> >>> By the way, in case OVMF needs to use SmmSwDispatch in the future, I >>> would make QEMU use broadcast behavior for all values in the 0x10-0xff >>> range, or something like that. >> >> Are we talking control/command (0xB2) or scratch/data (0xB3) register >> values? My patches currently use the scratch/data register to provide >> the hint to QEMU; that register is less likely to interfere with >> anything the SMM core in edk2 does. > > Sorry I confused the two registers. 0xb3 is more or less unused as far > as I can see indeed. Thanks Laszlo
Re: [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes
Hi, Your series failed automatic build test. Please find the testing commands and their output below. If you have docker installed, you can probably reproduce it locally. Type: series Subject: [Qemu-devel] [PATCH v2 for-2.8 0/3] virtio fixes Message-id: 20161116180551.9611-1-pbonz...@redhat.com === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=16 make docker-test-quick@centos6 make docker-test-mingw@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 4476079 virtio: set ISR on dataplane notifications f45efd4 virtio: access ISR atomically 9fd4e4a virtio: introduce grab/release_ioeventfd to fix vhost === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into 'dtc'... Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf' BUILD centos6 make[1]: Entering directory `/var/tmp/patchew-tester-tmp-5tzxa5rp/src' ARCHIVE qemu.tgz ARCHIVE dtc.tgz COPYRUNNER RUN test-quick in qemu:centos6 Packages installed: SDL-devel-1.2.14-7.el6_7.1.x86_64 ccache-3.1.6-2.el6.x86_64 epel-release-6-8.noarch gcc-4.4.7-17.el6.x86_64 git-1.7.1-4.el6_7.1.x86_64 glib2-devel-2.28.8-5.el6.x86_64 libfdt-devel-1.4.0-1.el6.x86_64 make-3.81-23.el6.x86_64 package g++ is not installed pixman-devel-0.32.8-1.el6.x86_64 tar-1.23-15.el6_8.x86_64 zlib-devel-1.2.3-29.el6.x86_64 Environment variables: PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel glib2-devel SDL-devel pixman-devel epel-release HOSTNAME=4fce3ac805f5 TERM=xterm MAKEFLAGS= -j16 HISTSIZE=1000 J=16 USER=root CCACHE_DIR=/var/tmp/ccache EXTRA_CONFIGURE_OPTS= V= SHOW_ENV=1 MAIL=/var/spool/mail/root PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ LANG=en_US.UTF-8 TARGET_LIST= HISTCONTROL=ignoredups SHLVL=1 HOME=/root TEST_DIR=/tmp/qemu-test LOGNAME=root 
LESSOPEN=||/usr/bin/lesspipe.sh %s FEATURES= dtc DEBUG= G_BROKEN_FILENAMES=1 CCACHE_HASHDIR= _=/usr/bin/env Configure options: --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install No C++ compiler available; disabling C++ specific optional code Install prefix/var/tmp/qemu-build/install BIOS directory/var/tmp/qemu-build/install/share/qemu binary directory /var/tmp/qemu-build/install/bin library directory /var/tmp/qemu-build/install/lib module directory /var/tmp/qemu-build/install/lib/qemu libexec directory /var/tmp/qemu-build/install/libexec include directory /var/tmp/qemu-build/install/include config directory /var/tmp/qemu-build/install/etc local state directory /var/tmp/qemu-build/install/var Manual directory /var/tmp/qemu-build/install/share/man ELF interp prefix /usr/gnemul/qemu-%M Source path /tmp/qemu-test/src C compilercc Host C compiler cc C++ compiler Objective-C compiler cc ARFLAGS rv CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g QEMU_CFLAGS -I/usr/include/pixman-1-pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wmissing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all LDFLAGS -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g make make install install pythonpython -B smbd /usr/sbin/smbd module supportno host CPU x86_64 host big endian no target list x86_64-softmmu aarch64-softmmu tcg debug enabled no gprof enabled no sparse enabledno strip binariesyes profiler no static build no pixmansystem SDL support yes (1.2.14) GTK support no GTK GL supportno VTE support no TLS priority NORMAL GNUTLS supportno GNUTLS rndno libgcrypt no 
libgcrypt kdf no nettleno nettle kdfno libtasn1 no curses supportno virgl support no curl support no mingw32 support no Audio drivers oss Block whitelist (rw) Block whitelist (ro) VirtFS supportno VNC support yes VNC SASL support no VNC JPEG support no VNC PNG support no xen support no brlapi supportno bluez supportno Documentation no PIE yes vde support no netmap supportno Linux AIO support no ATTR/XATTR support yes Install blobs yes KVM support yes COLO support yes RDMA support no TCG interpreter no fdt support yes preadv supportyes fdatasync yes madvise
[Qemu-devel] [PATCH for-2.9 0/2] qom, qdev: Cleanup release functions
While working on the qdev class properties series, I've noticed that the release function for class properties is never called, and has unclear semantics (should it be called when the object is destroyed, or when the class is destroyed?). Patch 1/2 removes the unused feature. Patch 2/2 changes the function signature of qdev property release functions to make their implementations simpler and safer, and make them not depend on the way property release functions are implemented (so the functions don't need to be rewritten if we change qdev to use class properties). Eduardo Habkost (2): qom: Remove release function from class properties qdev: Change signature of PropertyInfo::release backends/hostmem.c | 4 ++-- hw/core/machine.c| 6 +++--- hw/core/qdev-properties-system.c | 8 ++-- hw/core/qdev-properties.c| 10 +- hw/core/qdev.c | 10 +- hw/i386/pc.c | 8 hw/ppc/pnv.c | 2 +- include/hw/qdev-core.h | 2 +- include/qom/object.h | 1 - qom/object.c | 14 -- 10 files changed, 31 insertions(+), 34 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH for-2.9 1/2] qom: Remove release function from class properties
The release functions are never called for class properties, and their semantics aren't even defined clearly (should the release function be called when an instance is destroyed, or when a class is destroyed?). Remove the unused functionality. Signed-off-by: Eduardo Habkost --- backends/hostmem.c | 4 ++-- hw/core/machine.c| 6 +++--- hw/i386/pc.c | 8 hw/ppc/pnv.c | 2 +- include/qom/object.h | 1 - qom/object.c | 14 -- 6 files changed, 14 insertions(+), 21 deletions(-) diff --git a/backends/hostmem.c b/backends/hostmem.c index 4256d24..856e96e 100644 --- a/backends/hostmem.c +++ b/backends/hostmem.c @@ -368,11 +368,11 @@ host_memory_backend_class_init(ObjectClass *oc, void *data) object_class_property_add(oc, "size", "int", host_memory_backend_get_size, host_memory_backend_set_size, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_add(oc, "host-nodes", "int", host_memory_backend_get_host_nodes, host_memory_backend_set_host_nodes, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_add_enum(oc, "policy", "HostMemPolicy", HostMemPolicy_lookup, host_memory_backend_get_policy, diff --git a/hw/core/machine.c b/hw/core/machine.c index b0fd91f..c64e5f1 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -372,13 +372,13 @@ static void machine_class_init(ObjectClass *oc, void *data) object_class_property_add(oc, "kernel-irqchip", "OnOffSplit", NULL, machine_set_kernel_irqchip, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, "kernel-irqchip", "Configure KVM in-kernel irqchip", &error_abort); object_class_property_add(oc, "kvm-shadow-mem", "int", machine_get_kvm_shadow_mem, machine_set_kvm_shadow_mem, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, "kvm-shadow-mem", "KVM shadow MMU size", &error_abort); @@ -409,7 +409,7 @@ static void machine_class_init(ObjectClass *oc, void *data) object_class_property_add(oc, "phandle-start", "int", 
machine_get_phandle_start, machine_set_phandle_start, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, "phandle-start", "The first phandle ID we may generate dynamically", &error_abort); diff --git a/hw/i386/pc.c b/hw/i386/pc.c index a9b1950..46f95bf 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -2308,24 +2308,24 @@ static void pc_machine_class_init(ObjectClass *oc, void *data) object_class_property_add(oc, PC_MACHINE_MEMHP_REGION_SIZE, "int", pc_machine_get_hotplug_memory_region_size, NULL, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_add(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "size", pc_machine_get_max_ram_below_4g, pc_machine_set_max_ram_below_4g, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, PC_MACHINE_MAX_RAM_BELOW_4G, "Maximum ram below the 4G boundary (32bit boundary)", &error_abort); object_class_property_add(oc, PC_MACHINE_SMM, "OnOffAuto", pc_machine_get_smm, pc_machine_set_smm, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, PC_MACHINE_SMM, "Enable SMM (pc & q35)", &error_abort); object_class_property_add(oc, PC_MACHINE_VMPORT, "OnOffAuto", pc_machine_get_vmport, pc_machine_set_vmport, -NULL, NULL, &error_abort); +NULL, &error_abort); object_class_property_set_description(oc, PC_MACHINE_VMPORT, "Enable vmport (pc & q35)", &error_abort); diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c index 9df7b25..3fb68c3 100644 --- a/hw/ppc/pnv.c +++ b/hw/ppc/pnv.c @@ -777,7 +777,7 @@ static void powernv_machine_class_props_init(ObjectClass *oc) { object_class_property_add(oc, "num-chips", "uint32_t", pnv_get_num_chips, pnv_set_num_chips, - NULL, NULL, NULL); + NULL, NULL); object_class_property_set_description(oc, "num-chips", "Specifies the number of processor chips", NULL); diff --git a/include/qom/object.h b/include/qom/object.h index 5ecc2d1..fbf9df2 100644 --- a/include/qom/object.h +++ b/include/qom/object.h @@ 
-945,7 +945,6 @@ ObjectProperty *object_class_property_add(ObjectClass *klass, const char *name, const char *type, ObjectPropertyAccessor *get, ObjectPropertyAccessor *set, - ObjectPropertyR
[Qemu-devel] [PATCH for-2.9 2/2] qdev: Change signature of PropertyInfo::release
Change the function signature to make implementations simpler and
safer. No void pointers and Object->DeviceState casts inside each
release function.

Signed-off-by: Eduardo Habkost
---
 hw/core/qdev-properties-system.c |  8 ++--
 hw/core/qdev-properties.c        | 10 +-
 hw/core/qdev.c                   | 10 +-
 include/hw/qdev-core.h           |  2 +-
 4 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1b7ea50..4f49109 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -112,10 +112,8 @@ fail:
     }
 }
 
-static void release_drive(Object *obj, const char *name, void *opaque)
+static void release_drive(DeviceState *dev, Property *prop)
 {
-    DeviceState *dev = DEVICE(obj);
-    Property *prop = opaque;
     BlockBackend **ptr = qdev_get_prop_ptr(dev, prop);
 
     if (*ptr) {
@@ -210,10 +208,8 @@ static void set_chr(Object *obj, Visitor *v, const char *name, void *opaque,
     g_free(str);
 }
 
-static void release_chr(Object *obj, const char *name, void *opaque)
+static void release_chr(DeviceState *dev, Property *prop)
 {
-    DeviceState *dev = DEVICE(obj);
-    Property *prop = opaque;
     CharBackend *be = qdev_get_prop_ptr(dev, prop);
 
     qemu_chr_fe_deinit(be);
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 2a82768..3709050 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -383,10 +383,9 @@ PropertyInfo qdev_prop_uint64 = {
 
 /* --- string --- */
 
-static void release_string(Object *obj, const char *name, void *opaque)
+static void release_string(DeviceState *dev, Property *prop)
 {
-    Property *prop = opaque;
-    g_free(*(char **)qdev_get_prop_ptr(DEVICE(obj), prop));
+    g_free(*(char **)qdev_get_prop_ptr(dev, prop));
 }
 
 static void get_string(Object *obj, Visitor *v, const char *name,
@@ -823,7 +822,7 @@ PropertyInfo qdev_prop_pci_host_devaddr = {
 typedef struct {
     struct Property prop;
     char *propname;
-    ObjectPropertyRelease *release;
+    void (*release)(DeviceState *dev, Property *prop);
 } ArrayElementProperty;
 
 /* object property release callback for array element properties:
@@ -832,9 +831,10 @@ typedef struct {
  */
 static void array_element_release(Object *obj, const char *name, void *opaque)
 {
+    DeviceState *dev = DEVICE(obj);
     ArrayElementProperty *p = opaque;
     if (p->release) {
-        p->release(obj, name, opaque);
+        p->release(dev, &p->prop);
     }
     g_free(p->propname);
     g_free(p);
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 5783442..b859e15 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -774,6 +774,14 @@ static void qdev_property_add_legacy(DeviceState *dev, Property *prop,
     g_free(name);
 }
 
+static void qdev_release_prop(Object *obj, const char *name, void *opaque)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+
+    prop->info->release(dev, prop);
+}
+
 /**
  * qdev_property_add_static:
  * @dev: Device to add the property to.
@@ -801,7 +809,7 @@ void qdev_property_add_static(DeviceState *dev, Property *prop,
 
     object_property_add(obj, prop->name, prop->info->name,
                         prop->info->get, prop->info->set,
-                        prop->info->release,
+                        prop->info->release ? qdev_release_prop : NULL,
                         prop, &local_err);
 
     if (local_err) {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 2c97347..5ea2095 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -251,7 +251,7 @@ struct PropertyInfo {
     int (*print)(DeviceState *dev, Property *prop, char *dest, size_t len);
     ObjectPropertyAccessor *get;
     ObjectPropertyAccessor *set;
-    ObjectPropertyRelease *release;
+    void (*release)(DeviceState *dev, Property *prop);
 };
 
 /**
-- 
2.7.4
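The pattern the patch applies can be tried outside QEMU. The following is only a minimal sketch under stated assumptions, not the QEMU code: the Object, DeviceState, Property and PropertyInfo structs are cut-down stand-ins, and GenericRelease, release_trampoline, pick_release and demo are hypothetical names invented for illustration. It shows a typed release callback that needs no casts, plus one trampoline that adapts it to a generic callback taking an opaque pointer and is installed only when a typed callback exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's Object, DeviceState and Property. */
typedef struct Object { int unused; } Object;

typedef struct DeviceState {
    Object parent_obj;   /* first member, so Object* <-> DeviceState* is valid */
    char *label;
} DeviceState;

typedef struct Property Property;

typedef struct PropertyInfo {
    /* Typed callback, in the style of the new signature. */
    void (*release)(DeviceState *dev, Property *prop);
} PropertyInfo;

struct Property {
    const char *name;
    const PropertyInfo *info;
    size_t offset;       /* offset of the field inside DeviceState */
};

/* Generic callback type with an untyped opaque pointer (assumed shape). */
typedef void (GenericRelease)(Object *obj, const char *name, void *opaque);

static int release_calls;

/* Typed implementation: no void pointers, no Object->DeviceState cast. */
static void release_string(DeviceState *dev, Property *prop)
{
    char **ptr = (char **)((char *)dev + prop->offset);

    free(*ptr);
    *ptr = NULL;
    release_calls++;
}

static const PropertyInfo string_info = { .release = release_string };
static const PropertyInfo no_release_info = { .release = NULL };

/* The only place that converts from the generic to the typed form. */
static void release_trampoline(Object *obj, const char *name, void *opaque)
{
    DeviceState *dev = (DeviceState *)obj;  /* plain cast; QEMU uses DEVICE() */
    Property *prop = opaque;

    (void)name;
    assert(prop->info->release != NULL);
    prop->info->release(dev, prop);
}

/* Install the trampoline only if the property has a typed release hook. */
static GenericRelease *pick_release(const Property *prop)
{
    return prop->info->release ? release_trampoline : NULL;
}

/* Exercise the flow once; returns the typed release call count, or -1. */
static int demo(void)
{
    Property prop = { "label", &string_info, offsetof(DeviceState, label) };
    DeviceState dev = { { 0 }, NULL };
    GenericRelease *cb = pick_release(&prop);

    dev.label = malloc(8);
    if (!dev.label || !cb) {
        return -1;
    }
    strcpy(dev.label, "disk0");
    cb(&dev.parent_obj, prop.name, &prop);
    return dev.label == NULL ? release_calls : -1;
}
```

Calling demo() from a small main() runs the whole path once. Keeping the cast in a single trampoline means a mistake in the Object to DeviceState conversion can only be made in one function instead of in every release implementation.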
Re: [Qemu-devel] [PATCH v6 0/3] IOMMU: intel_iommu support map and unmap notifications
On Thu, Nov 10, 2016 at 9:20 PM, Michael S. Tsirkin wrote:

> On Thu, Nov 10, 2016 at 09:04:13AM -0700, Alex Williamson wrote:
> > On Thu, 10 Nov 2016 17:54:35 +0200
> > "Michael S. Tsirkin" wrote:
> >
> > > On Thu, Nov 10, 2016 at 08:30:21AM -0700, Alex Williamson wrote:
> > > > On Thu, 10 Nov 2016 17:14:24 +0200
> > > > "Michael S. Tsirkin" wrote:
> > > >
> > > > > On Tue, Nov 08, 2016 at 01:04:21PM +0200, Aviv B.D wrote:
> > > > > > From: "Aviv Ben-David"
> > > > > >
> > > > > > * Advertize Cache Mode capability in iommu cap register.
> > > > > >   This capability is controlled by "cache-mode" property of intel-iommu device.
> > > > > >   To enable this option call QEMU with "-device intel-iommu,cache-mode=true".
> > > > > >
> > > > > > * On page cache invalidation in intel vIOMMU, check if the domain belong to
> > > > > >   registered notifier, and notify accordingly.
> > > > >
> > > > > This looks sane I think. Alex, care to comment?
> > > > > Merging will have to wait until after the release.
> > > > > Pls remember to re-test and re-ping then.
> > > >
> > > > I don't think it's suitable for upstream until there's a reasonable
> > > > replay mechanism
> > >
> > > Could you pls clarify what do you mean by replay?
> > > Is this when you attach a device by hotplug to
> > > a running system?
> > >
> > > If yes this can maybe be addressed by disabling hotplug temporarily.
> >
> > No, hotplug is not required, moving a device between existing domains
> > requires replay, ie. actually using it for nested device assignment.
>
> Good point, that one is a correctness thing. Aviv,
> could you add this in TODO list in a cover letter pls?
>

Sure, no problem.

>
> > > > and we straighten out whether it's expected to get
> > > > multiple notifies and the notif-ee is responsible for filtering
> > > > them or if the notif-er should do filtering.
> > >
> > > OK this is a documentation thing.
> >
> > Well no, it needs to be decided and if necessary implemented.
>
> Let's assume it's the notif-ee for now. Less is more and all that.
>
> > > > Without those, this is
> > > > effectively just an RFC.
> > >
> > > It's infrastructure without users so it doesn't break things,
> > > I'm more interested in seeing whether it's broken in
> > > some way than whether it's complete.
> >
> > If it allows use with vfio but doesn't fully implement the complete set
> > of interfaces, it does break things. We currently prevent viommu usage
> > with vfio because it is incomplete.
>
> Right - that bit is still in as far as I can see.
>
> > > The patchset spent out of tree too long and I'd like to see
> > > us make progress towards device assignment working with
> > > vIOMMU sooner rather than later, so if it's broken I won't
> > > merge it but if it's incomplete I will.
> >
> > So long as it's incomplete and still prevents vfio usage, I'm ok with
> > merging it, but I don't want to enable vfio usage until it's complete.
> > Thanks,
> >
> > Alex
> >
> > > > > > Currently this patch still doesn't enabling VFIO devices support with vIOMMU
> > > > > > present. Current problems:
> > > > > > * vfio_iommu_map_notify is not aware about memory range belong to specific
> > > > > >   VFIOGuestIOMMU.
> > > > > > * memory_region_iommu_replay hangs QEMU on start up while it itterate over
> > > > > >   64bit address space. Commenting out the call to this function enables
> > > > > >   workable VFIO device while vIOMMU present.
> > > > > > * vfio_iommu_map_notify should check if address space range is suitable for
> > > > > >   current notifier.
> > > > > >
> > > > > > Changes from v1 to v2:
> > > > > > * remove assumption that the cache do not clears
> > > > > > * fix lockup on high load.
> > > > > >
> > > > > > Changes from v2 to v3:
> > > > > > * remove debug leftovers
> > > > > > * split to sepearate commits
> > > > > > * change is_write to flags in vtd_do_iommu_translate, add IOMMU_NO_FAIL
> > > > > >   to suppress error propagating to guest.
> > > > > >
> > > > > > Changes from v3 to v4:
> > > > > > * Add property to intel_iommu device to control the CM capability,
> > > > > >   default to False.
> > > > > > * Use s->iommu_ops.notify_flag_changed to register notifiers.
> > > > > >
> > > > > > Changes from v4 to v4 RESEND:
> > > > > > * Fix codding style pointed by checkpatch.pl script.
> > > > > >
> > > > > > Changes from v4 to v5:
> > > > > > * Reduce the number of changes in patch 2 and make flags real bitfield.
> > > > > > * Revert deleted debug prints.
> > > > > > * Fix memory leak in patch 3.
> > > > > >
> > > > > > Changes from v5 to v6:
> > > > > > * fix prototype of iommu_translate function for more IOMMU types.
> > > > > > * VFIO will be notified only on the difference, without unmap
> > > > > >   before change to maps.
> > > > > >
> > > > > > Aviv Ben-David (3):
> > > > > >   IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to
> > > > > >     guest
> > > > > >   IOMMU: change iommu_op->transla